Page 104 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2018 Vol. 42

WU Wenna & BAO Xiulin / The architecture and data model of the National Thesauri Warehouse  103


               2.1  Data acquisition and transformation


               Most Chinese thesauri are influenced by CT and share common characteristics in their structures
               and description patterns. This facilitates uniform and normative description of thesauri, as well as
               data storage in a consistent format. However, because the thesauri belong to different scientific
               disciplines and were designed and developed by different institutions, it is almost impossible to
               ensure that their macrostructures and microstructures are always consistent. Therefore, the NTW
               project must investigate and analyze the similarities and differences in the structures and
               description patterns of Chinese thesauri in order to resolve inconsistencies in how thesaurus
               knowledge structures are described.
                 The major functions of the data acquisition and transformation layer are thesaurus data
               acquisition, normative data description, and format transformation. The layer has three specific
               modules: thesaurus metadata registration; thesaurus data importing and verification; and thesaurus
               uniform description and format transformation.
                 Thesaurus metadata registration records the metadata of thesauri, which include thesaurus
               names, developers, publication dates, disciplines, copyright, etc. Registration assembles basic
               information about Chinese thesauri developed in different periods, and helps users discover and
               locate useful thesaurus resources. Thesaurus data importing and verification import terms and
               relations from the registered thesauri into the system, and verify and control data quality.
               Thesauri in print need to be digitized first. Digitized thesauri usually come in different formats
               because they are obtained from different sources; the system should therefore support importing
               data in different formats. Thesauri constructed in earlier years were constrained by the technical
               conditions of the time and usually have some logical problems, such as conflicting, redundant and
               circular relations (Wu & Wang, 2012). The verification module finds and resolves these logical
               problems automatically to ensure valid logic while minimizing the loss of term information.
               Verified thesaurus data can then be described according to a uniform metadata scheme and stored
               in a uniform format.
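
               The three kinds of logical problems named above (redundancy, conflict and circular
               relations) can be illustrated with a minimal Python sketch. It is not the NTW verification
               module: the (source, type, target) tuple layout, the "BT"/"RT" relation codes, and the
               function names find_redundant, find_conflicts and find_cycle are all assumptions made for
               illustration, and the sketch only detects problems rather than resolving them.

```python
from collections import defaultdict

# Assumed layout: each relation is a (source term, relation type, target term) tuple,
# where "BT" means "has broader term" and "RT" means "has related term".

def find_redundant(relations):
    """Return relations that repeat an earlier, identical relation."""
    seen, dupes = set(), []
    for rel in relations:
        if rel in seen:
            dupes.append(rel)
        seen.add(rel)
    return dupes

def find_conflicts(relations):
    """Return term pairs linked both hierarchically (BT) and associatively (RT)."""
    kinds = defaultdict(set)
    for src, kind, dst in relations:
        kinds[frozenset((src, dst))].add(kind)
    return sorted(tuple(sorted(pair)) for pair, ks in kinds.items()
                  if {"BT", "RT"} <= ks)

def find_cycle(relations):
    """Return one cycle of BT links as a list of terms, or None if there is none."""
    broader = defaultdict(list)
    for src, kind, dst in relations:
        if kind == "BT":
            broader[src].append(dst)

    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = defaultdict(int)   # every term starts as UNSEEN
    path = []                  # terms on the current depth-first path

    def visit(term):
        state[term] = ACTIVE
        path.append(term)
        for parent in broader.get(term, []):
            if state[parent] == ACTIVE:
                # Reached a term already on the current path: a cycle.
                return path[path.index(parent):] + [parent]
            if state[parent] == UNSEEN:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        state[term] = DONE
        return None

    for term in list(broader):
        if state[term] == UNSEEN:
            cycle = visit(term)
            if cycle:
                return cycle
    return None

# Hypothetical sample data containing all three problems.
sample = [
    ("cat", "BT", "mammal"),
    ("mammal", "BT", "animal"),
    ("animal", "BT", "cat"),      # closes a circular hierarchy
    ("cat", "RT", "mammal"),      # conflicts with the BT link above
    ("cat", "BT", "mammal"),      # exact duplicate
]
print(find_redundant(sample))
print(find_conflicts(sample))
print(find_cycle(sample))
```

               Detection is kept separate from repair here because deciding which relation to drop or
               keep is exactly where term information can be lost, so a real system would presumably
               apply its own resolution rules to the reported problems.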


               2.2  Storage and semantic integration


               2.2.1  Top classification and ontology
               A thesaurus usually has a category system or classification to categorize its terms, and
               different thesauri usually have different classifications. For an integration system, a top
               classification provides a uniform navigation system for concepts from its source thesauri, which
               facilitates the semantic integration of the thesauri. The NTW semantic integration system
               establishes two top-level schemes, one based on subjects and one based on ontology: a top
               classification and a top ontology.