Page 108 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2018 Vol. 42

WU Wenna & BAO Xiulin / The architecture and data model of the National Thesauri Warehouse


               specific subject; 2) tailoring knowledge systems of thesauri; 3) merging multiple thesauri. As for
               data, the NTW classification is used to determine the range of term extraction; terms in
               thesauri are then extracted from the basic lexicon, and terms from other sources are added
               according to user demands. In addition, user supplementation and document extraction are
               complementary ways to determine terms for a specific subject. The properties of terms from
               different sources in the basic lexicon can be used to evaluate the terms. For example, the
               source information of terms can be used to estimate term authority, and the frequencies and
               discipline distribution of terms in documents can be used to evaluate the importance and
               relevance of terms to a specific subject. In essence, special thesaurus customization is a
               process of tailoring and merging multiple thesauri. The NTW concept layer supplies relations
               for new thesaurus construction. Knowledge fragments from the same or different thesauri can
               be connected and joined according to specific rules. Because these thesaurus knowledge
               systems may be repetitive, intersecting and isomeric with one another, semantic ambiguity
               and entangled relations inevitably emerge after merging. Therefore, rules and tools are
               required to handle these problems.
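
               The merging problem described above can be illustrated with a minimal sketch. It assumes
               that knowledge fragments are reduced to (source, narrower term, broader term) triples; the
               thesaurus names, the terms and the functions merge_fragments and find_conflicts are all
               hypothetical and are not part of the NTW. The sketch merges repeated relations and flags
               one simple kind of relation entanglement, namely two terms that end up being each other's
               broader term after merging.

```python
from collections import defaultdict

# Hypothetical relation triples: (source thesaurus, narrower term, broader term).
# The thesaurus names and terms are invented for illustration only.
fragments = [
    ("thesaurus_A", "neural networks", "machine learning"),
    ("thesaurus_A", "machine learning", "artificial intelligence"),
    ("thesaurus_B", "neural networks", "machine learning"),  # repeats thesaurus_A
    ("thesaurus_B", "machine learning", "neural networks"),  # contradicts thesaurus_A
    ("thesaurus_B", "expert systems", "artificial intelligence"),
]


def merge_fragments(triples):
    """Merge broader-term relations, recording which sources assert each one."""
    merged = defaultdict(set)
    for source, narrower, broader in triples:
        merged[(narrower, broader)].add(source)
    return dict(merged)


def find_conflicts(merged):
    """Return term pairs that are linked in both directions after merging."""
    conflicts = set()
    for narrower, broader in merged:
        if (broader, narrower) in merged:
            conflicts.add(tuple(sorted((narrower, broader))))
    return sorted(conflicts)


merged = merge_fragments(fragments)
conflicts = find_conflicts(merged)

for (narrower, broader), sources in sorted(merged.items()):
    print(f"{narrower} -> {broader} (asserted by {sorted(sources)})")
print("conflicting pairs:", conflicts)
```

               Keeping the set of asserting sources for each relation is one possible design choice: it
               leaves room for a later rule to resolve a flagged conflict, for example by preferring the
               source with higher authority.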

               3  Macrostructure and description of thesaurus knowledge systems


               3.1  Thesaurus macrostructure


               Thesaurus macrostructure includes the internal organizational structure of concept schemes in a
               thesaurus and the external relationships between its different editions. Thesauri usually contain main
               tables, auxiliary tables, classifications and indexes. Main tables are the major parts of thesauri,
               containing concepts labeled with terms and all concept properties. Auxiliary tables usually include
               people’s names, geographical names, equipment types, product names, and so on. Classifications
               provide classifying schemes for concepts and terms in thesauri. Categories are elementary
               constructs of classifications. Category codes assigned to concepts in the main table reference
               categories in classifications. Indexes are auxiliary tables that were necessary in the era of manual
               retrieval; they can be used to search and locate thesaurus concepts easily in different ways. They can be
               seen as navigation schemes in today's information systems. Index tables are generally unable
               to provide additional information not contained in main tables. Therefore, in most cases, data
               completeness is unlikely to be affected when indexes are deleted.
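
               The components listed above can be sketched as a small data structure. This is only an
               illustration under stated assumptions, not the NTW data model: the class names, the field
               names, the category codes "C1" and "C2" and the sample terms are all hypothetical. The
               sketch also shows why deleting an index rarely affects data completeness: an index by
               category can be rebuilt from the main table and the classification alone.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """An elementary construct of a classification."""
    code: str
    label: str


@dataclass
class Concept:
    """A main-table concept, labeled with a term and assigned category codes."""
    term: str
    category_codes: list = field(default_factory=list)


@dataclass
class Thesaurus:
    main_table: list        # list of Concept
    auxiliary_tables: dict  # table name -> list of entries
    classification: dict    # category code -> Category

    def build_category_index(self):
        """Derive an index (category label -> sorted terms) from the main table."""
        index = {}
        for concept in self.main_table:
            for code in concept.category_codes:
                label = self.classification[code].label
                index.setdefault(label, []).append(concept.term)
        return {label: sorted(terms) for label, terms in index.items()}


# Hypothetical sample data, invented for illustration only.
thesaurus = Thesaurus(
    main_table=[
        Concept("indexing", ["C2"]),
        Concept("database", ["C1"]),
        Concept("cataloguing", ["C2"]),
    ],
    auxiliary_tables={"geographical names": ["Beijing"]},
    classification={
        "C1": Category("C1", "Computing"),
        "C2": Category("C2", "Library science"),
    },
)

index = thesaurus.build_category_index()
print(index)
```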
                 Concepts from different parts of a thesaurus have different attributes. For example, concepts
               in main tables have different attributes and description demands when compared with categories
               in classifications. Therefore, a simple and acceptable approach is to divide a thesaurus into different
               concept schemes according to the original aggregation features of its concepts, and to describe this
               division via the macrostructure of the thesaurus. Specifically, a thesaurus and its constituent components
               are each treated as a concept scheme. The macrostructure of a thesaurus can then be revealed by
               describing the relations between the concept schemes, as shown in Figure 2. Thesauri are regarded