Page 108 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2018 Vol. 42
WU Wenna & BAO Xiulin / The architecture and data model of the National Thesauri Warehouse
specific subject; 2) tailoring knowledge systems of thesauri; 3) merging multiple thesauri. As for
data, the NTW classification is used to determine the range of term extraction; terms in thesauri
are then extracted from the basic lexicon, and terms from other sources are added according to user
demands. In addition, user supplements and document extraction are complementary ways of determining
terms for a specific subject. The properties of terms from different sources in the basic lexicon
can be used to evaluate the terms. For example, the source information of terms can be used to
estimate term authority. Frequencies and discipline distributions of terms in documents can be
used to evaluate the importance and relevance of terms to a specific subject. In essence, special
thesaurus customization is a process of tailoring and merging multiple thesauri. The NTW concept
layer supplies relations for new thesaurus construction. Knowledge fragments from the same or
different thesauri can be connected and joined according to specific rules. Because these thesaurus
knowledge systems may overlap, intersect and differ in structure from one another, semantic
ambiguity and tangled relations inevitably emerge after merging. Therefore, rules and tools
are required to handle these problems.
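The term evaluation described in this paragraph can be illustrated with a minimal sketch. It is not the NTW implementation: the names `Term`, `SOURCE_AUTHORITY`, `authority`, `importance`, `relevance` and `select_terms`, the authority weights, and the thresholds are all assumptions made for illustration. Authority is estimated from source information, importance from document frequency, and relevance from the share of a term's occurrences that fall in the target discipline.

```python
from dataclasses import dataclass, field

# Hypothetical authority weights per source type (the paper gives no values).
SOURCE_AUTHORITY = {"thesaurus": 1.0, "user_supplement": 0.6, "document": 0.4}


@dataclass
class Term:
    label: str
    sources: set                                   # source types the term came from
    doc_freq: dict = field(default_factory=dict)   # discipline -> document count


def authority(term):
    """Estimate authority from the most trusted source of the term."""
    return max((SOURCE_AUTHORITY.get(s, 0.0) for s in term.sources), default=0.0)


def importance(term):
    """Total number of documents in which the term occurs."""
    return sum(term.doc_freq.values())


def relevance(term, subject):
    """Share of the term's document occurrences in the target discipline."""
    total = importance(term)
    return term.doc_freq.get(subject, 0) / total if total else 0.0


def select_terms(terms, subject, min_authority=0.5, min_relevance=0.5):
    """Keep terms that are authoritative or relevant enough (assumed rule),
    ordered by relevance and then importance, highest first."""
    kept = [
        t for t in terms
        if authority(t) >= min_authority or relevance(t, subject) >= min_relevance
    ]
    return sorted(
        kept, key=lambda t: (relevance(t, subject), importance(t)), reverse=True
    )


candidates = [
    Term("subject indexing", {"thesaurus"},
         {"library science": 40, "computer science": 10}),
    Term("neural network", {"document"},
         {"library science": 2, "computer science": 98}),
    Term("metadata harvesting", {"document", "user_supplement"},
         {"library science": 9, "computer science": 1}),
]
selected = select_terms(candidates, "library science")
```

In this toy data, "neural network" has both low authority and low relevance to library science, so it is the only candidate the assumed rule filters out. A real customization workflow would presumably tune the weights and thresholds per subject.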
3 Macrostructure and description of thesaurus knowledge systems
3.1 Thesaurus macrostructure
Thesaurus macrostructure comprises the internal organizational structure of the concept schemes in a
thesaurus and the external relationships between its different editions. Thesauri usually contain main
tables, auxiliary tables, classifications and indexes. Main tables are the major parts of thesauri,
containing concepts labeled with terms and all concept properties. Auxiliary tables usually include
people’s names, geographical names, equipment types, product names, and so on. Classifications
provide classification schemes for the concepts and terms in thesauri. Categories are the elementary
constructs of classifications. Category codes assigned to concepts in the main table refer to
categories in classifications. Indexes are auxiliary tables that were necessary in the era of manual
retrieval; they make it easy to search for and locate thesaurus concepts in different ways. They can be
seen as navigation schemes in today's information systems. Index tables generally provide no
additional information beyond what the main tables contain. Therefore, in most cases, data
completeness is unlikely to be affected when indexes are deleted.
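The components described above can be sketched as a small data model. This is only an illustration under stated assumptions, not the NTW schema: the class names `Category`, `Concept` and `Thesaurus`, the methods `unresolved_codes` and `build_category_index`, and the category codes in the example are hypothetical. The sketch shows category codes in the main table referring to categories in the classification, and an index being rebuilt from the main table alone, which is why deleting it loses no data.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Category:
    code: str
    label: str


@dataclass
class Concept:
    term: str
    category_codes: list = field(default_factory=list)  # references into the classification


@dataclass
class Thesaurus:
    main_table: list         # list of Concept
    auxiliary_tables: dict   # table name -> list of Concept
    classification: dict     # category code -> Category

    def unresolved_codes(self):
        """Category codes used in the main table but absent from the classification."""
        used = {c for concept in self.main_table for c in concept.category_codes}
        return sorted(used - set(self.classification))

    def build_category_index(self):
        """Derive a category index from the main table only.

        The index holds nothing that the main table does not already hold,
        so it can be discarded and regenerated at any time.
        """
        index = defaultdict(list)
        for concept in self.main_table:
            for code in concept.category_codes:
                index[code].append(concept.term)
        return {code: sorted(terms) for code, terms in index.items()}


# Example data; the codes "G25" and "TP3" are made up for illustration.
thesaurus = Thesaurus(
    main_table=[
        Concept("Thesauri", ["G25"]),
        Concept("Classification", ["G25"]),
        Concept("Databases", ["TP3"]),
    ],
    auxiliary_tables={"geographical names": [Concept("Beijing")]},
    classification={
        "G25": Category("G25", "Library science"),
        "TP3": Category("TP3", "Computing"),
    },
)
category_index = thesaurus.build_category_index()
```

Keeping the index as a derived view rather than stored data is one possible design choice; it mirrors the observation that indexes now act as navigation schemes rather than as primary content.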
Concepts from different parts of a thesaurus have different attributes. For example, concepts
in main tables have different attributes and description requirements from categories
in classifications. Therefore, a simple and acceptable approach is to divide a thesaurus into different
concept schemes according to the original aggregation features of its concepts, and to describe this
division via the macrostructure of the thesaurus. Specifically, a thesaurus and its constituent
components are each treated as separate concept schemes. The macrostructure of a thesaurus can be revealed by
describing the relations between the concept schemes, as shown in Figure 2. Thesauri are regarded