Page 99 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2018 Vol. 42
The architecture and data model of the National
Thesauri Warehouse ①a
①b *
WU Wenna & BAO Xiulin
Institute of Scientific and Technical Information of China, Beijing 100038, China
Abstract
The National Thesauri Warehouse (NTW) is an integrated service system for Chinese thesauri. Its
architecture consists of three layers: a data acquisition and conversion layer, a storage and semantic
integration layer, and a service and application layer. The first layer handles the acquisition and
specification of thesaurus data as well as the conversion of thesaurus data formats. The second layer,
the storage and semantic integration layer, adopts a central thesaurus integration mode, which includes
three embedded layers: a classification and ontology layer at the top, a conceptual layer in the middle,
and a basic lexicon layer at the bottom. For services, NTW adopts a progressive service rationale and
provides data services of different sizes and semantic levels. At the macrostructure level,
thesauri and their components are treated as independent, isolated conceptual schemes, and the
macrostructure is expressed through the relationships between individual thesaurus schemes. At the
microstructure level, concepts and terms are treated as separate objects of description, with terms
serving as labels of concepts: semantic attributes are described on the conceptual layer, and
non-semantic attributes on the term layer. Finally, the metadata for thesauri is designed based on
Dublin Core (DC), whereas the metadata for concepts and terms is formulated based on SKOS+XL.
Keywords
National Thesauri Warehouse, Thesaurus structure, Knowledge description models, Metadata, Semantic
integration systems, Terminology services, SKOS+XL
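The microstructure summarized in the abstract (concepts and terms as separate objects, terms attached to concepts as labels, semantic attributes on the concept, non-semantic attributes on the term) can be illustrated with a minimal Python sketch. This is not the NTW implementation. The class names, field names, the `ex:` identifiers, and the choice of `language` as the example non-semantic attribute are all assumptions made for illustration. Only the property names (`skos:broader`, `skosxl:prefLabel`, `skosxl:literalForm`, and so on) come from the SKOS and SKOS-XL vocabularies.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Term:
    """A term as a first-class object (a skosxl:Label).

    Non-semantic attributes live here; `language` is an illustrative choice.
    """
    term_id: str        # hypothetical identifier, e.g. "ex:t1"
    literal_form: str   # the string of the term itself
    language: str = "zh"


@dataclass
class Concept:
    """A concept (a skos:Concept) carrying the semantic attributes."""
    concept_id: str                 # hypothetical identifier, e.g. "ex:c1"
    scheme_id: str                  # the thesaurus (concept scheme) it belongs to
    pref_label: Term                # preferred term used as the concept's label
    alt_labels: List[Term] = field(default_factory=list)
    broader: List[str] = field(default_factory=list)   # ids of broader concepts
    related: List[str] = field(default_factory=list)   # ids of related concepts


def to_triples(concept: Concept) -> List[Tuple[str, str, str]]:
    """Flatten a concept and its terms into SKOS / SKOS-XL style triples."""
    s = concept.concept_id
    triples = [
        (s, "rdf:type", "skos:Concept"),
        (s, "skos:inScheme", concept.scheme_id),
    ]
    # Semantic relations are stated between concepts, not between terms.
    triples += [(s, "skos:broader", b) for b in concept.broader]
    triples += [(s, "skos:related", r) for r in concept.related]

    # Terms are linked to the concept as labels and described separately.
    labels = [("skosxl:prefLabel", concept.pref_label)]
    labels += [("skosxl:altLabel", t) for t in concept.alt_labels]
    for prop, term in labels:
        triples.append((s, prop, term.term_id))
        triples.append((term.term_id, "rdf:type", "skosxl:Label"))
        literal = f'"{term.literal_form}"@{term.language}'
        triples.append((term.term_id, "skosxl:literalForm", literal))
    return triples


if __name__ == "__main__":
    concept = Concept(
        concept_id="ex:c1",
        scheme_id="ex:schemeA",
        pref_label=Term("ex:t1", "thesaurus", "en"),
        alt_labels=[Term("ex:t2", "subject thesaurus", "en")],
        broader=["ex:c0"],
    )
    for triple in to_triples(concept):
        print(triple)
```

Keeping `Term` separate from `Concept` mirrors the separation described above: a relation such as `skos:broader` is stated once on the concept, while each term can carry its own descriptive attributes without affecting the concept's semantics.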
0 Research background
Thesauri are knowledge organization tools used mainly in the subject indexing and retrieval
of information. They provide term control and concept association, which give their
description of knowledge a formalized and structured character.
The knowledge systems of thesauri are therefore potentially machine-readable. Promoting the
application of thesauri can accelerate the organization, semantization and utilization of information
resources, and can raise the level of intelligence in information processing.
Thus far, more than 100 thesauri have been constructed and published in China. Nonetheless,
only about 10 thesauri are properly maintained and currently in service. The majority of these
①a This article is an outcome of the project “Research on Construction Mode and Development Mechanism of the National Thesauri
Warehouse” (No. 13BTQ013), supported by the National Social Science Foundation of China.
* Correspondence should be addressed to WU Wenna, Email: wuwenna@isic.ac.cn, ORCID: 0000-0001-8017-3232