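WU Wenna, BAO Xiulin. The Architecture and Data Model of National Thesauri Warehouse [J]. Journal of Library Science in China (中国图书馆学报), 2016, 42(2): 81-96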
The Architecture and Data Model of National Thesauri Warehouse
国家叙词库的体系结构与数据模型
Received: December 18, 2014  Revised: January 06, 2016
DOI:10.13530/j.cnki.jlis.160012
Key words: National Thesauri Warehouse; Thesaurus structure; Knowledge description models; Metadata; Semantic integration systems; Terminology services; SKOS+XL
Chinese key words: 国家叙词库; 叙词表结构; 知识描述模型; 元数据; 语义集成系统; 术语服务; SKOS+XL
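Funding: This article is one of the research results of the National Social Science Fund of China project "Research on the Construction Methods and Development Mechanism of the National Thesauri Warehouse" (No. 13BTQ013).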
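Author Name    Affiliation    E-mail
WU Wenna    Institute of Scientific and Technical Information of China, Beijing 100038    wuwenna@istic.ac.cn
BAO Xiulin    Institute of Scientific and Technical Information of China, Beijing 100038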
Abstract:
The National Thesauri Warehouse (NTW) project aims to integrate the thesauri constructed and published in China and to provide services based on them. Several integration projects in China and abroad were investigated, analyzed, and compared. Given the situation of Chinese thesauri, NTW adopted a center-based integration mode: the Chinese Thesaurus was chosen as the mapping center, and the other thesauri are mapped to it.
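A minimal sketch of the center-based integration mode may help make the idea concrete. It assumes each member concept maps to exactly one concept of the central thesaurus, which is a simplification; the class `HubMapping`, the thesaurus codes (`CT`, `AGRI`, `MED`), and all concept identifiers are hypothetical and do not come from the article.

```python
from typing import Dict, Optional, Tuple

# A member concept is identified by (thesaurus_id, concept_id).
# All identifiers in this sketch are hypothetical.
ConceptRef = Tuple[str, str]


class HubMapping:
    """Star-shaped mapping: member concepts map only to the central thesaurus."""

    def __init__(self, hub_id: str) -> None:
        self.hub_id = hub_id
        # member concept -> central concept
        self._to_hub: Dict[ConceptRef, str] = {}
        # central concept -> {member thesaurus -> member concept}
        self._from_hub: Dict[str, Dict[str, str]] = {}

    def add(self, member: ConceptRef, hub_concept: str) -> None:
        """Record that a member concept is mapped to a central concept."""
        thesaurus_id, concept_id = member
        self._to_hub[member] = hub_concept
        self._from_hub.setdefault(hub_concept, {})[thesaurus_id] = concept_id

    def translate(self, source: ConceptRef, target_thesaurus: str) -> Optional[str]:
        """Find the counterpart of a member concept by going through the center."""
        hub_concept = self._to_hub.get(source)
        if hub_concept is None:
            return None
        if target_thesaurus == self.hub_id:
            return hub_concept
        return self._from_hub.get(hub_concept, {}).get(target_thesaurus)


mapping = HubMapping(hub_id="CT")
mapping.add(("AGRI", "a17"), "ct-0042")
mapping.add(("MED", "m03"), "ct-0042")
print(mapping.translate(("AGRI", "a17"), "MED"))
```

With this layout, adding a new member thesaurus only requires mappings to the center, rather than pairwise mappings to every other member thesaurus.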
    NTW has a three-layer architecture: a data acquisition and conversion layer (DACL), a storage and semantic integration layer (SSIL), and a service and application layer (SAL). DACL has three modules: thesaurus metadata registration, data import and verification, and unified description and format conversion. SSIL has a top classification and a top ontology at the top, a concept library in the middle, and a base term library at the bottom. The top classification categorizes concepts by discipline, subject, or topic; the top ontology categorizes them by their essential attributes. Together they gather concepts from different perspectives, which facilitates semantic integration across thesauri and concept navigation within one thesaurus or across several. The concept library contains the concepts of the member thesauri and their semantic attributes. The base term library contains terms and their non-semantic attributes; it covers not only the terms of member thesauri but also other term sources, such as keywords and dictionary terms. In services, NTW follows the principle of gradual improvement: it uses the results of its construction, which comprise knowledge units of different granularity and different semantic levels, to provide data services. These include thesaurus metadata services, concept services, concept group services, and thesaurus customization services. The services take different forms for different users: query, browse, edit, and download services for human users, and third-party invocation services for computer agents.
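The role of the top classification and the top ontology in SSIL can be illustrated with a small sketch in which the same concepts are grouped along two independent axes. The class `TopLevelIndex`, the subject and kind names, and the concept identifiers are hypothetical; the article does not specify how NTW stores these groupings.

```python
from typing import Dict, Optional, Set


class TopLevelIndex:
    """Groups concepts by subject (top classification) and by kind (top ontology)."""

    def __init__(self) -> None:
        self.by_subject: Dict[str, Set[str]] = {}
        self.by_kind: Dict[str, Set[str]] = {}

    def register(self, concept_id: str, subject: str, kind: str) -> None:
        """File one concept under a subject category and an ontological kind."""
        self.by_subject.setdefault(subject, set()).add(concept_id)
        self.by_kind.setdefault(kind, set()).add(concept_id)

    def browse(self, subject: Optional[str] = None,
               kind: Optional[str] = None) -> Set[str]:
        """Return concepts matching every criterion that was given."""
        groups = []
        if subject is not None:
            groups.append(self.by_subject.get(subject, set()))
        if kind is not None:
            groups.append(self.by_kind.get(kind, set()))
        if not groups:
            return set()
        return set.intersection(*groups)


index = TopLevelIndex()
index.register("AGRI:a17", subject="agriculture", kind="organism")
index.register("MED:m03", subject="medicine", kind="organism")
index.register("AGRI:a20", subject="agriculture", kind="process")

# Browsing by kind cuts across member thesauri; adding a subject narrows it.
print(sorted(index.browse(kind="organism")))
print(sorted(index.browse(subject="agriculture", kind="organism")))
```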
    The NTW data model was built on an analysis of the macrostructure and microstructure of thesauri. The macrostructure of a thesaurus is the way its concept schemes are joined together. The concept schemes in a thesaurus typically include main tables, auxiliary tables, category tables, and indexes. The macrostructure is represented by describing the relationships between the concept schemes in a thesaurus and between the versions of the thesaurus. The microstructure of a thesaurus is a framework for knowledge description. In NTW, concepts and terms are separated: terms are labels of concepts, and both are objects of description. Concepts are described by semantic attributes, such as definitions, categories, and relations between concepts. Terms are described by non-semantic attributes, including phonetic notations and non-semantic correspondences between terms, such as Chinese label/English label, full name/acronym, and wrong spelling/correct spelling. Concepts and terms are linked through the label attribute, which attaches terms to the concepts they label. Two special relationships in thesauri are discussed: the concept coordination relationship and the guiding associative relationship. Both link informal concepts to formal concepts in the same thesaurus. To distinguish and describe these concepts, the notions of “core concepts” and “extended concepts” are proposed: formal concepts are core concepts, and informal concepts are extended concepts. The thesaurus metadata scheme was designed based on DC, and the concept and term metadata schemes were designed based on SKOS+XL. 3 figs. 3 tabs. 19 refs.
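The separation of concepts and terms described above can be sketched with two record types, loosely following the SKOS-XL idea that labels are resources in their own right. This is only an illustration under stated assumptions: the field names (`pinyin`, `term_relations`, `pref_labels`, `alt_labels`), the helper `labels_of`, and the sample data are hypothetical and are not the metadata scheme defined in the article.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    """A term with non-semantic attributes only (cf. skosxl:Label)."""
    term_id: str
    literal_form: str
    lang: str
    pinyin: str = ""  # phonetic notation (hypothetical field)
    # Non-semantic links to other terms, e.g. {"translation": ["t2"]}
    term_relations: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class Concept:
    """A concept with semantic attributes; its labels are references to terms."""
    concept_id: str
    definition: str = ""
    categories: List[str] = field(default_factory=list)
    broader: List[str] = field(default_factory=list)      # ids of broader concepts
    pref_labels: List[str] = field(default_factory=list)  # term ids
    alt_labels: List[str] = field(default_factory=list)   # term ids


def labels_of(concept: Concept, terms: Dict[str, Term], lang: str) -> List[str]:
    """Resolve a concept's labels in one language, preferred labels first."""
    term_ids = concept.pref_labels + concept.alt_labels
    return [terms[t].literal_form for t in term_ids if terms[t].lang == lang]


terms = {
    "t1": Term("t1", "计算机", "zh", pinyin="ji suan ji",
               term_relations={"translation": ["t2"]}),
    "t2": Term("t2", "computer", "en"),
    "t3": Term("t3", "电脑", "zh", pinyin="dian nao"),
}
concept = Concept(
    concept_id="c1",
    definition="A programmable machine that processes data.",
    categories=["computer science"],
    pref_labels=["t1", "t2"],
    alt_labels=["t3"],
)
print(labels_of(concept, terms, "zh"))
```

Keeping the Chinese/English correspondence on the terms rather than on the concept mirrors the distinction drawn above between semantic and non-semantic attributes.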
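Chinese abstract:
      The National Thesauri Warehouse is China's system for integrating thesaurus resources and providing services on them. Its architecture has three layers: a data acquisition and conversion layer, a storage and semantic integration layer, and a service and application layer. The data acquisition and conversion layer handles the collection, standardized description, and format conversion of thesaurus data. Semantic integration follows a central-thesaurus integration mode, with a classification scheme and an ontology at the top, a concept layer in the middle, and a base term library at the bottom. For services, a progressive approach is adopted, providing data services at different granularities and different semantic levels. For the macrostructure of thesauri, a thesaurus and each of its components are treated as independent concept schemes, and the macrostructure is expressed by describing the relationships among these concept schemes. For the microstructure, concepts and terms are described as separate objects, with terms serving as concept labels; semantic attributes are described at the concept level and non-semantic attributes at the term level. The thesaurus metadata scheme is designed on the basis of the DC metadata framework, and the metadata schemes for describing concepts and terms are designed on the basis of SKOS+XL. 3 figures. 3 tables. 19 references.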