Welcome to JLIS

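XIA Cuijuan, LIN Haiqing, LIU Wei. Designing a Data Model of Chinese Ancient Books for Evidence-based Practice [J]. Journal of Library Science in China, 2017, 43(6): 16-34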

Designing A Data Model of Chinese Ancient Books for Evidence-based Practice

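Research and Design of a Chinese Ancient Book Data Model for Evidence-based Practice (Chinese title: 面向循证实践的中文古籍数据模型研究与设计)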

Received: September 26, 2017


Key words: Digital humanities; Evidence-based practice on ancient books; Data modeling


Author Name	Affiliation	E-mail
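XIA Cuijuan	Senior Engineer, System and Network Center, Shanghai Library, Shanghai 200031	cjxia@libnet.sh.cn
LIN Haiqing	Head of the Technical Department and Chinese Cataloging Librarian, East Asian Library, University of California, Berkeley, California 94720, USA
LIU Wei	Deputy Director and Research Fellow, Shanghai Library, Shanghai 200031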

Abstract:

Ancient book catalogs and ancient literature are important sources of evidence for much research in the humanities and social sciences. Traditional research on ancient books usually relies on experts' expertise or subjective judgment. The emerging digital humanities can help scholars gather relevant information as completely as possible, raise research questions at larger spatio-temporal scales, and conduct intensive research across a variety of subjects from new perspectives. This requires a digital humanities platform with relatively complete data and suitable advanced technologies. A data model that can integrate ancient book catalog data of different kinds and formats is the basis of such a platform.

In this paper, we propose a data model of Chinese ancient books that uses ontology and linked data technology to support researchers in carrying out so-called “evidence-based practice”. The data model draws on classical bibliography combined with philology, bibliology, and related fields. This research also explores new ways of using ancient catalogs and documents to support research in various disciplines, such as history, linguistics, sociology, literature, culture and the arts. Web ontologies and linked data are recent achievements of semantic technologies and are among the most suitable and applicable technologies for developing “authority control” and “evidence-based” applications. They offer flexibility and scalability that traditional relational databases lack, which is especially important in distributed environments with massive amounts of semi-structured or unstructured data. Such a data model can handle semantic (machine-understandable) data directly and can also support knowledge-based queries with reasoning.

The paper describes the design method and the main components of the data model, including the bibliographic framework, creators and contributors, classifications, seals, taboo terms, and so on. The bibliographic framework is a “3 + 2” model, that is, “Work-Instance-Item” + “Annotation” + “Classification”. It is based on the needs of evidence-based research on Chinese ancient books and refers to the four-tier “WEMI” model of FRBR and the three-tier model of LOC's BIBFRAME 2.0. It can adapt flexibly to any kind of ancient book catalog and to metadata schemas based on MARC or DCAP, and it can also integrate the full texts of ancient literature. It can adequately represent the classifications and the multiple comments from different time periods found in the records of ancient books. For the description of creators and contributors, the BIBFRAME “Contribution” model is used to clarify the relationship between a responsible party and a document, and the relationship between principal and shared responsibility. The knowledge of ancient books is structured into fine-grained semantic units to facilitate machine processing.
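As a rough illustration of the “3 + 2” framework, the following minimal Python sketch models “Work-Instance-Item” plus “Annotation” and “Classification” as plain dataclasses. It is not the authors' ontology: all class names, field names, role labels and sample values are hypothetical, and attaching classifications and annotations to the Work and seals to the Item is an assumption made only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Contribution:
    """Links an agent to a work, loosely modeled on BIBFRAME's Contribution."""
    agent: str
    role: str              # hypothetical role label, e.g. "author"
    primary: bool = False  # True = principal responsibility, False = shared


@dataclass
class Classification:
    catalog: str   # catalog that assigned the class
    category: str  # class label as recorded in that catalog


@dataclass
class Annotation:
    catalog: str   # catalog the comment is taken from
    text: str


@dataclass
class Item:
    holder: str                                     # holding institution
    seals: List[str] = field(default_factory=list)  # assumption: seals belong to a copy


@dataclass
class Instance:
    edition: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Work:
    title: str
    contributions: List[Contribution] = field(default_factory=list)
    instances: List[Instance] = field(default_factory=list)
    classifications: List[Classification] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)


def principal_agents(work: Work) -> List[str]:
    """Return the agents that carry principal responsibility for the work."""
    return [c.agent for c in work.contributions if c.primary]


def classes_by_catalog(work: Work) -> Dict[str, List[str]]:
    """Group the work's class labels by the catalog that assigned them."""
    grouped: Dict[str, List[str]] = {}
    for c in work.classifications:
        grouped.setdefault(c.catalog, []).append(c.category)
    return grouped


# Placeholder data only; none of these values come from the paper.
work = Work(
    title="Example Work",
    contributions=[
        Contribution("Agent A", "author", primary=True),
        Contribution("Agent B", "commentator"),
    ],
    instances=[Instance("Edition 1", items=[Item("Library L", seals=["Seal S"])])],
    classifications=[
        Classification("Catalog X", "Class 1"),
        Classification("Catalog Y", "Class 2"),
    ],
    annotations=[Annotation("Catalog X", "Comment recorded in Catalog X")],
)
```

Keeping classifications and annotations as separate lists tagged with their source catalog is one way to let the same work carry different classes and comments from different periods side by side.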

This model and its vocabularies were used to integrate data from 14 typical ancient book catalogs, including historical catalogs, official catalogs, private catalogs, large modern union catalogs and Shanghai Library's ancient book database, and the resulting platform realizes key functions for evidence-based research on ancient books. These functions include searching the versions and classifications of ancient books, clustering and comparing different versions or copies of an ancient book, exploring the relations among authors and contributors, and statistical analysis of ancient books for a given time period, area and topic. Through the construction of the “Chinese Ancient Book Union Catalog Platform for Evidence-based Research”, the availability, flexibility and scalability of the data model have been verified. The paper also raises problems that need to be further resolved, such as the identification of a “Work”, the establishment of relationships between “Instances”, and the extraction of structured, fine-grained data from the content of ancient books. 6 figs. 5 tabs. 18 refs.
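The clustering and statistical functions described above can be pictured with a small Python sketch. It is only an illustration under strong assumptions: the record fields and sample values are hypothetical, and grouping records by a normalized title and author is a naive stand-in for identifying a “Work”, which the paper itself lists as an open problem.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Tuple


class CatalogRecord(NamedTuple):
    """One entry taken from one catalog (all fields are hypothetical)."""
    catalog: str
    title: str
    author: str
    edition: str
    period: str


def work_key(record: CatalogRecord) -> Tuple[str, str]:
    """Naive work identifier: whitespace-trimmed, lowercased title and author."""
    return (record.title.strip().lower(), record.author.strip().lower())


def cluster_by_work(
    records: List[CatalogRecord],
) -> Dict[Tuple[str, str], List[CatalogRecord]]:
    """Group records from different catalogs that appear to describe the same work."""
    clusters: Dict[Tuple[str, str], List[CatalogRecord]] = defaultdict(list)
    for record in records:
        clusters[work_key(record)].append(record)
    return dict(clusters)


def count_by_period(records: List[CatalogRecord]) -> Counter:
    """Count how many records fall into each time period."""
    return Counter(record.period for record in records)


# Placeholder records only; none of these values come from the paper.
records = [
    CatalogRecord("Catalog X", "Example Work", "Agent A", "Edition 1", "Period 1"),
    CatalogRecord("Catalog Y", " example work ", "Agent A", "Edition 2", "Period 2"),
    CatalogRecord("Catalog Y", "Another Work", "Agent B", "Edition 1", "Period 1"),
]

clusters = cluster_by_work(records)
for key, members in clusters.items():
    editions = [m.edition for m in members]
    print(key, "->", editions)
```

Each cluster collects the editions of one presumed work across catalogs, which is the starting point for comparing versions; `count_by_period` shows the simplest form of the period-based statistics.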

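Chinese abstract (translated):

Against the background of digital humanities gradually becoming the new normal in digital library construction, this paper draws on the ideas of “evidence-based practice” and “evidence-based sociology” to propose the concept of “evidence-based practice on ancient books”. Using literature review, requirements analysis, data modeling and experimental verification, it surveys the arrangement conventions of ancient catalogs and modern union catalogs and the structural frameworks of metadata standards for ancient books. It analyzes the research needs, in the era of the Internet and machine intelligence, of specific fields that rely on ancient-book evidence, such as the study of editions, textual criticism, classification studies and historical humanities. On this basis it designs an ancient book data model that can integrate ancient book catalogs, metadata records, full texts of ancient literature and various kinds of ancient-book knowledge from different sources and in different formats. Relying on the construction of the “Chinese Ancient Book Union Catalog Platform for Evidence-based Research”, the model and its ontology vocabularies are used to integrate data from 14 typical ancient book catalogs and ancient book databases. This realizes preliminary evidence-based functions, such as clustering and comparing the different versions, classifications and summaries of ancient books and statistical analysis of authors, other responsible parties and the relations among them, in order to verify the feasibility, openness and extensibility of the model. The paper further raises problems that remain to be solved and discusses possible solutions. 6 figs. 5 tabs. 18 refs.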
