陈力.数字人文视域下的古籍数字化与古典知识库建设问题[J].中国图书馆学报,2022,48(2):36~46
Digitalization of Ancient Books and Construction of Classical Knowledge Repository from the Perspective of Digital Humanities
Received: December 02, 2021
Key words:Digital humanities Ancient books collation and studies  Digitalization of ancient books  Classical knowledge repository
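Funding: This paper is one of the research outcomes of the key project of the National Social Science Fund of China, “History of Chinese Philology” (No. 19ATQ003).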
Author: CHEN Li, School of History and Culture, Sichuan University, Chengdu, Sichuan 610064
Abstract:
“Digital humanities” (DH) is a new interdisciplinary research paradigm that introduces digital techniques and methods into traditional humanities research to study the cultural phenomena of human society. At present, “automatic punctuation”, “automatic tagging”, “automatic translation” and “automatic collation” of ancient books, carried out under the DH label in traditional cultural research, are attracting great attention. Yet the basic conditions this research requires, such as large-scale corpora and related knowledge tools, are lacking. As a result, almost all of the essential data for DH projects are self-made, which is not only inefficient but also leaves the data non-exchangeable and incompatible. It also contradicts the ideas of big data and interdisciplinary research that DH advocates. Accordingly, this paper offers a preliminary discussion of the basic conditions for DH research in the domain of traditional culture.
The most basic condition for DH research is the digitalization of ancient books, which is in essence the digital conversion of physical books; OCR of texts is the key process of digitalization. Text OCR recognizes images of Chinese characters and converts them into computer encodings. It involves two aspects: the computer encoding of Chinese characters, and the computer's ability to recognize images and convert them into characters. Given current techniques, we believe that improving the recognition rate requires, beyond OCR itself, support from corpora, vocabularies, knowledge repositories, artificial intelligence and a final manual recheck. Additionally, as for standardizing and unifying the characters in ancient books, a possible solution is to establish a comparison table of standardized Chinese characters and their variants, which computers can use for data analysis and for character input and output.
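To illustrate how a vocabulary can raise the recognition rate beyond raw OCR, the following is a minimal sketch, assuming the recognizer returns several candidate characters with confidence scores for each position. The candidates, scores, vocabulary and the `correct` function are all hypothetical; the brute-force search stands in for the Viterbi or beam search a real system would more likely use, and is only practical for short spans.

```python
import math
from itertools import product

# Hypothetical OCR output: candidate characters and confidences per position.
CANDIDATES = [
    {"東": 0.90},
    {"披": 0.55, "坡": 0.45},  # visually similar components
    {"居": 0.95},
    {"士": 0.60, "土": 0.40},  # 士 / 土 are a common confusion pair
]

# Hypothetical vocabulary drawn from a corpus of ancient texts.
VOCABULARY = {"東坡", "居士"}


def correct(candidates, vocabulary, bonus=1.0):
    """Choose the candidate string that best balances OCR confidence
    against vocabulary matches (exhaustive search over all combinations)."""
    best_text, best_score = "", -math.inf
    for choice in product(*(c.items() for c in candidates)):
        text = "".join(ch for ch, _ in choice)
        # Summing log-probabilities multiplies the per-character confidences.
        score = sum(math.log(p) for _, p in choice)
        # Reward every occurrence of a known word in the candidate string.
        score += bonus * sum(text.count(w) for w in vocabulary)
        if score > best_score:
            best_text, best_score = text, score
    return best_text


print(correct(CANDIDATES, VOCABULARY, bonus=0.0))  # → 東披居士 (confidence only)
print(correct(CANDIDATES, VOCABULARY))             # → 東坡居士 (with vocabulary)
```

With confidence alone the wrong reading 東披居士 wins; once known words are rewarded, 東坡居士 scores higher. Any residual errors would still need the final manual recheck the abstract calls for.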
Text processing for the digitalization of ancient books in the usual sense and text processing for DH research have different requirements. The former generally demands reproduction of the original content, while the latter requires processing of that content, including data statistics, information and knowledge mining, and so on. However, handling the punctuation, quotations, and names of persons, places and books in ancient texts is complex and varied, and needs the support of a large-scale corpus. A knowledge repository is an intelligent, knowledge-based system. The foremost function of a classical knowledge repository is to connect ancient and modern times and help computers understand ancient culture and literature correctly. It emphasizes the objectivity of “knowledge”, interpreting that knowledge in accordance with ancient thought and culture on the basis of an understanding of ancient culture. Priority should be given to building knowledge repositories along the dimensions of language, time, geography and systematization, as well as linking the past to the present. Knowledge repositories include not only those of “explicit knowledge”, such as repositories of names of persons, places, official posts, books and named objects, but also those with “tacit knowledge” characteristics, such as repositories of academic history, intellectual history, literature, art and other “indefinable” subjects. 1 fig. 1 tab. 16 refs.
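As a sketch of how a name repository might serve the “explicit knowledge” role described above, the following assumes a hypothetical record layout in which each person carries a standardized name, alternative names, a dynasty (time dimension), an ancient native place (geography) and its present-day location (linking past and present). `PersonEntry`, `REPOSITORY` and `build_alias_index` are illustrative names, not part of any existing system, and the single entry is only an example.

```python
from dataclasses import dataclass


@dataclass
class PersonEntry:
    """One record in a hypothetical classical name repository."""
    canonical: str       # standardized name
    aliases: list[str]   # courtesy names, sobriquets, etc.
    dynasty: str         # time dimension
    native_place: str    # place name as written in ancient sources
    modern_place: str    # present-day location (links past and present)


REPOSITORY = [
    PersonEntry("蘇軾", ["子瞻", "東坡居士"], "Northern Song",
                "眉州眉山", "Meishan, Sichuan"),
]


def build_alias_index(entries):
    """Map every name form to the list of entries it may refer to.
    A list is kept because one alias can be shared by several people."""
    index: dict[str, list[PersonEntry]] = {}
    for entry in entries:
        for name in (entry.canonical, *entry.aliases):
            index.setdefault(name, []).append(entry)
    return index


index = build_alias_index(REPOSITORY)
for hit in index.get("子瞻", []):
    print(hit.canonical, hit.dynasty, hit.native_place, hit.modern_place)
```

A text that mentions only the courtesy name 子瞻 can thus be resolved to the standardized name and placed in time and space. Keeping every candidate entry, rather than a single one, leaves disambiguation to context, which is where corpus support becomes necessary.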
Chinese Abstract:
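“Digital humanities” is a new interdisciplinary research paradigm that introduces digital technologies and methods into traditional humanities research to study the various cultural phenomena of human society. In research on ancient society and culture, digital humanities work needs more than methods such as artificial intelligence and big-data analysis; it also depends on certain basic conditions, including how to enable computers to use and understand ancient literature and ancient culture. The digitalization of ancient books and the construction of classical knowledge repositories are the basic conditions that digital humanities research requires. The digitalization of ancient books mainly involves two issues: the computer encoding of Chinese characters, especially of variant characters and variant glyph forms; and the computer's ability to recognize images and convert them into characters. The classical knowledge repository is the foundation for data statistics and for information and knowledge mining on the content of ancient books. Its construction should be planned along the dimensions of language, time, geography, systematization and linking the ancient with the modern, so as to support digital humanities research. 1 figure. 1 table. 16 references.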
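The encoding problem for variant characters can be made concrete with a minimal sketch. Variant and standardized forms occupy separate code points, so a computer treats them as different characters unless a comparison table maps them together. The three pairs below are illustrative only, and the choice of which form counts as “standard” is an assumption; `VARIANT_TO_STANDARD`, `normalize`, `variants_of` and `matches` are hypothetical names.

```python
# Hypothetical comparison table: variant form -> form chosen as standard.
# The pairs are illustrative; a real table would follow an agreed standard.
VARIANT_TO_STANDARD = {
    "羣": "群",
    "峯": "峰",
    "綫": "線",
}

_TRANSLATION = str.maketrans(VARIANT_TO_STANDARD)


def normalize(text: str) -> str:
    """Rewrite every listed variant as its standardized form."""
    return text.translate(_TRANSLATION)


def variants_of(standard: str) -> list[str]:
    """Reverse lookup, e.g. to offer variant forms on input or output."""
    return [v for v, s in VARIANT_TO_STANDARD.items() if s == standard]


def matches(text: str, query: str) -> bool:
    """Substring search that ignores variant/standard differences."""
    return normalize(query) in normalize(text)


# Variants are distinct code points, so a naive comparison fails.
print("羣峯" == "群峰")            # False
print(normalize("羣峯"))           # 群峰
print(matches("羣峯疊翠", "群峰"))  # True
print(variants_of("峰"))          # ['峯']
```

Normalizing on a copy of the text keeps the original glyphs available for faithful reproduction while still allowing statistics and search to treat variants as one character.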