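ZHANG Liyuan, WANG Jun. Research on Inter Record and Analytic Record of Classical Bibliography Based on Machine Learning [J]. Journal of Library Science in China (中国图书馆学报), 2022, 48(2): 47-61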
Research on Inter Record and Analytic Record of Classical Bibliography Based on Machine Learning (基于机器学习的古籍目录互著与别裁探析)
Received: September 13, 2021  Revised: December 14, 2021
DOI:
Key words:Classical bibliography  Inter record  Analytic record  Machine learning  Digital humanities
中文关键词:  古籍目录  互著  别裁  机器学习  数字人文
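Funding: This article is one of the research outputs of the National Natural Science Foundation of China key international cooperation project "Research on the Construction of a Knowledge Graph of Chinese Confucian Academic History" (No. 72010107003).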
Author Name    Affiliation
ZHANG Liyuan    Peking University Library, Beijing 100871
WANG Jun    Peking University Library, Beijing 100871
Abstract:
Bibliography is an important tool for organizing and using ancient books, and a key research object of library and information science. As two auxiliary methods in classical bibliography, inter record and analytic record aim to record documents accurately and completely in the bibliographic system according to the diversity of their contents, on the basis of in-depth analysis of their content characteristics, so as to achieve the function of "once classes are divided, academic ideas become clear". However, the traditional methods of inter record and analytic record are carried out mainly by hand, which leads to issues such as low efficiency, high cost, and weak scientific rigor, objectivity, and reliability.
This study introduces machine learning to classical bibliography from the perspective of digital humanities, providing a new implementation strategy for inter record and analytic record. It first maps inter record and analytic record of classical bibliography to the problem of text classification and puts forward a machine-learning-based method framework for recording ancient books under multiple categories in the bibliographic system. The method is verified on the pre-Qin schools and their representative books. Two machine learning models, TextCNN and BERT, are used to classify ten ancient books from six pre-Qin schools, with one or two books per school. The classification results show that the fine-tuned BERT model outperforms the TextCNN model, achieving a classification accuracy of 91.64%. The fine-tuned BERT model is then used to classify two controversial books, Xunzi and Guanzi, yielding inter record and analytic record results for both. The results suggest that Xunzi should be classified under both the Legalism and Confucianism categories, while Guanzi should be classified only under Legalism. In particular, the study also finds that the first 26 chapters of Xunzi lean toward Legalism and the last six toward Confucianism.
The innovations of this paper are mainly as follows: it proposes a machine-learning-based method for inter record and analytic record, using text classification to distinguish the categories of ancient books at the granularity of books, sections, and chapters and to judge the ideological tendencies of scholars. It also shows the value of digital technology for the study of classical bibliography, classical philology, and academic history from the perspective of digital humanities. 5 figs. 7 tabs. 43 refs.
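The abstract does not say how chapter-level predictions are turned into catalogue entries, so the following is only a minimal sketch of one possible aggregation step, not the authors' procedure. It assumes a classifier has already assigned one school label to each chapter. The function names `inter_record` and `analytic_record`, the `min_share` threshold, and the toy label list are all hypothetical; the toy list merely mimics the shape of the Xunzi finding reported above and is not the paper's data.

```python
from collections import Counter

def inter_record(chapter_labels, min_share=0.15):
    """Hypothetical inter-record rule: list the book under every category
    that accounts for at least `min_share` of its chapters."""
    counts = Counter(chapter_labels)
    total = len(chapter_labels)
    return sorted(c for c, n in counts.items() if n / total >= min_share)

def analytic_record(chapter_labels):
    """Hypothetical analytic-record rule: group consecutive chapters that
    share a label into (first_chapter, last_chapter, label) runs, 1-based."""
    runs = []
    for i, label in enumerate(chapter_labels, start=1):
        if runs and runs[-1][2] == label:
            # Extend the current run to include this chapter.
            runs[-1] = (runs[-1][0], i, label)
        else:
            # A new label starts a new run.
            runs.append((i, i, label))
    return runs

# Toy input (assumed, not the paper's predictions): 32 chapters,
# the first 26 labelled Legalism and the last 6 Confucianism.
toy_labels = ["Legalism"] * 26 + ["Confucianism"] * 6

book_categories = inter_record(toy_labels)
chapter_runs = analytic_record(toy_labels)
```

Under these assumptions, `book_categories` gives the categories under which the whole book would be cross-listed, and `chapter_runs` gives the chapter ranges that could be extracted and filed separately. The threshold is a design choice: a lower `min_share` cross-lists a book more readily, while a higher one keeps it in its dominant category only.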
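Chinese Abstract:
      Bibliographies are an important tool for organizing and using ancient books, and a key research object of library and information science. As two auxiliary methods in classical bibliography, inter record (互著) and analytic record (别裁) can, on the basis of in-depth analysis of a document's content characteristics, record the document accurately and completely in the bibliographic system according to the plurality of its content, achieving the effect of "once classes are divided, academic ideas become clear" (类例既分,学术自明). This study maps inter record and analytic record to the text classification problem in text mining and proposes a machine-learning-based method framework for implementing them, providing a method for recording ancient books under multiple categories in the bibliographic system. First, two machine learning models, TextCNN and BERT, are trained to classify the texts of ten classics from six pre-Qin schools; the results show that BERT outperforms TextCNN and reaches a classification accuracy of 91.64%. The fine-tuned BERT model is then used to classify Xunzi (《荀子》) and Guanzi (《管子》) at the granularity of sections and chapters, yielding inter record and analytic record results for the sections and chapters of both books. This study demonstrates the application value of digital technology for classical bibliography, classical philology, and the study of academic history from the perspective of digital humanities. 5 figures. 7 tables. 43 references.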