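LI Wenqi, WANG Fengxiang, SUN Xianbin, HUANG Zhixin, LI Pengbei. Diachronic Data Integration and Visualization of Ancient Book Catalogs Compiled for Imperial Collections [J]. Journal of Library Science in China, 2023, 49(1): 82-98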
Diachronic Data Integration and Visualization of Ancient Book Catalogs Compiled for Imperial Collections
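Chinese title: 历代史志目录的数据集成与可视化 (Data Integration and Visualization of the Bibliographic Treatises in the Dynastic Histories)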
Received: February 11, 2022
Key words: Ancient book catalogs; Digital scholarship; Digital humanities; Bibliography; Data integration; Visualization
Chinese key words: 古籍目录; 数字学术; 数字人文; 目录学; 数据集成; 可视化
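Author Name    Affiliation
LI Wenqi (李文琦)    Department of Information Management, Peking University; Research Center for Digital Humanities, Peking University, Beijing 100871
WANG Fengxiang (王凤翔)    Department of Information Management, Peking University; Research Center for Digital Humanities, Peking University, Beijing 100871
SUN Xianbin (孙显斌)    Institute for the History of Natural Sciences, Chinese Academy of Sciences, Beijing 100190
HUANG Zhixin (黄芷欣)    Department of Chinese Language and Literature, Peking University (博), Beijing 100871
LI Pengbei (李芃蓓)    Literature Editorial Office, Zhonghua Book Company, Beijing 100073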
Abstract:
Ancient book catalogs record and classify a large number of ancient Chinese books. They are of great academic value for the study of both ancient literature and traditional knowledge organization. The development of digital scholarship has shed new light on the digital preservation and reuse of these catalogs, as well as on domain research supported by digital tools. Digital scholarship facilitates the digitization and datafication of ancient book catalogs. Moreover, it provides new methods and computational tools for exploring large collections, so that new research questions can be raised from fresh perspectives. Recent studies have introduced computational methods to analyze the abstracts and classification systems of ancient book catalogs, but each of these studies was based on a single catalog or a particular category. It is therefore imperative to integrate the catalogs across history and to provide digital tools that let scholars explore and analyze them diachronically and holistically.
In this study, we selected eight representative catalogs, mostly from official histories, as data sources: Hanshu Yiwenzhi, Suishu Jingjizhi, Jiutangshu Jingjizhi, Xintangshu Yiwenzhi, Songshi Yiwenzhi, Mingshi Yiwenzhi, Qingshigao Yiwenzhi and Siku Quanshu Zongmu. These catalogs cover the major dynasties in Chinese history, with a time span of more than two thousand years. We adopted a semi-automated data processing approach to integrate the book entries in the eight catalogs. The integration iterated between machine pre-processing and expert manual correction and comprised three main steps: record splitting and field segmentation; field completion and normalization; and book identification. In total, we obtained more than 110,000 structured data records and identified over 7,000 books that were recorded in at least two catalogs.
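The book identification step can be illustrated with a minimal sketch. It is not the authors' pipeline, which also relies on expert correction. It assumes each structured record carries a catalog name, title, author and category, and that two records refer to the same book when their normalized title and author match. The `Record` fields, the `normalize` rule and the placeholder data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    catalog: str   # source catalog, e.g. "Suishu Jingjizhi"
    title: str     # book title as transcribed
    author: str    # author as transcribed (may be empty)
    category: str  # category assigned by that catalog


def normalize(text: str) -> str:
    """Hypothetical normalization rule: remove all whitespace."""
    return "".join(text.split())


def identify_books(records):
    """Group records by normalized (title, author) and keep only
    the groups that appear in at least two different catalogs."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize(rec.title), normalize(rec.author))
        groups[key].append(rec)
    return {
        key: recs
        for key, recs in groups.items()
        if len({r.catalog for r in recs}) >= 2
    }


# Placeholder records, not data from the study.
records = [
    Record("Suishu Jingjizhi", "Title A", "Author X", "Category 1"),
    Record("Jiutangshu Jingjizhi", "Title  A", "Author X", "Category 2"),
    Record("Songshi Yiwenzhi", "Title B", "Author Y", "Category 1"),
]
shared = identify_books(records)
```

In practice exact matching on a normalized key would miss variant titles and conflate distinct works with the same name, which is presumably why the study pairs machine pre-processing with manual correction by experts.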
Based on the integrated data, we designed and developed an interactive visual analysis system with statistics, visualization and record query features. The system is designed mainly to meet two research requirements proposed by expert users. First, it provides granular statistics and graphs that help scholars compare and trace changes in the number of books across categories and catalogs. Second, it provides an interactive visualization tool for exploring how the same books are classified differently in each catalog, thereby revealing changes in knowledge organization as well as the origin and evolution of academic thought.
In conclusion, this study provides a data foundation and analytic tools for the study of ancient book catalogs in the context of digital scholarship. It not only saves the effort of manual data collection and collation, but also offers new perspectives and techniques for identifying and solving hermeneutic problems. 8 figs. 3 tabs. 36 refs.
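The first requirement, tracing how the number of books in a category changes from catalog to catalog, comes down to counting records per catalog and category. The sketch below shows one way to do that. The pair layout, the `category_trend` helper and the placeholder data are assumptions for illustration, not the system's actual implementation.

```python
from collections import Counter

# Placeholder (catalog, category) pairs, one per book record.
pairs = [
    ("Catalog 1", "Category A"),
    ("Catalog 1", "Category A"),
    ("Catalog 1", "Category B"),
    ("Catalog 2", "Category A"),
    ("Catalog 2", "Category B"),
    ("Catalog 2", "Category B"),
]

# Number of records for each (catalog, category) combination.
counts = Counter(pairs)


def category_trend(counts, category, catalogs):
    """Return the record count of one category in each catalog,
    in the (chronological) order given by `catalogs`."""
    return [counts[(catalog, category)] for catalog in catalogs]


trend_a = category_trend(counts, "Category A", ["Catalog 1", "Catalog 2"])
trend_b = category_trend(counts, "Category B", ["Catalog 1", "Catalog 2"])
```

A series such as `trend_a` could then be passed to any plotting library to draw the growth or decline of a category over time.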
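Chinese abstract:
      Ancient book catalogs and their classification systems are of great academic value, and the development of digital scholarship offers new opportunities for the digital preservation and use of these catalogs and for bibliographic research supported by digital tools. Taking eight bibliographic treatises from the dynastic histories, spanning more than two thousand years, as data sources, this paper applies an iterative human-machine process that combines machine pre-processing with expert proofreading to carry out record splitting and field extraction, data completion, normalization and book identification, resulting in the structured, normalized integration of more than 110,000 bibliographic records. On the basis of this dataset, and starting from the research needs of domain experts, we combine statistics, visualization and retrieval with human-computer interaction techniques to build a visual analysis system for ancient book catalogs across the dynasties. The system has two main parts, bibliographic statistics and classification evolution analysis. On the one hand, it supports fine-grained statistics and visual presentation of bibliographic data, helping scholars clearly compare and trace the growth and decline of categories. On the other hand, it supports visual analysis of how each book's classification changed across successive catalogs and of the sources of the books gathered in each category, so that patterns in the splitting, merging and transformation of categories can be identified more readily. This study provides a data foundation and analytic tools for bibliographic research in the context of digital scholarship. It not only saves scholars considerable time in data collection and collation, but also supports interpretive research such as analysis and comparison through new techniques and perspectives. 8 figures. 3 tables. 36 references.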