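XU Lei, QIN Cuiyu, LI Jiao. Research on the Paths of Datafication, Organization and Presentation of Scientific Literature [J]. Journal of Library Science in China, 2022, 48(3): 25-42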
Datafication, Organization and Manifestation of Scientific Literature
Chinese title: Research on the Paths of Datafication, Organization and Presentation of Scientific Literature
Received: July 30, 2021
DOI:
Key words:Scientific literature  Scientific knowledge  Datafication  Organization and manifestation  Scientific communication
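Chinese keywords: scientific and technical literature; scientific knowledge; datafication; organization and presentation; scientific communication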
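Funding: This paper is one of the research outcomes of the Ministry of Education Planning Fund project "Research on Semantic Organization Models of Scientific Publications and Their Implementation Paths" (No. 20A10486066) and the Wuhan University independent research project "Construction and Application of Scientific Ontologies in Library, Information and Archival Science".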
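Authors and affiliations:
XU Lei: School of Information Management, Wuhan University, Wuhan, Hubei 430072
QIN Cuiyu: Key Laboratory of Semantic Publishing and Knowledge Service, Wuhan University, Wuhan, Hubei 430072
LI Jiao: Key Laboratory of Semantic Publishing and Knowledge Service, Wuhan University, Wuhan, Hubei 430072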
Abstract:
To facilitate scientific communication, a growing number of research practices digitize and reorganize the scientific knowledge contained in the expanding body of scientific literature. This paper summarizes the main current approaches to acquiring scientific knowledge, including academic databases and academic search engines, scientific social network platforms, and open access academic platforms. Yet alongside easier access to scientific knowledge, new difficulties in scientific communication have emerged, such as the low efficiency of reading and comprehending literature at large scale.
To break this new dilemma in scientific communication, the datafication, organization and manifestation of scientific literature have become widely adopted mainstream practices. This paper systematically surveys the field in terms of related concepts, application scenarios and implementation technologies. The survey covers six dimensions: metadatafication of scientific literature, extraction of scientific vocabulary, recognition of scientific entities and their relations, recognition of discourse functional structure, semantic organization of scientific literature, and presentation and intelligent application of scientific literature. The recognition and extraction of scientific vocabulary, named entities and their relations, and discourse structures such as knowledge units and summaries of scientific literature are the mainstream practices in this field. A large number of data models and scientific knowledge graphs have been constructed from the content of scientific literature, and studies on the distributed representation of scientific literature have gradually increased. However, the extent to which current data models are used to index literature automatically needs to be increased, and scientific knowledge graphs need to be built at a finer granularity. In intelligent applications for scientific literature, more intelligent forms of literature service platforms have appeared. This research direction is attracting growing attention, but it is not yet sufficient to overturn the current form of scientific publication, which is based mainly on textual narration. There is no unified datafication framework for processing scientific literature, and datafication practices differ in their understanding of the task, their granularity and their data organization norms. Moreover, the two directions of datafication and presentation of scientific literature have not yet fully merged to form a complete closed loop.
To address these main problems, this paper designs a framework for the datafication, organization and manifestation of scientific literature that closes the loop from machine-oriented datafication to human-oriented knowledge services, covering datafication at different granularities, the main forms of data organization and the main application scenarios. The key technologies supporting this framework, namely recognition and extraction, semantic organization, analysis and reasoning, and display and interaction of scientific literature content, are also elaborated in detail.
This paper concludes by summarizing future directions for research and practice in this field and the potential challenges it faces, such as automated acquisition of scientific knowledge, scientific data quality and trustworthy scientific communication, and the interactive experience of scientific knowledge. In the future, cooperation among the various parties still needs to be strengthened to realize the narrative production and transformation of scientific knowledge on the basis of high-quality scientific data. 4 figs. 3 tabs. 69 refs.
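The abstract describes datafying literature at several granularities (document metadata, discourse function units, entities and relations) and organizing the result semantically, for example as knowledge graphs. As a minimal sketch, assuming a simple in-memory model, the Python below shows how one paper might be held at those three levels and flattened into subject-predicate-object triples. All class names, field names, identifiers and predicates (`Paper`, `DiscourseUnit`, `hasUnit`, `builtWith`, and so on) are hypothetical illustrations, not the data model defined in the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A subject-predicate-object triple, the basic unit of a knowledge graph.
Triple = Tuple[str, str, str]


@dataclass
class Entity:
    entity_id: str
    label: str        # surface form found in the text
    entity_type: str  # hypothetical type label, e.g. "Method"


@dataclass
class Relation:
    subject_id: str
    predicate: str    # hypothetical relation name
    object_id: str


@dataclass
class DiscourseUnit:
    unit_id: str
    function: str     # hypothetical discourse function label, e.g. "Background", "Method"
    text: str
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)


@dataclass
class Paper:
    doc_id: str
    title: str
    authors: List[str]
    keywords: List[str]
    units: List[DiscourseUnit] = field(default_factory=list)


def to_triples(paper: Paper) -> List[Triple]:
    """Flatten a multi-granularity paper record into triples (hypothetical predicates)."""
    triples: List[Triple] = []
    # Document level: bibliographic metadata.
    triples.append((paper.doc_id, "title", paper.title))
    for author in paper.authors:
        triples.append((paper.doc_id, "author", author))
    for keyword in paper.keywords:
        triples.append((paper.doc_id, "keyword", keyword))
    for unit in paper.units:
        # Discourse level: link the unit to its document and record its function.
        triples.append((paper.doc_id, "hasUnit", unit.unit_id))
        triples.append((unit.unit_id, "discourseFunction", unit.function))
        # Entity level: mentions, types and surface labels.
        for ent in unit.entities:
            triples.append((unit.unit_id, "mentions", ent.entity_id))
            triples.append((ent.entity_id, "type", ent.entity_type))
            triples.append((ent.entity_id, "label", ent.label))
        # Relation level: entity-to-entity links extracted from the unit.
        for rel in unit.relations:
            triples.append((rel.subject_id, rel.predicate, rel.object_id))
    return triples


# Usage with made-up identifiers and a made-up discourse unit.
paper = Paper(
    doc_id="doc:1",
    title="Datafication, Organization and Manifestation of Scientific Literature",
    authors=["XU Lei", "QIN Cuiyu", "LI Jiao"],
    keywords=["Datafication", "Scientific communication"],
    units=[
        DiscourseUnit(
            unit_id="doc:1#u1",
            function="Method",
            text="We construct a knowledge graph from extracted entities.",
            entities=[
                Entity("e:kg", "knowledge graph", "Artifact"),
                Entity("e:ner", "entity recognition", "Method"),
            ],
            relations=[Relation("e:kg", "builtWith", "e:ner")],
        )
    ],
)

for triple in to_triples(paper):
    print(triple)
```

Keeping the unit-level `function` label next to the entity mentions is one way a knowledge-graph query could return not only an entity but also the discourse context (say, a method statement versus background) in which it appears.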
Chinese abstract:
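      Text-based scientific literature is currently the main form in which scientific knowledge is expressed and scientific communication takes place. To promote scientific communication, research and practice on datafying, organizing and presenting the scientific knowledge in the ever-growing body of scientific literature have increased. This paper systematically reviews the methods, application scenarios and implementation technologies for the datafication, organization and presentation of scientific literature across six dimensions: metadatafication of scientific literature, extraction of scientific vocabulary, recognition of domain entities and their relations, recognition of discourse functional structure, semantic organization of scientific literature, and presentation and intelligent application of scientific literature. It then summarizes the main problems currently existing in the field. On this basis, it designs an overall framework for the datafication, organization and presentation of scientific literature and sets out four core technologies for implementing the framework: recognition and extraction, semantic organization, analysis and reasoning, and display and interaction. Finally, it summarizes the challenges facing the field, such as the automatic acquisition of scientific knowledge, the quality and trustworthiness of scientific data, and the interactive experience of scientific knowledge. In the future, cooperation among all parties needs to be strengthened so that the narrative production and transformation of scientific knowledge can be realized on the basis of high-quality scientific data. 4 figures. 3 tables. 69 references.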