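CHEN Tao, LIU Wei, ZHU Qinghua. SinoPedia: A Unified Chinese Terminology Service Platform Based on Linked Data [J]. Journal of Library Science in China, 2018, 44(4): 4-18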
SinoPedia: A Unified Chinese Terminology Service Platform Based on Linked Data
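Chinese title: 中文百科概念术语服务平台SinoPedia的构建研究 (literally, "Research on the Construction of SinoPedia, a Service Platform for Chinese Encyclopedic Concepts and Terms")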
Received: June 26, 2018
Keywords: SinoPedia; Linked Data; Knowledge graph; Digital humanities; Knowledge discovery
Chinese keywords: SinoPedia; 关联数据 (Linked Data); 知识图谱 (knowledge graph); 数字人文 (digital humanities); 知识发现 (knowledge discovery)
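Funding: This paper is one of the research outcomes of the Major Project of the National Social Science Fund of China, “Research on the Mechanism and Application of Mobile Visual Search in Digital Libraries for Big Data” (No. 15ZDB126).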
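Author Name | Affiliation | E-mail
CHEN Tao | Shanghai Library / Institute of Scientific and Technical Information of Shanghai, Shanghai 200031 | tchen@libnet.sh.cn
LIU Wei | Shanghai Library / Institute of Scientific and Technical Information of Shanghai, Shanghai 200031 |
ZHU Qinghua | Professor and doctoral supervisor, School of Information Management, Nanjing University, Nanjing, Jiangsu 210023 |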
Abstract:

    With the development of the “Web of Data”, the content of the World Wide Web is no longer purely text but a collection of entities that express and model things, events, and the relationships among them. Standardizing entity names, attributes, and value vocabularies on the Web is therefore very important. Europe and the United States have built extensive Linked Open Data (LOD) services. However, the lack of Chinese conceptual terms has severely hindered the standardization and adoption of Chinese knowledge graphs and Chinese domain ontologies. The SinoPedia platform proposed in this paper uses RDF triples to assign unique URIs to encyclopedia terms currently in the public domain and to persist them as resources. It publishes these resources according to the four Linked Data publishing principles of the W3C. Moreover, SinoPedia acts as a publishing center for resources and can provide Linked Data services for external Linked Data sets accessed through SPARQL Endpoints. SinoPedia is composed of the SOOOPA retrieval module, the LODVIEW publishing module, and the LODLIVE discovery module. The SOOOPA module provides search services, and the self-built resource entries have been linked to DBPedia, WikiData, and the Shanghai Library Name Authority File. SinoPedia stores its RDF data in an OpenLink Virtuoso database. The SOOOPA search module supports queries by single term, multiple terms, simplified and traditional Chinese characters, and resource URIs, and ranks the search results intelligently. The search results also link to other open resources, so that information about an entry held in other data sources can be viewed from the results.
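As a rough illustration of assigning a URI to a term and linking it to an external data set, the following minimal Python sketch mints a URI and serializes two triples as N-Triples. The base namespace `http://example.org/sinopedia/resource/`, the helper names, and the choice of `rdfs:label` and `owl:sameAs` as predicates are assumptions made for illustration; the abstract does not state the URI pattern or vocabulary that SinoPedia actually uses.

```python
from urllib.parse import quote

# Hypothetical base namespace; the real SinoPedia URI pattern is not given here.
BASE = "http://example.org/sinopedia/resource/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # assumed linking predicate


def mint_uri(term: str) -> str:
    """Mint a URI for a term by percent-encoding it under BASE."""
    return BASE + quote(term, safe="")


def literal(value: str, lang: str) -> str:
    """Format a language-tagged literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def describe(term: str, same_as: list[str]) -> list[tuple[str, str, str]]:
    """Build (subject, predicate, object) triples for one term."""
    subject = mint_uri(term)
    triples = [(subject, RDFS_LABEL, literal(term, "zh"))]
    triples += [(subject, OWL_SAME_AS, f"<{target}>") for target in same_as]
    return triples


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize triples one per line; objects are already formatted."""
    return "\n".join(f"<{s}> <{p}> {o} ." for s, p, o in triples)


triples = describe("上海", ["http://dbpedia.org/resource/Shanghai"])
print(to_ntriples(triples))
```

Percent-encoding the term keeps the URI ASCII-only; a real deployment might instead use opaque identifiers so that URIs stay stable when labels change.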
In addition to these search services, SinoPedia provides Linked Data publishing services and can act as a Linked Data publishing hub. It offers unified RDF data publication and content negotiation for different Linked Data sites accessed through SPARQL Endpoints. The platform extends the LODVIEW system to support configuring SPARQL Endpoints for multiple external data sources. Resources from different sources are reassigned a uniform resource URI in SinoPedia, and each of these URIs can be redirected to the original resource. The raw data of each resource are published under its new SinoPedia URI.
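Content negotiation of the kind mentioned above can be sketched in a few lines of Python: the server reads the HTTP `Accept` header and picks the best representation it supports. The set of supported media types, the function names, and the fallback to HTML are assumptions for illustration, not the behaviour of LODVIEW or SinoPedia.

```python
# Media types this hypothetical publisher can serve, mapped to file extensions.
SUPPORTED = {
    "text/html": "html",
    "text/turtle": "ttl",
    "application/rdf+xml": "rdf",
    "application/ld+json": "jsonld",
}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return (media_type, q) pairs sorted by descending q; ties keep header order."""
    items = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0  # default quality when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        items.append((media, q))
    return sorted(items, key=lambda item: -item[1])  # sorted() is stable


def negotiate(header: str, default: str = "text/html") -> str:
    """Pick the most preferred supported media type, else fall back to default."""
    for media, q in parse_accept(header):
        if q <= 0:
            continue
        if media in SUPPORTED:
            return media
        if media == "*/*":
            return default
    # A stricter server could answer 406 Not Acceptable here instead.
    return default


print(negotiate("text/turtle;q=0.9, application/rdf+xml"))
```

This sketch ignores partial wildcards such as `text/*`; a production service would normally rely on the negotiation support of its web framework.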
SinoPedia integrates the LODLIVE system to support the discovery and integration of Linked Data across different resources. Publishing different data sets in a unified way achieves unity at the syntactic layer (RDF structuring). Linking different data sets achieves unity at the semantic layer; that is, multi-source data are integrated through links. The LODLIVE discovery module displays Linked Data from different sources as a knowledge graph, and also provides semantic expansion and knowledge discovery services for resources through these links.
SinoPedia currently contains 5.54 million triples, covering people, places, and institutions, and 730,000 instances. It also provides data access through an API and a SPARQL Endpoint. The SinoPedia endpoint will be registered in the Linked Open Data (LOD) cloud to help fill the gap in Chinese encyclopedic knowledge bases in the LOD. In the future, SinoPedia can serve as a data linking center for the digital humanities: researchers can obtain more resource information by connecting to it, which should promote the development of digital humanities research. 7 figs. 3 tabs. 20 refs.

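Chinese abstract:
      With the rise of the “Web of Data”, the content of the World Wide Web is no longer purely text but a collection of entities that express and model the relationships among things and events, and the standardization of entity names, attributes, and value vocabularies is very important. Extensive “Linked Open Data (LOD)” services have already formed abroad. The lack of Chinese conceptual terms has seriously hindered the standardization and adoption of Chinese knowledge graphs and Chinese domain ontologies. The SinoPedia platform proposed in this paper uses RDF triples to assign unique URIs to encyclopedia concept terms currently in the public domain, persisting them as resources, and provides search services through the SOOOPA module. The self-built resource entries have also been linked at the entity level to several open resources, including DBPedia, WikiData, and the Shanghai Library Name Authority File. In addition to search, SinoPedia provides Linked Data publishing services and can act as a Linked Data publishing hub. By extending the LODVIEW system, it offers unified Linked Data publishing and content negotiation for different Linked Data sites (SPARQL Endpoints). SinoPedia also integrates the LODLIVE system, enabling the discovery and integration of Linked Data across data sets. SinoPedia currently contains 5.54 million triples and offers two data access methods, an API and a SPARQL Endpoint; the next step is to apply for inclusion in the LOD cloud diagram. In the future, SinoPedia can serve as a data linking center for the digital humanities and promote the rapid development of digital humanities research. 7 figures. 3 tables. 20 references.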