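XIA Cuijuan, LIU Wei, CHEN Tao, ZHANG Lei. A Genealogy Data Service Platform Implemented with Linked Data Technology [J]. Journal of Library Science in China (中国图书馆学报), 2016, 42(3): 27-38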
A Genealogy Data Service Platform Implemented with Linked Data Technology
家谱关联数据服务平台的开发实践
Received: March 14, 2016
DOI:10.13530/j.cnki.jlis.160014
Keywords: Genealogy; Data service; Linked data; Open data
Keywords (Chinese): 家谱; 数据服务; 关联数据; 开放数据
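Funding: This article is one of the research outputs of the National Social Science Fund of China Youth Project "Application of the W3C RDB2RDF Standard Specifications in Building Linked Data Services" (No. 13CTQ008).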
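Author Name; Affiliation; E-mail
XIA Cuijuan; System and Network Center, Shanghai Library, Shanghai 200031; cjxia@libnet.sh.cn
LIU Wei; Shanghai Library, Shanghai 200031
CHEN Tao; Shanghai Information Center for Life Sciences, Chinese Academy of Sciences, Shanghai 200031
ZHANG Lei; System and Network Center, Shanghai Library, Shanghai 200031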
Abstract:
For the past twenty years, the description of digital library resources has followed traditional standards such as MARC. Information such as title, author, publication details, and carrier has been well described. Described this way, however, resources cannot easily satisfy queries about the knowledge contained in their content. By building relationships among resources, linked data technologies offer a better way to organize, describe, navigate, and retrieve knowledge. By reusing and connecting with open data, they help enrich the relationships among data, broaden the scenarios in which data can be used, release the data's potential, and build an architecture for data services on the Web. Shanghai Library is using linked data technologies to reorganize traditional library resources in order to meet the requirements of data sharing, reuse, and bibliographic control in the Internet environment, and at the same time to build a historical data service platform that meets the differing service needs of its users. First, we designed an ontology based on the Bibliographic Framework (BIBFRAME). Second, we extracted surname, person, place, time, event, and other entities from the metadata records according to the ontology. Third, we cleaned the data through merging, disambiguation, and standardization, and supplemented some important properties (e.g., the origins of surnames and GIS information for places).
Then, we assigned an HTTP URI to each entity and described the entities using the RDF abstract data model. Using RDB2RDF conversion tools that support the W3C R2RML standard, together with the data processing tool OpenRefine, we transformed the data from RDB to RDF and loaded the RDF data into the Virtuoso RDF store. Finally, we designed the system according to the four principles of linked data and developed it with semantic technologies such as Jena and SPARQL, along with data visualization tools. The system can therefore support bibliographic control in the Internet environment: users can find the holding locations of genealogy documents across nearly 600 organizations worldwide. Open access to all RDF data for machines is provided through simple technologies such as content negotiation and a RESTful API. There are easy-to-use search services for those who just want to learn the stories of a surname or family, and advanced search services for those who need professional data mining and knowledge discovery. Most importantly, the platform allows authenticated users to contribute content by submitting comments and suggestions or by modifying data directly.
Once other experts have confirmed them, the modifications are published openly. All comments and modifications are recorded automatically. The linked genealogy data project is the first in the Chinese library field to provide open data services based on linked open data technologies. It is innovative in its implementation methodology, its development process, and its use of technical tools. But it is just a starting point for Shanghai Library. Much work remains on the authority data, which is still insufficient, and more external data sets, such as GeoNames, DBpedia, and VIAF, need to be mashed up with the local data. Finally, some problems remain unresolved, such as authority control of geographical names from a historical perspective. 5 figs. 11 refs.
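The abstract mentions that machine access to the RDF data relies on content negotiation. As a rough illustration only, the Python sketch below shows how a server might choose a serialization from an HTTP Accept header. The list of supported media types and the names `parse_accept` and `negotiate` are assumptions made for this example, not the platform's actual API, and the sketch ignores partial wildcards such as `text/*`.

```python
# Minimal content-negotiation sketch: choose a response format from Accept.
# SUPPORTED, parse_accept() and negotiate() are hypothetical, for illustration.

SUPPORTED = ["text/html", "application/rdf+xml", "text/turtle", "application/ld+json"]

def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending q value."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client gives no weight
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def negotiate(header, default="text/html"):
    """Pick the first supported type the client accepts with q > 0."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return default
    return default

print(negotiate("text/turtle;q=0.9, application/rdf+xml"))
print(negotiate("*/*"))
```

With this approach a browser asking for `text/html` and a crawler asking for `text/turtle` can dereference the same entity URI and each receive a suitable representation.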
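Abstract (translated from the Chinese):
      Digital libraries have disclosed their holdings by following traditional description standards (such as MARC), which focus mainly on document features and can hardly meet readers' needs to query the knowledge content of documents directly. By building semantic ontologies with explicit relationships, linked data technology supports disclosure, navigation, and retrieval based on the knowledge content of documents. By reusing open data and interlinking with external data, it enriches the connections among data, extends the scenarios in which data can be used, releases the potential of data, and provides an infrastructure for Internet-based data services. This is what knowledge services in future digital libraries ought to offer. Taking genealogy data as a starting point, Shanghai Library has attempted to use linked open data technology to reorganize traditional library resources and build a data service platform for historical documents. Through ontology design based on BIBFRAME, data conversion from RDB to RDF, system design based on the four principles of linked data, and system development based on a semantic technology framework, the platform supports bibliographic control oriented to the World Wide Web and provides root-seeking search services for general users and data mining services for professionals. 5 figures. 11 references.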
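To make the RDB-to-RDF conversion step more concrete, the following minimal Python sketch maps one relational row to N-Triples and gives the entity an HTTP URI. The namespace `http://example.org/`, the table and column names, and the predicate URIs are hypothetical placeholders, not the platform's ontology or its R2RML mappings; the project itself reports using R2RML-compliant tools rather than hand-written code like this.

```python
# Minimal sketch: turn one relational row into N-Triples.
# BASE, the column names, and the predicate URIs are hypothetical.

BASE = "http://example.org/"  # assumed namespace, not the platform's

def escape_literal(value):
    """Escape characters that must be escaped in N-Triples string literals."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

def row_to_ntriples(table, key, row):
    """Map a row to triples: the primary key forms the subject URI,
    and every other non-empty column becomes a predicate/literal pair."""
    subject = f"<{BASE}{table}/{row[key]}>"
    triples = []
    for column, value in row.items():
        if column == key or value is None:
            continue  # skip the key itself and NULL columns
        predicate = f"<{BASE}vocab/{column}>"
        triples.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return triples

row = {"id": 42, "surname": "Zhang", "place": "Shanghai", "note": None}
for line in row_to_ntriples("person", "id", row):
    print(line)
```

A real mapping would usually emit links to other entity URIs (for example, a place resource) instead of plain string literals, which is what lets the data be connected to external data sets.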