Article Abstract

Gao Jinsong, Zhou Ximan, Liang Yanqi. Linked Data-Oriented Method of Entity Linking Discovery [J]. Journal of Library Science in China, 2016, 42(6): 85-101
Linked Data-Oriented Method of Entity Linking Discovery (面向关联数据的实体链接发现方法研究)
Submitted: 2016-07-22  Revised: 2016-09-05
DOI:
Keywords (Chinese): 关联数据  实体链接  数据链接  链接发现
Keywords (English): Linked data  Entity linking  Data linking  Linking discovery
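Funding: This paper is one of the research outputs of the National Social Science Fund of China general project "Research on the Mechanisms of Knowledge Externalization and Fusion in Linked Data-Based Knowledge Creation" (No. 12BTQ039).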
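Author  Affiliation
Gao Jinsong  Master's student, School of Information Management, Central China Normal University, Wuhan, Hubei 430079
Zhou Ximan  Master's student, School of Information Management, Central China Normal University, Wuhan, Hubei 430079
Liang Yanqi  Master's student, School of Information Management, Central China Normal University, Wuhan, Hubei 430079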
Abstract views: 2947
Full-text downloads: 1041
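Abstract (translated from the Chinese):
      As applications of linked data deepen, numerous datasets have been published on the web, but the published linked datasets currently have few links between them, which makes sharing and using the data inconvenient. This study proposes a method, based on statistical learning, for identifying entities and building links across linked datasets. First, entities are matched across datasets: a K-medoids clustering algorithm aggregates attributes and discovers the relations among them, and matching relations are described for highly correlated attributes, which reduces the number of attribute-matching computations during entity matching. Second, the similarity of entity attribute values is computed for the matched attributes in order to compare the similarity of entities, and entity links are built under the SILK framework, which achieves entity linking discovery. Finally, experiments verify that the method reduces the number of entity-matching computations across datasets and improves the accuracy of entity links, and that it is feasible and practical. 12 figs. 4 tabs. 19 refs.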
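
As a rough illustration of the attribute-clustering step, the following minimal Python sketch groups attribute names with a simple K-medoids loop (alternating assignment and medoid update). It is not the authors' implementation: the string dissimilarity based on `difflib.SequenceMatcher`, the choice of the first k items as initial medoids, and the attribute names in the example are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def dissimilarity(a: str, b: str) -> float:
    """String dissimilarity in [0, 1]; an assumed stand-in for the paper's measure."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def k_medoids(items, k, dist=dissimilarity, max_iter=100):
    """Cluster items around k medoids; returns {medoid item: [member items]}."""
    medoids = list(range(k))  # assumption: start from the first k items
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: attach every item to its closest medoid.
        clusters = {m: [] for m in medoids}
        for i, item in enumerate(items):
            nearest = min(medoids, key=lambda m: dist(item, items[m]))
            clusters[nearest].append(i)
        # Update step: within each cluster, pick the member with the smallest
        # total distance to all members (keep the old medoid if a cluster is empty).
        new_medoids = [
            min(members or [m],
                key=lambda c: sum(dist(items[c], items[j]) for j in members))
            for m, members in clusters.items()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break  # medoids are stable, so the clustering has converged
        medoids = new_medoids
    return {items[m]: [items[i] for i in members] for m, members in clusters.items()}


# Hypothetical attribute names drawn from two datasets.
attributes = ["population", "birthDate", "populationTotal", "birth_date"]
groups = k_medoids(attributes, k=2)
for medoid, members in groups.items():
    print(medoid, "->", members)
```

Attributes that land in the same cluster become candidate correspondences, so later value comparisons only need to run within a cluster rather than across every attribute pair.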
English abstract:

    The World Wide Web has developed into a global data space that links web data and database data. Linked data is one of the best tools for achieving this evolution: it publishes data in a structured form so that resources can be interlinked. As linked data is applied more widely, more and more data is published on the web as linked data, and already-published web information is also being transformed into linked data in automatic or semi-automatic ways. In practice, however, there are still only a few links between the published linked datasets, which makes data sharing inconvenient. Entity linking discovery makes it possible to discover the real relations between entities, build entity links according to the publishing standard, discover potential entity links, strengthen the interlinking between datasets, and thereby increase the accuracy of published linked data.

    In this paper, a statistical learning method is proposed to recognize entities and build links across different linked datasets. First, before entities are compared, the method finds class correspondences in order to group related entity attribute correspondences across datasets. It describes the matching relationships between highly correlated attributes, which reduces the number of computations needed to match entity attributes. Second, the method compares entities by computing the similarity of their matched attributes, and builds entity links to complete linking discovery across datasets. To cluster the attribute correspondences, we use the K-medoids clustering algorithm, which discovers potential attribute correspondences; it is mainly aimed at grouping property concepts and corresponding attributes that express the same meaning in different datasets, so that attributes can be compared and matched in groups. The EDOAL language is then used to define the clustered attributes and describe the correspondence relations between them. According to these matching relations, we compute the similarity between entity attributes. Finally, the method constructs the links under the SILK framework: it maps the property relationships to SILK scripts, builds entity links between datasets according to a preset confidence value, endows entities with RDFs properties, and thus realizes entity link discovery between datasets. The paper tests the method on several open linked datasets, mainly IM@OAIE2014 (dataset Abox3), CKAN (dataset EUROSTAT), and GADM-RDF (dataset GADM), whose data are used to cluster matched attributes and interlink entities. Two rounds of entity linking discovery experiments show that using the K-medoids clustering algorithm to match dissimilar properties when computing entity similarity can increase the number of entity links, and that the method achieves high accuracy and F values. The proposed method can therefore reduce the number of computations needed to match entities across datasets and improve the accuracy of entity links, and it is feasible and practical. 12 figs. 4 tabs. 19 refs.
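
The comparison-and-linking step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the SILK framework itself and not the authors' code: the matched attribute pairs, the averaging of per-attribute similarities, the `owl:sameAs` link type, the 0.9 threshold standing in for the "preset confidence value", and the toy entities are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical correspondences (source attribute -> target attribute), standing in
# for the matched attribute pairs that the method derives before comparison.
MATCHED_ATTRIBUTES = {"name": "label", "population": "populationTotal"}


def value_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (an assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def entity_similarity(source: dict, target: dict, matched: dict) -> float:
    """Average value similarity over the matched attributes both entities carry."""
    scores = [
        value_similarity(str(source[s]), str(target[t]))
        for s, t in matched.items()
        if s in source and t in target
    ]
    return sum(scores) / len(scores) if scores else 0.0


def discover_links(sources: dict, targets: dict, matched: dict, threshold: float = 0.9):
    """Emit (source_id, 'owl:sameAs', target_id) for pairs at or above the threshold."""
    links = []
    for sid, s in sources.items():
        for tid, t in targets.items():
            if entity_similarity(s, t, matched) >= threshold:
                links.append((sid, "owl:sameAs", tid))
    return links


# Toy entities with made-up identifiers and values.
sources = {"ex1:e1": {"name": "Wuhan", "population": "1000"}}
targets = {
    "ex2:x1": {"label": "wuhan", "populationTotal": "1000"},
    "ex2:x2": {"label": "Berlin", "populationTotal": "2000"},
}
print(discover_links(sources, targets, MATCHED_ATTRIBUTES))
```

Only the attribute pairs listed in `MATCHED_ATTRIBUTES` are compared, which is where the reduction in attribute-matching computations comes from; raising or lowering `threshold` trades the number of links against their precision.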
