OU Shiyan, TANG Zhengui. A Question Answering Method over Library Linked Data [J]. Journal of Library Science in China, 2015, 41(6): 44-60
A Question Answering Method over Library Linked Data
面向图书馆关联数据的自动问答技术研究
Received:June 24, 2015  
DOI:10.13530/j.cnki.jlis.150030
Key words:Question answering  Linked Data  RDF dataset  SPARQL query  Semantic annotation  Ontology
中文关键词:  自动问答  关联数据  RDF数据集  SPARQL查询  语义标注  本体
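Funding: This article is one of the research outcomes of the National Social Science Fund of China project "Construction and Application of a Terminology Registry and Service System Based on SOA" (No. 11BT0023)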
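Author Name; Affiliation; E-mail
OU Shiyan; School of Information Management, Nanjing University, Nanjing, Jiangsu 210023; oushiyan@nju.edu.c
TANG Zhengui; School of Information Management, Nanjing University, Nanjing, Jiangsu 210023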
Abstract:
Since the advent of Linked Data, more and more structured data have been published on the Web in Linked Data format, including a large amount of bibliographic data, academic information and controlled vocabularies from libraries and related institutions. How to access these interlinked RDF data effectively has therefore become a crucial issue. SPARQL provides a standard way to query RDF data; however, ordinary users find it very difficult to construct SPARQL queries. Question answering, which offers an easy-to-use natural language interface, is a natural solution. Earlier question answering research on the Semantic Web was oriented to a single RDF dataset. With the growth of interlinked RDF datasets on the Web, there is an urgent need to extend question answering from a single RDF dataset to multiple RDF datasets, which raises further problems and challenges in semantic annotation and answer integration.
This paper proposes a novel question answering method over Library Linked Data, which transforms a natural language question into a structured SPARQL query to retrieve answers from five interlinked RDF datasets in libraries, covering bibliographic data, thesauri, events, people/organizations and locations. The question answering procedure includes three main steps: 1) Index construction: extract instance names (i.e. named entities) from the RDF data and the lexical labels of ontology classes and properties from the OWL files, and construct two indexes offline (one for named entities and one for ontology terms) using the open source information retrieval toolkit Lucene; 2) Question preprocessing: perform Chinese word segmentation, named entity recognition, and semantic annotation based on the constructed indexes; categorize questions into simple questions involving a single RDF dataset and complex questions involving multiple RDF datasets, according to the number of ontologies involved and the number of classes and their relationships; and further categorize simple questions into two types, i.e. type A, which queries attributes, and type B, which queries names; 3) Question answering: for a simple question, construct a SPARQL query based on predefined rules; for a complex question, decompose it into several simple sub-questions, process each sub-question using the simple question method, and then combine the results of the sub-questions to construct a SPARQL query for the whole complex question.
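The rule-based query construction for simple questions in step 3 can be illustrated with a minimal Python sketch. The abstract does not give the actual rules, so everything here is an assumption made for illustration: the function names, the use of `rdfs:label` to match entity names, and the `http://example.org/onto/...` URIs are hypothetical placeholders, and the semantic annotation step is assumed to have already mapped question terms to an entity label, a class and a property.

```python
# Hypothetical sketch of template-based SPARQL construction for simple questions.
# The templates and URIs are illustrative assumptions, not the paper's rules.

RDFS_PREFIX = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"


def escape_literal(text):
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_type_a_query(entity_label, property_uri):
    """Type A (attribute query): the entity is named, one of its attributes is asked for."""
    label = escape_literal(entity_label)
    return (
        RDFS_PREFIX
        + "SELECT ?value WHERE {\n"
        + f'  ?s rdfs:label "{label}" .\n'
        + f"  ?s <{property_uri}> ?value .\n"
        + "}"
    )


def build_type_b_query(class_uri, property_uri, value_label):
    """Type B (name query): a class and an attribute value are given, instance names are asked for."""
    value = escape_literal(value_label)
    return (
        RDFS_PREFIX
        + "SELECT ?name WHERE {\n"
        + f"  ?s a <{class_uri}> .\n"
        + f'  ?s <{property_uri}> "{value}" .\n'
        + "  ?s rdfs:label ?name .\n"
        + "}"
    )


if __name__ == "__main__":
    # Type A: "Who published the book 'Semantic Web'?" (hypothetical annotation result)
    print(build_type_a_query("Semantic Web", "http://example.org/onto/publisher"))
    # Type B: "Which books were published in 2015?" (hypothetical annotation result)
    print(build_type_b_query("http://example.org/onto/Book",
                             "http://example.org/onto/issued", "2015"))
```

Keeping one template per question type mirrors the A/B split described above; a real system would also need to handle language tags, datatypes and properties whose values are resources rather than literals.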
The innovation of the proposed method lies in reducing question answering over multiple RDF datasets to question answering over a single RDF dataset: a complex question is decomposed into several simple questions based on its dependency parsing result, which facilitates both the construction of SPARQL queries and answer integration. The experimental results show that this is an effective question answering method which greatly simplifies the processing of complex questions, achieving an answer accuracy of 88% for complex questions and 91% over simple and complex questions combined. However, the method can only answer questions whose answers are stated explicitly in the RDF datasets; it cannot answer questions that require reasoning or computation, for example, those containing "more" and "the most".
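How sub-question results might be recombined into one query can be sketched as follows. This is only an illustration under stated assumptions: the decomposition itself (which the paper bases on dependency parsing) is assumed to have been done already, each sub-question is assumed to yield a list of triple patterns that share variable names with the others, and the function name and `http://example.org/onto/...` URIs are hypothetical.

```python
# Hypothetical sketch: merge the triple patterns produced for each simple
# sub-question into a single SPARQL query for the complex question.

RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def combine_sub_queries(select_var, sub_patterns):
    """Join sub-question triple patterns on their shared variables.

    select_var   -- name of the variable the complex question asks for
    sub_patterns -- one list of triple-pattern strings per sub-question
    """
    merged = []
    for patterns in sub_patterns:
        for pattern in patterns:
            if pattern not in merged:  # drop patterns repeated across sub-questions
                merged.append(pattern)
    body = "\n".join(f"  {pattern} ." for pattern in merged)
    return f"SELECT DISTINCT ?{select_var} WHERE {{\n{body}\n}}"


if __name__ == "__main__":
    # Complex question (illustrative): "Which books were written by people
    # affiliated with Nanjing University?"
    # Sub-question 1: which people are affiliated with Nanjing University?
    sub1 = [
        "?person <http://example.org/onto/affiliation> ?org",
        f'?org {RDFS_LABEL} "Nanjing University"',
    ]
    # Sub-question 2: which books were created by those people?
    sub2 = ["?book <http://example.org/onto/creator> ?person"]
    print(combine_sub_queries("book", [sub1, sub2]))
```

Because the sub-questions share the variable `?person`, placing their patterns in one graph pattern lets the SPARQL engine perform the join, so no separate answer-merging step is needed in this simplified setting.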
Question answering provides a straightforward and easy-to-use way of accessing Linked Data, and is a key step in applying Linked Data in the real world. The research in this paper is therefore of significant value for facilitating the application of Linked Data in libraries. It is one of the earlier studies of Chinese question answering over Linked Data, and also one of the earlier studies focusing on Library Linked Data. 5 figs. 5 tabs. 27 refs.
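Abstract (translated from Chinese):
      Early question answering for the Semantic Web mainly targeted a single RDF dataset. With the rapid growth of interlinked datasets on the Web, there is an urgent need to extend question answering to multiple RDF datasets, which also brings greater difficulty and challenges in semantic annotation and answer integration. This paper proposes a new question answering method over Library Linked Data, which converts natural language questions into structured SPARQL queries to extract specific answers from five interlinked RDF datasets in the library domain. The innovation of the method is that questions are divided into simple questions involving one dataset and complex questions involving multiple datasets, which are processed separately; simple questions are further divided into two categories, attribute queries and instance queries, each with its own SPARQL query construction rules; and complex questions are decomposed into several simple questions for processing, which facilitates SPARQL query construction and answer integration. In the experimental evaluation, the answer precision on 100 questions reached 91%, showing that this is an effective question answering method of significant value for promoting the application of Linked Data in libraries. 5 figs. 5 tabs.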