Page 266 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2015 Vol. 41
Extended English abstracts of articles published in the Chinese Edition of Journal of Library Science in China 2015 Vol.41
controlled vocabularies from libraries and other related institutions. Therefore, how to
effectively access these interlinked RDF data becomes crucially important. SPARQL provides
a standard way to query RDF data; however, it is very difficult for ordinary users to construct
SPARQL queries. Question answering, which can provide an easy-to-use natural language
interface, is undoubtedly an ideal solution. Earlier question answering research on the Semantic
Web was oriented to a single RDF dataset. With the growth of interlinked RDF datasets on the Web,
there is an urgent need to extend question answering from a single RDF dataset to multiple RDF
datasets, which raises additional problems and challenges in semantic annotation and answer
integration.
This paper proposes a novel question answering method over library Linked Data, which
transforms a natural language question into a structured SPARQL query to retrieve answers from
five interlinked RDF datasets in libraries: bibliographic data, thesauri, events, people/
organizations and locations. The question answering procedure includes three main steps: 1) index
construction: extract instance names (i.e., named entities) from the RDF data and the lexical labels
of ontology classes and properties from the OWL files, and construct two indexes offline (one for
named entities and one for ontology terms) using the open source information retrieval toolkit Lucene;
2) question preprocessing: perform Chinese word segmentation, named entity recognition, and
semantic annotation based on the constructed indexes, categorize questions into two categories,
i.e., simple questions involving a single RDF dataset and complex questions involving multiple
RDF datasets, according to the number of ontologies involved and the number of classes
and their relationships, and further categorize simple questions into two types, i.e., type A,
which queries attributes, and type B, which queries names; 3) question answering: for a simple
question, construct a SPARQL query based on pre-defined rules; for a complex question, decompose it
into several simple sub-questions, process each sub-question using the simple question method,
and then combine the results of the sub-questions to construct a SPARQL query for the whole
complex question.
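The rule-based construction of SPARQL queries for simple questions can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Annotation` record, the `http://example.org/lib#` IRIs, the use of `rdfs:label` for instance names, and the two templates are all hypothetical, and the reading of type A (the attribute of a named entity is asked) and type B (the name of an instance with a given attribute value is asked) is one plausible interpretation of the two simple question types.

```python
from dataclasses import dataclass
from typing import Optional

# rdfs:label is assumed to hold instance names; the paper does not say which property is used.
PREFIX = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"


@dataclass
class Annotation:
    """Assumed output of semantic annotation for one simple question."""
    qtype: str                       # "A" (query an attribute) or "B" (query a name)
    entity_label: str                # named entity found via the entity index
    property_iri: str                # ontology property found via the term index
    class_iri: Optional[str] = None  # ontology class, needed for type B only


def sparql_literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_simple_query(a: Annotation) -> str:
    """Fill one of two hypothetical rule templates, chosen by question type."""
    label = sparql_literal(a.entity_label)
    if a.qtype == "A":
        # Known entity, unknown attribute value.
        patterns = [
            f"?s rdfs:label {label} .",
            f"?s <{a.property_iri}> ?answer .",
        ]
    elif a.qtype == "B":
        if a.class_iri is None:
            raise ValueError("type B questions need a class annotation")
        # Known attribute value, unknown instance name.
        patterns = [
            f"?s a <{a.class_iri}> .",
            f"?s <{a.property_iri}> ?o .",
            f"?o rdfs:label {label} .",
            "?s rdfs:label ?answer .",
        ]
    else:
        raise ValueError(f"unknown question type: {a.qtype}")
    body = "\n  ".join(patterns)
    return f"{PREFIX}SELECT DISTINCT ?answer WHERE {{\n  {body}\n}}"


if __name__ == "__main__":
    # "Who is the author of 红楼梦?" annotated as a type A question (illustrative only).
    print(build_simple_query(
        Annotation("A", "红楼梦", "http://example.org/lib#creator")))
```

Keeping the templates as plain lists of triple patterns makes it straightforward to add further rules, although real labels would probably also need language tags and some form of fuzzy matching against the indexes.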
The innovation of the proposed question answering method lies in transforming question
answering over multiple RDF datasets into question answering over a single RDF dataset, in order
to facilitate the construction of SPARQL queries and answer integration, by decomposing a complex
question into several simple questions based on its dependency parsing result. The experimental
results show that this is an effective question answering method which greatly simplifies the
processing of complex questions and obtains an answer accuracy of 88% for complex questions and 91%
for simple and complex questions together. However, this method can only answer
questions whose answers are stated explicitly in the RDF datasets, and is not able to answer questions
which require reasoning or computation, for example, those containing “more” and “the most”.
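One way to picture how the results of simple sub-questions might be combined into a single SPARQL query is to chain them, so that the answer variable of one sub-question becomes the subject of the next. The sketch below assumes the decomposition has already been done (dependency parsing is not shown); the `SubQuestion` record, the example IRIs, and the chaining strategy itself are illustrative assumptions, not the procedure described in the paper. It also assumes the interlinked datasets can be queried together and share resource IRIs.

```python
from dataclasses import dataclass
from typing import List, Optional

RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


@dataclass
class SubQuestion:
    """A simple sub-question reduced to one property (hypothetical shape)."""
    property_iri: str
    entity_label: Optional[str] = None  # set only on the first sub-question


def combine_sub_questions(subs: List[SubQuestion]) -> str:
    """Chain sub-questions: each answer variable is the next subject."""
    if not subs or subs[0].entity_label is None:
        raise ValueError("the first sub-question must name an entity")
    if any(s.entity_label is not None for s in subs[1:]):
        raise ValueError("this sketch only handles a single chain")
    label = subs[0].entity_label.replace("\\", "\\\\").replace('"', '\\"')
    patterns = [f'?e0 {RDFS_LABEL} "{label}" .']
    subject = "?e0"
    for i, sub in enumerate(subs, start=1):
        answer = f"?a{i}"
        patterns.append(f"{subject} <{sub.property_iri}> {answer} .")
        subject = answer  # the next sub-question asks about this answer
    body = "\n  ".join(patterns)
    # After the loop, `subject` holds the last answer variable.
    return f"SELECT DISTINCT {subject} WHERE {{\n  {body}\n}}"


if __name__ == "__main__":
    # "Where was the author of 红楼梦 born?" split into two sub-questions
    # (bibliographic data, then people data); IRIs are placeholders.
    print(combine_sub_questions([
        SubQuestion("http://example.org/lib#creator", "红楼梦"),
        SubQuestion("http://example.org/lib#birthPlace"),
    ]))
```

A chain like this contains only explicit triple patterns, which is consistent with the stated limitation: questions involving "more" or "the most" would need aggregation or ordering that such patterns do not express.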
Question answering provides a straightforward and easy-to-use way of accessing Linked
Data. It is a key step in applying Linked Data in the real world. Thus, the research
in this paper is of significant value for facilitating the application of Linked Data in libraries. It