Page 266 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2015 Vol. 41

Extended English abstracts of articles published in the Chinese Edition of Journal of Library Science in China 2015 Vol.41


               controlled vocabularies from libraries and other related institutions. Therefore, the issue of how to
               access these interlinked RDF data effectively becomes crucially important. SPARQL provides
               a standard way to query RDF data; however, it is very difficult for ordinary users to construct
               SPARQL queries. Question answering, which can provide an easy-to-use natural language
               interface, is undoubtedly an ideal solution. Earlier question answering research on the Semantic
               Web targeted a single RDF dataset. With the growth of interlinked RDF datasets on the Web,
               there is an urgent need to extend question answering from a single RDF dataset to multiple RDF
               datasets, which in turn raises new challenges in semantic annotation and answer
               integration.
                 This paper proposes a novel question answering method over library Linked Data, which
               transforms a natural language question into a structured SPARQL query to retrieve answers from
               five interlinked library RDF datasets: bibliographic data, thesauri, events, people/organizations,
               and locations. The question answering procedure consists of three main steps: 1) Index
               construction: extract instance names (i.e., named entities) from RDF data and the lexical labels of
               ontology classes and properties from OWL files, then construct two indexes offline (one for named
               entities and one for ontology terms) using the open-source information retrieval toolkit Lucene;
               2) Question preprocessing: perform Chinese word segmentation, named entity recognition, and
               semantic annotation based on the constructed indexes; categorize questions into two categories,
               i.e., simple questions involving a single RDF dataset and complex questions involving multiple
               RDF datasets, according to the number of ontologies involved and the number of classes
               and their relationships; and further categorize simple questions into two types, i.e., type A,
               which queries attributes, and type B, which queries names; 3) Question answering: for a simple question,
               construct a SPARQL query based on the pre-defined rules; for a complex question, decompose it
               into several simple sub-questions, process each sub-question using the simple question method,
               and then combine the results of the sub-questions to construct a SPARQL query for the whole
               complex question.
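The simple-question path in steps 2) and 3) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the two dictionaries stand in for the Lucene indexes, tokens are assumed to be already segmented and reduced to content words, all URIs are hypothetical, and the two query templates are only one plausible reading of "type A" (entity plus property, asking for the attribute value) and "type B" (property plus value, asking for the instance name).

```python
# Toy in-memory stand-ins for the two offline indexes (the paper uses Lucene).
# Every label and URI here is hypothetical and only serves the illustration.
ENTITY_INDEX = {"Lu Xun": "http://example.org/person/lu_xun"}
ONTOLOGY_INDEX = {
    "birthplace": "http://example.org/ontology/birthPlace",
    "author": "http://example.org/ontology/author",
}
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def annotate(tokens):
    """Tag each pre-segmented token as an entity, a property, or a literal."""
    annotations = []
    for token in tokens:
        if token in ENTITY_INDEX:
            annotations.append(("entity", ENTITY_INDEX[token]))
        elif token in ONTOLOGY_INDEX:
            annotations.append(("property", ONTOLOGY_INDEX[token]))
        else:
            annotations.append(("literal", token))
    return annotations


def first(annotations, kind):
    """Return the first annotated value of the given kind, or None."""
    for k, value in annotations:
        if k == kind:
            return value
    return None


def build_simple_query(annotations):
    """Pick a query template from the kinds of annotations that were found."""
    entity = first(annotations, "entity")
    prop = first(annotations, "property")
    literal = first(annotations, "literal")
    if prop is None:
        raise ValueError("no ontology property recognized in the question")
    if entity is not None:
        # Assumed type A: known entity + property, ask for the attribute value.
        return f"SELECT ?value WHERE {{ <{entity}> <{prop}> ?value . }}"
    if literal is not None:
        # Assumed type B: property + known value, ask for the instance name.
        escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
        return (
            f'SELECT ?name WHERE {{ ?s <{prop}> "{escaped}" . '
            f"?s <{RDFS_LABEL}> ?name . }}"
        )
    raise ValueError("question matches neither assumed template")


print(build_simple_query(annotate(["Lu Xun", "birthplace"])))
print(build_simple_query(annotate(["author", "Lao She"])))
```

A real system would replace the dictionary lookups with fuzzy index searches and would need a proper segmentation and named entity recognition stage in front of `annotate`.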
                 The innovation of the proposed method lies in reducing question answering over multiple RDF
               datasets to question answering over a single RDF dataset: a complex question is decomposed
               into several simple questions based on its dependency parsing result, which facilitates
               the construction of SPARQL queries and answer integration. The experimental results
               show that the method is effective: it greatly simplifies the processing
               of complex questions and achieves an answer accuracy of 88% for complex questions and 91%
               across simple and complex questions combined. However, the method can only answer
               questions whose answers are stated explicitly in the RDF datasets; it cannot answer questions that
               require reasoning or computation, for example, those containing “more” and “the most”.
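One way to picture the decomposition idea is as a chain of simple sub-questions whose answers feed one another. The sketch below assumes that a dependency parse has already produced such a chain (the parsing itself is not shown), and it joins the sub-questions into a single SPARQL query by sharing variables. The `SubQuestion` class, the `combine` function, and all URIs are hypothetical names introduced for illustration; the paper's actual combination rules are not given in this abstract.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubQuestion:
    """One simple sub-question: 'what is <property> of <subject>?'.

    subject is a URI, or None when the subject is the answer of the
    previous sub-question in the chain.
    """
    subject: Optional[str]
    property: str


def combine(sub_questions: List[SubQuestion]) -> str:
    """Join chained sub-questions into one SPARQL query via shared variables."""
    patterns = []
    prev_var = None
    for i, sq in enumerate(sub_questions, start=1):
        var = f"?x{i}"
        if sq.subject is not None:
            subj = f"<{sq.subject}>"
        elif prev_var is not None:
            subj = prev_var  # reuse the previous sub-question's answer
        else:
            raise ValueError("the first sub-question needs an explicit subject")
        patterns.append(f"{subj} <{sq.property}> {var} .")
        prev_var = var
    if prev_var is None:
        raise ValueError("no sub-questions given")
    return f"SELECT {prev_var} WHERE {{ " + " ".join(patterns) + " }"


# Hypothetical complex question: "Where was the author of this book born?"
# Sub-question 1 touches bibliographic data, sub-question 2 touches people data.
chain = [
    SubQuestion("http://example.org/book/teahouse",
                "http://example.org/ontology/author"),
    SubQuestion(None, "http://example.org/ontology/birthPlace"),
]
print(combine(chain))
```

Each triple pattern here corresponds to one simple question over one dataset, which is why the same rule-based handling can be reused per sub-question before the patterns are merged.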
                 Question answering provides a straightforward and easy-to-use way to access Linked
               Data, and it is a key step toward applying Linked Data in the real world. This research is therefore
               of significant value in facilitating the application of Linked Data in libraries. It