Article Abstract

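Ou Shiyan, Tang Zhengui, Su Feifei. Construction and Usage of Terminology Services for Information Retrieval[J]. Journal of Library Science in China, 2016, 42(2): 32-51.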
Construction and Usage of Terminology Services for Information Retrieval
Submitted: 2015-11-30
DOI:10.13530/j.cnki.jlis.160009
Keywords: Information retrieval; Thesaurus; Terminology registry; Terminology services; REST architecture; Usefulness; Usability; Analytic Hierarchy Process
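Funding: This paper is one of the research outputs of the National Social Science Fund of China project "Construction and Application of a Terminology Registry and Service System Based on SOA Architecture" (Grant No. 11BT0023).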
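Authors, affiliations and e-mail:
Ou Shiyan, School of Information Management, Nanjing University, Nanjing, Jiangsu 210023, oushiyan@nju.edu.cn
Tang Zhengui, School of Information Management, Nanjing University, Nanjing, Jiangsu 210023
Su Feifei, School of Information Management, Nanjing University, Nanjing, Jiangsu 210023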
Chinese Abstract (translated):
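    In information retrieval, thesauri have become increasingly important as tools that support effective retrieval. Terminology registries and terminology services are an ideal way to maintain and apply thesauri in the networked environment, and they can greatly promote the use of thesauri in information retrieval. This paper focuses on the construction and application of terminology services for information retrieval. First, taking the Chinese Thesaurus (《汉语主题词表》) as the source vocabulary, terminology services are built with Semantic Web technologies and the REST architecture. Second, taking the "get synonyms" service as an example, the usefulness of terminology services is evaluated in a library OPAC system and the Baidu search engine; the results show that using this service raised the F-measure of the OPAC system by 13% and the P@5 precision of the Baidu search engine by 16%. Finally, four ways of applying terminology services in information retrieval systems are designed, namely multiple-choice expansion, single-choice replacement, a hybrid mode, and automatic expansion, and their usability is evaluated through user testing. The results show that multiple-choice expansion is the most usable way of applying terminology services. 13 figures, 9 tables, 34 references.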
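The usefulness results above are reported as F-measure and P@5 precision. As a reminder of how these standard measures are computed, here is a minimal Python sketch. The document IDs and relevance judgments are toy values, not data from the study, and the abstract does not state whether the reported gains are relative or absolute.

```python
from __future__ import annotations


def precision_recall_f1(retrieved: list[str], relevant: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall, and balanced F-measure (F1)."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    p = hits / len(retrieved_set) if retrieved_set else 0.0
    r = hits / len(relevant_set) if relevant_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def precision_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """P@k: share of the top-k ranked results that are relevant (always divides by k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


# Toy data (hypothetical): one query before and after synonym expansion.
relevant = {"d1", "d2", "d3", "d4"}          # documents judged relevant
baseline = ["d1", "d2", "d9"]                 # results of the original query
expanded = ["d1", "d2", "d3", "d9", "d8"]     # results after adding synonyms
```

With these toy lists, recall rises from 0.50 to 0.75 while precision falls from about 0.67 to 0.60, so F1 still increases from about 0.57 to about 0.67: the same kind of recall-for-precision trade-off the OPAC experiment describes.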
English Abstract:
In information retrieval, thesauri have become increasingly vital as aids to effective retrieval. However, the traditional way of incorporating one or more thesauri into an information retrieval system has a major limitation: the thesauri must be maintained, and access interfaces developed, separately in each individual system. In recent years, terminology services have become an ideal way to use thesauri in the networked environment. When an information retrieval system calls terminology services, users need to interact with the system, which turns it into an interactive information retrieval system. Usefulness and usability are two distinct but closely related aspects of evaluating an interactive system. The overall purpose of this study is to investigate the construction and usage of terminology services. On the one hand, it explores how to construct terminology services with an appropriate Web service architecture and emerging Semantic Web technologies; on the other hand, it studies the usefulness and usability of terminology services in information retrieval systems in order to identify the best usage mode.
    In this study, taking the Chinese Thesaurus as the source vocabulary, we built a system with SKOS and RDF technologies and the REST architecture. The system consists of two parts: a terminology registry and a set of terminology services. The terminology registry is intended to provide an authoritative, continually updated source of various vocabularies, and it contains three components: metadata registration, vocabulary uploading, and vocabulary validation. Six basic terminology services, namely SearchConceptByKeyword, getBroaderConcept, getNarrowerConcept, getRelatedConcept, getSynonym, and getEnglishTranslation, were built with Sun's Jersey, the reference implementation of JAX-RS for developing RESTful Web services. To demonstrate the application of terminology services, two kinds of clients were developed: a Web client that provides a graphical interface for human users, and an embedded client that can be integrated into a specific application system.
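The six services map naturally onto resource-style GET requests over SKOS concepts. The following Python sketch uses only the standard library, instead of Jersey/JAX-RS and an RDF store, to show one way such services could be organized. The `/concepts/{term}/{relation}` and `/search/{keyword}` paths, the class and function names, and the sample concepts are all hypothetical illustrations, not the authors' URI design or actual entries from the Chinese Thesaurus.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from urllib.parse import unquote


@dataclass
class Concept:
    """A SKOS-style concept (hypothetical in-memory model, not an RDF store)."""
    pref_label: str                                        # skos:prefLabel
    alt_labels: list[str] = field(default_factory=list)   # skos:altLabel (synonyms)
    broader: list[str] = field(default_factory=list)      # skos:broader
    narrower: list[str] = field(default_factory=list)     # skos:narrower
    related: list[str] = field(default_factory=list)      # skos:related
    english: str | None = None                            # English translation label


class TerminologyService:
    """Hypothetical counterparts of the six services, keyed by preferred label."""

    def __init__(self, concepts: list[Concept]):
        self.concepts = {c.pref_label: c for c in concepts}
        # Index every preferred and alternative label to its concept's preferred label.
        self.index = {label: c.pref_label
                      for c in concepts for label in [c.pref_label, *c.alt_labels]}

    def _lookup(self, term: str) -> Concept | None:
        pref = self.index.get(term)
        return self.concepts.get(pref) if pref else None

    def search_concept_by_keyword(self, keyword: str) -> list[str]:
        return sorted({pref for label, pref in self.index.items() if keyword in label})

    def get_synonym(self, term: str) -> list[str]:
        c = self._lookup(term)
        return [label for label in [c.pref_label, *c.alt_labels] if label != term] if c else []

    def get_broader_concept(self, term: str) -> list[str]:
        c = self._lookup(term)
        return list(c.broader) if c else []

    def get_narrower_concept(self, term: str) -> list[str]:
        c = self._lookup(term)
        return list(c.narrower) if c else []

    def get_related_concept(self, term: str) -> list[str]:
        c = self._lookup(term)
        return list(c.related) if c else []

    def get_english_translation(self, term: str) -> list[str]:
        c = self._lookup(term)
        return [c.english] if c and c.english else []


# Hypothetical mapping from the last path segment to a service method.
RELATIONS = {
    "synonyms": TerminologyService.get_synonym,
    "broader": TerminologyService.get_broader_concept,
    "narrower": TerminologyService.get_narrower_concept,
    "related": TerminologyService.get_related_concept,
    "english": TerminologyService.get_english_translation,
}


def handle_get(service: TerminologyService, path: str) -> tuple[int, str]:
    """Dispatch /search/{keyword} or /concepts/{term}/{relation}; return (status, JSON body)."""
    parts = [unquote(p) for p in path.strip("/").split("/")]
    if len(parts) == 2 and parts[0] == "search":
        result = {"keyword": parts[1], "concepts": service.search_concept_by_keyword(parts[1])}
    elif len(parts) == 3 and parts[0] == "concepts" and parts[2] in RELATIONS:
        result = {"term": parts[1], parts[2]: RELATIONS[parts[2]](service, parts[1])}
    else:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(result, ensure_ascii=False)


SAMPLE = [  # hypothetical concepts for illustration only
    Concept("计算机", alt_labels=["电子计算机", "电脑"], broader=["电子设备"],
            narrower=["微型计算机"], related=["软件"], english="Computer"),
    Concept("电子设备", narrower=["计算机"], english="Electronic equipment"),
    Concept("微型计算机", alt_labels=["微机"], broader=["计算机"], english="Microcomputer"),
    Concept("软件", related=["计算机"], english="Software"),
]
```

A client would normally URL-encode the term (for example with `urllib.parse.quote`), and `unquote` in `handle_get` reverses that. Keeping a label index separate from the concept table is what lets the synonym lookup accept a non-preferred term such as 电脑 and still resolve it to its concept, which is the lookup pattern SKOS altLabel is designed to support.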
    Afterwards, this study investigated the usage of the constructed terminology services in information retrieval systems through the embedded client. First, taking the getSynonym service as an example, we tested its usefulness in an experiment with 30 user queries submitted to the library OPAC system and the Baidu search engine. In the OPAC experiment, average recall increased by 26.7% while average precision decreased by 4.6%, meaning that expanding user queries with the getSynonym service can greatly improve the completeness of the retrieval results at the cost of a slight reduction in precision. In the Baidu experiment, P@5 precision increased by 16%, meaning that more relevant items appear among the top results returned by the search engine. Next, we studied how terminology services can be used in information retrieval systems. Four usage modes were designed: multiple choice for query expansion, single choice for query replacement, a hybrid of query expansion and replacement, and automatic query expansion. To evaluate their usability, user testing was carried out in OPAC and Baidu with 24 human subjects against four usability criteria (effectiveness, efficiency, user satisfaction, and learnability). The results showed that multiple choice for query expansion was the best usage mode: it was more effective, easier to use, and easier to learn, and it achieved higher user satisfaction.
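The four usage modes differ mainly in how the synonyms returned by getSynonym are turned into the final query. The Python sketch below makes that difference concrete. The function names, the generic ` OR ` operator, and the reading of the hybrid mode (replace if the user picks a replacement term, otherwise expand) are assumptions for illustration; the abstract does not specify the query syntax actually sent to OPAC or Baidu.

```python
from __future__ import annotations


def or_query(terms: list[str]) -> str:
    """Join terms with a generic OR operator (assumed syntax; real systems differ)."""
    unique = list(dict.fromkeys(terms))  # drop duplicates, keep first-seen order
    return unique[0] if len(unique) == 1 else "(" + " OR ".join(unique) + ")"


def multiple_choice_expansion(term: str, synonyms: list[str], selected: set[str]) -> str:
    """User ticks any number of suggested synonyms; they are OR-ed with the original term."""
    return or_query([term, *(s for s in synonyms if s in selected)])


def single_choice_replacement(term: str, synonyms: list[str], choice: str | None) -> str:
    """User picks one suggested term to replace the original; invalid picks keep the original."""
    return choice if choice in synonyms else term


def hybrid(term: str, synonyms: list[str], selected: set[str],
           replace_with: str | None = None) -> str:
    """Assumed hybrid: replace if the user chose a replacement, otherwise expand."""
    if replace_with is not None:
        return single_choice_replacement(term, synonyms, replace_with)
    return multiple_choice_expansion(term, synonyms, selected)


def automatic_expansion(term: str, synonyms: list[str]) -> str:
    """No user interaction: every synonym returned by the service is OR-ed in."""
    return or_query([term, *synonyms])
```

In this sketch, multiple-choice expansion keeps the user's own term and adds only the synonyms the user confirms, whereas automatic expansion adds every synonym unconditionally and replacement discards the original term entirely.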
    This study is one of the few practical efforts to construct terminology services in China and thus has practical significance. Furthermore, it fills a research gap concerning the usage of terminology services and can therefore play an important role in facilitating their application in information retrieval. In future work, we plan to construct more complex terminology services based on more vocabularies and extend them to other applications. 13 figs. 9 tabs. 34 refs.