Page 132 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2018 Vol. 42
OU Shiyan, TANG Zhengui & SU Feifei / Construction and usage of terminology services for information retrieval 131
3 Usefulness of terminology services for information retrieval
In theory, terminology services are useful for information retrieval and can improve the
performance of an information retrieval system. For example, they can improve the recall of
retrieval results by expanding query terms with synonyms. However, the extent to which
terminology services improve retrieval results has seldom been tested quantitatively. In this
study, we quantitatively measured the impact of the “getSynonyms” service on information
retrieval results, using a library OPAC system and Baidu Search Engine as testing targets and
Chinese Thesaurus as the source vocabulary of terminology services. We compared the retrieval
results with and without the use of terminology services to demonstrate their usefulness for
information retrieval. We selected the “getSynonyms” service for testing because expanding
query terms with synonyms is a very common practice in information retrieval, whereas the way
other terminology services are used depends more on users’ own choices and thus varies more
between individuals.
3.1 Experimental setup
We selected 30 terms from Chinese Thesaurus for the information retrieval experiments. Some terms
are descriptors, and some are non-descriptors. Each term has one or more synonyms. We carried
out two rounds of retrieval experiments. In the first round, we ran each term as a query in
each of the two information retrieval systems. In the second round, we obtained the synonyms
of each term through the “getSynonyms” terminology service, combined each term with its
synonyms using the logical operator “OR” to form a new query, and then repeated the retrieval
experiments with these new queries. The retrieval results were measured with precision,
recall and F value. However, the calculation of precision and recall differs slightly between
in-house information retrieval systems and open Web search engines.
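The two-round procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the sample term and synonym, and the document identifiers are all hypothetical; the query is built by joining terms with " OR " without quoting, although real OPAC and search engine syntax may differ; and the F value is assumed to be the balanced F1 measure, which the text does not specify.

```python
from typing import Iterable, List, Set, Tuple


def build_or_query(term: str, synonyms: Iterable[str]) -> str:
    """Combine a term and its synonyms with the logical operator OR.

    Duplicates and empty strings are dropped; the original term comes first.
    """
    unique: List[str] = []
    for candidate in [term, *synonyms]:
        candidate = candidate.strip()
        if candidate and candidate not in unique:
            unique.append(candidate)
    return " OR ".join(unique)


def precision_recall_f(
    retrieved: Set[str], relevant: Set[str]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F) for one query.

    F is assumed here to be the balanced F1 measure (an assumption).
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Hypothetical toy data: document IDs judged relevant to the query term.
relevant_docs = {"d1", "d2", "d4", "d5"}

# Round 1: the original term alone (hypothetical retrieved set).
round1_retrieved = {"d1", "d2", "d3"}

# Round 2: the term expanded with a synonym (hypothetical retrieved set).
expanded_query = build_or_query("计算机", ["电脑"])
round2_retrieved = {"d1", "d2", "d3", "d4", "d6"}

round1_scores = precision_recall_f(round1_retrieved, relevant_docs)
round2_scores = precision_recall_f(round2_retrieved, relevant_docs)
```

In this toy example the expanded query retrieves one more relevant document, so recall rises from 0.5 to 0.75, while precision falls from about 0.67 to 0.6. The figures are illustrative only and say nothing about the results reported in this study.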
3.2 Experimental results
For the library OPAC system and Baidu Search Engine, the results of the retrieval experiments and
analyses are reported in detail as follows.
(1) The experimental results of the library OPAC system
For an OPAC system, the number of retrieved documents and the number of relevant documents
among them are both finite, and thus it is easy to accurately calculate precision. However, the
total number of relevant documents in a document collection cannot be determined directly, and
thus it is impossible to accurately calculate recall. Our solution is to obtain the corresponding
classification number of Chinese Library Classification for each query term and then browse
relevant documents in the OPAC system according to the classification number to obtain the total