Page 270 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2015 Vol. 41


Extended English abstracts of articles published in the Chinese Edition of Journal of Library Science in China 2015 Vol. 41

The dataset contains metadata records for the nearly 1.2 million materials most widely held in
libraries. The metadata comprise approximately 80 million linked data triples, which help users
find linked resources easily on the Web. For the news corpus, we collected Yahoo! News articles
from RSS feeds dated from 5 April to 7 July 2014, 95 days in total. To obtain an objective measure
of performance, we randomly selected 500 news articles (about 10% of the collection) for
evaluation. The results were evaluated by top-10 recall hit rate, which shows that WMF
outperforms LSA and NMF.
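
The abstract does not define the top-10 recall hit rate precisely. The following is a minimal sketch under one common reading: the fraction of evaluated news articles for which at least one relevant library record appears among the first 10 ranked candidates. The function name, the dictionary layout, and the identifiers `n1`, `b1`, and so on are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def top_k_hit_rate(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Fraction of queries with at least one relevant item in the top k.

    ranked:   query id -> candidate ids, best first (assumed layout)
    relevant: query id -> set of ids judged relevant (assumed layout)
    """
    if not ranked:
        return 0.0
    hits = 0
    for query_id, candidates in ranked.items():
        judged = relevant.get(query_id, set())
        # A query counts as a hit if any of its top-k candidates is relevant.
        if any(item in judged for item in candidates[:k]):
            hits += 1
    return hits / len(ranked)


# Hypothetical toy data: two news articles, each with ranked library records.
ranked = {"n1": ["b1", "b2", "b3"], "n2": ["b4"]}
relevant = {"n1": {"b3"}, "n2": {"b9"}}
rate = top_k_hit_rate(ranked, relevant)
```

Under this reading, competing models such as WMF, LSA, and NMF would each supply their own `ranked` mapping for the same 500 sampled articles, and the resulting rates would be compared directly.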
This newswire-library linking offers several distinct advantages to both libraries and
information seekers: timeliness, extensive and comprehensive coverage, and rich description.
Using newswires as a complementary information resource in library catalogues addresses users’
information needs by offering a vast pool of everyday subject headings that complements
traditional library vocabularies, which are constructed mainly from expert knowledge.
In future work, we will involve library users in evaluating the system and make the necessary
improvements.

Methods of text theme identification based on graph mining

Hongmei GUO & Zhixiong ZHANG*

With the development of the internet, the volume of electronic text is growing rapidly. These
text resources, especially scientific journal papers, contain rich semantic and linked
information. How to present their core topics quickly and accurately, so as to assist researchers
and improve research efficiency, has become an urgent issue in text mining. The nodes and edges
of a graph can represent the terms of a text and the relations among them, so many researchers
have tried to combine graph mining with natural language processing to identify text themes.
This paper investigates and analyzes these studies and summarizes their advantages and
disadvantages in order to provide a reference for further research.
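
As an illustration of the graph representation described above, the following minimal sketch builds a term graph in which nodes are terms and an edge joins two terms that occur in the same sentence, then ranks terms by degree centrality. Sentence-level co-occurrence and degree centrality are assumptions chosen for simplicity; the surveyed studies may use other relations and other centrality measures. All function names and the sample terms are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set


def build_cooccurrence_graph(sentences: List[List[str]]) -> Dict[str, Set[str]]:
    """Return an undirected term graph as an adjacency mapping.

    Two terms are linked if they appear together in at least one sentence.
    A term that never co-occurs with another term does not become a node.
    """
    graph: Dict[str, Set[str]] = defaultdict(set)
    for terms in sentences:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def degree_centrality(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


# Hypothetical pre-tokenized sentences, already reduced to candidate terms.
sentences = [["graph", "mining", "text"], ["graph", "theme"]]
graph = build_cooccurrence_graph(sentences)
scores = degree_centrality(graph)
top_term = max(scores, key=scores.get)
```

In this toy input, "graph" is linked to every other term and so receives the highest score, which is the sense in which a central node can stand for the theme of a text.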
At present, the studies focus on three areas: representing texts as relation graphs, theme
identification based on centrality, and theme identification based on subgraph detection or
clustering. Theme identification based on cohesive subgraph detection mainly recognizes clique
or quasi-clique subgraphs that represent the core content of the texts. Theme identification
based on graph mining uses two methods: one relies on the topological structure of the graph
alone, and the other considers topological structure and node attributes simultaneously. We
mainly analyzed the clustering models, the algorithms, and the criteria for evaluating
clustering results. Methods based on frequency statistics and external dictionaries are
relatively mature and are often used as benchmarks. Centrality methods have advanced
considerably, but their algorithmic efficiency still needs improvement. Methods based on graph
mining have already shown advantages and merit deeper exploration.
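
To make the idea of clique-based theme identification concrete, the following minimal sketch enumerates the maximal cliques of a small term graph with the basic Bron-Kerbosch algorithm (without pivoting) and takes the largest clique as a candidate theme. The abstract does not say which detection algorithms the surveyed studies use, so Bron-Kerbosch is a stand-in chosen for brevity, and the graph and term names are hypothetical.

```python
from typing import Dict, List, Set


def maximal_cliques(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate maximal cliques of an undirected graph (basic Bron-Kerbosch).

    graph maps each node to the set of its neighbours; it must be symmetric
    and every node must appear as a key.
    """
    if not graph:
        return []
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        # r: current clique, p: candidates to extend it, x: already processed.
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# Hypothetical term graph: a tightly connected triangle plus one loose term.
term_graph = {
    "graph": {"mining", "clustering"},
    "mining": {"graph", "clustering"},
    "clustering": {"graph", "mining", "library"},
    "library": {"clustering"},
}
cliques = maximal_cliques(term_graph)
theme = max(cliques, key=len)
```

Exact clique enumeration is exponential in the worst case, which is consistent with the efficiency concerns noted above and is one reason quasi-clique relaxations and clustering are also studied.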

* Correspondence should be addressed to Zhixiong ZHANG, Email: zhangzhx@mail.las.ac.cn, ORCID: 0000-0003-1596-7487
