Page 270 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2015 Vol. 41

Extended English abstracts of articles published in the Chinese Edition of Journal of Library Science in China 2015 Vol.41


               The dataset contains metadata records for nearly 1.2 million of the materials most widely held in
               libraries. The metadata comprises approximately 80 million linked data triples, which help users
               find linked resources easily on the Web. For the news corpus, we collected Yahoo! News articles
               from RSS feeds dated from 5 April to 7 July 2014, 95 days in total. To obtain an objective
               measure of performance, we randomly selected 500 news articles (about 10% of the collected
               articles) for evaluation. The results were evaluated by top-10 recall hit rate, which shows that
               WMF performs better than LSA and NMF.
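The abstract does not define the top-10 recall hit rate. The following is a minimal sketch under one common reading: a news article counts as a hit if at least one relevant library record appears among its 10 highest-ranked candidates. The function name, record IDs, and toy data are hypothetical and are not taken from the study.

```python
def top_k_hit_rate(ranked_lists, relevant_sets, k=10):
    """Fraction of queries with at least one relevant item in the top k."""
    if len(ranked_lists) != len(relevant_sets):
        raise ValueError("need one relevant set per ranked list")
    if not ranked_lists:
        return 0.0
    hits = sum(
        1
        for ranked, relevant in zip(ranked_lists, relevant_sets)
        if any(item in relevant for item in ranked[:k])
    )
    return hits / len(ranked_lists)


# Hypothetical toy data: three news articles, each with ranked record IDs
# and the set of records judged relevant (assumed, not from the paper).
ranked = [["r1", "r7", "r3"], ["r9", "r2"], ["r4", "r5", "r6"]]
relevant = [{"r3"}, {"r8"}, {"r4", "r6"}]

rate = top_k_hit_rate(ranked, relevant, k=10)
print(f"top-10 hit rate: {rate:.3f}")
```

Under this reading, comparing WMF, LSA, and NMF amounts to computing the same rate over each model's ranked lists for the 500 sampled articles.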
                 This newswire-library linking offers several distinct advantages to both libraries and
               information seekers: timeliness, extensive and comprehensive coverage, and rich description.
               Using newswires as a complementary information resource in library catalogues addresses
               users' information needs by offering a vast pool of everyday-life subject headings that
               complement the traditional library vocabularies constructed mainly from expert knowledge.
                 For future work, we will involve library users in the evaluation of the system and make necessary
               improvements.



               Methods of text theme identification based on graph mining

               Hongmei GUO & Zhixiong ZHANG*

               With the development of the Internet, the volume of electronic text is growing rapidly. These
               text resources, especially scientific journal papers, contain rich semantic and linked
               information. How to present their core topics quickly and accurately, so as to assist researchers
               and improve research efficiency, has become an urgent issue in text mining. The nodes and edges
               of a graph can represent the terms in a text and the relations among them, so many researchers
               have tried to combine graph mining with natural language processing to identify text themes.
               This paper surveys and analyzes these studies and summarizes their advantages and disadvantages
               in order to provide a reference for further research.
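As an illustration of representing a text as a graph of terms and relations, the sketch below builds a weighted co-occurrence graph in which two terms are linked when they appear in the same sentence. Sentence-level co-occurrence is only one possible relation and is assumed here for illustration; the function name and sample terms are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Return {term: {neighbor: weight}}; weight counts shared sentences."""
    graph = defaultdict(lambda: defaultdict(int))
    for terms in sentences:
        # Each unordered pair of distinct terms in a sentence adds one to the edge weight.
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    # Terms that never co-occur with another term do not appear as nodes.
    return {term: dict(neighbors) for term, neighbors in graph.items()}


# Hypothetical input: each sentence is already tokenized into terms.
sentences = [
    ["graph", "mining", "theme"],
    ["graph", "clustering", "theme"],
    ["text", "mining"],
]
graph = build_cooccurrence_graph(sentences)
print(graph["graph"])
```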
                 At present, studies focus on representing text as a relation graph and on theme identification
               based on centrality, on subgraph detection, or on clustering. Theme identification based on
               cohesive subgraph detection mainly recognizes clique or quasi-clique subgraphs that represent
               the core content of the texts. Theme identification based on graph mining uses two kinds of
               methods: one relies on the graph's topological structure alone, and the other considers
               topological structure and node attributes simultaneously. We mainly analyzed the clustering
               models, algorithms, and evaluation criteria for clustering results. Methods based on frequency
               statistics and external dictionaries are relatively mature and are often used as benchmarks.
               Centrality methods have improved greatly, but their algorithmic efficiency still needs work.
               Methods based on graph mining have already shown advantages and merit deeper exploration.
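Two of the families named above, centrality and cohesive subgraph detection, can be sketched on a small term graph. The example assumes degree centrality and the basic Bron-Kerbosch recursion for maximal cliques; these are standard textbook choices used for illustration, not necessarily the algorithms used in the surveyed studies, and the term graph is hypothetical.

```python
def degree_centrality(adj):
    """Degree of each node divided by (n - 1)."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already explored nodes.
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical undirected term graph stored as adjacency sets.
adj = {
    "graph": {"mining", "theme", "clustering"},
    "mining": {"graph", "theme"},
    "theme": {"graph", "mining", "clustering"},
    "clustering": {"graph", "theme"},
    "library": set(),
}

centrality = degree_centrality(adj)
# Rank terms by centrality, breaking ties alphabetically.
top_terms = sorted(centrality, key=lambda t: (-centrality[t], t))[:2]
cliques = maximal_cliques(adj)
print(top_terms)
print(sorted(sorted(c) for c in cliques))
```

The highest-ranked terms serve as candidate theme words, while each large clique groups terms that are all mutually related and so can stand for one cohesive theme.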

               * Correspondence should be addressed to Zhixiong ZHANG, Email: zhangzhx@mail.las.ac.cn, ORCID: 0000-0003-1596-7487