Page 188 - Journal of Library Science in China, Vol.45, 2019

                           Extended English abstracts of articles published in the Chinese edition of Journal of Library Science in China, Vol.45, 2019


               totaling 104 million, with a time span of 2008 to 2017. In this study, two data subsets on library
               and information science and philosophy were respectively selected as empirical cases.




               Large-scale people social network extraction and analysis based on online
               encyclopedia
               LIN Zefei & OU Shiyan ⓐ*

               Social Network Extraction (SNE) is an emerging research field which focuses on automatic
               extraction of hidden social networks from a wide variety of information sources. Online
               encyclopedia articles contain a large amount of information about persons and their interpersonal
               relationships, from which a people social network can be extracted and used for research in
               digital humanities and social computing. The extracted people social network involves both real
               persons who may span thousands of years and virtual persons who may come from a large number
               of literary works. However, most people social network extraction methods ignore the types
               and spatio-temporal characteristics of persons, and consider only text similarity or other related
               features to measure the degree of relevance between persons. This may limit both the accuracy
               and the range of application of the extracted people social networks.
                 This study explored, for the first time, the automatic extraction of a large-scale people social
               network from a Chinese online encyclopedia, taking Baidu Encyclopedia as an example. It proposed
               a new method of social network extraction, which distinguishes the types and spatio-temporal
               characteristics of extracted persons and measures the weight of interpersonal relationships more
               accurately based on multiple relevance features. This method contains three phases: generating
               an initial people social network, computing the relationship strength between persons, and
               analyzing the spatio-temporal characteristics of persons. In the first phase, the articles on persons
               (hereinafter referred to as “person articles”) were identified from Baidu Encyclopedia, and then an
               initial undirected and unweighted people social network containing more than 0.54 million nodes
               and 2.22 million edges was generated based on the links between person articles. In the second
               phase, the calculation of relationship strength between persons in the initial network was treated as
               a ranking task and solved with a supervised learning to rank (L2R) method that combines five
               similarity features to measure the degree of relevance between persons. Based on this method, the
               initial unweighted people network was transformed into a weighted network whose person
               nodes span time and space. In the third phase, the living time-space of each person in the
               people network was estimated. For a real person, his/her living time-space was estimated based on
               the years (including reign titles) occurring in the article on him/her, whereas for a virtual person,
               his/her living time-space was defined as the one or more works depicting him/her. In this way, a time-space
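
The first and third phases described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy corpus, the field names (`is_person`, `links`, `text`), and the helper functions are all hypothetical. It assumes that any link between two person articles yields one undirected edge, and it reads only Arabic-numeral years, whereas the study also handles reign titles.

```python
import re

# Hypothetical toy corpus: title -> article record (illustrative field names).
articles = {
    "Li Bai": {"is_person": True, "links": ["Du Fu", "Tang dynasty"],
               "text": "Li Bai (701-762) was a poet of the Tang dynasty."},
    "Du Fu": {"is_person": True, "links": ["Li Bai"],
              "text": "Du Fu (712-770) was a contemporary of Li Bai."},
    "Tang dynasty": {"is_person": False, "links": ["Li Bai", "Du Fu"],
                     "text": "The Tang dynasty ruled from 618 to 907."},
}


def build_person_network(articles):
    """Phase 1 sketch: undirected, unweighted edges from links between person articles."""
    persons = {title for title, art in articles.items() if art["is_person"]}
    edges = set()
    for title in persons:
        for target in articles[title]["links"]:
            # Keep only links whose target is also a person article.
            if target in persons and target != title:
                # frozenset makes A-B and B-A the same undirected edge.
                edges.add(frozenset((title, target)))
    return persons, edges


# Standalone 3- or 4-digit numbers are treated as candidate years (an assumption).
YEAR = re.compile(r"(?<!\d)(\d{3,4})(?!\d)")


def estimate_time_span(text):
    """Phase 3 sketch: (earliest, latest) year mentioned in the article, or None."""
    years = [int(y) for y in YEAR.findall(text)]
    return (min(years), max(years)) if years else None


persons, edges = build_person_network(articles)
spans = {p: estimate_time_span(articles[p]["text"]) for p in persons}
```

In this toy corpus the two mutual links between the person articles collapse into a single undirected edge, and the non-person article contributes no node. The second phase (learning to rank over five similarity features) is omitted because the abstract does not specify the features or the ranking model.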


               * Correspondence should be addressed to OU Shiyan, Email: oushiyan@nju.edu.cn, ORCID: 0000-0001-8617-6987