Page 188 - Journal of Library Science in China, Vol.45, 2019
Extended English abstracts of articles published in the Chinese edition of Journal of Library Science in China, Vol.45, 2019
totaling 104 million, with a time span of 2008 to 2017. In this study, two data subsets on library
and information science and philosophy were respectively selected as empirical cases.
Large-scale people Social Network Extraction and analysis based on online
encyclopedia
LIN Zefei & OU Shiyan*
Social Network Extraction (SNE) is an emerging research field which focuses on automatic
extraction of hidden social networks from a wide variety of information sources. Articles in
online encyclopedias contain a large amount of information about persons and their interpersonal
relationships, from which a people social network can be extracted and used for research in
digital humanities and social computing. The extracted people social network involves both real
persons who may span thousands of years and virtual persons who may come from a large number
of literary works. However, most people social network extraction methods ignore the types
and spatio-temporal characteristics of persons, and only consider text similarity or other related
features to measure the degree of relevance between persons. This may limit the accuracy and
the range of applications of the extracted people social networks.
This study explored, for the first time, the automatic extraction of a large-scale people social
network from a Chinese online encyclopedia, taking Baidu Encyclopedia as an example. It proposed
a new method of social network extraction, which distinguishes the types and spatio-temporal
characteristics of extracted persons and more accurately measures the weight of interpersonal
relationships based on multiple relevance features. This method contains three phases: generating
an initial people social network, computing the relationship strength between different persons, and
analyzing the spatio-temporal characteristics of persons. In the first phase, the articles on persons
(hereinafter referred to as “person articles”) were identified from Baidu Encyclopedia, and then an
initial undirected and unweighted people social network containing more than 0.54 million nodes
and 2.22 million edges was generated based on the links between person articles. In the second
phase, calculating the strength of the relationships between persons in the initial network was
treated as a ranking task and solved with a supervised learning-to-rank (L2R) method that combines
five similarity features to measure the degree of relevance between persons. Based on this method, the
initial unweighted people network was then transformed into a weighted network in which person
nodes span time and space. In the third phase, the living time-space of each person in the
people network was estimated. For a real person, the living time-space was estimated from
the years (including reign titles) mentioned in the article on him/her, whereas for a virtual person,
the living time-space was taken to be the one or more works depicting him/her. In this way, a time-space
* Correspondence should be addressed to OU Shiyan, Email: oushiyan@nju.edu.cn, ORCID: 0000-0001-8617-6987