Page 183 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2015 Vol. 41
1.2 Main path analysis
Scientific revolutions, that is, sudden paradigmatic changes resulting from new insights, can be
reflected by abrupt changes in the citation network. Main path analysis can uncover the process of
scientific development hidden in a citation network, and it can also identify the articles that have
shaped research in a field over a period of time.
The theoretical premise of main path analysis is that a citation network can be treated as a system of
channels that transport scientific knowledge or information. An article that integrates existing
knowledge from previous articles and adds substantial new knowledge to it may receive many
citations after its publication, making direct citations to those previous articles seem
redundant. Such an article therefore becomes an important junction in the information
transportation system, and a great deal of knowledge flows through it. In this sense, a citation link
that is needed to connect many articles is more crucial than one that is rarely needed. A set of
important citation links constitutes one or more main paths, which form the backbone of a research
tradition (De Nooy, Mrvar, & Batagelj, 2011).
The main concepts and indices of main path analysis are defined as follows:
1) Source vertex: a vertex with zero indegree in an acyclic network.
2) Sink vertex: a vertex with zero outdegree in an acyclic network.
3) Traversal weight: the proportion of all paths between source and sink vertices that contain
a given arc or vertex. Here, vertices represent documents, and arcs represent citation relations.
4) Main path: the path from a source vertex to a sink vertex with the highest traversal weights
on its arcs.
5) Main path component: a path from a source vertex to a sink vertex whose arcs all have traversal
weights greater than or equal to a threshold value. The threshold value is usually set to
the lowest traversal weight on the main paths.
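Definitions 1, 2 and 5 can be illustrated with a minimal Python sketch. It assumes that the network is held as a dictionary mapping each arc `(tail, head)` to its traversal weight, that arcs point from the cited to the citing document, and that a main path component is a weakly connected component of the network left after arcs below the threshold are removed (the reading given later in this section). The function names `sources_and_sinks` and `main_path_components` and the toy network are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def sources_and_sinks(arcs):
    """Return (sources, sinks) of an acyclic network given as (tail, head) pairs."""
    vertices = set()
    indegree, outdegree = defaultdict(int), defaultdict(int)
    for tail, head in arcs:
        vertices.update((tail, head))
        outdegree[tail] += 1
        indegree[head] += 1
    sources = {v for v in vertices if indegree[v] == 0}  # definition 1
    sinks = {v for v in vertices if outdegree[v] == 0}    # definition 2
    return sources, sinks


def main_path_components(weights, threshold):
    """Drop arcs weighing less than `threshold` and return the weakly
    connected components (as vertex sets) of the arcs that remain."""
    neighbours = defaultdict(set)
    for (tail, head), weight in weights.items():
        if weight >= threshold:
            # Direction is ignored here, so components are weakly connected.
            neighbours[tail].add(head)
            neighbours[head].add(tail)
    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            component.add(v)
            stack.extend(neighbours[v] - seen)
        components.append(component)
    return components


# Hypothetical toy network (arcs point from cited to citing document) with the
# traversal weights implied by its three source-to-sink paths:
# A->C->E, B->C->E and B->D->E.
weights = {
    ("A", "C"): Fraction(1, 3), ("B", "C"): Fraction(1, 3),
    ("B", "D"): Fraction(1, 3), ("C", "E"): Fraction(2, 3),
    ("D", "E"): Fraction(1, 3),
}

sources, sinks = sources_and_sinks(weights)
print(sorted(sources), sorted(sinks))  # ['A', 'B'] ['E']
for threshold in (Fraction(2, 3), Fraction(1, 3)):
    components = main_path_components(weights, threshold)
    print(threshold, [sorted(c) for c in components])
# 2/3 [['C', 'E']]
# 1/3 [['A', 'B', 'C', 'D', 'E']]
```

With a cutoff of 2/3 only the heavily used arc (C, E) survives, whereas a cutoff of 1/3 keeps the whole toy network as a single component.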
Main path analysis calculates the extent to which a particular citation or article is needed
for linking articles; this quantity is called the traversal weight of the citation or article. First,
the procedure counts all paths from each source to each sink, as well as the number of those paths
that include a particular citation. Next, it divides the number of paths that use the citation by the
total number of paths between source and sink vertices in the network. This proportion is the
traversal weight of the citation. The traversal weight of each article is obtained in the same way.
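The counting procedure can be sketched in Python. Instead of listing every path one by one, this sketch (an assumed implementation, not necessarily the one used in the studies cited here) counts paths by dynamic programming over a topological order: the number of source-to-sink paths through an arc (u, v) equals the number of paths from any source to u multiplied by the number of paths from v to any sink, which yields the same counts as exhaustive enumeration. The function name `traversal_weights`, the toy network and the convention that arcs point from the cited to the citing document are assumptions.

```python
from collections import defaultdict
from fractions import Fraction
from graphlib import TopologicalSorter  # Python 3.9+


def traversal_weights(arcs):
    """Return (arc_weights, vertex_weights) for an acyclic network.

    `arcs` is an iterable of (tail, head) pairs, assumed to point from the
    cited to the citing document. Weights are exact fractions of all
    source-to-sink paths.
    """
    arcs = list(arcs)
    successors, predecessors = defaultdict(list), defaultdict(list)
    vertices = set()
    for tail, head in arcs:
        successors[tail].append(head)
        predecessors[head].append(tail)
        vertices.update((tail, head))

    # TopologicalSorter expects {vertex: predecessors}; it raises CycleError
    # if the network is not acyclic.
    graph = {v: predecessors[v] for v in vertices}
    order = list(TopologicalSorter(graph).static_order())

    # from_sources[v]: number of paths that start at a source and end at v
    # (a source counts as one trivial path to itself).
    from_sources = {}
    for v in order:
        from_sources[v] = sum(from_sources[u] for u in predecessors[v]) or 1
    # to_sinks[v]: number of paths that start at v and end at a sink.
    to_sinks = {}
    for v in reversed(order):
        to_sinks[v] = sum(to_sinks[w] for w in successors[v]) or 1

    # Every source-to-sink path ends at exactly one sink.
    total = sum(from_sources[v] for v in vertices if not successors[v])

    # Paths through arc (u, v) = (paths source->u) * (paths v->sink).
    arc_weights = {
        (u, v): Fraction(from_sources[u] * to_sinks[v], total) for u, v in arcs
    }
    # Paths through vertex v = (paths source->v) * (paths v->sink).
    vertex_weights = {
        v: Fraction(from_sources[v] * to_sinks[v], total) for v in vertices
    }
    return arc_weights, vertex_weights


# Hypothetical toy network: three source-to-sink paths A->C->E, B->C->E, B->D->E.
arcs = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")]
arc_w, vertex_w = traversal_weights(arcs)
print(arc_w[("C", "E")], arc_w[("B", "D")])  # 2/3 1/3
print(vertex_w["B"], vertex_w["E"])          # 2/3 1
```

On the toy network, the arc (C, E) lies on two of the three paths and so receives weight 2/3, while vertex E lies on all three paths and receives weight 1.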
On the basis of the traversal weights, the main paths and the main path components can be extracted.
Several methods have been proposed to extract main paths from the network of traversal weights.
The method we follow here consists of choosing the source vertex (or vertices) incident with the
arc(s) of highest weight, selecting those arc(s) and their head(s), and repeating this step from the
selected head(s) until a sink vertex is reached. To extract the main path components, we choose
a cutoff value between 0 and 1 and remove from the network all arcs whose traversal weights fall
below this value. The components of the resulting network are called main path components. To