(2) Calculation of citation intensity
It can be seen from Table 1 that the total number of citations in all disciplines was higher than the
number of citing literatures. Some scholars have pointed out that the more times a literature is
cited within a single citing literature, the more important it is (Hassan et al., 2017; Jurgens et al., 2018).
Hence, this paper used a citation intensity metric to measure the importance of a given literature.
The calculation formula is as follows:
$S_{book} = Q_{citation} / Q_{reference}$    (1)
where $S_{book}$ represents the citation intensity of a book, $Q_{citation}$ represents the number of times the
book is cited, and $Q_{reference}$ represents the number of citing literatures of the book. A higher value
of $S_{book}$ means a higher citation intensity of the book.
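A minimal sketch of formula (1) in Python; the counts below are hypothetical and serve only to illustrate the calculation.

def citation_intensity(q_citation, q_reference):
    # S_book = Q_citation / Q_reference, as in formula (1)
    if q_reference == 0:
        raise ValueError("the book has no citing literatures")
    return q_citation / q_reference

# Hypothetical book: cited 38 times across 16 citing literatures
print(citation_intensity(38, 16))  # 2.375 -- a higher value means a higher citation intensity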
(3) Calculating the length of citation content and annotating the citation sentiment information
Based on the citation content corpus, we took the string length of each citation sentence
as the length of its citation content. We divided the citation sentiments of the citation content into three
categories: positive citations, neutral citations and negative citations. The sentiment
information was annotated manually: according to the citation contents and their contexts,
we classified the citation sentiment of each citation content and obtained its sentiment
category. Table 3 shows examples of the sentiment classification.
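As a small illustration of the length measure described above, the following Python sketch takes the string length of each citation sentence; the sentences and sentiment labels here are hypothetical examples, not items from Table 3.

# Length of citation content = string length of the citation sentence
citation_contents = [
    ("This method significantly improves retrieval accuracy.", "positive"),
    ("The dataset used here was first introduced in the cited work.", "neutral"),
    ("However, the cited model performs poorly on sparse data.", "negative"),
]
for sentence, sentiment in citation_contents:
    print(sentiment, len(sentence), sentence)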
Citation sentiment annotations of citation content obtained manually have a certain degree of
subjectivity. In order to ensure the accuracy of the manual annotation results, three annotators
completed the sentiment annotation of the citation content independently, and we evaluated the
consistency of the three annotators' annotation results with the Kappa coefficient (Warrens, 2011).
The Kappa coefficient is calculated with formula (2):
$K = \dfrac{P(A) - P(E)}{1 - P(E)}$    (2)
where $P(A)$ represents the observed agreement of the annotation results, and
$P(E)$ represents the expected agreement of the annotation results. If K is higher than
0.8, the annotation results are reliable; if 0.8 > K > 0.67, the annotation results are relatively reliable
(Carletta, 1996). We calculated the Kappa values between the results of the three annotators; the highest
consistency in each discipline was: computer science, 0.633; literature, 0.726; law, 0.723;
medicine, 0.827; and sport science, 0.805. This indicates that annotation consistency differed
across disciplines. The consistency in four disciplines was higher than the trusted
consistency standard (i.e., K = 0.69), while the consistency in computer science was only 0.633,
which may be caused by the small number of citations in computer science. As the Kappa coefficient
is sensitive to categories with low annotation frequency, even a small number of inconsistent annotations
will reduce the overall consistency of the annotation results (W. Lu, Meng, & X.B. Liu, 2014).
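A minimal sketch of formula (2) for one pair of annotators; the sentiment labels below are hypothetical, and the pairwise result can also be checked against a library implementation such as scikit-learn's cohen_kappa_score.

from sklearn.metrics import cohen_kappa_score

annotator_a = ["positive", "neutral", "neutral", "negative", "neutral", "positive"]
annotator_b = ["positive", "neutral", "negative", "negative", "neutral", "positive"]

# Direct computation of formula (2): K = (P(A) - P(E)) / (1 - P(E))
n = len(annotator_a)
p_a = sum(a == b for a, b in zip(annotator_a, annotator_b)) / n      # observed agreement P(A)
labels = set(annotator_a) | set(annotator_b)
p_e = sum((annotator_a.count(c) / n) * (annotator_b.count(c) / n)    # expected agreement P(E)
          for c in labels)
print((p_a - p_e) / (1 - p_e))                       # 0.75
print(cohen_kappa_score(annotator_a, annotator_b))   # same value from the library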
For each discipline, this paper chose the annotation results of the two annotators with the highest
Kappa coefficient; the inconsistent annotations were then discussed and resolved. Finally, we
obtained the citation sentiment annotation results of the citation content.