Page 181 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2018 Vol. 43
Extended English abstracts of articles published in the Chinese edition of Journal of Library Science in China 2017 Vol.43
Co-word analysis: Limitations and solutions
LI Gang & BA Zhichao*
Co-word analysis is a content analysis technique based on the assumption that the subject of a
paper can be summarized in a limited number of key terms. If two terms co-occur within one paper,
the two research topics they represent are related, and a higher co-occurrence frequency indicates
a stronger correlation between the paired terms. However, co-word analysis still takes the word as
its basic unit and is extremely sensitive to the selection of terms, and its quality depends on a
variety of factors, such as the quality of terms and indexes, the extraction of high-frequency
terms, and the adequacy of statistical methods. Therefore, it is necessary to examine the
limitations of co-word analysis at its different stages in order to improve and optimize it.
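The co-occurrence assumption described above can be illustrated with a minimal sketch. The toy corpus and the function name `cooccurrence_counts` are hypothetical and serve only as an illustration; they are not taken from the article. Each paper is reduced to its keyword list, and the number of papers in which two terms appear together is taken as the strength of their relation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each paper is represented by its keyword list.
papers = [
    ["co-word analysis", "bibliometrics", "clustering"],
    ["co-word analysis", "clustering"],
    ["bibliometrics", "citation analysis"],
]


def cooccurrence_counts(keyword_lists):
    """Count the number of papers in which each unordered term pair co-occurs."""
    counts = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so that each pair is counted once per paper
        # and is always stored in the same (alphabetical) order.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


pair_counts = cooccurrence_counts(papers)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Because the counts depend entirely on which strings appear in the keyword lists, unmerged synonyms or spelling variants would be counted as different terms, which is one way the sensitivity to term selection shows up in practice.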
The co-word analysis conducted in the present study involves six sequential steps: determination
of the analysis problem, term source selection, high-frequency term extraction, relevance
calculation of terms, multivariate statistical analysis, and visual presentation of results. This
paper focuses on these six key issues and analyzes and demonstrates the main problems on the basis
of an induction and summary of the existing relevant research. The results support the following
conclusions. 1)
In term source selection, relying solely on keywords and index words, which gives rise to what
researchers call the “indexer effect”, is the biggest problem of early co-word analysis. Keywords
are uncontrolled words, so they introduce problems of homonyms and synonyms. Meanwhile, terms are
expressed differently in different parts of an analysis unit, and ignoring those differences
introduces errors into co-word analysis. To solve these problems, the semantic structure of the
text and the fact that terms differ in quality as well as in quantity can be taken into account.
2) Researchers engaged in co-word analysis have never moved beyond the pattern of using
high-frequency terms as the basis for multivariate statistical analysis. Extracting only
high-frequency terms not only further marginalizes low-frequency terms, but also isolates
high-frequency terms that have low correlation with clusters.
By taking into account the discipline and the multiple semantic types of terms in order to
distinguish how well they represent subject areas, we can gain a comprehensive and in-depth
understanding of the research characteristics of the field under study. 3) Two co-occurring terms
may be correlated with each other directly
or indirectly, but these semantic relationships between co-occurring terms are not considered
at all, which may ultimately affect the soundness of the results of co-word analysis. We therefore
summarize the existing methods for calculating semantic correlation and point out the limitations
of each. 4) Finally, for multivariate statistical analysis, taking co-word clustering and co-word
association analysis as examples, we discuss the problems of applying them in the new data
environment and put forward methods and suggestions for improvement.
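The high-frequency term extraction and relevance calculation steps discussed above can be sketched as follows. This is only an illustration under stated assumptions: the toy corpus, the threshold `MIN_FREQ`, and the function names are hypothetical, and the equivalence index E_ij = c_ij^2 / (c_i * c_j) is used as one commonly cited co-word strength measure, not necessarily the one the authors examine.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each paper is represented by its keyword list.
papers = [
    ["co-word analysis", "bibliometrics", "clustering"],
    ["co-word analysis", "clustering"],
    ["co-word analysis", "bibliometrics"],
    ["bibliometrics", "altmetrics"],
]

MIN_FREQ = 2  # assumed cut-off; real studies choose the threshold in various ways


def high_frequency_terms(keyword_lists, min_freq):
    """Return {term: number of papers containing it} for terms at or above min_freq."""
    freq = Counter(term for kws in keyword_lists for term in set(kws))
    return {term: n for term, n in freq.items() if n >= min_freq}


def equivalence_index(keyword_lists, term_freq):
    """Compute E_ij = c_ij^2 / (c_i * c_j) for every co-occurring pair of retained terms."""
    pair_counts = Counter()
    for kws in keyword_lists:
        # Only retained (high-frequency) terms enter the pair counts.
        kept = sorted(t for t in set(kws) if t in term_freq)
        for pair in combinations(kept, 2):
            pair_counts[pair] += 1
    return {
        (a, b): c * c / (term_freq[a] * term_freq[b])
        for (a, b), c in pair_counts.items()
    }


term_freq = high_frequency_terms(papers, MIN_FREQ)
strengths = equivalence_index(papers, term_freq)
```

In this toy corpus the term "altmetrics" occurs only once and is dropped by the threshold, so its link to "bibliometrics" never enters the relevance calculation. This is a small instance of the marginalization of low-frequency terms noted in conclusion 2.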
Co-word analysis has been most commonly utilized in mapping or tracing patterns and trends in
* Correspondence should be addressed to BA Zhichao, Email: bazhichaoty@126.com, ORCID: 0000-0001-5626-5604