Page 149 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2018 Vol. 42
148 Journal of Library Science in China, Vol. 8, 2016
use decision tree generalization optimization to generate a decision tree which can concisely and
accurately describe the knowledge learnt from neural networks. Gao Yang et al. (2003) proposed a
self-adaptive probabilistic programming rule extraction algorithm. They first acquired an optimized
state-action log function using reinforcement learning. A beam search algorithm was then adopted to
learn knowledge that meets the conditions of probabilistic programming.
Previous research shows that, with the development of cognitive theory and improvements in
natural language processing, the extraction and mining of full-text literature have attracted
increasing attention. This research comes mainly from two areas.
(1) Library and Information Science. In this area, researchers have explored the problem
from various perspectives, including theory and method, technology and model, and application.
They have studied the extraction of academic definitions and innovation points. However, the study of
method extraction and mining is not yet deep enough, and the problem of using technology to construct
knowledge element description rules has not been solved.
(2) Artificial Intelligence. In this area, there are many studies on automatic rule extraction. However, the
rules are formative rules derived from intelligent learning and reasoning, rather than rules extracted from
the original literature.
Therefore, this paper focuses on the types and description rules of knowledge elements in academic
papers. We generalize the types of knowledge elements and use a semi-automatic method to construct
knowledge element description rules. The results can serve as both a theoretical foundation
and resource support.
2 Research methodology
2.1 Data and resources
We chose 17 core journals in CSSCI in the category of Library and Information Science (LIS)
and acquired their bibliographic data from Wanfang Data, CNKI, and CQVip. After data
cleansing and aggregation, we counted keywords, which resulted in a LIS keyword list
containing 63,023 items. Of these, 1,302 were recognized as method terms and formed a method
term list (Hua, 2013). We used the method term list to recognize method sentences in all full-text
papers from the Journal of the China Society for Scientific and Technical Information. Of the 18,686
sentences in total, 2,707 were recognized as describing methods. We then carried out rule recognition
and construction on these sentences.
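The term-based recognition step described above can be illustrated with a minimal sketch. It assumes that a sentence counts as a method sentence when it contains at least one entry from the method term list. The three sample terms, the sentence-splitting pattern, and the function names are hypothetical and stand in for the actual 1,302-term list and tooling, which the paper does not specify.

```python
import re

# Hypothetical sample of method terms; the real list has 1,302 entries.
METHOD_TERMS = {"聚类分析", "共词分析", "问卷调查"}

def split_sentences(text):
    """Split Chinese text into sentences, keeping the end punctuation."""
    # Each match is a run of non-terminator characters plus an optional terminator.
    return [s.strip() for s in re.findall(r"[^。！？]+[。！？]?", text) if s.strip()]

def find_method_sentences(text, terms):
    """Return (sentence, matched_terms) pairs for sentences containing a method term."""
    hits = []
    for sentence in split_sentences(text):
        matched = sorted(t for t in terms if t in sentence)
        if matched:
            hits.append((sentence, matched))
    return hits

sample = "本文采用共词分析方法研究热点。数据来自某数据库。我们使用问卷调查和聚类分析。"
for sentence, matched in find_method_sentences(sample, METHOD_TERMS):
    print(matched, sentence)
```

Plain substring matching is used here for simplicity. It may over-match when a method term is embedded in a longer, unrelated word.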
2.2 Process and methodology
Every full-text paper was read, and method terms were used to detect the sentences that describe
methods. A Chinese dictionary and the LIS keyword list were then used for word segmentation.
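The paper does not state which segmentation algorithm was used. As one possible illustration of dictionary-based segmentation, the sketch below applies forward maximum matching over a lexicon that merges a general dictionary with domain keywords. The function name `forward_max_match` and the tiny lexicon are assumptions made for this example only.

```python
def forward_max_match(sentence, lexicon, max_len=None):
    """Segment a sentence by greedily taking the longest lexicon word at each position."""
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Hypothetical lexicon: general dictionary words plus LIS keywords.
general_words = {"本文", "采用", "方法"}
lis_keywords = {"共词分析"}
lexicon = general_words | lis_keywords

print(forward_max_match("本文采用共词分析方法", lexicon))
```

Adding the LIS keyword list to the lexicon keeps domain terms such as 共词分析 as single tokens, where a general dictionary alone might split them.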