Page 149 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2018 Vol. 42

148 Journal of Library Science in China, Vol. 8, 2016


            use decision tree generalization optimization to generate a decision tree which can concisely and
            accurately describe the knowledge learnt from neural networks. Gao Yang et al. (2003) proposed a
            self-adaptive probabilistic programming rule extraction algorithm. They first acquired an optimized
            state-action log function using reinforcement learning. A beam search algorithm was then adopted to
            learn knowledge that meets the condition of probabilistic programming.
              Previous research shows that, with the development of cognitive theory and improvements in
            natural language processing, the extraction and mining of full-text literature have attracted
            more attention. The research comes mainly from two areas.
              (1) Library and Information Science. In this area, researchers have explored the problem
            from various perspectives, including theory and method, technology and model, and application.
            They have studied the extraction of academic definitions and innovation points. However, research
            on method extraction and mining is not yet deep enough, and the problem of using technology to
            construct knowledge element description rules remains unsolved.
              (2) Artificial Intelligence. In this area, there are many studies on automatic rule extraction.
            However, the rules are formative rules derived from intelligent learning and reasoning, rather
            than rules extracted from the original literature.
              Therefore, this paper focuses on the types and description rules of knowledge elements in
            academic papers. We generalize the types of knowledge elements and use a semi-automatic method to
            construct knowledge element description rules. The results can serve as both a theoretical
            foundation and resource support.


            2  Research methodology

            2.1  Data and resource


            We chose 17 core journals in the Library and Information Science (LIS) category of CSSCI and
            acquired their bibliographic data from Wanfang Data, CNKI, and CQVip. After data cleansing and
            aggregation, we counted keywords, which resulted in an LIS keyword list containing 63,023 items.
            Of these, 1,302 were recognized as method terms and formed a method term list (Hua, 2013). We
            used the method term list to recognize method sentences in all full-text papers from the Journal
            of the China Society for Scientific and Technical Information. Of 18,686 sentences in total,
            2,707 were recognized as describing methods. We then did rule recognition and construction on
            these sentences.
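
            The sentence recognition step described above can be sketched roughly as follows. This is a
            minimal illustration, not the procedure actually used in the study: the term list, the sample
            text, and the function names `split_sentences` and `find_method_sentences` are hypothetical,
            and plain substring matching on English toy data stands in for matching against the Chinese
            full text.

```python
import re

# Hypothetical toy method-term list; the list described in the paper has 1,302 terms.
METHOD_TERMS = {"content analysis", "cluster analysis", "questionnaire survey"}


def split_sentences(text):
    """Split text into sentences on Western or Chinese end punctuation."""
    pieces = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [p.strip() for p in pieces if p.strip()]


def find_method_sentences(text, terms=METHOD_TERMS):
    """Return (sentence, matched_terms) pairs for sentences that contain a method term."""
    hits = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        matched = sorted(t for t in terms if t in lowered)
        if matched:
            hits.append((sentence, matched))
    return hits


# Illustrative input only; not taken from the corpus.
sample = ("We collected 200 abstracts. Cluster analysis was applied to the keywords. "
          "The results are shown in Table 1.")

for sentence, matched in find_method_sentences(sample):
    print(matched, "->", sentence)
```

            Only sentences that contain at least one listed term are kept, which mirrors how a subset of
            method sentences is selected from the full set of sentences before rule construction.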



            2.2  Process and methodology

            Every full-text paper was read, and method terms were used to detect sentences that contain
            methods. Then a Chinese dictionary and the LIS keyword list were used for word segmentation.
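
            Dictionary-based word segmentation of the kind mentioned above can be illustrated with forward
            maximum matching. The paper does not state which segmentation algorithm was used, so the
            choice of forward maximum matching is an assumption, and the small lexicon, the sample
            sentence, and the function name `forward_max_match` are hypothetical.

```python
def forward_max_match(text, dictionary):
    """Segment text greedily, taking the longest dictionary word at each position.

    Characters that start no dictionary word are emitted as single-character tokens.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical lexicon: a general dictionary merged with domain keywords.
# Glosses: 本文 "this paper", 采用 "adopt", 内容 "content", 分析 "analysis",
# 内容分析法 "content analysis method".
lexicon = {"本文", "采用", "内容", "分析", "内容分析法"}

print(forward_max_match("本文采用内容分析法", lexicon))
```

            Merging the domain keyword list into the lexicon keeps a multi-character method term such as
            内容分析法 as one token instead of splitting it into shorter general-vocabulary words.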