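Hua Bolin. Types and Description Rules of Knowledge Elements About Method in Academic Papers[J]. Journal of Library Science in China, 2016, 42(1): 30-40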
Types and Description Rules of Knowledge Elements About Method in Academic Papers
Received: December 22, 2015
DOI:10.13530/j.cnki.jlis.160003
Key words: Academic papers; Knowledge element of method; Knowledge mining; Pattern recognition
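Funding: This article is one of the research outputs of the China Postdoctoral Science Foundation project “Research on Mining and Discovery of Method Knowledge Elements in Academic Papers” (No. 2014M560857).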
Author Name | Affiliation | E-mail
HUA Bolin | Department of Information Management, Peking University, Beijing 100871 | huabolin@pku.edu.cn
Abstract:
Academic papers contain many knowledge elements about methods. To construct a structured method knowledge base, these elements need to be extracted, since they are a key data source for a method system. Extracting method knowledge elements is an important research topic for pushing knowledge organization toward finer granularity. A method knowledge base can be used not only to draw method tree diagrams and development maps, but also as a source for method selection when embedded in a decision support system or an intelligent system. This will help standardize and advance the use of methods.
We selected 17 core LIS journals indexed in CSSCI and downloaded the metadata of their articles from CNKI, WanFang Data, and CQVip. After data fusion and cleansing, a statistical analysis of the keywords yielded a list of 63,203 keywords, of which 1,302 were identified as method terms; these form the method term list. The method term list was then used for full-text recognition over all papers published in the Journal of the China Society for Scientific and Technical Information in 2012. Of the 18,686 sentences, 2,707 were recognized as describing a method. These sentences were word-segmented using a general Chinese dictionary together with the LIS domain keyword list. Both the keyword list and a domain subject term list were then used to filter out the domain words in each sentence, which reduces every sentence to a linear structure: a syntactic pattern for a method knowledge element. Examples include “the method of … is a …”, “the method of … has certain disadvantages”, and “… method has been adopted to solve …”. Finally, we manually reviewed these structures and assigned each to a type of method knowledge element, which produced a rule base of method knowledge elements.
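The two automatic steps described above (recognizing method sentences with the term list, then filtering domain words to obtain a linear structure) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the author's implementation: sentences are assumed to be already word-segmented (the study segmented Chinese text with a general dictionary plus the LIS keyword list), the toy English tokens and term lists stand in for the real 1,302 method terms and domain lists, and the placeholders `<M>`/`<D>` and the function names `to_structure` and `mine_structures` are hypothetical. Counting identical structures is one plausible way to prioritize them for the manual review step.

```python
from __future__ import annotations

from collections import Counter

METHOD_SLOT = "<M>"  # hypothetical placeholder for a method term
DOMAIN_SLOT = "<D>"  # hypothetical placeholder for any other domain word


def to_structure(tokens: list[str], method_terms: set[str], domain_terms: set[str]) -> str:
    """Turn a pre-segmented sentence into a linear description structure."""
    pieces: list[str] = []
    for tok in tokens:
        if tok in method_terms:
            piece = METHOD_SLOT
        elif tok in domain_terms:
            piece = DOMAIN_SLOT
        else:
            piece = tok  # function words and cue phrases are kept verbatim
        # Collapse runs of the same placeholder, e.g. "<D> <D>" -> "<D>".
        if pieces and piece == pieces[-1] and piece in (METHOD_SLOT, DOMAIN_SLOT):
            continue
        pieces.append(piece)
    return " ".join(pieces)


def mine_structures(
    sentences: list[list[str]], method_terms: set[str], domain_terms: set[str]
) -> Counter[str]:
    """Count the structures of sentences that mention at least one method term."""
    counts: Counter[str] = Counter()
    for tokens in sentences:
        # Method-sentence recognition: any token found in the method term list.
        if any(tok in method_terms for tok in tokens):
            counts[to_structure(tokens, method_terms, domain_terms)] += 1
    return counts


if __name__ == "__main__":
    # Toy, pre-segmented English stand-ins for segmented Chinese sentences.
    method_terms = {"clustering", "co-word analysis"}
    domain_terms = {"citation", "data"}
    sentences = [
        ["clustering", "has", "certain", "disadvantages"],
        ["co-word analysis", "has", "certain", "disadvantages"],
        ["citation", "data", "were", "collected"],
        ["co-word analysis", "is", "applied", "to", "citation", "data"],
    ]
    for structure, n in mine_structures(sentences, method_terms, domain_terms).most_common():
        print(n, structure)
```

On the toy input, `<M> has certain disadvantages` (count 2) is listed first, then `<M> is applied to <D>`; the sentence without a method term is skipped.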
We identified five types of method knowledge elements: method definition, method relation, method feature, method process, and method functionality knowledge elements. Method relation knowledge elements include static (spatial) relations and dynamic (temporal, evolutionary) relations. Method feature knowledge elements cover both the strengths and the weaknesses of a method. Some feature descriptions discuss a single method, while others compare methods on a one-to-one or one-to-many basis.
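Once structures have been reviewed and labeled, applying the rule base amounts to matching a structure against typed patterns. The sketch below assumes the rules are regular expressions over the filtered structures, using hypothetical `<M>` (method term) and `<D>` (other domain word) placeholders. The definition, feature, and functionality rules are English renderings of the examples quoted in the abstract; the relation rule and all identifiers (`MethodKE`, `RULES`, `classify`) are invented for illustration, and the paper's actual rules are not reproduced here.

```python
from __future__ import annotations

import re
from enum import Enum


class MethodKE(Enum):
    """The five types of method knowledge elements identified in the paper."""
    DEFINITION = "definition"
    RELATION = "relation"      # static (spatial) or dynamic (temporal) relations
    FEATURE = "feature"        # strengths and weaknesses of a method
    PROCESS = "process"        # often spans a sentence group; no rule below
    FUNCTION = "functionality"


# Hypothetical rule base. Only the first three patterns follow examples quoted
# in the abstract; the relation pattern is invented for illustration.
RULES: list[tuple[re.Pattern[str], MethodKE]] = [
    (re.compile(r"^the method of <M> is a\b"), MethodKE.DEFINITION),
    (re.compile(r"<M> has certain (?:dis)?advantages\b"), MethodKE.FEATURE),
    (re.compile(r"<M> method has been adopted to solve\b"), MethodKE.FUNCTION),
    (re.compile(r"^<M> is (?:improved|derived) from <M>$"), MethodKE.RELATION),
]


def classify(structure: str) -> MethodKE | None:
    """Return the type of the first matching rule, or None if no rule fires."""
    for pattern, ke_type in RULES:
        if pattern.search(structure):
            return ke_type
    return None
```

First-match-wins ordering is a simplification; a fuller rule base might need priorities when one structure matches several types, for example a comparative feature sentence that also states a relation between methods.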
The results show that different types of method knowledge elements differ in their features and rule descriptions. Method definition and functionality knowledge elements are comparatively simple, and so are their sentence rules. Most method feature knowledge elements use comparative sentences that compare one method with another or with several others, so their syntactic structures are comparatively complex. Static relations between methods are described relatively rarely, but their rules are complex; dynamic relations are described often, but their rules are not complex. A method process knowledge element is better described by a sentence group or a paragraph than by a single sentence, which makes rules for this type difficult to construct. The study has limitations that call for further work. First, there is no test dataset for method extraction, so our results are difficult to evaluate. Second, the rules we constructed cannot cover every situation, which limits the applicability of the study. In future work we will expand the raw corpus and construct a test dataset of method knowledge elements for evaluating the performance of method knowledge element extraction. 3 figs. 7 tabs. 17 refs.
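The evaluation planned for when a test dataset exists could take the form of standard precision, recall, and F1 scores. In this hypothetical sketch, each gold or predicted label is a `(sentence_id, type)` pair and a prediction counts as correct only on an exact match; the labels are invented toy data, and the helper names `prf` and `per_type` are assumptions rather than anything defined in the paper.

```python
from __future__ import annotations


def prf(
    gold: set[tuple[int, str]], predicted: set[tuple[int, str]]
) -> tuple[float, float, float]:
    """Precision, recall and F1 over (sentence_id, type) labels."""
    tp = len(gold & predicted)  # exact matches: same sentence, same type
    # Convention: a score is 0.0 when its denominator is empty.
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def per_type(
    gold: set[tuple[int, str]], predicted: set[tuple[int, str]]
) -> dict[str, tuple[float, float, float]]:
    """Score each knowledge-element type separately."""
    types = sorted({t for _, t in gold | predicted})
    return {
        t: prf({g for g in gold if g[1] == t}, {p for p in predicted if p[1] == t})
        for t in types
    }


if __name__ == "__main__":
    # Invented toy labels; no real test dataset is used here.
    gold = {(1, "definition"), (2, "feature"), (3, "functionality"), (4, "process")}
    predicted = {(1, "definition"), (2, "feature"), (5, "feature")}
    print(prf(gold, predicted))
    for t, scores in per_type(gold, predicted).items():
        print(t, scores)
```

Scoring per type is useful here because process knowledge elements usually span several sentences, so a sentence-level rule base would be expected to show low recall for that type even when its overall scores look acceptable.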
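Chinese abstract:
Academic papers contain many descriptions of method knowledge elements. Extracting these elements to form a structured method knowledge base is one of the important research topics in fine-grained knowledge organization. Based on content analysis of a large body of literature, this paper classifies method knowledge elements into five types: method definition, method relation, method feature, method process, and method function knowledge elements. Sentences describing methods are extracted from papers, and sentence description structures are formed by filtering out the domain keywords in each sentence. After manual review, merging, and classification, these structures become description rules for method knowledge elements, which support subsequent method knowledge element extraction. 3 figures. 7 tables. 17 references.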