
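CAO Shujin, LI Jiena, WANG Zhihong. Research on the Meta-data Schema for Fine-grained Aggregation Units of Internet Resources[J]. Journal of Library Science in China, 2017, 43(4): 74–92.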

Research on the Meta-data Schema for Fine-grained Aggregation Units of Internet Resources


Received: March 16, 2017


Key words: Internet resources; Information aggregation; Fine-grained; Aggregation unit; Genre analysis; Meta-data

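Funding: This paper is one of the research outcomes of the major project of the National Social Science Fund of China, "Research on Knowledge Organization and Navigation Mechanisms of Internet Resources Based on Specific Domains" (No. 12&ZD222).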

Author Name	Affiliation	E-mail
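CAO Shujin	School of Information Management, Sun Yat-sen University, Guangzhou, Guangdong 510006
LI Jiena	School of Information Management, Sun Yat-sen University, Guangzhou, Guangdong 510006
WANG Zhihong	School of Information Management, Sun Yat-sen University, Guangzhou, Guangdong 510006	wangzh629@163.com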


Abstract:

In the big data era, the Internet is increasingly indispensable for accessing academic and work-related information. However, because Internet resources are widely scattered and their contents and relationships lack in-depth description and correlation, people have to spend a great deal of time looking through entire result lists and assembling relevant information from different sources. This paper therefore develops a metadata schema for fine-grained aggregation units of Internet resources that deeply reveals and correlates scattered and diverse information snippets, so as to meet users' complex information needs, improve retrieval effectiveness, and support better knowledge services.

First, this paper collected three types of free Internet resources in the field of Library and Information Science: OA papers, online encyclopedias, and blogs. Then, a general framework for segmenting these resources was developed manually from the perspectives of the texts' logical structure and formal structure. In terms of logical structure, resources were divided into four levels: the chapter level (the whole document), the section level (based on the section headings given by the authors), the sentence-group level (covering macro analysis and micro analysis), and the chart level. Macro analysis, grounded in genre theory, segmented the whole document into its components, and micro analysis further identified the information snippets that reveal rhetorical intentions and semantic functions. The relationships between aggregation units at different levels were also analyzed. Moreover, the characteristics and attributes of aggregation units were described and classified into 14 access-attribute elements, 3 physical-attribute elements, and 2 semantic-attribute elements, and a metadata schema was developed corresponding to these categories. Lastly, to examine the effectiveness of the metadata schema, a database was designed and developed in Access 2013, and five search tasks were set up at the genre, section, sentence-group, and chart levels.
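To make the four-level division and the three attribute groups more concrete, here is a minimal Python sketch of how aggregation units might be modeled and queried. It is an illustration under stated assumptions, not the authors' schema: the `Level` and `AggregationUnit` names, the metadata keys such as `title` and `intention`, and the keyword-matching `search` function are all hypothetical, and the paper's own database was built in Access 2013 rather than in code. Each unit keeps a link to its parent so that a hit at any level can be returned together with its surrounding document context.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, Optional


class Level(Enum):
    # The four logical-structure levels named in the abstract.
    CHAPTER = 1         # the whole document
    SECTION = 2         # based on author-given section headings
    SENTENCE_GROUP = 3  # snippets identified by macro/micro genre analysis
    CHART = 4           # figures and tables


@dataclass(eq=False)  # identity equality avoids recursing through parent links
class AggregationUnit:
    unit_id: str
    level: Level
    # Hypothetical split of metadata into the paper's three attribute groups;
    # the keys used below are illustrative, not the paper's actual elements.
    access: dict[str, str] = field(default_factory=dict)    # e.g. DC-style title, creator
    physical: dict[str, str] = field(default_factory=dict)  # e.g. format, extent
    semantic: dict[str, str] = field(default_factory=dict)  # e.g. rhetorical intention
    text: str = ""
    parent: Optional[AggregationUnit] = field(default=None, repr=False)
    children: list[AggregationUnit] = field(default_factory=list)

    def add(self, child: AggregationUnit) -> AggregationUnit:
        """Attach a lower-level unit and record this unit as its parent."""
        child.parent = self
        self.children.append(child)
        return child

    def context(self) -> list[str]:
        """Titles (or ids, if untitled) from the document root down to this unit."""
        path: list[str] = []
        node: Optional[AggregationUnit] = self
        while node is not None:
            path.append(node.access.get("title", node.unit_id))
            node = node.parent
        return list(reversed(path))

    def walk(self) -> Iterator[AggregationUnit]:
        """Depth-first traversal of this unit and all of its descendants."""
        yield self
        for child in self.children:
            yield from child.walk()


def search(roots: list[AggregationUnit], level: Level,
           keyword: str) -> list[tuple[str, list[str]]]:
    """Toy level-restricted search: match `keyword` against a unit's text and
    semantic metadata, returning each hit with its context path."""
    kw = keyword.lower()
    hits: list[tuple[str, list[str]]] = []
    for root in roots:
        for unit in root.walk():
            if unit.level is not level:
                continue
            haystack = " ".join([unit.text, *unit.semantic.values()]).lower()
            if kw in haystack:
                hits.append((unit.unit_id, unit.context()))
    return hits


# Usage: one OA paper with a section and a sentence group inside it.
paper = AggregationUnit("p1", Level.CHAPTER, access={"title": "OA paper"})
methods = paper.add(AggregationUnit("p1.s2", Level.SECTION, access={"title": "Methods"}))
methods.add(AggregationUnit("p1.s2.g1", Level.SENTENCE_GROUP,
                            semantic={"intention": "describe method"},
                            text="We apply genre analysis to split the text."))
print(search([paper], Level.SENTENCE_GROUP, "genre"))
# → [('p1.s2.g1', ['OA paper', 'Methods', 'p1.s2.g1'])]
```

Returning the context path with each hit reflects the idea that aggregated search should surface fine-grained snippets without losing the document they came from; a practical system would replace the substring test with proper indexing.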

The results show that logical structures, which reflect the author's intentions, are similar to some extent across different types of Internet resources on the same topic, so it is feasible to apply the logical structures of journal papers to other Internet genres. The DC and LOM metadata frameworks can be reused in the metadata schema for fine-grained aggregation units of Internet resources, although some special characteristics still need to be revealed. More importantly, the search experiments indicate that an aggregated search database based on the proposed metadata framework effectively reveals and correlates aggregation units scattered across various sources and at different granularities. Aggregated search supports information aggregation while preserving the full context of each piece of information. As a result, users can judge the relevance of search results more quickly and find the content they need more effectively.

As a preliminary study of a metadata schema for fine-grained aggregation units, this research is a useful attempt to apply linguistic theories and methods to the organization of Internet resources, and a significant step toward this emerging interdisciplinary research field.

Future research will refine the fine-grained aggregation unit framework and metadata schema by analyzing other emerging Internet genres. Furthermore, the lexical and syntactic features of aggregation units need to be analyzed in order to implement intelligent fine-grained aggregated search and to construct knowledge repositories automatically. 7 figs. 6 tabs. 58 refs.

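Chinese Abstract:

Because relevant information snippets are scattered across massive, complex, and diverse Internet information resources, users often spend a great deal of time browsing, searching, and collecting the information they need. Metadata for fine-grained aggregation units oriented toward aggregated search can deeply reveal information features and their relationships, promoting knowledge discovery and improving the efficiency of knowledge services. It is therefore necessary to construct a metadata description framework for fine-grained aggregation units. Using Internet information resources in library and information science, such as open access journal papers, online encyclopedias, and blogs, as data sources, this paper applies logical structure analysis and formal structure analysis to establish a framework for dividing aggregation units. The framework covers external features at the chapter level, such as title and author, as well as features such as discourse intention and semantic function in section, sentence-group, and chart units. By analyzing the attributes of aggregation units and reusing DC and LOM metadata elements, the paper constructs a metadata framework that describes the access information, physical information, and semantic information of aggregation units. A retrieval database is designed, and the metadata framework is validated experimentally. The experiments show that the framework supports retrieval across multiple types of Internet information resources and fine-grained aggregation units at every level, and can provide a theoretical basis and practical guidance for fine-grained information aggregation and search. 7 figures. 6 tables. 58 references.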
