Page 180 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2018 Vol. 43
180 Journal of Library Science in China, Vol.9, 2017
for fine-grained aggregate units of Internet resources, in order to reveal in depth and correlate
the scattered and varied information snippets, and thereby meet users' complex information needs,
improve the effectiveness of retrieval, and support better knowledge services.
First, this paper extracted three types of free Internet resources in the field of Library and
Information Science: OA papers, online encyclopedias, and blogs. Then, a general framework for
manually splitting these resources was developed from the perspectives of the logical structure
and the formal structure of text. The logical structure analysis was divided into four levels:
the chapter level, which is the whole document; the section level, based on the chapter titles
given by authors; the sentence group level, which includes macro analysis and micro analysis;
and the chart level. Macro analysis, based on genre theory, fragmented the whole document into
its components. Micro analysis then identified the information snippets that reveal rhetorical
intentions and semantic functions. The relationships between aggregate units at different levels
were analyzed. Moreover, the characteristics and attributes of aggregate units were described and
classified into 14 access attribute elements, 3 physical attribute elements, and 2 semantic
attribute elements. A metadata schema corresponding to these categories was developed. Lastly,
to examine the effectiveness of the metadata schema, a database was designed and built in
Access 2013, and five search tasks were set up at the genre level, section level, sentence group
level, and chart level.
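The database design described above can be illustrated with a small relational sketch. This is not the schema used in the paper: the table name `aggregate_unit`, the columns `title` and `function`, and the source type labels are hypothetical stand-ins for the access, physical, and semantic attribute elements, and SQLite is used here in place of Access 2013 purely so the example is self-contained. Only the four level names come from the text.

```python
import sqlite3

# The four logical-structure levels named in the text.
LEVELS = ("chapter", "section", "sentence_group", "chart")


def build_db():
    """Create an in-memory database of aggregate units (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE aggregate_unit (
            unit_id     INTEGER PRIMARY KEY,
            parent_id   INTEGER REFERENCES aggregate_unit(unit_id),
            level       TEXT NOT NULL,
            source_type TEXT NOT NULL,  -- assumed labels, e.g. 'oa_paper', 'blog'
            title       TEXT,           -- illustrative access attribute
            function    TEXT            -- illustrative semantic attribute
        )
        """
    )
    return conn


def add_unit(conn, unit_id, parent_id, level, source_type, title, function=None):
    """Insert one aggregate unit, rejecting levels outside the four in the text."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    conn.execute(
        "INSERT INTO aggregate_unit VALUES (?, ?, ?, ?, ?, ?)",
        (unit_id, parent_id, level, source_type, title, function),
    )


def search(conn, level, function=None):
    """Return (unit_id, source_type, title) rows at one level, across all sources."""
    sql = "SELECT unit_id, source_type, title FROM aggregate_unit WHERE level = ?"
    params = [level]
    if function is not None:
        sql += " AND function = ?"
        params.append(function)
    sql += " ORDER BY unit_id"
    return conn.execute(sql, params).fetchall()
```

A search task at the sentence group level would then call `search(conn, "sentence_group", "method")` and receive matching units from papers and blogs alike, which is the cross-source correlation the search tasks are meant to test.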
The results show that logical structures, which reflect the author’s intention, are similar to
some extent across different types of Internet resources on the same topic. It is feasible to
apply the logical structures of journal papers to other Internet genres. The DC and LOM metadata
frameworks can be reused in the metadata schema for fine-grained aggregate units of Internet
resources, although some special characteristics still need to be represented. More importantly,
the search experiments indicate that the aggregated search database built on the metadata
framework proposed in this paper is effective at revealing and correlating aggregate units
scattered across various sources and at different granularities. Aggregated search can support
information aggregation while preserving the full context of each piece of information.
Therefore, users can judge the relevance of search results more quickly and find the required
content more effectively.
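One way to picture how a fine-grained result can keep its context is to store a link from each aggregate unit to its parent and rebuild the path on retrieval. The sketch below is an assumption about how this could be done, not the paper's implementation; the `Unit` class and `context_path` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    """A hypothetical aggregate unit that knows its enclosing unit."""
    label: str
    level: str
    parent: Optional["Unit"] = None


def context_path(unit: Unit) -> List[str]:
    """List the enclosing units from the whole document down to `unit`."""
    path = []
    node: Optional[Unit] = unit
    while node is not None:
        path.append(f"{node.level}: {node.label}")
        node = node.parent
    path.reverse()  # collected leaf-to-root, returned root-to-leaf
    return path
```

Showing this path next to each hit lets a user see which document and section a sentence group belongs to without opening the full text.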
As a preliminary study of a metadata schema for fine-grained aggregation units, this research is
a useful attempt to apply linguistic theories and methods to the organization of Internet
resources, and a significant step toward an emerging interdisciplinary research field.
Future research will improve the framework and metadata schema for fine-grained aggregation
units by analyzing other emerging Internet genres. Furthermore, the vocabulary and syntactic
features of aggregate units need to be analyzed in order to implement intelligent fine-grained
aggregated search and to construct knowledge repositories automatically.