Page 117 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2015 Vol. 41
116 Journal of Library Science in China, Vol. 7, 2015
2.2 Define core knowledge objects as the basis of automatic identification
After determining the basic organization model, we needed to collect and organize a set of knowledge
objects to serve both as instances of the historical ontology and as the basis for automatic identification.
To do this, we extracted structured metadata from historical information resources, such as the
events, conferences, persons, and documents that occur in the titles of text items. In this
way, text items can be linked with knowledge objects. We also reused existing subject
headings, including person names, institution names, political parties, and geographic names.
Most importantly, historical experts systematized the major historical events and conferences since
the founding of the People’s Republic of China and identified the hierarchies and associations
among them. We regarded these data as core knowledge objects for further representation and
organization.
The normalized knowledge objects include 1 685 events, 761 conferences, 3 508 persons, 2 621
institutions, 155 social groups, 107 special communities, and 1 861 hierarchical relations between
events or conferences, among others. Each knowledge object is populated into the ontology as an
instance with a URI, a standard label, and alternative labels, and the relationships among them are
represented as RDF triples. In the next step, we used these objects as the corpus for text mining.
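To illustrate what "instances with URI, standard label, and alternative label" plus "RDF triple statements" can look like, the following is a minimal sketch that serializes knowledge objects as N-Triples lines using only the Python standard library. It is not the authors' implementation: the namespace `http://example.org/history/`, the identifiers, the use of SKOS `prefLabel`/`altLabel` for the standard and alternative labels, and the `partOf` property for hierarchical relations are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical namespace: the paper does not publish its URI scheme.
BASE = "http://example.org/history/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class KnowledgeObject:
    local_id: str                  # hypothetical ID, e.g. "conference/0103"
    cls: str                       # ontology class, e.g. "Conference"
    label: str                     # standard label
    alt_labels: list[str] = field(default_factory=list)

    @property
    def uri(self) -> str:
        return BASE + self.local_id


def literal(text: str, lang: str = "en") -> str:
    """Render a string as an N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def instance_triples(obj: KnowledgeObject) -> list[str]:
    """Type the instance, then attach its standard and alternative labels (SKOS is assumed)."""
    triples = [
        f"<{obj.uri}> <{RDF_TYPE}> <{BASE}{obj.cls}> .",
        f"<{obj.uri}> <{SKOS}prefLabel> {literal(obj.label)} .",
    ]
    triples += [f"<{obj.uri}> <{SKOS}altLabel> {literal(a)} ." for a in obj.alt_labels]
    return triples


def hierarchy_triple(child: KnowledgeObject, parent: KnowledgeObject) -> str:
    # "partOf" is a hypothetical property standing in for the hierarchical relation.
    return f"<{child.uri}> <{BASE}partOf> <{parent.uri}> ."


if __name__ == "__main__":
    group = KnowledgeObject("conference/0100", "Conference",
                            "Plenary Sessions of the 11th Central Committee")
    session = KnowledgeObject("conference/0103", "Conference",
                              "Third Plenary Session of the 11th Central Committee",
                              ["Third Plenum of the 11th Central Committee"])
    for line in instance_triples(group) + instance_triples(session):
        print(line)
    print(hierarchy_triple(session, group))
```

Keeping each instance as a plain record and emitting one triple per label keeps the alternative names available later as matching keys for text mining; a production system would more likely use a dedicated RDF library and a triple store.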
2.3 Implement fact discovery using text mining technologies
Focusing on the core knowledge objects, we applied a “mining down” method to historical text items
to find the facts relevant to each object and used them to populate the properties and relations of
the corresponding instances. In this way, the knowledge hidden in the text is made explicit and
computable. Because historical texts are rich in content, recognizing knowledge objects and
discovering relationships manually would require considerable time and labor. To address this, we
applied text mining technologies to process text items automatically, helping historical experts
establish semantic associations between knowledge objects.
(1) Identify knowledge objects
Using the names of the knowledge objects, we performed semantic annotation by identifying the
standard or alternative names of knowledge objects that appear in text items. In addition, we
defined named entity recognition rules to discover new knowledge objects, such as times, persons,
institutions, and conferences, and then recommended them to historical experts.
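The two steps above (matching known names, then applying recognition rules to find new candidates) can be sketched roughly as follows. The paper does not describe its matching algorithm or its rules, so this is only an illustration under stated assumptions: known names are looked up with a greedy longest-match scan over a name-to-URI dictionary, and a single hypothetical regular-expression rule proposes Chinese date expressions (such as 1978年12月18日) as new time candidates. All URIs are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    start: int             # character offset where the mention begins
    end: int               # offset one past the last character
    text: str
    uri: Optional[str]     # None marks a new candidate to recommend to experts
    kind: str              # "known" for dictionary hits, otherwise the rule name


def build_lexicon(objects: dict[str, list[str]]) -> dict[str, str]:
    """Map every standard and alternative name to its instance URI."""
    return {name: uri for uri, names in objects.items() for name in names}


def annotate(text: str, lexicon: dict[str, str]) -> list[Annotation]:
    """Greedy left-to-right scan that prefers the longest known name at each position."""
    names = sorted(lexicon, key=len, reverse=True)
    found, i = [], 0
    while i < len(text):
        for name in names:
            if text.startswith(name, i):
                found.append(Annotation(i, i + len(name), name, lexicon[name], "known"))
                i += len(name)
                break
        else:
            i += 1  # no known name starts here; advance one character
    return found


# Hypothetical recognition rule for Chinese dates: year, optional month, optional day.
TIME_RULE = re.compile(r"\d{4}年(?:\d{1,2}月(?:\d{1,2}日)?)?")


def propose_times(text: str, known: list[Annotation]) -> list[Annotation]:
    """Apply the time rule, skipping matches that overlap an existing annotation."""
    taken = [(a.start, a.end) for a in known]
    return [
        Annotation(m.start(), m.end(), m.group(), None, "time")
        for m in TIME_RULE.finditer(text)
        if all(m.end() <= s or m.start() >= e for s, e in taken)
    ]


if __name__ == "__main__":
    lexicon = build_lexicon({
        "http://example.org/history/conference/0103": [
            "中国共产党第十一届中央委员会第三次全体会议", "十一届三中全会"],
        "http://example.org/history/place/beijing": ["北京"],
    })
    text = "1978年12月18日，十一届三中全会在北京召开。"
    known = annotate(text, lexicon)
    print([a.text for a in known])                       # ['十一届三中全会', '北京']
    print([c.text for c in propose_times(text, known)])  # ['1978年12月18日']
```

Here the abbreviated name 十一届三中全会 resolves to the same instance as the full conference name because both are stored as labels of one object, while the date is returned with `uri=None` so that it can be reviewed by an expert rather than linked automatically.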
(2) Discover facts about knowledge objects
Furthermore, through automatic relation extraction, we detected facts relevant to knowledge objects
and distilled fact sentences for historical experts. For example, a text item titled
“The Third Plenary Session of the Eleventh Central Committee of the Communist Party of
China” in the “Encyclopedia of the national history of the People’s Republic of China” describes