Page 117 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2015 Vol. 41

116   Journal of Library Science in China, Vol. 7, 2015



            2.2  Define core knowledge objects as the basis of automatic identification


            After determining the basic organization model, it is necessary to collect and sort a set of knowledge
            objects to serve as instances of the historical ontology and as the basis for automatic identification.
            To do this, we extracted structured metadata from historical information resources, for example,
            the events, conferences, persons, and documents that occur in the titles of text items. In this
            way, text items can be linked with knowledge objects. Moreover, we also used existing subject
            headings, including person names, institution names, political parties, and geographic names.
            Most importantly, historical experts systematized the major historical events and conferences since
            the founding of the People’s Republic of China and identified the hierarchies and associations
            among them. We regarded these data as core knowledge objects for further representation and
            organization.
               The normalized knowledge objects include 1 685 events, 761 conferences, 3 508 persons, 2 621
            institutions, 155 social groups, 107 special communities, and 1 861 hierarchical relations between
            events or conferences, etc. Each knowledge object is populated into the ontology as an instance
            with a URI, a standard label, and an alternative label, and the relationships among them are
            represented by RDF triple statements. In the next step, we used these objects as the corpus for
            text mining.
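
            As a minimal sketch of how one such instance might be expressed as RDF triples, the Python
            fragment below builds the statements for a single conference and serializes them as N-Triples.
            The namespace http://example.org/history/, the identifiers, the partOf property, and the use of
            SKOS prefLabel/altLabel for the standard and alternative labels are assumptions made for
            illustration; the paper does not specify its vocabulary.

```python
# Hypothetical namespace and property choices; the paper does not name them.
BASE = "http://example.org/history/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"  # standard label
ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"    # alternative label
PART_OF = BASE + "partOf"  # assumed name for a hierarchical relation


def instance_triples(local_id, class_name, label, alt_labels=()):
    """Return (subject, predicate, object) triples for one knowledge object."""
    uri = BASE + local_id
    triples = [
        (uri, RDF_TYPE, BASE + class_name),
        (uri, PREF_LABEL, label),
    ]
    triples += [(uri, ALT_LABEL, alt) for alt in alt_labels]
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines.

    Simplification: any object starting with "http://" is written as an IRI,
    everything else as a plain string literal.
    """
    lines = []
    for s, p, o in triples:
        if o.startswith("http://"):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return lines


# Illustrative data only: the identifiers and the alternative label are made up.
triples = instance_triples(
    "conference/0001",
    "Conference",
    "The Third Plenary Session of Eleventh Central Committee "
    "of the Communist Party of China",
    ["Third Plenary Session"],
)
# One hierarchical relation between a conference and a (hypothetical) event.
triples.append((BASE + "conference/0001", PART_OF, BASE + "event/0001"))

for line in to_ntriples(triples):
    print(line)
```

            In practice an RDF library would handle serialization and literal escaping; plain tuples are
            used here only to keep the sketch self-contained.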



            2.3  Implement fact discovery using text mining technologies

            Focusing on the core knowledge objects, the “Mining down” method is applied to historical text
            items to find the relevant facts for populating the properties and relations of their corresponding
            instances. In this way, the knowledge hidden in the text is revealed and becomes explicit and
            computable. Since historical texts are rich in content, manual knowledge object recognition and
            relationship discovery require a great deal of time and labor. To address this, we applied text
            mining technologies to process text items automatically, helping historical experts establish
            semantic associations between knowledge objects.
               (1) Identify knowledge objects
               Using the names of the knowledge objects, we implemented semantic annotation by
            identifying the standard names or alternative names that appear in text items. In addition, we
            defined named entity recognition rules to discover new knowledge objects such as times, persons,
            institutions, conferences, etc., and then recommended them to historical experts.
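
               The two operations in step (1) can be illustrated with a short Python sketch: a lookup of
            known names and alternative names in a text item, and one simple rule that proposes candidate
            time expressions. The gazetteer entries, URIs, sample sentence, and the year pattern are all
            hypothetical; the paper does not publish its name lists or recognition rules.

```python
import re

# Hypothetical gazetteer: every surface form (standard or alternative name)
# maps to the URI of its knowledge object.
GAZETTEER = {
    "Communist Party of China": "http://example.org/history/party/cpc",
    "CPC": "http://example.org/history/party/cpc",
    "Third Plenary Session": "http://example.org/history/conference/0001",
}


def build_matcher(gazetteer):
    """Compile one alternation of all names, longest first.

    Longest-first ordering makes a longer name win over a shorter name
    that is a prefix of it. No word-boundary check is applied, so short
    names may also match inside longer words.
    """
    names = sorted(gazetteer, key=len, reverse=True)
    return re.compile("|".join(re.escape(name) for name in names))


def annotate(text, gazetteer):
    """Return (start, end, surface form, URI) for each known name in text."""
    matcher = build_matcher(gazetteer)
    return [
        (m.start(), m.end(), m.group(0), gazetteer[m.group(0)])
        for m in matcher.finditer(text)
    ]


# Assumed rule for one entity type: a four-digit year from 1800 to 2099.
YEAR_RULE = re.compile(r"\b(?:1[89]|20)\d{2}\b")


def candidate_times(text):
    """Return year strings proposed as new time objects for expert review."""
    return [m.group(0) for m in YEAR_RULE.finditer(text)]


# Sample sentence written for this illustration.
text = "The Third Plenary Session was held by the CPC in December 1978."

for start, end, name, uri in annotate(text, GAZETTEER):
    print(start, end, name, uri)
print(candidate_times(text))
```

               Both "CPC" and "Communist Party of China" resolve to the same URI, which is how an
            alternative name links a text item to the same knowledge object as the standard name.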
               (2) Discover facts about knowledge objects
               Furthermore, we detected the relevant facts of knowledge objects and refined fact sentences
            for historical experts through automatic relation extraction. For example, a text item with the
            title of “The Third Plenary Session of Eleventh Central Committee of the Communist Party of
            China” in “Encyclopedia of the national history of the People’s Republic of China” describes