Page 89 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2015 Vol. 41

088   Journal of Library Science in China, Vol. 7, 2015



            2.2  Implementation of the Web edition of the classification scheme


            The methods and techniques used in this paper to implement the Web edition of CLC v4.0 are
            as follows.
              1) Preparing the data source: based on the related research achievements of the NKOS Research
            Office and the Knowledge Organization Standards of NLC (X. H. Zeng, 2012; J. Wang & Bu, 2012), the
            electronic documents of CLC v4.0, in HTML or CLCMARC format, were automatically
            converted by computer programs into an RDF ontology file in CNKOS format. Taking the data
            in CLCMARC format as an example, the MARC record of each class entry was first extracted from
            the CLCMARC data and converted to a byte array. Following the structure of MARC data, the
            Leader and the Directory were then read in turn. Using the starting position and length of
            every field, as given by the corresponding entry in the Directory, the content of each field
            in the data area was read in a loop. The content of each subfield was then extracted by means
            of the subfield codes, taking care that no information was lost. Finally,
            based on the correspondence between the CLCMARC format and the SKOS or CKOS vocabularies, the
            CNKOS data of each class and the relationships between classes were generated. Because the
            source data include the information of CCT v2.0, the auto-generated CNKOS ontology also holds
            the relationships between CLC and CCT, thereby achieving classification-subject integration.
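            The record-parsing steps described above can be sketched in Java as follows. This is a
            minimal illustration, not the authors' program. It assumes that a CLCMARC record follows
            the ISO 2709 layout (24-byte Leader, 12-byte Directory entries made of a 3-character tag,
            a 4-digit field length and a 5-digit starting position) and that the data are UTF-8
            encoded. The names MarcRecordSketch, Field and parse are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class MarcRecordSketch {
    static final byte FIELD_END = 0x1E;           // ISO 2709 field terminator
    static final String SUBFIELD_DELIM = "\u001F"; // ISO 2709 subfield delimiter

    /** One field: its tag, the text before the first subfield, and (code, value) pairs. */
    static final class Field {
        final String tag;
        final String head; // indicators for data fields, whole content for control fields
        final List<String[]> subfields = new ArrayList<>();
        Field(String tag, String head) { this.tag = tag; this.head = head; }
    }

    static String ascii(byte[] b, int offset, int length) {
        return new String(b, offset, length, StandardCharsets.US_ASCII);
    }

    /** Reads the Leader, walks the Directory, then slices each field into subfields. */
    static List<Field> parse(byte[] record) {
        String leader = ascii(record, 0, 24);
        // Leader positions 12-16 hold the base address of the data area.
        int baseAddress = Integer.parseInt(leader.substring(12, 17));
        List<Field> fields = new ArrayList<>();
        // The Directory starts right after the Leader and ends with a field terminator.
        for (int pos = 24; record[pos] != FIELD_END; pos += 12) {
            String tag = ascii(record, pos, 3);
            int length = Integer.parseInt(ascii(record, pos + 3, 4));
            int start = Integer.parseInt(ascii(record, pos + 7, 5));
            // The stored length includes the trailing field terminator, so drop one byte.
            // Assumption: UTF-8 data; a real CLCMARC file may use another encoding.
            String data = new String(record, baseAddress + start, length - 1,
                                     StandardCharsets.UTF_8);
            String[] parts = data.split(SUBFIELD_DELIM, -1);
            Field field = new Field(tag, parts[0]);
            for (int i = 1; i < parts.length; i++) {
                if (!parts[i].isEmpty()) {
                    // First character is the subfield code, the rest is its value.
                    field.subfields.add(new String[] {
                        parts[i].substring(0, 1), parts[i].substring(1) });
                }
            }
            fields.add(field);
        }
        return fields;
    }
}
```

            Keeping every subfield as an ordered (code, value) pair, rather than collapsing repeated
            codes into a map, is one way to make sure that no subfield content is dropped before the
            mapping to SKOS is applied.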
              The conversion from the CLCMARC format to the SKOS or CKOS vocabularies requires complex
            judgment and processing. Taking field 250 as an example, its concrete conversion rules can be
            found in Table 3. The problems that need to be taken into consideration during conversion
            include URI solutions for the common subdivision tables and specific subdivision tables, the
            representation and handling of special notations (such as “/”, “+”, “-”), preventing the loss of
            semantic information between associated fields, and completing relevant semantic data on the
            basis of the existing information.
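            To make two of these issues concrete, the following Java sketch mints a URI for a class
            number that contains special notations and emits a few SKOS statements in N-Triples
            syntax. It is only an illustration under stated assumptions: the base URI
            http://example.org/clc/ is hypothetical, percent-encoding is one possible way of handling
            “/” and “+” and is not necessarily the solution adopted for CNKOS, and the choice of
            skos:notation, skos:prefLabel and skos:broader does not reproduce the rules of Table 3.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

class SkosTripleSketch {
    static final String BASE = "http://example.org/clc/"; // hypothetical base URI
    static final String SKOS = "http://www.w3.org/2004/02/skos/core#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    /** Percent-encodes the notation so that "/" and "+" cannot break the URI path. */
    static String classUri(String notation) {
        try {
            // URLEncoder turns spaces into "+"; literal "+" is already "%2B" at this point,
            // so any remaining "+" can safely be rewritten as "%20".
            return BASE + URLEncoder.encode(notation, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Escapes backslashes and double quotes for an N-Triples string literal. */
    static String literal(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds N-Triples lines for one class; parentNotation may be null for a top class. */
    static List<String> toNTriples(String notation, String label, String lang,
                                   String parentNotation) {
        String s = "<" + classUri(notation) + ">";
        List<String> lines = new ArrayList<>();
        lines.add(s + " <" + RDF_TYPE + "> <" + SKOS + "Concept> .");
        lines.add(s + " <" + SKOS + "notation> " + literal(notation) + " .");
        lines.add(s + " <" + SKOS + "prefLabel> " + literal(label) + "@" + lang + " .");
        if (parentNotation != null) {
            lines.add(s + " <" + SKOS + "broader> <" + classUri(parentNotation) + "> .");
        }
        return lines;
    }
}
```

            The original notation is kept unchanged in the skos:notation literal, so the encoded URI
            serves only as an identifier and the class number itself is not altered.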
              2) Data processing: the CLC ontology file is organized as attribute-centered RDF triples, which
            are then indexed with the Lucene full-text search engine to build an indexed triple database; the
            concrete implementation methods can be found in X. H. Zeng, H. J. Huang, and Lin (2010). The
            mappings from keywords to formal subject terms (i.e. descriptors) can be added manually through
            the Web interface or imported in batches by computer programs.
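            The idea of an attribute-keyed triple index combined with a keyword-to-descriptor mapping
            can be illustrated with the small Java sketch below. It deliberately replaces Lucene with
            an in-memory inverted index so that it stays self-contained, and it reads
            "attribute-centered" as "grouped by predicate", which is an assumption. Whitespace
            tokenization is a further simplification that would not suit Chinese text. The class and
            method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TripleIndexSketch {
    // predicate -> lower-cased token -> subjects whose value for that predicate has the token
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();
    // entry keyword (lower-cased) -> formal subject term (descriptor)
    private final Map<String, String> keywordToDescriptor = new HashMap<>();

    /** Indexes one triple whose object is a literal. */
    void add(String subject, String predicate, String literal) {
        Map<String, Set<String>> byToken =
                index.computeIfAbsent(predicate, k -> new HashMap<>());
        for (String token : literal.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                byToken.computeIfAbsent(token, k -> new TreeSet<>()).add(subject);
            }
        }
    }

    /** Registers a mapping from a free keyword to a descriptor. */
    void mapKeyword(String keyword, String descriptor) {
        keywordToDescriptor.put(keyword.toLowerCase(Locale.ROOT), descriptor);
    }

    /** Resolves the term through the keyword map, then requires every token to match. */
    Set<String> search(String predicate, String term) {
        String key = term.toLowerCase(Locale.ROOT);
        String resolved = keywordToDescriptor.getOrDefault(key, term).toLowerCase(Locale.ROOT);
        Map<String, Set<String>> byToken =
                index.getOrDefault(predicate, Collections.emptyMap());
        Set<String> hits = null;
        for (String token : resolved.split("\\s+")) {
            if (token.isEmpty()) continue;
            Set<String> subjects = byToken.getOrDefault(token, Collections.emptySet());
            if (hits == null) {
                hits = new TreeSet<>(subjects);
            } else {
                hits.retainAll(subjects); // intersection: all tokens must be present
            }
        }
        return hits == null ? new TreeSet<>() : hits;
    }
}
```

            A batch import of keyword mappings then amounts to calling mapKeyword once per row of the
            imported file, while the manual route through the Web interface would call it for a
            single pair.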
              3) Development and experimental environment: all services in this paper were developed with
            Eclipse and Tomcat. The taxonomy interface is mainly realized
            through JSP pages with the jstree and ExtJs plugins; the main business functions use the Struts
            framework and are written in Java; and the system configuration, parameter information,
            classification indexing data, etc. are stored in MS SQL Server. Later in this paper, the Web
            service and the Linked Data service are developed using the Axis SOAP engine and the URL
            Rewrite plugin, respectively.