Page 89 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2015 Vol. 41
088 Journal of Library Science in China, Vol. 7, 2015
2.2 Implementation of the Web edition of the classification scheme
The methods and techniques used in this paper to implement the Web edition of CLC v4.0 comprise
the following aspects.
1) Preparing the data source: building on the related research results of the NKOS Research Office
and the Knowledge Organization Standards of NLC (X. H. Zeng, 2012; J. Wang & Bu, 2012), the
electronic documents of CLC v4.0 in HTML or CLCMARC format were automatically converted by
computer programs into an RDF ontology file in CNKOS format. Taking the CLCMARC data as an
example, the MARC record of each class entry was first extracted from the CLCMARC data and
converted into a byte array. Following the structure of MARC records, the Leader and the Directory
were then read in turn. Using the starting position and length of each field given by the
corresponding entry in the Directory, the content of every field in the data area was read in a loop,
and the content of each subfield was then extracted by its subfield code, so that no information was
lost. Finally, based on the correspondence between the CLCMARC format and the SKOS or CKOS
vocabularies, the CNKOS data of each class and the relationships between classes were generated.
Because the source data also include the information of CCT v2.0, the automatically generated
CNKOS ontology retains the relationships between CLC and CCT, thereby achieving
classification-subject integration.
The conversion from the CLCMARC format to the SKOS or CKOS vocabularies requires complex
judgment and processing. The concrete conversion rules for field 250, for example, are given in
Table 3. Problems that need to be considered during conversion include URI solutions for the common
subdivision tables and the specific subdivision tables, the representation and handling of special
notations (such as “/”, “+” and “-”), preventing the loss of semantic information between associated
fields, and completing the relevant semantic data on the basis of the existing information.
2) Data processing: the CLC ontology file is organized as attribute-centered RDF triples, which are
then indexed with the Lucene full-text search engine to build an indexed triple database; the
concrete implementation methods are described in X. H. Zeng, H. J. Huang, and Lin (2010).
Mappings from keywords to formal subject terms (i.e., descriptors) can be added manually through the
Web interface or batch-imported by computer programs.
3) Development and experimental environment: all services in this paper were developed in an
Eclipse and Tomcat environment. The taxonomy interface is implemented mainly through JSP pages
with the jsTree and ExtJS plugins; the main business functions are written in Java on the Struts
framework; and the system configuration, parameter information, classification indexing data, etc.
are stored in MS SQL Server. The Web service and Linked Data service described later in this paper
were developed using the Axis SOAP engine and the URL Rewrite plugin, respectively.
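The record-level parsing in step 1) (Leader, then Directory, then fields, then subfields) follows the general ISO 2709 layout shared by MARC formats. The Java sketch below illustrates that sequence under stated assumptions: that CLCMARC records use the standard ISO 2709 layout with the common “4500” entry map, one-character subfield codes, and UTF-8 data. The names MarcRecordParser, Field, and Subfield are hypothetical and are not taken from the programs described in the paper. Because Directory lengths and offsets count bytes rather than characters, each field is cut out of the byte array before it is decoded, which is one reason to work on a byte array when records contain Chinese text.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal ISO 2709 reader (hypothetical sketch): Leader -> Directory -> fields -> subfields.
 * Assumes UTF-8 data, the usual "4500" entry map (4-digit length, 5-digit start)
 * and one-character subfield codes.
 */
final class MarcRecordParser {
    static final byte SUBFIELD_DELIMITER = 0x1F;
    static final byte FIELD_TERMINATOR = 0x1E;

    public static final class Subfield {
        public final char code;
        public final String value;
        Subfield(char code, String value) { this.code = code; this.value = value; }
    }

    public static final class Field {
        public final String tag;
        public String indicators = "";   // empty for control fields
        public String value = "";        // used only by control fields (tags 00X)
        public final List<Subfield> subfields = new ArrayList<>();
        Field(String tag) { this.tag = tag; }
    }

    /** Parses one record that has already been read into a byte array. */
    public static List<Field> parse(byte[] rec) {
        // Leader: first 24 bytes; position 10 = indicator count, 12-16 = base address of data.
        String leader = new String(rec, 0, 24, StandardCharsets.US_ASCII);
        int indicatorCount = leader.charAt(10) - '0';
        int baseAddress = Integer.parseInt(leader.substring(12, 17));

        List<Field> fields = new ArrayList<>();
        // Directory: 12-byte entries (tag, field length, start offset) up to the field terminator.
        for (int pos = 24; rec[pos] != FIELD_TERMINATOR; pos += 12) {
            String entry = new String(rec, pos, 12, StandardCharsets.US_ASCII);
            String tag = entry.substring(0, 3);
            int length = Integer.parseInt(entry.substring(3, 7));
            int start = baseAddress + Integer.parseInt(entry.substring(7, 12));
            int end = start + length - 1;   // exclude the field's own terminator
            fields.add(parseField(tag, rec, start, end, indicatorCount));
        }
        return fields;
    }

    private static Field parseField(String tag, byte[] rec, int start, int end, int indicatorCount) {
        Field field = new Field(tag);
        if (tag.startsWith("00")) {   // control field: plain value, no indicators or subfields
            field.value = new String(rec, start, end - start, StandardCharsets.UTF_8);
            return field;
        }
        field.indicators = new String(rec, start, indicatorCount, StandardCharsets.US_ASCII);
        int i = start + indicatorCount;
        while (i < end) {
            if (rec[i] != SUBFIELD_DELIMITER) { i++; continue; }   // tolerate stray bytes
            char code = (char) rec[i + 1];
            int j = i + 2;
            while (j < end && rec[j] != SUBFIELD_DELIMITER) j++;
            // Offsets are byte offsets, so the value is decoded only after it has been cut out.
            field.subfields.add(new Subfield(code, new String(rec, i + 2, j - i - 2, StandardCharsets.UTF_8)));
            i = j;
        }
        return field;
    }
}
```

Mapping the parsed fields onto SKOS or CKOS properties is a separate step driven by conversion rules such as those in Table 3, which this sketch does not attempt.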
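Among the conversion problems listed in step 1) are URI solutions and the handling of special notations. The paper does not spell out its URI scheme in this section, so the sketch below only illustrates one generic option: percent-encoding a notation so that characters such as “/” and “+” cannot break a URI path. The base URI http://example.org/clc/, the table segment (e.g. “main”, “common”), and the class name NotationUris are all hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical URI minting for class notations; base URI and table names are illustrative only. */
final class NotationUris {
    static final String BASE = "http://example.org/clc/";   // hypothetical namespace, not the paper's

    /**
     * Returns BASE + table + "/" + encoded notation. Characters such as "/" and "+"
     * are percent-encoded so that the whole notation stays one path segment;
     * "-" needs no encoding.
     */
    static String mint(String table, String notation) {
        String segment = URLEncoder.encode(notation.trim(), StandardCharsets.UTF_8)
                .replace("+", "%20");   // URLEncoder writes spaces as '+' (form encoding); paths use %20
        return BASE + table + "/" + segment;
    }
}
```

For example, mint("main", "TP3/TP9") yields http://example.org/clc/main/TP3%2FTP9. Percent-encoding keeps each notation recoverable from its URI, but it does not decide what the characters mean; how notations containing “/”, “+” or “-” relate to other classes remains a separate modeling question.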
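Step 2) indexes the ontology's triples with Lucene, following X. H. Zeng, H. J. Huang, and Lin (2010). The sketch below does not use Lucene. It is a plain-Java stand-in for one plausible reading of the attribute-centered layout, assumed here to mean one record per subject with one field per predicate (roughly how a Lucene Document with per-predicate fields could be organized), plus an inverted index from tokens to subjects. Emitting each Han ideograph as its own token only approximates Lucene's StandardTokenizer. TripleIndex and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Plain-Java stand-in for a Lucene triple index (NOT the Lucene API): triples are
 * grouped by subject with one field per predicate, and each predicate/token pair
 * points to the subjects whose value contains that token.
 */
final class TripleIndex {
    // subject -> predicate -> values: the attribute-centered view of one resource
    private final Map<String, Map<String, List<String>>> resources = new LinkedHashMap<>();
    // predicate -> token -> subjects: one inverted index per field
    private final Map<String, Map<String, Set<String>>> postings = new HashMap<>();

    void add(String subject, String predicate, String object) {
        resources.computeIfAbsent(subject, s -> new LinkedHashMap<>())
                 .computeIfAbsent(predicate, p -> new ArrayList<>())
                 .add(object);
        Map<String, Set<String>> field = postings.computeIfAbsent(predicate, p -> new HashMap<>());
        for (String token : tokenize(object)) {
            field.computeIfAbsent(token, t -> new LinkedHashSet<>()).add(subject);
        }
    }

    /** Subjects whose value for the predicate contains every token of the query (AND semantics). */
    Set<String> search(String predicate, String query) {
        Map<String, Set<String>> field = postings.getOrDefault(predicate, Collections.emptyMap());
        Set<String> result = null;
        for (String token : tokenize(query)) {
            Set<String> hits = field.getOrDefault(token, Collections.emptySet());
            if (result == null) result = new LinkedHashSet<>(hits);
            else result.retainAll(hits);
        }
        return result == null ? Collections.emptySet() : result;
    }

    /** All predicate/value pairs stored for a subject. */
    Map<String, List<String>> describe(String subject) {
        return resources.getOrDefault(subject, Collections.emptyMap());
    }

    // Lower-cases, splits on non-alphanumerics, and emits each Han ideograph as its own token.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int cp : text.toLowerCase(Locale.ROOT).codePoints().toArray()) {
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(run, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                run.appendCodePoint(cp);
            } else {
                flush(run, tokens);
            }
        }
        flush(run, tokens);
        return tokens;
    }

    private static void flush(StringBuilder run, List<String> tokens) {
        if (run.length() > 0) {
            tokens.add(run.toString());
            run.setLength(0);
        }
    }
}
```

Keeping a separate posting map per predicate means a query against, say, a label field never matches text that only occurs in a notation field, which mirrors field-scoped search in a full-text engine.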
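Step 2) also allows keyword-to-descriptor mappings to be added manually or batch-imported. The paper does not describe the import format, so the sketch below assumes a hypothetical plain-text file with one “keyword&lt;TAB&gt;descriptor” pair per line; KeywordMapping, importLines, and toDescriptors are illustrative names. Returning an unmapped keyword unchanged is a design choice of this sketch, so that a query still runs when no mapping exists.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical keyword -> descriptor mapping. Batch input is assumed to be
 * plain-text lines of the form "keyword<TAB>descriptor" (illustrative format).
 */
final class KeywordMapping {
    private final Map<String, Set<String>> map = new HashMap<>();

    /** Adds a single mapping, e.g. one entered manually through a Web form. */
    void add(String keyword, String descriptor) {
        map.computeIfAbsent(normalize(keyword), k -> new LinkedHashSet<>()).add(descriptor.trim());
    }

    /** Batch import; skips blank lines, '#' comments and malformed lines. Returns accepted lines. */
    int importLines(List<String> lines) {
        int accepted = 0;
        for (String line : lines) {
            if (line.isBlank() || line.startsWith("#")) continue;
            String[] parts = line.split("\t");
            if (parts.length != 2 || parts[0].isBlank() || parts[1].isBlank()) continue;
            add(parts[0], parts[1]);
            accepted++;
        }
        return accepted;
    }

    /** Descriptors for a keyword; an unmapped keyword is returned unchanged. */
    Set<String> toDescriptors(String keyword) {
        Set<String> found = map.get(normalize(keyword));
        return found != null ? Collections.unmodifiableSet(found) : Set.of(keyword.trim());
    }

    // Case- and whitespace-insensitive lookup key.
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```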