Page 99 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2015 Vol. 41
098 Journal of Library Science in China, Vol. 7, 2015
strict data cleaning was carried out on journal titles. The following cleaning steps were applied
in sequence. First, titles were classified by their original language: the English titles of Chinese
periodicals were replaced by their original Chinese titles, and titles whose original language is
neither English nor Chinese were translated into English. Titles containing special characters or
having unverifiable ISSNs were processed manually. English words were converted to half-width
characters and uppercased. Every “&” in a title was replaced by “AND”. If a title began with “THE”,
the “THE” and the blank space following it were deleted. Series titles were manually unified into a
standard format. Duplicate journal names were disambiguated in the form “title - explanation” (e.g.,
Injury - International Journal of the Care of the Injured) or “title - publication place” (e.g.,
Oncology - New York). All non-alphabetic characters were then replaced by “-”, on the conditions
that exactly one “-” lies between adjacent English words and that no title begins or ends with “-”.
Finally, a table of sample citations was constructed with four fields: title, ISSN, eISSN (recorded
as “null” when missing), and publishing year.
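The automatable part of the title cleaning can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function and record names (`normalize_title`, `SampleCitation`, `make_record`) are hypothetical, the rules are applied only to English titles, and the manual steps (language classification, series titles, duplicate-name disambiguation, titles with special characters or unverifiable ISSNs) are left out. Unicode NFKC normalization is assumed here as the way to map full-width characters to half-width ones.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Optional


def normalize_title(raw: str) -> str:
    """Apply the automatable cleaning rules to an English journal title (hypothetical helper)."""
    # Full-width letters and symbols -> half-width via NFKC (an assumed choice), then uppercase.
    title = unicodedata.normalize("NFKC", raw).upper().strip()
    # "&" -> "AND"; the padding spaces keep "A&B" from fusing into "AANDB".
    title = title.replace("&", " AND ")
    # Drop a leading "THE" together with the blank space that follows it.
    if title.startswith("THE "):
        title = title[4:]
    # Each run of non-alphabetic characters (digits included, read literally)
    # becomes a single "-", and no "-" may start or end the title.
    return re.sub(r"[^A-Z]+", "-", title).strip("-")


@dataclass
class SampleCitation:
    title: str   # cleaned title
    issn: str    # print ISSN
    eissn: str   # electronic ISSN, or "null" when missing
    year: int    # publishing year


def make_record(raw_title: str, issn: str, eissn: Optional[str], year: int) -> SampleCitation:
    """Build one row of the sample-citation table; a missing eISSN is recorded as "null"."""
    return SampleCitation(normalize_title(raw_title), issn, eissn or "null", year)


print(normalize_title("The Journal of Bone & Joint Surgery"))
# JOURNAL-OF-BONE-AND-JOINT-SURGERY
print(normalize_title("Injury - International Journal of the Care of the Injured"))
# INJURY-INTERNATIONAL-JOURNAL-OF-THE-CARE-OF-THE-INJURED
```

The order of the steps matters: “&” must be expanded before the non-alphabetic replacement, and the leading “THE” check must follow uppercasing, as in the sequence described above.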
1.2 Methodology
Citation checking (sometimes referred to as “citation checklist” or “citation-based checking”) is
a method that can bridge the gap between users’ needs and library collection development.
Scholars have regarded citation checking as a simple, economical (Mosher, 1984), reliable, and valid
(Sylvia, 1998) approach that is “really appropriate for the evaluation of collections intended
to support research” (Lancaster, 1988). It is widely used; Nisonger (1983) noted that
“the method can be used to assess the collections of a single library or group of libraries, a
discipline or subdiscipline, or a single format or unit of a collection”.
In practice, checking holdings against a checklist is time-consuming because of differences in
reference source types, diversity in publishing, irregular cataloguing, disparities in accessibility,
varied citation formats, and many other factors. To overcome these difficulties, researchers have
commonly used sampling techniques, either to reduce the number of citations checked (Lopez, 1983)
or to set a minimum threshold (Gao & Yu, 2005). Nisonger (1980) argued that results may vary
considerably with different sampling techniques. Huang, Yu, and Lai (2014) studied a large sample
using a lower threshold; their results would have been more convincing had they included a data
cleaning procedure. Building on title cleaning, Yang (2007) proposed the simplified expression
“title + publishing year” for matching the sample against the library catalogue. This pioneering
approach would have been more reasonable had Yang considered the duplication, alteration, and
irregularity of journal titles.
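The title-duplication problem with a “title + publishing year” key can be illustrated with a minimal sketch. All names here are hypothetical, and the ISSNs are placeholders rather than real serial numbers: two distinct journals sharing a cleaned title collapse into one key, whereas a key that also carries a serial identifier (the ISSN, falling back to the eISSN when the ISSN is recorded as “null”) keeps them apart.

```python
from typing import NamedTuple


class Citation(NamedTuple):
    title: str   # cleaned title
    issn: str    # print ISSN, or "null" if missing
    eissn: str   # electronic ISSN, or "null" if missing
    year: int    # publishing year


def title_year_key(c: Citation) -> tuple:
    """Simplified "title + publishing year" matching key."""
    return (c.title, c.year)


def title_issn_year_key(c: Citation) -> tuple:
    """Key with a serial identifier; the eISSN is used only when the ISSN is "null"."""
    serial = c.issn if c.issn != "null" else c.eissn
    return (c.title, serial, c.year)


# Two hypothetical journals with the same title; the ISSNs are placeholders.
citations = [
    Citation("ONCOLOGY", "1111-1111", "null", 2013),
    Citation("ONCOLOGY", "2222-2222", "null", 2013),
]

print(len({title_year_key(c) for c in citations}))       # 1: the journals collapse
print(len({title_issn_year_key(c) for c in citations}))  # 2: they stay distinct
```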
Summing up the above, the authors propose a matching format with three (or four) fields:
“title + ISSN (or eISSN) + publishing year”. This matching format is consistent with library
collection management systems (taking ExLibris as an example), in which publication dates are
recorded by year. Furthermore, this matching format also reduces labor costs of publishing regularity