Page 99 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2015 Vol. 41

098   Journal of Library Science in China, Vol. 7, 2015



            strict data cleaning was carried out on journal titles. The following cleaning steps were
            implemented in sequence. Titles were first classified by their original language: the English
            titles of Chinese periodicals were replaced by their original Chinese names, and titles whose
            original language was neither English nor Chinese were replaced by their English titles. Titles
            with special characters or unverifiable ISSNs were processed manually. English characters were
            unified to half-width form and uppercased. Each “&” in a title was replaced by “AND”. If a title
            began with “THE”, the “THE” and the blank space following it were deleted. Series titles were
            manually unified to a standard format. Duplicate journal names were converted to the form
            “title - explanation” (such as Injury - International Journal of the Care of the Injured) or “title -
            publication place” (such as Oncology - New York). All non-alphabetic characters were replaced
            by “-”, on the condition that exactly one “-” lies between English words and no “-” starts or ends
            a title. Finally, the table of sample citations was constructed with four fields: title, ISSN,
            eISSN (noted as “null” if missing), and publishing year.
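
The automatable cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual procedure. It covers only titles already in English and leaves the manual steps aside (language classification, ISSN verification, series titles, duplicate names). The function names `normalize_title` and `make_record`, the use of Unicode NFKC folding for full-width characters, the padding of “AND” with spaces, and the treatment of digits as non-alphabetic are all assumptions.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Apply the mechanical cleaning steps to an English-language title."""
    # Assumption: NFKC folding stands in for "unified as half-width".
    t = unicodedata.normalize("NFKC", title).upper()
    # Replace "&" by "AND"; the padding spaces are an assumption so that
    # "A&B" does not collapse into a single word.
    t = t.replace("&", " AND ")
    # Drop a leading "THE" together with the blank space after it.
    t = re.sub(r"^THE\s+", "", t.strip())
    # Each run of non-alphabetic characters becomes exactly one "-".
    t = re.sub(r"[^A-Z]+", "-", t)
    # No "-" may start or end the title.
    return t.strip("-")


def make_record(title, issn, eissn, year):
    """Build one row of the sample citation table (four fields)."""
    return {
        "title": normalize_title(title),
        "issn": issn,
        # A missing eISSN is recorded as the string "null".
        "eissn": eissn if eissn else "null",
        "year": year,
    }
```

For instance, `normalize_title("The Journal of Food & Drug Analysis")` yields `JOURNAL-OF-FOOD-AND-DRUG-ANALYSIS` under these assumptions.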


            1.2  Methodology


            Citation checking (sometimes referred to as “citation checklist” or “citation-based checking”) is
            a method that can bridge the gap between users’ needs and collection development of the library.
            Scholars regarded citation checking as a simple, economical (Mosher, 1984), reliable and valid
            (Sylvia, 1998) approach which is “really appropriate for the evaluation of collections intended
            to support research” (Lancaster, 1988). It is a widely used method; Nisonger (1983) noted that
            “the method can be used to assess the collections of a single library or group of libraries, a
            discipline or subdiscipline, or a single format or unit of a collection”.
              In practice, checking holdings against a checklist is time-consuming because of differences in
            reference source types, publishing diversity, catalogue irregularity, accessibility disparity, variety
            in citation formats, and many other complicating factors. To overcome these difficulties, researchers
            have commonly used sampling techniques that aim to reduce the quantity checked (Lopez, 1983) or
            that apply a minimum threshold (Gao & Yu, 2005). Nisonger (1980) argued that results may vary
            considerably with different sampling techniques. Huang, Yu, and Lai (2014) studied a large sample
            while setting a lower threshold; their results would be more convincing had they included a data
            cleaning procedure. Building on data cleaning of titles, Yang (2007) proposed a simplified expression,
            “title + publishing year”, for matching the library catalogue with the sample. This pioneering study
            would be more robust had Yang considered title duplication, title changes, and irregularity of journals.
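
The concern about title duplication can be made concrete with a small sketch. It groups catalogue records by a chosen key and shows that two distinct journals sharing one cleaned title collide under “title + publishing year”, whereas adding the ISSN to the key (the format proposed in the next paragraph) keeps them apart. The records, the placeholder ISSN strings, and the helper name `group_by_key` are all hypothetical and serve only as illustration.

```python
from collections import defaultdict


def group_by_key(records, fields):
    """Group records by a tuple built from the chosen fields."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in fields)].append(rec)
    return groups


# Hypothetical catalogue: two distinct journals with the same cleaned title.
# The ISSN values are placeholders, not real identifiers.
catalogue = [
    {"title": "ONCOLOGY", "issn": "XXXX-0001", "year": 2010},
    {"title": "ONCOLOGY", "issn": "XXXX-0002", "year": 2010},
]

by_title_year = group_by_key(catalogue, ("title", "year"))
by_title_issn_year = group_by_key(catalogue, ("title", "issn", "year"))

# Keys that map to more than one record cannot be matched unambiguously.
ambiguous = [key for key, recs in by_title_year.items() if len(recs) > 1]
```

Here `ambiguous` contains the single key `("ONCOLOGY", 2010)`, while `by_title_issn_year` holds two separate keys.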
              In summary, the authors put forward a matching format with three (or four) fields:
            “title + ISSN (or eISSN) + publishing year”. This matching format conforms to the library
            collection management system (taking ExLibris as an example), in which publishing dates are
            recorded by year. Furthermore, this matching format also reduces labor costs of publishing regularity