Page 123 - Journal of Library Science in China, Vol.45, 2019

122   Journal of Library Science in China, Vol.11, 2019



            2.2  Data acquisition


            The book metadata and the metadata of the literature citing those books
            were collected from Amazon.cn① and Baidu Scholar②, respectively③. Currently,
            mainstream Chinese full-text databases, such as CNKI④, WanFang⑤ and WeiPu⑥, each miss part of
            the citing literature. Compared with using a single Chinese full-text database as the retrieval entrance, Baidu
            Scholar is more likely to cover all of the citing literature. To find all of the citing
            literature of the books, this study took Baidu Scholar as the retrieval entrance and used
            the metadata of Chinese books as the retrieval keywords to obtain the citing literature
            of those books. To identify book disciplines, we first matched the first-level categories
            of Chinese books provided by Amazon against the Chinese discipline categories. We also
            considered the differences between the natural sciences and the humanities and social sciences. Finally,
            we identified five disciplines: computer science, law, literature, medicine and sport
            science. We then obtained the citation content corpus of Chinese books from the full-text databases
            through the following two steps.
               (1) We selected the books based on three rules: 1) more than 1 review on Amazon.cn; 2) more
            than 1 citation in Baidu Scholar; 3) a table of contents is available. We obtained 6,006 books in
            the five disciplines.
               (2) To ensure the accuracy of the citation contents, we obtained the citation sentences and
            their contexts (i.e., the two sentences before and the two sentences after the citation sentence)
            of these books by manual annotation. We also took into account the high cost of manual
            annotation and the uneven citation distribution of the 6,006 books. For example, most books were
            cited between 0 and 5 times, while relatively few were cited more than 15 times.
            Hence, to make the citation content data more representative, we extracted books from each citation
            interval according to the distribution proportion of book citations (i.e., the share of the
            6,006 books falling in each citation interval). To select the data, we analyzed the distribution of citations
            and finally chose 0-5, 6-10, 11-15, 16-20 and more than 20 as the intervals. Since the full
            texts of some citing literature could not be obtained, or contained no citation marks, we
            finally obtained 399 Chinese books and their citation contents in the citing literature. The specific
            citation distributions of books are shown in Table 1.
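
The interval-based selection and the two-sentence context window described in step (2) can be sketched as follows. This is a minimal illustration, not the procedure the authors actually ran: the record layout (a dict with a "citations" key), the function names, the target sample size and the rounding rule are all assumptions introduced here. Only the five intervals and the window of two sentences on each side come from the text.

```python
import random
from collections import defaultdict

# Citation intervals named in the text: 0-5, 6-10, 11-15, 16-20, more than 20.
INTERVALS = [(0, 5), (6, 10), (11, 15), (16, 20), (21, None)]


def interval_label(citations):
    """Return the label of the citation interval that a count falls into."""
    for low, high in INTERVALS:
        if high is None:
            if citations >= low:
                return f">{low - 1}"
        elif low <= citations <= high:
            return f"{low}-{high}"
    raise ValueError(f"citation count outside all intervals: {citations}")


def stratified_sample(books, sample_size, seed=0):
    """Draw books from each interval in proportion to that interval's share.

    `books` is a list of dicts with a "citations" key (hypothetical layout).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for book in books:
        strata[interval_label(book["citations"])].append(book)
    sample = []
    for members in strata.values():
        # Rounding per interval means the total can differ slightly
        # from sample_size.
        quota = round(sample_size * len(members) / len(books))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


def citation_context(sentences, index, window=2):
    """Return the citation sentence plus `window` sentences on each side."""
    start = max(0, index - window)
    return sentences[start:index + window + 1]
```

Proportional allocation keeps the sample's citation distribution close to that of the full set, so heavily cited books are neither over- nor under-represented. In the study the final count (399) was also shaped by which full texts were available, which this sketch does not model.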









            ① Available at: https://www.amazon.cn/.
            ② Available at: http://xueshu.baidu.com/.
            ③ The data collection was completed in November 2016.
            ④ Available at: http://www.cnki.net/.
            ⑤ Available at: http://www.wanfangdata.com.cn.
            ⑥ Available at: http://www.cqvip.com/.