Page 193 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2018 Vol. 42

192 Journal of Library Science in China, Vol. 8, 2016


            Visual analysis and exploration of ancient texts for digital humanities
            research
            OUYANG Jian *

            Digital humanities, a new research paradigm, brings new methods to the traditional humanities
            and social sciences, because the traditional mode of developing and using ancient literature
            resources no longer meets the needs of humanities research. This paper aims at the deep
            development and utilization of ancient literature resources, applying the information
            technologies and methods of digital humanities to ancient Chinese literature in order to
            construct a new platform for real-time statistical text analysis in linguistics, historical
            literature studies, historical geography and related fields.
              This study adopts a big-data approach, sorting and labeling ancient Chinese texts to
            construct a corpus of more than 40 000 ancient titles. Word segmentation of the ancient texts
            combines piecewise dictionary superposition with a bigram model, and the Grubbs method is
            applied to denoise the data and eliminate as much problematic data as possible. With
            word-frequency statistics on the ancient corpus as the research focus, we analyze word
            frequency in time-window units and apply in-memory real-time computing to remove the
            bottleneck of reading big data. The statistical results are displayed along a time axis as
            micro-level scatter plots and macro-level curve graphs. Taking the authors of the ancient
            books as the main thread, we use Geographic Information System (GIS) technology to integrate
            and display the digitized ancient books, so that retrieving a work shows the geographical
            distribution of its authors. This study improves the efficiency of real-time queries and
            visualizes word frequency by year as scatter plots and curve graphs. On the basis of these
            methods, a statistical and analytical platform for ancient literature and documents will be
            established for linguistics, history and historical geography.
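              As a rough illustration of word-frequency analysis by time window, the sketch below
            groups already-segmented texts into fixed-width windows of years and reports the relative
            frequency of one word in each window, keeping all counts in memory. It is a minimal sketch
            under stated assumptions, not the platform's implementation: the function name
            windowed_frequency, the 50-year window width and the toy corpus are all hypothetical.

```python
from collections import Counter

WINDOW = 50  # assumed window width in years; the paper does not state one


def windowed_frequency(docs, term, window=WINDOW):
    """Return {window_start_year: relative frequency of `term`}.

    docs is an iterable of (year, tokens) pairs, where tokens is a list of
    words produced by an earlier segmentation step.
    """
    hits = Counter()    # occurrences of `term` per window
    totals = Counter()  # all tokens per window
    for year, tokens in docs:
        start = (year // window) * window  # left edge of the window
        hits[start] += tokens.count(term)
        totals[start] += len(tokens)
    # Skip empty windows to avoid dividing by zero.
    return {s: hits[s] / totals[s] for s in sorted(totals) if totals[s]}


# Toy corpus: hypothetical (year, tokens) pairs, not data from the study.
corpus = [
    (1010, ["天下", "太平", "天下"]),
    (1040, ["太平", "百姓"]),
    (1060, ["天下", "百姓", "百姓", "太平"]),
]

freq = windowed_frequency(corpus, "天下")
for start, value in freq.items():
    print(f"{start}-{start + WINDOW - 1}: {value:.3f}")
```

              Normalizing by the total number of tokens in each window keeps periods with many
            surviving texts from dominating the curve; the per-window values could then feed a
            curve graph, with per-document values feeding a scatter plot.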
              The study extends the research paradigms and methods of the humanities and enriches
            their research tools. It broadens the ways in which ancient literature and texts can be
            developed and used, and expands the scope of humanities source materials. The platform has
            broad application prospects in linguistics, history and historical geography.
              This research is a new attempt at the further development and utilization of ancient texts
            and documents by means of digital humanities in a big-data setting. First, it builds a
            large-scale corpus of more than 40 000 ancient titles. Second, it segments the ancient texts
            by combining statistical methods with superposed segmentation methods. Finally, with the
            help of big-data techniques, this study improves the
            * Correspondence should be addressed to OUYANG Jian, Email: oyjjj@163.com, ORCID: 0000-0001-5867-2852