Page 193 - JOURNAL OF LIBRARY SCIENCE IN CHINA 2018 Vol. 42
Visual analysis and exploration of ancient texts for digital humanities research
OUYANG Jian ①a *
Digital humanities, a new research paradigm, brings a new way of conducting research to the
traditional humanities and social sciences, whose traditional modes of developing and utilizing
ancient literature resources no longer meet the requirements of humanities research. Taking ancient
Chinese literature as its object, this paper aims at the in-depth development and utilization of
ancient literature resources through new information technologies and digital humanities methods,
in order to construct a new platform for real-time statistical text analysis in linguistics,
historical literature studies, historical geography, etc.
This study adopts a big data approach, sorting and labeling Chinese ancient texts to construct a
corpus of more than 40 000 kinds of ancient texts. It segments the texts into words by superimposing
dictionary-based segmentation with a Bigram model, and applies the Grubbs method to denoise the data
and eliminate problematic data as far as possible. Focusing on word frequency statistics over the
ancient corpus, we compute word frequencies in time-window units and apply in-memory real-time
computing to overcome the bottleneck of reading big data. The statistical and analytical results
are displayed as micro-level scatter plots and macro-level curve graphs organized along a timeline.
Taking the authors of the ancient books as the main line, we use Geographic Information System (GIS)
technology to integrate and display the digital ancient books, and use retrieval of the ancient
literature as a clue to show the geographical distribution of the authors. This study improves the
efficiency of real-time queries and visualizes word frequency by year as scatter diagrams and curve
graphs. Based on these new methods and this pattern, a statistical and analytical platform for
ancient literatures and documents in linguistics, history and historical geography will be
established.
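The abstract names its segmentation approach (dictionary segmentation superimposed with a Bigram model) but gives no further details. As a minimal sketch, assuming a word dictionary and a small pre-segmented training set are available, the Python code below lets the dictionary propose candidate words, falls back to single characters for unknown text, and lets add-one-smoothed bigram probabilities choose among the candidates by dynamic programming. The names `train_bigram` and `segment`, the smoothing scheme and the `max_len` limit are illustrative assumptions, not the author's implementation.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams from already segmented training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        seq = ["<s>"] + list(words)
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))
    return unigrams, bigrams


def segment(text, dictionary, unigrams, bigrams, max_len=4):
    """Pick the dictionary-licensed segmentation with the best bigram log-probability."""
    vocab = len(unigrams) + 1

    def logp(prev, word):
        # Add-one (Laplace) smoothed estimate of P(word | prev).
        return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab))

    n = len(text)
    # best[i][w] = (score, start index of w, previous word) for the best path
    # covering text[:i] whose last word is w.
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None, None)
    for i in range(n):
        for prev, (score, _, _) in best[i].items():
            for j in range(i + 1, min(n, i + max_len) + 1):
                cand = text[i:j]
                # Dictionary words are candidates; single characters are a
                # fallback so that out-of-vocabulary text can still be segmented.
                if cand not in dictionary and j != i + 1:
                    continue
                new = score + logp(prev, cand)
                if cand not in best[j] or new > best[j][cand][0]:
                    best[j][cand] = (new, i, prev)
    # Backtrack from the highest-scoring final word.
    word = max(best[n], key=lambda w: best[n][w][0])
    pos, words = n, []
    while word != "<s>":
        _, start, prev = best[pos][word]
        words.append(word)
        pos, word = start, prev
    return words[::-1]


if __name__ == "__main__":
    dictionary = {"学而", "时习", "之"}
    uni, bi = train_bigram([["学而", "时习", "之"]])
    print(segment("学而时习之", dictionary, uni, bi))  # ['学而', '时习', '之']
```

Keeping the previous word in each dynamic-programming state is what makes this a bigram search rather than a plain longest-match lookup: the same dictionary can yield different segmentations depending on which word pairs the training data has seen.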
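The Grubbs method mentioned above tests whether the most extreme value in an approximately normal sample is an outlier. The paper does not state which values it tests or which variant it uses, so the sketch below assumes a two-sided test at α = 0.05, applied repeatedly to a list of numbers such as one word's yearly counts. The critical value follows the standard formula G_crit = ((N - 1)/√N)·√(t²/(N - 2 + t²)), where t is the upper α/(2N) quantile of Student's t distribution with N - 2 degrees of freedom. To stay within the standard library, the hypothetical helper `t_upper_quantile` finds t by Simpson integration and bisection instead of calling SciPy.

```python
import math
import statistics


def t_upper_quantile(p, df, steps=2000):
    """Value t with P(T > t) = p for Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    def area(x):
        # Simpson's rule (steps must be even) for the density integrated over [0, x].
        h = x / steps
        s = pdf(0.0) + pdf(x)
        s += sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
        return s * h / 3

    target = 0.5 - p  # mass between 0 and the upper quantile
    lo, hi = 0.0, 1.0
    while area(hi) < target:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):  # bisection
        mid = (lo + hi) / 2
        if area(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_filter(values, alpha=0.05):
    """Drop the most extreme value while the two-sided Grubbs test rejects it."""
    data = list(values)
    removed = []
    while len(data) >= 3:
        n = len(data)
        mean = statistics.mean(data)
        sd = statistics.stdev(data)
        if sd == 0:
            break
        # Candidate outlier: the point furthest from the sample mean.
        idx = max(range(n), key=lambda i: abs(data[i] - mean))
        g = abs(data[idx] - mean) / sd
        t = t_upper_quantile(alpha / (2 * n), n - 2)
        g_crit = (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))
        if g <= g_crit:
            break
        removed.append(data.pop(idx))
    return data, removed


if __name__ == "__main__":
    print(grubbs_filter([12, 14, 13, 15, 14, 13, 120]))
    # ([12, 14, 13, 15, 14, 13], [120])
```

Repeating the test one point at a time, and recomputing the mean and standard deviation after each removal, is the usual way to apply Grubbs' test when more than one outlier may be present. It assumes the remaining data are roughly normal, which raw word counts need not be.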
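The abstract does not specify the window width or the data layout behind its time-window computation. One plausible reading, sketched below under those assumptions, is to bucket each text's tokens by the window that contains its year, keep the counts in memory, and answer frequency queries directly from that index instead of rereading the corpus. `WindowFrequencyIndex` and its methods are hypothetical. The `(window start, frequency)` pairs it returns are the kind of series from which a year-based curve graph or scatter plot could be drawn.

```python
from collections import Counter, defaultdict


class WindowFrequencyIndex:
    """In-memory word counts bucketed into fixed-width time windows (hypothetical design)."""

    def __init__(self, window=10):
        self.window = window
        self.counts = defaultdict(Counter)  # window start year -> word counts
        self.totals = Counter()             # window start year -> token total

    def window_start(self, year):
        # Floor division also buckets negative (BCE) years consistently.
        return (year // self.window) * self.window

    def add(self, year, words):
        """Index the segmented words of one text dated to `year`."""
        start = self.window_start(year)
        self.counts[start].update(words)
        self.totals[start] += len(words)

    def series(self, word, relative=False):
        """(window start, frequency) pairs in chronological order."""
        out = []
        for start in sorted(self.counts):
            c = self.counts[start][word]  # Counter returns 0 for unseen words
            if relative:
                c = c / self.totals[start] if self.totals[start] else 0.0
            out.append((start, c))
        return out


if __name__ == "__main__":
    idx = WindowFrequencyIndex(window=100)
    idx.add(1005, ["天下", "之", "民"])
    idx.add(1072, ["天下", "太平"])
    idx.add(1130, ["民", "之", "天下"])
    print(idx.series("天下"))  # [(1000, 2), (1100, 1)]
```

Relative frequencies (`relative=True`) are usually more comparable across windows than raw counts, because the amount of surviving text differs greatly between periods.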
This study not only extends the research paradigms and methods of the humanities but also enriches
their research tools. It broadens the ways in which ancient literature and texts can be developed
and utilized, and expands the scope of humanities materials. The platform has broad application
prospects in linguistics, history and historical geography.
This research is a new attempt to further develop and utilize ancient texts and documents through
digital humanities methods in the context of big data. First, it builds a large-scale corpus of more
than 40 000 kinds of ancient books. Second, it combines statistical methods with superimposed word
segmentation methods to segment ancient texts into words. Finally, with the help of big data
techniques, this study improves the
* Correspondence should be addressed to OUYANG Jian, Email: oyjjj@163.com, ORCID: 0000-0001-5867-2852