Page 182 - Journal of Library Science in China, Vol.47, 2021
Sentiment term extraction and application of Chinese
ancient poetry text for digital humanities
ZHANG Wei, WANG Hao*, DENG Sanhong & ZHANG Baolong
School of Information Management, Nanjing University, Nanjing 210023, China
Abstract
Under the paradigm of interdisciplinary knowledge, the scope of digital humanities is becoming more
extensive. Using key semantic technologies to parse the sentiment knowledge in humanistic objects is of
great significance for restoring the “humanity” and “computability” characteristics of the discipline.
Taking ancient poetry texts as an example, this paper is the first to automatically extract and
analyze large-scale humanistic sentiment terms from Chinese poetry and its appreciation. First, a “Cold
Start” automatic citation method for character sequences is proposed to obtain a learning corpus. Based
on Char2Vec, Chinese character linguistic features (radicals, pinyin) and the BERT language model are
introduced into machine learning and deep learning models. New term recognition rules are also defined from
the perspective of knowledge discovery. It was found that the integration of modern appreciation into
ancient poetry significantly improves the breadth and depth of sentiment knowledge, and that
domain terms are effectively labeled by the method proposed in this paper. The trained BERT-BiLSTM-CRFs
model outperformed the CRFs model, with the best F1 and F1_distinct reaching 95.63% and 85.43%, respectively.
The introduction of Chinese character features also improved the performance of traditional
CRFs, with the field feature and the constraint radical feature (“shuxinpang” and “xinzidi”) performing best.
Compared with the long new terms extracted by machine learning, deep learning yields more new
imagery words that carry sentiment. The sentiment terms derived from poetry and appreciation provide a
reference for sentiment analysis and knowledge services for literary information resources (humanity),
and the extraction scheme based on linguistic knowledge offers inspiration for advancing
natural language processing technology in the Chinese domain (computability).
Keywords
Digital humanities, Ancient poetry, Sentiment term extraction, Chinese character linguistic features,
Char2Vec, BERT
0 Introduction
Digital humanities (also called “humanities computing”) is an interdisciplinary research field
formed by the integration of humanities knowledge, computer infrastructure, data mining
technology, and other theories and practices [1]. In recent years, the application of multidimensional
digital technology in humanities has greatly expanded the knowledge system of digital
humanities [2]. However, the broad research scope has slightly blurred its characteristics [3] —
* Correspondence should be addressed to WANG Hao, Email: ywhaowang@nju.edu.cn, ORCID: 0000-0002-0131-0823.