Page 182 - Journal of Library Science in China, Vol.47, 2021

               Sentiment term extraction and application of Chinese ancient poetry text for digital humanities


               ZHANG Wei, WANG Hao*, DENG Sanhong & ZHANG Baolong
               School of Information Management, Nanjing University, Nanjing 210023, China



               Abstract
               Under the paradigm of interdisciplinary knowledge, the scope of digital humanities is becoming ever
               broader. Using key semantic technologies to parse the sentiment knowledge in humanistic objects is of
               great significance for regaining the discipline's characteristics of “humanity” and “computability”.
               Taking ancient poetry texts as an example, this paper is the first to automatically extract and
               analyze large-scale humanistic sentiment terms from Chinese poetry and its appreciation texts. First,
               a “Cold Start” automatic citation method for character sequences is proposed to obtain the learning
               corpus. Based on Char2Vec, Chinese character linguistic features (radicals, pinyin) and the BERT
               language model are introduced into the machine learning and deep learning models, and new-term
               recognition rules are defined from the perspective of knowledge discovery. The results show that
               integrating modern appreciation texts with the ancient poetry significantly improves the breadth and
               depth of sentiment knowledge, and that the proposed method labels domain terms effectively. The
               trained BERT-BiLSTM-CRFs model outperforms the CRFs model, with the best F1 and F1_distinct reaching
               95.63% and 85.43%, respectively.
               The introduction of Chinese character features also improves the performance of the traditional CRFs
               model, with the field feature and the constraint radical feature (“shuxinpang” and “xinzidi”)
               performing best. Whereas machine learning extracts long new terms, deep learning uncovers more new
               imagery words that carry sentiment. The sentiment terms derived from poetry and appreciation texts
               provide a reference for sentiment analysis and knowledge services for literary information resources
               (humanity), and the extraction scheme based on linguistic knowledge offers inspiration for advancing
               natural language processing technology in the Chinese domain (computability).


               Keywords
               Digital humanities, Ancient poetry, Sentiment term extraction, Chinese character linguistics feature,
               Char2Vec, BERT



               0 Introduction


               Digital humanities (also called “humanities computing”) is an interdisciplinary research field
               formed by the integration of humanities knowledge, computer infrastructure, data mining
               technology, and other theories and practices[1]. In recent years, the application of multidimensional
               digital technology in humanities has greatly expanded the knowledge system of digital
               humanities[2]. However, the broad research scope has slightly blurred its characteristics[3] —

               * Correspondence should be addressed to WANG Hao, Email: ywhaowang@nju.edu.cn, ORCID: 0000-0002-0131-0823.