Page 192 - Journal of Library Science in China, Vol.47, 2021

ZHANG Wei, WANG Hao, DENG Sanhong & ZHANG Baolong / Sentiment term extraction and application of Chinese ancient poetry text for digital humanities


               and a sequence-level likelihood function is used to compute the maximum a posteriori probability,
               thereby improving the accuracy of tag prediction; finally, the sequence with the highest total
               predicted score at the decoding stage is taken as the optimal output sequence.
                 (2) The BERT-BiLSTM-CRFs algorithm embeds the BERT language model into the BiLSTM-CRFs
               deep learning model to extract deep features automatically. Within it, BERT represents the
               input text through character, sentence and position embeddings, and captures the semantic
               features of Chinese characters better through a Char2Vec-driven neural network; the BiLSTM
               layer bidirectionally encodes the word vectors passed on by the BERT layer to represent
               context-sensitive semantic information; the CRFs layer decodes the semantic vectors produced
               by the neural network and outputs the tag sequence with the highest probability, finally
               yielding the character role space from which emotional terms are extracted.
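The last two steps (decoding the highest-scoring tag sequence, then reading terms off the character roles) can be illustrated with a minimal sketch. It assumes Viterbi decoding, which is the usual way to decode a linear-chain CRF but is not named in the text, and it assumes the B/M/E/S/O role set that appears in the examples of Section 2.6. The function names `viterbi_decode` and `extract_terms`, and all score values a caller would pass in, are hypothetical.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return (best_total_score, best_tag_path) for one non-empty sentence.

    emissions[t][j]   : score of tag j at position t (e.g. from a BiLSTM layer)
    transitions[i][j] : score of moving from tag i to tag j (CRF parameters)
    Assumption: plain score lists; no start/end transition scores are modeled.
    """
    n_tags = len(tags)
    # score[j] = best total score of any path ending in tag j at the current position
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the whole path
    last = max(range(n_tags), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], [tags[j] for j in path]


def extract_terms(chars, roles):
    """Collect terms from character roles: B/M/E span a multi-character term, S is a single-character term."""
    terms, buffer = [], []
    for ch, role in zip(chars, roles):
        if role == "S":
            terms.append(ch)
            buffer = []
        elif role == "B":
            buffer = [ch]
        elif role == "M" and buffer:
            buffer.append(ch)
        elif role == "E" and buffer:
            buffer.append(ch)
            terms.append("".join(buffer))
            buffer = []
        else:  # "O", or an ill-formed role, discards any partial term
            buffer = []
    return terms


print(extract_terms("深情厚志", ["B", "E", "S", "O"]))  # ['深情', '厚']
print(extract_terms("深情厚志", ["B", "M", "M", "E"]))  # ['深情厚志']
```

In a trained model the emission scores would come from the BiLSTM layer and the transition scores from the CRFs layer; here both are simply arguments supplied by the caller.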


               2.6 Evaluation indicators for emotional term extraction and identification rules for
               new terms

                 (1) The evaluation indicators for emotional term extraction are mainly recall (R),
               precision (P) and their harmonic mean (F-measure). For the original terms they are calculated
               as follows:

               $$R = \frac{TP}{TP+FN}, \quad P = \frac{TP}{TP+FP}, \quad F_{measure} = \frac{(n^{2}+1)RP}{R+n^{2}P},$$

               where FN, TP and FP denote the numbers of terms not recognized, correctly recognized and
               incorrectly recognized, respectively. In this study, R and P are considered equally important,
               so n is set to 1. However, because duplicate items among the original terms make the
               recognition rate unrealistically high, distinctive term indicators, computed over
               non-repetitive terms, are additionally introduced to evaluate the model’s ability to
               distinguish different terms: duplicates are removed from the original terms, and each
               evaluation indicator is then recalculated.
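The indicators above can be computed with a short sketch such as the following. It rests on an assumption not stated in the text: terms are compared as corpus-level multisets (original terms) or sets (distinctive terms), whereas the study may match terms by position. The function name `precision_recall_f` and the sample term lists are hypothetical.

```python
from collections import Counter


def precision_recall_f(gold_terms, predicted_terms, n=1.0, distinct=False):
    """Return (P, R, F-measure) for extracted terms.

    distinct=False counts every occurrence (original terms);
    distinct=True removes duplicates first (distinctive terms).
    """
    if distinct:
        gold_terms, predicted_terms = set(gold_terms), set(predicted_terms)
    gold, pred = Counter(gold_terms), Counter(predicted_terms)
    tp = sum((gold & pred).values())   # correctly recognized
    fn = sum(gold.values()) - tp       # not recognized
    fp = sum(pred.values()) - tp       # incorrectly recognized
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = recall + n ** 2 * precision
    f_measure = (n ** 2 + 1) * recall * precision / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical data: a frequent term inflates the original-term scores.
gold = ["愁", "愁", "愁", "悲", "喜"]
pred = ["愁", "愁", "愁", "悲", "怒"]
print(precision_recall_f(gold, pred))                 # original terms
print(precision_recall_f(gold, pred, distinct=True))  # distinctive terms
```

With this sample, the repeated term raises precision and recall for the original terms to about 0.8, while the distinctive term indicators drop to about 0.67, which is the inflation effect the distinctive indicators are meant to expose.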
                 (2) New term recognition rules. Unlike term matching against a word set, a machine learning
               model can also extract new terms in the field, laying the foundation for emotional knowledge
               discovery. By comparing labeled and predicted sequences in a pre-experiment, the authors found
               that: 1) when the labeled sequence is “悼/O/伤/O” and the predicted sequence is {B/E}, “悼伤”
               can be defined as a newly discovered emotional term; 2) when the labeled sequence is
               “凄/S/戚/O” and the predicted sequence is {B/E}, “凄戚” can be defined as an incremental new
               term; 3) when the labeled sequence is “深/B/情/E/厚/S/志/O”, the emotional units are “深情” and
               “厚”, while the predicted sequence {B/M/M/E} merges them into the compound term “深情厚志”.
               This paper therefore summarizes the new term recognition rule as follows: for a term extracted
               from the predicted sequence, if the first character role in the corresponding labeled sequence
               is B, S or O, the last character role is E, S or O, and the labeled sequence differs from the
               predicted sequence, the term is regarded as a candidate new term in the field.
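The rule can be expressed as a short sketch, checked against the three cases from the pre-experiment. It assumes that the labeled and predicted role sequences are aligned character by character and that the comparison is made over the span of each predicted term; the function names `predicted_spans` and `new_term_candidates` are hypothetical.

```python
def predicted_spans(roles):
    """Yield (start, end) inclusive index pairs of terms in a B/M/E/S/O role sequence."""
    start = None
    for i, role in enumerate(roles):
        if role == "S":
            yield (i, i)
            start = None
        elif role == "B":
            start = i
        elif role == "E" and start is not None:
            yield (start, i)
            start = None
        elif role != "M":  # "O" (or a stray "E") cancels any open span
            start = None


def new_term_candidates(chars, labeled, predicted):
    """Apply the rule: labeled span starts with B/S/O, ends with E/S/O, and differs from the prediction."""
    candidates = []
    for start, end in predicted_spans(predicted):
        gold_span = labeled[start:end + 1]
        pred_span = predicted[start:end + 1]
        if (gold_span[0] in {"B", "S", "O"}
                and gold_span[-1] in {"E", "S", "O"}
                and gold_span != pred_span):
            candidates.append("".join(chars[start:end + 1]))
    return candidates


print(new_term_candidates("悼伤", ["O", "O"], ["B", "E"]))                          # ['悼伤']
print(new_term_candidates("凄戚", ["S", "O"], ["B", "E"]))                          # ['凄戚']
print(new_term_candidates("深情厚志", ["B", "E", "S", "O"], ["B", "M", "M", "E"]))  # ['深情厚志']
```

The boundary conditions on the labeled roles exclude predicted terms that begin or end in the middle of a labeled term (for example, a labeled span starting with M), so only spans that respect the labeled term boundaries become candidates.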