Page 192 - Journal of Library Science in China, Vol.47, 2021
ZHANG Wei, WANG Hao, DENG Sanhong & ZHANG Baolong / Sentiment term extraction 191
and application of Chinese ancient poetry text for digital humanities
and the sequence-level likelihood function is used to calculate its maximum a posteriori probability,
thereby improving the accuracy of tag prediction; finally, the sequence with the highest predicted
total score at the decoding stage is taken as the optimal output sequence.
(2) The BERT-BiLSTM-CRFs algorithm embeds the BERT language model into the BiLSTM-CRFs
deep learning model to extract deep features automatically. BERT represents the input text by
embedding Chinese characters, sentences and positions, and captures the semantic features of
Chinese characters better through a Char2Vec-driven neural network; the BiLSTM layer
bidirectionally encodes the word vectors passed in from the BERT layer to represent
context-sensitive semantic information; the CRFs layer decodes the semantic vectors produced by
the neural network and outputs the tag sequence with the highest probability, from which the
character role space is obtained and emotional terms are extracted.
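The decoding step attributed to the CRFs layer above (choosing the tag sequence with the highest total score) is commonly done with the Viterbi algorithm. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `viterbi_decode`, the reduced tag set B/E/O, and all scores are hypothetical and chosen only for illustration. In a real model the emission scores would come from the BiLSTM layer and the transition scores would be learned by the CRFs layer.

```python
# Hypothetical reduced tag set; the paper's character roles also include M and S.
TAGS = ["B", "E", "O"]


def viterbi_decode(emissions, transitions):
    """Return (best tag index path, total score of that path).

    emissions[t][j]   : score of tag j for the character at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backptrs = []               # backptrs[t-1][j] = best previous tag for tag j at t
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(num_tags):
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            ptrs.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backptrs.append(ptrs)
    # Trace back from the best final tag.
    last = max(range(num_tags), key=lambda i: score[i])
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[last]


NEG = -100.0  # made-up penalty for transitions that are invalid in this toy scheme
transitions = [
    # to:  B    E    O
    [NEG, 0.0, NEG],  # from B: must be followed by E
    [0.0, NEG, 0.0],  # from E
    [0.0, NEG, 0.0],  # from O
]
# Made-up emission scores for a two-character input.
emissions = [
    [2.0, 0.0, 1.5],
    [0.0, 1.0, 2.0],
]

path, total = viterbi_decode(emissions, transitions)
decoded = [TAGS[i] for i in path]
```

In this toy input, picking the best tag for each character independently would give B followed by O, which the transition scores penalize heavily. Scoring the whole sequence instead selects O/O, which illustrates why decoding is done at the sequence level.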
2.6 Evaluation indicators for emotional term extraction and identification rules for
new terms
(1) The evaluation indicators for emotional term extraction are mainly recall (R),
precision (P) and their harmonic mean (F-measure). For the original terms they are calculated as follows:

$$R = \frac{TP}{TP+FN}, \qquad P = \frac{TP}{TP+FP}, \qquad F_{measure} = \frac{(n^{2}+1)RP}{R+n^{2}P},$$

where FN, TP and FP are the numbers of terms not recognized, correctly recognized and incorrectly
recognized, respectively. In this study, R and P are considered equally important, so n is set to 1.
However, because duplicate items among the original terms can make the recognition rate
unrealistically high, indicators for distinct (non-repetitive) terms are additionally introduced to
evaluate the model's ability to distinguish different terms: duplicates are removed from the
original terms, and each evaluation indicator is then recalculated.
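The two evaluation settings can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function name `prf`, the representation of an original term as a `(line_id, start, term)` occurrence, and the toy data are hypothetical and are not taken from the paper's corpus or results.

```python
def prf(gold, predicted, n=1.0):
    """Recall, precision and F-measure over two collections of items."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # correctly recognized
    fn = len(gold - predicted)   # not recognized
    fp = len(predicted - gold)   # incorrectly recognized
    r = tp / (tp + fn) if tp + fn else 0.0
    p = tp / (tp + fp) if tp + fp else 0.0
    denom = r + n ** 2 * p
    f = (n ** 2 + 1) * r * p / denom if denom else 0.0
    return r, p, f


# Made-up occurrences: (line_id, start offset, term).
gold_occurrences = [(0, 0, "愁"), (1, 2, "愁"), (2, 1, "愁"), (3, 0, "相思")]
pred_occurrences = [(0, 0, "愁"), (1, 2, "愁"), (2, 1, "愁")]

# Original terms: every occurrence counts.
r_orig, p_orig, f_orig = prf(gold_occurrences, pred_occurrences)

# Distinct terms: duplicates removed before scoring.
gold_distinct = {term for _, _, term in gold_occurrences}
pred_distinct = {term for _, _, term in pred_occurrences}
r_dist, p_dist, f_dist = prf(gold_distinct, pred_distinct)
```

In this toy data a single frequent term is recognized three times, so recall over the original terms is 0.75, while recall over the distinct terms is only 0.5. This is the inflation effect that the distinct-term indicators are meant to expose.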
(2) New term recognition rules. Unlike term matching against a word set, a machine learning
model can also extract new terms in the field, laying the foundation for emotional knowledge
discovery. In a pre-experiment, the authors compared the labeled and predicted sequences and
found that: 1) when the labeled sequence is “悼/O/伤/O” and the predicted sequence is {B/E}, the
result can be defined as a newly discovered emotional term; 2) when the labeled sequence is
“凄/S/戚/O” and the predicted sequence is {B/E}, the result can be defined as an incremental new
term; 3) when the labeled sequence is “深/B/情/E/厚/S/志/O”, whose emotional units are “深情” and
“厚”, and the predicted sequence is {B/M/M/E}, the units can be merged into the compound term
“深情厚志”. This paper therefore summarizes the rule for new term recognition as follows: for a
term extracted from the prediction sequence, if the first character role in the corresponding
labeling sequence is B/S/O, the last character role is E/S/O, and the labeling sequence is not
equal to the prediction sequence, the term is regarded as a candidate new term in the field.
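The rule above can be written as a small filter over aligned tag sequences. This is a minimal sketch, assuming that terms are read from the prediction sequence as B(M...)E spans or single S characters; the helper names `predicted_spans` and `new_term_candidates` are hypothetical, and the three inputs reuse the examples given in the text.

```python
def predicted_spans(pred_tags):
    """Return (start, end) index pairs of terms in a predicted B/M/E/S/O sequence."""
    spans, start = [], None
    for i, tag in enumerate(pred_tags):
        if tag == "S":
            spans.append((i, i + 1))
            start = None
        elif tag == "B":
            start = i
        elif tag == "E" and start is not None:
            spans.append((start, i + 1))
            start = None
        elif tag != "M":
            start = None  # O (or a stray E) closes any open span without emitting it
    return spans


def new_term_candidates(chars, labeled, predicted):
    """Apply the rule: the labeled span starts with B/S/O, ends with E/S/O,
    and differs from the predicted span."""
    candidates = []
    for s, e in predicted_spans(predicted):
        gold = labeled[s:e]
        if (gold[0] in {"B", "S", "O"}
                and gold[-1] in {"E", "S", "O"}
                and gold != predicted[s:e]):
            candidates.append("".join(chars[s:e]))
    return candidates


# The three cases described in the text.
case1 = new_term_candidates(list("悼伤"), ["O", "O"], ["B", "E"])
case2 = new_term_candidates(list("凄戚"), ["S", "O"], ["B", "E"])
case3 = new_term_candidates(list("深情厚志"), ["B", "E", "S", "O"], ["B", "M", "M", "E"])
# A term whose labeling already matches the prediction is not a new-term candidate.
known = new_term_candidates(list("深情"), ["B", "E"], ["B", "E"])
```

The boundary conditions on the labeled span keep a predicted term from cutting through the middle of an annotated term: a span whose first labeled role is M or E, or whose last labeled role is B or M, is rejected.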