Page 191 - Journal of Library Science in China, Vol.47, 2021
characters in Chinese dictionaries. Compared with the character itself, the radical is a
finer-grained representation of Chinese semantics; 2) Pinyin is the pronunciation component of
Chinese characters, and pronunciation also carries strong emotional coloring; 3) The emotional
feature is the emotional tendency of a Chinese character. The emotional word set constructed in
this paper can serve as a knowledge element for distinguishing the levels of commonly used
emotional characters, and thus as a basis for judging the emotional tendency of Chinese characters;
4) The morpheme feature refers to the part of speech tendency of Chinese characters in a specific
context; 5) Domain features refer to the classification of commonly used Chinese characters in
domain corpora; 6) Common character features refer to the classification of common Chinese
characters; 7) The surname feature is used to judge the relationship between Chinese characters
and surname characters; 8) Classification features are the abstract basis for Chinese character
construction. On the basis of this corpus, the author introduces the above Chinese character
features as experimental comparisons to explore the robustness of extending Chinese character
language features. It should be noted that the features B/P/M/U/N/T are obtained by external
dictionary matching, while E/F are obtained by word frequency statistics and subsequent
experimental tuning.
(2) Feature space constraints. The benchmark model based on the character sequence is formed
by adding all Chinese character features to the initial observation sequence. The machine learning
process is subject to horizontal and vertical constraints. In the feature space example in Table 1,
the former (horizontal dotted line) refers to the word length window, for which there are two
common schemes: three characters and five characters; the latter (vertical dotted line) constrains
the feature combination of the observation sequence. To this end, the author obtains multiple
feature sequences by positive superposition on top of single features, records as positive features
those single features whose results exceed the benchmark model, and then combines and retrains
the resulting positive feature set. If the test results of a combination exceed those of the positive
features, the combined feature dimensions are expanded further, thereby achieving a multi-level
progression. The horizontal and vertical sequences are the training results of the model constrained
by context information and language features, and the actual effect of the model remains to be
demonstrated by the subsequent experiments.
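
The two constraints described in (2) can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the author's implementation: the names `window_features` and `select_features`, the padding symbol `PAD`, and the `evaluate` callback (assumed to train a model on a given feature set and return a score such as F1) are all hypothetical and do not come from the paper. The benchmark score is passed in as a number, so the sketch makes no claim about how the benchmark model itself is built.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

PAD = "<PAD>"  # hypothetical symbol for positions outside the sentence


def window_features(chars: Sequence[str], i: int, size: int = 3) -> Dict[str, str]:
    """Horizontal constraint: the characters in a window of `size`
    (three or five) centred on position `i`."""
    half = size // 2
    feats = {}
    for off in range(-half, half + 1):
        j = i + off
        feats[f"c[{off}]"] = chars[j] if 0 <= j < len(chars) else PAD
    return feats


def select_features(
    names: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
    baseline_score: float,
) -> Tuple[FrozenSet[str], float]:
    """Vertical constraint: keep the single features that beat the
    benchmark, then grow combinations of them level by level for as
    long as the best combination beats the best score so far."""
    singles = {n: evaluate(frozenset([n])) for n in names}
    positive = sorted(n for n, s in singles.items() if s > baseline_score)
    if not positive:
        return frozenset(), baseline_score

    best_name = max(positive, key=lambda n: singles[n])
    best_set, best_score = frozenset([best_name]), singles[best_name]

    for k in range(2, len(positive) + 1):
        # Retrain on every k-way combination of the positive features.
        level = {frozenset(c): evaluate(frozenset(c))
                 for c in combinations(positive, k)}
        cand = max(level, key=lambda fs: level[fs])
        if level[cand] <= best_score:
            break  # no improvement at this level, stop expanding
        best_set, best_score = cand, level[cand]
    return best_set, best_score
```

In practice `evaluate` would wrap a full train-and-test cycle, so the number of combinations evaluated at each level dominates the cost; stopping at the first level that brings no improvement is one simple way to bound it, and is a design choice of this sketch rather than something the paper specifies.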
2.5 Emotional term extraction algorithm based on machine learning and deep learning models
Guided by the Char2Vec representation of the BERT model, this paper constructs a CRFs machine
learning model and a BERT-BiLSTM-CRFs deep learning model that incorporate Chinese character
language features. The core algorithms are as follows.
(1) CRFs algorithm. For the input word sequence, the model outputs the role sequence and its
sequence score; then, all sequence paths are normalized to obtain the probability distribution of the role sequence,