Page 189 - Journal of Library Science in China, Vol.47, 2021
annotation {OSBME} to map the text sequence space, such as {心/O 期/S 仙/O 诀/O 意/B 无/M
穷/E}, which aims to serialize the character structure of terms through positional features such
as the initial, middle, and final characters of a term and the characters outside any term.
(3) Automatic tagging of word sequences based on cold start. Unstructured text is parsed
into word sequences and role sequences by an automatic tagging procedure that consists of term
matching and sequence mapping. Term matching takes the emotional word set and the cold corpus
text as experimental input. It traverses the emotional vocabulary in the word set together with
the paragraphs of the text, whose categories are all initialized to 0. A match is found through
the index of an emotional word in a text paragraph, the text category of each successful match
is marked as 1, and the text segments and their categories are then output. Sequence mapping
takes the text segments and their categories as experimental input, maps each segment to a word
sequence by extracting the single-character intervals in the segment, and outputs the word
sequence and the role sequence. In this way, a learning corpus with emotional annotation sets
in the field of ancient poetry is obtained automatically in a cold-start environment.
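The two tagging steps can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names `match_terms` and `map_sequence`, the toy lexicon, and the longest-match-first rule are assumptions made for illustration. Reading the S tag as a single-character term is inferred from the example {心/O 期/S 仙/O 诀/O 意/B 无/M 穷/E}.

```python
def match_terms(segments, lexicon):
    """Term matching: label a segment 1 if any lexicon word occurs in it, else 0."""
    results = []
    for seg in segments:
        category = 0  # every segment starts in category 0
        for word in lexicon:
            if seg.find(word) != -1:  # index lookup of the word in the segment
                category = 1
                break
        results.append((seg, category))
    return results


def map_sequence(segment, lexicon):
    """Sequence mapping: return the character sequence and its role sequence."""
    roles = ["O"] * len(segment)  # O: character outside any term
    # Assumption: longer terms take precedence over shorter ones.
    for word in sorted(lexicon, key=len, reverse=True):
        start = segment.find(word)
        while start != -1:
            end = start + len(word)
            if all(r == "O" for r in roles[start:end]):
                if len(word) == 1:
                    roles[start] = "S"  # assumed: single-character term
                else:
                    roles[start] = "B"  # first character of the term
                    for i in range(start + 1, end - 1):
                        roles[i] = "M"  # middle characters
                    roles[end - 1] = "E"  # last character of the term
            start = segment.find(word, start + 1)
    return list(segment), roles


# Hypothetical lexicon chosen so that the example from the text is reproduced.
lexicon = {"期", "意无穷"}
labeled = match_terms(["心期仙诀意无穷", "床前明月光"], lexicon)
chars, roles = map_sequence("心期仙诀意无穷", lexicon)
```

With this toy lexicon, only the first segment is assigned category 1, and its role sequence matches the annotated example given earlier.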
2.4 Involvement of Chinese language features and construction of CRFs model driven
by Char2Vec
In the initial condition, the learning corpus contains character sequences (C) and labeled role
sequences (R). The BERT language model mines Chinese language knowledge in depth by
analyzing Char2Vec single-character vectors. Inspired by this, it remains to be explored in depth
whether traditional machine learning, represented by CRF, can improve model performance by
expanding the semantic feature space on the basis of Chinese character structure and Chinese
language knowledge. To this end, the author focuses on the core language features of Chinese
characters, as shown in Figure 3.
As Figure 3 shows, the author defines eight Chinese character attributes from a linguistic
perspective, including the radical feature, pinyin feature, emotional feature, and domain feature,
as extensions to optimize model learning. Their feature marks and values are shown in Table 1.
Figure 3. Analysis of Chinese language features of ancient poetry text
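As a rough illustration of how such character attributes could be attached to a character sequence before CRF training, the sketch below builds one feature dictionary per character. The feature names, the lookup tables, and the `UNK` default are hypothetical and do not reproduce the feature marks in Table 1. Only three of the eight attributes are shown, and a one-character context window is assumed.

```python
def char_features(chars, i, radicals, pinyin, emotion_chars):
    """Build a feature dict for the character at position i (hypothetical feature names)."""
    ch = chars[i]
    feats = {
        "char": ch,
        "radical": radicals.get(ch, "UNK"),  # radical feature
        "pinyin": pinyin.get(ch, "UNK"),     # pinyin feature
        "is_emotion": ch in emotion_chars,   # emotional feature
    }
    # Assumed context window: one character to the left and to the right.
    if i > 0:
        feats["prev_char"] = chars[i - 1]
    else:
        feats["BOS"] = True  # beginning of sequence
    if i < len(chars) - 1:
        feats["next_char"] = chars[i + 1]
    else:
        feats["EOS"] = True  # end of sequence
    return feats


def sequence_features(chars, radicals, pinyin, emotion_chars):
    """Map a character sequence C to a list of per-character feature dicts."""
    return [
        char_features(chars, i, radicals, pinyin, emotion_chars)
        for i in range(len(chars))
    ]


# Tiny illustrative lookup tables; a real system would load full dictionaries.
radicals = {"意": "心", "穷": "穴"}
pinyin = {"意": "yi", "穷": "qiong"}
emotion_chars = {"意"}  # hypothetical membership, for illustration only

features = sequence_features(list("意无穷"), radicals, pinyin, emotion_chars)
```

A list of per-character dictionaries like this, paired with the role sequence R, is a common input format for CRF toolkits.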
(1) Chinese character feature extension. 1) Radical is the basis for the classification of Chinese