            annotation {OSBME} to map the text sequence space, for example {心/O 期/S 仙/O 诀/O 意/B 无/M
            穷/E}, aiming to encode the character-level structure of terms through positional features such as
            term-initial (B), term-middle (M), term-final (E), single-character (S), and out-of-term (O) characters.
              (3) Automatic tagging of word sequences based on cold start, which parses unstructured text
            into word sequences and role sequences through automatic tagging logic consisting of term
            matching and sequence mapping (see the sketch following this item). Term matching takes the
            emotional word set and the cold corpus text as experimental input. It traverses the emotional
            vocabulary in the word set and the paragraphs in the text, whose initial categories are all 0;
            matching is realized through the index of each emotional word in a text paragraph, the category
            of a paragraph containing a successfully matched word is marked as 1, and the text segments and
            their categories are output. Sequence mapping takes the text segments and their categories as
            experimental input, realizes the word sequence mapping by intercepting the single-character
            intervals in each text segment, and then outputs the word sequence and role sequence. In this way,
            a learning corpus with emotional annotation sets in the field of ancient poetry is obtained
            automatically in a cold-start environment.
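            For concreteness, the following is a minimal Python sketch of the term-matching and
            sequence-mapping logic described above. The emotional word set and corpus line are illustrative
            assumptions rather than the authors' actual data, and the implementation details may differ from
            the original pipeline.

# Minimal sketch of cold-start automatic tagging: term matching + sequence mapping.
# The word set and corpus line below are hypothetical examples.

def term_matching(emotion_words, paragraphs):
    """Mark each paragraph 1 if it contains any emotional term, else 0."""
    tagged = []
    for para in paragraphs:
        category = 0
        for word in emotion_words:
            if para.find(word) != -1:      # index-based term matching
                category = 1
                break
        tagged.append((para, category))
    return tagged

def sequence_mapping(paragraph, emotion_words):
    """Map a text segment to its character sequence and OSBME role sequence."""
    chars = list(paragraph)                # intercept single-character intervals
    roles = ["O"] * len(chars)             # default: outside any emotional term
    for word in emotion_words:
        start = paragraph.find(word)
        while start != -1:
            if len(word) == 1:
                roles[start] = "S"         # single-character term
            else:
                roles[start] = "B"         # beginning of term
                for i in range(start + 1, start + len(word) - 1):
                    roles[i] = "M"         # middle of term
                roles[start + len(word) - 1] = "E"   # end of term
            start = paragraph.find(word, start + 1)
    return chars, roles

if __name__ == "__main__":
    emotion_words = ["期", "意无穷"]       # hypothetical emotional word set
    paragraphs = ["心期仙诀意无穷"]         # hypothetical cold-corpus line
    for para, cat in term_matching(emotion_words, paragraphs):
        if cat == 1:
            chars, roles = sequence_mapping(para, emotion_words)
            print(list(zip(chars, roles)))
            # [('心','O'), ('期','S'), ('仙','O'), ('诀','O'), ('意','B'), ('无','M'), ('穷','E')]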


            2.4 Incorporation of Chinese language features and construction of the Char2Vec-driven
            CRF model

            In the initial condition, the learning corpus contains character sequences (C) and labeled role
            sequences (R). The BERT language model fully exploits Chinese linguistic knowledge by
            analyzing Char2Vec-style single-character vectors. Inspired by this, whether traditional machine
            learning represented by CRF can expand its semantic feature space with Chinese character
            structure and linguistic knowledge to improve model performance remains to be explored in
            depth. In this regard, the author focuses on the core language features of Chinese characters, as
            shown in Figure 3.
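            As a point of reference, Char2Vec-style character embeddings can be obtained, for example, by
            training a word2vec model on character-split poem lines. The snippet below is a minimal sketch
            under that assumption, using gensim with hypothetical corpus lines and hyperparameters; it is not
            the authors' implementation.

# Minimal sketch: Char2Vec-style character embeddings via gensim word2vec.
# The corpus lines and hyperparameters are illustrative assumptions.
from gensim.models import Word2Vec

corpus_lines = ["心期仙诀意无穷", "无穷逸兴寄烟霞"]        # hypothetical poem lines
char_sentences = [list(line) for line in corpus_lines]     # one character = one token

char2vec = Word2Vec(
    sentences=char_sentences,
    vector_size=100,   # dimensionality of each character vector
    window=3,          # context window measured in characters
    min_count=1,       # keep rare characters in a small corpus
    sg=1,              # skip-gram, common for small character vocabularies
)

vector = char2vec.wv["意"]   # dense vector for the character 意
print(vector[:5])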
              As Figure 3 shows, the author has defined eight Chinese character attributes from the linguistic
            perspective, such as the radical, pinyin, emotional, and domain features, as extensions to optimize
            model learning. Their feature marks and values are shown in Table 1.
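            To illustrate how such attributes can be attached to each character as CRF inputs, the following
            is a minimal sketch. The feature names, the radical/pinyin lookup tables, and the library choice
            (sklearn-crfsuite) are assumptions made for illustration, not the authors' actual feature set or code.

# Minimal sketch: extending per-character CRF features with linguistic attributes.
# Lookup tables and feature names below are hypothetical.
import sklearn_crfsuite

RADICAL = {"意": "心", "无": "无", "穷": "穴"}      # hypothetical radical table
PINYIN  = {"意": "yi", "无": "wu", "穷": "qiong"}   # hypothetical pinyin table
EMOTION = {"意", "穷"}                               # hypothetical emotional characters

def char_features(chars, i):
    """Build the feature dictionary for the i-th character of a sequence."""
    c = chars[i]
    return {
        "char": c,
        "radical": RADICAL.get(c, "UNK"),            # radical feature
        "pinyin": PINYIN.get(c, "UNK"),              # pinyin feature
        "is_emotional": c in EMOTION,                # emotional feature
        "prev_char": chars[i - 1] if i > 0 else "BOS",
        "next_char": chars[i + 1] if i < len(chars) - 1 else "EOS",
    }

X = [[char_features(list("意无穷"), i) for i in range(3)]]   # feature sequences
y = [["B", "M", "E"]]                                         # role sequences

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X, y)
print(crf.predict(X))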

                            Figure 3. Analysis of Chinese language features of ancient poetry text
              (1) Chinese character feature extension. 1) Radical is the basis for the classification of Chinese