Page 191 - Journal of Library Science in China, Vol.47, 2021

190   Journal of Library Science in China, Vol.13, 2021



            characters in Chinese dictionaries. Compared with the whole character, the radical is a
            finer-grained representation of Chinese semantics; 2) Pinyin is the pronunciation component
            of Chinese characters, and pronunciation also carries strong emotional color; 3) The
            emotional feature is the emotional tendency of a Chinese character. The emotional word set
            constructed in this paper serves as the knowledge element for distinguishing the levels of
            commonly used emotional characters, and thus as the basis for judging the emotional tendency
            of Chinese characters; 4) The morpheme feature refers to the part-of-speech tendency of a
            Chinese character in a specific context; 5) The domain feature refers to the classification
            of commonly used Chinese characters in domain corpora; 6) The common character feature
            refers to the classification of common Chinese characters; 7) The surname feature is used to
            judge the relationship between a Chinese character and surname characters; 8) The
            classification feature is the abstract basis for Chinese character building. On the basis of
            this corpus, the author introduces the above Chinese character features in comparative
            experiments to explore the robustness of the expansion of Chinese character language
            features. It should be noted that the features B/P/M/U/N/T are obtained by external
            dictionary matching, while E/F are obtained by word frequency statistics and subsequent
            experimental tuning.
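
The dictionary-matching step can be illustrated with a minimal sketch. It is not the author's implementation: the toy lookup tables, the function name `annotate`, the placeholder `"O"` for unmatched characters, and the choice of three example features are all assumptions made for illustration. Real dictionaries would come from external resources.

```python
# Hypothetical toy dictionaries (assumptions, for illustration only).
RADICAL = {"好": "女", "河": "氵", "李": "木"}
PINYIN = {"好": "hao", "河": "he", "李": "li"}
SURNAME = {"李", "王", "张"}


def annotate(text, missing="O"):
    """Return one row per character: [char, radical, pinyin, surname flag].

    Characters absent from a dictionary get the `missing` placeholder.
    """
    rows = []
    for ch in text:
        rows.append([
            ch,
            RADICAL.get(ch, missing),        # radical feature by lookup
            PINYIN.get(ch, missing),         # pinyin feature by lookup
            "Y" if ch in SURNAME else "N",   # surname membership flag
        ])
    return rows


rows = annotate("李好")
```

Each row is one position of the observation sequence, with one column per matched feature. Frequency-based features would need corpus statistics and are not covered by this sketch.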
              (2) Feature space constraints. The benchmark model based on character sequences is formed
            by adding all Chinese character features to the initial observation sequence. The machine
            learning process is subject to horizontal and vertical constraints. In the feature space
            example in Table 1, the former (horizontal dotted line) is the word length window, for which
            there are two common schemes: three characters and five characters; the latter (vertical
            dotted line) constrains the feature combination of the observation sequence. To this end,
            the author obtains multiple feature sequences by forward superposition on the basis of
            single features, records the single features whose results exceed those of the benchmark
            model as positive features, and then combines the positive features and retrains on the
            resulting sets. If a combined result exceeds those of the positive features, the combined
            feature dimensions are expanded further, so as to progress level by level. The horizontal
            and vertical sequences are the training results of the model constrained by context
            information and language features, and the specific effect of the model needs to be
            demonstrated by subsequent experiments.
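
The two constraints can be sketched as follows, under stated assumptions. `window_features` illustrates the horizontal constraint (a window of three or five characters), and `forward_combine` is one possible reading of the level-by-level combination of positive features. Both function names, the `evaluate` callback (standing in for a full train-and-test run), and the stopping rule are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def window_features(rows, i, size=3):
    """Collect feature values from a window of `size` characters centred on i.

    `rows` holds one list of feature values per character; keys have the
    form "offset:column". Positions outside the sequence are skipped.
    """
    half = size // 2
    feats = {}
    for off in range(-half, half + 1):
        j = i + off
        if 0 <= j < len(rows):
            for col, val in enumerate(rows[j]):
                feats[f"{off}:{col}"] = val
    return feats


def forward_combine(singles, baseline, evaluate):
    """Level-wise search over combinations of positive features.

    A single feature is positive if its score beats `baseline`. At each
    level, only combinations that beat the best score of the previous
    level are kept; the search stops when no combination improves.
    """
    positive = [f for f in singles if evaluate((f,)) > baseline]
    kept = {(f,): evaluate((f,)) for f in positive}
    best_prev = max(kept.values(), default=baseline)
    for size in range(2, len(positive) + 1):
        improved = {}
        for combo in combinations(positive, size):
            score = evaluate(combo)
            if score > best_prev:
                improved[combo] = score
        if not improved:
            break
        kept.update(improved)
        best_prev = max(improved.values())
    return kept


# Made-up scores for placeholder features f1..f3; NOT experimental results.
toy_scores = {("f1",): 0.81, ("f2",): 0.79, ("f3",): 0.83, ("f1", "f3"): 0.85}
result = forward_combine(
    ["f1", "f2", "f3"], 0.80, lambda combo: toy_scores.get(combo, 0.0)
)
```

In this toy run `f2` falls below the baseline and is dropped, while the pair `("f1", "f3")` is kept because it beats the best single feature. In practice `evaluate` would retrain the sequence model on the chosen feature columns, which makes each level costly and is one reason to prune non-positive features early.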


            2.5 Emotional term extraction algorithm based on machine learning and deep learning
            model

            Guided by the Char2Vec of the BERT model, this paper constructs a CRFs machine learning
            model and a BERT-BiLSTM-CRFs deep learning model that incorporate Chinese character language
            features. The core algorithms are as follows.
              (1) CRFs algorithm. For an input word sequence, the model outputs a role sequence and a
            sequence score; all sequence paths are then normalized to obtain the probability distribution
            of the role sequence,