Page 185 - Journal of Library Science in China, Vol.47, 2021




            terms [18]; the representation learning method identifies sentiment words by calculating the
            similarity between terms through word vectors. However, Chinese word segmentation often
            mis-segments unregistered (out-of-vocabulary) domain words [19]. In general, existing techniques
            struggle to extract sentiment terms completely and accurately from ancient poetry texts, which
            have pronounced domain characteristics; this greatly restricts the quality of the sentiment
            lexicon and the accuracy of subsequent sentiment analysis. For this reason, this paper focuses
            on the effectiveness and optimization of sentiment term extraction methods for ancient poetry
            texts.
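
            As a rough illustration of the similarity-based idea mentioned above, the following sketch
            expands a seed sentiment word by cosine similarity over word vectors. The three-dimensional
            vectors, the 0.8 threshold, and the function names are invented for illustration only; they
            are not taken from the cited studies.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_seeds(seeds, vectors, threshold=0.8):
    """Return non-seed terms whose vector is close to at least one seed term."""
    found = []
    for term, vec in vectors.items():
        if term in seeds:
            continue
        if any(cosine(vec, vectors[s]) >= threshold for s in seeds if s in vectors):
            found.append(term)
    return found


# Toy vectors (hypothetical values, not trained embeddings).
toy_vectors = {
    "愁": [0.9, 0.1, 0.0],
    "悲": [0.8, 0.2, 0.1],
    "山": [0.0, 0.1, 0.9],
}
candidates = expand_seeds({"愁"}, toy_vectors)
```

            With real embeddings, the threshold would have to be tuned, and mis-segmented terms would
            never appear as candidates, which is the limitation the paragraph above points out.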


            1.3 Methods and techniques of sentiment term extraction


            At present, research on the effectiveness of sentiment term extraction is mostly combined
            with fine-grained sentiment analysis tasks such as opinion mining [20], attribute extraction [21],
            and sentiment label extraction [22]. Among them, YU Shengwei et al. [23] used a sequence labeling
            model to extract elements such as opinion objects and sentiment terms, and achieved good
            recognition results by expanding text features. However, that research focuses more on mining
            opinion objects, and its exploration of sentiment terms is not yet in-depth. Therefore, adopting
            a supervised learning model to extract sentiment terms in the field of ancient poetry can realize
            automatic sequence labeling, with the model learning from past data. While overcoming the
            limitations of language rules, it can also make up for the inability of statistical methods to
            identify low-frequency terms.
              The CRFs algorithm is currently the most classic approach to sequence labeling tasks [24], but
            its shortcoming is that a mature training corpus is still lacking in the field of ancient poetry.
            In addition, how to effectively extract domain text features is a key issue for model
            optimization. In recent years, the rise of BERT [25] has challenged traditional machine learning
            algorithms. Its bidirectional language encoding, masking, and self-attention mechanisms can
            obtain higher-quality semantic vectors [26], and it is suitable for mainstream deep learning
            models (such as BiLSTM-CRFs) [27], which have achieved excellent performance in entity
            recognition in many fields. Furthermore, since BERT's labeling unit for Chinese is the
            character [28], it can avoid the term fragmentation problem caused by Chinese word segmentation
            and ensure the integrity of the recognized terms. More importantly, the character tagging model
            can give full play to the linguistic knowledge of Chinese characters in domain texts through
            character vector mapping (Char2Vec), which provides traditional machine learning with a model
            optimization idea based on Chinese character features.
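
            To make the character tagging idea concrete, the sketch below labels each character of a line
            with a B/I/O tag and then recovers the terms from the tags, so that no word segmentation step
            is involved. The B/I/O scheme, the function names, and the choice of "溅泪" and "惊心" as
            sentiment terms are assumptions made for illustration; the paper's own tag set and annotations
            may differ.

```python
def bio_tags(text, terms):
    """Tag every character as B (term start), I (inside a term) or O (outside)."""
    tags = ["O"] * len(text)
    # Try longer terms first so a short term cannot split a longer one.
    for term in sorted(terms, key=len, reverse=True):
        if not term:
            continue
        start = text.find(term)
        while start != -1:
            end = start + len(term)
            # Only label spans that are not already covered by another term.
            if all(t == "O" for t in tags[start:end]):
                tags[start] = "B"
                for i in range(start + 1, end):
                    tags[i] = "I"
            start = text.find(term, start + 1)
    return tags


def decode_terms(text, tags):
    """Rebuild the term strings from a character-level B/I/O tag sequence."""
    terms, current = [], ""
    for ch, tag in zip(text, tags):
        if tag == "B":
            if current:
                terms.append(current)
            current = ch
        elif tag == "I" and current:
            current += ch
        else:
            if current:
                terms.append(current)
            current = ""
    if current:
        terms.append(current)
    return terms


line = "感时花溅泪，恨别鸟惊心"
tags = bio_tags(line, ["溅泪", "惊心"])  # hypothetical annotation
recovered = decode_terms(line, tags)
```

            Because each term is recovered as a contiguous run of B and I tags, its boundaries depend only
            on the tagger's output and not on an upstream segmenter.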
              To sum up, this paper builds machine learning and deep learning models based on character
            tagging to realize the extraction and application of sentiment terms in ancient poetry and its
            appreciation texts, and focuses on exploring the influence of Chinese language features on the
            extraction of sentiment terms based on the CRFs algorithm. This lays the foundation for the
            construction of a domain sentiment lexicon and the term-level sentiment analysis of ancient
            poetry.
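
            One possible way to feed Chinese character features to a CRFs model is sketched below: each
            character is turned into a dictionary of context-window features, a per-token format that CRF
            toolkits such as sklearn-crfsuite commonly accept. The feature names, the window size, and the
            optional `char_info` lookup (here a radical for one character) are hypothetical and are not
            the feature set used in this paper.

```python
def char_features(text, i, char_info=None):
    """Build a feature dict for the character at position i of text."""
    char_info = char_info or {}
    ch = text[i]
    prev_ch = text[i - 1] if i > 0 else "<BOS>"
    next_ch = text[i + 1] if i < len(text) - 1 else "<EOS>"
    feats = {
        "bias": 1.0,
        "char": ch,
        "prev": prev_ch,
        "next": next_ch,
        "prev+char": prev_ch + ch,  # left character bigram
        "char+next": ch + next_ch,  # right character bigram
    }
    # Optional linguistic attribute supplied by the caller (assumed lookup table).
    if ch in char_info:
        feats["info"] = char_info[ch]
    return feats


def sentence_features(text, char_info=None):
    """Feature dicts for every character in one line of text."""
    return [char_features(text, i, char_info) for i in range(len(text))]


feats = sentence_features("恨别鸟", {"恨": "心"})  # illustrative radical lookup
```

            Paired with character-level tag sequences, lists like `feats` form the training input of a
            CRF; adding or removing keys is a simple way to compare the contribution of different
            character features.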