Page 185 - Journal of Library Science in China, Vol.47, 2021
terms [18]; the representation learning method identifies sentiment words by calculating the similarity between terms through word vectors. However, Chinese word segmentation often mis-segments out-of-vocabulary (unregistered) domain words [19]. In general, existing techniques struggle to extract sentiment terms completely and accurately from ancient poetry texts, which have pronounced domain characteristics, and this severely limits both the quality of the sentiment lexicon and the accuracy of subsequent sentiment analysis. For this reason, this paper focuses on the effectiveness of sentiment term extraction methods for ancient poetry texts and on schemes for optimizing them.
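The word-vector similarity approach mentioned above can be sketched as a seed-based lexicon expansion. This is a minimal illustration under stated assumptions, not the method of the cited work: the toy three-dimensional vectors in `WORD_VECTORS`, the helper names `cosine` and `expand_lexicon`, and the 0.9 threshold are all hypothetical. In practice the vectors would be learned by an embedding model from a large corpus.

```python
import math

# Hypothetical toy word vectors with invented values; real vectors would be
# learned by an embedding model from a large corpus.
WORD_VECTORS: dict[str, list[float]] = {
    "愁": [0.9, 0.1, 0.0],    # sorrow
    "忧": [0.85, 0.2, 0.05],  # worry
    "喜": [-0.7, 0.6, 0.1],   # joy
    "山": [0.0, 0.1, 0.95],   # mountain
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(
    seeds: set[str], vectors: dict[str, list[float]], threshold: float = 0.9
) -> dict[str, float]:
    """Return non-seed terms whose best similarity to any seed reaches the threshold."""
    candidates: dict[str, float] = {}
    for term, vec in vectors.items():
        if term in seeds:
            continue
        # Highest similarity between this term and any seed that has a vector.
        best = max((cosine(vec, vectors[s]) for s in seeds if s in vectors), default=0.0)
        if best >= threshold:
            candidates[term] = best
    return candidates


if __name__ == "__main__":
    print(expand_lexicon({"愁"}, WORD_VECTORS))
```

With the seed set `{"愁"}`, only `忧` clears the threshold under these invented vectors; the candidates found this way would still need manual review before entering a sentiment lexicon.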
1.3 Methods and techniques of sentiment term extraction
At present, research on the effectiveness of sentiment term extraction is mostly conducted in combination with fine-grained sentiment analysis tasks such as opinion mining [20], attribute extraction [21], and sentiment label extraction [22]. For example, YU Shengwei et al. [23] used a sequence labeling model to extract elements such as opinion objects and sentiment terms, and achieved good recognition results by expanding text features. However, that study concentrates on mining opinion objects, and sentiment terms have not yet been explored in depth. Therefore, adopting a supervised learning model to extract sentiment terms in the field of ancient poetry allows sequence labeling to be performed automatically, with the model learning from previously annotated data. This approach overcomes the limitations of hand-crafted language rules while also compensating for the inability of statistical methods to identify low-frequency terms.
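Supervised extraction of this kind treats sentiment terms as spans to be tagged character by character. The sketch below assumes a BIO scheme with a single hypothetical label `SEN`; the tag set, the helper names `spans_to_bio` and `bio_to_terms`, and the example sentence are illustrative assumptions, not the annotation scheme of the cited studies.

```python
def spans_to_bio(text: str, spans: list[tuple[int, int]]) -> list[str]:
    """Convert [start, end) character spans of sentiment terms into BIO tags."""
    tags = ["O"] * len(text)
    for start, end in spans:
        tags[start] = "B-SEN"          # first character of a term
        for i in range(start + 1, end):
            tags[i] = "I-SEN"          # remaining characters of the term
    return tags


def bio_to_terms(text: str, tags: list[str]) -> list[str]:
    """Recover sentiment terms from a BIO tag sequence (stray I tags are ignored)."""
    terms: list[str] = []
    current: list[str] = []
    for ch, tag in zip(text, tags):
        if tag == "B-SEN":
            if current:
                terms.append("".join(current))
            current = [ch]
        elif tag == "I-SEN" and current:
            current.append(ch)
        else:
            if current:
                terms.append("".join(current))
            current = []
    if current:
        terms.append("".join(current))
    return terms


if __name__ == "__main__":
    sentence = "诗人寂寞而惆怅"  # "The poet is lonely and melancholy."
    tags = spans_to_bio(sentence, [(2, 4), (5, 7)])
    print(tags)
    print(bio_to_terms(sentence, tags))
```

A model trained on such tag sequences predicts a tag for every character, and `bio_to_terms` turns its output back into whole terms.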
CRFs (conditional random fields) are currently the most classic approach to sequence labeling tasks [24], but mature training corpora for the field of ancient poetry are still lacking. In addition, how to effectively extract domain text features is a key issue for model optimization. In recent years, the rise of BERT [25] has challenged traditional machine learning algorithms. Its bidirectional language encoding, masking, and self-attention mechanisms produce higher-quality semantic vectors [26], and it can be combined with mainstream deep learning models such as BiLSTM-CRFs [27], which have achieved excellent performance in entity recognition in many fields. Furthermore, since BERT's labeling unit for Chinese is the character [28], it avoids the term fragmentation caused by Chinese word segmentation and preserves the integrity of recognized terms. More importantly, a character tagging model can make full use of the linguistic knowledge carried by Chinese characters in domain texts through character vector mapping (Char2Vec), which offers traditional machine learning a way to optimize models based on Chinese character features.
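To illustrate how a linear-chain CRF assigns a tag sequence, the sketch below implements Viterbi decoding over character-level B/I/O tags. It covers inference only: the emission and transition scores are invented for the example, whereas a trained CRF would learn them from annotated data. The function name `viterbi` and the `<S>` start symbol are hypothetical.

```python
# Tag set for character-level sentiment term labeling (illustrative).
TAGS = ["B", "I", "O"]

# Invented scores for the three characters of "寂寞而".
# EMISSIONS[t][tag]: how strongly position t supports each tag.
EMISSIONS = [
    {"B": 1.0, "I": 1.5, "O": 0.2},
    {"B": 0.1, "I": 2.0, "O": 0.3},
    {"B": 0.0, "I": 0.5, "O": 1.0},
]
# TRANSITIONS[(prev, tag)]: penalties for ill-formed BIO transitions;
# "<S>" marks the sequence start. Unlisted pairs score 0.
TRANSITIONS = {("<S>", "I"): -10.0, ("O", "I"): -10.0}


def viterbi(
    emissions: list[dict[str, float]],
    transitions: dict[tuple[str, str], float],
    tags: list[str],
) -> list[str]:
    """Return the highest-scoring tag sequence under emission + transition scores."""
    # score[tag]: best total score of any path ending in `tag` at the current position.
    score = {t: transitions.get(("<S>", t), 0.0) + emissions[0][t] for t in tags}
    backpointers: list[dict[str, str]] = []
    for emit in emissions[1:]:
        new_score: dict[str, float] = {}
        pointer: dict[str, str] = {}
        for t in tags:
            # Best previous tag given that the current tag is t.
            prev = max(tags, key=lambda p: score[p] + transitions.get((p, t), 0.0))
            new_score[t] = score[prev] + transitions.get((prev, t), 0.0) + emit[t]
            pointer[t] = prev
        score = new_score
        backpointers.append(pointer)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: score[t])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(viterbi(EMISSIONS, TRANSITIONS, TAGS))  # ['B', 'I', 'O']
```

Picking the best tag at each position independently would yield `I, I, O`, which starts a term with `I`; the transition penalties steer the decoder to the well-formed `B, I, O`, so `寂寞` is recovered as one term.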
To sum up, this paper builds machine learning and deep learning models based on character tagging to extract and apply sentiment terms in ancient poetry and its appreciation texts, focusing on how Chinese language features influence sentiment term extraction with the CRFs algorithm. This work lays the foundation for constructing a domain sentiment lexicon and for term-level sentiment analysis of ancient poetry.
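A character-tagging CRF of this kind consumes a feature dictionary for each character. The sketch below shows one hypothetical template (current character, neighboring characters and bigrams, sentence boundaries, and membership in a seed set of sentiment characters); the feature names and the `sentiment_chars` seed set are assumptions for illustration, not the feature set used in this paper. The list-of-dicts output matches the input form used by CRF toolkits such as sklearn-crfsuite.

```python
def char_features(sentence: str, i: int, sentiment_chars: set[str]) -> dict[str, object]:
    """Hypothetical feature template for the i-th character of a sentence."""
    ch = sentence[i]
    feats: dict[str, object] = {
        "bias": 1.0,
        "char": ch,
        # Whether the character is in a (hypothetical) seed set of sentiment characters.
        "in_sentiment_chars": ch in sentiment_chars,
    }
    if i > 0:
        feats["prev_char"] = sentence[i - 1]
        feats["prev_bigram"] = sentence[i - 1 : i + 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sentence) - 1:
        feats["next_char"] = sentence[i + 1]
        feats["next_bigram"] = sentence[i : i + 2]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(sentence: str, sentiment_chars: set[str]) -> list[dict[str, object]]:
    """One feature dict per character, since each Chinese character is a labeling unit."""
    return [char_features(sentence, i, sentiment_chars) for i in range(len(sentence))]


if __name__ == "__main__":
    for feats in sentence_features("诗人寂寞而惆怅", {"寂", "寞", "惆", "怅"}):
        print(feats)
```

Because every feature is keyed to single characters and their immediate context, no word segmentation step is needed, and further character-level cues could be added to the template in the same way.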