Page 183 - Journal of Library Science in China, Vol.47, 2021
“humanity” and “computability”. The former emphasizes the humanistic emotion and connotation
of thinking inherited from cultural resources, while the latter advocates the deep calculation of data
to mine the core knowledge implicit in humanistic materials. Since text is the fundamental carrier
of humanistic knowledge, textual knowledge mining driven by semantic parsing technology and
humanistic emotion is becoming increasingly important.
Among Chinese cultural heritages, ancient poetry, which occupies an extremely important literary
position, contains the ancient people’s knowledge of things such as political background, historical
events and folk customs, and covers a wealth of emotional knowledge. At present, the emotional
connotation of domain-specific texts has not been effectively mined and utilized, and the same holds for
the knowledge organization of humanistic emotional information, whose sources and structures
vary, from the perspective of semantic association. In this regard, some scholars have attempted
to classify the emotional polarity of ancient poems automatically [4], but chapter-level polarity
analysis is still insufficient for the mining of fine-grained emotional knowledge. At present, there
is no mature emotion dictionary in the field of Chinese ancient poetry, and it is hard to say that the
emotion terms compiled in the existing Chinese emotion dictionaries are complete or accurate [5].
In order to achieve a more accurate sentiment analysis of ancient poetry, it is necessary to first
realize the automatic extraction of sentiment terms in domain-specific texts and their performance
optimization, where the sentiment terms involved are sentiment words with humanistic
connotations in the domain of ancient poetry, including single Chinese characters, words and phrases.
The key problems include: 1) since the texts of ancient poems are mostly original Chinese texts
based on single characters, the highly condensed style limits the scale of term extraction, the
sentiment granularity and the learning effect of text features, so it is necessary to extend the content
of the ancient poetry text; 2) large-scale term extraction is mainly realized by machine learning (ML),
but the lack of a labeled corpus in the ancient poetry domain makes it difficult to start the learning
task; 3) the rise of BERT has challenged the traditional ML algorithms represented by conditional
random fields (CRFs), but its word vector mapping (Char2Vec) inspires the latter to introduce linguistic
features on the structure and content of Chinese characters to optimize the extraction of sentiment terms.
Based on the core emotional content (humanity) in the ancient poetry text, this study adopts key
semantic techniques (computability) in information extraction. On the one hand, we introduce a
modern appreciation method. We integrate the appreciation text into the poetry it evaluates, and
extend the emotional knowledge in the original text through the appreciation text to form the
ancient poetry text within the scope of this study. On the other hand, we propose a method for automatic
annotation of word sequences to obtain the learning corpus in a cold-start environment (no learning
corpus). On this basis, linguistic knowledge in the domain of ancient poetry is integrated into the
CRFs ML model to focus on the effectiveness of linguistic features of Chinese characters on the
extraction of sentiment terms. We also compare this method with the deep learning (DL) model BERT-BiLSTM-CRFs,
and finally generalize the optimal model for the automated extraction and application of large-scale
sentiment terms in ancient poetry texts.
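To make the two steps above more concrete, the following is a minimal Python sketch of how character sequences might be labeled automatically without a hand-annotated corpus, and how per-character linguistic features could be prepared for a CRF-style sequence labeler. It is an illustration under stated assumptions, not the annotation procedure or feature set used in this study: the seed lexicon `SEED_TERMS`, the radical table `RADICALS`, the B/I/O tag scheme, the forward maximum matching strategy and the function names `auto_annotate` and `char_features` are all hypothetical.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical seed lexicon of sentiment terms (illustrative only).
SEED_TERMS: Set[str] = {"愁", "思乡", "断肠"}

# Hypothetical character-to-radical table; a real system would load a full one.
RADICALS: Dict[str, str] = {"愁": "心", "思": "心", "断": "斤", "肠": "月"}


def auto_annotate(chars: Sequence[str], lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Assign B/I/O tags by forward maximum matching against the lexicon."""
    tags = ["O"] * len(chars)
    i = 0
    while i < len(chars):
        matched = 0
        # Try the longest candidate first, then shrink the window.
        for n in range(min(max_len, len(chars) - i), 0, -1):
            if "".join(chars[i:i + n]) in lexicon:
                matched = n
                break
        if matched:
            tags[i] = "B"
            for j in range(i + 1, i + matched):
                tags[j] = "I"
            i += matched
        else:
            i += 1
    return tags


def char_features(chars: Sequence[str], i: int) -> Dict[str, str]:
    """Build a feature dict for one character: content, structure, context."""
    ch = chars[i]
    return {
        "char": ch,
        "radical": RADICALS.get(ch, "UNK"),  # structural feature (assumed)
        "prev": chars[i - 1] if i > 0 else "<BOS>",
        "next": chars[i + 1] if i < len(chars) - 1 else "<EOS>",
    }


line = list("夜深思乡愁断肠")
tags = auto_annotate(line, SEED_TERMS)
features = [char_features(line, i) for i in range(len(line))]
```

The resulting `(features, tags)` pairs have the shape that common CRF toolkits accept as training input, so a corpus labeled this way could bootstrap the learning task; the quality of such labels depends entirely on the coverage of the seed lexicon.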