Page 187 - Journal of Library Science in China, Vol.47, 2021
model to map the text into a character-role space in order to explore the features of Chinese
characters; finally, the tagged corpus is divided into training and test sets for the subsequent
learning tasks. 2) Machine learning and deep learning models. For machine learning, this paper
incorporates the linguistic features of Chinese characters into the feature space used to train
the CRFs algorithm, in order to improve the model. For deep learning, it integrates the language
knowledge of BERT to optimize the neural network. This pipeline includes Char2Vec mapping of the
deep features of domain-specific Chinese characters, BiLSTM encoding of contextual information
together with training of the network parameters, and CRFs decoding that constrains the order of
labels. Finally, the trained model is used to predict the test set, yielding the labeled
sequences. 3) Digital humanities application. The predicted sequences are mapped to the
role-character space to extract domain terms and new terms, which are integrated into a humanistic
sentiment terminology for the field of ancient poetry. The digital applications of this
terminology are then explored in terms of term retrieval, granularity mining, and poet portraits.
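The character-role mapping and the label-order constraint applied during CRFs decoding can be illustrated with a minimal Python sketch. It assumes a B/I/E/S/O role set (B = first character of a term, I = inside, E = last character, S = single-character term, O = outside), which is a common choice for Chinese term recognition but is not confirmed as this paper's tag scheme. The names `encode_roles`, `decode_roles`, `viterbi`, and `ALLOWED` are hypothetical, and hand-set emission scores stand in for the BERT-BiLSTM outputs and the learned CRF transition weights.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical role set (an assumption, not necessarily the paper's tag scheme):
# B = first character of a term, I = inside, E = last character,
# S = single-character term, O = character outside any term.
LABELS = ["B", "I", "E", "S", "O"]

# Label-order constraints used in decoding: ALLOWED[prev] lists labels that may follow prev.
ALLOWED = {
    "B": {"I", "E"},
    "I": {"I", "E"},
    "E": {"B", "S", "O"},
    "S": {"B", "S", "O"},
    "O": {"B", "S", "O"},
}
START_ALLOWED = {"B", "S", "O"}  # a sequence cannot open inside or at the end of a term
END_ALLOWED = {"E", "S", "O"}    # nor close with an unfinished term


def encode_roles(text: str, terms: Sequence[str]) -> List[str]:
    """Map a text into character-role space by tagging every occurrence of each term."""
    roles = ["O"] * len(text)
    # Longer terms first, so they take precedence over shorter overlapping ones.
    for term in sorted({t for t in terms if t}, key=len, reverse=True):
        start = text.find(term)
        while start != -1:
            end = start + len(term) - 1
            if all(r == "O" for r in roles[start:end + 1]):
                if start == end:
                    roles[start] = "S"
                else:
                    roles[start] = "B"
                    for i in range(start + 1, end):
                        roles[i] = "I"
                    roles[end] = "E"
            start = text.find(term, start + 1)
    return roles


def decode_roles(text: str, roles: Sequence[str]) -> List[str]:
    """Map a predicted role sequence back onto the characters and collect the terms."""
    terms: List[str] = []
    start = None
    for i, role in enumerate(roles):
        if role == "S":
            terms.append(text[i])
            start = None
        elif role == "B":
            start = i
        elif role == "E" and start is not None:
            terms.append(text[start:i + 1])
            start = None
        elif role == "O":
            start = None
        # "I" just extends the open span; a stray "I" or "E" without a "B" is ignored.
    return terms


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Best-scoring role path that respects ALLOWED, START_ALLOWED and END_ALLOWED."""
    neg = float("-inf")
    # First character: only permitted start labels are reachable.
    score = {y: emissions[0][y] if y in START_ALLOWED else neg for y in LABELS}
    back: List[Dict[str, str]] = []
    for em in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            best_prev, best = None, neg
            for p in LABELS:
                # Skip forbidden transitions and unreachable predecessors.
                if y in ALLOWED[p] and score[p] > neg:
                    s = score[p] + transitions.get((p, y), 0.0)
                    if s > best:
                        best_prev, best = p, s
            new_score[y] = best + em[y] if best_prev is not None else neg
            if best_prev is not None:
                pointers[y] = best_prev
        score = new_score
        back.append(pointers)
    # Pick the best permitted final label, then follow the back-pointers.
    last = max((y for y in LABELS if y in END_ALLOWED), key=lambda y: score[y])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

For example, tagging the Tang line "感时花溅泪恨别鸟惊心" with the illustrative terms "感时", "恨别" and "泪" gives `B E O O S B E O O O`, and `decode_roles` recovers `["感时", "泪", "恨别"]`. If the emissions favor `I` for the first character, a plain per-character argmax would output the invalid path `I E`, whereas the constrained `viterbi` returns `B E`. Hard constraints are only one design choice; a trained CRF layer instead learns transition scores, which usually penalize such invalid label pairs.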
2.2 Data sources and preprocessing
The corpus of ancient poetry texts used in this paper consists mainly of poems and their
appreciations. In Chinese poetic culture, Tang Dynasty poetry had an important influence on the
politics, people, customs and culture of later generations [29]. This paper takes the Dictionary
of Appreciation of Tang Poetry as the source material for analysis. This work of literary
scholarship contains both the original texts of the poems and modern appreciative analyses of
them, and can therefore reveal the implicit emotional knowledge in the poetry more
comprehensively and accurately. The digital text of the book was obtained from the Literature 100
website (http://www.wenxue100.com); part of the corpus is shown in Figure 2.
Figure 2. Corpus of ancient poetry texts and emotional knowledge terms (excerpt)
Through data cleaning and structural organization of the text, a total of 1,374 valid poems and