Page 193 - Journal of Library Science in China, Vol.47, 2021



3 Experiments

In this paper, the emotional word set constructed for the field of ancient poetry is used as the term
set (40,173 terms), and ancient poetry text that integrates poems with their appreciations (poetry:
appreciation = 1:14) is used as the corpus. Since the extracted terms are intended to be applied to
the sentiment analysis of existing ancient poems, no external poetry corpus is introduced in the
main experiment. A total of 298,400 emotional terms were matched by the cold-start method, and
the learning corpus was split in a 4:1 ratio into a training set and a test set. Machine learning and
deep learning models were then built with Python 3.7, using the CRFsuite module and the
TensorFlow 1.15 framework, respectively, to automate the extraction of large-scale humanistic
emotional terms in the field of ancient poetry.
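
As an illustration of the data preparation described above, the following is a minimal sketch of dictionary-based (cold-start) labelling followed by a 4:1 split. It assumes forward maximum matching and a B/I/O tag scheme, neither of which is specified in this section; the function names, the `max_len` parameter, and the sample terms are hypothetical.

```python
import random


def label_by_dictionary(text, terms, max_len=4):
    """Tag characters covered by a dictionary term as B/I and all others as O.

    Assumption: forward maximum matching; the matching rule actually used
    for the cold start is not stated in this section.
    """
    labels = ["O"] * len(text)
    i = 0
    while i < len(text):
        match_len = 0
        # Try the longest candidate first so longer terms win over their prefixes.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in terms:
                match_len = n
                break
        if match_len:
            labels[i] = "B"
            for j in range(i + 1, i + match_len):
                labels[j] = "I"
            i += match_len
        else:
            i += 1
    return labels


def split_4_to_1(samples, seed=0):
    """Shuffle a copy of the samples and split it 4:1 into (train, test)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy term set and sentence, for illustration only.
toy_terms = {"思乡", "愁"}
toy_labels = label_by_dictionary("独在异乡思乡愁", toy_terms)
train, test = split_4_to_1(list(range(10)))
```

Trying the longest match first keeps a multi-character term from being split into shorter dictionary entries; a fixed seed makes the split reproducible.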

3.1 Extraction of humanistic sentiment terms based on Chinese character feature extension

(1) Single feature expansion. First, pre-experiments are conducted with three-character and
five-character windows. The former yields P, R, and F1 of 95.98%, 95.02%, and 95.50%, higher
than the latter's 95.21%, 94.01%, and 94.60%, indicating that a longer window introduces some
noise features. The three-character window is therefore selected, and the CRFs algorithm is
trained on the Chinese character features in Table 1, as shown in Figure 4.
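
To make the window comparison concrete, here is a minimal sketch of how per-character window features could be assembled as one dictionary per character, a layout commonly used with CRF toolkits. It covers only the character sequence itself (the baseline C); the additional features of Table 1 are not reproduced. The function names, the `half_width` parameter, the feature keys, and the `<PAD>` symbol are assumptions for illustration.

```python
def window_features(chars, i, half_width=1):
    """Build features for position i from a (2 * half_width + 1)-character window.

    half_width=1 gives a three-character window, half_width=2 a five-character one.
    """
    feats = {"bias": 1.0}
    for offset in range(-half_width, half_width + 1):
        j = i + offset
        # Positions outside the sentence are filled with a padding symbol.
        feats[f"C[{offset}]"] = chars[j] if 0 <= j < len(chars) else "<PAD>"
    return feats


def sentence_features(sentence, half_width=1):
    """Return one feature dict per character of the sentence."""
    chars = list(sentence)
    return [window_features(chars, i, half_width) for i in range(len(chars))]


three_char = sentence_features("abc", half_width=1)
five_char = sentence_features("abcde", half_width=2)
```

A wider window adds two more context features per position, which is one way the extra noise reported for the five-character setting could enter the model.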

(a) original term extraction results calculation (b) discriminative term extraction results calculation
Figure 4. Sentiment term extraction result calculation based on single feature extension
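
The two panels of Figure 4 report results under two counting schemes. The sketch below shows one plausible way to compute them, assuming that "original" results count every occurrence of a term in the text while "discriminative" results count each term string once; this reading, the tuple layout `(sentence_id, start, term)`, and all names are assumptions rather than the paper's stated procedure.

```python
def prf(tp, n_pred, n_gold):
    """Return precision, recall, and F1 from raw counts."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def occurrence_scores(gold, pred):
    """Score every occurrence: items are (sentence_id, start, term) tuples."""
    g, p = set(gold), set(pred)
    return prf(len(g & p), len(p), len(g))


def distinct_scores(gold, pred):
    """Score each term string once, regardless of how often it occurs."""
    g = {term for _, _, term in gold}
    p = {term for _, _, term in pred}
    return prf(len(g & p), len(p), len(g))


# Hypothetical toy data: term "x" occurs three times, "y" once; "y" is missed.
gold = [(0, 0, "x"), (1, 2, "x"), (2, 1, "x"), (3, 0, "y")]
pred = [(0, 0, "x"), (1, 2, "x"), (2, 1, "x"), (3, 0, "z")]
occ = occurrence_scores(gold, pred)
dis = distinct_scores(gold, pred)
```

In this toy case the frequent term is always found, so the occurrence-level scores are higher than the distinct-term scores, which mirrors the direction of the gap between the two panels.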

In Figure 4, C combined with any one of the features B/P/M/E/F/U/N/T represents an
experimental scheme of single-feature expansion based on word sequences. 1) The F1 value of
the original terms for the baseline (C) is as high as 95.50%, which is already a very good level,
whereas the F1 value of the distinguished terms is only 83.66%, indicating that repeated
terms raise the measured model performance. 2) In terms of precision (P), the accuracy of feature
