From Figure 7, we can see that: 1) On the whole, as the number of transfer rounds increases, the model performs better and better at sentiment term extraction and comes to exceed the baseline under both evaluation criteria. 2) For original term extraction, in terms of P-value the baseline always remains the highest; in terms of R-value the model has exceeded the baseline since TL1; in terms of F1-value the model surpasses the baseline from TL4 onward and then remains stable at a high level (95.63%). From that point on, the deep learning model significantly outperforms the machine learning model in original term extraction. 3) In terms of R-value, the model's TL1 result is already much better than CF, rising by 7.36% to 88.35%, the highest value among all experiments in this paper; in terms of F1-value, with the P-value and R-value offsetting each other, the model also outperforms CF from TL1 and improves steadily up to TL5 (85.43%). 4) TL5 reaches the peak F1-value under both criteria, while the TL6 results are slightly inferior to TL5, which indicates that the model begins to overfit; TL5 is therefore the optimal model.
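To make the round-selection logic in point 4 concrete, the following is a minimal Python sketch of peak-F1 model selection across transfer rounds. The function names, the framework-agnostic callables, and the toy F1 trajectory are assumptions for illustration, not the paper's actual implementation.

import copy

def select_best_round(initial_model, num_rounds, fine_tune, evaluate):
    """Run num_rounds transfer rounds and keep the peak-F1 snapshot.

    fine_tune(model) performs one round of transfer fine-tuning and returns
    the updated model; evaluate(model) returns test-set F1. Both are supplied
    by the caller, so the selection logic stays framework-agnostic.
    """
    model = initial_model
    best = {"round": 0, "f1": float("-inf"), "model": None}
    for r in range(1, num_rounds + 1):
        model = fine_tune(model)          # one more transfer round (TL_r)
        f1 = evaluate(model)              # F1 on the held-out test set
        if f1 > best["f1"]:               # new peak: snapshot this round
            best = {"round": r, "f1": f1, "model": copy.deepcopy(model)}
    return best

# Toy trajectory that rises, peaks at round 5, then dips (as TL6 does above);
# the numbers are illustrative, not the paper's measurements.
scores = iter([0.912, 0.928, 0.941, 0.953, 0.957, 0.954])
result = select_best_round(None, 6, fine_tune=lambda m: m,
                           evaluate=lambda m: next(scores))
print(f"optimal round: TL{result['round']}, F1 = {result['f1']:.2%}")  # -> TL5, 95.70%

As in the analysis above, a later F1 drop after a sustained rise is treated here as a sign of overfitting rather than noise; in practice one might also require the drop to persist for several rounds before stopping.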
(3) Variability analysis of term extraction across different ancient poetry text corpora. In this paper, the test results of the optimal model TL5 are split: the poetry and appreciation sequences are extracted separately, and the various metrics are then calculated for each. In addition, to verify the effectiveness of the corpus built in this paper, the author retrains the deep learning model on a new training set constructed from 17,365 external poems and tests it on the split poetry sequences, obtaining the optimal extraction results after twenty rounds of iteration, as shown in Table 5.


Table 5. Extraction results of sentiment terms for different ancient poetry text corpora

                                           Original evaluation criteria       Discriminative evaluation criteria
Train set            Test set              P/%    R/%    F1/%   New terms     P/%    R/%    F1/%   New terms
Poetry+appreciation  Poetry+appreciation   95.92  95.35  95.63  941           81.94  89.23  85.43  836
Poetry+appreciation  Appreciation          95.85  95.26  95.56  888           82.32  89.34  85.68  805
Poetry+appreciation  Poetry                96.69  96.25  96.46  53            90.74  92.61  91.66  38
External poetry      Poetry                98.47  98.14  98.31  27            95.81  94.18  94.99  25
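As a rough illustration of the split evaluation behind Table 5, the sketch below scores poetry and appreciation sequences separately using set-based exact matching of terms. The matching criterion, the toy gold/predicted term sets, and the new-term definition are assumptions made for illustration, not details confirmed by the paper.

def prf(gold: set, pred: set) -> tuple:
    """Set-based precision/recall/F1 over extracted sentiment terms."""
    tp = len(gold & pred)                  # terms both predicted and in the gold set
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Hypothetical test samples: (sub-corpus tag, gold terms, predicted terms).
samples = [
    ("poetry", {"愁", "孤"}, {"愁", "孤"}),
    ("appreciation", {"悲凉", "思乡"}, {"悲凉", "凄清"}),
]
for tag in ("poetry", "appreciation"):     # score each sub-corpus separately
    gold = set().union(*[g for t, g, _ in samples if t == tag])
    pred = set().union(*[q for t, _, q in samples if t == tag])
    p, r, f1 = prf(gold, pred)
    print(f"{tag}: P={p:.2%} R={r:.2%} F1={f1:.2%}")

# The "New terms" columns in Table 5 could analogously be counted as
# len(pred - training_lexicon): predicted terms absent from the training
# vocabulary (again an assumption about the paper's definition).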


From Table 5, we can see that: 1) Original term extraction results are similar for the poetry and appreciation texts, and both maintain good recognition performance; for the discriminative terms, however, the poetry results are significantly better than the appreciation results, with the difference lying mainly in precision (90.74% vs. 82.32%, a gap of 8.42%). This indicates that the condensed nature of the poetry corpus is more advantageous for discriminative terms. 2) Compared with the corpus built in this paper, the model trained on external poetry achieves better precision, which shows that a more precise corpus environment is better suited to extracting sentiment terms from poetry; its shortcoming is that new term recognition is slightly inferior, which indicates that the richer text environment of the mixed poetry-appreciation corpus is more conducive to the discovery of new terms.