Page 195 - Journal of Library Science in China, Vol.47, 2021
As Figure 5 shows, once domain and emotion constraints are imposed on the Chinese characters for the radical (B) and pinyin (P) features, the performance of both models rises to a level comparable to that of the baseline. For P, after parameter tuning, the 25 most frequent domain pinyin were taken as domain common pinyin (F_P) and the 60 most frequent emotional pinyin as emotional common pinyin (E_P); with these settings the F1 values of the model all exceeded the baseline, which verifies the positive effect of pinyin features. For B, the Chinese character radicals in the domain text and the emotional word set were counted; after parameter adjustment, the top 115 domain radicals were used as domain common radicals (F_B) and the top 20 emotional radicals as emotional common radicals (E_B), yet the optimal F1 value remained slightly below the baseline. To explore the reason, the numbers of extracted terms were analyzed, as shown in Table 2.
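The selection of "common" pinyin or radicals by frequency can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `top_k_features` and `constrain`, the `<NONE>` placeholder, and the toy radical sequence are all assumptions, and replacing out-of-set features with a placeholder is only one plausible reading of how the constraint is applied.

```python
from collections import Counter
from typing import Iterable, Set


def top_k_features(tokens: Iterable[str], k: int) -> Set[str]:
    """Return the k most frequent feature tokens (pinyin syllables or radicals)."""
    counts = Counter(tokens)
    return {token for token, _ in counts.most_common(k)}


def constrain(feature: str, allowed: Set[str], placeholder: str = "<NONE>") -> str:
    """Keep a feature only if it is in the allowed set (assumed constraint scheme)."""
    return feature if feature in allowed else placeholder


# Toy radical sequence, for illustration only (not data from the paper).
toy_radicals = ["忄", "心", "忄", "口", "心", "忄"]

# The paper reports k = 25 (F_P), 60 (E_P), 115 (F_B) and 20 (E_B); k = 2 here.
common = top_k_features(toy_radicals, k=2)
constrained = [constrain(r, common) for r in toy_radicals]
print(common, constrained)
```

Treating k as a tunable parameter mirrors the parameter adjustment described above: each candidate k yields a different constraint set, and the set giving the best F1 is kept.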
Table 2. Extraction results of the number of sentiment terms based on constrained feature extension

             Original evaluation criteria                      Discriminative evaluation criteria
Feature      Original    Recognized    Correct    New          Original    Recognized    Correct    New
             term        term          term       term         term        term          term       term
C            63 151      62 519        60 005     796          7 045       6 409         5 628      647
F_P          63 151      62 618        60 057     865          7 045       6 560         5 682      742
E_P          63 151      62 592        60 051     893          7 045       6 584         5 698      747
F_B          63 151      62 601        60 039     894          7 045       6 598         5 702      752
E_B          63 151      62 632        60 057     912          7 045       6 608         5 706      761
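The counts in Table 2 can be turned into precision, recall and F1 for comparison across features. The sketch below assumes the standard definitions (precision = correct / recognized, recall = correct / original); this section does not state the exact formulas used in the paper, so the resulting values are illustrative and may not match the reported figures. The names `prf` and `TABLE_2` are hypothetical.

```python
from typing import Dict, Tuple


def prf(original: int, recognized: int, correct: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 under the standard definitions (an assumption)."""
    precision = correct / recognized
    recall = correct / original
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# (original, recognized, correct) per feature, copied from Table 2:
# first triple = original criteria, second triple = discriminative criteria.
TABLE_2: Dict[str, Tuple[Tuple[int, int, int], Tuple[int, int, int]]] = {
    "C":   ((63151, 62519, 60005), (7045, 6409, 5628)),
    "F_P": ((63151, 62618, 60057), (7045, 6560, 5682)),
    "E_P": ((63151, 62592, 60051), (7045, 6584, 5698)),
    "F_B": ((63151, 62601, 60039), (7045, 6598, 5702)),
    "E_B": ((63151, 62632, 60057), (7045, 6608, 5706)),
}

for name, (orig_counts, disc_counts) in TABLE_2.items():
    _, r_orig, f_orig = prf(*orig_counts)
    _, r_disc, f_disc = prf(*disc_counts)
    print(f"{name}: recall={r_orig:.4f} F1={f_orig:.4f} | "
          f"recall_distinct={r_disc:.4f} F1_distinct={f_disc:.4f}")
```

Because the number of original terms is fixed within each criterion, recall under these definitions depends only on the number of correct terms, which is why the recall comparison can be read directly from the "Correct term" columns.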
As Table 2 shows, using F/E to constrain the P/B features produced a marked improvement in the recall of term extraction. 1) Under both evaluation criteria, the numbers of correctly recognized terms and newly recognized terms for F_B and E_B are generally higher than those for F_P and E_P, indicating that radical features are better at recalling emotional terms and discovering new terms. 2) E_B outperforms F_B in both original term recall and new term recognition, indicating that the constraint of emotional radicals is more effective. The twenty Chinese radicals in the E_B set were therefore examined, and an exploratory experiment was run with {“忄”, “心”} as the constraint feature. The F1 value reached 95.54%, the second highest in the overall experiment, behind only the domain feature; F1_distinct reached 83.69%, markedly higher than the baseline, B, F_B and E_B, verifying the effectiveness of the radical feature. On this basis, with {“忄”, “心”} as the benchmark, the other 18 radicals were added to the set in incremental experiments, and the results are shown in Figure 6.
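One way to set up such incremental experiments is to build one constraint set per additional radical on top of the benchmark pair. The sketch below assumes that each candidate is added to the benchmark individually; the text does not say whether radicals were added one at a time or cumulatively, and the 18 other radicals are not listed in this section, so the candidates used here are purely illustrative. The names `BENCHMARK` and `incremental_sets` are hypothetical.

```python
from typing import FrozenSet, Iterable, List

# Benchmark constraint set taken from the text.
BENCHMARK: FrozenSet[str] = frozenset({"忄", "心"})


def incremental_sets(base: FrozenSet[str],
                     candidates: Iterable[str]) -> List[FrozenSet[str]]:
    """Build one constraint set per candidate radical: base plus that radical.

    Candidates already in the base set are skipped.
    """
    return [base | {radical} for radical in candidates if radical not in base]


# Illustrative candidates only; NOT the 18 radicals used in the paper.
example_candidates = ["氵", "口", "心"]

for constraint_set in incremental_sets(BENCHMARK, example_candidates):
    # Each set would be passed to model training and evaluation (not shown).
    print(sorted(constraint_set))
```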