Page 205 - Journal of Library Science in China, Vol.47, 2021
5 Conclusion
Drawing on the “humanistic” and “computational” characteristics of digital humanities, this paper
proposes a “cold start” method for automatically tagging character sequences, motivated by the
lack of training corpora for ancient poetry and appreciation texts. The method introduces Chinese
language knowledge into machine learning and deep learning models, and supports the automatic
extraction of emotional terms at scale and their use in humanities applications. The study found that
Chinese character features such as domain, emotion, radical and pinyin effectively improved the
performance of the machine learning model. Among these, domain features and radical constraint
features based on the “vertical heart side” (忄) and “heart character bottom” (心) radicals performed
best. Deep learning performed better still, especially in the recall of differentiated terms. In addition,
compared with the long new terms extracted by machine learning, deep learning yielded more new
ideographic words that carry emotional knowledge. After integration with domain terms, 14,599
distinctive terms were obtained, laying a foundation for the construction of an emotion dictionary
and for emotion analysis in the field of ancient poetry.
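As an illustration of the tagging approach summarized above, the following minimal sketch labels a character sequence by matching it against a seed lexicon and derives simple character features, including a flag for the two heart radicals. It is a sketch under stated assumptions, not the authors' implementation: the B/I/O label scheme, the greedy longest-match strategy, the seed terms, the radical lookup table and the function names `bio_tag` and `char_features` are all hypothetical, and the domain, emotion and pinyin features are omitted.

```python
from typing import Dict, List, Set

# Hypothetical seed lexicon of emotional terms (not the paper's word set).
SEED_TERMS: Set[str] = {"悲", "惆怅", "思念"}

# Hypothetical hand-written radical lookup for a few characters.
# "忄" is the "vertical heart side" form, "心" the "heart character bottom" form.
RADICALS: Dict[str, str] = {"惆": "忄", "怅": "忄", "悲": "心", "思": "心", "念": "心"}
HEART_RADICALS: Set[str] = {"忄", "心"}


def bio_tag(text: str, lexicon: Set[str]) -> List[str]:
    """Label each character B/I/O by greedy longest match against the lexicon."""
    max_len = max((len(term) for term in lexicon), default=0)
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        matched = 0
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in lexicon:
                matched = n
                break
        if matched:
            tags[i] = "B"
            for j in range(i + 1, i + matched):
                tags[j] = "I"
            i += matched
        else:
            i += 1
    return tags


def char_features(text: str, i: int) -> Dict[str, object]:
    """Build a feature dictionary for the character at position i."""
    ch = text[i]
    radical = RADICALS.get(ch, "")
    return {
        "char": ch,
        "radical": radical,
        "heart_radical": radical in HEART_RADICALS,  # radical constraint feature
        "prev_char": text[i - 1] if i > 0 else "<BOS>",
        "next_char": text[i + 1] if i < len(text) - 1 else "<EOS>",
    }


if __name__ == "__main__":
    sentence = "惆怅思念故人"
    for idx, tag in enumerate(bio_tag(sentence, SEED_TERMS)):
        print(sentence[idx], tag, char_features(sentence, idx))
```

Label sequences and feature dictionaries of this kind could then be passed to a sequence labeling model. Because the labels come from a limited lexicon, terms missing from it are tagged `O`.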
Several aspects of this study can still be deepened. First, the emotional word set used to tag the
corpus automatically is a knowledge base of limited scale, so inaccurate and insufficient tagging is
difficult to avoid. Incremental iteration and remote knowledge base inspection will be introduced
later to optimize it. Second, the expansion of radical features in this paper is only a preliminary
exploration. In future work, we will organize knowledge about radicals and verify their
effectiveness in feature expansion based on the different semantic connotations of radicals. Third,
emotional terms fall into many categories, such as “joy”, “anger”, “sorrow”, “thinking”, “praise”,
“irony” and “surprise”, and these categories will be even richer in the field of ancient poetry. This
paper focuses on the “from 0 to 1” breakthrough of extracting emotional terms as a whole from
ancient poetry texts under the “cold environment”, and does not subdivide the emotional
categories. Subsequent research will focus on the automatic classification of emotional terms, the
construction of an emotion dictionary for the field of ancient poetry, and the application of
emotion analysis.
Acknowledgements: This article is an outcome of the project “Research on Semantic Parsing
and Humanities Computing of Chinese Intangible Cultural Heritage Text Driven by Linked
Data” (No. 72074108) supported by the National Natural Science Foundation of China and the
project “The Semantic Analysis and Knowledge Graph Research of Local Chronicle Text Oriented
to Humanistic Computing” (No. 010814370113) supported by the Fundamental Research Funds for
the Central Universities.
References
[1 ] POOLE A H, GARWOOD D A. Interdisciplinary scholarly collaboration in data-intensive, public-