Page 205 - Journal of Library Science in China, Vol.47, 2021

204   Journal of Library Science in China, Vol.13, 2021



            5 Conclusion


            Drawing on the “humanistic” and “computational” characteristics of digital humanities, this paper
            addresses the lack of training corpora for ancient poetry and appreciation texts by proposing a
            “cold start” method for automatic tagging of character sequences. The method introduces Chinese
            language knowledge into machine learning and deep learning models and enables the large-scale
            automatic extraction of emotional terms and their humanistic applications. We found that Chinese
            character features such as domain, emotion, radical and pinyin effectively improved the machine
            learning model. Among these, domain features and radical constraint features based on the “vertical
            heart side” (忄) and “heart character bottom” (心) radicals performed best. Deep learning performed
            better overall, especially in the recall of differentiated terms. In addition, whereas machine
            learning extracted long new terms, deep learning yielded more new ideographic words that carry
            emotional knowledge. Integrating these with domain terms produced 14,599 distinctive terms, laying a
            foundation for the construction of an emotion dictionary and for emotion analysis in the field of
            ancient poetry.
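
            The cold-start tagging idea summarized above can be illustrated with a minimal sketch. It is
            not the authors' implementation: the seed lexicon, the hand-listed set of heart-radical
            characters, the B/I/O label scheme, and the function names bio_tag and char_features are all
            assumptions made for illustration. The sketch labels each character by forward maximum
            matching against a seed emotion lexicon and attaches a simple radical flag as a feature.

```python
# Hypothetical seed lexicon; the paper's actual emotional word set is not reproduced here.
SEED_EMOTION_TERMS = {"惆怅", "悲", "思念"}

# Hand-listed characters containing the 忄 or 心 radical (illustrative, not exhaustive).
HEART_RADICAL_CHARS = set("惆怅悲思念愁恨忆")


def bio_tag(text, lexicon):
    """Label each character B/I/O by forward maximum matching against the lexicon."""
    max_len = max((len(term) for term in lexicon), default=0)
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        # Try the longest candidate first so longer terms win over their prefixes.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in lexicon:
                tags[i] = "B"
                for j in range(i + 1, i + n):
                    tags[j] = "I"
                i += n
                break
        else:
            # No lexicon term starts here; leave the character as "O".
            i += 1
    return tags


def char_features(text):
    """Build per-character features, including an assumed heart-radical flag."""
    return [{"char": c, "heart_radical": c in HEART_RADICAL_CHARS} for c in text]


if __name__ == "__main__":
    line = "惆怅思念故人"
    for feats, tag in zip(char_features(line), bio_tag(line, SEED_EMOTION_TERMS)):
        print(feats["char"], feats["heart_radical"], tag)
```

            Sequences tagged this way could serve as silver-standard training data for a sequence
            labeling model. Because the seed lexicon is limited in scale, the resulting labels would
            be noisy and incomplete.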
              Several aspects of this study can still be deepened. Firstly, the emotional word set used to
            tag the corpus automatically is a knowledge base of limited scale, so inaccurate and insufficient
            tagging is difficult to avoid. Incremental iteration and remote knowledge base inspection will be
            introduced later to optimize it. Secondly, the expansion of radical features in this paper is only
            a preliminary exploration. In the future, we will try to organize knowledge about radicals and
            verify their effectiveness in feature expansion based on the different semantic connotations of
            radicals. Thirdly, emotional terms fall into many categories, such as “joy”, “anger”, “sorrow”,
            “thinking”, “praise”, “irony” and “surprise”, and these categories are even richer in the field of
            ancient poetry. This paper focuses on the breakthrough from 0 to 1 in extracting emotional terms as
            a whole from ancient poetry texts under a “cold environment”, and does not subdivide the emotional
            categories. Subsequent research will focus on the automatic classification of emotional terms and
            will be devoted to the construction of an emotion dictionary for the field of ancient poetry and to
            the application of emotion analysis.


              Acknowledgements: This article is an outcome of the project “Research on Semantic Parsing
            and Humanities Computing of Chinese Intangible Cultural Heritage Text Driven by Linked
            Data” (No. 72074108) supported by the National Natural Science Foundation of China and the
            project “The Semantic Analysis and Knowledge Graph Research of Local Chronicle Text Oriented
            to Humanistic Computing” (No. 010814370113) supported by the Fundamental Research Funds for
            the Central Universities.


            References


            [1 ] POOLE A H, GARWOOD D A. Interdisciplinary scholarly collaboration in data-intensive, public-