SHEN Si, LI Chengming, WU Peng. Review of Web Information Retrieval Based on Temporal Semantics[J]. Journal of Library Science in China, 2018, 44(4): 109-129
Review of Web Information Retrieval Based on Temporal Semantics
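Chinese title: 基于时态语义的Web信息检索实践进展与研究综述 (Practical Progress and Research Review of Web Information Retrieval Based on Temporal Semantics)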
Received:October 12, 2017  Revised:March 20, 2018
Key words:Information retrieval  Temporal semantic  Retrieval model  Timestamp  Temporal ranking model
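Chinese key words:  信息检索 (information retrieval)  时态语义 (temporal semantics)  检索模型 (retrieval model)  时间戳 (timestamp)  时间排序模型 (temporal ranking model)
Funding: This paper is one of the research outcomes of the National Natural Science Foundation of China project "Research on Academic Topic Retrieval and Evolution Mining Based on Time-Aware Models" (No. 71503124).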
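Author Name | Affiliation | E-mail
SHEN Si | Department of Information Management, School of Economics and Management, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094 | sszcgfss@gmail.com
LI Chengming | Master's student, School of Literature, Nanjing Normal University, Nanjing, Jiangsu 210000 |
WU Peng | Department of Information Management, School of Economics and Management, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094 |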
Abstract:

    Web information retrieval based on temporal semantics is widely applied in retrieval scenarios such as dynamic temporal information mining, collective memory, and temporal question answering systems. By fully exploiting the temporal information contained in Web documents, we can not only retrieve the set of texts related to specified lexical semantics at a given point in time but also identify how the themes of each independent information carrier evolve over time. A total of 94 studies related to this topic were retrieved and analyzed. The Chinese search terms were “时态语义”, “时间信息检索”, “时间信息抽取”, “检索模型”, and “深度学习”; the English search terms were “temporal semantic”, “temporal information retrieval”, “temporal information extraction”, “retrieval model”, and “deep learning”. The databases searched were Wanfang Database, CNKI, ProQuest Digital Dissertations (PQDD), ISI Web of Knowledge, Elsevier ScienceDirect, EBSCO Academic Source Premier, LISA, SpringerLink, and EI Engineering Village 2. The review builds on two foundations: the extraction and annotation of temporal information in text, and retrieval models that incorporate temporal information. Existing studies are classified through comparison and content analysis and are reviewed against each step of information retrieval in the narrow sense. Under the overall framework of temporal semantics technology, research in three areas is comprehensively reviewed: analysis of the implicit temporal intent in information needs, construction of retrieval models that incorporate temporal factors, and the use of temporal information to improve the generation of retrieval results. First, basic research on the automatic extraction of temporal information from text and on retrieval models that incorporate temporal information is reviewed. 
A special focus is placed on studies of temporal perception, retrieval, and cognition; temporal query intent algorithms; methods for determining the temporal dimension of literature; time-based text similarity computation; and temporal ranking models. We also summarize four major future research directions in the field of Web information retrieval based on temporal semantics: multi-dimensional time expression extraction, acquisition of implicit temporal intent, similarity computation in deep-level temporal retrieval models, and temporal intent classification based on deep learning. The major limitation of the present review is that, beyond the overall analysis of theories on Web information retrieval based on temporal semantics, a systematic review of the relevant theories and concepts is still needed. A systematic and comprehensive review of existing studies on Web information retrieval based on temporal semantics from the perspective of key technologies not only helps readers understand the research progress and status quo of the field but also helps them grasp the future development trend of temporal retrieval in the era of big data and artificial intelligence. Such a review can also suggest ways to incorporate the techniques, methodologies, and concepts of deep learning into information retrieval based on temporal semantics. The originality of this paper lies in two aspects. First, taking the key technologies involved in Web information retrieval based on temporal semantics as the entry point, we perform a systematic and comprehensive review of studies on information extraction, organization, mining, and presentation based on temporal semantics. Second, after the literature review, we point out the future development trend of information retrieval based on temporal semantics and briefly analyze the latest developments in the field. 91 refs.

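Chinese abstract:
      Web information retrieval based on temporal semantics is relatively widely applied in retrieval scenarios such as dynamic temporal information mining, collective memory, and temporal question answering systems. Amid the rapid development of big data and artificial intelligence, a systematic and comprehensive review of Web information retrieval based on temporal semantics from the perspective of key technologies not only helps in understanding the overall state of research in the field but also helps in grasping the future development trends of temporal retrieval. Building on the extraction and annotation of temporal information in text and on retrieval models that incorporate temporal information, and taking temporal semantics technology as the overall thread, this paper reviews research in three areas: analysis of the temporal intent implicit in information needs, construction of retrieval models that incorporate temporal factors, and the role of time in improving the generation of retrieval results. Taking the fundamental problems of temporal semantic retrieval and its applications to academic literature as the entry point, the paper proposes future development trends for temporal semantic retrieval: recognizing time expressions in multi-source heterogeneous information, building temporal prediction models that can recognize queries, building retrieval platforms that can accurately retrieve temporal intent, and developing deep-learning-based models for the automatic classification of implicit temporal intent. 91 references.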