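CHAO Lemen, ZHANG Chen, SUN Zhizhong. Development of Theoretical Studies and Practical Applications in Data Science [J]. Journal of Library Science in China, 2022, 48(1): 77-93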
Development of Theoretical Studies and Practical Applications in Data Science
Chinese title (translated): Advances in Data Science: Core Theories and Typical Practices
Received: March 26, 2021
Keywords: data science; core theories; typical practices; state of the art
Chinese keywords (translated): data science; core theories; typical practices; current state of research
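Funding: This article is one of the research outcomes of the National Natural Science Foundation of China project "Data Storytelling Methods and Key Technologies for Describing Predictive Analytics Results" (Grant No. 72074214).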
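Author Name    Affiliation
CHAO Lemen    Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education; School of Information Resource Management, Renmin University of China, Beijing 100872
ZHANG Chen    Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education; School of Information Resource Management, Renmin University of China, Beijing 100872
SUN Zhizhong    Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education; School of Information Resource Management, Renmin University of China, Beijing 100872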
Abstract:
Since Peter Naur, a Turing Award winner, coined the term "data science" in 1974, the field has been developing for nearly 50 years. Beyond the questions addressed by earlier studies, a series of deeper questions remain to be answered: What are the core theories and representative practices of data science? How did these theories and practices develop? What are the problems and challenges in data science research? What is the future development trend of data science? To answer them, this paper divides the development of data science, along the two dimensions of theory and practice, into three stages: the introduction period, the growth period, and the maturity period. It then discusses the latest achievements, main challenges, and emerging trends in data science. In the introduction period (1974-2009), theoretical research on data science was conducted mainly by a few computer scientists and statisticians. They argued for the necessity of data science from the perspective of social development and through criticism of traditional disciplines, and discussed some of its basic technologies and main research topics. Although the number of theoretical studies and practical applications was relatively small, their quality was high, and most findings from this period have had a deep influence on the evolution of data science. In the growth period (2010-2014), the number of participants in data science research and industry grew rapidly, and research interest moved beyond discussion of the concept and its indispensability toward deeper topics such as the distinct methods, technologies, tools, and applications of data science. The number of research findings and practical applications increased greatly, but their quality was uneven. In the maturity period (2015-), data science is evolving into a discipline independent of statistics and computer science.
At this stage, attention to data science is not limited to academic research but extends to topics such as industrial application, higher education, discipline construction, and data product development. Furthermore, the knowledge system of data science is gradually taking shape, and consensus has been reached on some of its basic issues.
In addition, four typical categories of challenges in the academic research and practical application of data science are discussed. First, systematic studies of data science itself are insufficient; second, discussions do not focus enough on emerging challenges and novel theories in data science; third, research on some essential topics in data science is lacking; fourth, few practical applications have been carried out under the direct guidance of data science theory. The emerging trends of data science are concentrated in five aspects: focusing on the distinct research questions and social challenges of the big data era, establishing a theoretical system for data science, industrializing data science applications, proposing novel methodologies, and expanding the theoretical foundations.
In conclusion, theoretical studies and practical applications of data science must concentrate on the unique research questions of data science itself, such as meeting the data challenges of the new era, resolving the contradiction between new data and old knowledge, positioning data science within the scientific knowledge system, taking advantage of data science methods and tools, enhancing the problem-solving capabilities of data science, and ensuring its continuous development. 1 fig. 66 refs.
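Chinese abstract (translated):
      Data science has undergone nearly 50 years of development and change, and has had a profound impact on knowledge innovation and social progress. At the present stage, research on data science should not only explore its intension and extension, but also answer a series of deeper questions: what the core theories and representative practices of data science are, how they evolved, what problems and challenges remain, and what the future trends are. To this end, from the two dimensions of core theory and typical practice, this paper divides the development of data science into three stages (the embryonic period, the rapid development period, and the gradual maturity period) and summarizes and analyzes each. At present, theoretical research and practical application in data science suffer from four main types of problems: first, insufficient systematic study of data science itself; second, insufficient focus on new problems in the field; third, insufficient research on the field's core problems; fourth, few practical applications completed under the direct guidance of data science theory. In response, five recommendations are proposed: focus on the new problems and principal contradictions unique to data science, improve the system of theoretical research on data science, accelerate the industrialization of practical applications, promote breakthrough innovation in research methodology, and expand the foundational theory of data science. 1 figure. 66 references.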