Page 238 - Journal of Library Science in China, Vol.47, 2021
Extended English abstracts of articles published in the Chinese edition of Journal of Library Science in China, Vol.47, 2021
literature metrology research fields, for example, research on scientific literature structure and
scientific knowledge structure based on knowledge clustering, research on the citation behavior of
scientists, extension of the concept of triangular citation in literature (keywords triangular citation,
author triangular citation, journal triangular citation, subject triangular citation, polygonal citation),
etc.
Typical responsibilities, key qualifications and higher education for data scientist
CHAO Lemen*, XIAO Jiwen & WANG Xiedong
Data scientist job announcements were collected from Indeed, LinkedIn and Baidu Baipin, and
206 typical cases were selected for the study, covering 8 countries: China, the United States,
the United Kingdom, Germany, Canada, Japan, Australia, and South Korea. The key qualifications
and typical responsibilities of data scientists are then described using cluster analysis and
opinion mining, to provide a basis for training data scientists, especially for building the
data science and big data technology major.
The qualifications for data scientists fall into two categories: data science-specific
qualifications and general-purpose ones. Data science-specific qualifications include SQL
programming, Python/R/SAS, Hadoop MapReduce/HBase/Hive, Spark/Storm, visual analysis with
Tableau, ETL, data warehouse/data lake/BI, statistics, machine learning (including deep
learning), natural language processing, text analysis, and computer vision. General-purpose
qualifications mainly involve the candidate’s capacity for communication and cooperation,
problem-solving, the 3C characteristics of data scientists, independent learning, attention
to detail, stress management, and leadership skills. The main responsibilities of data scientists
include designing data-centric solutions, finding valuable insights in massive data, developing
algorithms/models for specific businesses, hypothesis testing and experimental design, data
governance and data quality control, R&D of data products, innovating on traditional data-based
products, participating in the whole data process, and cross-department/domain cooperation.
In addition, personal charisma, experience in big data competitions and open source communities,
the qualities of a full-stack data scientist, mathematics and programming capabilities,
user-centered design methods, and humanistic issues such as big data privacy protection have
an important influence on the core competencies of data scientists. Meanwhile, emerging business
requirements such as data storytelling, causality analysis, real-time stream processing, and
model deployment/production will become new qualifications for data scientists in the
future. The main implications of this study for the data
* Correspondence should be addressed to CHAO Lemen, Email: chaolemen@pku.org.cn, ORCID: 0000-0001-8218-0348.