Page 238 - Journal of Library Science in China, Vol.47, 2021

                           Extended English abstracts of articles published in the Chinese edition of Journal of Library Science in China, Vol.47, 2021


               literature metrology research fields, for example, research on scientific literature structure and
               scientific knowledge structure based on knowledge clustering, research on the citation behavior of
               scientists, extension of the concept of triangular citation in literature (keyword triangular citation,
               author triangular citation, journal triangular citation, subject triangular citation, polygonal citation),
               etc.



               Typical responsibilities, key qualifications and higher education for data
               scientist

               CHAO Lemen*, XIAO Jiwen & WANG Xiedong
               Data scientist job announcements are collected from Indeed, LinkedIn and Baidu Baipin, and
               206 typical cases are selected for the study, covering 8 countries: China, the United States,
               the United Kingdom, Germany, Canada, Japan, Australia, and South Korea. The key qualifications
               and typical responsibilities of data scientists are then described using cluster analysis and
               opinion mining, to provide a basis for the training of data scientists, especially the
               construction of the data science and big data technology major.
               The qualifications for data scientists can be divided into two categories: data science-specific
               qualifications and general purpose-oriented ones. Data science-specific qualifications include SQL
               programming, Python/R/SAS, Hadoop MapReduce/HBase/Hive, Spark/Storm, visual analysis
               with Tableau, ETL, data warehouse/data lake/BI, statistics, machine learning (including
               deep learning), natural language processing, text analysis, and computer vision. General
               purpose-oriented qualifications mainly involve the candidate’s readiness for communication and
               cooperation, problem-solving, the 3C characteristics of data scientists, independent learning,
               attention to detail, stress management, and leadership skills. The main responsibilities of data scientists
               include designing data-centric solutions, finding valuable insights in massive data, developing
               algorithms/models for specific businesses, hypothesis testing and experimental design, data
               governance and data quality control, R&D of data products, innovating on traditional data-based
               products, participating in the whole data process, and cross-department/domain cooperation.
               In addition, personal charisma, experience in big data competitions and open source
               communities, the qualities of a full-stack data scientist, mathematics and programming
               capabilities, user-centered design methods, and humanistic issues such as big data privacy
               protection have an important influence on the core competencies of data scientists. At the
               same time, emerging business requirements such as data storytelling, causality analysis,
               real-time stream processing, and model deployment/production will become new topics among
               the qualifications of data scientists in the future. The main implications of this study for the data


               * Correspondence should be addressed to CHAO Lemen, Email: chaolemen@pku.org.cn, ORCID: 0000-0001-8218-0348.