Page 178 - Journal of Library Science in China, Vol.45, 2019

               Extended English abstracts of articles published in the Chinese edition of Journal of Library Science in China, Vol.45, 2019






               Research and practice progress of data provenance from the perspective of
               data science
               WANG Fang*, ZHAO Hong, MA Jiayue, LI Xiaoyang & ZHANG Xiaoyue
               In the data age, authenticity and reliability are fundamental requirements for data in many fields.
               Achieving data quality control and reliable management through data provenance has great research
               value and practical significance. Data provenance is not only a technical problem but also a
               management problem, and it deserves more attention from scholars in the fields of data science
               and information resources management.
                 Data provenance is widely applied in scientific data curation, e-commerce, food safety, culture
               and art, medical treatment, digital libraries, electronic document management and many other
               fields, and it has been studied extensively. From the perspective of data science, this
               paper reviews the progress of research and practice in data provenance based on 136 Chinese and
               international research papers. After reviewing the concepts, models, computation methods
               and practical applications of data provenance, this paper introduces the related studies and practice
               in the field of information resources management. Finally, future research trends in data
               provenance are discussed.
                 This paper systematically traces the development of the concept of data provenance and
               introduces five types of data provenance models, classified by their function level and application
               characteristics in data management: the information description model, the general expression
               model, the domain application model, the safety management model and the block-chain provenance
               management model. A model is an abstract framework representing the strategy and process
               of data provenance, while computation comprises the techniques and algorithms that implement it.
               The computation of data provenance follows two basic approaches: tag-based provenance and
               non-tag-based provenance. Specific computing methods have been developed for different
               application scenarios. This paper mainly introduces the computing methods used in typical application
               scenarios such as relational databases, scientific workflows, big data platforms, cloud computing and
               block-chain. This paper also focuses on the research and practice of data provenance in the fields
               of digital libraries, archival information management, online information resources management,
               scientific data sharing and curation, and electronic commerce information systems.
                 On the whole, research on data provenance still has limitations in technology,
               standards and specifications, information security, block-chain integration, and model extension and
               verification. In the future, more in-depth research and practical exploration are needed in these
               areas. This paper is expected to help scholars of information resources management and data


               * Correspondence should be addressed to WANG Fang, Email: wangfangnk@nankai.edu.cn, ORCID: 0000-0002-2655-9975