Page 178 - Journal of Library Science in China, Vol.45, 2019
Extended English abstracts of articles published in the Chinese edition of Journal of Library Science in China, Vol.45, 2019
Research and practice progress of data provenance from the perspective of
data science
WANG Fang*, ZHAO Hong, MA Jiayue, LI Xiaoyang & ZHANG Xiaoyue
In the data age, authenticity and reliability are fundamental requirements of data in many fields.
Achieving data quality control and reliable management through data provenance has great
research value and practical significance. Data provenance is not only a technical problem but also a
management problem, and it deserves more attention from scholars in data science and
information resources management.
Data provenance is widely applied in scientific data curation, e-commerce, food safety, culture
and art, medical treatment, digital libraries, electronic document management and many other
fields, and it has been studied extensively. From the perspective of data science, this
paper reviews the research and practice progress of data provenance based on 136 Chinese and
international research papers. After reviewing the concepts, models, computation methods
and practical applications of data provenance, this paper introduces the related studies and practice
in the field of information resources management. Finally, future research trends in data
provenance are discussed.
This paper systematically traces the development of the concept of data provenance and
introduces five types of data provenance models according to their function level and application
characteristics in data management: the information description model, the general expression
model, the domain application model, the safety management model and the blockchain provenance
management model. A model is the abstract framework that represents the strategy and process
of data provenance, while computation is the technique and algorithm that implements it. The
computation of data provenance follows two basic approaches: tag-based provenance and
non-tag-based provenance. Specific computing methods have been developed for different
application scenarios. This paper mainly introduces the computing methods used in typical application
scenarios such as relational databases, scientific workflows, big data platforms, cloud computing and
blockchain. This paper also focuses on the research and practice of data provenance in the fields
of digital libraries, archival information management, online information resources management,
scientific data sharing and curation, and electronic commerce information systems.
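As a rough illustration of the tag-based approach mentioned above, the following minimal Python sketch attaches a set of source tags to every row and carries those tags through a join and a selection. It is not taken from the reviewed literature: the names `TaggedRow`, `load`, `join` and `select`, the tag format `table:index`, and the sample tables are all hypothetical and chosen only for illustration. A non-tag-based approach would, by contrast, store no tags and would typically reconstruct the sources on demand, for example by inverting the query.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedRow:
    """A row plus the set of source tags it was derived from (hypothetical structure)."""
    values: dict
    tags: frozenset = field(default_factory=frozenset)


def load(table_name, rows):
    # Tag each source row with "<table>:<index>" so it can be traced later.
    return [TaggedRow(dict(r), frozenset({f"{table_name}:{i}"}))
            for i, r in enumerate(rows)]


def select(rows, predicate):
    # Selection keeps a row unchanged, so its tags are kept unchanged too.
    return [r for r in rows if predicate(r.values)]


def join(left, right, key):
    # A joined row depends on both inputs, so it carries the union of their tags.
    out = []
    for l in left:
        for r in right:
            if l.values[key] == r.values[key]:
                out.append(TaggedRow({**l.values, **r.values}, l.tags | r.tags))
    return out


# Illustrative data only.
orders = load("orders", [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}])
products = load("products", [{"sku": "A1", "name": "tea"},
                             {"sku": "B7", "name": "rice"}])

result = select(join(orders, products, "sku"), lambda v: v["qty"] > 1)
for row in result:
    print(row.values, sorted(row.tags))
```

Each output row can be traced back to the exact source rows that produced it by reading its tags, at the cost of storing and propagating tags through every operation.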
On the whole, research on data provenance still has limitations in technology,
standards and specifications, information security, blockchain integration, and model extension and
verification. More in-depth research and practical exploration are needed in these
areas. This paper is expected to help scholars of information resources management and data
* Correspondence should be addressed to WANG Fang, Email: wangfangnk@nankai.edu.cn, ORCID: 0000-0002-2655-9975