Page 253 - Journal of Library Science in China, Vol.47, 2021
252 Journal of Library Science in China, Vol.13, 2021
fields including the representation of assertion semantics and the realization of semantic
linkage between assertions, existing nanopublications fail to reveal the semantic features
and structural characteristics of scientific papers across multiple dimensions and granularities,
which limits their application and services. In view of this, the research reuses domain
ontologies, improves the common nanopublication model, proposes an approach to representing
the assertions of scientific papers of a specific domain and type, and puts it into practice.
Focusing on the semantic features and linkage of Chinese dissertations in the information
retrieval domain, the research extends the common structure of the nanopublication model,
classifies the specific assertion types, and designs nanopublication description models for
Chinese dissertations on information retrieval. It selects a number of Chinese dissertations
on information retrieval as experiment samples and creates RDF named graphs and Turtle data
for the nanopublications. On this basis, empirical research is carried out through case
analysis and dataset application to further verify the usability of the proposed models.
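
As an illustration of the named-graph structure mentioned above, the following minimal sketch builds a nanopublication with the four graphs of the common nanopublication schema (head, assertion, provenance, publication info). It is not the research's own implementation. The `ex:` vocabulary, the example IRIs, the date and the helper names `graph_block` and `build_nanopub` are all hypothetical. The output is TriG, the named-graph extension of Turtle, since plain Turtle cannot express named graphs.

```python
PREFIXES = {
    "np": "http://www.nanopub.org/nschema#",
    "prov": "http://www.w3.org/ns/prov#",
    "dct": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://example.org/ir#",  # hypothetical domain vocabulary
}


def graph_block(iri, triples):
    """Serialise one named graph as a TriG block."""
    body = "\n".join(f"    {s} {p} {o} ." for s, p, o in triples)
    return f"<{iri}> {{\n{body}\n}}"


def build_nanopub(base, assertion, source, created):
    """Return TriG text for a nanopublication with its four named graphs."""
    a, prov, info = (f"{base}#{part}" for part in ("assertion", "provenance", "pubinfo"))
    # The head graph links the nanopublication to its three content graphs.
    head = [
        (f"<{base}>", "a", "np:Nanopublication"),
        (f"<{base}>", "np:hasAssertion", f"<{a}>"),
        (f"<{base}>", "np:hasProvenance", f"<{prov}>"),
        (f"<{base}>", "np:hasPublicationInfo", f"<{info}>"),
    ]
    blocks = [
        graph_block(f"{base}#head", head),
        graph_block(a, assertion),
        # Provenance describes the assertion; pubinfo describes the nanopublication.
        graph_block(prov, [(f"<{a}>", "prov:wasDerivedFrom", f"<{source}>")]),
        graph_block(info, [(f"<{base}>", "dct:created", f'"{created}"^^xsd:date')]),
    ]
    prefixes = "\n".join(f"@prefix {k}: <{v}> ." for k, v in PREFIXES.items())
    return prefixes + "\n\n" + "\n\n".join(blocks) + "\n"


# Illustrative values only; none of them come from the research's dataset.
trig = build_nanopub(
    base="http://example.org/np/1",
    assertion=[("ex:experiment1", "ex:usesTestCollection", "ex:collectionA")],
    source="http://example.org/dissertation/1",
    created="2021-01-01",
)
print(trig)
```

Keeping the assertion in its own graph lets the provenance graph make statements about the assertion as a whole, which is what allows one assertion to be linked to another.
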
The proposed approach to improving nanopublication and extending its description models could
serve as a reference for applying nanopublication in specific domains and for the semantic
organization of Chinese dissertations. The proposed model performs well in the information
retrieval domain by revealing the semantic characteristics of specific statements about
experiment data, such as experiment parameters, experiment models and test collections. The
model covers the core classes of information retrieval and formalizes their relationships,
providing a description model for semantic data that supports the automatic extraction of
assertions and semantic relationships. Combined with term recognition, entity extraction,
machine learning and data cleaning, the model proposed in this study supports assertion
extraction and automatic annotation of Chinese dissertations, and also provides models and
methods for the automatic construction of nanopublications. The model has limitations in
describing domain-specific semantics in other domains when it is applied to creating
nanopublications of Chinese dissertations with different structural and semantic features.
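
To make the typed statements concrete, the sketch below models the three experiment-related statement kinds named above as an enumeration and groups extracted assertions by kind. It is only an illustration under stated assumptions: the class names, field names and sample values are hypothetical and do not reproduce the research's actual description model.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class AssertionType(Enum):
    """Statement kinds about experiment data named in the text."""
    EXPERIMENT_PARAMETER = "experiment parameter"
    EXPERIMENT_MODEL = "experiment model"
    TEST_COLLECTION = "test collection"


@dataclass(frozen=True)
class Assertion:
    """One subject-predicate-object statement tagged with its kind."""
    subject: str
    predicate: str
    obj: str
    kind: AssertionType


def group_by_type(assertions):
    """Group assertions by kind, preserving their input order."""
    grouped = defaultdict(list)
    for assertion in assertions:
        grouped[assertion.kind].append(assertion)
    return dict(grouped)


# Hypothetical sample statements, for illustration only.
sample = [
    Assertion("experiment1", "usesModel", "BM25", AssertionType.EXPERIMENT_MODEL),
    Assertion("experiment1", "hasParameter", "k1=1.2", AssertionType.EXPERIMENT_PARAMETER),
    Assertion("experiment1", "usesTestCollection", "collectionA", AssertionType.TEST_COLLECTION),
]
grouped = group_by_type(sample)
```

Tagging each statement with an explicit kind is one way to keep the domain-specific extension separate from the generic triple structure, so the same container could be reused if other domains defined their own kinds.
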
Scientific papers written in natural language are semantically complex. It is difficult to
identify experiment tasks and procedures, and the necessary experiment assessments are also
required. Therefore, future work needs to establish a large-scale, high-quality and
interlinked corpus of scientific papers, based on an innovative description model of scientific
content, to provide a data foundation for extracting and revealing assertions in scientific papers.
A scientific paper is composed of knowledge units with semantic features and logical relationships.
Future applications of nanopublication to scientific papers should focus on the formal description
of knowledge units and their semantic relationships at fine granularity, with the aim of
constructing multi-level, multi-granularity and multi-dimensional content datasets of scientific
papers.