Page 173 - Journal of Library Science in China, Vol.47, 2021
Secondly, at the level of detecting and extracting archival data objects, this study mainly applies
deep-learning-based image recognition and natural language processing. In the image recognition
task, entity detection is carried out on WU Baokang’s photos from various periods, and the type of
each entity in a photo is determined. This step focuses on identifying WU Baokang in individual and
group photos, and re-annotates important people such as WU Baokang and Селезнев. In addition,
representative physical entities with narrative significance, such as the Shanghai HSBC Building
and the Information Building of Renmin University of China, are also covered by the image
detection and entity labeling tasks.
In terms of natural language processing of the archival data, we use a BERT model to identify
named entities in each volume of WU Baokang’s Academic Chronicle, focusing on the co-occurrence
between WU Baokang and other named entities such as persons, times, places, and documents.
Semantic relationships between entities are extracted and stored as associations centered on verbs.
At the same time, text and photo content are cross-verified on the basis of time nodes, and the
semantic relationships contained in the text data enrich the semantic connotation of the photos.
This forms a knowledge-level fusion of the two data modalities and provides the necessary support
for in-depth interpretation and context recognition of photo archives. The entity recognition
results are shown in Figure 7.
Figure 7. Entity recognition results of WU Baokang’s Academic Chronicle
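The co-occurrence analysis, verb-centered storage, and time-node matching described above can be illustrated with a minimal Python sketch. It is not the study's implementation: it assumes named entities and relations have already been extracted (for example by a BERT-based tagger), and the names `Entry`, `Relation`, `cooccurrence_with`, `index_by_verb`, and `relations_for_photo`, as well as the placeholder values `PERSON_A`, `PLACE_A`, `T1`, and `T2`, are hypothetical and carry no factual content about the archive.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical NER output for one chronicle entry."""
    time_node: str    # placeholder label for the entry's date
    entities: tuple   # (entity text, entity type) pairs


@dataclass(frozen=True)
class Relation:
    """Hypothetical extracted relation: subject, verb, object, time node."""
    subject: str
    verb: str
    obj: str
    time_node: str


def cooccurrence_with(focus, entries):
    """Count how often each other entity shares an entry with `focus`."""
    counts = Counter()
    for entry in entries:
        names = {text for text, _ in entry.entities}
        if focus in names:
            counts.update(names - {focus})
    return counts


def index_by_verb(relations):
    """Group relations by verb, so the verb is the key of each association."""
    index = defaultdict(list)
    for rel in relations:
        index[rel.verb].append(rel)
    return dict(index)


def relations_for_photo(photo_time_node, relations):
    """Return the text relations that share a photo's time node."""
    return [rel for rel in relations if rel.time_node == photo_time_node]


# Placeholder data for illustration only; not taken from the archive.
entries = [
    Entry("T1", (("WU Baokang", "person"), ("PERSON_A", "person"),
                 ("PLACE_A", "place"))),
    Entry("T2", (("WU Baokang", "person"), ("PERSON_A", "person"))),
    Entry("T2", (("PERSON_B", "person"), ("DOCUMENT_A", "document"))),
]
relations = [
    Relation("WU Baokang", "visit", "PLACE_A", "T1"),
    Relation("WU Baokang", "meet", "PERSON_A", "T2"),
]

counts = cooccurrence_with("WU Baokang", entries)
verb_index = index_by_verb(relations)
photo_context = relations_for_photo("T2", relations)
```

Keying the store on the verb makes it straightforward to retrieve every association of a given kind, while the time-node lookup gives a photo the textual relations against which its content can be checked. In practice the time nodes would be normalized dates, and matching might need to tolerate imprecise dating.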
Finally, at the level of context recognition of WU Baokang’s archival data, the preliminary
research mainly relies on manual identification to define the relevant context of the photo
archives at two levels: internal and external. Specifically, the team selected five doctoral and
master’s students with archival research backgrounds and a command of archival research methods to
refer to the International Council on Archives Records in Contexts Ontology (ICA RiC-O) proposed