overfitting of the model caused by insufficient or excessive training rounds, I set the number of
training rounds for a single model to 10, calculate the recognition performance of each migration
over multiple training rounds, and then generalize the optimal model for the domain; the hyperparameter configuration used for training is shown in Table 3.
Table 3. Hyperparameter configuration of BERT-BiLSTM-CRFs model training
Hyperparameter name    Hyperparameter value
batch_size             64
max_seq_length         128
learning_rate          2e-5
dropout_rate           0.5
lstm_size              128
num_train_epochs       10
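For concreteness, the fixed settings in Table 3 can be gathered into a single training configuration. The sketch below is illustrative only: the dictionary values come from Table 3, while select_domain_optimal_model and the train_single_migration callable it expects are hypothetical placeholders for the actual training entry point of the BERT-BiLSTM-CRFs model.

# Hyperparameter values taken from Table 3; the surrounding code is an assumed sketch.
TRAIN_CONFIG = {
    "batch_size": 64,
    "max_seq_length": 128,
    "learning_rate": 2e-5,
    "dropout_rate": 0.5,
    "lstm_size": 128,
    "num_train_epochs": 10,  # fixed at 10 rounds per model to limit under-/overfitting
}

def select_domain_optimal_model(migrations, train_single_migration):
    """Train one model per migration with the fixed hyperparameters and keep the
    model with the best recognition performance on the validation set.
    train_single_migration is a hypothetical callable returning (model, dev_score)."""
    results = [(train_single_migration(m, TRAIN_CONFIG), m) for m in migrations]
    (best_model, best_score), best_migration = max(results, key=lambda r: r[0][1])
    return best_model, best_migration, best_score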
(1) Char2Vec training based on the BERT model. First, the learning corpus is further segmented,
and 1% of the overall corpus is drawn from the training set as the validation set. Then, the pre-
trained model released by Google is used as the initial migration model. Finally, Char2Vec is
implemented through word embedding of the corpus used in this paper and fine-tuning of the
pre-trained model; the results are shown in Table 4.
Table 4. Word embedding of BERT-based ancient poetry text corpus (example)
tokens 纵 横 计 不 就 , 慷 慨 志 犹 存 …
input_ids 101 5288 3566 6369 679 2218 117 2724 2717 2562 4310 …
input_mask 1 1 1 1 1 1 1 1 1 1 1 …
segment_ids 0 0 0 0 0 0 0 0 0 0 0 …
label_ids 8 7 2 4 7 2 4 7 2 4 4 …
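To make the mapping in Table 4 concrete, the sketch below shows how one line of poetry could be converted into the input_ids, input_mask, segment_ids and label_ids rows. It is a minimal illustration assuming the Hugging Face transformers library and the bert-base-chinese checkpoint rather than the original Google TensorFlow pipeline, and the label scheme and label2id mapping in the example call are hypothetical stand-ins for the tag set used in this paper.

from transformers import BertTokenizerFast

MAX_SEQ_LENGTH = 128  # max_seq_length from Table 3
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")

def convert_line(chars, labels, label2id):
    """Convert one character-split poem line and its per-character labels
    into the four feature rows shown in Table 4."""
    enc = tokenizer(
        chars,
        is_split_into_words=True,   # the corpus is already split into characters
        padding="max_length",
        truncation=True,
        max_length=MAX_SEQ_LENGTH,
    )
    # Chinese characters map 1:1 to WordPiece tokens; special tokens ([CLS], [SEP],
    # padding) receive the conventional ignore index -100 in this sketch, whereas the
    # paper's own pipeline uses its own label ids for them (cf. Table 4).
    label_ids = [label2id[labels[i]] if i is not None else -100
                 for i in enc.word_ids()]
    return {"input_ids": enc["input_ids"],
            "input_mask": enc["attention_mask"],
            "segment_ids": enc["token_type_ids"],
            "label_ids": label_ids}

# Example call with a hypothetical label scheme:
# convert_line(list("纵横计不就"), ["B-EMO", "I-EMO", "O", "O", "O"],
#              {"O": 0, "B-EMO": 1, "I-EMO": 2})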
(2) Calculation of emotional term extraction results. Using the optimal machine learning model,
CF, as the baseline, the BERT-BiLSTM-CRFs deep learning model is trained and tested with the
classifier obtained from each migration, and the results are shown in Figure 7.
Figure 7. Calculation of emotional term extraction results based on BERT-BiLSTM-CRFs: (a) calculation of original term extraction results; (b) calculation of differentiated term extraction results
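As a reference for how the per-migration results summarized in Figure 7 can be scored against the baseline, the sketch below computes precision, recall and F1 over sets of extracted emotional terms. It is an illustrative evaluation routine under the assumption that gold and predicted terms are collected as (line id, term text) pairs; it is not taken from the paper's own evaluation code.

def term_prf(gold_terms, pred_terms):
    """Precision, recall and F1 over sets of extracted emotional terms,
    where each term is identified by a (line_id, term_text) pair."""
    gold, pred = set(gold_terms), set(pred_terms)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical usage: score each migration's extractions against the gold terms,
# and do the same for the CF baseline, to obtain the comparison shown in Figure 7.
# p, r, f1 = term_prf(gold_terms, migration_terms)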