Experimental results
In this experiment, we tested the performance of the PNASNet-CBAM model on known emotion labels for 1914 Dunhuang mural figures. The evaluation metrics were accuracy, precision, recall, and the F1 and F2 scores, computed systematically on the labeled test set. Based on these evaluations, the model underwent several rounds of optimization, including adjustments to the network structure, an increase in the number of training epochs, and the use of more advanced feature extraction techniques. The model was trained with the cross-entropy loss function and the Adam optimizer, ultimately achieving satisfactory results. The detailed experimental outcomes are presented in three sections: accuracy data, ablation study results, and a performance comparison with other mainstream models.
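As a reference for the metrics used throughout this section, the following is a minimal sketch (not the authors' implementation) of how accuracy, precision, recall, and the F-beta scores are computed from raw prediction counts for a single class in a one-vs-rest setting; F1 and F2 are the beta = 1 and beta = 2 cases.

```python
# Standard classification metrics from raw counts (one-vs-rest, single class).
# Illustrative sketch only; not the code used in the experiments.

def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    """Share of actual positives the model recovers."""
    return tp / (tp + fn) if tp + fn else 0.0

def f_beta(p: float, r: float, beta: float = 1.0) -> float:
    """F1 weights precision and recall equally; F2 (beta=2) favors recall."""
    if p == 0 and r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)
```

For multi-class evaluation, these per-class values are typically averaged (macro or weighted by sample size) across the eight emotion categories.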
Table 2 presents the performance of the PNASNet-CBAM model across the eight emotion categories, including sample size, accuracy, recall, precision, and F1 metrics. The overall performance of the model is strong, particularly for the ‘happiness,’ ‘fear,’ and ‘sadness’ categories. With a recall of 0.932 and an F1 score of 0.866 for ‘happiness,’ the model effectively captures positive emotion features. The recall for ‘fear’ is 0.919 with an F1 score of 0.829, and the recall for ‘sadness’ is 0.890 with an F1 score of 0.855, demonstrating fine-grained recognition of negative emotions. Overall, recall is 0.892 and accuracy is 0.762, indicating few false negatives and comprehensive coverage of emotional features. Even for harder categories such as ‘anger,’ ‘disgust,’ and ‘neutral,’ the F1 scores remain above 0.76, reflecting strong stability and generalization. Some categories nonetheless lag: ‘surprise’ has a precision of only 0.552 and an F1 score of 0.706, indicating a higher misclassification rate. The newly introduced ‘contempt’ category reaches an F1 score of 0.773, demonstrating the model’s adaptability to extended emotion categories.
Result analysis
The performance of the PNASNet-CBAM model remained stable in the classification task across the eight emotion categories, particularly for ‘happiness,’ ‘fear,’ ‘sadness,’ and ‘neutral,’ where both recall and F1 scores stayed high, showing that the model effectively recognizes the emotional features of these categories. In particular, the overall recall of 0.892 reflects the model’s strength in controlling false negatives. However, recognition was weaker for the ‘contempt’ and ‘disgust’ categories, and performance on complex or emotionally ambiguous categories was moderate. We hypothesize that the visual features of these emotions are subtle and difficult to distinguish, which complicates feature extraction. Class imbalance may also have influenced performance: the ‘fear’ category, for example, had fewer samples in the training set, which can limit the model’s generalization ability for under-represented categories.
This experiment provides crucial data support for the enhancement of the facial emotion recognition task for mural figures, but there remains potential for further refinement in terms of false positives and class imbalance. Future work will focus on increasing the sample size to improve the recognition performance of categories with small sample sizes, refining the emotion classification categories, and optimizing feature extraction techniques. With these improvements, the PNASNet-CBAM model is expected to maintain high accuracy and robustness across all eight emotion categories of mural figures in the future.
Ablation experiment
To evaluate the contribution of key modules in the PNASNet-CBAM model to emotion recognition, we conducted a detailed evaluation based on confusion matrices, shown in Fig. 6: panel (a) presents the results of the model with the CBAM module, while panel (b) shows the results without it. In the confusion matrices, the vertical axis corresponds to the actual labels and the horizontal axis to the predicted labels. TP denotes positive samples correctly predicted as positive, while FN denotes positive samples incorrectly predicted as negative. According to Tables 3 and 4, which report the statistics before and after adding the CBAM module, the total number of positive samples is TP + FN = 935 + 136 = 1071, with FN accounting for 136, or 12.7%. Without the CBAM module, therefore, the model has a higher false negative rate, correctly identifying only 87.3% of positive samples; the sparsity in the lower half of the figure reflects the model’s shortcomings in capturing detailed features. After adding the CBAM module, the TP count increased from 935 to 955 and the FN count decreased to 116, reducing the false negative rate from 12.7% to 10.8%. This demonstrates that the CBAM module effectively enhanced the model’s ability to recognize positive samples. The increased density in the lower half of the confusion matrix further confirms that the CBAM module strengthens key feature extraction through its attention mechanism, improving the model’s ability to capture subtle emotional features in the mural figures.
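The false-negative figures quoted above follow directly from the confusion-matrix counts; a short sketch reproducing the arithmetic from the TP/FN totals reported in Tables 3 and 4:

```python
# Reproduce the false-negative-rate arithmetic from the reported counts.

def fn_rate(tp: int, fn: int) -> float:
    """Share of actual positives the model missed."""
    return fn / (tp + fn)

without_cbam = fn_rate(935, 136)  # 136 / 1071 ≈ 0.127 (12.7% missed)
with_cbam = fn_rate(955, 116)     # 116 / 1071 ≈ 0.108 (10.8% missed)
print(f"{without_cbam:.1%} -> {with_cbam:.1%}")  # prints "12.7% -> 10.8%"
```

Note that the total number of positive samples (1071) is the same in both settings, so the improvement reflects 20 additional positives recovered by the CBAM-equipped model rather than a change in the test set.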

Confusion matrices of the PNASNet model with (a) and without (b) the CBAM module, showing improved emotion recognition performance with CBAM.
The comparison between panels (a) and (b) in Fig. 6 clearly illustrates the improvement in overall model performance achieved by the CBAM module, as reflected in several key metrics. After incorporating the CBAM module, the model’s accuracy increased from 71.89% to 76.16%, recall improved from 87.3% to 89.2%, and the F1 score rose from 0.777 to 0.812. This experiment demonstrates that the CBAM module significantly enhances the model’s ability to recognize facial expressions by optimizing the attention distribution across channel and spatial features. It reduces misclassification, especially in complex backgrounds, thus further enhancing overall performance and providing more reliable support for emotion recognition in Dunhuang mural figures.
Comparison with other advanced peer methods
In this study, we evaluated the emotion recognition network against several mainstream models: LeNet, AlexNet, ResNet50, ResNet101, DenseNet, VGG16, and VGG19. The experimental results show that these models exhibit distinct performance characteristics on the emotion recognition task. Notably, our proposed network achieved an accuracy of 76.16%, substantially outperforming all the comparison networks. This result not only validates the effectiveness of the designed network model but also provides a theoretical and practical foundation for further research in emotion recognition, demonstrating the potential for optimizing model performance within a deep learning framework. Figure 7 presents the accuracy comparison between the PNASNet-CBAM model and the seven other mainstream models.

Comparison of the accuracy of the PNASNet-CBAM model with seven other mainstream models.
Historical evolution of emotional characteristics in Dunhuang Mogao Grotto mural figures
By classifying and analyzing the emotions of characters in the murals, we traced the features and trends that developed over time. As shown in Table 5, based on the Complete Collection of Dunhuang Murals of China, we divided the 2248 Dunhuang murals into six historical periods in chronological order. For each of these six dynasties, we performed emotion classification on all the mural figures and analyzed their emotional characteristics and trends across different historical periods. Since multiple characters with the same emotional category may appear in each mural, we counted each emotion category only once per mural, meaning we counted the number of murals in which a given emotion category appeared.
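The once-per-mural counting rule described above can be sketched as follows; the data layout (a list of murals, each carrying its period and the per-figure emotion labels) is a hypothetical shape chosen for illustration, not the authors' actual data format.

```python
# Illustrative sketch (hypothetical data shape): count, per historical period,
# the number of murals in which each emotion appears, counting an emotion at
# most once per mural even if several figures in that mural share it.
from collections import Counter

def count_murals_per_emotion(murals):
    """murals: iterable of (period, [emotion label per figure]) pairs."""
    counts = Counter()
    for period, figure_emotions in murals:
        for emotion in set(figure_emotions):  # deduplicate within one mural
            counts[(period, emotion)] += 1
    return counts

sample = [
    ("Sui", ["happiness", "happiness", "neutral"]),  # happiness counted once
    ("Sui", ["happiness"]),
    ("Tang", ["anger", "surprise"]),
]
counts = count_murals_per_emotion(sample)
print(counts[("Sui", "happiness")])  # prints 2: two murals, not three figures
```

Deduplicating within each mural keeps the statistics at the mural level, so a single crowded scene with many identically-labeled figures does not dominate the period totals.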
Experimental results based on the PNASNet-CBAM model (Fig. 8) show significant differences in classification accuracy of emotional expressions in Dunhuang mural figures across different historical periods, reflecting a regular pattern of emotional expression as history progressed. The following sections provide a detailed analysis of the emotional evolution data and classification accuracy, focusing on key historical periods.

Trends in the classification accuracy of the eight emotion categories for Dunhuang mural characters.
From the late 4th century to the late 5th century (Northern Liang to Western Wei), Dunhuang murals were primarily religious in theme, with neutral emotions being the most prominent feature (accounting for 46.75%). Classification accuracy reached 73.77%, and the overall style was characterized by solemnity and dignity. During this period, Buddhism had just been introduced to Central China, and rulers used Buddhism to consolidate their power. The murals served to promote religious ideas, with figures predominantly exhibiting detached and peaceful neutral emotions.
In the mid-6th century (Northern Zhou), Dunhuang murals continued with religious themes but began to incorporate more humanized emotions. The proportion of happy emotions increased significantly (28.66%), making the figures more vivid. The localization of Buddhism accelerated, and the fusion of Confucianism, Buddhism, and Taoism gradually shifted the emotional expression of the murals from the earlier monotony and solemnity to lively and dynamic emotional depictions.
The late 6th century to early 7th century (Sui Dynasty) saw a significant increase in happy emotions (accounting for 33.46%), with classification accuracy improving to 75.29%. During the Sui Dynasty, which unified China, the mural style embodied an uplifting and optimistic spirit. Notably, the compassionate smiles in the depictions of Bodhisattvas and flying deities became a central expression of happiness, symbolizing social prosperity and the people’s aspirations for a better life.
From the early 7th century to the early 10th century (Tang Dynasty), emotional expressions in Dunhuang murals became more diverse, with surprise (11.28%) and anger (9.31%) emerging as prominent features. The classification accuracies for these emotions were 79.07% and 76.11%, respectively. This period, marked by the cultural prosperity of the Tang Dynasty, saw Buddhist art reach its golden age. Murals emphasized personalized and dramatic expressions, portraying intense emotions of surprise and anger through religious narratives, which enhanced their narrative depth and emotional impact.
In the 10th century (Five Dynasties to Northern Song), emotional expressions in Dunhuang murals returned to a more stable tone, with neutral emotions rising to 49.72%, and classification accuracy reaching 77.83%. This shift reflects the social turmoil following the fragmentation of the Five Dynasties and the conservatism in artistic creation. The murals predominantly depicted calm, peaceful emotions, in line with the social desire for stability and redemption.
From the 11th century to the mid-14th century (Western Xia to Yuan Dynasty), emotional expressions in Dunhuang murals became more complex, with sadness (15.85%) and contempt (8.45%) reaching historical highs. The classification accuracy for sadness also reached its peak during this period (73.63%). This period saw intensified ethnic conflicts, and the role of Buddhism in moral instruction became more pronounced. The prominence of sadness and contempt in the murals reflected societal suffering and discontent.
The evolution of emotional expression in Dunhuang mural figures is closely tied to historical and cultural contexts. The changes in emotional characteristics across different periods not only reflect the evolution of Buddhist art but also serve as an effective tool for studying social psychological traits. Applying emotion recognition technology to the study of Dunhuang murals not only aids in understanding the evolution of art but also has significant practical implications for the preservation and transmission of cultural heritage.
