Toward enhanced unsupervised clustering of 20th century Korean paintings via multimodal features

Machine Learning


Classification of art-historical movements

A substantial body of research in computational art analysis has focused on the automatic classification of artworks according to categories such as artist, style, or genre. Several studies have specifically addressed automatic artist classification and identification1,10, style classification5,9, and genre classification17,41. Previous research on artist identification, artistic style recognition, and art movement classification has extracted a wide range of features, encompassing both handcrafted features for traditional machine learning approaches6,42 and deep features representing digitized paintings for deep learning models1,10,43.

Regarding handcrafted features, researchers have employed both low-level and high-level descriptors. Low-level features often include color and texture attributes2,44,45, whereas high-level semantic features capture compositional or contextual information46. Some studies have also combined both types of features to enhance classification accuracy5. In the case of Vincent van Gogh, for example, quantitative descriptors such as brushstroke distribution, orientation, width, length, color palette, composition, and shape have been instrumental in defining the artist’s distinctive stylistic signature and facilitating artist identification14,44,47.

Ahmed Elgammal and collaborators have further advanced this field by applying clustering techniques to group artists and artistic schools based on stylistic similarities derived from computational analyses. Their research illustrates how visual features can be quantitatively measured and interpreted to reveal meaningful structural relationships within art history. For instance, large-scale datasets of paintings analyzed through unsupervised clustering have successfully grouped artists not only by individual style but also according to broader movements such as Impressionism, Cubism, and Abstract Expressionism38,39. The results demonstrated that computationally derived clusters frequently aligned with established art-historical taxonomies, while simultaneously uncovering unexpected affinities among artists traditionally regarded as distinct. Such findings underscore the potential of machine learning to both validate existing art-historical frameworks and uncover latent stylistic connections that enrich our understanding of artistic evolution.

In addition to feature-based style classification, scholars have explored content-based clustering approaches. On the stylistic side, in The Shape of Art History in the Eyes of the Machine21, Elgammal et al. demonstrated that convolutional neural networks trained solely on style labels can internally organize artworks into a continuous and historically coherent temporal sequence, thereby reconstructing stylistic movements without explicit temporal or contextual input. Their findings further indicated that the learned visual factors correspond closely to canonical art-historical distinctions, such as those proposed by Heinrich Wölfflin, and that certain artists emerge as prototypical exemplars positioned at stylistic extremes.

In Computational Analysis of Content in Fine Art Paintings40, the analytical focus shifts from style to content. This work examines the presence and distribution of objects and subjects within paintings, the reliability of automated content detection—achieving approximately 68% precision for dominant categories—and the co-occurrence patterns among subjects that define semantically meaningful relationships between content types.

Collectively, these studies demonstrate that machine learning can support clustering along multiple dimensions—both stylistic (temporal or school-based) and thematic (content-based)—thereby enriching art-historical analysis through the revelation of formal stylistic trajectories as well as interconnected thematic networks.

Focusing on classification models

Recent studies on artwork classification have primarily employed generative or discriminative models, or combinations of both30,48,49. Some approaches rely exclusively on generative models, while others focus on discriminative frameworks. In addition, graphical models have been investigated for representation learning, and hypothesis-matching models have been applied to comparative data analysis.

The classification of artworks using convolutional neural networks (CNNs) depends on extracting discriminative visual features while maintaining computational efficiency. Recent advances in attention mechanisms have enhanced the ability of models to capture intricate stylistic details and to distinguish subtle variations between art styles, thereby addressing limitations of conventional CNN architectures30,48,49. By selectively emphasizing semantically relevant image regions, attention-based approaches substantially improve classification accuracy.
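To make "selectively emphasizing" regions concrete, the following is a minimal NumPy sketch of attention pooling over CNN region features. It assumes a single query vector (random here, learned in practice) and scaled dot-product scoring; it is not the architecture of any cited study, and the function name `attention_pool` is hypothetical.

```python
import numpy as np

def attention_pool(region_feats: np.ndarray, query: np.ndarray):
    """Pool region features with scaled dot-product attention against one query.

    region_feats: (n_regions, d), e.g. one row per location of a CNN feature map.
    query: (d,) vector; random here, a learned parameter in a trained model.
    Returns the pooled (d,) descriptor and the (n_regions,) attention weights.
    """
    d = region_feats.shape[1]
    scores = region_feats @ query / np.sqrt(d)   # relevance score per region
    scores -= scores.max()                       # stabilize the exponentials
    weights = np.exp(scores)
    weights /= weights.sum()                     # softmax over regions
    pooled = weights @ region_feats              # weighted average of regions
    return pooled, weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(49, 64))   # e.g. a 7 x 7 feature map with 64 channels
q = rng.normal(size=64)
pooled, w = attention_pool(feats, q)
```

Global average pooling corresponds to the special case of uniform weights (1/49 here). Attention instead lets regions that align with the query dominate the pooled descriptor.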

The extraction of color and texture features remains fundamental for differentiating artistic styles, as distinct movements often display characteristic color palettes and brushwork patterns. Integrating these low-level features within neural network architectures improves learning efficiency and classification accuracy in automated art analysis50,51,52. The Multi-Class Kernel Method further enhances robustness, particularly in the classification of figurative styles that incorporate diverse feature components such as chromatic attributes, texture, morphology, and compositional structure. By mapping these features into high-dimensional spaces, the method captures complex non-linear relationships, thereby enabling more precise differentiation across artistic styles53.
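The kernel idea can be sketched with scikit-learn's `SVC`, assuming color and texture descriptors have already been stacked into a feature matrix. Synthetic data from `make_classification` stand in for real paintings, and the RBF kernel and `C=10.0` are illustrative choices, not the settings of ref. 53.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for per-painting color/texture descriptors (4 style classes).
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)

# The RBF kernel implicitly maps features into a high-dimensional space;
# SVC handles more than two classes by training one-vs-one binary classifiers.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.2f}")
```

Standardizing first matters because heterogeneous descriptors (e.g., mean hue versus GLCM contrast) live on very different scales, which would otherwise distort the RBF distances.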

Transfer learning has also become a critical approach, leveraging pre-trained models to enhance performance on new tasks through the reuse of knowledge from large-scale datasets. This process reduces the need for extensive computational resources while maintaining high classification accuracy48.

Large-scale open datasets have played a central role in advancing computational art analysis. WikiArt54 remains one of the most comprehensive, containing approximately 150,000 artworks from 2,500 artists8,10. Other frequently used datasets include ArtCyclopedia55, Artstor Digital Library46, BBC Painting Dataset56, Mark Harden’s Artchive5, ABC Gallery9, and Artlex & CARLI Digital Collections7. Additional image data have been collected from open-access platforms such as Wikipedia, Flickr, and online museum archives57,58.

Park et al.59 selected images from 25 artists within the WikiArt dataset and addressed class imbalance using a weighted cross-entropy loss function. The dataset was expanded with augmentation techniques, including resizing, horizontal flipping, and rotation, using the Albumentations library. Contrast Limited Adaptive Histogram Equalization (CLAHE) was employed to improve contrast, and the CutMix algorithm was applied to enhance texture representation. By fine-tuning a ResNet50 model, adjusting the fully connected layers while freezing the convolutional weights, the authors achieved significant improvements in artist classification accuracy.
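The class-imbalance correction can be illustrated with a small NumPy sketch of weighted cross-entropy. The inverse-frequency weighting used here is one common scheme and an assumption, since the exact weights used by Park et al. are not stated. Normalizing by the sum of the weights mirrors PyTorch's `CrossEntropyLoss` with a `weight` argument and the default mean reduction.

```python
import numpy as np

def inverse_frequency_weights(labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Weight each class by n_samples / (n_classes * class_count); rarer classes weigh more.

    Assumes every class occurs at least once in `labels`.
    """
    counts = np.bincount(labels, minlength=n_classes).astype(np.float64)
    return counts.sum() / (n_classes * counts)

def weighted_cross_entropy(logits: np.ndarray, labels: np.ndarray,
                           class_weights: np.ndarray) -> float:
    """Class-weighted mean negative log-likelihood (normalized by the sum of weights)."""
    shifted = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]              # per-sample loss
    w = class_weights[labels]
    return float((w * nll).sum() / w.sum())

labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])            # artist 0 is over-represented
weights = inverse_frequency_weights(labels, n_classes=3)  # [0.5, 1.5, 3.0]
loss_uniform = weighted_cross_entropy(np.zeros((9, 3)), labels, weights)
```

With uniform logits every sample has loss ln 3, so the weighted mean is also ln 3. The weights change the loss only when the model's errors are concentrated in particular classes.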

Previous research has explored the computational classification of traditional Chinese paintings (TCP)60,61,62. Li and Wang60 used wavelet transforms and two-dimensional Multi-Resolution Hidden Markov Models (MHMMs) to categorize Chinese ink paintings by style and artist. Jiang et al.61 distinguished TCP images from non-TCP artworks and further classified them into Gongbi (meticulous brushwork, 1,889 images) and Xieyi (freehand) styles using low-level features, such as color, texture, and edge characteristics, combined with a hybrid classifier integrating decision trees and Support Vector Machines (SVM). Their approach achieved accuracy sufficient for the practical differentiation of traditional painting styles. Lu et al.62 developed a TCP classification framework encompassing four artistic movements (Xieyi, Gongbi, Goule, and Shese) and six painters, employing Bayesian classifiers, k-Nearest Neighbor (k-NN), fuzzy C-means clustering, and non-linear multi-class SVMs to compare classification performance across techniques.

More recent studies have increasingly emphasized improving classification accuracy through data augmentation and advanced deep learning models. Baldrati et al.63 introduced a CLIP-based multimodal framework that combines textual and visual features using the NoisyArt dataset, thereby enhancing both classification and retrieval tasks. Their results demonstrated the effectiveness of multimodal learning in computational art analysis, highlighting the value of cross-modal feature integration. Zhong et al.64 proposed a Two-Channel Dual-Path Network (FPTD) that incorporates RGB and brushstroke texture information to improve fine-art painting classification. This method employed a Gray-Level Co-Occurrence Matrix (GLCM) to extract texture features from multiple directions, yielding more accurate classification of style, artist, and genre as well as improved model generalization.

Kim et al.65 advanced this line of inquiry by developing a proxy-learning approach that integrates pre-trained language models with visual data for artistic style analysis. By modeling the semantic relationships between textual descriptions and visual features, their method extracts meaningful visual concepts that significantly enhance automated artwork classification and interpretation. This interdisciplinary approach extends conventional feature extraction methodologies and provides new perspectives on multimodal computational analysis in art history.

Modern and contemporary Korean artists

Prominent figures in modern and contemporary Korean art include Kim Ki-chang, Kim Whan-ki, Do Sang-bong, Park Soo-keun, Yoo Young-kuk, Lee Jung-seob, Chun Kyung-ja, Chang Uc-chin, Byun Kwan-sik, Lee Sang-beom, and Byun Jong-ha. Active throughout the twentieth century, these artists played a pivotal role in defining Korean modernism and shaping a distinct national artistic identity within the global art world. Against the backdrop of Korea’s turbulent modern history, they cultivated highly individual artistic vocabularies that collectively illustrate the evolution of Korean art. Among them, Kim Whan-ki, Park Soo-keun, Yoo Young-kuk, Lee Jung-seob, and Chang Uc-chin are widely recognized as the five leading second-generation Western-style painters and the first generation of Korean modernists66.

Kim Whan-ki (1913–1974) and Yoo Young-kuk (1916–2002) explored Korean modernity through experimental abstraction that bridged traditional aesthetics and contemporary expression. Yoo Young-kuk pursued geometric purity in abstraction, extending pre-war modernist formalism67, whereas Kim Whan-ki incorporated familiar motifs from everyday life to articulate the spiritual dimension of modern Korean experience68. Lee Jung-seob (1916–1956), characterized by a restrained palette and dynamic brushwork, depicted local emotions and resilience through recurring motifs of cows and children69. Chang Uc-chin (1917–1990) engaged with formative compositions inspired by rural life and childhood memories, while Park Soo-keun (1914–1965) portrayed the dignity of ordinary people during the Japanese colonial period and the Korean War, capturing their perseverance and humanity with what critics describe as a “sincere heart and gentle gaze”70.

Of these five, all except Park Soo-keun were members of the New Realism Group (Shinsasilpa)71, an influential collective founded in July 1947. Composed mainly of graduates of Japanese art academies, the group sought to synthesize modernist aesthetics with a renewed Korean identity, responding to the cultural and political upheavals of the post-liberation period.

Kim Ki-chang (1914–2001), an Oriental painter, modernized Joseon-era folk and genre painting through his distinctive “Foolish Painting Style,” reinterpreting traditional narratives with contemporary sensibilities72. Do Sang-bong (1902–1977) infused Korean sentiment into realist painting from the late 1920s to the 1970s, producing works characterized by balanced composition and rich chromatic tone66. Chun Kyung-ja (1924–2015), the only female artist among the eleven, established a unique aesthetic by fusing traditional Korean color palettes with bold, expressive hues. During a period dominated by monochrome ink painting, she introduced new possibilities for chromatic painting, integrating Korean emotionality with modern aesthetics73.

Byun Kwan-sik (1899–1976) preserved the essence of Korean ink traditions while pioneering the “Sojeong Style,” distinguished by diverse ink techniques and innovative compositional layouts74. Lee Sang-beom (1897–1972), influenced by photography and Western painting since the 1920s, became a master of ink-wash landscapes. Through the development of the “Cheongjeon Style,” he transformed conceptual landscapes into realistic depictions of nature, employing mijeomjun techniques to modernize traditional sansu painting75. Byun Jong-ha (1926–2000) expanded the spatial vocabulary of Korean painting through his “three-dimensional painting” style, developed during the 1960s. By introducing depth and materiality to pictorial surfaces, his work blurred the boundaries between painting and sculpture, opening new directions in Korean modern art76.

Dataset composition

In this study, we constructed a dataset comprising 1,100 paintings from eleven representative modern and contemporary Korean artists, with 100 works per artist (Table 1). The selected artists include Kim Ki-chang, Kim Whan-ki, Do Sang-bong, Park Soo-keun, Yoo Young-kuk, Lee Sang-beom, Lee Jung-seob, Chun Kyung-ja, Chang Uc-chin, Byun Kwan-sik, and Byun Jong-ha. The research team conducted a rigorous selection process to ensure the dataset’s representativeness in terms of both stylistic diversity and historical significance within twentieth-century Korean art.

Table 1 Composition of the dataset and artist-specific representative images

The majority of images were sourced from the National Museum of Modern and Contemporary Art (MMCA), which, as of 28 October 2024, provided digital access to 11,479 images, including 3,608 modern paintings. To further expand the dataset, we collected high-resolution images from publicly accessible and reputable sources such as Google Arts & Culture and Google Search. To ensure authenticity and data integrity, we selected only artworks published by authoritative sources that provide verified metadata, including title, creation year, and medium.

For the acquisition of supplementary images, Google Search was used with the artist’s full name as a keyword and the “large size” filter enabled to obtain high-resolution results. Missing metadata (e.g., title, year of creation) were recovered through reverse image searches using Google Lens, cross-referenced with artist foundation archives and other credible institutional databases. When high-quality images were unavailable, only verified reproductions were retained to ensure both reliability and comprehensive coverage of modern and contemporary Korean paintings within the dataset.

Cultural and institutional validation

Following the passing of Lee Kun-hee, the late chairman of the Samsung Group, in 2020, his family made an unprecedented donation of over 23,000 cultural artifacts to the National Museum of Korea (NMK) and the National Museum of Modern and Contemporary Art (MMCA), an event widely described as “the donation of the century.” Of these, 21,600 works were entrusted to the NMK and 1,488 to the MMCA77. This monumental bequest formed the basis for The Lee Kun-hee Collection: Masterpieces of Korean Art, in which the eleven artists featured in our dataset were also prominently represented78. Their inclusion in this nationally curated exhibition affirms their status as canonical figures in modern and contemporary Korean painting, which reduces the risk of bias in artist selection and reinforces both the cultural and academic legitimacy of the dataset.

Beyond establishing cultural and institutional validation, it is equally important to strengthen the linkage between quantitative findings and art-historical interpretation. Rather than merely identifying which computational methods perform effectively for particular artists, analysis should integrate concrete visual evidence—such as close-up examinations of brushstrokes—to elucidate why these methods succeed. Such integration produces richer, more explanatory interpretations that bridge the gap between algorithmic outcomes and aesthetic understanding. This perspective aligns with prior studies such as The Shape of Art History in the Eyes of the Machine21, which demonstrated that machine-learned stylistic features correspond closely with established art-historical taxonomies, and Computational Analysis of Content in Fine Art Paintings40, which revealed meaningful co-occurrence patterns among pictorial subjects through quantitative methods.

Stylistic and temporal diversity

The dataset was designed to balance both stylistic and temporal diversity, thereby reducing sampling bias and enhancing representativeness. The selected artists encompass abstraction, realism, traditional ink painting, expressionism, and folk-inspired hybrid practices, as summarized in Table 2. This classification confirms that the corpus spans a broad spectrum of artistic practices reflective of modern and contemporary Korean art. For example, Lee Sang-beom and Byun Kwan-sik are widely recognized for mastery of traditional ink painting, Kim Whan-ki represents postwar abstract composition, and Park Soo-keun exemplifies modern realism. Moreover, the dataset extends across the twentieth century, covering early modernist tendencies as well as later experimental approaches.

Table 2 Artists classification by stylistic orientation

All works were digitized at high resolution, and image quality and authenticity were manually verified. Metadata—including artist name, year of production (when available), and artistic category (e.g., sketch, watercolor, oil, ink painting)—were documented to provide contextual information and to enable more granular computational analysis. To ensure reproducibility, we recorded the data-collection workflow and the criteria for artist and work selection. By clarifying the distribution of styles and categories, the dataset mitigates concerns about sampling bias and strengthens the reliability of subsequent analyses.

Proposed framework

This study proposes an analytical framework integrating multiple image feature extraction techniques, dimensionality reduction, and clustering to analyze artworks by modern and contemporary Korean painters (Fig. 1). The framework consists of the following stages.

Fig. 1: Overall architecture for clustering.

The proposed framework extracts complementary features (RGB, HSV, GLCM, and CLIP) from each image, concatenates them into a unified representation, and applies dimensionality reduction (t-SNE) followed by K-means clustering to group visually and semantically similar images.

First, the input data comprise images grouped by artist; before processing, object regions are cropped according to pre-existing annotations. The collected artworks are then transformed into complementary visual feature vectors by four encoding modules: RGB mean values79, HSV mean values51, the Gray-Level Co-occurrence Matrix (GLCM)80 for texture analysis, and CLIP embeddings81, supplemented by color histograms. RGB and HSV describe color in two different color spaces, GLCM captures texture, and CLIP provides high-level semantic features.
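The color encoders can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `color_features`, the use of `matplotlib.colors.rgb_to_hsv`, the 8-bin per-channel histogram, and per-channel histogram normalization are all assumptions, and the additional HSV-module descriptors (edge count, dark-pixel ratio, symmetry) are omitted.

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv

def color_features(img_uint8: np.ndarray, bins: int = 8) -> np.ndarray:
    """RGB means, HSV means/stds and RGB histograms for one (H, W, 3) uint8 image."""
    rgb = img_uint8.astype(np.float64) / 255.0       # scale to [0, 1]
    hsv = rgb_to_hsv(rgb)                            # H, S, V each in [0, 1]
    rgb_mean = rgb.reshape(-1, 3).mean(axis=0)       # 3 values
    hsv_flat = hsv.reshape(-1, 3)
    # Note: hue is circular, so its plain mean is only a coarse summary.
    hsv_stats = np.concatenate([hsv_flat.mean(axis=0), hsv_flat.std(axis=0)])  # 6 values
    # One histogram per RGB channel, each normalized to sum to 1.
    hists = [np.histogram(rgb[..., c], bins=bins, range=(0.0, 1.0))[0] for c in range(3)]
    hist = np.concatenate([h / h.sum() for h in hists])                        # 3 * bins
    return np.concatenate([rgb_mean, hsv_stats, hist])

img = np.zeros((4, 4, 3), dtype=np.uint8)
img[..., 0] = 255              # a pure red test image
feats = color_features(img)    # 3 + 6 + 3 * 8 = 33 values
```

For the pure red test image, the RGB means are (1, 0, 0) and the HSV means are (0, 1, 1) with zero spread, which makes the layout of the vector easy to check.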

All feature types were L2-normalized and concatenated with equal weight to form a single multimodal feature vector for each image. Each vector was stored with its corresponding filename to ensure traceability and facilitate retrieval of representative or misclustered samples. Equal-weight fusion was adopted for three reasons: (1) the limited dataset size and heterogeneity could cause learned weights or attention mechanisms to overfit; (2) equal weighting maintains interpretability by transparently balancing color, texture, and semantic cues; and (3) it establishes a stable baseline for evaluating the contribution of each modality. Comparative ablation results are presented in Table 4.
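Equal-weight fusion can be sketched in a few lines, assuming each modality arrives as a row-aligned `(n_images, d_m)` matrix. The function name `fuse_equal_weight`, the dictionary layout, the alphabetical column order, and the file names are hypothetical; the block dimensions are placeholders.

```python
import numpy as np

def fuse_equal_weight(blocks: dict, eps: float = 1e-12) -> np.ndarray:
    """Concatenate per-modality features after L2-normalizing every row of every block.

    blocks maps a modality name to an (n_images, d_m) array with rows in the same
    image order. Each block then has unit norm per image, so no modality dominates
    merely because it has more dimensions.
    """
    parts = []
    for name in sorted(blocks):                       # fixed column order
        X = np.asarray(blocks[name], dtype=np.float64)
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        parts.append(X / np.maximum(norms, eps))      # guard against all-zero rows
    return np.hstack(parts)

rng = np.random.default_rng(0)
n = 5
feats = {"rgb": rng.random((n, 3)), "hsv": rng.random((n, 6)),
         "glcm": rng.random((n, 4)), "clip": rng.normal(size=(n, 512))}
filenames = [f"img_{i:04d}.jpg" for i in range(n)]   # hypothetical names, row-aligned
fused = fuse_equal_weight(feats)                     # shape (5, 525)
```

Because each modality block has unit norm, any single modality contributes at most 4 to the squared Euclidean distance between two images. This bound is what makes "equal weight" meaningful for the distance-based steps that follow.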

Subsequently, all feature vectors were aggregated into a feature matrix representing the dataset’s overall visual characteristics. To visualize this high-dimensional structure, we applied t-SNE for dimensionality reduction, followed by K-means clustering to identify typological similarities and group-level relationships among images. Finally, to assign stylistic labels (e.g., realism, ink painting, abstraction) without task-specific training, we used a zero-shot vision-language approach that matches image embeddings with text-prompt embeddings in a shared semantic space via cosine similarity.

CLIP is a vision–language model trained on large-scale image–text pairs that aligns both modalities within a shared embedding space; this alignment is what makes the zero-shot style assignment described above possible. In our pipeline, we use the pretrained image encoder to extract semantic embeddings without any supervised fine-tuning82. t-SNE is a nonlinear dimensionality reduction method that preserves local neighborhood structure83; we use it to obtain a low-dimensional representation both for visualization and as input to clustering. Because the procedure is unsupervised, labels are attached post hoc by mapping each cluster to its majority label, and we therefore report clustering accuracy rather than supervised classification accuracy.

Given an input image, feature vectors are generated by four modules: RGB, HSV, GLCM, and CLIP. The RGB module describes color composition through the red, green, and blue channels, while the HSV module computes the mean and standard deviation of the hue, saturation, and value components, together with descriptors of edge count, dark-pixel ratio, and symmetry. GLCM, a statistical texture analysis technique, characterizes texture by counting how often pairs of gray levels co-occur at predefined spatial offsets, which yields a detailed statistical measurement of texture. Lastly, CLIP captures semantic associations between text and images by encoding each image into an embedding vector that can be compared with embeddings of textual descriptions; the most relevant description is the one with the highest cosine similarity. We build on this capability to assign style labels through zero-shot matching of image and text-prompt embeddings, without any additional training. In this study, pretrained CLIP encoders are used to improve feature representation and ensure consistency across the dataset.
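The GLCM computation can be made concrete with a NumPy sketch. The quantization to 8 gray levels, the four offsets at distance 1 (0°, 45°, 90°, 135°), and the choice of contrast, homogeneity, and energy as summary properties are assumptions for illustration; the paper does not state its GLCM parameters, and libraries such as scikit-image offer equivalent routines.

```python
import numpy as np

# 0°, 45°, 90°, 135° at distance 1, as (dx, dy) with y pointing down the rows
OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1)]

def glcm(gray: np.ndarray, dx: int, dy: int, levels: int = 8) -> np.ndarray:
    """Symmetric, normalized co-occurrence matrix for one pixel offset."""
    q = (gray.astype(np.int64) * levels) // 256        # quantize 0..255 to 0..levels-1
    h, w = q.shape
    y0, y1 = max(0, -dy), h - max(0, dy)               # rows whose neighbour is inside
    x0, x1 = max(0, -dx), w - max(0, dx)               # columns whose neighbour is inside
    ref = q[y0:y1, x0:x1]
    nbr = q[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
    m = np.zeros((levels, levels))
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1.0)      # count gray-level pairs
    m = m + m.T                                        # count each pair in both orders
    return m / m.sum()

def glcm_features(gray: np.ndarray) -> np.ndarray:
    """Contrast, homogeneity and energy for each of the four offsets (12 values)."""
    feats = []
    for dx, dy in OFFSETS:
        p = glcm(gray, dx, dy)
        i, j = np.indices(p.shape)
        feats += [(p * (i - j) ** 2).sum(),            # contrast
                  (p / (1.0 + (i - j) ** 2)).sum(),    # homogeneity
                  np.sqrt((p ** 2).sum())]             # energy
    return np.array(feats)

flat = np.full((16, 16), 100, dtype=np.uint8)                  # no texture at all
stripes = np.tile(np.array([0, 255], dtype=np.uint8), (8, 4))  # vertical stripes
f_flat = glcm_features(flat)
f_stripes = glcm_features(stripes)
```

The striped test image shows why several directions are useful. Its contrast is maximal along the horizontal and diagonal offsets but zero along the vertical offset, so the descriptor records the orientation of the pattern, much as it would record the dominant direction of brushstrokes.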

The clustering module applies K-means to the low-dimensional feature vectors produced by t-SNE, automatically grouping modern and contemporary Korean paintings by typological similarity. Feature vectors from the four modules are concatenated into a single vector per image and stacked row-wise across the dataset to form a feature matrix. t-SNE reduces this high-dimensional matrix to two dimensions (the x and y axes), allowing the feature space to be visualized. Because the matrix combines four complementary modules, it captures color, texture, and semantic attributes, which supports both clustering and the subsequent evaluation by majority-label assignment. The resulting two-dimensional vectors, which preserve the local neighborhood structure of the full feature space, serve as inputs to the clustering process.
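Stacking and projection can be sketched with scikit-learn's `TSNE`. Synthetic vectors drawn around three random centers stand in for the fused per-image features, and `perplexity=15.0` and PCA initialization are illustrative assumptions rather than the study's reported settings.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for fused per-image vectors: 3 synthetic "painters", 30 images each.
centers = rng.normal(scale=5.0, size=(3, 50))
per_image = [centers[k] + rng.normal(size=50) for k in range(3) for _ in range(30)]

X = np.vstack(per_image)          # feature matrix, one row per image
tsne = TSNE(n_components=2, perplexity=15.0, init="pca", random_state=0)
X2 = tsne.fit_transform(X)        # (n_images, 2): the x and y coordinates
```

t-SNE is stochastic, so fixing `random_state` is what makes a given two-dimensional layout, and therefore the downstream K-means result, reproducible. Perplexity must also be smaller than the number of images.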

K-means clustering then partitions the dataset into K = 11 × 20 = 220 initial clusters, a value chosen to capture the diversity of styles and evolving techniques among the eleven modern and contemporary Korean painters represented. These initial clusters maximize granularity, exposing fine stylistic variations at an early stage. The clusters were then refined by analyzing their sizes, removing small or redundant clusters, and retaining only the larger half to enhance interpretability. Final clustering was performed using the centroids of the retained clusters, and K-means was reapplied to produce the final grouping.
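One reading of this two-stage procedure is sketched below: the retained centroids seed a second K-means run over all images. That reading, the size-only retention rule (redundancy is not modeled), and the function name `two_stage_kmeans` are assumptions. Random 2-D points stand in for the t-SNE output of the 1,100 paintings.

```python
import numpy as np
from sklearn.cluster import KMeans

def two_stage_kmeans(X2: np.ndarray, k_init: int = 11 * 20, seed: int = 0):
    """Over-cluster, keep the larger half of clusters, then re-run K-means from them."""
    # Stage 1: deliberately over-cluster to expose fine stylistic variation.
    stage1 = KMeans(n_clusters=k_init, n_init=10, random_state=seed).fit(X2)
    sizes = np.bincount(stage1.labels_, minlength=k_init)
    # Keep the larger half of the clusters by size, dropping small fragments.
    keep = np.argsort(sizes)[::-1][: k_init // 2]
    seeds = stage1.cluster_centers_[keep]
    # Stage 2: reassign all images, starting from the retained centroids.
    stage2 = KMeans(n_clusters=len(seeds), init=seeds, n_init=1).fit(X2)
    return stage2.labels_, stage2.cluster_centers_

rng = np.random.default_rng(0)
X2 = rng.normal(size=(1100, 2))   # stand-in for the 2-D t-SNE coordinates
labels, centers = two_stage_kmeans(X2)
```

Passing an explicit `init` array with `n_init=1` makes the second stage deterministic given the first, so all remaining randomness sits in the over-clustering step.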

The clustering results were evaluated by selecting representative images for each cluster and analyzing the distribution of ground-truth labels (painter names) to assess cluster purity and mean accuracy. The dominant ground-truth label within each cluster was assigned as its representative label, quantifying the degree to which the clusters corresponded to individual painters’ stylistic tendencies. The outcomes were visualized using representative images and captions, providing interpretable insights into the model’s ability to capture stylistic groupings.
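The majority-label evaluation can be sketched as follows. The function names are hypothetical, ties go to the alphabetically first painter (a behavior of `np.unique` plus `argmax`), and two summary figures are reported: overall accuracy and the unweighted mean of per-cluster purity. Which of these corresponds to the paper's "mean accuracy" is not specified, so both are shown.

```python
import numpy as np

def majority_mapping(cluster_ids: np.ndarray, painter_ids: np.ndarray):
    """Map each cluster to its dominant painter and record that painter's share (purity)."""
    mapping, purity = {}, {}
    for c in np.unique(cluster_ids):
        members = painter_ids[cluster_ids == c]
        values, counts = np.unique(members, return_counts=True)
        mapping[int(c)] = values[counts.argmax()]    # ties: first label in sorted order
        purity[int(c)] = counts.max() / len(members)
    return mapping, purity

def clustering_accuracy(cluster_ids: np.ndarray, painter_ids: np.ndarray) -> float:
    """Fraction of images whose cluster's majority painter is their own painter."""
    mapping, _ = majority_mapping(cluster_ids, painter_ids)
    predicted = np.array([mapping[int(c)] for c in cluster_ids])
    return float((predicted == painter_ids).mean())

clusters = np.array([0, 0, 0, 1, 1, 1, 1])
painters = np.array(["A", "A", "B", "B", "B", "B", "C"])   # hypothetical labels
mapping, purity = majority_mapping(clusters, painters)
acc = clustering_accuracy(clusters, painters)               # 5 of 7 images match
mean_purity = float(np.mean(list(purity.values())))         # (2/3 + 3/4) / 2
```

Overall accuracy equals purity weighted by cluster size, so the two figures diverge when small clusters are much purer or much noisier than large ones.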

In addition, clusters were displayed with circle fill colors denoting ground-truth labels and outline colors representing clustering results, allowing intuitive verification of label alignment. This visualization approach supported qualitative evaluation of stylistic coherence and misclassification patterns. The two-stage clustering strategy thus balanced the capture of artistic diversity during the initial partitioning stage with interpretability in the subsequent refinement. Given that modern and contemporary Korean painters exhibit both distinctive personal styles and intra-artist variations across periods and subjects, this multistage clustering methodology effectively accommodated the dataset’s heterogeneity and stylistic complexity.
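The fill-versus-outline encoding can be reproduced with Matplotlib's `scatter`, which accepts separate `c` and `edgecolors` arrays. This is a sketch with hypothetical painter labels and random cluster assignments; the `tab20` palette is an assumption, and its 20 colors are reused cyclically when there are more clusters than colors.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

def plot_clusters(X2, painter_ids, cluster_ids):
    """Fill color = ground-truth painter, outline color = assigned cluster."""
    cmap = plt.get_cmap("tab20")              # 20 distinct colors, reused cyclically
    painter_index = {p: i for i, p in enumerate(sorted(set(painter_ids)))}
    face = cmap([painter_index[p] % 20 for p in painter_ids])   # (n, 4) RGBA
    edge = cmap([int(c) % 20 for c in cluster_ids])             # (n, 4) RGBA
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(X2[:, 0], X2[:, 1], c=face, edgecolors=edge, linewidths=1.5, s=40)
    ax.set_xlabel("t-SNE x")
    ax.set_ylabel("t-SNE y")
    return fig

rng = np.random.default_rng(0)
X2 = rng.normal(size=(30, 2))
painters = np.array(["A", "B", "C"] * 10)     # hypothetical painter labels
clusters = rng.integers(0, 3, size=30)        # hypothetical cluster assignments
fig = plot_clusters(X2, painters, clusters)
# fig.savefig("clusters.png", dpi=150)
```

A point whose fill and outline disagree with those of its neighbors marks a painting grouped with another painter's work, which is where close visual inspection is most informative.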

For zero-shot style assignment, we constructed label-specific prompt sets in both English and Korean, incorporating descriptors related to medium, brushwork, composition, and degrees of abstraction or expression. For each style label, multiple text prompts were encoded using the CLIP text encoder. The resulting prompt embeddings were L2-normalized and averaged to form a single prototype representation per label. Style prediction was performed by computing cosine similarity between the image embedding and each label prototype, followed by a softmax operation across labels to generate probabilistic style assignments. The complete prompt sets employed for each style category are summarized in Table 3.
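The prototype-and-softmax step can be sketched in NumPy. Random vectors stand in for the outputs of the CLIP text and image encoders, and the label names are hypothetical. Two details are assumptions: each averaged prototype is re-normalized so that dot products are cosine similarities, and a temperature of 0.01 (matching CLIP's logit scale of 100) is used.

```python
import numpy as np

def l2n(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Scale vectors to unit L2 norm along `axis`."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def build_prototypes(prompt_embs: dict):
    """prompt_embs: {style label: (n_prompts, d) text embeddings}; one prototype per label."""
    labels = sorted(prompt_embs)
    protos = np.stack([l2n(prompt_embs[k]).mean(axis=0) for k in labels])
    return labels, l2n(protos)          # re-normalize the averaged prompt embeddings

def zero_shot_probs(image_embs: np.ndarray, protos: np.ndarray,
                    temperature: float = 0.01) -> np.ndarray:
    """Softmax over labels of cosine similarity between images and prototypes."""
    sims = l2n(image_embs) @ protos.T   # (n_images, n_labels) cosine similarities
    logits = sims / temperature
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
d = 16
basis = np.eye(d)
# Random vectors stand in for CLIP text-encoder outputs of three prompts per label.
prompt_embs = {
    "ink painting": basis[0] + 0.05 * rng.normal(size=(3, d)),
    "abstraction": basis[1] + 0.05 * rng.normal(size=(3, d)),
}
labels, protos = build_prototypes(prompt_embs)
image_emb = (basis[0] + 0.05 * rng.normal(size=d))[None, :]   # stand-in image embedding
probs = zero_shot_probs(image_emb, protos)
```

Averaging several prompts per label smooths out the wording sensitivity of individual prompts. The temperature controls only how peaked the resulting probabilities are, not which label ranks first.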

Table 3 Prompts used for automatic image captioning


