Python libraries for AI/ML models can be contaminated by metadata • The Register

Vulnerabilities in popular AI and ML Python libraries used with Hugging Face models, which have been downloaded tens of millions of times, allow remote attackers to hide malicious code in model metadata. That code is then automatically executed when a file containing the tainted metadata is loaded.

The open source libraries – NeMo, Uni2TS, and FlexTok – were created by Nvidia, Salesforce, and Apple, respectively, in collaboration with the Swiss Federal Institute of Technology’s Visual Intelligence and Learning Lab (EPFL VILAB).

All three libraries use Hydra, another Python library that is managed by Meta and commonly used as a configuration management tool for machine learning projects. Specifically, this vulnerability involves Hydra’s instantiate() function.

Palo Alto Networks’ Unit 42 discovered the security flaws and reported them to the libraries’ maintainers, who have since issued security advisories, fixes, and, in two cases, CVEs. The threat hunters say they have not seen these vulnerabilities exploited in the wild so far, but “there is ample opportunity for attackers to exploit them.”

“It’s common for developers, often researchers not affiliated with reputable institutions, to create their own variations of state-of-the-art models with various tweaks and quantizations,” Unit 42 malware research engineer Curtis Carmony said in a Tuesday analysis. “An attacker can simply create a modification of an existing popular model with real or claimed benefits and add malicious metadata.”

Additionally, Hugging Face does not surface metadata content as readily as other files, and does not flag files in the Safetensors or NeMo formats as potentially unsafe.

Hugging Face’s models use over 100 different Python libraries, about 50 of which use Hydra. “While these formats themselves may be secure, the code that utilizes them has a very large attack surface,” Carmony wrote.

The Register contacted Hugging Face and the library maintainers (Meta, Nvidia, Salesforce, and Apple) and received only one response. A Salesforce spokesperson said: “We proactively fixed this issue in July 2025, and there is no evidence of unauthorized access to customer data.”

We will update this story if we hear back from other companies.

Hydra

As previously mentioned, these vulnerabilities stem from how NeMo, Uni2TS, and FlexTok use the hydra.utils.instantiate() function to load configurations from model metadata, opening the door to remote code execution (RCE).

The authors and maintainers of these libraries appear to have overlooked the fact that instantiate() accepts more than just the name of a class to instantiate: it takes the name of any callable and passes it the specified arguments.

This makes it trivial for attackers to accomplish RCE using built-in Python functions such as eval() and os.system().
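To make the mechanics concrete, here is a minimal sketch of the unsafe pattern, not Hydra's actual implementation: a dotted _target_ string from attacker-controlled metadata is resolved to any importable callable and invoked with the attacker's arguments.

```python
import importlib

def unsafe_instantiate(config: dict):
    """Simplified sketch of the pattern behind hydra.utils.instantiate():
    resolve a dotted _target_ path to any importable callable and call it
    with the remaining config entries as arguments."""
    config = dict(config)
    target = config.pop("_target_")
    positional = config.pop("_args_", [])
    module_path, _, attr = target.rpartition(".")
    fn = getattr(importlib.import_module(module_path), attr)
    return fn(*positional, **config)

# A benign config instantiates a class, as intended...
counts = unsafe_instantiate({"_target_": "collections.Counter", "_args_": ["aab"]})

# ...but nothing stops metadata from naming a dangerous builtin instead
# (a deliberately harmless expression stands in for a real payload):
result = unsafe_instantiate({"_target_": "builtins.eval", "_args_": ["21 * 2"]})  # returns 42
```

Swap the harmless eval() expression for a call to os.system() and loading the metadata becomes full code execution.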

Meta has since updated Hydra’s documentation to warn that an RCE can occur when using instantiate() and urge users to add a blocklist mechanism that compares the _target_ value to a list of dangerous functions before it is called. However, at this time, the blocklist mechanism is not available in Hydra releases.
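A check in the spirit of that documentation guidance could look something like this sketch. The names here are invented, and, as noted, no such mechanism has shipped in a Hydra release.

```python
# Hypothetical denylist check along the lines Meta's documentation suggests;
# Hydra itself does not ship this mechanism, and these names are invented.
DANGEROUS_TARGETS = {
    "builtins.eval", "builtins.exec", "builtins.getattr",
    "os.system", "os.popen", "subprocess.run", "subprocess.Popen",
}

def vet_config(config: dict) -> dict:
    """Reject configs whose _target_ names a known-dangerous callable
    before they ever reach hydra.utils.instantiate()."""
    target = config.get("_target_", "")
    if target in DANGEROUS_TARGETS:
        raise ValueError(f"blocked dangerous _target_: {target!r}")
    return config
```

A denylist like this is inherently incomplete, since countless importable callables can do damage, which is one reason allowlisting known-safe classes is the stronger approach.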

Here we take a closer look at three AI/ML libraries that use Hydra’s instantiate() function and related vulnerabilities.

NeMo

NeMo is a PyTorch-based framework created by Nvidia in 2019. Its .nemo and .qnemo files are TAR archives that store the model’s metadata in a model_config.yaml file alongside the weights in .pt or .safetensors format, respectively.

The problem is that the metadata in these NeMo files is not sanitized before being passed to hydra.utils.instantiate(). An attacker can therefore craft a .nemo file with maliciously crafted metadata that, when loaded by a victim, achieves RCE or tampers with data.
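Because a .nemo file is just a TAR archive, assembling one with tainted metadata takes only a few lines. This sketch uses a deliberately harmless payload and an invented config layout purely to illustrate the shape of the attack:

```python
import io
import tarfile

# Illustrative model_config.yaml whose _target_ names a Python builtin
# instead of a model class; the eval expression here is harmless.
TAINTED_CONFIG = b"""\
model:
  _target_: builtins.eval
  _args_: ["21 * 2"]
"""

# Pack the tainted metadata into an in-memory TAR, mimicking a .nemo file.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo("model_config.yaml")
    info.size = len(TAINTED_CONFIG)
    tar.addfile(info, io.BytesIO(TAINTED_CONFIG))

# Anything that untars this archive and feeds the config straight into
# hydra.utils.instantiate() would execute the attacker's chosen callable.
buf.seek(0)
names = tarfile.open(fileobj=buf).getnames()
```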

Nvidia issued CVE-2025-23304 to track the high-severity bug and released a fix in NeMo version 2.3.2.

NeMo is also integrated with Hugging Face, and once a victim downloads a booby-trapped model, the same code path can be followed to exploit this vulnerability.

According to Unit 42, more than 700 models from various developers are available on Hugging Face in the NeMo file format.

Uni2TS

Uni2TS is a PyTorch library created by Salesforce and used in Moirai, its foundation model for time series forecasting, along with the set of related models published on Hugging Face.

This library works only with .safetensors files, a format created by Hugging Face for safely storing tensors, in contrast to pickle, which allows arbitrary code execution during loading.
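The contrast with pickle is easy to demonstrate: a pickled object can define __reduce__ to run arbitrary code the moment it is deserialized, which is exactly what .safetensors was designed to prevent. A harmless sketch:

```python
import pickle

class Payload:
    # pickle consults __reduce__ when serializing; on load, it calls the
    # returned callable with the given arguments, so merely deserializing
    # this object executes eval() -- here with a harmless expression.
    def __reduce__(self):
        return (eval, ("21 * 2",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # code runs during loading; result is 42
```

A real payload would call os.system() or similar instead, which is why loading pickle files from untrusted sources is considered unsafe by design.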

Salesforce models that use these libraries have hundreds of thousands of downloads on Hugging Face, and other users have published several modifications of these models.

Hugging Face also provides a PyTorchModelHubMixin interface for creating custom model classes that can be integrated with the rest of the framework.

This interface provides a mechanism for registering decoder functions. And, as you might expect, the Uni2TS library uses that mechanism to decode the configuration of certain arguments through calls to hydra.utils.instantiate().

On July 31st, Salesforce published CVE-2026-22584 and deployed a fix.

FlexTok

Early last year, Apple and EPFL VILAB created FlexTok, a Python-based image tokenization framework for AI/ML models.

Similar to Uni2TS, FlexTok uses only .safetensors files, extends PyTorchModelHubMixin, and can load configuration and metadata from those files. After FlexTok decodes the metadata, it passes it to hydra.utils.instantiate(), triggering the vulnerability.

“As of January 2026, Hugging Face’s models do not appear to use the ml-flextok library, with the exception of models published by EPFL VILAB that have totaled tens of thousands of downloads,” Carmony wrote.

Apple and EPFL VILAB fixed these security issues by parsing the configuration with YAML. The maintainers also added an allowlist of classes that may be passed to Hydra’s instantiate() function and updated the documentation to state that only models from trusted sources should be loaded. ®
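An allowlist wrapper of the kind described could look like this sketch; the class names and function are stand-ins, not the actual FlexTok code.

```python
import importlib

# Stand-in allowlist; a real project would enumerate its own model classes.
ALLOWED_TARGETS = {"collections.Counter", "collections.OrderedDict"}

def safe_instantiate(config: dict):
    """Only resolve _target_ values that appear on an explicit allowlist,
    refusing everything else before any code is imported or called."""
    config = dict(config)
    target = config.pop("_target_")
    if target not in ALLOWED_TARGETS:
        raise ValueError(f"refusing untrusted _target_: {target!r}")
    module_path, _, attr = target.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)(**config)
```

Unlike a denylist, this fails closed: anything not explicitly approved, including eval() and os.system(), is rejected before it is ever imported.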


