This tutorial demonstrates a realistic data poisoning attack by manipulating labels in the CIFAR-10 dataset and observing the effect on model behavior. We use a ResNet-style convolutional network to build clean and poisoned training pipelines side by side, so the learning dynamics stay stable and comparable. By selectively flipping the labels of a fraction of target-class samples to a malicious class during training, we show how subtle corruptions in the data pipeline propagate into systematic misclassifications at inference time.
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
CONFIG = {
    "batch_size": 128,
    "epochs": 10,
    "lr": 0.001,
    "target_class": 1,       # CIFAR-10 class 1 = automobile
    "malicious_label": 9,    # CIFAR-10 class 9 = truck
    "poison_ratio": 0.4,
    "device": torch.device("cuda" if torch.cuda.is_available() else "cpu"),
}
torch.manual_seed(42)
np.random.seed(42)
Set up the core environment for the experiment and define all global configuration parameters in one place. Fix the random seeds across PyTorch and NumPy for reproducibility, and explicitly select the computing device in CONFIG so the tutorial runs on both CPUs and GPUs.
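If you need stricter reproducibility on GPU, the optional snippet below (a minimal sketch, not part of the original script) extends the seeding above with cuDNN determinism settings; note that it can slow training.

# Optional, stricter determinism (a sketch; may slow training on GPU)
torch.cuda.manual_seed_all(42)             # seed all visible GPUs
torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # disable cuDNN autotuning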
class PoisonedCIFAR10(Dataset):
    def __init__(self, original_dataset, target_class, malicious_label, ratio, is_train=True):
        self.dataset = original_dataset
        self.targets = np.array(original_dataset.targets)
        self.is_train = is_train
        if is_train and ratio > 0:
            indices = np.where(self.targets == target_class)[0]
            n_poison = int(len(indices) * ratio)
            poison_indices = np.random.choice(indices, n_poison, replace=False)
            self.targets[poison_indices] = malicious_label

    def __getitem__(self, index):
        img, _ = self.dataset[index]
        return img, self.targets[index]

    def __len__(self):
        return len(self.dataset)
Implement a custom dataset wrapper that gives you control over label poisoning during training. It flips a configurable fraction of target-class labels to the malicious class while leaving the test data intact. The original image data is preserved, so only the label integrity is compromised.
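As a sanity check, a small helper like the hypothetical count_flipped below (not part of the original script) can confirm that roughly poison_ratio of the target-class labels were actually flipped once the datasets are built later in the tutorial.

def count_flipped(poisoned_targets, original_targets, target_class, malicious_label):
    # Count target-class samples whose label was rewritten to the malicious label.
    original_targets = np.asarray(original_targets)
    poisoned_targets = np.asarray(poisoned_targets)
    flipped = (original_targets == target_class) & (poisoned_targets == malicious_label)
    return int(flipped.sum())

# Example (run after poison_ds and base_train exist): CIFAR-10 has 5,000 training
# images per class, so with poison_ratio=0.4 expect about 2,000 flips.
# print(count_flipped(poison_ds.targets, base_train.targets,
#                     CONFIG["target_class"], CONFIG["malicious_label"]))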
def get_model():
    model = torchvision.models.resnet18(num_classes=10)
    model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    model.maxpool = nn.Identity()
    return model.to(CONFIG["device"])
def train_and_evaluate(train_loader, description):
    print(f"Training: {description}")
    model = get_model()
    optimizer = optim.Adam(model.parameters(), lr=CONFIG["lr"])
    criterion = nn.CrossEntropyLoss()
    for _ in range(CONFIG["epochs"]):
        model.train()
        for images, labels in train_loader:
            images = images.to(CONFIG["device"])
            labels = labels.to(CONFIG["device"])
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
    return model
Define a lightweight ResNet-18 adapted to CIFAR-10 (a 3x3 stem convolution and no initial max-pooling) and implement a complete training loop. Train the network with standard cross-entropy loss and Adam optimization for stable convergence. To isolate the effect of data poisoning, keep the training logic identical for the clean and poisoned runs.
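If you also want a single headline accuracy number per model in addition to the per-class reports below, a small helper such as the hypothetical evaluate_accuracy sketched here (not part of the original script) can be reused for both models.

def evaluate_accuracy(model, loader):
    # Overall top-1 accuracy of a trained model on a data loader.
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images = images.to(CONFIG["device"])
            labels = labels.to(CONFIG["device"])
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total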
def get_predictions(model, loader):
    model.eval()
    preds, labels_all = [], []
    with torch.no_grad():
        for images, labels in loader:
            images = images.to(CONFIG["device"])
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            preds.extend(predicted.cpu().numpy())
            labels_all.extend(labels.numpy())
    return np.array(preds), np.array(labels_all)

def plot_results(clean_preds, clean_labels, poisoned_preds, poisoned_labels, classes):
    fig, ax = plt.subplots(1, 2, figsize=(16, 6))
    for i, (preds, labels, title) in enumerate([
        (clean_preds, clean_labels, "Clean Model Confusion Matrix"),
        (poisoned_preds, poisoned_labels, "Poisoned Model Confusion Matrix")
    ]):
        cm = confusion_matrix(labels, preds)
        sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=ax[i],
                    xticklabels=classes, yticklabels=classes)
        ax[i].set_title(title)
    plt.tight_layout()
    plt.show()
Run inference on the test set and collect predictions for quantitative analysis. Compute confusion matrices to visualize the per-class behavior of both the clean and poisoned models, and use these visual diagnostics to highlight the targeted misclassification pattern introduced by the attack.
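To quantify the attack beyond the confusion matrices, one useful metric is the attack success rate: the fraction of true target-class test images that the model predicts as the malicious label. The hypothetical helper below (not part of the original script) computes it from prediction arrays like those returned by get_predictions.

def attack_success_rate(preds, labels, target_class, malicious_label):
    # Fraction of true target-class samples predicted as the malicious label.
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    mask = labels == target_class
    if mask.sum() == 0:
        return 0.0
    return float(np.mean(preds[mask] == malicious_label))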
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465),
                                 (0.2023, 0.1994, 0.2010))
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    normalize
])
base_train = torchvision.datasets.CIFAR10(root="./data", train=True, download=True, transform=train_transform)
base_test = torchvision.datasets.CIFAR10(root="./data", train=False, download=True, transform=test_transform)
classes = base_train.classes
clean_ds = PoisonedCIFAR10(base_train, CONFIG["target_class"], CONFIG["malicious_label"], ratio=0)
poison_ds = PoisonedCIFAR10(base_train, CONFIG["target_class"], CONFIG["malicious_label"], ratio=CONFIG["poison_ratio"])
clean_loader = DataLoader(clean_ds, batch_size=CONFIG["batch_size"], shuffle=True)
poison_loader = DataLoader(poison_ds, batch_size=CONFIG["batch_size"], shuffle=True)
test_loader = DataLoader(base_test, batch_size=CONFIG["batch_size"], shuffle=False)
clean_model = train_and_evaluate(clean_loader, "Clean Training")
poisoned_model = train_and_evaluate(poison_loader, "Poisoned Training")
c_preds, c_true = get_predictions(clean_model, test_loader)
p_preds, p_true = get_predictions(poisoned_model, test_loader)
plot_results(c_preds, c_true, p_preds, p_true, classes)
print(classification_report(c_true, c_preds, labels=[CONFIG["target_class"]],
                            target_names=[classes[CONFIG["target_class"]]]))
print(classification_report(p_true, p_preds, labels=[CONFIG["target_class"]],
                            target_names=[classes[CONFIG["target_class"]]]))
Prepare the CIFAR-10 dataset, build a clean data loader and a poisoned data loader, and run both training pipelines end-to-end. Evaluate both trained models on the same test set to ensure a fair comparison, and complete the analysis by reporting precision and recall for the target class to reveal the effect of the poisoning.
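As a final numeric summary, a short sketch like the following (it relies on the hypothetical evaluate_accuracy and attack_success_rate helpers defined earlier, which are not in the original script) compares overall accuracy, target-class recall, and the attack success rate of the two models directly from the prediction arrays.

target = CONFIG["target_class"]
clean_recall = float(np.mean(c_preds[c_true == target] == target))
poison_recall = float(np.mean(p_preds[p_true == target] == target))
print(f"Overall test accuracy (clean)       : {evaluate_accuracy(clean_model, test_loader):.3f}")
print(f"Overall test accuracy (poisoned)    : {evaluate_accuracy(poisoned_model, test_loader):.3f}")
print(f"Target-class recall (clean model)   : {clean_recall:.3f}")
print(f"Target-class recall (poisoned model): {poison_recall:.3f}")
print(f"Attack success rate (poisoned model): "
      f"{attack_success_rate(p_preds, p_true, target, CONFIG['malicious_label']):.3f}")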
In conclusion, we observed how label-level data poisoning degrades class-specific performance without necessarily destroying overall accuracy. We analyzed this behavior using confusion matrices and per-class classification reports to reveal the targeted failure mode introduced by the attack. This experiment highlights the importance of data provenance, validation, and monitoring in real-world machine learning systems, especially in safety-critical areas.
