Right to Digital Erasure and Machine Unlearning




This article is one of the winning entries (Ranked 3rd) of Lexathon organised by NLU, Odisha, a technology law conclave on AI, data protection, and innovation which took place in April, 2026.

Introduction

In the famous case of K.S. Puttaswamy (Privacy-9J.) v. Union of India1, the Supreme Court of India elevated informational self-determination to constitutional status under Article 21 of the Constitution. Under the Digital Personal Data Protection Act, 2023 (DPDP Act), data principals possess the right to seek correction, completion, updating, and erasure of personal data.2 In conventional database systems, compliance is straightforward; a record can be located, identified, and deleted, and an audit trail produced. However, artificial intelligence (AI) systems fundamentally alter this paradigm. When personal data trains a neural network, it is not stored as a discrete entry but used in an iterative optimisation process adjusting millions or billions of parameters. An individual data point’s contribution is distributed across the entire parameter space, making traditional deletion technically ambiguous.

Even if a developer claims to have removed the influence of an individual’s data through retraining or approximate unlearning, how can that claim be verified? The Delhi High Court, in Jorawer Singh Mundy v. Union of India3, recognised the right to be forgotten as a facet of the fundamental right to privacy, ordering de-indexing of certain judicial records. Yet erasure of data influence embedded in a trained model is qualitatively different from removal of access to discrete information. This paper proposes a framework integrating machine unlearning with zero-knowledge cryptographic proof systems to create verifiable compliance mechanisms. The objective is not metaphysical forgetting but procedurally and mathematically demonstrable exclusion: proving, with cryptographic certainty, that a specific data point was not included in the dataset used to train or retrain a model.

Right to erasure (RTE) under the DPDP Act: Legislative framework

The DPDP Act applies to the processing of digital personal data within the territory of India and extends to processing outside India in connection with offering goods or services to data principals within India. Processing is lawful where based on informed consent4, or within the categories of legitimate use under Section 7. Section 12 grants data principals the right to erasure of personal data for the processing of which they have previously given consent. Section 12(3) provides that a data principal may request erasure, and the data fiduciary must comply unless retention of such data is essential for any prescribed purpose.5 Section 8(7) further mandates erasure where the data principal withdraws consent or where the specified purpose is no longer being served6.

These provisions construct a framework assuming personal data is identifiable and removable. The Justice B.N. Srikrishna Committee, which laid the intellectual groundwork for India’s data protection framework, had envisaged erasure as a core component of individual data autonomy, emphasising that the right to have personal data erased is essential to preventing the indefinite persistence of information that may be outdated, irrelevant, or harmful.

However, the statute does not provide technical guidance on how erasure should operate when personal data has been incorporated into the training of AI systems. There is no statutory distinction between structured database storage and machine learning training data. The Act does not define what constitutes erasure in the context of data that has been used to train a statistical model and has thereby contributed to the adjustment of parameters that collectively determine the model’s behaviour. This silence creates significant interpretive challenges. Two legal questions emerge with particular force. Firstly, does erasure require only the removal of the data record from the data fiduciary’s storage systems, or does it also require the removal of the data’s influence from trained models that were built using that data? Secondly, what constitutes adequate proof of compliance with an erasure request when the data in question has been processed through a machine learning pipeline?

If a model continues to reflect patterns derived from an individual’s data, the right to erasure risks becoming symbolic rather than substantive. An individual whose personal data was used to train a model that subsequently makes decisions affecting her, such as credit scoring, insurance pricing, or content recommendation, may find that the erasure of her raw data record provides no meaningful relief if the model’s parameters continue to encode her data’s statistical influence. At the same time, requiring complete retraining from scratch for every individual erasure request may be economically and computationally infeasible, particularly for large-scale models trained on millions or billions of data points. The legislation must also contend with the conflict between personal freedom and technical progress. The Orissa High Court noted in Subhranshu Rout v. State of Odisha that it is impossible to take back material that has entered the public domain, raising concerns about privacy and related rights.7 Data persistence is a further problem in AI systems: data absorbed into a model continues to influence future outputs, which complicates the data fiduciary’s obligations, including the duty under Section 8(5) to protect personal data through reasonable security safeguards.8

The technical problem of machine unlearning

1. How neural networks encode data

Neural networks learn by adjusting internal parameters through iterative optimisation algorithms, most commonly variants of stochastic gradient descent.9 Each data sample contributes to incremental parameter adjustments aggregated across the entire training dataset. The resulting parameters reflect the cumulative influence of all data samples; there is no one-to-one mapping between a specific data point and specific parameters.10 Research has demonstrated that training data can leave detectable traces: Large language models may reveal memorised training data through targeted querying,11 and model inversion attacks can reconstruct approximate representations of training data from model outputs. These findings confirm that the right to erasure is a practical necessity in the AI context, yet the technical architecture makes it exceedingly difficult to isolate and remove such traces.
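The diffusion of a single record’s influence can be seen even in a very small model. The following is a minimal sketch, not drawn from the cited sources: it fits a three-parameter linear model by stochastic gradient descent on synthetic data, once with every record and once with the first record left out. The function names, the data, the learning rate and the epoch count are all assumptions made for illustration.

```python
import random

def sgd_train(samples, epochs=50, lr=0.05, seed=0):
    """Fit y ~ w.x by plain stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        order = list(range(len(samples)))
        rng.shuffle(order)
        for i in order:
            x, y = samples[i]
            err = sum(wj * xj for wj, xj in zip(w, x)) - y
            # One sample nudges every coordinate of w at once.
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]
    return w

def make_data(n=40, seed=1):
    """Synthetic records (hypothetical): three features and a noisy label."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(3)]
        y = 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2] + rng.gauss(0, 0.1)
        data.append((x, y))
    return data

data = make_data()
w_full = sgd_train(data)          # trained on every record
w_without = sgd_train(data[1:])   # same procedure, first record left out
shift = [abs(a - b) for a, b in zip(w_full, w_without)]
print(shift)                      # small but non-zero in every coordinate
```

No single weight "holds" the omitted record; its absence shows up as a small change spread over all of them, which is why deleting the raw record alone does not obviously undo its effect on the model.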

2. Existing unlearning techniques

The most straightforward approach to machine unlearning is full retraining without the target data, which by definition produces a model free from the removed data’s influence. However, full retraining is often prohibitively expensive. Bourtoule and others proposed Sharded, Isolated, Sliced, and Aggregated (SISA) training, which partitions data into isolated shards, each used to train a separate sub-model, so that only the affected shard requires retraining upon erasure. Approximate methods include the formalism of Cao and Yang, influence function-based approaches building on Koh and Liang,12 and fine-tuning removal techniques.
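The sharding idea can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the implementation of Bourtoule and others: the "sub-model" is merely the mean label of a shard, records are assigned to shards round-robin, and the slicing (checkpointing within a shard) that SISA also uses is omitted. The class and function names are hypothetical.

```python
from statistics import mean

def train_submodel(shard):
    """Stand-in learner (assumption): the 'model' is the mean label of the shard."""
    return mean(label for _, label in shard) if shard else 0.0

class ShardedEnsemble:
    def __init__(self, records, num_shards):
        # records: list of (record_id, label); shard assignment is fixed up front.
        self.shards = [[] for _ in range(num_shards)]
        self.location = {}
        for idx, (rid, label) in enumerate(records):
            s = idx % num_shards
            self.shards[s].append((rid, label))
            self.location[rid] = s
        self.models = [train_submodel(sh) for sh in self.shards]

    def predict(self):
        # Aggregate the isolated sub-models (a simple average here).
        return mean(self.models)

    def unlearn(self, rid):
        """Drop one record and retrain only the shard that contained it."""
        s = self.location.pop(rid)
        self.shards[s] = [(r, y) for r, y in self.shards[s] if r != rid]
        self.models[s] = train_submodel(self.shards[s])
        return s

records = [(f"user{i}", float(i)) for i in range(12)]
ensemble = ShardedEnsemble(records, num_shards=4)
before = list(ensemble.models)
retrained_shard = ensemble.unlearn("user9")
after = list(ensemble.models)
```

Only the sub-model for the affected shard changes between `before` and `after`; the other three are untouched, which is the source of the cost saving over full retraining.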

All approximate methods share important limitations: they may not completely eliminate the target data’s influence,13 may degrade model performance, and may not scale effectively. Most critically, they generally lack verifiable guarantees. A comprehensive survey has confirmed that the absence of rigorous verification mechanisms remains a significant gap.14 Without verification, any claim of unlearning remains an assertion rather than a demonstrated fact, which is insufficient for legal compliance.

The verification gap

The central issue for legal compliance is whether machine unlearning can be verified by an independent party. In conventional databases, a regulator can examine records and audit logs to confirm deletion. In AI systems, verification is qualitatively different: the question is whether a trained model has been modified to remove the influence of a specific data point. Thudi and others argued that existing unlearning methods lack formally verifiable guarantees and are therefore insufficient to meet legal standards.15 This creates an accountability deficit in AI-based data processing.16 Data principals cannot independently verify whether their data’s influence has been removed. The Data Protection Board of India is empowered to adjudicate complaints,17 but the Act does not equip it with technical standards for auditing AI training pipelines.18 The verification problem is compounded by the tension between transparency and trade secret protection: forcing full disclosure imposes commercial costs, while self-certification undermines the statutory purpose. Existing approaches such as membership inference attacks19 and differential privacy techniques20 offer partial solutions but do not provide direct evidence that a specific data point was excluded from a specific training run. What is needed is a mechanism allowing a data fiduciary to prove, with mathematical certainty, that a model was trained without a specific data point, without revealing proprietary information. Zero-knowledge proof systems have been identified as a promising candidate.

Cryptographic verification as a solution

1. Zero-knowledge proofs

A zero-knowledge proof is a cryptographic protocol that allows one party (the prover) to convince another party (the verifier) that a statement is true without disclosing any information other than the validity of the statement itself. The concept was introduced in a foundational paper published in 1989, which formalised the notion of knowledge complexity in interactive proof systems.21 The essential insight is that it is possible to construct mathematical protocols in which the act of proving a statement does not require the disclosure of the underlying data or computation that makes the statement true.22 A zero-knowledge proof must satisfy three properties. Firstly, completeness: if the statement is true and both parties follow the protocol, the verifier will accept the proof. Secondly, soundness: if the statement is false, no dishonest prover can convince the verifier of its truth, except with negligible probability. Thirdly, zero-knowledge: if the statement is true, the verifier learns nothing beyond the fact that the statement is true. These properties make zero-knowledge proofs uniquely suited to compliance verification in contexts where the data or computation underlying a claim must remain confidential.
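A classical textbook instance may help to fix ideas. The sketch below uses the Schnorr identification protocol, which is a different and far simpler construction than the proof-of-training systems discussed in this paper: the prover shows that she knows a secret exponent x behind a public value y without revealing x. The group parameters are deliberately tiny and offer no security, and the function names are chosen only for illustration.

```python
import secrets

# Toy parameters (assumption, insecure): G generates a subgroup of prime order Q modulo P.
P, Q, G = 23, 11, 2

def keygen():
    x = secrets.randbelow(Q - 1) + 1        # prover's secret
    return x, pow(G, x, P)                  # public value y = G^x mod P

def commit():
    r = secrets.randbelow(Q)                # fresh randomness hides x in the response
    return r, pow(G, r, P)                  # commitment t = G^r mod P

def respond(x, r, challenge):
    return (r + challenge * x) % Q

def verify(y, t, challenge, s):
    # Accept exactly when G^s = t * y^challenge (mod P).
    return pow(G, s, P) == (t * pow(y, challenge, P)) % P

def run_protocol(x, y):
    r, t = commit()                         # prover -> verifier
    challenge = secrets.randbelow(Q)        # verifier -> prover
    s = respond(x, r, challenge)            # prover -> verifier
    return verify(y, t, challenge, s)

x, y = keygen()
accepted = run_protocol(x, y)               # an honest prover is always accepted
```

An honest run is always accepted, which illustrates completeness. With real-sized parameters, a prover who does not know x can answer a random challenge correctly only with negligible probability (soundness), and the verifier sees only values that it could have simulated without the secret, which is the sense in which an honest verifier learns nothing about x.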

In the context of machine unlearning, a zero-knowledge proof would allow a data fiduciary to prove that a model was trained on a dataset that did not include a specific data point, without revealing the training dataset, the model’s parameters, or the details of the training algorithm. The verifier, whether a regulator, a court, or the data principal herself, could be mathematically convinced of compliance without gaining access to any proprietary information. Recent research has demonstrated the feasibility of zero-knowledge proofs for machine learning training processes. Garg and others constructed and experimentally validated zero-knowledge proofs of training for logistic regression models, demonstrating that it is possible to generate proofs that a model was honestly trained on a committed dataset.23 Abbaszadeh and others extended this work to deep neural networks, constructing optimised proof systems for gradient descent operations that exploit the tensor structure of neural network computations.24

2. Proposed framework

This paper proposes a three-layered framework for cryptographically verifiable machine unlearning, designed to provide legally sufficient evidence of erasure while preserving the confidentiality of training data and model architecture.

Layer 1: Cryptographic commitment to training data. Before training commences, each data record in the training dataset is individually hashed using a collision-resistant cryptographic hash function.25 These hashes form the leaves of a Merkle tree, in which each parent node is the hash of its two children, so that the hashes are combined pairwise, level by level, until a single root remains. The root of the Merkle tree serves as a compact cryptographic commitment to the entire training dataset. This root hash is published or deposited with a trusted third party or regulatory authority before training begins. The Merkle tree structure has two important properties. Firstly, it is tamper-evident: any modification to any data record changes the corresponding leaf hash, which propagates through the tree and changes the root hash. Secondly, it supports efficient membership proofs and, where the leaves are kept in a sorted order, non-membership proofs. Given a data record, it is possible to prove that the record is or is not included in the committed dataset by providing a short proof path through the tree, without revealing the rest of the dataset.
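The commitment step described above can be sketched as follows. This is an illustrative sketch only: the choice of SHA-256, the `leaf:` and `node:` prefixes, the rule of pairing the last node with itself on odd levels, and all function names are assumptions rather than features prescribed by the framework, and the dataset is assumed to be non-empty.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(records):
    """Hash each record into a leaf, then pair hashes upward until one root remains."""
    level = [h(b"leaf:" + r) for r in records]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                      # odd count: pair the last node with itself
            level = level + [level[-1]]
        level = [h(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def membership_path(levels, index):
    """Sibling hashes from leaf to root for the record at position `index`."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index
        path.append((level[sibling], index % 2 == 0))   # (hash, is our node the left child?)
        index //= 2
    return path

def verify_membership(root, record, path):
    """Recompute the root from one record and its path; no other record is needed."""
    node = h(b"leaf:" + record)
    for sibling, node_is_left in path:
        pair = node + sibling if node_is_left else sibling + node
        node = h(b"node:" + pair)
    return node == root

records = [b"record-A", b"record-B", b"record-C", b"record-D", b"record-E"]
levels = build_levels(records)
root = levels[-1][0]                            # the value deposited before training
path = membership_path(levels, 2)               # proof that record-C is in the dataset
tampered_root = build_levels([b"record-A", b"record-B", b"record-c",
                              b"record-D", b"record-E"])[-1][0]
```

The verifier needs only the 32-byte root, the record in question and a path whose length grows with the logarithm of the dataset size. Changing a single character of one record, as in `tampered_root`, yields a different root, which is the tamper-evidence property.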

Layer 2: Verifiable retraining. When an erasure request is received, the data fiduciary identifies the hash of the target data record and removes it from the dataset. The model is then retrained using a deterministic and documented training procedure on the modified dataset, which corresponds to a new Merkle tree with a different root hash. A zero-knowledge proof is then generated that establishes three facts: first, that the new training dataset corresponds to a specified modified Merkle root; second, that the hash of the excluded data record is not present in the new Merkle tree; and third, that the training algorithm was executed according to predefined, auditable rules on the committed dataset. Garg and others demonstrated that such proofs can be constructed for logistic regression using Zero-Knowledge Succinct Non-Interactive Arguments of Knowledge (zk-SNARKs),26 and recent work by Abbaszadeh and others has extended these techniques to deeper neural networks using sumcheck-based proof systems. A recent comprehensive framework for end-to-end verifiable AI pipelines has further demonstrated how such cryptographic tools can be linked across the full AI lifecycle from data sourcing to training, inference, and unlearning.
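The first two of these three facts can be illustrated in the clear, that is, without any zero-knowledge layer. The sketch below is a simplified illustration under stated assumptions: leaf hashes are kept in sorted order so that absence can be shown by two adjacent leaves that bracket the missing hash, SHA-256 is used as the hash function, and all names are hypothetical. It does not attempt the third fact (correct execution of training), and a real system would prove these statements in zero knowledge rather than expose the leaves.

```python
import hashlib
from bisect import bisect_left

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def root_of(leaves):
    """Merkle root over an ordered list of leaf hashes (last node paired with itself)."""
    level = list(leaves)
    if not level:
        return h(b"empty")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def commit(records):
    leaves = sorted(h(r) for r in records)      # sorted order enables absence witnesses
    return leaves, root_of(leaves)

def absence_witness(leaves, target):
    """Adjacent committed leaves that bracket `target`, or None if it is present."""
    i = bisect_left(leaves, target)
    if i < len(leaves) and leaves[i] == target:
        return None
    lower = leaves[i - 1] if i > 0 else None
    upper = leaves[i] if i < len(leaves) else None
    return lower, upper

records = [b"record-A", b"record-B", b"record-C", b"record-D"]
erased = b"record-C"

old_leaves, old_root = commit(records)
kept = [r for r in records if r != erased]      # dataset used for retraining
new_leaves, new_root = commit(kept)

present_before = absence_witness(old_leaves, h(erased)) is None
witness = absence_witness(new_leaves, h(erased))
```

Here `new_root` is the modified commitment and `witness` is the evidence that the erased record’s hash falls between two neighbouring leaves of the new tree and therefore cannot be one of them. In a complete construction each neighbour would be accompanied by its Merkle path, and the whole check would be wrapped in a proof that discloses neither the leaves nor the records.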

Layer 3: Forgery resistance. A critical requirement of any verifiable unlearning system is that it must be resistant to fraudulent compliance claims. A dishonest data fiduciary might attempt to generate a proof that purports to demonstrate unlearning without actually retraining the model, or might substitute a different dataset to produce a valid-looking proof while retaining the original model. The proof system must therefore bind the training computation to the committed dataset in a manner that prevents such forgery. The cryptographic commitment scheme achieves this by ensuring that the proof is valid only for a model trained on the specific dataset corresponding to the committed Merkle root. Any deviation from the committed dataset or the prescribed training algorithm would produce a different computation trace that would not satisfy the proof verification equation. Thudi and others have emphasised that forgery resistance is essential for any auditable definition of machine unlearning, as without it, the verification mechanism provides no genuine assurance.27

Multi-granularity unlearning

The proposed framework supports multi-level machine unlearning, which enables erasure of data at various degrees of specificity for compliance with legal and regulatory requirements under the DPDP Act. At the sample level, the framework permits complete removal of an individual’s data record from the training dataset, corresponding to Section 12(3), where a data principal requests the erasure of her personal data. Using the proposed Merkle tree structure, the individual’s data hash is excluded, the model is retrained, and the zero-knowledge proof verifies that the specific record is no longer part of the dataset.

At the feature level, specific attributes, such as age, religion, or medical condition, can be removed without deleting the entire record. This is achieved through selective masking before retraining, and the proof system can verify the exclusion of the specific attribute from the training computation.

At the class level, entire categories of data, such as data relating to a particular demographic, data source, or type of sensitive data, can be removed. This may be required to address issues such as algorithmic discrimination, or where a regulator orders the removal of an entire category of data from a model’s training set. It may also be necessary where broader compliance concerns affect groups of data principals rather than individuals. Overall, multi-granularity unlearning strengthens compliance, especially where sensitive personal data and algorithmic fairness are involved, by enabling verifiable cryptographic proof of erasure while maintaining model integrity.

Legal evaluation

1. Procedural compliance versus substantive erasure

A critical legal question arises: does the right to erasure require complete elimination of every statistical trace, or is procedural proof of data exclusion sufficient? Absolute influence elimination may be theoretically impossible in complex neural networks with billions of parameters. Even after retraining, other correlated data points may produce similar outputs. A standard of absolute influence elimination would be both technically impossible and doctrinally unsound. Legal standards do not generally require metaphysical certainty but rely on procedural standards and reasonable approximations.28 Section 12(3) directs erasure without specifying what this means for machine learning.29 If a data fiduciary can prove through a cryptographically sound zero-knowledge proof that target data was excluded from the retraining dataset, and that retraining followed documented and auditable procedures, this should satisfy the statutory requirement.

2. Evidentiary considerations

The proofs must be admissible as evidence. Section 63 of the Bharatiya Sakshya Adhiniyam, 2023 addresses the admissibility of electronic records.30 The Supreme Court, in Arjun Panditrao Khotkar v. Kailash Kushanrao Gorantyal, clarified the requirements for producing electronic evidence.31 Indian law recognises electronic signatures under Section 3-A of the Information Technology Act, 2000,32 providing precedent for the legal recognition of cryptographic methods. The Data Protection Board may need to issue technical guidelines specifying cryptographic standards for proofs of unlearning.

3. Policy implications

Mandating verifiable unlearning would enhance accountability, provide data principals meaningful enforcement mechanisms, protect trade secrets through zero-knowledge proofs, and promote regulatory clarity.33 However, compliance burdens could be significant, particularly for smaller entities, given the penalty provisions of up to two hundred and fifty crore rupees. A graduated approach mandating cryptographic verification for significant data fiduciaries under Section 10 while permitting alternative methods for smaller entities would ensure proportionality.

Challenges and limitations

The proposed unlearning framework, in spite of being conceptually strong, faces four major challenges—

1. Computational overhead: Generating zero-knowledge proofs for machine learning training is highly resource-intensive and often costs many times as much as the underlying computation itself. For instance, optimised proof systems for deep neural networks, such as those developed by Abbaszadeh and others, involve substantial prover-side computation.34 For large-scale models with billions of parameters, proof generation may be beyond the reach of current technology. Garg and others reported that their zero-knowledge proof of training for logistic regression, while demonstrating feasibility, involved computational overhead that would need to be significantly reduced for practical deployment at scale.35

2. Scalability: Modern large language models are trained on massive datasets and comprise billions of parameters. Even when approaches like SISA facilitate reduction of costs through partitioning of data, they also result in the maintenance of multiple sub-models and aggregation of their outputs. The current proof systems have only been tested on relatively small models, and significant efficiency improvements are required for application of the framework to large AI systems.

3. Statistical residual effects: Even after specific data is removed and the model is retrained without it, models may still exhibit correlated behaviours because of overlapping information in the remaining dataset. Research on membership inference attacks shows that distinguishing a model trained with a specific data point from one trained without it can be difficult, as the statistical signal of a single data point is often smaller than the noise introduced by the rest of the training set.36 The proposed framework addresses this challenge by defining compliance in procedural terms (proof of exclusion from the training set) rather than substantive terms (a guarantee that the model’s output bears no correlation with the excluded data).

4. Regulatory readiness: Courts and other regulatory bodies, including the Data Protection Board of India, may lack the technical expertise required for assessment of zero-knowledge proofs and cryptographic protocols. Effective implementation would require institutional capacity building, i.e., appointment of technical advisors and establishment of specialised panels in cryptography and machine learning. Along with that, interdisciplinary collaboration between legal scholars, computer scientists, and cryptographers is essential to the refinement and development of the proposed framework.

Conclusion

The right to be forgotten represents a commitment to informational self-determination, grounded in the constitutional recognition of privacy under Article 21.37 The DPDP Act operationalises this through erasure rights, yet AI systems challenge implementation in ways the statute does not address. Personal data used to train neural networks does not persist in a locatable, deletable form, and existing unlearning techniques lack standardised verification mechanisms.

This paper has proposed that zero-knowledge proofs offer a pathway to operationalise erasure obligations. By requiring cryptographic commitment to training datasets, documented retraining upon erasure requests, and zero-knowledge proofs of data exclusion, the framework transforms compliance from declaratory assurance into mathematical proof.38 Significant challenges remain, including computational cost, scalability, statistical residuals, and regulatory capacity, but these call for interdisciplinary collaboration rather than resignation. If deletion cannot be demonstrated, the right cannot be meaningful. Cryptographically verifiable machine unlearning ensures that the right to be forgotten is not merely a legal aspiration but an operationally enforceable guarantee.


*Student, National Law University Delhi.

**Student, National Law University Delhi.

1. (2017) 10 SCC 1, para 169 (Chandrachud J).

2. Digital Personal Data Protection Act, 2023, S. 12(1).

3. 2021 SCC OnLine Del 2306.

4. Digital Personal Data Protection Act, 2023, Ss. 4 and 5(1).

5. Digital Personal Data Protection Act, 2023, S. 12(3).

6. Digital Personal Data Protection Act, 2023, Ss. 8(7)(a) and 8(8).

7. 2020 SCC OnLine Ori 878.

8. Digital Personal Data Protection Act, 2023, S. 10(2).

9. Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning (MIT Press, 2016) p. 96.

10. Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning (MIT Press, 2016) pp. 96, 168.

11. Nicholas Carlini and others, “Extracting Training Data from Large Language Models” (2021) 30th USENIX Security Symposium 2633.

12. Pang Wei Koh and Percy Liang, “Understanding Black-box Predictions via Influence Functions” (2017) 34th International Conference on Machine Learning (ICML) 1885.

13. Lucas Bourtoule and others, “Machine Unlearning” (2020) arXiv:1912.03817.

14. Thanh Tam Nguyen and others, “A Survey of Machine Unlearning” (2022) arXiv:2209.02299. This survey provides a comprehensive overview of unlearning techniques and their limitations.

15. Anvith Thudi and others, “On the Necessity of Auditable Algorithmic Definitions for Machine Unlearning” (2022) Proceedings of the 31st USENIX Security Symposium 4007, 4009.

16. This accountability deficit arises because the internal workings of neural networks are not directly accessible to external auditors in the manner that conventional databases are.

17. Digital Personal Data Protection Act, 2023, Ss. 18-26 (powers and functions of the Data Protection Board of India).

18. Digital Personal Data Protection Act, 2023, Ss. 18-26 (powers and functions of the Data Protection Board of India). S. 8(5) imposes an obligation on data fiduciaries to protect personal data by taking reasonable security safeguards but does not specify technical standards for verifying erasure in AI contexts.

19. Nicholas Carlini and others, “Extracting Training Data from Large Language Models” (2021) 30th USENIX Security Symposium 2633, 2645.

20. Cynthia Dwork and Aaron Roth, “The Algorithmic Foundations of Differential Privacy” (2014) 9(3) Foundations and Trends in Theoretical Computer Science 211.

21. Shafi Goldwasser, Silvio Micali and Charles Rackoff, “The Knowledge Complexity of Interactive Proof Systems” (1989) 18(1) SIAM Journal on Computing 186.

22. Shafi Goldwasser, Silvio Micali and Charles Rackoff, “The Knowledge Complexity of Interactive Proof Systems” (1989) 18(1) SIAM Journal on Computing 186, 191.

23. Sanjam Garg and others, “Experimenting with Zero-Knowledge Proofs of Training” (2023) 2023 ACM SIGSAC Conference on Computer and Communications Security 1.

24. Kasra Abbaszadeh and others, “Zero-Knowledge Proofs of Training for Deep Neural Networks” (2024) Cryptology ePrint Archive, Paper 2024/162.

25. A collision-resistant hash function is one for which it is computationally infeasible to find two distinct inputs that produce the same output.

26. Sanjam Garg and others, “Experimenting with Zero-Knowledge Proofs of Training” (2023) 2023 ACM SIGSAC Conference on Computer and Communications Security 1, 6.

27. Anvith Thudi and others, “On the Necessity of Auditable Algorithmic Definitions for Machine Unlearning” (2022) 31st USENIX Security Symposium 4007, 4012.

28. This doctrinal argument draws on the general principle that legal standards of compliance are assessed against what is procedurally reasonable rather than what is metaphysically certain.

29. Digital Personal Data Protection Act, 2023, S. 12(3).

30. This provision addresses the admissibility of electronic records and requires that such records be accompanied by a certificate identifying the device from which the record was produced.

31. (2020) 7 SCC 1 : (2020) 4 SCC (Civ) 1 : (2020) 3 SCC (Cri) 1 : (2020) 2 SCC (L&S) 587, para 68.

32. Information Technology Act, 2000, S. 3-A (electronic signatures).

33. Information Technology Act, 2000, S. 3-A (electronic signatures).

34. Kasra Abbaszadeh and others, “Zero-Knowledge Proofs of Training for Deep Neural Networks” (2024) Cryptology ePrint Archive, Paper 2024/162, 3.

35. Sanjam Garg and others, “Experimenting with Zero-Knowledge Proofs of Training” (2023) 2023 ACM SIGSAC Conference on Computer and Communications Security 1, 15.

36. Nicholas Carlini and others, “Extracting Training Data from Large Language Models” (2021) 30th USENIX Security Symposium 2633, 2649.

37. K.S. Puttaswamy (Privacy-9J.) v. Union of India, (2017) 10 SCC 1.

38. Balan K and others (n 33) 56.


