Brandi, J., Noberini, R., Bonaldi, T. & Cecconi, D. Advances in enrichment methods for mass spectrometry-based proteomics analysis of post-translational modifications. J. Chromatogr. A 1678, 463352 (2022).
Deribe, Y. L., Pawson, T. & Dikic, I. Post-translational modifications in signal integration. Nat. Struct. Mol. Biol. 17, 666–672 (2010).
Liu, J., Qian, C. & Cao, X. Post-translational modification control of innate immunity. Immunity 45, 15–30 (2016).
Qian, M. et al. Targeting post-translational modification of transcription factors as cancer therapy. Drug Discov. Today 25, 1502–1512 (2020).
Rauh, D. et al. An acetylome peptide microarray reveals specificities and deacetylation substrates for all human sirtuin isoforms. Nat. Commun. 4, 2327 (2013).
Merbl, Y. & Kirschner, M. W. Large-scale detection of ubiquitination substrates using cell extracts and protein microarrays. Proc. Natl Acad. Sci. USA 106, 2543–2548 (2009).
Moore, K. E. & Gozani, O. An unexpected journey: Lysine methylation across the proteome. Biochim. Biophys. Acta Gene Regul. Mech. 1839, 1395–1403 (2014).
Polo, S. et al. A single motif responsible for ubiquitin recognition and monoubiquitination in endocytic proteins. Nature 416, 451–455 (2002).
Rathert, P., Zhang, X., Freund, C., Cheng, X. & Jeltsch, A. Analysis of the substrate specificity of the Dim-5 histone lysine methyltransferase using peptide arrays. Chem. Biol. 15, 5–11 (2008).
Rathert, P. et al. Protein lysine methyltransferase G9a acts on non-histone targets. Nat. Chem. Biol. 4, 344–346 (2008).
Kudithipudi, S., Dhayalan, A., Kebede, A. F. & Jeltsch, A. The SET8 H4K20 protein lysine methyltransferase has a long recognition sequence covering seven amino acid residues. Biochimie 94, 2212–2218 (2012).
Kudithipudi, S., Kusevic, D., Weirich, S. & Jeltsch, A. Specificity analysis of protein lysine methyltransferases using SPOT peptide arrays. J. Vis. Exp. 99, e52203 (2014).
Chopra, A., Willmore, W. G. & Biggar, K. K. Insights into a cancer-target demethylase: substrate prediction through systematic specificity analysis for KDM3A. Biomolecules 12, 641 (2022).
Bradley, D. et al. The substrate quality of CK2 target sites has a determinant role on their function and evolution. Cell Syst. 15, 544–562.e8 (2024).
Mitchell, C. J. et al. Unbiased identification of substrates of protein tyrosine phosphatase ptp-3 in C. elegans. Mol. Oncol. 10, 910–920 (2016).
Yang, Y.-Y., Grammel, M. & Hang, H. C. Identification of lysine acetyltransferase p300 substrates using 4-pentynoyl-coenzyme A and bioorthogonal proteomics. Bioorg. Med. Chem. Lett. 21, 4976–4979 (2011).
Biggar, K. K. et al. Proteome-wide prediction of lysine methylation leads to identification of H2BK43 methylation and outlines the potential methyllysine proteome. Cell Rep. 32, 107896 (2020).
Jamal, S., Ali, W., Nagpal, P., Grover, A. & Grover, S. Predicting phosphorylation sites using machine learning by integrating the sequence, structure, and functional information of proteins. J. Transl. Med. 19, 218 (2021).
Kiemer, L., Bendtsen, J. D. & Blom, N. NetAcet: prediction of N-terminal acetylation sites. Bioinformatics 21, 1269–1270 (2005).
Neely, B. A. et al. Toward an integrated machine learning model of a proteomics experiment. J. Proteome Res. 22, 681–696 (2023).
Deng, W. et al. GPS-PAIL: prediction of lysine acetyltransferase-specific modification sites from protein sequences. Sci. Rep. 6, 39787 (2016).
Wu, Z., Lu, M. & Li, T. Prediction of substrate sites for protein phosphatases 1B, SHP-1, and SHP-2 based on sequence features. Amino Acids 46, 1919–1928 (2014).
Wang, X. et al. UbiBrowser 2.0: a comprehensive resource for proteome-wide known and predicted ubiquitin ligase/deubiquitinase–substrate interactions in eukaryotic species. Nucleic Acids Res. 50, D719–D728 (2022).
Smith, K., Rhoads, N. & Chandrasekaran, S. Protocol for CAROM: a machine learning tool to predict post-translational regulation from metabolic signatures. STAR Protoc. 3, 101799 (2022).
Lanouette, S. et al. Discovery of substrates for a SET domain lysine methyltransferase predicted by multistate computational protein design. Structure 23, 206–215 (2015).
Ferrari, E. et al. Identification of new substrates of the protein-tyrosine phosphatase PTP1B by Bayesian integration of proteome evidence. J. Biol. Chem. 286, 4173–4185 (2011).
Vinogradov, A. A., Chang, J. S., Onaka, H., Goto, Y. & Suga, H. Accurate models of substrate preferences of post-translational modification enzymes from a combination of mRNA display and deep learning. ACS Cent. Sci. 8, 814–824 (2022).
Fang, J. et al. Purification and functional characterization of SET8, a nucleosomal histone H4-lysine 20-specific methyltransferase. Curr. Biol. 12, 1086–1099 (2002).
Milite, C. et al. The emerging role of lysine methyltransferase SETD8 in human diseases. Clin. Epigenet. 8, 102 (2016).
Biggar, K. K., Wang, Z. & Li, S. S.-C. SnapShot: Lysine methylation beyond histones. Mol. Cell 68, 1016–1016.e1 (2017).
Zhang, H. et al. SET8 prevents excessive DNA methylation by methylation-mediated degradation of UHRF1 and DNMT1. Nucleic Acids Res. 47, 9053–9068 (2019).
Chin, H. G. et al. The microtubule-associated histone methyltransferase SET8, facilitated by transcription factor LSF, methylates α-tubulin. J. Biol. Chem. 295, 4748–4759 (2020).
Wu, Q.-J. et al. The sirtuin family in health and disease. Signal Transduct. Target. Ther. 7, 402 (2022).
Hornbeck, P. V. et al. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 43, D512–D520 (2015).
Wang, D. et al. MusiteDeep: a deep-learning based webserver for protein post-translational modification site prediction and visualization. Nucleic Acids Res. 48, W140–W146 (2020).
Yin, Y. et al. SET8 recognizes the sequence RHRK20VLRDN within the N terminus of histone H4 and mono-methylates lysine 20. J. Biol. Chem. 280, 30025–30031 (2005).
Topcu, E., Ridgeway, N. H. & Biggar, K. K. PeSA 2.0: A software tool for peptide specificity analysis implementing positive and negative motifs and motif-based peptide scoring. Comput. Biol. Chem. 101, 107753 (2022).
Yang, K. K., Wu, Z. & Arnold, F. H. Machine-learning-guided directed evolution for protein engineering. Nat. Methods 16, 687–694 (2019).
Erjavac, I., Kalafatovic, D. & Mauša, G. Coupled encoding methods for antimicrobial peptide prediction: how sensitive is a highly accurate model? Artif. Intell. Life Sci. 2, 100034 (2022).
Durant, J. L., Leland, B. A., Henry, D. R. & Nourse, J. G. Reoptimization of MDL keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 42, 1273–1280 (2002).
Ruiz-Blanco, Y. B., Paz, W., Green, J. & Marrero-Ponce, Y. ProtDCal: A program to compute general-purpose-numerical descriptors for sequences and 3D-structures of proteins. BMC Bioinforma. 16, 162 (2015).
Romero-Molina, S., Ruiz-Blanco, Y. B., Green, J. R. & Sanchez-Garcia, E. ProtDCal-Suite: A web server for the numerical codification and functional analysis of proteins. Protein Sci. 28, 1734–1743 (2019).
Szeghalmy, S. & Fazekas, A. A comparative study of the use of stratified cross-validation and distribution-balanced stratified cross-validation in imbalanced learning. Sensors 23, 2333 (2023).
Burkov, A. The Hundred-Page Machine Learning Book, hardcover edn, Vol. 160 (Andriy Burkov, Poland, 2019).
Brownlee, J. Imbalanced Classification with Python: Choose Better Metrics, Balance Skewed Classes, and Apply Cost-Sensitive Learning. (Machine Learning Mastery, 2021).
Izenman, A. J. Linear discriminant analysis. In Modern Multivariate Statistical Techniques (ed. Izenman, A. J.) 237–280 (Springer New York, 2013).
Kamalov, F., Leung, H.-H. & Cherukuri, A. K. Keep it simple: random oversampling for imbalanced data. In 2023 Advances in Science and Engineering Technology International Conferences (ASET) 1–4 (IEEE, Dubai, 2023).
Wright, R. E. Logistic regression. In Reading and Understanding Multivariate Statistics (eds Grimm, L. G. & Yarnold, P. R.) 217–244 (American Psychological Association, 1995).
Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002).
Nguyen, H. M., Cooper, E. W. & Kamei, K. Borderline over-sampling for imbalanced data classification. Int. J. Knowl. Eng. Soft Data Paradig. 3, 4–21 (2009).
Petersen, B., Petersen, T. N., Andersen, P., Nielsen, M. & Lundegaard, C. A generic method for assignment of reliability scores applied to solvent accessibility predictions. BMC Struct. Biol. 9, 51 (2009).
Baryshnikova, A. Spatial Analysis of Functional Enrichment (SAFE) in Large Biological Networks. In Computational Cell Biology (eds von Stechow, L. & Santos Delgado, A.) 249–268 (Springer New York, 2018).
The Gene Ontology Consortium et al. The gene ontology resource: enriching a GOld mine. Nucleic Acids Res. 49, D325–D334 (2021).
Luck, K. et al. A reference map of the human binary protein interactome. Nature 580, 402–408 (2020).
Couture, J.-F., Collazo, E., Brunzelle, J. S. & Trievel, R. C. Structural and functional analysis of SET8, a histone H4 Lys-20 methyltransferase. Genes Dev. 19, 1455–1465 (2005).
Kaczmarek Michaels, K., Mohd Mostafa, S., Ruiz Capella, J. & Moore, C. L. Regulation of alternative polyadenylation in the yeast Saccharomyces cerevisiae by histone H3K4 and H3K36 methyltransferases. Nucleic Acids Res. 48, 5407–5425 (2020).
Liu, B. et al. A functional single nucleotide polymorphism of SET8 is prognostic for breast cancer. Oncotarget 7, 34277–34287 (2016).
Jørgensen, S. et al. The histone methyltransferase SET8 is required for S-phase progression. J. Cell Biol. 179, 1337–1345 (2007).
Yang, C., Wang, K., Zhou, Y. & Zhang, S.-L. Histone lysine methyltransferase SET8 is a novel therapeutic target for cancer treatment. Drug Discov. Today 26, 2423–2430 (2021).
Bogliolo, M. et al. Mutations in ERCC4, encoding the DNA-repair endonuclease XPF, cause Fanconi anemia. Am. J. Hum. Genet. 92, 800–806 (2013).
Faridounnia, M., Folkers, G. & Boelens, R. Function and interactions of ERCC1-XPF in DNA damage response. Molecules 23, 3205 (2018).
Xu, L. et al. Roles for the methyltransferase SETD8 in DNA damage repair. Clin. Epigenet. 14, 34 (2022).
Zhang, H. et al. Quantitative proteomic analysis of the lysine acetylome reveals diverse SIRT2 substrates. Sci. Rep. 12, 3822 (2022).
Levy, D. et al. A proteomic approach for the identification of novel lysine methyltransferase substrates. Epigenetics Chromatin 4, 19 (2011).
Meng, L. et al. Mini-review: recent advances in post-translational modification site prediction based on deep learning. Comput. Struct. Biotechnol. J. 20, 3522–3532 (2022).
Schwartz, D. Prediction of lysine post-translational modifications using bioinformatic tools. Essays Biochem. 52, 165–177 (2012).
Shilatifard, A. The COMPASS Family of histone H3K4 methylases: mechanisms of regulation in development and disease pathogenesis. Annu. Rev. Biochem. 81, 65–95 (2012).
Weber, L. M. et al. The histone acetyltransferase KAT6A is recruited to unmethylated CpG islands via a DNA binding winged helix domain. Nucleic Acids Res. 51, 574–594 (2023).
Shinsky, S. A., Monteith, K. E., Viggiano, S. & Cosgrove, M. S. Biochemical reconstitution and phylogenetic comparison of human SET1 family core complexes involved in histone methylation. J. Biol. Chem. 290, 6361–6375 (2015).
Rienzo, M. et al. PRDM12 in health and diseases. Int. J. Mol. Sci. 22, 12030 (2021).
Hashimoto, K., Wada, K., Matsumoto, K. & Moriya, M. Physical interaction between SLX4 (FANCP) and XPF (FANCQ) proteins and biological consequences of interaction-defective missense mutations. DNA Repair 35, 48–54 (2015).
Bakker, J. L. et al. Analysis of the novel Fanconi anemia gene SLX4/FANCP in familial breast cancer cases. Hum. Mutat. 34, 70–73 (2013).
Grimes, M. et al. Integration of protein phosphorylation, acetylation, and methylation data sets to outline lung cancer signaling networks. Sci. Signal. 11, eaaq1087 (2018).
Leutert, M., Entwisle, S. W. & Villén, J. Decoding post-translational modification crosstalk with proteomics. Mol. Cell. Proteom. 20, 100129 (2021).
Yang, J., Hu, Y., Zhang, B., Liang, X. & Li, X. The JMJD Family histone demethylases in crosstalk between inflammation and cancer. Front. Immunol. 13, 881396 (2022).
Ikram, S. et al. The SMYD3-MAP3K2 signaling axis promotes tumor aggressiveness and metastasis in prostate cancer. Sci. Adv. 9, eadi5921 (2023).
Heinonen, T. et al. Dual deletion of the sirtuins SIRT2 and SIRT3 impacts on metabolism and inflammatory responses of macrophages and protects from endotoxemia. Front. Immunol. 10, 2713 (2019).
Chen, L. T. et al. Target sequence-conditioned design of peptide binders using masked language modeling. Nat. Biotechnol. https://doi.org/10.1038/s41587-025-02761-2 (2025).
Rathore, A. S., Kumar, N., Choudhury, S., Mehta, N. K. & Raghava, G. P. S. Prediction of hemolytic peptides and their hemolytic concentration. Commun. Biol. 8, 176 (2025).
Chopra, A. et al. A peptide array pipeline for the development of spike-ACE2 interaction inhibitors. Peptides 158, 170898 (2022).
Hilpert, K., Winkler, D. F. & Hancock, R. E. Cellulose-bound peptide arrays: preparation and applications. Biotechnol. Genet. Eng. Rev. 24, 31–106 (2007).
Bradford, M. M. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal. Biochem. 72, 248–254 (1976).
Hornbeck, P. V. et al. PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 40, D261–D270 (2012).
Van Rossum, G. & Drake, F. L. Python 3 Reference Manual. (CreateSpace, 2009).
McKinney, W. Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference 56–61 (IEEE, 2010).
Rowe, E. M. & Biggar, K. K. An optimized method using peptide arrays for the identification of in vitro substrates of lysine methyltransferase enzymes. MethodsX 5, 118–124 (2018).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Lemaitre, G., Nogueira, F. & Aridas, C. K. Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. J. Mach. Learn. Res. 18, 1–5 (2017).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Dietterich, T. G. Ensemble methods in machine learning. In Multiple Classifier Systems, 1–15 (Springer Berlin Heidelberg, 2000).
The UniProt Consortium et al. UniProt: The universal protein knowledgebase in 2023. Nucleic Acids Res. 51, D523–D531 (2023).
Krzywinski, M. et al. Circos: An information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
Lex, A., Gehlenborg, N., Strobelt, H., Vuillemot, R. & Pfister, H. UpSet: Visualization of intersecting sets. IEEE Trans. Vis. Comput. Graph. 20, 1983–1992 (2014).
Rosario, F. J. et al. Placental remote control of fetal metabolism: trophoblast mTOR signaling regulates liver IGFBP-1 phosphorylation and IGF-1 bioavailability. Int. J. Mol. Sci. 24, 7273 (2023).
MacLean, B. et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26, 966–968 (2010).
Szklarczyk, D. et al. STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43, D447–D452 (2015).
Hagberg, A. A., Schult, D. A. & Swart, P. J. Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference (eds Varoquaux, G., Vaught, T. & Millman, J.) 11–15 (Pasadena, CA USA, 2008).
Tate, J. G. et al. COSMIC: The catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941–D947 (2019).
Shannon, P. et al. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Morris, J. H. et al. clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinforma. 12, 436 (2011).
Bader, G. D. & Hogue, C. W. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinforma. 4, 2 (2003).
