Alberts, B., Hanson, B. & Kelner, K. L. Editorial: Peer review. Science 321, 15 (2008).
Kelly, J., Sadeghieh, T. & Adeli, K. Peer review in scientific publications: benefits, critiques, & a survival guide. EJIFCC 25, 227 (2014).
Publons' Global State of Peer Review 2018 (Clarivate Analytics, 2018).
Azad, A. & Banu, A. Artificial intelligence conference publication trends: The rise of hyper-prolific authors. Preprint at https://doi.org/10.48550/arXiv.2412.07793 (2024).
McCook, A. Is peer review broken? The number of submissions is increasing, reviewers are overburdened, and authors at top journals are complaining more and more about the process. What's wrong with peer review? The Scientist (1 February 2006).
Thakkar, N. et al. Assisting ICLR 2025 reviewers with feedback. Blog post by the ICLR 2025 Review Feedback Agent Team and Program Chairs. ICLR Blog https://blog.iclr.cc/2024/10/09/iclr2025-assisting-reviewers/ (2024).
Rogers, A. & Augenstein, I. What can we do to improve peer review in NLP? In Findings of the Association for Computational Linguistics: EMNLP 2020 (eds Cohn, T., He, Y. & Liu, Y.) 1256–1262 (ACL, 2020).
Rogers, A., Karpinska, M., Boyd-Graber, J. & Okazaki, N. Program chair report on peer review at ACL 2023. In Proc. 61st Annual Meeting of the Association for Computational Linguistics Vol. 1, xl–lxxv (ACL, 2023).
Arns, M. Open access is tiring out peer reviewers. Nature 515, 467 (2014).
Cortes, C. & Lawrence, N. D. Inconsistency in conference peer review: revisiting the 2014 NeurIPS experiment. Preprint at https://doi.org/10.48550/arXiv.2109.09774 (2021).
Claude 3.5 Sonnet (Anthropic, 2024).
Liang, W. et al. Can large language models provide useful feedback on research papers? A large-scale empirical analysis. NEJM AI 1, AIoa2400196 (2024).
Yuksekgonul, M. et al. Optimizing generative AI by backpropagating language model feedback. Nature 639, 609–616 (2025).
Madaan, A. et al. Self-Refine: iterative refinement with self-feedback. Adv. Neural Inf. Process. Syst. 36, 46534–46594 (2023).
Hosseini, M. & Horbach, S. P. J. M. Fighting reviewer fatigue or amplifying bias? Considerations and recommendations for use of ChatGPT and other large language models in scholarly peer review. Res. Integr. Peer Rev. 8, 4 (2023).
Liang, W. et al. Monitoring AI-modified content at scale: a case study on the impact of ChatGPT on AI conference peer reviews. In Proc. 41st International Conference on Machine Learning 29575–29620 (ICML, 2024).
Zhang, Y. et al. Siren's song in the AI ocean: a survey on hallucination in large language models. Comput. Linguist. 51, 1373–1418 (2025).
Zhou, J. et al. Instruction-following evaluation for large language models. Preprint at https://doi.org/10.48550/arXiv.2311.07911 (2023).
Liu, R. & Shah, N. B. ReviewerGPT? An exploratory study on using large language models for paper reviewing. Preprint at https://doi.org/10.48550/arXiv.2306.00622 (2023).
Biswas, S., Dobaria, D. & Cohen, H. L. ChatGPT and the future of journal reviews: a feasibility study. Yale J. Biol. Med. 96, 415–420 (2023).
Liang, W. et al. Mapping the increasing use of LLMs in scientific papers. In Proc. 1st Conference on Language Modeling (COLM) (2024).
Shah, N. B. Challenges, experiments, and computational solutions in peer review. Commun. ACM 65, 76–87 (2022).
Price, S. & Flach, P. A. Computational support for academic peer review: a perspective from artificial intelligence. Commun. ACM 60, 70–79 (2017).
Kankanhalli, A. Peer review in the age of generative AI. J. Assoc. Inf. Syst. 25, 76–84 (2024).
Kuznetsov, I. et al. What can natural language processing do for peer review? Preprint at https://doi.org/10.48550/arXiv.2405.06563 (2024).
Leung, T. I., de Azevedo Cardoso, T., Mavragani, A. & Eysenbach, G. Best practices for using AI tools as an author, peer reviewer, or editor. J. Med. Internet Res. 25, e51584 (2023).
Checco, A., Bracciale, L., Loreti, P., Pinfield, S. & Bianchi, G. AI-assisted peer review. Humanit. Soc. Sci. Commun. 8, 25 (2021).
Kousha, K. & Thelwall, M. Artificial intelligence to support publishing and peer review: a summary and review. Learn. Publ. 37, 4–12 (2024).
Goldberg, A. et al. Usefulness of LLMs as an author checklist assistant for scientific papers: NeurIPS'24 experiment. Preprint at https://doi.org/10.48550/arXiv.2411.03417 (2024).
Su, X., Wambsganss, T., Rietsche, R., Neshaei, S. P. & Käser, T. Reviewriter: AI-generated instructions for peer review writing. In Proc. 18th Workshop on Innovative Use of NLP for Building Educational Applications (ed. Kochmar, E.) 57–71 (ACL, 2023).
D'Arcy, M., Hope, T., Birnbaum, L. & Downey, D. MARG: multi-agent review generation for scientific papers. Preprint at https://doi.org/10.48550/arXiv.2401.04259 (2024).
GPT-4 Technical Report (OpenAI, 2024).
Goldberg, A. et al. Peer reviews of peer reviews: a randomized controlled trial and other experiments. PLoS ONE 20, e0320444 (2025).
Kocak, B., Onur, M. R., Park, S. H., Baltzer, P. & Dietzel, M. Ensuring peer review integrity in the era of large language models: a critical inventory of challenges, red flags, and recommendations. Eur. J. Radiol. Artif. Intell. 2, 100018 (2025).
Ye, R. et al. Are we there yet? Revealing the risks of utilizing large language models in scholarly peer review. Preprint at https://doi.org/10.48550/arXiv.2412.01708 (2024).
Shin, H. et al. Mind the blind spots: a focus-level evaluation framework for LLM reviews. In Proc. 2025 Conference on Empirical Methods in Natural Language Processing 35630–35656 (EMNLP, 2025).
Luo, M. et al. Benchmarking peer-review harm detection: a challenging task using a new dataset. Preprint at https://doi.org/10.48550/arXiv.2502.01676 (2025).
Tamkin, A. et al. Clio: privacy-preserving insights into real-world AI use. Preprint at https://doi.org/10.48550/arXiv.2412.13678 (2024).
Saad-Falcon, J. et al. LMUnit: fine-grained evaluation with natural language unit tests. In Findings of the Association for Computational Linguistics 3303–3324 (ACL, 2025).
Prasad, A., Stengel-Eskin, E., Chen, J. C.-Y., Khan, Z. & Bansal, M. Learning to generate unit tests for automated debugging. Preprint at https://doi.org/10.48550/arXiv.2502.01619 (2025).
Charlin, L., Zemel, R. S. & Boutilier, C. A framework for optimizing paper matching. In Proc. 27th Conference on Uncertainty in Artificial Intelligence 86–95 (AUAI Press, 2011).
ICML 2023 Reviewer Tutorial (ICML 2023 Program Committee, 2023).
How to be a good reviewer? ICML 2022 Reviewer Tutorial (ICML 2022 Program Chair, 2022).
Last minute review advice (ACL PC Chair, 2017).
Baldenegro, M. LXCV @ CVPR 2021 Reviewer Mentoring Program and how to write a good review. Presentation at LatinX in Computer Vision (LXCV) Workshop, CVPR 2021 (2021).
Rogers, A. ARR reviewer guidelines (Association for Computational Linguistics, 2021).
Silbiger, N. J. & Stubler, A. D. Unprofessional peer reviews disproportionately harm underrepresented groups in STEM. PeerJ 7, e8247 (2019).
Fenniak, M. et al. pypdf library. https://pypi.org/project/pypdf/ (2024).
Ribeiro, M. T. & Lundberg, S. Testing language models (and prompts) like we test software (Medium, 2023).
Thakkar, N. zou-group/review_feedback_agent: First release. Zenodo https://doi.org/10.5281/zenodo.17903957 (2025).
