A large-scale randomized study of large language model feedback in peer review

Alberts, B., Hanson, B. & Kelner, K. L. Reviewing peer review. Science 321, 15 (2008).

Kelly, J., Sadeghieh, T. & Adeli, K. Peer review in scientific publications: benefits, critiques, and a survival guide. EJIFCC 25, 227 (2014).

Publons' Global State of Peer Review 2018 (Clarivate Analytics, 2018).

Azad, A. & Banu, A. Artificial intelligence conference publication trends: The rise of hyper-prolific authors. Preprint at https://doi.org/10.48550/arXiv.2412.07793 (2024).

McCook, A. Is peer review broken? Submissions are up, reviewers are overtaxed, and authors are lodging complaint after complaint about the process at top-tier journals. What's wrong with peer review? The Scientist (1 February 2006).

Tucker, N. et al. Assisting ICLR 2025 reviewers with feedback. Blog post by the ICLR 2025 Review Feedback Agent team and program chairs. ICLR Blog https://blog.iclr.cc/2024/10/09/iclr2025-assisting-reviewers/ (2024).

Rogers, A. & Augenstein, I. What can we do to improve peer review in NLP? in Findings of the Association for Computational Linguistics: EMNLP 2020 (eds Cohn, T., He, Y. & Liu, Y.) 1256–1262 (ACL, 2020).

Rogers, A., Karpinska, M., Boyd-Graber, J. & Okazaki, N. Program chairs' report on peer review at ACL 2023. in Proc. 61st Annual Meeting of the Association for Computational Linguistics Vol. 1, xl–lxxv (ACL, 2023).

Arns, M. Open access is tiring out peer reviewers. Nature 515, 467 (2014).

Cortes, C. & Lawrence, N. D. Inconsistency in conference peer review: revisiting the 2014 NeurIPS experiment. Preprint at https://doi.org/10.48550/arXiv.2109.09774 (2021).

Claude 3.5 Sonnet (Anthropic, 2024).

Liang, W. et al. Can large language models provide useful feedback on research papers? A large-scale empirical analysis. NEJM AI 1, AIoa2400196 (2024).

Yuksekgonul, M. et al. Optimizing generative AI by backpropagating language model feedback. Nature 639, 609–616 (2025).

Madaan, A. et al. Self-Refine: iterative refinement with self-feedback. Adv. Neural Inf. Process. Syst. 36, 46534–46594 (2023).

Hosseini, M. & Horbach, S. P. J. M. Fighting reviewer fatigue or amplifying bias? Considerations and recommendations for use of ChatGPT and other large language models in scholarly peer review. Res. Integr. Peer Rev. 8, 4 (2023).

Liang, W. et al. Monitoring AI-modified content at scale: a case study on the impact of ChatGPT on AI conference peer reviews. in Proc. 41st International Conference on Machine Learning 29575–29620 (ICML, 2024).

Zhang, Y. et al. Siren's song in the AI ocean: a survey on hallucination in large language models. Comput. Linguist. 51, 1373–1418 (2025).

Zhou, J. et al. Instruction-following evaluation for large language models. Preprint at https://doi.org/10.48550/arXiv.2311.07911 (2023).

Liu, R. & Shah, N. B. ReviewerGPT? An exploratory study on using large language models for paper reviewing. Preprint at https://doi.org/10.48550/arXiv.2306.00622 (2023).

Biswas, S., Dobaria, D. & Cohen, H. L. ChatGPT and the future of journal reviews: a feasibility study. Yale J. Biol. Med. 96, 415–420 (2023).

Liang, W. et al. Mapping the increasing use of LLMs in scientific papers. in Proc. 1st Conference on Language Modeling (COLM) (2024).

Shah, N. B. Challenges, experiments, and computational solutions in peer review. Commun. ACM 65, 76–87 (2022).

Price, S. & Flach, P. A. Computational support for academic peer review: a perspective from artificial intelligence. Commun. ACM 60, 70–79 (2017).

Kankanhalli, A. Peer review in the age of generative AI. J. Assoc. Inf. Syst. 25, 76–84 (2024).

Kuznetsov, I. et al. What can natural language processing do for peer review? Preprint at https://doi.org/10.48550/arXiv.2405.06563 (2024).

Leung, T. I., de Azevedo Cardoso, T., Mavragani, A. & Eysenbach, G. Best practices for using AI tools as an author, peer reviewer, or editor. J. Med. Internet Res. 25, e51584 (2023).

Checco, A., Bracciale, L., Loreti, P., Pinfield, S. & Bianchi, G. AI-assisted peer review. Humanit. Soc. Sci. Commun. 8, 25 (2021).

Kousha, K. & Thelwall, M. Artificial intelligence to support publishing and peer review: a summary and review. Learn. Publ. 37, 4–12 (2024).

Goldberg, A. et al. Usefulness of LLMs as an author checklist assistant for scientific papers: NeurIPS'24 experiment. Preprint at https://doi.org/10.48550/arXiv.2411.03417 (2024).

Su, X., Wambsganss, T., Rietsche, R., Neshaei, S. P. & Käser, T. Reviewriter: AI-generated instructions for peer review writing. in Proc. 18th Workshop on Innovative Use of NLP for Building Educational Applications (ed. Kochmar, E.) 57–71 (ACL, 2023).

D'Arcy, M., Hope, T., Birnbaum, L. & Downey, D. MARG: multi-agent review generation for scientific papers. Preprint at https://doi.org/10.48550/arXiv.2401.04259 (2024).

GPT-4 Technical Report (OpenAI, 2024).

Goldberg, A. et al. Peer reviews of peer reviews: a randomized controlled trial and other experiments. PLoS ONE 20, e0320444 (2025).

Kocak, B., Onur, M. R., Park, S. H., Baltzer, P. & Dietzel, M. Ensuring peer review integrity in the era of large language models: a critical inventory of challenges, red flags, and recommendations. Eur. J. Radiol. Artif. Intell. 2, 100018 (2025).

Ye, R. et al. Are we there yet? Revealing the risks of utilizing large language models in scholarly peer review. Preprint at https://doi.org/10.48550/arXiv.2412.01708 (2024).

Shin, H. et al. Mind the blind spots: a focus-level evaluation framework for LLM reviews. in Proc. Conference on Empirical Methods in Natural Language Processing 35630–35656 (EMNLP, 2025).

Luo, M. et al. Benchmarking peer-review harm detection: a challenging task using a new dataset. Preprint at https://doi.org/10.48550/arXiv.2502.01676 (2025).

Tamkin, A. et al. Clio: privacy-preserving insights into real-world AI use. Preprint at https://doi.org/10.48550/arXiv.2412.13678 (2024).

Saad-Falcon, J. et al. LMUnit: fine-grained evaluation with natural language unit tests. in Findings of the Association for Computational Linguistics 3303–3324 (ACL, 2025).

Prasad, A., Stengel-Eskin, E., Chen, J. C.-Y., Khan, Z. & Bansal, M. Learning to generate unit tests for automated debugging. Preprint at https://doi.org/10.48550/arXiv.2502.01619 (2025).

Charlin, L., Zemel, R. S. & Boutilier, C. A framework for optimizing paper matching. in Proc. 27th Conference on Uncertainty in Artificial Intelligence 86–95 (AUAI Press, 2011).

ICML 2023 Reviewer Tutorial (ICML 2023 Program Committee, 2023).

How to be a good reviewer? ICML 2022 Reviewer Tutorial (ICML 2022 Program Chair, 2022).

Last minute review advice (ACL PC Chair, 2017).

Baldenegro, M. LXCV @ CVPR 2021 Reviewer Mentoring Program: How to Write a Good Review. Presentation at the LatinX in Computer Vision (LXCV) Workshop, CVPR 2021 (2021).

Rogers, A. ARR reviewer guidelines (Association for Computational Linguistics, 2021).

Silbiger, N. J. & Stubler, A. D. Unprofessional peer reviews disproportionately harm underrepresented groups in STEM. PeerJ 7, e8247 (2019).

Fenniak, M. et al. pypdf library. https://pypi.org/project/pypdf/ (2024).

Ribeiro, M. T. & Lundberg, S. Testing language models (and prompts) like we test software (Medium, 2023).

Thakkar, N. zou-group/review_feedback_agent: first release. Zenodo https://doi.org/10.5281/zenodo.17903957 (2025).
