Chinese Researchers Propose a Generate-and-Edit Approach That Uses the Results of Running LLM-Generated Code to Improve Code Quality on Competitive Programming Tasks



https://arxiv.org/abs/2305.04087

Researchers take inspiration from the human programming process to help LLMs perform better on competitive programming tasks, a setting to which large language models have recently been applied. These tasks require understanding a sophisticated natural-language problem description, illustrated with sample test cases, and accurately implementing solutions that can run to hundreds of lines. A solution is evaluated by executing it against hidden test cases. However, the accuracy and pass rates of current LLMs on this task leave much room for improvement. For example, on APPS, a widely used competitive programming benchmark, the de facto strongest model, GPT-3, achieves an accuracy of only about 7%.

While solving a competitive programming problem, programmers typically write an initial program, run it on the sample test cases, and then modify the code according to the test results. During this step, they often exploit key information in the test output to debug the program. The researchers implement this idea with a workflow built around a neural code editor. Inspecting code generated by pre-trained LLMs, they found that much of it could be repaired with only minor tweaks.

Anyone who has debugged a program knows that the error message often pinpoints the coding mistake, so the problem can be fixed quickly. This observation suggests an editing-based method that uses execution results to improve the quality of LLM-generated code. In this study, Peking University researchers propose a generate-and-edit approach to improve LLMs on competitive programming tasks. Their method uses LLM capabilities in three phases that emulate the human programming behavior described above.

  1. Generation with the LLM. They use large language models as black-box generators that produce programs from problem descriptions.
  2. Execution. The code produced by the LLM is run on the sample test cases and the execution results are collected. These results are formatted with a template into a supplementary comment that gives the editor the information it needs to make changes.
  3. Editing. They build a fault-aware neural code editor that takes the generated code and the supplementary comment as input and refines the code, aiming to raise the accuracy and quality of LLM-based code generation (see the sketch after this list).
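To make the three phases concrete, here is a minimal sketch of the pipeline in Python. It is an illustration under assumptions rather than the paper’s implementation: `generator` and `editor` stand in for the black-box LLM and the fault-aware editor, and the `problem` fields are hypothetical.

```python
import subprocess
import sys


def run_on_example_test(code: str, test_input: str, timeout: float = 5.0):
    """Run candidate code on one sample test case and capture stdout/stderr."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=test_input,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return proc.stdout.strip(), proc.stderr.strip()
    except subprocess.TimeoutExpired:
        return "", "TimeoutError: execution exceeded the time limit"


def build_supplementary_comment(expected: str, stdout: str, stderr: str) -> str:
    """Wrap the execution result in a templated comment for the editor.

    The exact template is an assumption; the paper only specifies that the
    execution result is passed to the editor as a supplementary comment.
    """
    if stderr:
        return "# Execution failed with error:\n# " + stderr.splitlines()[-1]
    if stdout != expected.strip():
        return f"# Wrong answer: expected {expected.strip()!r}, got {stdout!r}"
    return "# Passed the example test case."


def generate_and_edit(problem: dict, generator, editor) -> str:
    """Three phases: generate with a black-box LLM, execute, then edit."""
    code = generator(problem["description"])                          # 1. generation
    out, err = run_on_example_test(code, problem["example_input"])    # 2. execution
    comment = build_supplementary_comment(problem["example_output"], out, err)
    return editor(problem["description"], code, comment)              # 3. edit
```

The design point this workflow captures is that the supplementary comment carries the error message or the expected-versus-actual output, which is often enough signal for a small editor model to repair code that is only slightly wrong.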

They conduct in-depth experiments on the public APPS and HumanEval competitive programming benchmarks. To demonstrate generality, they apply the methodology to nine well-known LLMs with parameter counts ranging from 110M to 175B. The strategy dramatically improves LLM performance, raising the average pass@1 on APPS-dev by 89% and on APPS-test by 31%. Even with the largest language model used, GPT-3 (175B), their small editor model boosts pass@1 on APPS-dev from 26.6% to 32.4%. They also demonstrate the method’s transferability on an out-of-distribution benchmark, improving average pass@1 by 48% on HumanEval, a dataset of a different style.
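For reference, pass@1 here is the standard pass@k metric with k = 1. Benchmarks such as HumanEval commonly estimate it with the unbiased estimator from Chen et al. (2021); the sketch below shows that standard metric, not code from this paper.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples generated per problem
    c: samples that pass all hidden test cases
    k: sample budget (k = 1 for pass@1)
    """
    if n - c < k:
        return 1.0  # fewer failing samples than the budget: some sample must pass
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))
```

With k = 1 the estimator reduces to c / n, the fraction of generated programs that pass, which is exactly the quantity the fault-aware editor improves at a fixed sample budget.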

Various methods of post-processing programs produced by LLMs have recently been proposed. These methods sample extensively from the LLM, rerank the sampled programs, and output the final program. In contrast, the generate-and-edit strategy has two advantages. First, it keeps the sample budget constant, greatly reducing the computational load on the LLM. Second, the editor modifies the program directly and outperforms these reranking-based techniques, especially under small sample budgets such as pass@1. To the authors’ knowledge, this is the first use of edit-based post-processing for competitive programming.

The contributions are summarized below.

• They propose a generate-and-edit approach that uses huge language models to produce high-quality code for difficult programming tasks.

• They build a fault-aware neural code editor that takes error messages and the generated code as input and improves the code’s accuracy and quality.

• They conduct experiments on two well-known datasets and nine LLMs to demonstrate the effectiveness and broad applicability of the strategy.




Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his Bachelor of Science in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is in image processing and he is passionate about building solutions around it. He loves connecting with people and collaborating on interesting projects.




