[Submitted on 26 Oct 2023]
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, Gust Verbruggen

Imagine a developer who can only change their last line of code: how often would they have to start writing a function from scratch before it is correct? Auto-regressive models for code generation from natural language have a similar limitation: they do not easily allow reconsidering earlier generated tokens. We introduce CodeFusion, a pre-trained diffusion code generation model that addresses this limitation by iteratively denoising a complete program conditioned on the encoded natural language. We evaluate CodeFusion on the task of natural language to code generation for Bash, Python, and Microsoft Excel conditional formatting (CF) rules. Experiments show that CodeFusion (75M parameters) performs on par with state-of-the-art auto-regressive systems (350M-175B parameters) in top-1 accuracy and outperforms them in top-3 and top-5 accuracy due to its better balance between diversity and quality.
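The core contrast in the abstract, denoising a whole program at once rather than emitting tokens strictly left to right, can be illustrated with a toy sampling loop. This is a minimal sketch under stated assumptions, not CodeFusion's implementation: the 2-D token embeddings, the linear noise schedule, the deterministic DDIM-style update, nearest-embedding decoding, and the hypothetical `oracle_denoiser` (which simply returns a fixed target program instead of running a trained network conditioned on the encoded utterance) are all illustrative choices. What it does show is that every position of the draft is updated at every step, so tokens near the start can still change late in generation.

```python
import math
import random

# Hypothetical toy vocabulary. Each token gets a fixed 2-D "embedding" on the
# unit circle; a real model would learn high-dimensional embeddings instead.
VOCAB = ["<pad>", "ls", "-l", "|", "grep", "txt"]
EMB = {
    tok: (math.cos(2 * math.pi * i / len(VOCAB)), math.sin(2 * math.pi * i / len(VOCAB)))
    for i, tok in enumerate(VOCAB)
}


def decode(xs):
    """Round each position to its nearest token embedding (stand-in for a learned decoder)."""
    return [min(VOCAB, key=lambda tok: math.dist(EMB[tok], x)) for x in xs]


def oracle_denoiser(x_t, t, target):
    """HYPOTHETICAL stand-in for a trained denoiser that would predict clean embeddings
    from x_t, the step t, and the encoded utterance. It ignores x_t and t and returns
    the padded target program, so only the sampling loop itself is illustrated."""
    padded = target + ["<pad>"] * (len(x_t) - len(target))
    return [EMB[tok] for tok in padded]


def alpha_bar(t, steps):
    """Toy linear schedule: 0.0 at t = steps (pure noise), 1.0 at t = 0 (clean)."""
    return 1.0 - t / steps


def ddim_step(x_t, x0_hat, a_t, a_prev):
    """Deterministic DDIM-style update for one position's vector."""
    out = []
    for xi, p in zip(x_t, x0_hat):
        eps = (xi - math.sqrt(a_t) * p) / math.sqrt(1.0 - a_t)  # noise implied by the estimate
        out.append(math.sqrt(a_prev) * p + math.sqrt(1.0 - a_prev) * eps)
    return tuple(out)


def diffusion_generate(target, length=8, steps=10, seed=0):
    rng = random.Random(seed)
    # Start from Gaussian noise over every position of the program at once.
    x = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(length)]
    snapshots = []
    for t in range(steps, 0, -1):
        x0_hat = oracle_denoiser(x, t, target)
        a_t, a_prev = alpha_bar(t, steps), alpha_bar(t - 1, steps)
        # All positions are updated together, so earlier tokens can still change.
        x = [ddim_step(xt, p, a_t, a_prev) for xt, p in zip(x, x0_hat)]
        snapshots.append(decode(x))
    tokens = [tok for tok in decode(x) if tok != "<pad>"]
    return tokens, snapshots


if __name__ == "__main__":
    tokens, snapshots = diffusion_generate(["ls", "-l", "|", "grep", "txt"])
    for step, draft in enumerate(snapshots, start=1):
        print(step, " ".join(draft))
    print("final:", " ".join(tokens))
```

Printing the snapshots shows the whole draft being revised between steps, which is the property an append-only, left-to-right decoder lacks. Because the oracle always knows the answer, the final step reproduces the target exactly; with a real learned denoiser the result is only as good as its predictions.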
Comments: EMNLP 2023, 12 pages
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL)
Cite as: arXiv:2310.17680 [cs.SE]
(or arXiv:2310.17680v1 [cs.SE] for this version)
DOI: https://doi.org/10.48550/arXiv.2310.17680
Submission history
From: Mukul Singh
[v1] Thu, 26 Oct 2023 11:06:15 UTC (463 KB)
https://arxiv.org/pdf/2310.17680.pdf
AI-generated plain-language summary
In simple terms, CodeFusion is a model that generates code (Bash commands, Python code, or Microsoft Excel conditional formatting rules) from a natural-language description. Most existing code generators are auto-regressive: they write a program one token at a time, left to right, and cannot easily go back and revise what they have already written. CodeFusion works like a diffusion model instead: it starts from random noise covering the whole program and repeatedly "denoises" it, guided by the encoded description, so any part of the program can still change at every step. The model is pre-trained before being applied to these tasks. Despite being small (75M parameters, versus 350M to 175B for the auto-regressive systems it is compared with), it matches them on top-1 accuracy, meaning how often its single best answer is correct, and beats them on top-3 and top-5 accuracy. The authors attribute this to a better balance between diversity and quality: its top few suggestions differ more from one another while each stays plausible, so at least one of them is more likely to be correct.
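The top-k comparison above counts an example as solved if any of the first k ranked candidates is correct, which is why diversity among candidates can raise top-3 and top-5 accuracy without affecting top-1. Below is a minimal sketch of the metric. The whitespace-trimmed exact-match `is_correct` check and both candidate lists are hypothetical placeholders made up for illustration; they are neither data from the paper nor its actual correctness criterion, which may differ per language.

```python
from typing import Callable, Sequence


def exact_match(candidate: str, reference: str) -> bool:
    """Placeholder correctness check: whitespace-trimmed string equality."""
    return candidate.strip() == reference.strip()


def top_k_accuracy(
    candidates: Sequence[Sequence[str]],
    references: Sequence[str],
    k: int,
    is_correct: Callable[[str, str], bool] = exact_match,
) -> float:
    """Fraction of examples where any of the first k ranked candidates is correct."""
    if len(candidates) != len(references):
        raise ValueError("need one candidate list per reference")
    if not references:
        return 0.0
    hits = sum(
        any(is_correct(c, ref) for c in ranked[:k])
        for ranked, ref in zip(candidates, references)
    )
    return hits / len(references)


# Hypothetical toy data (not from the paper): two NL-to-Bash examples.
references = ["find . -name '*.py'", "wc -l notes.txt"]

# Ranked candidates from a system whose top suggestions are near-duplicates.
repetitive = [
    ["find . -name '*.py'", "find . -name '*.py' ", "find . -name *.py"],
    ["wc notes.txt", "wc notes.txt ", "wc -w notes.txt"],
]

# Ranked candidates from a system whose top suggestions are more varied.
diverse = [
    ["find . -name '*.py'", "ls *.py", "find . -type f"],
    ["wc notes.txt", "cat notes.txt", "wc -l notes.txt"],
]

for k in (1, 3):
    # prints "1 0.5 0.5", then "3 0.5 1.0"
    print(k, top_k_accuracy(repetitive, references, k), top_k_accuracy(diverse, references, k))
```

With these made-up lists both systems score 0.5 at top-1, but only the more varied one reaches 1.0 at top-3, a miniature version of the pattern the abstract reports.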