bnew


[Submitted on 26 Oct 2023]

CodeFusion: A Pre-trained Diffusion Model for Code Generation

Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, Gust Verbruggen
Imagine a developer who can only change their last line of code, how often would they have to start writing a function from scratch before it is correct? Auto-regressive models for code generation from natural language have a similar limitation: they do not easily allow reconsidering earlier tokens generated. We introduce CodeFusion, a pre-trained diffusion code generation model that addresses this limitation by iteratively denoising a complete program conditioned on the encoded natural language. We evaluate CodeFusion on the task of natural language to code generation for Bash, Python, and Microsoft Excel conditional formatting (CF) rules. Experiments show that CodeFusion (75M parameters) performs on par with state-of-the-art auto-regressive systems (350M-175B parameters) in top-1 accuracy and outperforms them in top-3 and top-5 accuracy due to its better balance in diversity versus quality.
Comments: EMNLP 2023, 12 pages
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL)
Cite as: arXiv:2310.17680 [cs.SE]
(or arXiv:2310.17680v1 [cs.SE] for this version)
https://doi.org/10.48550/arXiv.2310.17680

Submission history

From: Mukul Singh
[v1] Thu, 26 Oct 2023 11:06:15 UTC (463 KB)


https://arxiv.org/pdf/2310.17680.pdf

AI summary of summary

In simple terms, CodeFusion is a model that generates whole programs or functions from natural language instructions. Unlike most code assistants, which write code one token at a time and cannot revisit what they have already written, CodeFusion drafts a complete program and then refines the entire thing at once. It has been pre-trained on many examples, so it knows what makes sense in Bash, Python, and Microsoft Excel conditional formatting rules. When you give it a natural language instruction, it starts from random noise and repeatedly "denoises" that noise into a program; because every refinement step can revise any part of the program, it is never locked into earlier choices. Compared to some other popular tools, CodeFusion tends to suggest more diverse options while still maintaining good overall quality, so instead of always generating exactly the same solution, it offers multiple possibilities that are all likely to be helpful. Overall, CodeFusion aims to make coding from natural language easier and faster, especially for tasks where small changes need to be made repeatedly.
 

bnew

TECH

Biden issues U.S.′ first AI executive order, requiring safety assessments, civil rights guidance, research on labor market impact

PUBLISHED MON, OCT 30 2023 5:17 AM EDT
Hayden Field @HAYDENFIELD
Lauren Feiner @LAUREN_FEINER

KEY POINTS
  • U.S. President Joe Biden unveiled a new executive order on artificial intelligence.
  • It’s the U.S. government’s first action of its kind, requiring new safety assessments, equity and civil rights guidance and research on AI’s impact on the labor market.
  • The order builds on voluntary commitments the White House previously secured from leading AI companies and represents the first major binding government action on the technology.
President Joe Biden speaks as he meets with AI experts and researchers at the Fairmont Hotel in San Francisco, California, June 20, 2023.
Jane Tyska | Medianews Group | Getty Images

President Joe Biden issued a new executive order on artificial intelligence — the U.S. government’s first action of its kind — requiring new safety assessments, equity and civil rights guidance and research on AI’s impact on the labor market.

While law enforcement agencies have warned that they’re ready to apply existing law to abuses of AI and Congress has endeavored to learn more about the technology to craft new laws, the executive order could have a more immediate impact. Like all executive orders, it “has the force of law,” according to a senior administration official who spoke with reporters on a call Sunday.

The White House breaks the key components of the executive order into eight parts:
  • Creating new safety and security standards for AI, including by requiring some AI companies to share safety test results with the federal government, directing the Commerce Department to create guidance for AI watermarking, and creating a cybersecurity program to develop AI tools that help identify flaws in critical software.
  • Protecting consumer privacy, including by creating guidelines that agencies can use to evaluate privacy techniques used in AI.
  • Advancing equity and civil rights by providing guidance to landlords and federal contractors to help avoid AI algorithms furthering discrimination, and creating best practices on the appropriate role of AI in the justice system, including when it’s used in sentencing, risk assessments and crime forecasting.
  • Protecting consumers overall by directing the Department of Health and Human Services to create a program to evaluate potentially harmful AI-related health-care practices and creating resources on how educators can responsibly use AI tools.
  • Supporting workers by producing a report on the potential labor market implications of AI and studying the ways the federal government could support workers affected by a disruption to the labor market.
  • Promoting innovation and competition by expanding grants for AI research in areas such as climate change and modernizing the criteria for highly skilled immigrant workers with key expertise to stay in the U.S.
  • Working with international partners to implement AI standards around the world.
  • Developing guidance for federal agencies’ use and procurement of AI and speeding up the government’s hiring of workers skilled in the field.

The order represents “the strongest set of actions any government in the world has ever taken on AI safety, security, and trust,” White House Deputy Chief of Staff Bruce Reed said in a statement.

It builds on voluntary commitments the White House previously secured from leading AI companies and represents the first major binding government action on the technology. It also comes ahead of an AI safety summit hosted by the U.K.

The senior administration official referenced the fact that 15 major American technology companies have agreed to implement voluntary AI safety commitments but said that it “is not enough” and that Monday’s executive order is a step toward concrete regulation for the technology’s development.

“The President, several months ago, directed his team to pull every lever, and that’s what this order does: bringing the power of the federal government to bear in a wide range of areas to manage AI’s risk and harness its benefits,” the official said.

Biden’s executive order requires that large companies share safety test results with the U.S. government before the official release of AI systems. It also prioritizes the National Institute of Standards and Technology’s development of standards for AI “red-teaming,” or stress-testing the defenses and potential problems within systems. The Department of Commerce will develop standards for watermarking AI-generated content.

The order also addresses training data for large AI systems, and it lays out the need to evaluate how agencies collect and use commercially available data, including data purchased from data brokers, especially when that data involves personal identifiers.

The Biden administration is also taking steps to beef up the AI workforce. Beginning Monday, the senior administration official said, workers with AI expertise can find relevant openings in the federal government on AI.gov.

The administration official said Sunday that the “most aggressive” timing for some safety and security aspects of the order involves a 90-day turnaround, and for some other aspects, that time frame could be closer to a year.

Building on earlier AI actions​

Monday’s executive order follows a number of steps the White House has taken in recent months to create spaces to discuss the pace of AI development, as well as proposed guidelines.

Since the viral rollout of ChatGPT in November 2022 — which within two months became the fastest-growing consumer application in history, according to a UBS study — the widespread adoption of generative AI has already led to public concerns, legal battles and lawmaker questions. For instance, days after Microsoft folded ChatGPT into its Bing search engine, it was criticized for toxic speech, and popular AI image generators have come under fire for racial bias and propagating stereotypes.

Biden’s executive order directs the Department of Justice, as well as other federal offices, to develop standards for “investigating and prosecuting civil rights violations related to AI,” the administration official said Sunday on the call with reporters.

“The President’s executive order requires that clear guidance be provided to landlords, federal benefits programs and federal contractors to keep AI algorithms from being used to exacerbate discrimination,” the official added.

In August, the White House challenged thousands of hackers and security researchers to outsmart top generative AI models from the field’s leaders, including OpenAI, Google, Microsoft, Meta and Nvidia. The competition ran as part of Def Con, the world’s largest hacking conference.

“It is accurate to call this the first-ever public assessment of multiple LLMs,” a representative for the White House Office of Science and Technology Policy told CNBC at the time.

The competition followed a July meeting between the White House and seven top AI companies, including Alphabet, Microsoft, OpenAI, Amazon, Anthropic, Inflection and Meta. Each of the companies left the meeting having agreed to a set of voluntary commitments in developing AI, including allowing independent experts to assess tools before public debut, researching societal risks related to AI and allowing third parties to test for system vulnerabilities, such as in the competition at Def Con.
 

bnew


FUTURE FORECAST

10:44 AM by NOOR AL-SIBAI

Google AI Chief Says There’s a 50% Chance We’ll Hit AGI in Just 5 Years

"I think it's entirely plausible."

Image by Getty Images

More than a decade ago, the co-founder of Google's DeepMind artificial intelligence lab predicted that by 2028, AI would have a half-and-half shot of being about as smart as humans — and now, he's holding firm on that forecast.

In an interview with tech podcaster Dwarkesh Patel, DeepMind co-founder Shane Legg said that he still thinks that researchers have a 50-50 chance of achieving artificial general intelligence (AGI), a stance he publicly announced at the very end of 2011 on his blog.

It's a notable prediction considering the exponentially growing interest in the space. OpenAI CEO Sam Altman has long advocated for an AGI, a hypothetical agent capable of accomplishing intellectual tasks as well as a human, that could benefit everyone. But whether we'll ever be able to get to that point — let alone agree on one definition of AGI — remains to be seen.

Legg apparently began looking towards his 2028 goalpost all the way back in 2001 after reading "The Age of Spiritual Machines," the groundbreaking 1999 book by fellow Google AI luminary Ray Kurzweil that predicts a future of superhuman AIs.

"There were two really important points in his book that I came to believe as true," he explained. "One is that computational power would grow exponentially for at least a few decades. And that the quantity of data in the world would grow exponentially for a few decades."

Paired with an understanding of the trends of the era, such as the deep learning method of teaching algorithms to "think" and process data the way human brains do, Legg wrote back at the start of the last decade that in the coming ones, AGI could well be achieved — so long as "nothing crazy happens like a nuclear war."

Today, the DeepMind co-founder said that there are caveats to his prediction that the AGI era will be upon us by the end of this decade.

The first, broadly, is that definitions of AGI are reliant on definitions of human intelligence — and that kind of thing is difficult to test precisely because the way we think is complicated.

"You'll never have a complete set of everything that people can do," Legg said — things like developing episodic memory, or the ability to recall complete "episodes" that happened in the past, or even understanding streaming video. But if researchers could assemble a battery of tests for human intelligence and an AI model were to perform well enough against them, he continued, then "you have an AGI."

When Patel asked if there could be a single simple test to see whether an AI system had reached general intelligence, such as beating Minecraft, Legg pushed back.

"There is no one thing that would do it, because I think that's the nature of it," the AGI expert said. "It's about general intelligence. So I'd have to make sure [an AI system] could do lots and lots of different things and it didn't have a gap."

The second biggest caveat, Legg added, was the ability to scale AI training models way, way up — a worthy point given how much energy AI companies are already using to churn out large language models like OpenAI's GPT-4.

"There's a lot of incentive to make a more scalable algorithm to harness all this computing data," Legg explained. "So I thought it would be very likely that we'll start to discover scalable algorithms to do this."

Asked where he thought we stand today on the path to AGI, Legg said that he thinks computational power is where it needs to be to make it happen, and the "first unlocking step" would be to "start training models now with the scale of the data that is beyond what a human can experience in a lifetime" — a feat he believes the AI industry is ready to achieve.

All that said, Legg reiterated his personal stance that he believes there's only a 50 percent chance researchers will achieve AGI before the end of this decade. Futurism has reached out to DeepMind to see if the Google subsidiary has anything to add to that prognosis.

"I think it's entirely plausible," he said, "but I'm not going to be surprised if it doesn't happen by then."
 

bnew

Artists Lose First Round of Copyright Infringement Case Against AI Art Generators

While a federal judge advanced an infringement claim against Stability AI, he dismissed the rest of the lawsuit.

BY WINSTON CHO



OCTOBER 30, 2023 4:57PM
Artificial intelligence systems and fine arts in the future: a robot drawing a man's portrait.


Artists suing generative artificial intelligence art generators have hit a stumbling block in a first-of-its-kind lawsuit over the uncompensated and unauthorized use of billions of images downloaded from the internet to train AI systems, after a federal judge dismissed most of their claims.


U.S. District Judge William Orrick on Monday found that copyright infringement claims cannot move forward against Midjourney and DeviantArt, concluding the accusations are “defective in numerous respects.” Among the issues are whether the AI systems they run on actually contain copies of copyrighted images that were used to create infringing works and whether the artists can substantiate infringement in the absence of identical material created by the AI tools. Claims against the companies for infringement, right of publicity, unfair competition and breach of contract were dismissed, though they will likely be reasserted.

Notably, a claim for direct infringement against Stability AI was allowed to proceed based on allegations the company used copyrighted images without permission to create Stable Diffusion. Stability has denied the contention that it stored and incorporated those images into its AI system. It maintains that training its model does not include wholesale copying of works but rather involves development of parameters — like lines, colors, shades and other attributes associated with subjects and concepts — from those works that collectively define what things look like. The issue, which may decide the case, remains contested.


The litigation revolves around Stability’s Stable Diffusion, which is incorporated into the company’s AI image generator DreamStudio. In this case, the artists will have to establish that their works were used to train the AI system. It’s alleged that DeviantArt’s DreamUp and Midjourney are powered by Stable Diffusion. A major hurdle artists face is that training datasets are largely a black box.


In his dismissal of infringement claims, Orrick wrote that plaintiffs’ theory is “unclear” as to whether there are copies of training images stored in Stable Diffusion that are utilized by DeviantArt and Midjourney. He pointed to the defense’s arguments that it’s impossible for billions of images “to be compressed into an active program,” like Stable Diffusion.

“Plaintiffs will be required to amend to clarify their theory with respect to compressed copies of Training Images and to state facts in support of how Stable Diffusion – a program that is open source, at least in part – operates with respect to the Training Images,” stated the ruling.


Orrick questioned whether Midjourney and DeviantArt, which offer use of Stable Diffusion through their own apps and websites, can be liable for direct infringement if the AI system “contains only algorithms and instructions that can be applied to the creation of images that include only a few elements of a copyrighted” work.


The judge stressed the absence of allegations of the companies playing an affirmative role in the alleged infringement. “Plaintiffs need to clarify their theory against Midjourney — is it based on Midjourney’s use of Stable Diffusion, on Midjourney’s own independent use of Training Images to train the Midjourney product, or both?” Orrick wrote.


According to the order, the artists will also likely have to show proof of infringing works produced by AI tools that are identical to their copyrighted material. This potentially presents a major issue because they have conceded that “none of the Stable Diffusion output images provided in response to a particular Text Prompt is likely to be a close match for any specific image in the training data.”

“I am not convinced that copyright claims based on a derivative theory can survive absent ‘substantial similarity’ type allegations,” the ruling stated.

Though defendants made a “strong case” that the claim should be dismissed without an opportunity to be reargued, Orrick noted artists’ contention that AI tools can create material that is similar enough to their work to be misconstrued as fakes.


Claims for vicarious infringement, violations of the Digital Millennium Copyright Act for removal of copyright management information, right of publicity, breach of contract and unfair competition were similarly dismissed.

“Plaintiffs have been given leave to amend to clarify their theory and add plausible facts regarding “compressed copies” in Stable Diffusion and how those copies are present (in a manner that violates the rights protected by the Copyright Act) in or invoked by the DreamStudio, DreamUp, and Midjourney products offered to third parties,” Orrick wrote. “That same clarity and plausible allegations must be offered to potentially hold Stability vicariously liable for the use of its product, DreamStudio, by third parties.”


Regarding the right of publicity claim, which takes issue with defendants profiting off of plaintiffs’ names by allowing users to request art in their style, the judge stressed that there’s not enough information supporting arguments that the companies used artists’ identities to advertise products.


Two of the three artists who filed the lawsuit have dropped their infringement claims because they didn’t register their work with the copyright office before suing. The copyright claims will be limited to artist Sarah Anderson’s works, which she has registered. As proof that Stable Diffusion was trained on her material, Anderson relied on the results of a search of her name on haveibeentrained.com, which allows artists to discover if their work has been used in AI model training and offers an opt-out to help prevent further unauthorized use.

“While defendants complain that Anderson’s reference to search results on the ‘haveibeentrained’ website is insufficient, as the output pages show many hundreds of works that are not identified by specific artists, defendants may test Anderson’s assertions in discovery,” the ruling stated.


Stability, DeviantArt and Midjourney didn’t respond to requests for comment.


On Monday, President Joe Biden issued an executive order to create some safeguards against AI. While it mostly focuses on reporting requirements over the national security risks some companies’ systems present, it also recommends the watermarking of photos, video and audio developed by AI tools to protect against deepfakes. Biden, at the signing of the order, stressed the technology’s potential to “smear reputations, spread fake news and commit fraud.”

“The inclusion of copyright and intellectual property protection in the AI Executive Order reflects the importance of the creative community and IP-powered industries to America’s economic and cultural leadership,” said the Human Artistry Campaign in a statement.


At a meeting in July, leading AI companies voluntarily agreed to guardrails to manage the risks posed by the emerging technology in a bid by the White House to get the industry to regulate itself in the absence of legislation instituting limits around the development of the new tools. Like the executive order issued by Biden, it was devoid of any kind of reporting regime or timeline that could legally bind the firms to their commitments.
 

bnew

October 31, 2023

Phind Model beats GPT-4 at coding, with GPT-3.5-like speed and 16k context

We're excited to announce that Phind now defaults to our own model that matches and exceeds GPT-4's coding abilities while running 5x faster. You can now get high quality answers for technical questions in 10 seconds instead of 50.

The current 7th-generation Phind Model is built on top of our open-source CodeLlama-34B fine-tunes that were the first models to beat GPT-4's score on HumanEval and are still the best open source coding models overall by a wide margin.

  • The Phind Model V7 achieves 74.7% pass@1 on HumanEval
This new model has been fine-tuned on an additional 70B+ tokens of high quality code and reasoning problems and exhibits a HumanEval score of 74.7%. However, we've found that HumanEval is a poor indicator of real-world helpfulness. After deploying previous iterations of the Phind Model on our service, we've collected detailed feedback and noticed that our model matches or exceeds GPT-4's helpfulness most of the time on real-world questions. Many in our Discord community have begun using Phind exclusively with the Phind Model despite also having unlimited access to GPT-4.

One of the Phind Model's key advantages is that it's very fast. We've been able to achieve a 5x speedup over GPT-4 by running our model on H100s using the new TensorRT-LLM library from NVIDIA, reaching 100 tokens per second single-stream.

Another key advantage of the Phind Model is context – it supports up to 16k tokens. We currently allow inputs of up to 12k tokens on the website and reserve the remaining 4k for web results.

There are still some rough edges with the Phind Model and we'll continue improving it constantly. One area where it still suffers is consistency — on certain challenging questions where it is capable of getting the right answer, the Phind Model might take more generations to get to the right answer than GPT-4.


 

bnew




Javi Lopez ⛩️
@javilopen
22h
First of all, would you like to play the game?

Here's a link! (Currently, it doesn't work on mobile): bestaiprompts.art/angry-pump…

If you read the text below the game screen, which provides explanations, you'll see how you can create your own levels and play them! :smile:

Javi Lopez ⛩️
@javilopen
22h
💡 Introduction

I have to admit, I'm genuinely blown away. Honestly, I never thought this would be possible. I truly believe we're living in a historic moment that we've only seen in sci-fi movies up until now.

These new work processes, where we can create anything using just natural language, are going to change the world as we know it.

It's such a massive tidal wave that those who don't see it coming will be hit hard.

So... let's start riding the wave!
Javi Lopez ⛩️
@javilopen
22h
🎨 Graphics

This was the easiest part, after all, I've been generating images with AI for over a year and a half :smile: Here are all the prompts for your enjoyment!

👉 Title Screen (DALL·E 3 from GPT-4)

- "Photo of a horizontal vibrant home screen for a video game titled 'Angry Pumpkins'. The design is inspired by the 'Angry Birds' game aesthetic but different. Halloween elements like haunted houses, gravestones, and bats dominate the background. The game logo is prominently displayed at the center-top, with stylized pumpkin characters looking angry and ready for action on either side. A 'Play' button is located at the bottom center, surrounded by eerie mist."

👉 Backgrounds (Midjourney)

I used one image for the background (with several inpaintings):

- "Angry birds skyline in iPhone screenshot, Halloween Edition, graveyard, in the style of light aquamarine and orange, neo-traditionalist, kerem beyit, earthworks, wood, Xbox 360 graphics, light pink and navy --ar 8:5"

And another, cropped, for the ground:

- "2d platform, stone bricks, Halloween, 2d video game terrain, 2d platformer, Halloween scenario, similar to angry birds, metal slug Halloween, screenshot, in-game asset --ar 8:5"

👉 Characters (Midjourney)

- "Halloween pumpkin, in-game sprite but Halloween edition, simple sprite, 2d, white background"
- "Green Halloween monster, silly, amusing, in-game sprite but Halloween edition, simple sprite, 2d, white background"

👉 Objects (Midjourney)

I created various "sprite stylesheets" and then cropped and removed the background using Photoshop/Photopea. For small details, I used Midjourney's inpainting.

- "Wooden box. Item assets sprites. White background. In-game sprites"
- "Skeleton bone. Large skeleton bone. Item assets sprites. White background. In-game sprites"
- "Rectangular stone. Item assets sprites. White background. In-game sprites"
- "Wooden box. Large skeleton bone. Item assets sprites. White background. In-game sprites"
- "Item assets sprites. Wooden planks. White background. In-game sprites. Similar to Angry Birds style"

Javi Lopez ⛩️
@javilopen
22h
🤖 Programming (GPT-4)

🔗 Full source code here: bestaiprompts.art/angry-pump…

Although the game is just 600 lines of code, of which I haven't written ANY, this was the most challenging part. As you can see, I got into adding many details like different particle effects, different types of objects, etc. And to this day, we're still not at a point where GPT-4 can generate an entire game with just a prompt. But I have no doubt that in the future we'll be able to create AAA video games just by asking for it.

Anyway, back to the present, the TRICK is to request things from GPT-4 iteratively. Actually, it's very similar to how a person would program it: start with a simple functional base and iterate, expand, and improve the code from there.

Let's see some tricks and prompts I used:

👉 Start with something simple

- "Can we now create a simple game using matter.js and p5.js in the style of "Angry Birds"? Just launch a ball with angle and force using the mouse and hit some stacked boxes with 2D physics.

👉 And from there, keep asking for more and more things. And every time something goes wrong, clearly explain the mistake and let it fix it. Patience! Examples:

- "Now, I ask you: do you know how the birds are launched in Angry Birds? What the finger does on the screen? Exactly. Add this to the game, using the mouse."
- "I have this error, please, fix it: Uncaught ReferenceError: Constraint is not defined"
- "I would like to make a torch with particle effects. Can it be done with p5.js? Make one, please."
- "Now, make the monsters circular, and be very careful: apply the same technique that already exists for the rectangular ones regarding scaling and collision area, and don't mess it up like before. 😂"

👉 This part took us (GPT-4 and me) many iterations and patience.

- "There's something off with the logic that calculates when there's a strong impact on a bug. If the impact is direct, it works well, but not if it's indirect. For example, if I place a rectangle over two bugs and drop a box on the rectangle, even though the bugs should be affected by the impact, they don't notice it. What can we do to ensure they also get affected when things fall on top of a body they are under?"
 

bnew



When comparing two models, a common reference point of compute is often used.

If you trained a 7b model with 3x the number of tokens/compute to beat a 13b model, did you really beat it? Probably not. 😶

Here's a paper we wrote in 2021 (arxiv.org/abs/2110.12894) that I still find to be super relevant in 2023. @m__dehghani @giffmana

There are a lot of misnomers around efficiency, and here's why it's super important to reason properly about this.

Some points:

1. Model comparisons can be tricky w.r.t. compute/efficiency. The choice of metric (e.g., FLOPs, throughput, parameters) can easily change the relative comparisons.

2. NLPers like to use number of parameters as an all-encompassing absolute metric. There is a strong bias that the size of the model is more important than anything else. Oftentimes, the training FLOPs behind the model are not considered. Models are referred to as Model X 4B or Model Y 10B and not Model Z 6E22 FLOPs. 😂

3. There are people who think that if you train for longer, it's a win. Sure, from an inference point of view, yes, maybe. But it doesn't necessarily mean the modeling or recipe behind it is better. I've seen this being conflated so many times.

4. Sparsity of models is sometimes overlooked and very tricky to compare apples to apples. Models can have different FLOP-parameter ratios (encoder-decoder vs. decoder-only, sparse vs. dense, universal transformer, etc.). Many people actually don't know this about encoder-decoders.

5. In the past, I've seen works that claim huge efficiency gains by using "adapters" because there are far fewer "trainable parameters". It sounds impressive on paper and fools the unsuspecting reviewer since it obfuscates the fact that inference/training speed does not benefit from the same boost.

6. Methods can sometimes claim superior theoretical complexity but can be 10x slower in practice because of implementation details or hardware constraints. (many efficient attention methods did not take off because of this).

When writing this paper, "The Efficiency Misnomer" ⬇️, I think we all felt that a holistic view of compute metrics needs to be taken into account.

TL;DR: parameter-matching < flop/throughput matching but taking into account everything holistically is important.

If you care about one thing, sure, optimise for that. But if you're making general model comparisons, this can make things confusing real fast.

It's a 2 year old paper, but still important especially so in this new age of LLM noise.
 

bnew


SEMICONDUCTORS | NEWS

Nvidia Is Piloting a Generative AI for Its Engineers

ChipNeMo summarizes bug reports, gives advice, and writes design-tool scripts

SAMUEL K. MOORE

31 OCT 2023

3 MIN READ
Illustration of a green circuit board and chips attached to a brain with eyes.

NICHOLAS LITTLE
EDA | CHIP DESIGN | NVIDIA | LLMS | DEBUGGING | ELECTRONIC DESIGN AUTOMATION

In a keynote address at the IEEE/ACM International Conference on Computer-Aided Design Monday, Nvidia chief technology officer Bill Dally revealed that the company has been testing a large-language-model AI to boost the productivity of its chip designers.

“Even if we made them 5 percent more productive, that’s a huge win,” Dally said in an interview ahead of the conference. Nvidia can’t claim it’s reached that goal yet. The system, called ChipNeMo, isn’t ready for the kind of large—and lengthy—trial that would really prove its worth. But a cadre of volunteers at Nvidia is using it, and there are some positive indications, Dally said.

ChipNeMo is a specially tuned spin on a large language model. It starts as an LLM made up of 43 billion parameters that acquires its skills from one trillion tokens—fundamental language units—of data. “That’s like giving it a liberal arts education,” said Dally. “But if you want to send it to graduate school and have it become specialized, you fine-tune it on a particular corpus of data…in this case, chip design.”

That took two more steps. First, that already-trained model was trained again on 24 billion tokens of specialized data. Twelve billion of those tokens came from design documents, bug reports, and other English-language internal data accumulated over Nvidia’s 30 years of work designing chips. The other 12 billion tokens came from code, such as the hardware description language Verilog and scripts for running industrial electronic design automation (EDA) tools. Finally, the resulting model was submitted to “supervised fine-tuning,” training on 130,000 sample conversations and designs.

The result, ChipNeMo, was set three different tasks: as a chatbot, as an EDA-tool script writer, and as a summarizer of bug reports.

Acting as a chatbot for engineers could save designers time, said Dally. “Senior designers spend a lot of time answering questions for junior designers,” he said. As a chatbot, the AI can save senior designers’ time by answering questions that require experience, like what a strange signal might mean or how a specific test should be run.

Chatbots, however, are notorious for their willingness to lie when they don’t know the answer and their tendency to hallucinate. So Nvidia developers integrated a function called retrieval-augmented generation into ChipNeMo to keep it on the level. That function forces the AI to retrieve documents from Nvidia’s internal data to back up its suggestions.

The addition of retrieval-augmented generation “improves the accuracy quite a bit,” said Dally. “More importantly, it reduces hallucination.”

In its second application, ChipNeMo helped engineers run tests on designs and parts of them. “We use many design tools,” said Dally. “These tools are pretty complicated and typically involve many lines of scripting.” ChipNeMo simplifies the designer’s job by providing a “very natural human interface to what otherwise would be some very arcane commands.”

ChipNeMo’s final use case, analyzing and summarizing bug reports, “is probably the one where we see the prospects for the most productivity gain earliest,” said Dally. When a test fails, he explained, it gets logged into Nvidia’s internal bug-report system, and each report can include pages and pages of detailed data. Then an “ARB” (short for “action required by”) is sent to a designer for a fix, and the clock starts ticking.

ChipNeMo summarizes the bug report’s many pages into as little as a single paragraph, speeding decisions. It can even write that summary in two modes: one for the engineer and one for the manager.

Makers of chip-design tools, such as Synopsys and Cadence, have been diving into integrating AI into their systems. But according to Dally, they won’t be able to achieve the same thing Nvidia is after.

“The thing that enables us to do this is 30 years of design documents and code in a database,” he said. ChipNeMo is learning “from the entire experience of Nvidia.” EDA companies just don’t have that kind of data.
