bnew

Veteran
Joined
Nov 1, 2015
Messages
51,795
Reputation
7,926
Daps
148,646


Biden-Harris Administration Announces Key AI Actions 180 Days Following President Biden’s Landmark Executive Order



Six months ago, President Biden issued a landmark Executive Order to ensure that America leads the way in seizing the promise and managing the risks of artificial intelligence (AI). Since then, agencies all across government have taken vital steps to manage AI’s safety and security risks, protect Americans’ privacy, advance equity and civil rights, stand up for consumers and workers, promote innovation and competition, advance American leadership around the world, and more.



Today, federal agencies reported that they completed all of the 180-day actions in the E.O. on schedule, following their recent successes completing each 90-day, 120-day, and 150-day action on time. Agencies also progressed on other work tasked by the E.O. over longer timeframes.



Actions that agencies reported today as complete include the following:



Managing Risks to Safety and Security:

Over 180 days, the Executive Order directed agencies to address a broad range of AI’s safety and security risks, including risks related to dangerous biological materials, critical infrastructure, and software vulnerabilities. To mitigate these and other threats to safety, agencies have:

  • Established a framework for nucleic acid synthesis screening to help prevent the misuse of AI for engineering dangerous biological materials. This work complements in-depth study by the Department of Homeland Security (DHS), Department of Energy (DOE) and Office of Science and Technology Policy on AI’s potential to be misused for this purpose, as well as a DHS report that recommended mitigations for the misuse of AI to exacerbate chemical and biological threats. In parallel, the Department of Commerce has worked to engage the private sector to develop technical guidance to facilitate implementation. Starting 180 days after the framework is announced, agencies will require that grantees obtain synthetic nucleic acids from vendors that screen.
  • Released for public comment draft documents on managing generative AI risks, securely developing generative AI systems and dual-use foundation models, expanding international standards development in AI, and reducing the risks posed by AI-generated content. When finalized, these documents by the National Institute of Standards and Technology (NIST) will provide additional guidance that builds on NIST’s AI Risk Management Framework, which offered individuals, organizations, and society a framework to manage AI risks and has been widely adopted both in the U.S. and globally.
  • Developed the first AI safety and security guidelines for critical infrastructure owners and operators. These guidelines are informed by the completed work of nine agencies to assess AI risks across all sixteen critical infrastructure sectors.
  • Launched the AI Safety and Security Board to advise the Secretary of Homeland Security, the critical infrastructure community, other private sector stakeholders, and the broader public on the safe and secure development and deployment of AI technology in our nation’s critical infrastructure. The Board’s 22 inaugural members include representatives from a range of sectors, including software and hardware company executives, critical infrastructure operators, public officials, the civil rights community, and academia.
  • Piloted new AI tools for identifying vulnerabilities in vital government software systems. The Department of Defense (DoD) made progress on a pilot for AI that can find and address vulnerabilities in software used for national security and military purposes. Complementary to DoD’s efforts, DHS piloted different tools to identify and close vulnerabilities in other critical government software systems that Americans rely on every hour of every day.


Standing up for Workers, Consumers, and Civil Rights

The Executive Order directed bold steps to mitigate other risks from AI—including risks to workers, to consumers, and to Americans’ civil rights—and ensure that AI’s development and deployment benefits all Americans. Today, agencies reported that they have:

  • Developed bedrock principles and practices for employers and developers to build and deploy AI safely and in ways that empower workers. Agencies all across government are now starting work to establish these practices as requirements, where appropriate and authorized by law, for employers that receive federal funding.
  • Released guidance to help federal contractors and employers comply with worker protection laws as they deploy AI in the workplace. The Department of Labor (DOL) developed a guide for federal contractors and subcontractors to answer questions and share promising practices to clarify federal contractors’ legal obligations, promote equal employment opportunity, and mitigate the potentially harmful impacts of AI in employment decisions. DOL also provided guidance regarding the application of the Fair Labor Standards Act and other federal labor standards as employers increasingly use AI and other automated technologies in the workplace.
  • Released resources for job seekers, workers, and tech vendors and creators on how AI use could violate employment discrimination laws. The Equal Employment Opportunity Commission’s resources clarify that existing laws apply to the use of AI and other new technologies in employment just as they apply to other employment practices.
  • Issued guidance on AI’s nondiscriminatory use in the housing sector. In two guidance documents, the Department of Housing and Urban Development affirmed that existing prohibitions against discrimination apply to AI’s use for tenant screening and advertisement of housing opportunities, and it explained how deployers of AI tools can comply with these obligations.
  • Published guidance and principles that set guardrails for the responsible and equitable use of AI in administering public benefits programs. The Department of Agriculture’s guidance explains how State, local, Tribal, and territorial governments should manage risks for uses of AI and automated systems in benefits programs such as SNAP. The Department of Health and Human Services (HHS) released a plan with guidelines on similar topics for benefits programs it oversees. Both agencies’ documents prescribe actions that align with the Office of Management and Budget’s policies, published last month, for federal agencies to manage risks in their own use of AI and harness AI’s benefits.
  • Announced a final rule clarifying that nondiscrimination requirements in health programs and activities continue to apply to the use of AI, clinical algorithms, predictive analytics, and other tools. Specifically, the rule applies the nondiscrimination principles under Section 1557 of the Affordable Care Act to the use of patient care decision support tools in clinical care, and it requires those covered by the rule to take steps to identify and mitigate discrimination when they use AI and other forms of decision support tools for care.
  • Developed a strategy for ensuring the safety and effectiveness of AI deployed in the health care sector. The strategy outlines rigorous frameworks for AI testing and evaluation, and it outlines future actions for HHS to promote responsible AI development and deployment.


Harnessing AI for Good

President Biden’s Executive Order also directed work to seize AI’s enormous promise, including by advancing AI’s use for scientific research, deepening collaboration with the private sector, and piloting uses of AI. Over the past 180 days, agencies have done the following:

  • Announced DOE funding opportunities to support the application of AI for science, including energy-efficient AI algorithms and hardware.
  • Prepared convenings for the next several months with utilities, clean energy developers, data center owners and operators, and regulators in localities experiencing large load growth. Today, DOE announced new actions to assess the potential energy opportunities and challenges of AI, accelerate deployment of clean energy, and advance AI innovation to manage the growing energy demand of AI.
  • Launched pilots, partnerships, and new AI tools to address energy challenges and advance clean energy. For example, DOE is piloting AI tools to streamline permitting processes and improve siting for clean energy infrastructure, and it has developed other powerful AI tools with applications at the intersection of energy, science, and security. Today, DOE also published a report outlining opportunities AI brings to advance the clean energy economy and modernize the electric grid.
  • Initiated a sustained effort to analyze the potential risks that deployment of AI may pose to the grid. DOE has started the process of convening energy stakeholders and technical experts over the coming months to collaboratively assess potential risks to the grid, as well as ways in which AI could potentially strengthen grid resilience and our ability to respond to threats—building off a new public assessment.
  • Authored a report on AI’s role in advancing scientific research to help tackle major societal challenges, written by the President’s Council of Advisors on Science and Technology.


Bringing AI Talent into Government

The AI and Tech Talent Task Force has made substantial progress on hiring through the AI Talent Surge.
Since President Biden signed the E.O., federal agencies have hired over 150 AI and AI-enabling professionals and, along with the tech talent programs, are on track to hire hundreds by Summer 2024. Individuals hired thus far are already working on critical AI missions, such as informing efforts to use AI for permitting, advising on AI investments across the federal government, and writing policy for the use of AI in government.

  • The General Services Administration has onboarded a new cohort of Presidential Innovation Fellows (PIF) and also announced its first-ever PIF AI cohort starting this summer.
  • DHS has launched the DHS AI Corps, which will hire 50 AI professionals to build safe, responsible, and trustworthy AI to improve service delivery and homeland security.
  • The Office of Personnel Management has issued guidance on skills-based hiring to increase access to federal AI roles for individuals with non-traditional academic backgrounds.
  • For more on the AI Talent Surge’s progress, read its report to the President. To explore opportunities, visit Join the National AI Talent Surge

    The table below summarizes many of the activities that federal agencies have completed in response to the Executive Order.


[Summary table images from the original release not reproduced here]
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,795
Reputation
7,926
Daps
148,646

1/1
We've been in the kitchen cooking. Excited to release the first
@AIatMeta Llama-3 8B with a context length of over 1M on
@huggingface
- coming off of the 160K context length model we released on Friday!

A huge thank you to
@CrusoeEnergy
for sponsoring the compute and let us know if you want to work with our team on custom models: Finance Whitepaper

gradientai/Llama-3-8B-Instruct-Gradient-1048k · Hugging Face






1/2
ollama run llama3-gradient

2/2
Just ollama run a model.

If you want to import a model from hugging face or other places:


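The import walk-through itself lives in the screenshots attached to the original tweet; as a hedged stand-in, here is a minimal Python sketch of driving the same model (assuming the `ollama` client package, a running Ollama server, and the `llama3-gradient` tag from the tweet; a custom GGUF pulled from Hugging Face would instead be registered through a Modelfile and `ollama create`):

Code:
import ollama  # pip install ollama

# Pull the extended-context build referenced above.
ollama.pull("llama3-gradient")

response = ollama.chat(
    model="llama3-gradient",
    messages=[{"role": "user", "content": "Summarize this document:\n" + open("long_doc.txt").read()}],
    options={"num_ctx": 32768},  # raise the context window; the default is much smaller
)
print(response["message"]["content"])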



 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,795
Reputation
7,926
Daps
148,646








1/9
llama-3 models did very poorly on this benchmark, simply because their context length is *limited to 8k*. But... with zero-training (actually just a simple 2 line config) you can get 32k context out of llama-3 models with *exceptional* quality. llama-3 8B surpasses many models *significantly* larger

2/9
The full run is here, llama-3 models are really capable models... the fact that we can use dynamic scaling with no training to get 32k context is quite a buff

3/9
llama-3 70B benchmark is incoming, I suspect it will be very high up on this one

4/9
Here's one completion from the model (description to find is the first text)

5/9
The benchmark is pretty good! The way it works is the model is given a large code snippet as context (many thousands of tokens), a function "description" which is used to tell the model what to retrieve and then the instructions that ask the model to use the description to *find*…

6/9
As you can imagine, it's more complex than simple needle in haystack tests (just return a line or quote). This requires the model to interpret the description of the function correctly and return the function

7/9
Ok wow... llama 3 70B is going to crush this benchmark will break into top 3 easy. It's taking extra long (running it on consumer hw @ bf16). 88.0 on python == sonnet level

8/9
ah sorry I had posted it a few days ago

9/9
seems to only require rope scaling config
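For context, the "rope scaling config" being referenced is the rope_scaling entry in the Hugging Face Llama config. A minimal sketch is below; the model id and scaling factor are assumptions, with factor 4.0 stretching the native 8k window to roughly 32k via dynamic NTK-aware scaling:

Code:
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed checkpoint

# The "2 line config": enable dynamic RoPE scaling with a 4x factor, no retraining.
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {"type": "dynamic", "factor": 4.0}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, config=config, torch_dtype="auto", device_map="auto")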


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,795
Reputation
7,926
Daps
148,646







1/7
InstantFamily

Masked Attention for Zero-shot Multi-ID Image Generation

In the field of personalized image generation, the ability to create images preserving concepts has significantly improved. Creating an image that naturally integrates multiple concepts in a cohesive and

2/7
visually appealing composition can indeed be challenging. This paper introduces "InstantFamily," an approach that employs a novel masked cross-attention mechanism and a multimodal embedding stack to achieve zero-shot multi-ID image generation. Our method effectively

3/7
preserves ID as it utilizes global and local features from a pre-trained face recognition model integrated with text conditions. Additionally, our masked cross-attention mechanism enables the precise control of multi-ID and composition in the generated images. We demonstrate

4/7
the effectiveness of InstantFamily through experiments showing its dominance in generating images with multi-ID, while resolving well-known multi-ID generation problems. Additionally, our model achieves state-of-the-art performance in both single-ID and multi-ID

5/7
preservation. Furthermore, our model exhibits remarkable scalability with a greater number of ID preservation than it was originally trained with.

6/7
paper page:

7/7
daily papers: Paper page - InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation





Computer Science > Computer Vision and Pattern Recognition​

[Submitted on 30 Apr 2024]

InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation​

Chanran Kim, Jeongin Lee, Shichang Joung, Bongmo Kim, Yeul-Min Baek
In the field of personalized image generation, the ability to create images preserving concepts has significantly improved. Creating an image that naturally integrates multiple concepts in a cohesive and visually appealing composition can indeed be challenging. This paper introduces "InstantFamily," an approach that employs a novel masked cross-attention mechanism and a multimodal embedding stack to achieve zero-shot multi-ID image generation. Our method effectively preserves ID as it utilizes global and local features from a pre-trained face recognition model integrated with text conditions. Additionally, our masked cross-attention mechanism enables the precise control of multi-ID and composition in the generated images. We demonstrate the effectiveness of InstantFamily through experiments showing its dominance in generating images with multi-ID, while resolving well-known multi-ID generation problems. Additionally, our model achieves state-of-the-art performance in both single-ID and multi-ID preservation. Furthermore, our model exhibits remarkable scalability with a greater number of ID preservation than it was originally trained with.
Subjects:Computer Vision and Pattern Recognition (cs.CV)
Cite as:arXiv:2404.19427 [cs.CV]
(or arXiv:2404.19427v1 [cs.CV] for this version)

Submission history​

From: Chanran Kim [view email]
[v1] Tue, 30 Apr 2024 10:16:21 UTC (20,960 KB)
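The paper's code isn't included in the post, but here is a rough, hypothetical sketch of the core idea the abstract describes: a per-identity spatial mask decides which image positions may attend to which ID embedding in cross-attention. Names, shapes, and the single-head simplification are illustrative assumptions, not the authors' implementation.

Code:
import torch

def masked_cross_attention(img_tokens, id_embeds, id_masks):
    """Toy single-head masked cross-attention (illustrative only).

    img_tokens: (B, N, D) latent image tokens at N spatial positions
    id_embeds:  (B, K, D) one embedding per identity (e.g. from a face encoder)
    id_masks:   (B, K, N) 1 where identity k may influence position n, else 0
    """
    B, N, D = img_tokens.shape
    scores = torch.einsum("bnd,bkd->bnk", img_tokens, id_embeds) / D ** 0.5
    # Forbid each position from attending to identities outside their spatial mask.
    scores = scores.masked_fill(id_masks.transpose(1, 2) == 0, float("-inf"))
    attn = torch.nan_to_num(scores.softmax(dim=-1))  # fully masked background rows -> zero update
    return torch.einsum("bnk,bkd->bnd", attn, id_embeds)  # (B, N, D) residual for the image tokens

# Example shapes: 2 identities, a 64x64 latent flattened to 4096 tokens.
out = masked_cross_attention(
    torch.randn(1, 4096, 320),
    torch.randn(1, 2, 320),
    torch.randint(0, 2, (1, 2, 4096)).float(),
)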



 

Breh13

Smh.
Supporter
Joined
Aug 5, 2015
Messages
13,109
Reputation
3,501
Daps
67,351
One that thing that’s going to pop is people making edits.

Don’t have to be proficient in photoshop and these tools anymore. AI swapping faces, removing persons and backgrounds with inpainting. And with Suno people creating Motown versions of modern music.

:pachaha:

Some of these tools were a bit shyt just year or 2 ago and most have improved crazy. Imagine another 3-5.

One thing people struggle with though is the prompts. Like I need to know the prompts these people use for the Motown music. Ton of trial and error to getting the best version.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,795
Reputation
7,926
Daps
148,646





1/8
There is a mysterious new model called gpt2-chatbot accessible from a major LLM benchmarking site. No one knows who made it or what it is, but I have been playing with it a little and it appears to be in the same rough ability level as GPT-4. A mysterious GPT-4 class model? Neat!

2/8
Maybe better than GPT-4. Hard to tell, but it does do much better at the iconic “draw a unicorn with code” task…

You can access it here: https://chat.lmsys.org

3/8
GPT 4 Turbo TikZ unicorn vs gpt2-chatbot TikZ unicorn

4/8
It identifies itself as GPT-4 (with v2 personality?) but who knows?

5/8
Some interesting experiments

6/8
uh.... gpt2-chatbot just solved an International Math Olympiad (IMO) problem in one-shot

the IMO is insanely hard. only the FOUR best math students in the USA get to compete

prompt + its thoughts x.com/itsandrewgao/s…

7/8
Anonymous testing is apparently a thing LMSYS Org does for model makers (which makes sense).

But they really should insist on cooler names for secret models in testing. All of the labs are so bad at naming their AIs.

8/8
hi @simonw, thanks a ton! We really value your feedback.

Just to clarify, following our policy, we've partnered with several model developers to bring their new models to our platform for community preview testing. These models are strictly for testing and won't be listed on the






1/1
GPT2-Chatbot nearly built a flappy bird clone in one shot. It messed up initializing movement and didn't give actual assets.

But I had Opus create a build script to grab the assets GPT2 intended to be there and Opus pointed to the actual flappy bird assets...

Ya can't flap and doesn't auto-restart. But man was that close.

I am fully confident if I could just use the model I'd have a python version working in a few prompts.



AI in practice

Apr 30, 2024


Is OpenAI testing GPT-4.5? "gpt2-chatbot" writes better code than GPT-4 and Claude​





Maximilian Schreiner

Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.



Summary

A powerful new AI model called "gpt2-chatbot" shows capabilities that appear to be at or above the level of GPT-4.

The model, called "gpt2-chatbot," appeared without much fanfare in the LMSYS Org Chatbot Arena, a website that compares AI language models. However, its performance quickly caught the attention of testers.

"I would agree with assessments that it is at least GPT-4 level," says Andrew Gao, an AI researcher at Stanford University who has been tracking the model on LMSYS since its release.

For example, gpt2-chatbot solved a problem from the prestigious International Mathematical Olympiad on the first try - a feat he described as "insanely hard."



According to Ethan Mollick, a professor at the Wharton School, the model seems to perform better than GPT-4 Turbo on complex reasoning tasks such as writing code. Chase McCoy, founding engineer at CodeGen, said that gpt2-chatbot "is definitely better at complex code manipulation tasks than Claude Opus or the latest GPT4. Did better on all the coding prompts we use to test new models."

There are more examples on Twitter: Alvaro Cintas generated a Snake game on the first attempt.



Sully Omar, co-founder of Cognosys, had the model draw a unicorn - a test from Microsoft's controversial "Sparks of AGI" paper.



GPT-4.5 or something entirely different?​

The strong performance and clues about the tokenizer used by OpenAI suggest that gpt2-chatbot may come from OpenAI and could be a test of GPT-4.5 or another new model from the company. LMSYS confirmed that it also allows model providers to test their models anonymously. The model also describes itself as ChatGPT and "based on GPT-4."

However, self-descriptions of AI models are not always reliable, and some testers report more hallucinations than GPT-4 Turbo. OpenAI CEO Sam Altman responded to the rumors with a post on X: "I have a soft spot for gpt2." In short, although the similarities to earlier OpenAI creations suggest a possible connection, conclusive evidence is still lacking.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,795
Reputation
7,926
Daps
148,646

Turns out the Rabbit R1 was just an Android app all along​


AI is in its Juicero era.​

By Allison Johnson, a reviewer with 10 years of experience writing about consumer tech. She has a special interest in mobile photography and telecom. Previously, she worked at DPReview.

Updated Apr 30, 2024, 11:23 PM EDT

54 Comments

If you buy something from a Verge link, Vox Media may earn a commission. See our ethics statement.

The Rabbit R1 in front of a window.

I mean, at least it’s just $200? Image: David Pierce / The Verge

Since it launched last week, Rabbit’s R1 AI gadget has inspired a lot of questions, starting with “Why isn’t this just an app?” Well, friends, that’s because it is just an app.

Over at Android Authority, Mishaal Rahman managed to download Rabbit’s launcher APK on a Google Pixel 6A. With a little tweaking, he was able to run the app as if it were on Rabbit’s own device. Using the volume-up key in place of the R1’s single hardware button, he was able to set up an account and start asking it questions, just as if he was using the $199 R1.



Oh boy.

Rahman points out that the app probably doesn’t offer all of the same functionality as the R1. In his words: “the Rabbit R1’s launcher app is intended to be preinstalled in the firmware and be granted several privileged, system-level permissions — only some of which we were able to grant — so some of the functions would likely fail if we tried.” But the fact that the software runs on a midrange phone from almost two years ago suggests that it has more in common with a plain ol’ Android app than not.

Rabbit founder and CEO Jesse Lyu disagrees with this characterization. He gave a lengthy statement to The Verge that we’ve partially quoted below — it was also posted to Rabbit’s X account if you want to read it in full.

“rabbit r1 is not an Android app... rabbit OS and LAM run on the cloud with very bespoke AOSP and lower level firmware modifications, therefore a local bootleg APK without the proper OS and Cloud endpoints won’t be able to access our service. rabbit OS is customized for r1 and we do not support third-party clients.”

The R1 isn’t alone; Humane’s AI pin appears to run on a version of Android’s open-source software, too. But it’s the R1 in the hot seat right now as the first reviews have started to trickle out — and they’re not great, Bob. Rabbit issued its first software update earlier today to address some complaints, including a fast-draining battery. That issue seems to be better controlled post-update; my R1’s idle battery performance is vastly improved after downloading the update this morning.

But the bigger problem is that the R1 just doesn’t do enough useful things to justify its existence when, you know, phones exist. It looks like this AI gadget could have just been an app after all.

Update April 30th 11:23PM ET: Added a statement from Rabbit CEO Jesse Lyu.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,795
Reputation
7,926
Daps
148,646

INSIDE THE RISE OF JESSE LYU AND THE RABBIT R1​

Rabbit’s founder and CEO, Jesse Lyu, tells all about the origins of the R1 and what he thinks about the AI gadget competition.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,795
Reputation
7,926
Daps
148,646

Screenshots suggest TikTok is circumventing Apple App Store commissions​

Sarah Perez @sarahpereztc / 3:46 PM EDT•April 30, 2024


TikTok and YouTube apps on screen iphone xr, close up

Image Credits: Anatoliy Sizov (opens in a new window)/ Getty Images

TikTok may be routing around the App Store to save money on commissions. According to new findings, the ByteDance-owned social video app is presenting some of its users with a link to a website for purchasing the coins used for tipping digital creators. Typically, these coins are bought via in-app purchase, which requires a 30% commission paid to Apple.

The feature may be hidden from most users, either by design or because it’s only shown to users in a specific group, like testers or high spenders. In any event, those who do have access to the new option are seeing a screen that encourages them to “recharge” — that is, buy more coins — via tiktok.com. Although these screenshots were discovered within the iOS app by TechCrunch tipster David Tesler, it’s not clear how many TikTok users are seeing them or when or how they’re being shown.

Tesler says the option to purchase via the web was displayed to an account that had previously purchased a large amount of coins.

tiktok-try-now.jpeg

Image Credits: Screenshot from TikTok app

In some cases, users are shown a screen that includes a message such as “Try recharging on tiktok.com to avoid in-app service fees” followed by a “Try now” link. Other times, they may get a pop-up that says “Try recharging on tiktok.com” with another message about the potential savings. This one reads, “You can save the service fee and get access to popular payment methods,” and is followed by a big, red “Try now” button or a less prominent option that says “Don’t show again.”

tiktok-try-now-2.jpeg

Image Credits: Screenshot from TikTok app

Users who follow the provided link are taken to the website for buying coins: tiktok.com/coin. From this web view, they can pay using a variety of methods, including Apple Pay or debit or credit cards. The website reminds users that purchases made directly with TikTok will save them around 25% “with a lower third-party service fee.”

On the web, users can purchase packs of coins ranging from 70 coins to 17,500 coins, or even enter a custom (higher) amount. Inside the app, however, coin packs are available starting at 20 coins up to 16,500 with no option for a custom amount.

tiktok-coin-website.jpeg

Image Credits: Screenshot from TikTok app

That could suggest TikTok is only showing the web links to those users who typically buy larger packs of coins at one time.

While Apple did begin to allow developers of select apps to add links to their websites from inside the app back in 2022, the use case was limited. The only apps that qualify to offer these links for “account management” are what Apple calls “reader” apps — or those apps that provide access to paid digital content as their main functionality (think: Netflix, not Facebook). In addition, apps that choose to use the External Link Entitlement cannot offer in-app purchases via the App Store as well. It’s an either/or situation.

tiktok-normal-iap-flow.jpeg

Typical IAP flow. Image Credits: Screenshot from TikTok iOS app

Given that TikTok is also offering most of its users the option to buy via in-app purchases, it seems it’s not abiding by the External Link Entitlement rules even if it had been granted the exception (which would be surprising).

TikTok and Apple have not returned requests for comment at this time. TikTok’s help documentation about coins says they’re available for purchase and recharge through the App Store and Google Play on mobile devices.

Tesler noted that when Fortnite inserted an option that routed users around Apple’s in-app purchases, Apple banned the app from the App Store. It’s unclear what, if any, action Apple will take against TikTok now, given the current politics around the Beijing-based app.

In 2020, Fortnite was removed from the app store for a similar interface in which they presented users with an option to bypass apple in-app purchases pic.twitter.com/LLcilXEUQb

— David Tesler (@getdavenow) April 30, 2024

TikTok’s current U.S. fate is uncertain, as a bill to ban the app has now been signed into law by President Biden. However, the company said it plans to fight the ban in court, as it did before under President Trump. Biden had originally put the effort to ban the app on hold until a new bipartisan bill passed both the House and Senate.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,795
Reputation
7,926
Daps
148,646


1/2
Did anybody notice Nvidia published a competitive llama3-70b QA/RAG fine tune?


2/2
actually, looks like the 8b version might be more interesting.


To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
GMj2gXBXAAAwS8X.png

GMlHhZ2XEAAenPL.jpg

GMkvUCiW8AAAI_F.png

GMmAfOFaYAUnjL1.jpg



1/1
Nvidia has published a competitive llama3-70b QA/RAG fine tune #LLM #LLMs

ChatQA-1.5, which excels at conversational question answering (QA) and retrieval-augmented generation (RAG). ChatQA-1.5 is built using the training recipe from ChatQA (1.0), and it is built on top of the Llama-3 foundation model. Additionally, we incorporate more conversational QA data to enhance its tabular and arithmetic calculation capability. ChatQA-1.5 has two variants: ChatQA-1.5-8B and ChatQA-1.5-70B.
Nvidia/ChatQA-1.5-70B: nvidia/Llama3-ChatQA-1.5-70B · Hugging Face
Nvidia/ChatQA-1.5-8B: nvidia/Llama3-ChatQA-1.5-8B · Hugging Face
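As a quick, hedged sketch of trying the 8B variant locally with Hugging Face transformers (the model id comes from the link above; the exact prompt template ChatQA was trained with is documented on the model card, so the context-then-question layout below is only a placeholder):

Code:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama3-ChatQA-1.5-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")  # device_map needs accelerate

# RAG-style usage: pass the retrieved passage(s) as context, then the question.
context = "..."   # retrieved document chunk(s)
question = "..."  # user question about the context
prompt = f"System: Answer the question using only the given context.\n\n{context}\n\nUser: {question}\n\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))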





1/1
Nvidia presents ChatQA

Building GPT-4 Level Conversational QA Models

paper page: Paper page - ChatQA: Building GPT-4 Level Conversational QA Models

introduce ChatQA, a family of conversational question answering (QA) models, that obtain GPT-4 level accuracies. Specifically, we propose a two-stage instruction tuning method that can significantly improve the zero-shot conversational QA results from large language models (LLMs). To handle retrieval in conversational QA, we fine-tune a dense retriever on a multi-turn QA dataset, which provides comparable results to using the state-of-the-art query rewriting model while largely reducing deployment cost. Notably, our ChatQA-70B can outperform GPT-4 in terms of average score on 10 conversational QA datasets (54.14 vs. 53.90), without relying on any synthetic data from OpenAI GPT models.





Note that ChatQA-1.5 is built based on Llama-3 base model, and ChatQA-1.0 is built based on Llama-2 base model. ChatQA-1.5 used some samples from the HybriDial training dataset. To ensure fair comparison, we also compare average scores excluding HybriDial. The data and evaluation scripts for ConvRAG can be found here.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,795
Reputation
7,926
Daps
148,646




Full Line Code Completion in JetBrains IDEs: All You Need to Know​

Ekaterina Ryabukha

April 4, 2024

Programming with AI is still a highly divisive topic, but there’s no denying that more and more developers are starting to incorporate AI into their daily workflows. Whether you’ve already picked your side in the debate or are still undecided, we’ve got a new feature in v2024.1 of JetBrains IDEs that might just pique your interest – full line code completion. It’s AI-powered and runs locally without sending any data over the internet.



In this blog post, we’ll tell you more about what full line code completion is, how it works, what languages are supported, and how you can provide feedback about it to us.

What is full line code completion in JetBrains IDEs?​

This new type of code completion was added to JetBrains IDEs with the latest 2024.1 update. As you can see below, it takes the form of gray-toned, single-line suggestions that complete lines based on the context of the current file:

[Animated demo GIF: FLCC-in-action]


These suggestions are powered by specialized language models that we’ve trained specifically for different languages and frameworks. The models run locally without sending any code over the internet.

Full line code completion is currently available for Java, Kotlin, Python, JavaScript, TypeScript, CSS, PHP, Go, and Ruby within the corresponding JetBrains IDEs: IntelliJ IDEA Ultimate, PyCharm Professional, WebStorm, PhpStorm, GoLand, and RubyMine. In the coming months, we plan to extend the functionality to C#, Rust, and C++, so it will also land in Rider, RustRover, and CLion.

Note that full line code completion is included with your active JetBrains IDE subscription at no additional cost – just make sure you’re on v2024.1 or later. If you don’t yet have a subscription, you can also use this feature during the 30-day free trial.

How does full line completion work?​

With full line code completion, we had two main goals in mind. The first one is obvious – to help you save time and increase your coding speed. But beyond that, we also wanted to provide a solution that addresses the constraints certain organizations have when it comes to using AI solutions that are connected to the cloud.

Here’s a breakdown of how full line code completion helps to realize these two aims:

  • It works locally and is available offline. This means you can take advantage of the feature even if you aren’t connected to the internet.
  • It doesn’t send any data from your machine over the internet. The language models that power full line code completion run locally, which is great for two reasons. First, your code remains safe, as it never leaves your machine. Second, there are no additional cloud-related expenses – that’s why this feature comes at no additional cost.
  • It’s integrated deeply into JetBrains IDEs. All suggestions will be appropriately formatted, with the IDE checking for balanced brackets and quotes. Additionally, we use the power of static analysis and our understanding of code to filter out incorrect suggestions. Each supported language has its own set of suggested code correctness checks. The most basic ones, like unresolved reference checks, are implemented for most languages to guarantee that the IDE doesn’t suggest non-existent variables and methods. The auto-import feature is also supported.
  • It’s designed to keep your workflow as smooth as possible. We use smart filtering to avoid showing suggestions that tend to be canceled explicitly or deleted right after they were added.

For some additional technical details, see this section below.

Full line code completion vs. AI Assistant​

There are two ways you can benefit from AI functionality in JetBrains IDEs – full line code completion and JetBrains AI Assistant. We appreciate that this might be confusing, so let’s take a closer look at what they have in common and how they differ.

Both full line code completion and JetBrains AI Assistant aim to help you work faster. They both also go beyond the standard completion that has been available in JetBrains IDEs for some time already. However, JetBrains AI Assistant offers a more comprehensive feature set, including context-aware smart chat and the ability to generate tests or write documentation.

See the table below for a comparison of the two AI functionalities:

[Comparison infographic: FLCC-2024-1-infographic]

Please rest assured that we never train any of our AI features on customers’ code. If your company has strict data privacy regulations, but you still want to speed up your workflows with AI, full line code completion may be a better choice for you.

Under the hood​

The backbone of full line code completion is a programming-language specific language model, which is trained in house using a dataset of open-source code with permissive licenses. The language model’s input is the code before the caret, though for some languages, we also add content from related files. The output is the model’s suggested continuation of the current line, which is shown in gray.

The language model’s inference runs on your local machine. To ensure the most efficient generation, the model inference runs in a separate process and is heavily optimized for the target machine’s architecture. For example, if you’re using x86-64 architecture, the model will run on the CPU, whereas if you’re using ARM64 architecture, the model will use the power of your computer’s GPU.

After the suggestion is generated, a number of post-processing steps are applied. First, we check whether this suggestion is syntactically and semantically correct, and then we perform smart filtering, formatting, parenthesis balancing, and various other manipulations. Post-processing is crucial for user experience, so we do our best to show only valuable suggestions that don’t disturb your workflow.

Lastly, you may also be wondering why we decided to go for single-line suggestions. The length of the AI completion suggestions is a trade-off. While longer suggestions do tend to reduce how many keystrokes you have to make, which is good, they also increase the number of reviews required on your end. Taking the above into account, we decided that completing a single line of code would be a fair compromise.

This decision allowed us to reduce the size of the model without any significant decline in suggestion quality. In the 2024.1 version of JetBrains IDEs, we use a language model that has 100 million parameters, with a maximum context size of 1,536 tokens, which is roughly 170 lines of code.

How to tweak the feature​

You can configure full line code completion in Settings | Editor | General | Code Completion – all the settings can be found there, under the Machine Learning-Assisted Completion section:

[Screenshot: full line code completion settings]

If you’d like to turn off the feature, you can do so by unticking the Enable Full Line suggestions checkbox. Alternatively, you can disable the plugin powering this feature. To do so, go to Settings | Plugins, switch to the Installed tab, and look for full line code completion.

How to provide feedback​

Full line code completion is still in active development, so we encourage you to share your feedback with us. You can do so by leaving a comment under this blog post. You can also upvote existing issues here or create a new one by logging in and clicking on the New Issue button in the top right-hand corner.


That’s it for today. Please give full line code completion a try and let us know what you think. We’ll continue improving this functionality further, with support for C#, Rust, and C++ as well as better integration with AI Assistant’s multi-line code completion being our top priorities for now. Stay tuned for updates!
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,795
Reputation
7,926
Daps
148,646

🦍 Gorilla: Large Language Model Connected with Massive APIs​

Blog 7: Gorilla OpenFunctions v2​

Gorilla OpenFunctions v2​

Gorilla introductory image

Gorilla OpenFunctions-v2! SoTA for open-source models. On-par with commercial models.

With the latest iteration of Gorilla OpenFunctions-v2, we are delighted to mark significant advancements in function calling for LLMs within the open-source community. As a direct substitute for its predecessor, Gorilla OpenFunctions-v2 retains its open-source ethos while introducing exciting enhancements. These include support for multiple programming languages such as Python, Java, JavaScript, and REST API - the first among both open-source and closed-source models, alongside the ability to handle multiple and parallel function calls, and the ability to determine function relevance. This update cements Gorilla OpenFunctions-v2's position at the forefront of function calling capabilities among LLMs. Moreover, the drop-in replacement allows for seamless integration of OpenFunctions into a diverse range of applications, from social media platforms like Instagram to delivery services like DoorDash, as well as utility tools including Google Calendar and Stripe.

See What's New!! 🚀

The five new exciting features we are happy to launch with OpenFunctions-v2 are:

New-Features in Gorilla LLMs

  • More Data Types: Gorilla OpenFunctions-v2 can now support diverse languages with expanded support for argument types in function calls. This includes [string, number, boolean, list, tuple, dict, any] for Python, [string, number, boolean, list, tuple, dict, any] for Java and [string, number, boolean, dict, bigint, array, date, any] for Javascript. For reference, OpenAI and many others only support JSON schema, i.e., [string, number, integer, object, array, and boolean]. Native support for these types means, you can now plug-and-play openfunctions-v2 without having to weave through string literals.
  • Parallel & Multiple Functions: Support for Parallel and Multiple Functions. Multiple functions refers to the scenario where the user can input multiple functions when they are not sure which exact function is best to service the prompt. In this scenario, the Gorilla model picks one or more (or none) of the functions provided to respond to the user's requests. In parallel functions, the user's prompt could be serviced by multiple calls to the same function. Gorilla not only supports both of these, but the benefits stack one-on-top of the other!
  • Function Relevance Detection: Reduce hallucinations in scenarios when no function, or even no relevant function is provided. Gorilla openfunctions v2 can now automatically detect whether the functions provided to the model can address the user's prompt. Recognizing this, the LLM raises an “Error” message to the user providing them with additional information.
  • Enhanced Capabilities for RESTful APIs: Enhance ability to format RESTful API calls. RESTful APIs are a common phenomenon within the web powering many popular software services including Slack, PayPal, etc. Our model is specially trained to handle RESTful API calls with good quality.
Quick Links:


Integrating OpenFunctions-v2 in your App 🔨

Using Gorilla OpenFunctions-v2 is straightforward:

  1. To help with quick prototyping, we provide a hosted Gorilla OpenFunctions-v2 model for inference. You can also run it locally, or self-host it by downloading the model from Hugging Face. The example below demonstrates how to invoke the hosted Gorilla OpenFunctions-v2 model:
    Code:
    import openai

    def get_gorilla_response(prompt="", model="gorilla-openfunctions-v2", functions=[]):
        openai.api_key = "EMPTY"  # Hosted for free with ❤️ from UC Berkeley
        openai.api_base = "http://luigi.millennium.berkeley.edu:8000/v1"
        try:
            completion = openai.ChatCompletion.create(
                model=model,
                temperature=0.0,
                messages=[{"role": "user", "content": prompt}],
                functions=functions,
            )
            # completion.choices[0].message.content: string format of the function call
            # completion.choices[0].message.functions: JSON format of the function call
            return completion.choices[0]
        except Exception as e:
            print(f"Request to the hosted endpoint failed: {e}")
  2. Prompt the model:
    What's the weather like in the two cities of Boston and San Francisco?
  3. Format your function call: the model will return the function call based on your request.
    Code:
    query = "What's the weather like in the two cities of Boston and San Francisco?"
    functions = [
        {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        }
    ]
  4. Get your function call: the model will return a Python function call based on your request. This opens up possibilities for developers and non-developers alike, allowing them to leverage complex functionalities without writing extensive code.


    Input:

    get_gorilla_response(prompt=query, functions=functions)
    Output:

    [get_current_weather(location='Boston, MA'), get_current_weather(location='San Francisco, CA')]
    With the example above, you can use Gorilla OpenFunctions-v2 to get well-formatted output, or call a function with your own definition! Then you can use this freely within your applications and chatbots!
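
    To also illustrate the multiple-function and relevance-detection behaviors from the What's New list above, here is a hedged sketch that reuses the get_gorilla_response helper and functions list from the steps above (the second function and the prompts are made up for the example, and the exact wording of the error response may differ):

    Code:
    extra_functions = functions + [
        {
            "name": "order_takeout",
            "description": "Order takeout from a restaurant",
            "parameters": {
                "type": "object",
                "properties": {"restaurant": {"type": "string"}},
                "required": ["restaurant"],
            },
        }
    ]

    # Multiple functions: the model picks only the relevant call(s).
    get_gorilla_response("What's the weather like in Boston?", functions=extra_functions)
    # expected along the lines of: get_current_weather(location='Boston, MA')

    # Relevance detection: nothing here can book a flight, so instead of
    # hallucinating a call the model returns an error-style message.
    get_gorilla_response("Book me a flight to Boston", functions=extra_functions)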

    Note: Gorilla through our hosted endpoint is currently only supported with openai==0.28.1. We will soon also add support for openai==1.xx, in which functions is replaced by tool_calls.
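
    For reference, a hedged sketch of what the client-side call will look like once that openai==1.xx migration lands (not yet supported by the hosted endpoint; the tools wrapper shown is the standard openai 1.x schema, not something Gorilla-specific):

    Code:
    from openai import OpenAI

    client = OpenAI(api_key="EMPTY", base_url="http://luigi.millennium.berkeley.edu:8000/v1")

    completion = client.chat.completions.create(
        model="gorilla-openfunctions-v2",
        temperature=0.0,
        messages=[{"role": "user", "content": query}],
        # In openai>=1.0, `functions` becomes `tools`, each entry wrapped in a
        # {"type": "function", "function": {...}} envelope.
        tools=[{"type": "function", "function": f} for f in functions],
    )
    print(completion.choices[0].message)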
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,795
Reputation
7,926
Daps
148,646

Performance on Berkeley Function-Calling Leaderboard 🔥

Open functions types of data

We perform an exhaustive and comprehensive evaluation on the Berkeley Function-Calling Leaderboard, benchmarking our model against the current state-of-the-art GPT-4-1106-preview as well as the GPT-4 and GPT-3.5-turbo function-calling features. In addition, we also compare our model with other open-source models, demonstrating superior behavior among them. Our evaluation consists of 2k distinct query-API documentation pairs from different domains (including travel, finance, scheduling meetings, etc.) and languages (Java, JavaScript, Python, REST API).

To dive into the details of how our model performs in each category, we provide a detailed table below from the Berkeley Function-Calling Leaderboard. We see that, compared to the current state of the art (GPT-4's function calling), Gorilla OpenFunctions-v2 does better in the simple function-calling category in Python, but not as well on calls that involve multiple and parallel functions. This new feature continues to be an exciting area of research for us and for the open-source community in general. It is worth highlighting that our model produces very stable executable function calls - calls that were evaluated by actually executing them - with no intervention in between. Unsurprisingly, having been trained on them, our model outperforms GPT-4 on function calls in programming languages other than Python (e.g., Java, JavaScript, and REST APIs). For REST APIs, our model provides more stable outputs that include all the required fields, including the url, params, and header, making our model ripe for immediate adoption.

Open functions types of data

Gorilla OpenFunctions-v2's performance on Berkeley Function-Calling Leaderboard

Code:
"User": "Can you fetch me the weather data for the coordinates
37.8651 N, 119.5383 W, including the hourly forecast for temperature,
wind speed, and precipitation for the next 10 days?"

"Function":
{
    ...
    "parameters": {
        "type": "object",
        "properties": {
            "url": {
                "type": "string",
                "description": "The API endpoint for fetching weather
                    data from the Open-Meteo API for the given latitude
                    and longitude, default
                    https://api.open-meteo.com/v1/forecast"
            }
            ...
        }
    }
}

"Gorilla OpenFunctions-v2 output":
{
    "name": "requests.get",
    "parameters": {
        "url": "https://api.open-meteo.com/v1/forecast",
        "params": {
            "latitude": "37.8651",
            "longitude": "-119.5383",
            "forecast_days": 10
        }
    }
}

The left-hand side is GPT-4's output, and the right-hand side is OpenFunctions-v2's. As we can see from these mistakes, when GPT-4's function calling deals with functions involving complex parameter structures (e.g., a dict inside a dict) with default values, the model tends to have trouble, especially with parsing default values. Rather than being a corner case, the example above is a common paradigm for REST APIs.

OpenFunctions Data Composition & Training 🍕

Gorilla OpenFunctions-v2 is a 6.91B-parameter model trained further on top of the Deepseek-Coder-7B-Instruct-v1.5 (6.91B) model. To train it, we collected a total of 65,283 question-function-answer pairs from five different sources: Python packages (19,353), Java repositories (16,586), JavaScript repositories (4,245), public APIs (6,009), and command-line tools (19,090) from various cloud providers. The data composition is shown in the figure below.

After the data collection, we carry out four data augmentations to diversify our training dataset. First, we change the function names. This is critical to ensure the model does not "memorize" the API mapping. Second, we add random (randomly chosen, and random number of) functions to make our data-set compatible with parallel functions. This way we can generate multiple-function datasets from simple functions. Third, we adopt similar strategies of perturbing the prompt to generate scenarios of parallel-functions. We then extend it to also include multiple- and parallel- functions in the same data-points. Finally, we mix some portion of the dataset in which the functions provided during the input is not sufficient to the task. We flag these as `Relevance Detection` scenarios. As with most LLM training, we extensively varied the extents of each data augmentation to train a robust model.

  • Function Name Transformation: From the original question-function-answer pairs, we augment the data with different function names to avoid the model memorizing the correlation between function names and the question (e.g., the 'uber' API is used for transportation).
    query + [{'name': 'func1', 'description': 'order takeout'}] -> ans1 =>
    query + [{'name': 'func2', 'description': 'order takeout'}] -> [ans2]
  • Parallel Functions Transformation: To handle the case where the same function must be called several times to service the user's request, we change the original question to ask for multiple outputs.
    query1 + [{'name': 'func1', 'description': 'order takeout'}] -> ans1 =>
    query2 + [{'name': 'func1', 'description': 'order takeout'}] -> [ans1, ans2]
  • Multiple Functions Transformation: We add extra candidate functions to the input so that the model learns to choose which function call to use.
    query + [{'name': 'func1', 'description': 'order takeout'}] -> ans1 =>
    query + [{'name': 'func1', 'description': 'order takeout'}, {'name': 'func2', 'description': 'get weather'}] -> [ans1]
  • Parallel Multiple Functions Transformation: The combination of the parallel and multiple transforms above.
    query1 + [{'name': 'func1', 'description': 'order takeout'}] -> ans1 =>
    query2 + [{'name': 'func1', 'description': 'order takeout'}, {'name': 'func2', 'description': 'get weather'}] -> [ans1, ans2]
  • Function Relevance Detection Transformation: We also include some portion of the dataset in which the functions provided cannot solve the task. We flag these as `Relevance Detection` scenarios.
    query1 + [{'name': 'func1', 'description': 'order takeout'}] -> ans1 =>
    query2 + [{'name': 'func1', 'description': 'order takeout'}] -> [Error, the function cannot solve the question.]
Following the completion of the data augmentation process, we further refine the dataset by employing the Rouge score for deduplication, effectively eliminating redundant entries. This step is a recognized standard practice.


Conclusion​

We are happy to release gorilla-openfunctions-v2, a 6.91B-parameter model trained on top of the Deepseek-Coder-7B-Instruct-v1.5 LLM. It takes in the user's prompt along with multiple APIs and returns the function calls with the right arguments. With OpenFunctions we extended native support for parameter types in Python, Java, JavaScript, and RESTful APIs. For more information, check out our blog on the Berkeley Function-Calling Leaderboard for evaluation, and our GitHub page for the model. All of the results in this blog were generated using gorilla-openfunctions-v2.
 