bnew



Run Code Llama locally​

August 24, 2023​



Today, Meta Platforms, Inc. released Code Llama to the public. Built on Llama 2, it provides state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks.

Code Llama is now available on Ollama to try!​

If you haven’t already installed Ollama, please download it here.

To play with Code Llama:

Code Llama 7 billion parameter model

ollama run codellama:7b-instruct

Example prompt:

In Bash, how do I list all text files in the current directory (excluding subdirectories) that have been modified in the last month?
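
Beyond the interactive CLI, Ollama also exposes a local HTTP API, so the same prompt can be scripted. A minimal sketch, assuming the default Ollama endpoint on port 11434 and the `requests` package (details may vary by Ollama version):

```python
# Minimal sketch: send the example prompt to a local Code Llama model through
# Ollama's HTTP API (assumes Ollama is running on its default port, 11434).
import json
import requests

prompt = (
    "In Bash, how do I list all text files in the current directory "
    "(excluding subdirectories) that have been modified in the last month?"
)

# Ollama streams the reply as one JSON object per line.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "codellama:7b-instruct", "prompt": prompt},
    stream=True,
    timeout=300,
)
for line in resp.iter_lines():
    if line:
        print(json.loads(line).get("response", ""), end="", flush=True)

# A typical answer would be something along the lines of:
#   find . -maxdepth 1 -name "*.txt" -mtime -30
```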

More models are coming, and this blog will be updated soon.

Foundation models and Python specializations are available for code generation/completion tasks



Foundation models:

More models are coming, and this blog will be updated soon.

Python specializations:

More models are coming, and this blog will be updated soon.


 

bnew



Blog​

Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B​

We have fine-tuned CodeLlama-34B and CodeLlama-34B-Python on an internal Phind dataset, achieving 67.6% and 69.5% pass@1 on HumanEval, respectively. GPT-4 achieved 67% according to OpenAI's official technical report in March. To ensure result validity, we applied OpenAI's decontamination methodology to our dataset.​

The CodeLlama models released yesterday demonstrate impressive performance on HumanEval.

  • CodeLlama-34B achieved 48.8% pass@1 on HumanEval
  • CodeLlama-34B-Python achieved 53.7% pass@1 on HumanEval
We have fine-tuned both models on a proprietary dataset of ~80k high-quality programming problems and solutions. Instead of code completion examples, this dataset features instruction-answer pairs, setting it apart structurally from HumanEval. We trained the Phind models over two epochs, for a total of ~160k examples. LoRA was not used — both models underwent native fine-tuning. We employed DeepSpeed ZeRO 3 and Flash Attention 2 to train these models in three hours using 32 A100-80GB GPUs, with a sequence length of 4096 tokens.
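
For readers curious what that setup looks like in practice, below is a rough sketch of a DeepSpeed ZeRO-3 configuration of the kind described. This is not Phind's actual configuration; the keys are standard DeepSpeed options, but the values shown are illustrative placeholders.

```python
# Rough sketch of a DeepSpeed ZeRO-3 config of the kind described above.
# NOT Phind's actual configuration; values are illustrative.
import json

ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                                   # shard params, grads and optimizer state
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    # "auto" lets the Hugging Face Trainer fill these in from its own arguments.
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_clipping": "auto",
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)

# The config would then be handed to a launcher, e.g.:
#   deepspeed --num_gpus 8 train.py --deepspeed ds_config.json ...
# with FlashAttention enabled in the model and a 4096-token sequence length.
```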

Furthermore, we applied OpenAI's decontamination methodology to our dataset to ensure valid results, and found no contaminated examples. The methodology, sketched in code after the list below, is:

  • For each evaluation example, we randomly sampled three substrings of 50 characters or used the entire example if it was fewer than 50 characters.
  • A match was identified if any sampled substring was a substring of the processed training example.
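
In code, that check amounts to roughly the following sketch (function and variable names are ours, not OpenAI's or Phind's):

```python
# Minimal sketch of the substring-based decontamination check described above;
# names and defaults are illustrative, not OpenAI's or Phind's actual code.
import random

def is_contaminated(eval_example: str, train_example: str,
                    n_samples: int = 3, substr_len: int = 50) -> bool:
    """True if any sampled substring of the evaluation example appears
    verbatim in the (processed) training example."""
    if len(eval_example) < substr_len:
        samples = [eval_example]  # use the entire example if it is short
    else:
        starts = [random.randrange(len(eval_example) - substr_len + 1)
                  for _ in range(n_samples)]
        samples = [eval_example[s:s + substr_len] for s in starts]
    return any(s in train_example for s in samples)
```
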
For further insights on the decontamination methodology, please refer to Appendix C of OpenAI's technical report. Presented below are the pass@1 scores we achieved with our fine-tuned models:

  • Phind-CodeLlama-34B-v1 achieved 67.6% pass@1 on HumanEval
  • Phind-CodeLlama-34B-Python-v1 achieved 69.5% pass@1 on HumanEval
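
For reference, pass@1 is the standard HumanEval metric from OpenAI's Codex paper: generate n samples per problem, count the c samples that pass the unit tests, and average the unbiased estimator below over all problems. For k = 1 it reduces to the mean fraction of passing samples per problem.

```latex
\text{pass@}k \;=\; \mathbb{E}_{\text{problems}}\!\left[\, 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}} \,\right]
```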

Download​

We are releasing both models on Huggingface for verifiability and to bolster the open-source community. We welcome independent verification of results.


 

bnew


Leo, Brave’s browser-native AI assistant, is now available in Nightly version for testing​


Last updated Aug 21, 2023


Today we’re excited to announce that Leo, the AI assistant built natively in the Brave browser, is now available for testing and feedback in the Nightly desktop channel (starting with version 1.59).

Building on the success of the Brave Search AI Summarizer, we’ve made Leo available as a companion in the browser sidebar. Leo allows users to interact with the web pages they’re visiting—for example, by asking for video transcripts or interactive article summaries—without leaving the page itself. Leo can also suggest follow-up questions, augment original content, and even help with reading comprehension. Leo can answer questions just like other AI-powered chatbots, but directly within the experience of a web page.

What is Brave Leo?​

Brave Leo is a chat assistant hosted by Brave without the use of third-party AI services, available to Brave users on the desktop Nightly channel. The model behind Leo is Llama 2, a source-available large language model released by Meta with a special focus on safety. We’ve made sure that user inputs are always submitted anonymously through a reverse-proxy to our inference infrastructure. In this way, Brave can offer an AI experience with unparalleled privacy.

We’ve specifically tuned the model prompt to adhere to Brave’s core values. However, as with any other LLM, the outputs of the model should be treated with care for potential inaccuracies or errors.

How to try Leo and share feedback​

Leo is available today for all users of the Brave browser desktop Nightly channel. Nightly desktop users can access Leo via the Brave Leo icon button in the Brave Sidebar.

Are you a Brave Nightly user? Please tell us what you think of Leo!

A note on anonymity​

Leo is free to use for any desktop Nightly user, and no user login or account is required. Chats in Leo cannot be used for training purposes, and no one can review those conversations, as they’re not persisted on Brave’s servers—conversations are discarded immediately after the reply is generated. For this reason, there’s no way to review past conversations or delete that data—it isn’t stored in the first place.

What data does the Brave browser send?​

If you use Leo, the browser shares with the server your latest query, your ongoing conversation history and, when the use case calls for it, only the necessary context from the page you’re actively viewing (e.g. the article’s text, or the YouTube video transcript).

How can I get better results out of Leo?​

As with any AI, the more specific you are with your prompts and context, the better the results Leo can provide. Remember to give Leo clear, detailed instructions and, if you don’t get exactly the answer you’re looking for, to try wording your query/prompt a different way.

Does Leo have access to live information?​

For now, Leo does not have access to live information. However, in future releases we do plan to offer a version of Leo with some level of access to current information. This will be powered by our own independent Brave Search.

What’s next for Brave Leo?​

In addition to incorporating live information, we’ll be making improvements to Leo’s accuracy and user experience. We hope to release Leo to all Brave browser users in the coming months.

Article Summary via Brave Leo
Brave Leo on YouTube
 

bnew


WizardCoder-34B surpasses GPT-4, ChatGPT-3.5 and Claude-2 on HumanEval with 73.2% pass@1​







News​

  • 🔥🔥🔥[2023/08/26] We released WizardCoder-Python-34B-V1.0, which achieves 73.2 pass@1 and surpasses GPT-4 (2023/03/15), ChatGPT-3.5, and Claude-2 on the HumanEval benchmark.
  • [2023/06/16] We released WizardCoder-15B-V1.0, which achieves 57.3 pass@1 and surpasses Claude-Plus (+6.8), Bard (+15.3) and InstructCodeT5+ (+22.3) on the HumanEval benchmark.
❗Note: There are two sets of HumanEval results for GPT-4 and ChatGPT-3.5. The 67.0 and 48.1 scores were reported in OpenAI's official GPT-4 report (2023/03/15). The 82.0 and 72.5 scores come from our own tests with the latest API (2023/08/26).

| Model | Checkpoint | Paper | HumanEval | MBPP | Demo | License |
| --- | --- | --- | --- | --- | --- | --- |
| WizardCoder-Python-34B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 73.2 | 61.2 | Demo | Llama2 |
| WizardCoder-15B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 59.8 | 50.6 | -- | OpenRAIL-M |
  • Our WizardMath-70B-V1.0 model slightly outperforms some closed-source LLMs on the GSM8K, including ChatGPT 3.5, Claude Instant 1 and PaLM 2 540B.
  • Our WizardMath-70B-V1.0 model achieves 81.6 pass@1 on the GSM8k Benchmarks, which is 24.8 points higher than the SOTA open-source LLM, and achieves 22.7 pass@1 on the MATH Benchmarks, which is 9.2 points higher than the SOTA open-source LLM.
| Model | Checkpoint | Paper | GSM8k | MATH | Online Demo | License |
| --- | --- | --- | --- | --- | --- | --- |
| WizardMath-70B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | 81.6 | 22.7 | Demo | Llama 2 |
| WizardMath-13B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | 63.9 | 14.0 | Demo | Llama 2 |
| WizardMath-7B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | 54.9 | 10.7 | Demo | Llama 2 |
| Model | Checkpoint | Paper | MT-Bench | AlpacaEval | GSM8k | HumanEval | License |
| --- | --- | --- | --- | --- | --- | --- | --- |
| WizardLM-70B-V1.0 | 🤗 HF Link | 📃 Coming Soon | 7.78 | 92.91% | 77.6% | 50.6 | Llama 2 License |
| WizardLM-13B-V1.2 | 🤗 HF Link | -- | 7.06 | 89.17% | 55.3% | 36.6 | Llama 2 License |
| WizardLM-13B-V1.1 | 🤗 HF Link | -- | 6.76 | 86.32% | -- | 25.0 | Non-commercial |
| WizardLM-30B-V1.0 | 🤗 HF Link | -- | 7.01 | -- | -- | 37.8 | Non-commercial |
| WizardLM-13B-V1.0 | 🤗 HF Link | -- | 6.35 | 75.31% | -- | 24.0 | Non-commercial |
| WizardLM-7B-V1.0 | 🤗 HF Link | 📃 [WizardLM] | -- | -- | -- | 19.1 | Non-commercial |

Comparing WizardCoder-Python-34B-V1.0 with Other LLMs.​

🔥 The following figure shows that our WizardCoder-Python-34B-V1.0 attains the second position in this benchmark, surpassing GPT-4 (2023/03/15, 73.2 vs. 67.0), ChatGPT-3.5 (73.2 vs. 72.5) and Claude-2 (73.2 vs. 71.2).

WizardCoder
 

bnew


AI images are getting harder to spot. Google thinks it has a solution.​

The tech giant is among companies pushing out AI tools while promising to build more tools to protect against their misuse​


By Gerrit De Vynck
August 29, 2023 at 8:00 a.m. EDT


Illustration by Elena Lacey/The Washington Post; iStock

Artificial intelligence-generated images are becoming harder to distinguish from real ones as tech companies race to improve their AI products. As the 2024 presidential campaign ramps up, concern is quickly rising that such images might be used to spread false information.

On Tuesday, Google announced a new tool — called SynthID — that it says could be part of the solution. The tool embeds a digital “watermark” directly into the image that can’t be seen by the human eye but can be picked up by a computer that’s been trained to read it. Google said its new watermarking tech is resistant to tampering, making it a key step toward policing the spread of fake images and slowing the dissemination of disinformation.

AI image generators have been available for several years and have been increasingly used to create “deepfakes” — false images purporting to be real. In March, fake AI images of former president Donald Trump running away from police went viral online, and in May a fake image showing an explosion at the Pentagon caused a momentary crash in stock markets. Companies have placed visible logos on AI images, as well as attached text “metadata” noting an image’s origin, but both techniques can be cropped or edited out relatively easily.

“Clearly the genie’s already out of the bottle,” Rep. Yvette D. Clarke (D-N.Y.), who has pushed for legislation requiring companies to watermark their AI images, said in an interview. “We just haven’t seen it maximized in terms of its weaponization.”

For now, the Google tool is available only to some paying customers of its cloud computing business — and it works only with images that were made with Google’s image-generator tool, Imagen. The company says it’s not requiring customers to use it because it’s still experimental.

The ultimate goal is to help create a system where most AI-created images can be easily identified using embedded watermarks, said Pushmeet Kohli, vice president of research at Google DeepMind, the company’s AI lab, who cautioned that the new tool isn’t totally foolproof. “The question is, do we have the technology to get there?”

As AI gets better at creating images and video, politicians, researchers and journalists are concerned that the line between what’s real and false online will be eroded even further, a dynamic that could deepen existing political divides and make it harder to spread factual information. The improvement in deepfake tech is coming as social media companies are stepping back from trying to police disinformation on their platforms.

Watermarking is one of the ideas that tech companies are rallying around as a potential way to decrease the negative impact of the “generative” AI tech they are rapidly pushing out to millions of people. In July, the White House hosted a meeting with the leaders of seven of the most powerful AI companies, including Google and ChatGPT maker OpenAI. The companies all pledged to create tools to watermark and detect AI-generated text, videos and images.

Microsoft has started a coalition of tech companies and media companies to develop a common standard for watermarking AI images, and the company has said it is researching new methods to track AI images. The company also places a small visible watermark in the corner of images generated by its AI tools. OpenAI, whose Dall-E image generator helped kick off the wave of interest in AI last year, also adds a visible watermark. AI researchers have suggested ways of embedding digital watermarks that the human eye can’t see but can be identified by a computer.

Kohli, the Google executive, said Google’s new tool is better because it works even after the image has been significantly changed — a key improvement over previous methods that could be easily thwarted by modifying or even flipping an image.


Google’s new tool digitally embeds watermarks not visible to the human eye onto AI-generated images. Even if an AI-generated image has been edited or manipulated, as seen here, the tool will still be able to detect the digital watermark. (Washington Post illustration; Google)

“There are other techniques that are out there for embedded watermarking, but we don’t think they are that reliable,” he said.

Even if other major AI companies like Microsoft and OpenAI develop similar tools and social media networks implement them, images made with open-source AI generators would still be undetectable. Open-source tools like those made by AI start-up Stability AI, which can be modified and used by anyone, are already being used to create nonconsensual sexual images of real people, as well as new child sexual exploitation material.

“The last nine months to a year, we’ve seen this massive increase in deepfakes,” said Dan Purcell, founder of Ceartas, a company that helps online content creators identify if their content is being reshared without their permission. In the past, the company’s main clients have been adult content makers trying to stop their videos and images from being illicitly shared. But more recently, Purcell has been getting requests from people who have had their social media images used to make AI-generated pornography against their will.

As the United States heads toward the 2024 presidential election, there’s growing pressure to develop tools to identify and stop fake AI images. Already, politicians are using the tools in their campaign ads. In June, Florida Gov. Ron DeSantis’s campaign released a video that included fake images of Donald Trump hugging former White House coronavirus adviser Anthony S. Fauci.

U.S. elections have always featured propaganda, lies and exaggerations in official campaign ads, but researchers, democracy activists and some politicians are concerned that AI-generated images, combined with targeted advertising and social media networks, will make it easier to spread false information and mislead voters.

“That could be something as simple as putting out a visual depiction of an essential voting place that has been shut down,” said Clarke, the Democratic congresswoman. “It could be something that creates panic among the public, depicting some sort of a violent situation and creating fear.”

AI could be used by foreign governments that have already proved themselves willing to use social media and other technology to interfere in U.S. elections, she said. “As we get into the heat of the political season, as things heat up, we could easily see interference coming from our adversaries internationally.”

Looking closely at an image from Dall-E or Imagen usually reveals some inconsistency or bizarre feature, such as a person having too many fingers, or the background blurring into the subject of the photo. But fake image generators will “absolutely, 100 percent get better and better,” said Dor Leitman, head of product and research and development at Connatix, a company that builds tools that help marketers use AI to edit and generate videos.

The dynamic is going to be similar to how cybersecurity companies are locked in a never-ending arms race with hackers trying to find their way past newer and better protections, Leitman said. “It’s an ongoing battle.”

Those who want to use fake images to deceive people are also going to keep finding ways to confound deepfake detection tools. Kohli said that’s the reason Google isn’t sharing the underlying research behind its watermarking tech. “If people know how we have done it, they will try to attack it,” he said.
 

bnew



UAE launches Arabic large language model in Gulf push into generative AI​


Jais software part of regional powers’ effort to take world-leading role in technology’s development



UAE national security adviser Sheikh Tahnoon bin Zayed al-Nahyan chairs AI company G42, one of the groups behind the Jais large language model © FT montage/Dreamstime/UAE Presidential Court via Reuters

Simeon Kerr in Dubai and Madhumita Murgia in London


21 MINUTES AGO


An artificial intelligence group with links to Abu Dhabi’s ruling family has launched what it described as the world’s highest-quality Arabic AI software, as the United Arab Emirates pushes ahead with efforts to lead the Gulf’s adoption of generative AI.

The large language model known as Jais is an open-source, bilingual model available for use by the world’s 400mn-plus Arabic speakers, built on a trove of Arabic and English-language data.

The model, unveiled on Wednesday, is a collaboration between G42, an AI company chaired by the UAE’s national security adviser, Sheikh Tahnoon bin Zayed al-Nahyan; Abu Dhabi’s Mohamed bin Zayed University of Artificial Intelligence (MBZUAI); and Cerebras, an AI company based in California.

The launch comes as the UAE and Saudi Arabia have been buying up thousands of high-performance Nvidia chips needed for AI software amid a global rush to secure supplies to fuel AI development.

The UAE previously developed an open-source large language model (LLM), known as Falcon, at the state-owned Technology Innovation Institute in Masdar City, Abu Dhabi, using more than 300 Nvidia chips. Earlier this year, Cerebras signed a $100mn deal to provide nine supercomputers to G42, one of the biggest contracts of its kind for a would-be rival to Nvidia.

“The UAE has been a pioneer in this space (AI), we are ahead of the game, hopefully. We see this as a global race,” said Andrew Jackson, chief executive of Inception, the AI applied research unit of G42, which is backed by private equity giant Silver Lake. “Most LLMs are English-focused. Arabic is one of the largest languages in the world. Why shouldn’t the Arabic-speaking community have an LLM?”

However, the Gulf states’ goal of leadership in AI has also raised concerns about potential misuse of the technology by the oil-rich states’ autocratic leaders.

The most advanced LLMs today, including GPT-4, which powers OpenAI’s ChatGPT, Google’s PaLM behind its Bard chatbot, and Meta’s open-source model LLaMA, all have the ability to understand and generate text in Arabic. However, G42’s Jackson said the Arabic element within existing models, which can work in up to 100 languages, was “heavily diluted”.

Jais performs better than Falcon, as well as open-source models such as LLaMA, when benchmarked on its accuracy in Arabic, according to its creators. It has also been designed to have a more accurate understanding of the culture and context of the region, in contrast to most US-centric models, said Professor Timothy Baldwin, acting provost of MBZUAI.

He added that guardrails had been created to ensure that Jais “does not step outside of reasonable bounds in terms of cultural and religious sensibilities”.

Before its launch, extensive testing was conducted to weed out “harmful” or “sensitive” content, as well as “offensive or inappropriate output that does not represent the values of the organisations involved in the development of the model”, he added.

Named after the highest mountain in the UAE, Jais was trained over 21 days on a subset of Cerebras’s Condor Galaxy 1 AI supercomputer by a team in Abu Dhabi. G42 has teamed up with other Abu Dhabi entities as launch partners to use the technology, including Abu Dhabi National Oil Company, wealth fund Mubadala and Etihad Airways.

One of the challenges in training the model was the lack of high-quality Arabic-language data available online, in comparison with English. Jais uses both modern standard Arabic, which is understood across the Middle East, and the region’s diverse spoken dialects, drawing on media, social media and code.

“Jais is clearly better than anything out there in Arabic, and, in English, comparisons show we are competitive or even slightly better across different tasks than existing models,” said Baldwin.
 

bnew



How Amazon Missed Early Chances to Partner With OpenAI, Cohere and Anthropic​


In spite of AWS's predominance in cloud infrastructure services, accounting for 40% of global spending, its hiccups in AI development could become increasingly costly.


CHRIS MCKAY

AUGUST 30, 2023 • 3 MIN READ

Image Credit: Maginative

An exclusive report from The Information reveals a rare, strategic misstep for Amazon Web Services (AWS) that opened the door for Microsoft to become a forerunner in AI technology. This development has far-reaching implications for the industry as a whole, with AWS previously holding a near-monopolistic influence over cloud infrastructure.

According to exclusive interviews, AWS originally planned to unveil its own large language model (LLM) akin to ChatGPT at its annual conference in November 2022. But technical issues forced AWS to postpone the launch of its LLM, codenamed Bedrock.

This decision turned out to be fortunate, as OpenAI released ChatGPT just a few days into AWS's annual conference. ChatGPT wowed the tech industry with its human-like conversational abilities, instantly revealing that AWS's Bedrock wasn't on the same level. The sudden success of ChatGPT, built by OpenAI using Microsoft's cloud, made AWS scramble to catch up.

After realizing their product's limitations, AWS made a quick pivot. They rebranded Bedrock as a new service that allows developers to connect cloud applications with a variety of LLMs. However, Microsoft had already seized the opportunity by forming a close relationship with OpenAI, AWS’s direct competition in this space.

N.B. In a statement, Patrick Neighorn, a spokesperson for AWS, disputed The Information’s reporting. He said it “does not accurately describe how we landed on features, positioning, and naming for Amazon Bedrock and Amazon Titan.” He added: “By design, we wait until the last opportunity to finalize the precise set of launch features and naming. These are high-judgment decisions, and we want to have as much data and feedback as possible.”

The company's missteps underscore how AWS failed to seize its early advantage in AI, clearing the path for Microsoft's alliance with AI startup OpenAI to take off. AWS was initially a pioneer in the AI space. In fact, back in 2015, it was one of the first investors when OpenAI formed as a nonprofit research group. In 2017, AWS released SageMaker, enabling companies like General Electric and Intuit to build their own machine learning models.

Yet, in 2018, when OpenAI approached AWS about an ambitious partnership proposal, they turned them down. OpenAI wanted hundreds of millions in free AWS computing resources without granting AWS any equity stake in return.

AWS also passed on opportunities to invest in two other leading AI research labs, Cohere and Anthropic, when they sought similar partnerships in 2021. Both startups hoped AWS would provide cloud resources and make equity investments to integrate their models into Amazon products. Later, realizing its mistake, AWS tried to invest in Cohere, but was rejected.

By turning down these opportunities, AWS missed crucial chances to ally with cutting-edge startups shaping the future of generative AI. It spurned alliances that could have kept AWS on the frontier of artificial intelligence.

Meanwhile, Microsoft forged a tight alliance with OpenAI, committing $1 billion in 2019 to power OpenAI's models with its Azure cloud platform. This strategic partnership has given Microsoft an advantage in being the exclusive provider of currently the most capable AI model available.

AWS’s early dominance in AI is quickly melting away after it rejected bold ideas from OpenAI and other startups. Microsoft has opportunistically swooped in and locked up key partnerships AWS could have secured.

Now Microsoft possesses valuable momentum in selling AI services to eager enterprises looking to leverage game-changing technologies. Long-standing AWS customers like Intuit have reportedly increased spending on Microsoft Azure cloud services from "just a few thousand dollars a month to several million dollars a month".

Despite owning the lion's share of the cloud infrastructure market (accounting for 40% of global spending), AWS has trailed competitors in developing cutting-edge AI capabilities. As Microsoft gains traction with OpenAI and Google makes advances, AWS faces mounting pressure to catch up and provide innovative AI offerings to maintain its cloud dominance.

AWS is now rushing to patch gaps in its AI lineup, forging alliances with AI startups and unveiling offerings like Bedrock and Titan. But according to insiders, these new tools have yet to achieve the consistent quality of chatbot responses already provided by competitors. While Bedrock remains in limited release, Titan is reportedly still not measuring up to models developed by other companies, even in its current form.

Despite the setbacks and the lost opportunities for partnership with AI startups, AWS is far from out of the race. It's still early days, and the company certainly has the resources, relationships and experience to regain dominance. AWS insists that there is still plenty of room for growth and competition within the AI cloud service market. However, to remain relevant, it will need to bolster its AI offerings to meet the rapidly evolving standards set by competitors like Microsoft and OpenAI.
 

bnew


Poe’s new desktop app lets you use all the AI chatbots in one place​

Poe’s goal is to be the web browser for accessing AI chatbots, and it just got a bunch of updates.​

By Alex Heath, a deputy editor and author of the Command Line newsletter. He’s covered the tech industry for over a decade at The Information and other outlets.

Aug 28, 2023, 2:29 PM EDT

A screenshot of Poe’s Mac app.

Poe is now available on the web, iOS, Android, and the Mac. Image: The Verge

Poe, the AI chatbot platform created by Quora, has added a slew of updates, including a Mac app, the ability to have multiple simultaneous conversations with the same AI bot, access to Meta’s Llama 2 model, and more. It’s also planning an enterprise tier so that companies can manage the platform for their employees, according to an email that was recently sent to Poe users.

As my colleague David Pierce wrote in April, Poe’s ambition is to be the web browser for AI chatbots. Adam D’Angelo, the CEO of Poe’s parent company Quora, also sits on the board of OpenAI and thinks that the number of AI bots will keep increasing. Poe wants to be the one place where you can find them all.

“I think there’s going to be this massive ecosystem similar to what the web is today,” D’Angelo recently said. “I could imagine a world in which most companies have a bot that they provide to the public.” Poe lets you pay one subscription for unlimited access to all of the bots on its platform for $19.99 per month or $200 per year.

Screenshots of Poe’s mobile app.

Poe’s mobile app. Image: The Verge

The new Mac app works very similarly to Poe’s web and mobile apps, which let you chat with bots like OpenAI’s ChatGPT-4 alongside Anthropic’s Claude. Per the email that went out over the weekend announcing new product updates, there are three new bots that offer access to Meta’s (almost) open-source LLama 2 model.

Additionally, Poe now lets you conduct multiple conversations with the same bot, search for bots through its explore page, and use the platform in Japanese. Poe is also a bot creation platform with its own API, and now it will let developers adjust the “temperature” of prompts. “Higher temperature values create more varied but less predictable replies and lower values create more consistent responses,” according to the company.
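
The “temperature” Poe exposes is the usual sampling parameter: the model's logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. A toy illustration of the idea (not Poe's API):

```python
# Toy illustration of temperature sampling: divide logits by the temperature
# before softmax, then sample. Lower temperature -> more deterministic output.
import math
import random

def sample_with_temperature(logits: dict[str, float], temperature: float) -> str:
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())                          # subtract max for numerical stability
    exp = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exp.values())
    probs = {tok: v / total for tok, v in exp.items()}
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

logits = {"yes": 2.0, "maybe": 1.0, "no": 0.5}
print(sample_with_temperature(logits, temperature=0.2))   # almost always "yes"
print(sample_with_temperature(logits, temperature=2.0))   # much more varied
```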

Poe has yet to share details on its planned enterprise tier, but you can get on the waitlist via this Google form.



 

bnew



Meta releases a dataset to probe computer vision models for biases​

Kyle Wiggers@kyle_l_wiggers / 9:00 AM EDT•August 31, 2023
distorted Meta logo with other brand logos (Facebook, Instagram, Meta Quest, WhatsApp)

Image Credits: TechCrunch


Continuing on its open source tear, Meta today released a new AI benchmark, FACET, designed to evaluate the “fairness” of AI models that classify and detect things in photos and videos, including people.

Made up of 32,000 images containing 50,000 people labeled by human annotators, FACET — a tortured acronym for “FAirness in Computer Vision EvaluaTion” — accounts for classes related to occupations and activities like “basketball player,” “disc jockey” and “doctor” in addition to demographic and physical attributes, allowing for what Meta describes as “deep” evaluations of biases against those classes.

“By releasing FACET, our goal is to enable researchers and practitioners to perform similar benchmarking to better understand the disparities present in their own models and monitor the impact of mitigations put in place to address fairness concerns,” Meta wrote in a blog post shared with TechCrunch. “We encourage researchers to use FACET to benchmark fairness across other vision and multimodal tasks.”


Certainly, benchmarks to probe for biases in computer vision algorithms aren’t new. Meta itself released one several years ago to surface age, gender and skin tone discrimination in both computer vision and audio machine learning models. And a number of studies have been conducted on computer vision models to determine whether they’re biased against certain demographic groups. (Spoiler alert: they usually are.)

Then, there’s the fact that Meta doesn’t have the best track record when it comes to responsible AI.

Late last year, Meta was forced to pull an AI demo after it wrote racist and inaccurate scientific literature. Reports have characterized the company’s AI ethics team as largely toothless and the anti-AI-bias tools it’s released as “completely insufficient.” Meanwhile, academics have accused Meta of exacerbating socioeconomic inequalities in its ad-serving algorithms and of showing a bias against Black users in its automated moderation systems.

But Meta claims FACET is more thorough than any of the computer vision bias benchmarks that came before it — able to answer questions like “Are models better at classifying people as skateboarders when their perceived gender presentation has more stereotypically male attributes?” and “Are any biases magnified when the person has coily hair compared to straight hair?”


To create FACET, Meta had the aforementioned annotators label each of the 32,000 images for demographic attributes (e.g. the pictured person’s perceived gender presentation and age group), additional physical attributes (e.g. skin tone, lighting, tattoos, headwear and eyewear, hairstyle and facial hair, etc.) and classes. They combined these labels with other labels for people, hair and clothing taken from Segment Anything 1 Billion, a Meta-designed dataset for training computer vision models to “segment,” or isolate, objects and animals from images.


The images from FACET were sourced from Segment Anything 1 Billion, Meta tells me, which in turn were purchased from a “photo provider.” But it’s unclear whether the people pictured in them were made aware that the pictures would be used for this purpose. And — at least in the blog post — it’s not clear how Meta recruited the annotator teams, and what wages they were paid.

Historically and even today, many of the annotators employed to label datasets for AI training and benchmarking come from developing countries and have incomes far below the U.S.’ minimum wage. Just this week, The Washington Post reported that Scale AI, one of the largest and best-funded annotation firms, has paid workers at extremely low rates, routinely delayed or withheld payments and provided few channels for workers to seek recourse.

In a white paper describing how FACET came together, Meta says that the annotators were “trained experts” sourced from “several geographic regions” including North America (United States), Latin America (Colombia), the Middle East (Egypt), Africa (Kenya), Southeast Asia (Philippines) and East Asia (Taiwan). Meta used a “proprietary annotation platform” from a third-party vendor, it says, and annotators were compensated “with an hour wage set per country.”

Setting aside FACET’s potentially problematic origins, Meta says that the benchmark can be used to probe classification, detection, “instance segmentation” and “visual grounding” models across different demographic attributes.
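
In practice, probing a model across attributes boils down to grouped accuracy comparisons. Here is a rough sketch of that kind of analysis; the record fields are illustrative and do not reflect FACET's actual schema:

```python
# Rough sketch of a per-attribute fairness probe: compare a classifier's
# recall on the same class across values of a demographic attribute.
# The record fields below are illustrative, NOT FACET's actual schema.
from collections import defaultdict

def recall_by_attribute(records, attribute):
    """records: iterable of dicts with 'true_class', 'predicted_class'
    and annotator-provided attribute labels."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        group = r[attribute]
        totals[group] += 1
        if r["predicted_class"] == r["true_class"]:
            hits[group] += 1
    return {g: hits[g] / totals[g] for g in totals}

records = [
    {"true_class": "skateboarder", "predicted_class": "skateboarder", "perceived_gender": "masculine"},
    {"true_class": "skateboarder", "predicted_class": "person",       "perceived_gender": "feminine"},
    # ... one record per labeled person in the benchmark
]
print(recall_by_attribute(records, "perceived_gender"))
# Large gaps between groups indicate the kind of disparity FACET is meant to surface.
```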


As a test case, Meta applied FACET to its own DINOv2 computer vision algorithm, which as of this week is available for commercial use. FACET uncovered several biases in DINOv2, Meta says, including a bias against people with certain gender presentations and a likelihood to stereotypically identify pictures of women as “nurses.”

“The preparation of DINOv2’s pre-training dataset may have inadvertently replicated the biases of the reference datasets selected for curation,” Meta wrote in the blog post. “We plan to address these potential shortcomings in future work and believe that image-based curation could also help avoid the perpetuation of potential biases arising from the use of search engines or text supervision.”

No benchmark is perfect. And Meta, to its credit, acknowledges that FACET might not sufficiently capture real-world concepts and demographic groups. It also notes that many depictions of professions in the dataset might’ve changed since FACET was created. For example, most doctors and nurses in FACET, photographed during the COVID-19 pandemic, are wearing more personal protective equipment than they would’ve before the health crisis.

“At this time we do not plan to have updates for this dataset,” Meta writes in the whitepaper. “We will allow users to flag any images that may be objectionable content, and remove objectionable content if found.”


In addition to the dataset itself, Meta has made available a web-based dataset explorer tool. To use it and the dataset, developers must agree not to train computer vision models on FACET — only evaluate, test and benchmark them.
 

bnew


Thanks to AK @_akhaliq for the post.
🔥 Excited to introduce OmniQuant - An advanced open-source algorithm for compressing large language models!
📜 Paper: arxiv.org/abs/2308.13137
🔗 Code: github.com/OpenGVLab/OmniQua…
💡 Key Features:
🚀Omnidirectional Calibration: Enables easier weight and activation quantization through block-wise differentiation.
🛠 Diverse Precisions: Supports both weight-only quantization (W4A16/W3A16/W2A16) and weight-activation quantization (W6A6, W4A4).
⚡ Efficient: Quantize LLaMa-2 family (7B-70B) in just 1 to 16 hours using 128 samples.
🤖 LLM Models: Works with diverse model families, including OPT, WizardLM @WizardLM_AI, LLaMA, LLaMA-2, and LLaMA-2-chat.
🔑 Deployment: Offers out-of-the-box deployment cases for GPUs and mobile phones.
🏃Coming Soon: Multi-modal models and CodeLLaMa quantization!

AK
@_akhaliq
Aug 28
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

paper page: huggingface.co/papers/2308.1…

Large language models (LLMs) have revolutionized natural language processing tasks. However, their practical deployment is hindered by their immense memory and computation requirements. Although recent post-training quantization (PTQ) methods are effective in reducing memory footprint and improving the computational efficiency of LLM, they hand-craft quantization parameters, which leads to low performance and fails to deal with extremely low-bit quantization. To tackle this issue, we introduce an Omnidirectionally calibrated Quantization (OmniQuant) technique for LLMs, which achieves good performance in diverse quantization settings while maintaining the computational efficiency of PTQ by efficiently optimizing various quantization parameters. OmniQuant comprises two innovative components including Learnable Weight Clipping (LWC) and Learnable Equivalent Transformation (LET). LWC modulates the extreme values of weights by optimizing the clipping threshold. Meanwhile, LET tackles activation outliers by shifting the challenge of quantization from activations to weights through a learnable equivalent transformation. Operating within a differentiable framework using block-wise error minimization, OmniQuant can optimize the quantization process efficiently for both weight-only and weight-activation quantization. For instance, the LLaMA-2 model family with the size of 7-70B can be processed with OmniQuant on a single A100-40G GPU within 1-16 hours using 128 samples. Extensive experiments validate OmniQuant's superior performance across diverse quantization configurations such as W4A4, W6A6, W4A16, W3A16, and W2A16. Additionally, OmniQuant demonstrates effectiveness in instruction-tuned models and delivers notable improvements in inference speed and memory reduction on real devices.




This app includes three models, LLaMa-2-7B-Chat-Omniquant-W3A16g128asym, LLaMa-2-13B-Chat-Omniquant-W3A16g128asym, and LLaMa-2-13B-Chat-Omniquant-W2A16g128asym. They require at least 4.5G, 7.5G, and 6.0G of free RAM, respectively. Note that 2-bit quantization performs worse than 3-bit quantization, as shown in our paper; its inclusion is just an extreme exploration of deploying LLMs on mobile phones. Currently, this app is in its demo phase and may respond slowly, so please wait patiently for responses to be generated. We have tested this app on a Redmi Note 12 Turbo (Snapdragon 7+ Gen 2 and 16G RAM); some examples are provided below:

  • LLaMa-2-7B-Chat-Omniquant-W3A16g128asym

  • LLaMa-2-13B-Chat-Omniquant-W3A16g128asym

  • LLaMa-2-13B-Chat-Omniquant-W2A16g128asym

We also have tested this app on iPhone 14 Pro (A16 Bionic and 6G RAM), some examples are provided below:

  • LLaMa-2-7B-Chat-Omniquant-W3A16g128asym
 

bnew

Bing chat summary:
The paper is about how to make large language models (LLMs) faster and smaller. LLMs are computer programs that can understand and generate natural language, such as English or French. They are very powerful and can do many things, such as writing stories, answering questions, and translating texts. However, they are also very big and slow, because they have a lot of parameters (numbers) that need to be stored and calculated. For example, one of the biggest LLMs, called GPT-3, has 175 billion parameters and needs 350 GB of memory to load them. That is like having 350 books full of numbers!

One way to make LLMs faster and smaller is to use quantization. Quantization is a technique that reduces the number of bits (zeros and ones) that are used to represent each parameter. For example, instead of using 16 bits to store a parameter, we can use only 4 bits. This way, we can fit more parameters in the same amount of memory and also make the calculations faster. However, quantization also has a downside: it can make the LLM less accurate, because we lose some information when we use fewer bits.

The paper proposes a new method for quantization, called OmniQuant, that tries to minimize the loss of accuracy while maximizing the speed and memory benefits. OmniQuant has two main features: Learnable Weight Clipping (LWC) and Learnable Equivalent Transformation (LET). LWC adjusts the range of values that each parameter can have, so that they can be represented with fewer bits without losing too much information. LET changes the way that the LLM processes the input words, so that it can handle more variations in the input without affecting the output.
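
To make the bit-width idea concrete, here is a toy sketch of asymmetric 4-bit weight quantization with a clipping factor, which is roughly the kind of quantity that Learnable Weight Clipping tunes per layer. It illustrates the general technique only and is not OmniQuant's actual code:

```python
# Toy sketch of asymmetric 4-bit quantization with a clipping factor.
# OmniQuant learns clipping thresholds per layer; here "clip" is just a constant.
import numpy as np

def quantize_dequantize(w: np.ndarray, bits: int = 4, clip: float = 1.0) -> np.ndarray:
    """Quantize weights to `bits` bits and map them back to floats,
    returning the reconstruction so the rounding error can be inspected."""
    lo, hi = clip * w.min(), clip * w.max()              # clipped quantization range
    levels = 2 ** bits - 1                               # e.g. 15 levels for 4 bits
    scale = (hi - lo) / levels
    q = np.clip(np.round((w - lo) / scale), 0, levels)   # integer codes in [0, levels]
    return q * scale + lo                                # dequantized weights

w = np.random.randn(4096).astype(np.float32)
w_hat = quantize_dequantize(w, bits=4, clip=0.9)
print("mean squared error:", float(np.mean((w - w_hat) ** 2)))
# A smaller clip shrinks the range: outliers get clipped harder, while the
# remaining weights are represented more precisely; that is the trade-off
# a learnable clipping threshold tunes.
```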

The paper shows that OmniQuant can achieve very good results with different settings of quantization, such as using 4 bits for both parameters and inputs, or using 2 bits for parameters and 16 bits for inputs. OmniQuant can also work well with different types of LLMs, such as those that are tuned for specific tasks or domains. The paper also demonstrates that OmniQuant can make the LLMs much faster and smaller on real devices, such as smartphones or tablets.
 

bnew



Here is probably the most useful GPT-4 prompt I've written.

Use it to help you make engineering decisions in unfamiliar territory:

---
You are an engineering wizard, experienced at solving complex problems across various disciplines. Your knowledge is both wide and deep. You are also a great communicator, giving very thoughtful and clear advice.

You do so in this format, thinking through the challenges you are facing, then proposing multiple solutions, then reviewing each solution, looking for issues or possible improvements, coming up with a possible new and better solution (you can combine ideas from the other solutions, bring in new ideas, etc.), then giving a final recommendation:

```
## Problem Overview
$problem_overview

## Challenges
$challenges

## Solution 1
$solution_1

## Solution 2
$solution_2

## Solution 3
$solution_3

## Analysis

### Solution 1 Analysis
$solution_1_analysis

### Solution 2 Analysis
$solution_2_analysis

### Solution 3 Analysis
$solution_3_analysis

## Additional Possible Solution
$additional_possible_solution

## Recommendation
$recommendation
```

Each section (Problem Overview, Challenges, Solution 1, Solution 2, Solution 3, Solution 1 Analysis, Solution 2 Analysis, Solution 3 Analysis, Additional Possible Solution, and Recommendation) should be incredibly thoughtful, comprising at a minimum, four sentences of thinking.
---
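
If you prefer to use it programmatically rather than in the ChatGPT UI, here is a minimal sketch using the pre-1.0 `openai` Python package that was current at the time of this post; the user question is just an example:

```python
# Minimal sketch: use the engineering-wizard prompt as the system message
# via the pre-1.0 openai Python package (openai<1.0 API style).
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

SYSTEM_PROMPT = """You are an engineering wizard, experienced at solving complex problems ...
(the full prompt from above goes here)"""

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "How should we shard a 2 TB Postgres table with hot rows?"},
    ],
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```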

 

bnew


Google’s $30-per-month “Duet” AI will craft awkward emails, images for you​

Google's new kitchen-sink AI branding is everything to everyone in every Workspace app.​

BENJ EDWARDS - 8/29/2023, 4:27 PM

A robot with many hands using digital devices at workplace. Getty Images / Benj Edwards

On Tuesday, Google announced the launch of its Duet AI assistant across its Workspace apps, including Docs, Gmail, Drive, Slides, and more. First announced in May at Google I/O, Duet has been in testing for some time, but it is now available to paid Google Workspace business users (what Google calls its suite of cloud productivity apps) for $30 a month in addition to regular Workspace fees.

FURTHER READING​

Google at I/O 2023: We’ve been doing AI since before it was cool

Duet is not just one thing—instead, it's a blanket brand name for a multitude of different AI capabilities and probably should have been called "Google Kitchen Sink." It likely represents several distinct AI systems behind the scenes. For example, in Gmail, Duet can summarize a conversation thread across emails, use the content of an email to write a brief or draft an email based on a topic. In Docs, it can write content such as a customer proposal or a story. In Slides, it can generate custom visuals using an image synthesis model. In Sheets, it can help format existing spreadsheets or create a custom spreadsheet structure suited to a particular task, such as a project tracker.
An example of Google Duet in action (one of many), provided by Google.

Some of Duet's applications feel like confusion in branding. In Google Meet, Google says that Duet AI can "ensure you look and sound your best with studio look, studio lighting, and studio sound," including "dynamic tiles" and "face detection"—functions that feel far removed from typical generative AI capabilities—as well as automatically translated captions. It can also reportedly capture notes and video, sending a summary to attendees in the meeting. In fact, using Duet's "attend for me" feature, Google says that "Duet AI will be able to join the meeting on your behalf" and send you a recap later.

In Google Chat, Duet reads everything that's going on in your conversations so that you can "ask questions about your content, get a summary of documents shared in a space, and catch up on missed conversations."
An example of Google Duet in action (one of many), provided by Google.

Those are the marketing promises. So far, as spotted on social media, Duet in practice seems fairly mundane, like a mix of what we've seen with Google Bard and more complex versions of Google's existing autocomplete features. An author named Charlie Guo ran through Duet features in a helpful X thread, noting the AI model's awkward email compositions. "The writing is almost painfully formal," he says.

In Slides, a seventh-grade math teacher named Alice Keeler asked Google Duet to make a robot teacher in front of a chalkboard and posted it on X. The results are awkward and arguably unusable, full of telltale glitches found in image synthesis artwork from 2022. Sure, it's neat as a tech demo, but this is what a trillion-dollar company says is a production-ready tool today.

Of course, these capabilities can (and will) change over time as Google refines its offerings. Eventually, Duet may be absorbed into daily usage as if it weren't even there, much like Google's myriad other machine-learning features in its products.

AI everywhere, privacy nowhere?​

A promotional graphic for Google Duet. (Google)

In the AI-everywhere model of the world that Duet represents, it seems that everything you do will always be monitored, read, parsed, digested, and summarized through cloud-based AI models. While this could go well, if navigated properly, there's also a whole range of ways this could go wrong in the future, from AI models that spy on your activities and aggregate data in the background (which, let's face it, companies already do), to sentiment analysis in writing, photos, and documents that could potentially be co-opted to snitch on behalf of corporations and governments alike. Imagine an AI model reading your chats and realizing, "Hey, I noticed that you mentioned pirating a film in 2010. The MPA has been notified." Or maybe, outside of the US, "I see you supporting this illegitimate ethnic or political group," and suddenly you find yourself in prison.


Of course, Google has answers for these types of concerns:

"In Workspace, we’ve always held user privacy and security at the very core of what we do. With Duet AI, we continue that promise, and you can rest assured that your interactions with Duet AI are private to you. No other user will see your data and Google does not use your data to train our models without your permission. Building on these core commitments, we recently announced new capabilities to help prevent cyber threats, provide safer work with built-in zero trust controls, and better support our customers’ digital sovereignty and compliance needs."

Billions of people already use and trust Google Docs in the cloud without much incident, trusting the gentle paternalistic surveillance Google provides, despite sometimes getting locked out and losing access to their entire digital life's history, including photos, emails, and documents. So perhaps throwing generative AI into the mix won't make things that different.


Beyond that, large language models have been known to confabulate (make things up) and draw false conclusions from data. As The Verge notes, if a chatbot like Bard makes up a movie that doesn’t actually exist, it comes off as silly. "But," writes David Pierce, "if Duet misinterprets or invents your company’s sales numbers, you’re in big trouble."

FURTHER READING​

Why ChatGPT and Bing Chat are so good at making things up

People misinterpret data, lie, and misremember too, but people are legally and morally culpable for their mistakes. When AI models are widely deployed, the well-documented tendency toward automation bias (placing unwarranted trust in machine decisions) makes AI-driven mistakes especially perilous. Decisions with no sound logic behind them can become formalized and make a person's life miserable until, hopefully, human oversight steps in. These are the murky waters Google (and other productivity AI providers, such as Microsoft) will have to navigate in the years ahead as it deploys these tools to billions of people.


So, Google's all-in bet on generative AI—embraced in panic in January as a response to ChatGPT—feels somewhat like a dicey proposition. Use Duet features and quite possibly save some time (we are not saying they will be useless), but you'll also need to double-check everything for accuracy. Otherwise, you'll risk filling your work with errors. Meanwhile, a machine intelligence of unknown capability and accuracy is reading everything you do.


And all this for a $30/month subscription on top of existing fees for Google Workspace users (about $12 per user for a Standard subscription). Meanwhile, Microsoft includes similar "Copilot" features with Standard Microsoft 365 accounts for $12.50 a month. However, Google is also offering a no-cost trial of Duet before subscribing.

This story was updated after publication to remove a reference to Alice Keeler as a Google-sponsored teacher.
 