The A.I Megathread (LLM , GPT , Development)

bnew · Jan 31, 2024

LLaVA-1.6: Improved reasoning, OCR, and world knowledge

LLaVA team presents LLaVA-1.6, with improved reasoning, OCR, and world knowledge. LLaVA-1.6 even exceeds Gemini Pro on several benchmarks.

llava-vl.github.io

LLaVA-v1.6-34B

bnew · Jan 31, 2024

https://archive.is/XqJ0b

Whatever Miqu is, it has some sort of special sauce. It gets an 83.5 on EQ-Bench (evaluated locally), surpassing *every other LLM in the world except GPT-4*. EQ-Bench has a 0.97 correlation w/ MMLU, and a 0.94 correlation w/ Arena Elo. It *beats* Mistral Medium - at Q4_K_M. I would strongly encourage @lmsysorg to add miqu to the leaderboard so we can properly test it.

I originally saw this intriguing EQ-Bench result in a random anon tweet that I can't find - replicated it myself, but if someone knows the link to it - please post in comments so I can credit the idea!

Also would be awesome if someone could check miqu for dataset contamination with EQ-Bench.

bnew · Jan 31, 2024

Mistral CEO confirms ‘leak’ of new open source AI model nearing GPT-4 performance

An anonymous user on ***** posted a link to the miqu-1-70b files on *****. The open source model approaches GPT-4 performance.

venturebeat.com

Mistral CEO confirms ‘leak’ of new open source AI model nearing GPT-4 performance

Carl Franzen @carlfranzen

January 31, 2024 10:44 AM

Overhead view of Eiffel tower in a Paris made of circuit boards.

Credit: VentureBeat made with Midjourney V6

The past few days have been a wild ride for the growing open source AI community — even by its fast-moving and freewheeling standards.

Here’s the quick chronology: on or about January 28, a user with the handle “Miqu Dev” posted a set of files on HuggingFace, the leading open source AI model and code sharing platform, that together comprised a seemingly new open source large language model (LLM) labeled “miqu-1-70b.”

The HuggingFace entry, which is still up at the time of this article’s posting, noted that new LLM’s “Prompt format,” how users interact with it, was the same as Mistral, the well-funded open source Parisian AI company behind Mixtral 8x7b, viewed by many to be the top performing open source LLM presently available, a fine-tuned and retrained version of Meta’s Llama 2.

Posted on *****

The same day, an anonymous user on ***** (possibly “Miqu Dev”) posted a link to the miqu-1-70b files on *****, the notoriously longstanding haven of online memes and toxicity, where users began to notice it.

Some took to X, Elon Musk’s social network formerly known as Twitter, to share the discovery of the model and what appeared to be its exceptionally high performance at common LLM tasks (measured by tests known as benchmarks), approaching the previous leader, OpenAI’s GPT-4 on the EQ-Bench.

Mistral quantized?

Machine learning (ML) researchers took notice on LinkedIn, as well.

“Does ‘miqu’ stand for MIstral QUantized? We don’t know for sure, but this quickly became one of, if not the best open-source LLM,” wrote Maxime Labonne, an ML scientist at JP Morgan & Chase, one of the world’s largest banking and financial companies. “Thanks to @152334H, we also now have a good unquantized version of miqu here: LinkedIn

The investigation continues. Meanwhile, we might see fine-tuned versions of miqu outperforming GPT-4 pretty soon.“

Quantization in ML refers to a technique used to make it possible to run certain AI models on less powerful computers and chips by replacing specific long numeric sequences in a model’s architecture with shorter ones.

Users speculated “Miqu” might be a new Mistral model being covertly “leaked” by the company itself into the world — especially since Mistral is known for dropping new models and updates without fanfare through esoteric and technical means — or perhaps an employee or customer gone rouge.

Confirmation from the top

Well, today it appears we finally have confirmation of the latter of those possibilities: Mistral co-founder and CEO Arthur Mensch took to X to clarify: “An over-enthusiastic employee of one of our early access customers leaked a quantised (and watermarked) version of an old model we trained and distributed quite openly…

To quickly start working with a few selected customers, we retrained this model from Llama 2 the minute we got access to our entire cluster — the pretraining finished on the day of Mistral 7B release. We’ve made good progress since — stay tuned!“

Hilariously, Mensch also appears to have taken to the illicit HuggingFace post not to demand a takedown, but leaving a comment that the poster “might consider attribution.”

Still, with Mensch’s note to “stay tuned!” it appears that not only is Mistral training a version of this so-called “Miqu” model that approaches GPT-4 level performance, but it may, in fact, match or exceed it, if his comments are to be interpreted generously.

A pivotal moment in open source AI and beyond?

That would be a watershed moment not just for open source generative AI but the entire field of AI and computer science: since its release back in March 2023, GPT-4 has remained the most powerful and highest performing LLM in the world by most benchmarks. Not even any of Google’s presently available, long-rumored Gemini models have been able to eclipse it — yet (according to some measures, the current Gemini models are actually worse than the older OpenAI GPT-3.5 model).

The release of an open source GPT-4 class model, which would presumably be functionally free to use, would likely place enormous competitive pressure on OpenAI and its subscription tiers, especially as more enterprises look to open source models, or a mixture of open source and closed source, to power their applications, as VentureBeat’s founder and CEO Matt Marshall recently reported. OpenAI may retain the edge with its faster GPT-4 Turbo and GPT-4V (vision), but the writing on the wall is pretty clear: the open source AI community is catching up fast. Will OpenAI have enough of a head start, and a metaphorical “moat” with its GPT Store and other features, to remain in the top spot for LLMs?

miqudev/miqu-1-70b · 🚩 Report: Ethical issue(s)

As per https://twitter.com/arthurmensch/status/1752737462663684344?ref_src=twsrc%5Etfw%7Ctwcamp%5Etweetembed%7Ctwterm%5E1752734898476007821%7Ctwgr%5E248b7b3af46b13ef4a1843ab2fcd7b68e3517283%7Ctwcon...

huggingface.co

bnew · Jan 31, 2024

Groq absolutely crushes on performance. You can try yourself on public chat http... | Hacker News

news.ycombinator.com

Groq LPU™ Inference Engine Crushes First Public LLM Benchmark - Groq

Hey Groq Prompters! We're thrilled to announce that Groq is now on the LLMPerf Leaderboard by Anyscale, a developer innovator and friendly competitor in

groq.com

Groq LPU™ Inference Engine Crushes First Public LLM Benchmark

Written by:
Groq

Groq Delivers up to 18x Faster LLM Inference Performance on Anyscale’s LLMPerf Leaderboard Compared to Top Cloud-based Providers

Source: GitHub - ray-project/llmperf-leaderboard
Hey Groq Prompters! We’re thrilled to announce that Groq is now on the LLMPerf Leaderboard by Anyscale, a developer innovator and friendly competitor in the Large Language Model (LLM) inference benchmark space. This benchmark includes a selection of LLM inference providers and the analysis focuses on evaluating for performance, reliability, and efficiency measured by:

Output Tokens Throughput (tokens/s): The average number of output tokens returned per second. This metric is important for applications that require high throughput, such as summarization and translation, and easy to compare across different models and providers.
Time to first token (TTFT): The duration of time that LLM returns the first token. TTFT is especially important for streaming applications that require low latency such as chatbots.

Not only is this our first public benchmark – it was a huge success. Meta AI’s Llama 2 70B running on the Groq LPU™ Inference Engine outperformed all other cloud-based inference providers at up to 18x faster for output tokens throughput.
Let’s walk through the Anyscale methodology in a bit more detail. This benchmark leverages:

A 550 input token count and a 150 output token count
The first metric, Output Tokens Throughput (aka the output speed) is determined by dividing the count of output tokens by the overall end-to-end time, which includes input tokens processing time and overall network latency.
For a full list of caveats and disclaimers for this benchmark, please refer to the documentation here.

On our end, we’d like to note:

All Llama 2 calculations on the LPU are done in FP16, but we store some of the weights in FP8.
We have no sparsity (i.e. we’re doing ALL of the Llama 2 matrix calculations and thus processing the entire model as provided by Meta AI).
This is noteworthy in general as FP16 should provide a higher quality of results for inference.

Now let’s look a bit more closely at the results for each metric.
For Output Tokens Throughput, Groq achieved an average of 185 tokens/s, a result that ranges 3-18x faster than any other cloud-based inference provider contributing to the leaderboard.
For Time to First Token, we hit 0.22s. Because of the deterministic design of the LPU, response times are consistent resulting in our API providing the smallest range of variability. This means more repeatability and less effort designing around potential latency issues or slow responses.

Source: GitHub - ray-project/llmperf-leaderboard
We’re proud and excited to be leading this leaderboard in the initial phase of our ongoing roadmap for performance enhancements.
Now, we already know what you’re thinking – “Groq has been saying they’re getting 270+ tokens per second per user for Llama-2 70B. What’s up with the difference?”
As mentioned, this benchmark leverages a 150 output token count and includes input processing time as part of the calculation, rather than just solely the output tokens throughput. For example, if you were to test with 1000 output tokens, the result would be closer to the 270+ tokens/s per user you see on chat.groq.com.
All in all, we couldn’t be more excited to participate in our first public benchmark results with the world, thanks to the work of our team at Groq and the help of the great team at Anyscale. We look forward to providing benchmarking for Llama 2 7B, and who knows, we just might mix things up, with a variety of experts, beyond that. (Much) more to come.

Interested in Alpha API Early Access?

On Monday, January 15th, we will start granting early access to the Groq API, enabling approved users to experiment with models like Llama 2-70B running on the Groq LPU Inference Engine. We will be approving select users weekly and will be increasing users until general access is available in the next sprint. For those interested in our API solutions, please reach out to us at api@groq.com.

bnew · Jan 31, 2024

Your Child’s Next Playmate Could Be An AI Toy Powered By ChatGPT

A host of startups are building robots and stuffed toys that can have full-fledged conversations with children, thanks to generative AI.

www.forbes.com

Your Child’s Next Playmate Could Be An AI Toy Powered By ChatGPT

ILLUSTRATION BY PHILIP SMITH FOR FORBES

Jan 20, 2024,06:30am EST

A host of startups are building robots and stuffed toys that can have full-fledged conversations with children, thanks to generative AI.

By Rashi Shrivastava, Forbes Staff

Six-year-old Sophia Valentina sits under a decorated Christmas tree as she unwraps her present: a tiny lavender-colored robot, whose face is a display and whose body is embedded with a speaker. “Hey Miko,” Sophia says, and the gadget lights up with round eyes and blue eyebrows.

In early December, Sara Galvan bought Miko Mini, a $99 robotic companion embedded with in-house AI models as well as OpenAI’s GPT-3.5 and GPT-4, with the hopes that it would help homeschool her daughters. Over the last month, Sophia has used Miko to solve math problems, listen to princess stories and ask questions like “how is Christmas celebrated,” Galvan said. “They begin to learn self-guided learning, which is huge for us with homeschool and helps expand their curiosity and their minds,” she said.

Miko, which can also play games like hide and seek, is part of a growing group of pricey GPT-powered robots rolling into the toy market. Some AI toys are touted as a screen-free form of entertainment that can engage children in conversations and playful learning, like Grok, a $99 AI plushie that can answer general questions (not to be confused with Elon Musk’s ChatGPT competitor Grok, though the toy Grok is voiced by his former girlfriend Grimes). Others claim to offer additional features beyond storytelling and learning activities. There’s Fawn, a $199 cuddly baby deer intended to provide emotional support, and Moxie, a $799 turquoise-colored robot that can recite affirmations and conduct mindfulness exercises. These robotic pals are designed to not only help children grow academically and improve communication skills but also teach them how to deal with their emotions during times of distress.

Sneh Vaswani, cofounder and CEO at Miko

COURTESY OF MIKO

Fostering social and emotional well-being is one of Miko’s intended functions, said CEO and cofounder Sneh Vaswani, who participated in several international robotics competitions before starting his company in 2015 and launching the first iteration of AI companion Miko in 2017. “Our goal is to help parents raise kids in the modern world by engaging, educating and entertaining children through multimodal interactions with robotics and AI,” he told Forbes.

Vaswani has sold almost 500,000 devices to date across more than 100 countries and expects to cross $50 million in revenue in the fiscal year ending in March 2024, he told Forbes. His Mumbai-based startup has raised more than $50 million and was last valued at about $290 million, according to Pitchbook.

Miko 3

COURTESY OF MIKO

Miko is trained on data curated from school curriculum, books and content from partners like Oxford University Press and is built using proprietary technology including facial and voice recognition, recommendation algorithms and a natural language processing layer, Vaswani said. The bot is programmed to detect different accents and provide educational content catered to the geographic region where it’s sold. The company has also teamed up with media giants like Disney and Paramount, allowing them to publish their content on Miko.

“There could be a storytelling app from Disney or a Ninja Turtles app from Paramount,” he told Forbes, adding, “It’s like a Netflix plus an iPhone on wheels given to a child.”

Other toys were built out of a desire to bring fictional characters to life. Misha Sallee and Sam Eaton, the cofounders of startup Curio Interactive — and the creators of Grok — were inspired to create the rocket-shaped AI plushie thanks to fond childhood memories of watching movies like Toy Story. But making toys speak intelligently was a far-fetched idea until ChatGPT came out, Sallee said. Grok is built on a variety of large language models that help it act like a talkative playmate and an encyclopedia for children. Canadian musician Grimes invested in the startup and voiced the characters, which are a part of what Sallee calls a “character universe.”

“As a mother, it resonated with her. It was something that she wanted to lean in and collaborate on,” Sallee said. “She wanted a screen-free experience for her kids and for kids around the world.” (Grimes did not respond to a request for comment.)

“It’s like a Netflix plus an iPhone on wheels given to a child.”

Sneh Vaswani, CEO and cofounder of Miko

Another plush AI toy is Fawn, a baby deer programmed with OpenAI’s large language model GPT-3.5 Turbo and text-to-speech AI from synthetic speech startup ElevenLabs. Launched in July 2023 by husband and wife duo Peter Fitzpatrick and Robyn Campbell, Fawn was designed to help children learn about and process their emotions while maintaining the tone and personality of an eight-year-old. Still in its early stages, the startup plans to ship its first orders before the end of March 2024.

Robyn Campbell, cofounder of AI startup Fawn Friends

COURTESY OF FAWN

“[Fawn] is very much like a cartoon character come to life,” said Campbell, who previously worked as a screenwriter at The LEGO Group. “We’ve created this character that has feelings, likes and dislikes that the child relates to.”

While generative AI is capable of spinning up make-believe characters and content, it tends to conjure inaccurate responses to factual questions. ChatGPT, for instance, struggles with simple math problems — and some of these AI toys have the same weakness. For instance, in a recent video review of GPT-powered robot Moxie, it incorrectly said 100 times 100 is 10. Paolo Pirjanian, CEO and founder of Embodied, Inc., the company behind Moxie, said that a “tutor mode” feature along with academic capabilities was announced in early January and will be available in the robots later this year. “Academic questions — paired with environmental factors like multiple speakers or background noise — can sometimes cause Moxie's AI to need further prompting,” Pirjanian said.

“If… the model invents an answer that’s not correct, that can create a serious misconception and these misconceptions are much harder to correct,” said Stefania Druga, a researcher at the Center for Applied AI at the University of Chicago.

In Fawn’s case, Campbell said the AI model has been stress tested to prevent it from veering into inappropriate topics of conversation. But, if the model makes up information, it’s often a desired outcome, Campbell said. “[Fawn] is not designed to be an educational toy. She's designed to be a friend who can tell you an elaborate story about a platypus. Her hallucinations are actually not a bug. They’re a feature,” she said.

bnew · Jan 31, 2024

THE CASE FOR THERAPY

For Moxie, the stakes are higher than other AI toys because it’s being marketed as a tool for social and emotional development. In 2021, Kristen Walmsley bought the robot on sale for $700 for her 10-year-old son, Oliver Keller who has an intellectual disability and ADHD. “We were really struggling with my son, and I was really desperate to find something that could help him. I bought it because it was advertised as a therapeutic device,” Walmsley tells Forbes.

Walmsley said that Oliver, who at first found the robot “creepy” and eventually warmed up to it, now uses it to share his feelings and recite positive affirmations. In one instance, when Oliver was overwhelmed and said he was feeling sad, the robot, which was already active and listening to the conversation, chimed in. “Sometimes I have to remind myself that I deserve to be happy. Please repeat this back to me: ‘I deserve to be happy,’” Moxie said.

Moxie
COURTESY OF EMBODIED, INC.

In another instance, Moxie and Oliver had a conversation about embarrassment and Moxie replied with affirmations about being confident. “It was impressive to see that it could do that because my son really struggles with low self esteem,” Walmsley said, adding that her son has repeated these affirmations to himself even when the robot is not around.

Paolo Pirjanian, CEO and founder of Embodied, Inc.
COURTESY OF EMBODIED, INC.

Moxie’s latest version is embedded with large language models like OpenAI’s GPT-4 and GPT-3.5. Pirjanian claims that the robot can conduct conversations that are modeled after cognitive behavior therapy sessions, which can help children identify and speak about their source of anxiety or stress, and offer mindfulness exercises. Valued at $135 million, the Pasadena-based startup has raised $80 million in total funding from entities like Sony, Toyota Ventures, Intel Capital and Amazon Alexa Fund. “We have this thing called animal breathing where Moxie will breathe like different kinds of animals just to make it fun for children,” he said.

Miko, whose screen can be used to receive video calls through a parent app, will also offer a therapeutic experience for kids. Vaswani told Forbes that he plans to introduce a new feature that would allow human therapists to conduct teletherapy on the robot’s screen. Parents would have to grant access to the therapist to access Miko.

As of now, the tiny robot isn’t suited for emotional support. In a Youtube reviewof the robot, Sasha Shtern, the CEO of Goally, a company that builds devices for children with ADHD and autism, tells Miko “I am nervous.” The robot responds “It’s okay to feel nervous about medical procedures but doctors and nurses are there to help you.” Miko spoke about medical procedures even though Shtern never mentioned anything related to that.

“It was like talking to an adult who's watching a football game and heard half my question," Shtern said in the video.

And Fawn can coach a child about how to talk about stressful situations (like getting bullied in school) with an adult without feeling embarrassed, Campbell said. She told Forbes that Fawn’s conversational AI has been fine-tuned with scripts she wrote based on child development frameworks derived from books like Brain Rules for Baby and peer reviewed research. The duo also consulted an expert in child development while developing their product.

“[Fawn’s] hallucinations are actually not a bug. They’re a feature.”

Robyn Campbell, cofounder at Fawn Friends

Moxie’s potential as a replacement for expensive therapists is part of the reason why the almost $800 robot is priced much higher than its competitors, Pirjanian said. He said the steep price is largely due to everything under the hood: a camera and sensors to detect and analyze facial expressions, a mechanical body that moves depending on the mood of the conversation and algorithms that screen out any harmful and inappropriate responses. “The technology that's in Moxie is more costly than what you find in an iPhone,” he said.

However, experts say generative AI has not yet reached a stage where it can be safely used for crucial tasks like therapy. “Providing therapy to a vulnerable population like kids or elders is very difficult to do for a human who specializes in this domain,” Druga told Forbes. “Delegating that responsibility to a system that we cannot fully understand or control is irresponsible.”

Then, there’s the privacy question. Other, less advanced versions of these toys haven’t had strong security measures to protect children’s data. For instance, Mattel’s Hello Barbie toy, an AI-powered doll that could tell jokes and sing songs, was deemed a “privacy nightmare” because hackers could easily access the recordings of children. Another doll, My Friend Cayla, raised alarms among privacy experts who found that it could be hacked via Bluetooth and could be used to send voice messages directly to children.

Newer startups have implemented guardrails to protect data privacy. Pirjanian said Moxie’s visual data is processed and stored on the device locally instead of the cloud. Transcripts of conversations are stripped of personally identifiable information and encrypted in the cloud before being used to retrain the AI model. Similarly, at Miko, children’s data is processed on the device itself. Hey Curio cofounder Sallee said that he and his team “take privacy seriously” and that its toys are compliant with the Children’s Online Privacy Protection Rule (COPPA). Fawn Friends does not record or store any data itself but is subject to OpenAI’s privacy policy, cofounder Fitzpatrick said.

Despite these precautions, some parents like Walmsley are concerned about their personal data leaking. Moxie has large round green eyes that follow a person around a room, she said, and the fact that it has a camera that can record everything happening in a room and her child’s emotional responses, makes her “a little uncomfortable.” But, she still thinks it could be a valuable tool for parents with special needs children.

“Seeing it come alive and actually help him regulate his emotions has made it worth every penny,” she said. “It’s done more than some of the therapies that we've tried for him.”

bnew · Jan 31, 2024

bnew · Feb 1, 2024

Coqui

Coqui, Freeing Speech.

coqui.ai

https://archive.is/cjvtL

https://archive.is/tSvoP

GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production - coqui-ai/TTS

github.com

coqui/XTTS-v2 · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

bnew · Feb 1, 2024

AI2 open sources text-generating AI models -- and the data used to train them | TechCrunch

AI2, the AI research institute, has released several text-generating models in open source, along with the data used to train them.

techcrunch.com

AI2 open sources text-generating AI models — and the data used to train them

Kyle Wiggers @kyle_l_wiggers / 9:30 AM EST•February 1, 2024

Futuristic digital blockchain background. Abstract connections technology and digital network. 3d illustration of the Big data and communications technology.

Image Credits: v_alex / Getty Images

The Allen Institute for AI (AI2), the nonprofit AI research institute founded by late Microsoft co-founder Paul Allen, is releasing several GenAI language models it claims are more “open” than others — and, importantly, licensed in such a way that developers can use them unfettered for training, experimentation and even commercialization

Called OLMo, an acronym for “Open Language MOdels,” the models and the data set used to train them, Dolma — one of the largest public data sets of its kind — were designed to study the high-level science behind text-generating AI, according to AI2 senior software engineer Dirk Groeneveld.

“‘Open’ is an overloaded term when it comes to [text-generating models],” Groeneveld told TechCrunch in an email interview. “We expect researchers and practitioners will seize the OLMo framework as an opportunity to analyze a model trained on one of the largest public data sets released to date, along with all the components necessary for building the models.”

Open source text-generating models are becoming a dime a dozen, with organizations from Meta to Mistral releasing highly capable models for any developer to use and fine-tune. But Groeneveld makes the case that many of these models can’t really be considered open because they were trained “behind closed doors” and on proprietary, opaque sets of data.

By contrast, the OLMo models, which were created with the help of partners including Harvard, AMD and Databricks, ship with the code that was used to produce their training data as well as training and evaluation metrics and logs.

In terms of performance, the most capable OLMo model, OLMo 7B, is a “compelling and strong” alternative to Meta’s Llama 2, Groeneveld asserts — depending on the application. On certain benchmarks, particularly those touching on reading comprehension, OLMo 7B edges out Llama 2. But in others, particularly question-answering tests, OLMo 7B is slightly behind.

The OLMo models have other limitations, like low-quality outputs in languages that aren’t English (Dolma contains mostly English-language content) and weak code-generating capabilities. But Groeneveld stressed that it’s early days.

“OLMo is not designed to be multilingual — yet,” he said. “[And while] at this stage, the primary focus of the OLMo framework [wasn’t] code generation, to give a head start to future code-based fine-turning projects, OLMo’s data mix currently contains about 15% code.”

I asked Groeneveld whether he was concerned that the OLMo models, which can be used commercially and are performant enough to run on consumer GPUs like the Nvidia 3090, might be leveraged in unintended, possibly malicious ways by bad actors. A recent study by Democracy Reporting International’s Disinfo Radar project, which aims to identify and address disinformation trends and technologies, found that two popular open text-generating models, Hugging Face’s Zephyr and Databricks’ Dolly, reliably generate toxic content — responding to malevolent prompts with “imaginative” harmful content.

Groeneveld believes that the benefits outweigh the harms in the end.

“Building this open platform will actually facilitate more research on how these models can be dangerous and what we can do to fix them,” he said. “Yes, it’s possible open models may be used inappropriately or for unintended purposes. [However, this] approach also promotes technical advancements that lead to more ethical models; is a prerequisite for verification and reproducibility, as these can only be achieved with access to the full stack; and reduces a growing concentration of power, creating more equitable access.”

In the coming months, AI2 plans to release larger and more capable OLMo models, including multimodal models (i.e. models that understand modalities beyond text), and additional data sets for training and fine-tuning. As with the initial OLMo and Dolma release, all resources will be made available for free on GitHub and the AI project hosting platform Hugging Face.

bnew · Feb 1, 2024

Google launches an AI-powered image generator | TechCrunch

Google has launched a new image generation tool, ImageFX, powered by its recently released Imagen 2 GenAI model.

techcrunch.com

Google launches an AI-powered image generator

Kyle Wiggers @kyle_l_wiggers / 10:00 AM EST•February 1, 2024

Image Credits: Sean Gallup / Getty Images

Taylor Swift deepfakes be damned, Google is releasing a new AI-powered tool, ImageFX, for image creation.

Underpinned by Imagen 2, a GenAI image model developed by Google’s DeepMind team, ImageFX offers a prompt-based UI to create and edit images. That’s no different than tools like OpenAI’s DALL-E 3, Midjourney, Meta’s Imagine with Meta AI and Microsoft Designer. But ImageFX’s unique twist is “expressive chips” — basically a list of keyword suggestions that let users experiment with “adjacent dimensions” of their creations and ideas.

“Designed for experimentation and creativity, ImageFX lets you create images with a simple text prompt, then easily modify them with a new take on prompting using expressive chips,” Google writes in a blog post.

But what of the potential for abuse — especially in light of recent events?

Image Credits: Google

Google claims that it’s taken steps to ensure that ImageFX can’t be used in ways that it wasn’t intended, for example by adding “technical safeguards” to limit “problematic outputs” like violent, offensive and sexually explicit content. ImageFX also has a prompt-level filter for “named people,” presumably public figures — although Google wasn’t especially clear on that point in its press materials.

“We invested in the safety of training data from the outset,” Google said. “Consistent with our AI principles, we also conducted extensive adversarial testing and red teaming to identify and mitigate potential harmful and problematic content.”

As an additional safety measure, Google’s tagging images produced using ImageFX with SynthID, a digital watermark that’s allegedly robust against image edits and crops.

An image sample from Imagen 2.

“SynthID watermarks are imperceptible to the human eye but detectable for identification,” Google continues in the blog post. “With added insights in ‘About this image,’ you’ll know if an image may have been generated with Google’s AI tools when you come across it in Google Search or Chrome.”

You’ll find ImageFX in AI Test Kitchen, Google’s web app for experimental AI projects.

Imagen 2 expanded

In related news today, Google said that it’s bringing Imagen 2 to more of its products and services starting this week, including to its next-gen AI search experience and family of managed AI services Vertex AI.

Imagen 2 — which also now powers text-to-image capabilities in Google Ads and Duet AI in Workspace, Google’s GenAI suite of products for productivity — has made its way into Google’s SGE (Search Generative Experience). SGE, which began surfacing image generation tools for users in Google Image Search last October, now taps Imagen 2 for generating images. Users can enter a prompt specifying what sort of image they want and SGE will return four results directly in the SGE conversational experience.

Another sample from Imagen 2.

In Vertex AI, Imagen 2 is available through an API to Google Cloud customers. Elsewhere, Imagen 2 is now invokable through Bard, Google’s AI-driven chatbot.

“With Imagen 2, Bard understands simple or complex prompts so that you can generate a range of high-quality images,” Google explains. “Just type in a description — like ‘create an image of a dog riding a surfboard’ — and Bard will generate custom, wide-ranging visuals to help bring your idea to life.”

Google still hasn’t revealed the data it used to train Imagen 2, which — while disappointing — doesn’t exactly come as a surprise. It’s an open legal question as to whether GenAI vendors like Google can train a model on publicly available — even copyrighted — data and then turn around and commercialize that model.

Image Credits: Google

Relevant lawsuits are working their way through the courts, with vendors arguing that they’re protected by fair use doctrine. But it’ll be some time before the dust settles.

In the meantime, Google’s playing it safe by keeping quiet on the matter.

bnew · Feb 1, 2024

FCC moves to outlaw AI-generated robocalls | TechCrunch

No one likes robocalls to begin with, but using AI-generated voices of people like President Biden makes them even worse. As such the FCC is proposing

techcrunch.com

FCC moves to outlaw AI-generated robocalls

Devin Coldewey @techcrunch / 3:23 PM EST•January 31, 2024

An illustration of a humanoid robot emerging from a smartphone screen

Image Credits: Golden Sikorka / Getty Images

No one likes robocalls to begin with, but using AI-generated voices of people like President Biden makes them even worse. As such the FCC is proposing that using voice cloning tech in robocalls be ruled fundamentally illegal, making it easier to charge the operators of these frauds.

You may ask why it’s necessary if robocalls are illegal to begin with. In fact some automated calls are necessary and even desirable, and it’s only when a call operation is found to be breaking the law in some way that it becomes the business of the authorities.

For example, regarding the recent fake Biden calls in New Hampshire telling people not to vote, the attorney general there can (and did) say with confidence that the messages “appear to be an unlawful attempt to disrupt the New Hampshire Presidential Primary Election and to suppress New Hampshire voters.”

Under the law there, voter suppression is illegal and so, when they track down the perpetrators (and I’m emailing them constantly to find out if they have, by the way) that will be what they are charged with, likely among other things. But it remains that a crime must be committed, or reasonably suspected to have been committed, for the authorities to step in.

If employing voice cloning tech in automated calls, like what was obviously used on Biden, is itself illegal, that makes charging robocallers that much easier.

“That’s why the FCC is taking steps to recognize this emerging technology as illegal under existing law, giving our partners at State Attorneys General offices across the country new tools they can use to crack down on these scams and protect consumers,” said FCC Chairwoman Jessica Rosenworcel in a news release. They previously announced that they were looking into this back when the problem was relatively fresh.

The FCC already uses the Telephone Consumer Protection Act as the basis for charging robocallers and other telephone scammers. The TCPA already prohibits “artificial” voices, but it is not clear that cloned voices fall under that category. It’s arguable, for instance, that a company could use the generated voice of its CEO for legitimate business purposes.

But the fact is that legal applications of the tech are fewer in number and less immediately important than the illegal applications. Therefore the FCC proposes to issue a Declaratory Ruling that AI-powered voice cloning causes a call to fall under the “artificial” heading.

The law here is being rapidly iterated as telephone, messaging and generative voice tech all evolve. So don’t be surprised if it isn’t entirely clear what is and isn’t illegal, or why despite being obviously illegal, some calls or scams seem to operate with impunity. It’s a work in progress.

Update: FCC spokesman Will Wiquist told me that procedurally, this proposal will be propagated internally and voted on at Commissioners’ discretion. It will only be public when and if it is adopted.

bnew · Feb 1, 2024

Uncouth Savage · Feb 1, 2024

bnew said:
Your Child’s Next Playmate Could Be An AI Toy Powered By ChatGPT

A host of startups are building robots and stuffed toys that can have full-fledged conversations with children, thanks to generative AI.

www.forbes.com

Your Child’s Next Playmate Could Be An AI Toy Powered By ChatGPT

ILLUSTRATION BY PHILIP SMITH FOR FORBES

Miko, which can also play games like hide and seek, is part of a growing group of pricey GPT-powered robots rolling into the toy market. Some AI toys are touted as a screen-free form of entertainment that can engage children in conversations and playful learning, like Grok, a $99 AI plushie that can answer general questions (not to be confused with Elon Musk’s ChatGPT competitor Grok, though the toy Grok is voiced by his former girlfriend Grimes). Others claim to offer additional features beyond storytelling and learning activities. There’s Fawn, a $199 cuddly baby deer intended to provide emotional support, and Moxie, a $799 turquoise-colored robot that can recite affirmations and conduct mindfulness exercises. These robotic pals are designed to not only help children grow academically and improve communication skills but also teach them how to deal with their emotions during times of distress.

Seems intentional. Musk probably using his girlfriend to push this to kids.
Maybe a testing ground for other things he has planned.
Maybe thinking too much into it.
I do not trust the man.

bnew · Feb 1, 2024

bnew · Feb 1, 2024

Proof of Achievement of the First Artificial General Intelligence (AGI)

Creators

Olsher, Daniel1

Show affiliations

Description

We have created the first-ever Artificial General Intelligence (AGI) and first superintelligence.
Extensively proven for the US Government and in real-world application over many years, this paper provides the first detailed explanations of how and why the system works and conclusively proves that it does in fact deliver true AGI.
After first deriving the requirements for real-world AGI from first principles, the paper sets forth the techniques required for AGI achievement and presents the first-ever definitive test for AGI - the Olsher Test.
It engages key issues such as provable safety and responsibility and overcomes the work of Fjelland, Dreyfus, and other scholars who have previously argued that AGI can never be realized.
It then further updates Newell and Simon's Physical Symbol System hypothesis and learning theory for the modern era.
Finally, the paper explains a key corollary of the present work - that traditional approaches have been proven incapable of ever achieving AGI.

Proof of Achievement of the First Artificial General Intelligence (AGI)

We have created the first-ever Artificial General Intelligence (AGI) and first superintelligence. Extensively proven for the US Government and in real-world application over many years, this paper provides the first detailed explanations of how and why the system works and conclusively proves...

zenodo.org

https://zenodo.org/records/10528228/files/Proof%20of%20Achievement%20of%20First%20Artificial%20General%20Intelligence%20(AGI)%20-%20Olsher.pdf

The A.I Megathread (LLM , GPT , Development)

Veteran

Veteran

Veteran

Mistral CEO confirms ‘leak’ of new open source AI model nearing GPT-4 performance​

Posted on *****​

Mistral quantized?​

Confirmation from the top​

A pivotal moment in open source AI and beyond?​

Veteran

Groq LPU™ Inference Engine Crushes First Public LLM Benchmark​

Groq Delivers up to 18x Faster LLM Inference Performance on Anyscale’s LLMPerf Leaderboard Compared to Top Cloud-based Providers​

Interested in Alpha API Early Access?​

Veteran

Your Child’s Next Playmate Could Be An AI Toy Powered By ChatGPT​

A host of startups are building robots and stuffed toys that can have full-fledged conversations with children, thanks to generative AI.​

By Rashi Shrivastava, Forbes Staff​

Veteran

THE CASE FOR THERAPY​

Veteran

Veteran

Veteran

AI2 open sources text-generating AI models — and the data used to train them​

Veteran

Google launches an AI-powered image generator​

Imagen 2 expanded​

Veteran

FCC moves to outlaw AI-generated robocalls​

Veteran

All Star

Your Child’s Next Playmate Could Be An AI Toy Powered By ChatGPT​

Veteran

Veteran

Proof of Achievement of the First Artificial General Intelligence (AGI)​

Creators​

Description​

Mistral CEO confirms ‘leak’ of new open source AI model nearing GPT-4 performance

Posted on *****

Mistral quantized?

Confirmation from the top

A pivotal moment in open source AI and beyond?

Groq LPU™ Inference Engine Crushes First Public LLM Benchmark

Groq Delivers up to 18x Faster LLM Inference Performance on Anyscale’s LLMPerf Leaderboard Compared to Top Cloud-based Providers

Interested in Alpha API Early Access?

Your Child’s Next Playmate Could Be An AI Toy Powered By ChatGPT

A host of startups are building robots and stuffed toys that can have full-fledged conversations with children, thanks to generative AI.

By Rashi Shrivastava, Forbes Staff

THE CASE FOR THERAPY

AI2 open sources text-generating AI models — and the data used to train them

Google launches an AI-powered image generator

Imagen 2 expanded

FCC moves to outlaw AI-generated robocalls

Your Child’s Next Playmate Could Be An AI Toy Powered By ChatGPT

Proof of Achievement of the First Artificial General Intelligence (AGI)

Creators

Description