bnew

Veteran
Joined
Nov 1, 2015
Messages
55,551
Reputation
8,224
Daps
156,964

Artists celebrate as copyright infringement case against AI image generators moves forward​


Carl Franzen (@carlfranzen)

August 12, 2024 6:45 PM

Cartoon of artists celebrating outside a courthouse


Credit: VentureBeat made with ChatGPT



Visual artists who joined together in a class action lawsuit against some of the most popular AI image and video generation companies are celebrating today after a judge ruled their copyright infringement case against the AI companies can move forward toward discovery.

Disclosure: VentureBeat regularly uses AI art generators to create article artwork, including some named in this case.

The case, recorded under the number 3:23-cv-00201-WHO, was originally filed back in January of 2023. It has since been amended several times and parts of it struck down, including today.


Which artists are involved?​


Artists Sarah Andersen, Kelly McKernan, Karla Ortiz, Hawke Southworth, Grzegorz Rutkowski, Gregory Manchess, Gerald Brom, Jingna Zhang, Julia Kaye, and Adam Ellis have, on behalf of all artists, accused Midjourney, Runway, Stability AI, and DeviantArt of copying their work by offering AI image generator products based on the open-source Stable Diffusion AI model. Runway and Stability AI collaborated on that model, which the artists allege was trained on their copyrighted works in violation of the law.


What the judge ruled today​


Judge William H. Orrick of the U.S. District Court for the Northern District of California, which oversees San Francisco and the heart of the generative AI boom, has not yet ruled on the final outcome of the case. But he wrote in his decision issued today that “the allegations of induced infringement are sufficient” for the case to move forward toward a discovery phase. Discovery could allow the artists’ lawyers to examine documents from within the AI image generator companies, revealing more details about their training datasets, mechanisms, and inner workings.

“This is a case where plaintiffs allege that Stable Diffusion is built to a significant extent on copyrighted works and that the way the product operates necessarily invokes copies or protected elements of those works,” Orrick’s decision states. “Whether true and whether the result of a glitch (as Stability contends) or by design (plaintiffs’ contention) will be tested at a later date. The allegations of induced infringement are sufficient.”


Artists react with applause​

“The judge is allowing our copyright claims through & now we get to find out allll the things these companies don’t want us to know in Discovery,” wrote one of the artists filing the suit, Kelly McKernan, on her account on the social network X. “This is a HUGE win for us. I’m SO proud of our incredible team of lawyers and fellow plaintiffs!”

Very exciting news on the AI lawsuit! The judge is allowing our copyright claims through & now we get to find out allll the things these companies don’t want us to know in Discovery. This is a HUGE win for us. I’m SO proud of our incredible team of lawyers and fellow plaintiffs! pic.twitter.com/jD6BjGWMoQ

— Kelly McKernan (@Kelly_McKernan) August 12, 2024

“Not only do we proceed on our copyright claims, this order also means companies who utilize SD [Stable Diffusion] models for and/or LAION like datasets could now be liable for copyright infringement violations, amongst other violations,” wrote another plaintiff artist in the case, Karla Ortiz, on her X account.

1/3 HUGE update on our case!

We won BIG as the judge allowed ALL of our claims on copyright infringement to proceed and we historically move on The Lanham Act (trade dress) claims! We can now proceed onto discovery!

The implications on this order is huge on so many fronts! pic.twitter.com/ZcoeFtPtQb

— Karla Ortiz (@kortizart) August 12, 2024


Technical and legal background​


Stable Diffusion was allegedly trained on LAION-5B, a dataset of more than 5 billion images scraped from across the web by researchers and posted online back in 2022.

However, as the case itself notes, that database only contained URLs or links to the images and text descriptions, meaning that the AI companies would have had to separately go and scrape or screenshot copies of the images to train Stable Diffusion or other derivative AI model products.
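The distinction between a link-only dataset and the images themselves can be sketched in a few lines. This is only an illustration: the `LinkRecord` class, its field names, and the example URLs are hypothetical and are not LAION's actual schema. It shows that a record of this kind carries a pointer and a caption but no image data, so the images would still have to be fetched in a separate step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkRecord:
    """One row of a link-only dataset: a pointer and a caption, no pixels."""
    url: str      # hypothetical field name
    caption: str  # hypothetical field name


def urls_to_download(records):
    """Return the distinct image URLs, in first-seen order, that would
    still have to be fetched before any training could happen."""
    seen = set()
    ordered = []
    for record in records:
        if record.url not in seen:
            seen.add(record.url)
            ordered.append(record.url)
    return ordered


# Made-up example rows; two of them point at the same image.
records = [
    LinkRecord("https://example.com/a.jpg", "a castle at dusk"),
    LinkRecord("https://example.com/b.jpg", "a portrait in oil"),
    LinkRecord("https://example.com/a.jpg", "castle, evening light"),
]
pending = urls_to_download(records)
```

Here `pending` holds the two distinct URLs. The download step is deliberately left out, since how any company actually retrieved images is exactly what discovery is meant to establish.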


A silver lining for the AI companies?​


Orrick did hand the AI image generator companies a victory by denying and tossing out with prejudice claims filed against them by the artists under the Digital Millennium Copyright Act of 1998, which prohibits companies from offering products designed to circumvent controls on copyrighted materials offered online and through software (also known as “digital rights management” or DRM).

Midjourney tried to reference older court cases “addressing jewelry, wooden cutouts, and keychains” which found that resemblances between different jewelry products and those of prior artists could not constitute copyright infringement because they were “functional” elements, that is, necessary in order to display certain features or elements of real life or that the artist was trying to produce, regardless of their similarity to prior works.

The artists claimed that “Stable Diffusion models use ‘CLIP-guided diffusion’ that relies on prompts including artists’ names to generate an image.”

CLIP, an acronym for “Contrastive Language-Image Pre-training,” is a neural network and AI training technique developed by OpenAI back in 2021, more than a year before ChatGPT was unleashed on the world, which can identify objects in images and label them with natural language text captions — greatly aiding in compiling a dataset for training a new AI model such as Stable Diffusion.
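As a rough illustration of how a CLIP-style model matches an image to text, the sketch below scores one image embedding against several caption embeddings using cosine similarity and a softmax. The vectors are made-up toy values, not real CLIP outputs, and `rank_captions` is a hypothetical helper name; a real model would produce the embeddings with trained image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_vec, caption_vecs, temperature=0.07):
    """Turn image-to-caption similarities into probabilities via softmax.

    The temperature value is illustrative; it sharpens the distribution.
    """
    sims = {text: cosine(image_vec, vec) for text, vec in caption_vecs.items()}
    top = max(sims.values())  # subtract the max for numerical stability
    exps = {text: math.exp((s - top) / temperature) for text, s in sims.items()}
    total = sum(exps.values())
    return {text: e / total for text, e in exps.items()}


# Toy embeddings (assumed values, chosen so the first caption is closest).
IMAGE_VEC = [0.9, 0.1, 0.2]
CAPTION_VECS = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.3],
    "a city skyline": [0.2, 0.1, 0.9],
}

probs = rank_captions(IMAGE_VEC, CAPTION_VECS)
best = max(probs, key=probs.get)
```

With these toy vectors, `best` is the dog caption because its embedding points in nearly the same direction as the image embedding. The plaintiffs’ argument concerns what happens when the text side of such a pairing is an artist’s name.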

“The CLIP model, plaintiffs assert, works as a trade dress database that can recall and recreate the elements of each artist’s trade dress,” writes Orrick in a section of the ruling about Midjourney, later stating: “the combination of identified elements and images, when considered with plaintiffs’ allegations regarding how the CLIP model works as a trade dress database, and Midjourney’s use of plaintiffs’ names in its Midjourney Name List and showcase, provide sufficient description and plausibility for plaintiffs’ trade dress claim.”

In other words: the fact that Midjourney used artists’ names, as well as labeled elements of their works, to train its model may be enough to support the plaintiffs’ trade dress claim.

But, as I’ve argued before (from my perspective as a journalist, not a copyright lawyer or expert on the subject), it’s already possible and legally permissible for me to commission a human artist to create a new work in the style of a copyrighted artist’s work, which would seem to undercut the plaintiffs’ claims.

We’ll see how well the AI art generators can defend their training practices and model outputs as the case moves forward. Read the full ruling (file: gov.uscourts.cand_.407208.223.0_2).
 



LongWriter AI breaks 10,000-word barrier, challenging human authors​


Michael Nuñez (@MichaelFNunez)

August 15, 2024 6:00 AM

Credit: VentureBeat made with Midjourney





Researchers at Tsinghua University in Beijing have created a new artificial intelligence system that can produce coherent texts of more than 10,000 words, a significant advance that could transform how long-form writing is approached across various fields.

The system, described in a paper called “LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs,” tackles a persistent challenge in AI technology: the ability to generate lengthy, high-quality written content. This development could have far-reaching implications for tasks ranging from academic writing to fiction, potentially altering the landscape of content creation in the digital age.

The research team, led by Yushi Bai, discovered that an AI model’s output length directly correlates with the length of texts it encounters during training. “We find that the model’s effective generation length is inherently bounded by the sample it has seen during supervised fine-tuning,” the researchers explain. This insight led them to create “LongWriter-6k,” a dataset of 6,000 writing samples ranging from 2,000 to 32,000 words.

By feeding this data-rich diet to their AI model during training, the team scaled up the maximum output length from around 2,000 words to over 10,000 words. Their 9-billion-parameter model outperformed even larger proprietary models in long-form text generation tasks.
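The researchers’ observation that generation length is bounded by what the model saw during fine-tuning suggests a simple check anyone could run on a fine-tuning set: how long is the longest output, and what share of samples are long at all? The sketch below is a hypothetical illustration with toy data; the `output` field name and the 2,000-word threshold are assumptions made for the example, not details of the LongWriter-6k format.

```python
def output_length_profile(samples, threshold_words=2000):
    """Return (longest output in words, share of outputs at or above threshold)."""
    lengths = [len(sample["output"].split()) for sample in samples]
    longest = max(lengths)
    share_long = sum(n >= threshold_words for n in lengths) / len(lengths)
    return longest, share_long


def fake_output(n_words):
    """Build a placeholder text with exactly n_words words."""
    return " ".join(["word"] * n_words)


# Toy fine-tuning set: only one of the four outputs exceeds 2,000 words.
toy_samples = [
    {"output": fake_output(300)},
    {"output": fake_output(800)},
    {"output": fake_output(1500)},
    {"output": fake_output(2500)},
]

longest, share_long = output_length_profile(toy_samples)
```

On this toy set the longest output is 2,500 words and a quarter of the samples cross the threshold. If the paper’s finding holds, a set shaped like this would be expected to cap a model’s output length well below 10,000 words.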

LongWriter-glm4-9b from @thukeg is capable of generating 10,000+ words at once!?

Paper identifies a problem with current long context LLMs — they can process inputs up to 100,000 tokens, yet struggle to generate outputs exceeding lengths of 2,000 words.

Paper proposes that an… pic.twitter.com/2jfKyIpShK

— Gradio (@Gradio) August 14, 2024


A double-edged pen: Opportunities and challenges​


This breakthrough could transform industries reliant on long-form content. Publishers might use AI to generate first drafts of books or reports. Marketing agencies could create in-depth white papers or case studies more efficiently. Education technology companies might develop AI tutors capable of producing comprehensive study materials.

However, the technology also raises significant challenges. The ability to generate vast amounts of human-like text could exacerbate issues of misinformation and spam. Content creators and journalists may face increased competition from AI-generated articles. Academic institutions will need to refine plagiarism detection tools to identify AI-written papers.

Comparative performance of leading AI language models, including proprietary and open-source options, alongside Tsinghua University’s new LongWriter models. The table shows LongWriter-9B-DPO outperforming other models in overall scores and excelling in generating longer texts of 4,000 to 20,000 words. (credit: github.com)

The ethical implications are equally profound. As AI-generated text becomes indistinguishable from human-written content, questions of authorship, creativity, and intellectual property become more complex. The development of long-form AI writing capabilities may also influence human language skills, potentially enhancing creativity or leading to atrophy of writing abilities.


Rewriting the future: Implications for society and industry​


The researchers have open-sourced their code and models on GitHub, enabling other developers to build on their work. They’ve also released a demonstration video showing their model generating a coherent 10,000-word travel guide to China from a simple prompt, highlighting the technology’s potential for producing detailed, structured content.


A side-by-side comparison shows the output of two AI language models. On the left, LongWriter generates a 7,872-word story, while on the right, the standard GLM-4-9B-Chat model produces 1,896 words. (credit: github.com)

As AI continues to advance, the line between human and machine-generated text blurs further. This breakthrough in long-form text generation represents not just a technical achievement, but a turning point that may reshape our relationship with written communication.

The challenge now lies in harnessing this technology responsibly. Policymakers, ethicists, and technologists must collaborate to develop frameworks for the ethical use of AI-generated content. Education systems may need to evolve, emphasizing skills that complement rather than compete with AI capabilities.

As we enter this new era of AI-assisted writing, the written word, long considered a uniquely human domain, ventures into uncharted territory. The implications of this shift will likely resonate across society, influencing how we create, consume, and value written content in the years to come.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
55,551
Reputation
8,224
Daps
156,964

Pindrop claims to detect AI audio deepfakes with 99% accuracy​


Shubham Sharma (@mr_bumss)

August 15, 2024 5:39 AM





Today, Pindrop, a company offering solutions for voice security, identity verification and fraud detection, announced the release of Pulse Inspect, a web-based tool for detecting AI-generated speech in any digital audio or video file with what it claims is 99% accuracy.

The feature is available in preview as part of Pindrop’s Pulse suite of products, and offers detection regardless of the tool or AI model the audio was generated from.

This is a notable and ambitious departure from general industry practice, in which AI vendors release classifiers that detect only the synthetic content generated by their own tools.

Pindrop is offering Pulse Inspect on a yearly subscription to organizations looking to combat the risk of audio deepfakes at scale. However, CEO Vijay Balasubramaniyan tells VentureBeat that they may launch more affordable pricing tiers – with a limited number of media checks – for consumers as well.

“Our pricing is designed for organizations with a recurring need for deepfake detection. However, based on future market demand, we may consider launching pricing options better suited for casual users in the future,” he said.


Pindrop addressing the rise of audio deepfakes​


While deepfakes have been around for a long time, the rise of text-based generative AI systems has made them more prevalent on the internet. Popular gen AI tools, like those from Microsoft and ElevenLabs, have been exploited to mimic the audio and video of celebrities, businesspeople and politicians in order to spread misinformation and scams, damaging their public image.

According to Pindrop’s internal report, over 12 million American adults know someone who has personally had deepfakes created without their consent. These duplicates could be anything from images to video to audio, but they all have one thing in common: they thrive on virality, spreading like wildfire on social media.

To address this evolving problem, Pindrop announced the Pulse suite of products earlier this year. The first offering in the portfolio helped enterprises detect deepfake calls coming to their call centers. Now, with Pulse Inspect, the company is going beyond calls to help organizations check any audio/video file for AI-generated synthetic artifacts.


Upload questionable audio files for analysis​


At the core, the offering comes as a web application, where an enterprise user can upload the questionable file for analysis.

Previously, checking existing media files for synthetic artifacts required time-consuming forensic examination. In this case, the tool processes the audio in a matter of seconds and returns a “deepfake score,” along with the sections of the file it flags as containing AI-generated speech.

This quick response can then enable organizations to take proactive actions to prevent the spread of misinformation and maintain their brand credibility.


Training and analysis process​


Pindrop says it has trained a proprietary deepfake detection model on more than 350 deepfake generation tools, 20 million unique utterances and over 40 languages, resulting in a rate of detecting deepfake audio at 99% based on the company’s internal analysis of a dataset of about 200k samples.

The model checks media files for synthetic artifacts every four seconds, ensuring it classifies deepfakes accurately, especially in the cases of mixed media containing both AI-generated and genuine elements.
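The windowed approach described above can be sketched as follows. This is not Pindrop’s implementation: `score_window` stands in for a detection model that is not public, and the 0.5 threshold and the toy scorer are assumptions made purely for illustration. The sketch shows only why scoring fixed four-second windows lets a tool point at the synthetic parts of a mixed file.

```python
def window_bounds(duration_s, window_s=4.0):
    """Split a clip of duration_s seconds into consecutive fixed-size windows."""
    bounds = []
    start = 0.0
    while start < duration_s:
        bounds.append((start, min(start + window_s, duration_s)))
        start += window_s
    return bounds


def flag_segments(duration_s, score_window, threshold=0.5, window_s=4.0):
    """Score each window and keep those at or above the threshold.

    score_window is a hypothetical callable (start, end) -> score in [0, 1].
    """
    flagged = []
    for start, end in window_bounds(duration_s, window_s):
        score = score_window(start, end)
        if score >= threshold:
            flagged.append((start, end, score))
    return flagged


def toy_scorer(start, end):
    """Stand-in scorer: pretends only the 4-8 second span is synthetic."""
    return 0.9 if start == 4.0 else 0.1


windows = window_bounds(10.0)
flagged = flag_segments(10.0, toy_scorer)
```

For a ten-second clip this yields three windows, the last one shorter than four seconds, and flags only the middle one. A short final window like that is one place where a real detector could plausibly struggle, which is consistent with the limitation on very short speech that the company acknowledges.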

“Pindrop’s technology leverages recent breakthroughs in deep neural networks (DNN) and sophisticated spectro-temporal analysis to identify synthetic artifacts using multiple approaches,” Balasubramaniyan explained.


No vendor-specific detection limits​


Since Pindrop has trained its detection model on more than 350 generation tools, Pulse Inspect has no tool-specific restriction for detection.

“There are over 350 deepfake generator systems, with many prolific audio deepfakes on social media likely coming from open-source tools rather than commercial ones like ElevenLabs. Customers need comprehensive tools like Pindrop’s, which are not limited to detecting deepfakes from a single system but can identify synthetic audio across all generation systems,” Balasubramaniyan added.

However, it is important to note that there may be cases where the tool might fail to identify deepfakes, especially when the file has less than two seconds of net speech or a very high level of background noise. The CEO said the company is working continuously to address these gaps and further improve detection accuracy.

Currently, Pindrop is targeting Pulse Inspect at organizations such as media companies, non-profits, government agencies, celebrity management firms, legal firms and social media networks. Balasubramaniyan did not share the exact number of customers using the tool but he did say that “a number of partners” are using the product by paying for a volume-based annual subscription. This includes TrueMedia.org, a free-use product that allows critical election audiences to detect deepfakes.

In addition to the web app supporting manual uploads, Pulse Inspect can also be integrated into custom forensic workflows via an API. This can power bulk use cases such as that of a social media network flagging and removing harmful AI-generated videos.

Moving ahead, Balasubramaniyan said, the company plans to bolster the Pulse suite by improving the explainability aspect of the tools – with a feature to trace back to the source of deepfake generations – and supporting more modalities.
 



Google’s AI surprise: Gemini Live speaks like a human, taking on ChatGPT Advanced Voice Mode​


Carl Franzen (@carlfranzen)

August 13, 2024 11:17 AM

Hand holding smartphone displaying video call with robot as other robot stands in background


Credit: VentureBeat made with ChatGPT



Google sometimes feels like it’s playing catch-up in the generative AI race to rivals such as Meta, OpenAI, Anthropic and Mistral, but not anymore.

Today, the company leapfrogged most others by announcing Gemini Live, a new voice mode for its AI model Gemini through the Gemini mobile app, which allows users to speak to the model in plain, conversational language and even interrupt it and have it respond back with the AI’s own humanlike voice and cadence. Or as Google put it in a post on X: “You can now have a free-flowing conversation, and even interrupt or change topics just like you might on a regular phone call.”

We’re introducing Gemini Live, a more natural way to interact with Gemini. You can now have a free-flowing conversation, and even interrupt or change topics just like you might on a regular phone call. Available to Gemini Advanced subscribers. #MadeByGoogle pic.twitter.com/eNjlNKubsv

— Google (@Google) August 13, 2024

If that sounds familiar, it’s because OpenAI in May demoed its own “Advanced Voice Mode” for ChatGPT which it openly compared to the talking AI operating system from the movie Her, only to delay the feature and begin to roll it out only selectively to alpha participants late last month.

Gemini Live is now available in English on the Google Gemini app for Android devices through a Gemini Advanced subscription ($19.99 USD per month), with an iOS version and support for more languages to follow in the coming weeks.

In other words: even though OpenAI showed off a similar feature first, Google is set to make it available to a much wider potential audience (more than 3 billion active users on Android and 2.2 billion iOS devices) much sooner than ChatGPT’s Advanced Voice Mode.



Yet OpenAI may have delayed ChatGPT Advanced Voice Mode in part because of its own internal “red-teaming,” or controlled adversarial security testing. That testing showed the voice mode sometimes engaged in odd, disconcerting, and even potentially dangerous behavior, such as mimicking the user’s own voice without consent, which could be used for fraud or other malicious purposes.

How is Google addressing the potential harms caused by this type of tech? We don’t really know yet, but VentureBeat reached out to the company to ask and will update when we hear back.


What is Gemini Live good for?​


Google pitches Gemini Live as offering free-flowing, natural conversation that’s good for brainstorming ideas, preparing for important conversations, or simply chatting casually about “various topics.” Gemini Live is designed to respond and adapt in real-time.

Additionally, this feature can operate hands-free, allowing users to continue their interactions even when their device is locked or running other apps in the background.

Google further announced that the Gemini AI model is now fully integrated into the Android user experience, providing more context-aware assistance tailored to the device.

Users can access Gemini by long-pressing the power button or saying, “Hey Google.” This integration allows Gemini to interact with the content on the screen, such as providing details about a YouTube video or generating a list of restaurants from a travel vlog to add directly into Google Maps.

In a blog post, Sissie Hsiao, Vice President and General Manager of Gemini Experiences and Google Assistant, emphasized that the evolution of AI has led to a reimagining of what it means for a personal assistant to be truly helpful. With these new updates, Gemini is set to offer a more intuitive and conversational experience, making it a reliable sidekick for complex tasks.
 



Move over, Devin: Cosine’s Genie takes the AI coding crown​


Carl Franzen (@carlfranzen)

August 12, 2024 9:53 AM

Oil lamp abstract artwork emitting code cloud


Credit: VentureBeat made with Midjourney V6



It wasn’t long ago that the startup Cognition was blowing minds with its product Devin, an AI-based software engineer powered by OpenAI’s GPT-4 foundation large language model (LLM) on the backend that could autonomously write and edit code when given instructions in natural language text.

But Devin emerged in March 2024 — five months ago — an eternity in the fast-moving generative AI space.

Now, another “C”-named startup, Cosine, which was founded through the esteemed Y Combinator startup accelerator in San Francisco, has announced its own new autonomous AI-powered engineer Genie, which it says handily outperforms Devin, scoring 30% on third-party benchmark test SWE-Bench compared to Devin’s 13.8%, and even surpassing the 19% scored by Amazon’s Q and Factory’s Code Droid.

Screenshot from Cosine’s website showing Genie’s performance on SWE-Bench compared to other AI coding engineer models. Credit: Cosine

“This model is so much more than a benchmark score: it was trained from the start to think and behave like a human SWE [software engineer],” wrote Cosine’s co-founder and CEO Alistair Pullen in a post on his account on the social network X.

I'm excited to share that we've built the world's most capable AI software engineer, achieving 30.08% on SWE-Bench – ahead of Amazon and Cognition. This model is so much more than a benchmark score: it was trained from the start to think and behave like a human SWE. pic.twitter.com/OyvqKLxcGV

— Alistair (@AlistairPullen) August 12, 2024



What is Genie and what can it do?​


Genie is an advanced AI software engineering model designed to autonomously tackle a wide range of coding tasks, from bug fixing to feature building, code refactoring and validation through comprehensive testing, as instructed by human engineers or managers.

It operates either fully autonomously or in collaboration with users and aims to provide the experience of working alongside a skilled colleague.

“We’ve been chasing the dream of building something that can genuinely automatically perform end-to-end programming tasks with no intervention and a high degree of reliability – an artificial colleague. Genie is the first step in doing exactly that,” wrote Pullen in the Cosine blog post announcing Genie’s performance and limited, invitation-only availability.



The AI can write software in a multitude of languages — there are 15 listed in its technical report as being sources of data, including:


  1. JavaScript
  2. Python
  3. TypeScript
  4. TSX
  5. Java
  6. C#
  7. C++
  8. C
  9. Rust
  10. Scala
  11. Kotlin
  12. Swift
  13. Golang
  14. PHP
  15. Ruby

Cosine claims Genie can emulate the cognitive processes of human engineers.

“My thesis on this is simple: make it watch how a human engineer does their job, and mimic that process,” Pullen explained in the blog post.

The code Genie generates is stored in a user’s GitHub repo, meaning Cosine does not retain a copy, nor any of the attendant security risks.

Furthermore, Cosine’s software platform is already integrated with Slack and system notifications, which it can use to alert users of its state, ask questions, or flag issues as a good human colleague would.

“Genie also can ask users clarifying questions as well as respond to reviews/comments on the PRs [pull requests] it generates,” Pullen wrote to VentureBeat. “We’re trying to get Genie to behave like a colleague, so getting the model to use the channels a colleague would makes the most sense.”


Powered by a long context OpenAI model​


Unlike many AI coding products that rely on foundation models supplemented with a few tools, Genie was developed through a proprietary process that involves training and fine-tuning a long-output AI model from OpenAI.

“In terms of the model we’re using, it’s a (currently) non-general availability GPT-4o variant that OpenAI have allowed us to train as part of the experimental access program,” Pullen wrote to VentureBeat via email. “The model has performed well and we’ve shared our learnings with the OpenAI finetuning team and engineering leadership as a result. This was a real turning point for us as it convinced them to invest resources and attention in our novel techniques.”

While Cosine doesn’t specify the particular model, OpenAI just recently announced the limited availability of a new GPT-4o Long Output Context model which can spit out up to 64,000 tokens of output instead of GPT-4o’s initial 4,000 — a 16-fold increase.


The training data was key​

“For its most recent training run Genie was trained on billions of tokens of data, the mix of which was chosen to make the model as competent as possible on the languages our users care about the most at the current time,” wrote Pullen in Cosine’s technical report on the agent.

With its extensive context window and a continuous loop of improvement, Genie iterates and refines its solutions until they meet the desired outcome.

Cosine says in its blog post that it spent nearly a year curating a dataset with a wide range of software development activities from real engineers.

“In practice, however, getting such and then effectively utilising that data is extremely difficult, because essentially it doesn’t exist,” Pullen elaborated in his blog post, adding: “Our data pipeline uses a combination of artefacts, static analysis, self-play, step-by-step verification, and fine-tuned AI models trained on a large amount of labelled data to forensically derive the detailed process that must have happened to have arrived at the final output. The impact of the data labelling can’t be understated, getting hold of very high-quality data from competent software engineers is difficult, but the results were worth it as it gave so much insight as to how developers implicitly think about approaching problems.”

In an email to VentureBeat, Pullen clarified that: “We started with artefacts of SWEs doing their jobs like PRs, commits, issues from OSS repos (MIT licensed) and then ran that data through our pipeline to forensically derive the reasoning, to reconstruct how the humans came to the conclusions they did. This proprietary dataset is what we trained the v1 on, and then we used self-play and self-improvement to get us the rest of the way.”

This dataset not only represents perfect information lineage and incremental knowledge discovery but also captures the step-by-step decision-making process of human engineers.

“By actually training our models with this dataset rather than simply prompting base models which is what everyone else is doing, we have seen that we’re no longer just generating random code until some works, it’s tackling problems like a human,” Pullen asserted.


Pricing​


In a follow-up email, Pullen described how Genie’s pricing structure will work.

He said it will initially be broken into two tiers:

“1. An accessible option priced competitively with existing AI tools, around the $20 mark. This tier will have some feature and usage limitations but will showcase Genie’s capabilities for individuals and small teams.

2. An enterprise-level offering with expanded features, virtually unlimited usage and the ability to create a perfect AI colleague who’s an expert in every line code ever written internally. This tier will be priced more substantially, reflecting its value as a full AI engineering colleague.”


Implications and Future Developments​


Genie’s launch has far-reaching implications for software development teams, particularly those looking to enhance productivity and reduce the time spent on routine tasks. With its ability to autonomously handle complex programming challenges, Genie could potentially transform the way engineering resources are allocated, allowing teams to focus on more strategic initiatives.

“The idea of engineering resource no longer being a constraint is a huge driver for me, particularly since starting a company,” wrote Pullen. “The value of an AI colleague that can jump into an unknown codebase and solve unseen problems in timeframes orders of magnitude quicker than a human is self-evident and has huge implications for the world.”

Cosine has ambitious plans for Genie’s future development. The company intends to expand its model portfolio to include smaller models for simpler tasks and larger models capable of handling more complex challenges. Additionally, Cosine plans to extend its work into open-source communities by context-extending one of the leading open-source models and pre-training on a vast dataset.


Availability and Next Steps​


While Genie is already being rolled out to select users, broader access is still being managed.

Interested parties can apply for early access to try Genie on their projects by filling out a web form on the Cosine website.

Cosine remains committed to continuous improvement, with plans to ship regular updates to Genie’s capabilities based on customer feedback.

“SWE-Bench recently changed their submission requirements to include the full working process of AI models, which poses a challenge for us as it would require revealing proprietary methodologies,” noted Pullen. “For now, we’ve decided to keep these internal processes confidential, but we’ve made Genie’s final outputs publicly available for independent verification on GitHub.”


More on Cosine​


Cosine is a human reasoning lab focused on researching and codifying how humans perform tasks, intending to teach AI to mimic, excel at, and expand on these tasks.

Founded in 2022 by Pullen, Sam Stenner, and Yang Li, the company’s mission is to push the boundaries of AI by applying human reasoning to solve complex problems, starting with software engineering.

Cosine has already raised $2.5 million in seed funding from Uphonest and SOMA Capital, with participation from Lakestar, Focal and others.

With a small but highly skilled team, Cosine has already made significant strides in the AI field, and Genie is just the beginning.

“We truly believe that we’re able to codify human reasoning for any job and industry,” Pullen stated in the announcement blog post. “Software engineering is just the most intuitive starting point, and we can’t wait to show you everything else we’re working on.”
 


Falcon Mamba 7B’s powerful new AI architecture offers alternative to transformer models​


Shubham Sharma@mr_bumss

August 12, 2024 1:02 PM



Image Credit: VentureBeat, via Ideogram



Today, Abu Dhabi-backed Technology Innovation Institute (TII), a research organization working on new-age technologies across domains like artificial intelligence, quantum computing and autonomous robotics, released a new open-source model called Falcon Mamba 7B.

Available on Hugging Face, the causal decoder-only offering uses the novel Mamba State Space Language Model (SSLM) architecture to handle various text-generation tasks and outperform leading models in its size class, including Meta’s Llama 3 8B, Llama 3.1 8B and Mistral 7B, on select benchmarks.

It comes as the fourth open model from TII after Falcon 180B, Falcon 40B and Falcon 2 but is the first in the SSLM category, which is rapidly emerging as a new alternative to transformer-based large language models (LLMs) in the AI domain.

The institute is offering the model under ‘Falcon License 2.0,’ which is a permissive license based on Apache 2.0.


What does the Falcon Mamba 7B bring to the table?​


While transformer models continue to dominate the generative AI space, researchers have noted that the architecture can struggle when dealing with longer pieces of text.

Essentially, transformers’ attention mechanism, which works by comparing every word (or token) with every other word in the text to understand context, demands more computing power and memory to handle growing context windows.

If the resources are not scaled accordingly, the inference slows down and reaches a point where it can’t handle texts beyond a certain length.
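
To make the scaling concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a naive implementation that stores one score for every pair of tokens, for a single attention head in a single layer, using 2-byte floats. The function names are made up for illustration, and optimized kernels avoid storing the full matrix, although the number of comparisons still grows with the square of the sequence length.

```python
def attention_score_count(num_tokens: int) -> int:
    """Pairwise token comparisons in full self-attention (every token vs. every token)."""
    return num_tokens * num_tokens


def score_matrix_bytes(num_tokens: int, bytes_per_score: int = 2) -> int:
    """Memory for one naive score matrix (one head, one layer).

    bytes_per_score=2 is an assumption (16-bit floats), not a figure from the article.
    """
    return attention_score_count(num_tokens) * bytes_per_score


for n in (1_000, 10_000, 100_000):
    # Ten times more tokens means one hundred times more comparisons.
    print(n, attention_score_count(n), score_matrix_bytes(n))
```

Doubling the context length quadruples the work, which is why longer context windows get expensive so quickly.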

To overcome these hurdles, the state space language model (SSLM) architecture, which works by continuously updating a “state” as it processes words, has emerged as a promising alternative. Some organizations have already deployed it, with TII being the latest adopter.

According to TII, its all-new Falcon model uses the Mamba SSM architecture originally proposed by researchers at Carnegie Mellon and Princeton Universities in a paper dated December 2023.

The architecture uses a selection mechanism that allows the model to dynamically adjust its parameters based on the input. This way, the model can focus on or ignore particular inputs, similar to how attention works in transformers, while delivering the ability to process long sequences of text – such as an entire book – without requiring additional memory or computing resources.
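
The idea of a fixed-size state with input-dependent updates can be sketched with a toy recurrence. This is not the Mamba implementation: the `RATES` values, the gate formula and the `selective_scan` name are all invented for illustration, and real models learn these projections over high-dimensional vectors. The sketch only shows two properties described above: the state stays the same size however long the input is, and the gate lets the model keep or overwrite what it remembers depending on the current input.

```python
import math

# Hypothetical per-channel sensitivities; a real model would learn these.
RATES = (0.5, 1.0, 2.0, 4.0)


def selective_scan(tokens, state=None):
    """Run a toy selective recurrence over a sequence of floats.

    Returns (outputs, final_state). The state always has len(RATES)
    entries, no matter how many tokens are processed.
    """
    state = list(state) if state is not None else [0.0] * len(RATES)
    outputs = []
    for x in tokens:
        for i, rate in enumerate(RATES):
            # Input-dependent gate in [0, 1): a zero input leaves the
            # state untouched, a large input mostly overwrites it.
            gate = 1.0 - math.exp(-rate * abs(x))
            state[i] = (1.0 - gate) * state[i] + gate * x
        outputs.append(sum(state) / len(state))
    return outputs, state


sequence = [0.3, 0.0, 1.5, -0.7, 0.0, 2.0]
_, full_state = selective_scan(sequence)

# The same sequence processed in chunks of two, carrying the state forward.
chunk_state = None
for start in range(0, len(sequence), 2):
    _, chunk_state = selective_scan(sequence[start:start + 2], chunk_state)

print(len(full_state), full_state == chunk_state)
```

Because only the state is carried from one step to the next, a long input can be fed token by token or chunk by chunk without the memory footprint growing with the length of the text.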

The approach makes the model suitable for enterprise-scale machine translation, text summarization, computer vision and audio processing tasks as well as tasks like estimation and forecasting, TII noted.


Taking on Meta, Google and Mistral​


To see how Falcon Mamba 7B fares against leading transformer models in the same size class, the institute ran a test to determine the maximum context length the models can handle when using a single 24GB A10 GPU.

The results revealed Falcon Mamba can “fit larger sequences than SoTA transformer-based models while theoretically being able to fit infinite context length if one processes the entire context token by token, or by chunks of tokens with a size that fits on the GPU, denoted as sequential parallel.”


In a separate throughput test, it outperformed Mistral 7B’s efficient sliding window attention architecture, generating all tokens at a constant speed and without any increase in CUDA peak memory.

Even in standard industry benchmarks, the new model’s performance was better than or close to that of popular transformer models as well as pure and hybrid state space models.

For instance, in the Arc, TruthfulQA and GSM8K benchmarks, Falcon Mamba 7B scored 62.03%, 53.42% and 52.54%, respectively, convincingly outperforming Llama 3 8B, Llama 3.1 8B, Gemma 7B and Mistral 7B.

However, in the MMLU and Hellaswag benchmarks, it sat closely behind all these models.

That said, this is just the beginning. As the next step, TII plans to further optimize the design of the model to improve its performance and cover more application scenarios.

“This release represents a significant stride forward, inspiring fresh perspectives and further fueling the quest for intelligent systems. At TII, we’re pushing the boundaries of both SSLM and transformer models to spark further innovation in generative AI,” Dr. Hakim Hacid, the acting chief researcher of TII’s AI cross-center unit, said in a statement.

Overall, TII’s Falcon family of language models has been downloaded more than 45 million times — dominating as one of the most successful LLM releases from the UAE.
 



Filmmakers say AI will change the art — perhaps beyond recognition​


Devin Coldewey

3:13 PM PDT • August 14, 2024


Wooden old movie clapperboard pattern with hard shadow on pink background. Concept of film industry, cinema, entertainment, and Hollywood.
Image Credits: DBenitostock / Getty Images

The latest generative models make for great demos, but are they really about to change how people make movies and TV? Not in the short term, according to filmmaking and VFX experts. But in the long term, the changes could be literally beyond our imagining.

On a panel at SIGGRAPH in Denver, Nikola Todorovic (Wonder Dynamics), Freddy Chavez Olmos (Boxel Studio) and Michael Black (Meshcapade, Max Planck Institute) discussed the potential of generative AI and other systems to change — but not necessarily improve — the way media is created today. Their consensus was that while we can justly question the usefulness of these tools in the immediate future, the rate of innovation is such that we should be prepared for radical change at any time beyond that.

One of the first topics tackled was the impractical nature of today’s video generators.

Todorovic noted the “misperception of AI that it’s a one-click solution, that it’s going to get you a final VFX shot, and that’s really impossible. Maybe we’ll get there, but if you don’t have editability, that black box doesn’t give you much. What we’re seeing right now is the UX is still being discovered — these research companies are starting to learn the ways of 3D and filmmaking terms.”

Black pointed out that language fundamentally lacks the ability to describe some of the most important aspects of visual creation.

Final shot, mocap data, mask and 3D environment generated by Wonder Studio. Image Credits: Wonder Dynamics
“I mean, things like yoga poses, ballet poses — there’s some classic things we have names for, that we can define, but most of the stuff we do, we don’t have names for,” he said. “And there’s good reason for that: It’s because humans actually have inside them a generative model of behavior. But I don’t have a generative model of images in my head. If I want to explain to you what I’m seeing, I can’t project it out of my eyeballs, and I’m not a good enough artist to draw it for you. So I have to use words, and we have many words to describe the visual world. But if I want to describe to you a particular motion, I don’t have to describe it in words — I just do it for you, and then your motor system sees me and is active in understanding that. And so we, I think it’s a biological reason, a neuro-scientific reason, that we don’t have words for all of our motion.”

That may seem a bit philosophical, but the result is that text-based prompt systems for imagery are fundamentally limited in how they can be controlled. Even the hundreds of terms of tech and art used every day on set and in post-production are inadequate.

Image Credits: Devin Coldewey

Chavez Olmos pointed out that, being from Mexico, he had little opportunity to take part in the filmmaking world, because all the money and expertise was concentrated in LA. But he said that AI expertise (and the demand for it) is more widely distributed. “I had to leave Mexico because I had no opportunity there; I can see, now, having that same opportunity for people who don’t need to go overseas to do it.”

Black, however, is worried that sudden access to these processes may have unintended consequences in the short term.

“You can give somebody a powerful car, that doesn’t make them a Formula One driver, right? That’s a little bit like what we have now. People are talking about, everyone’s going to be making films. They’re going to be s—–, quite honestly,” he said. “The democratization thing is exactly what [Chavez Olmos] said, and the power is that maybe some new voice will have an opportunity that they wouldn’t otherwise. But the number of people making really good films is still going to be small, in my opinion.”

Example assets in a shot with a virtual character: the model of the girl will walk between the waypoints, which correspond to real space. Image Credits: Fuzzy Door
“The real revolution,” he continued, “the real power of what we’re seeing in AI is we’re going to see an entirely new genre of entertainment, and I don’t know exactly what it’s going to look like. I predict it’ll be something between video game and film and real life. The film industry is passive storytelling: I sit there and observe, it’s like theater or a podcast. I’m the passive recipient of the entertainment. But in our day to day life, we tell stories to each other, we chat about what we did on the weekend and so on. And that’s a very active kind of interactive storytelling.”

Before that happens, though, Chavez Olmos said he expects a more traditional acceptance curve on AI-generated imagery and actors.

“It’s gonna have the same, I think, reaction that we had when we saw the first ‘Final Fantasy’ movie or ‘The Polar Express’ — something’s going to be not quite there yet, but people are going to start accepting these films,” he said. “And instead of a full CG film, it’s going to be a full AI film, which I think we’re going to see even at the end of this year. I think people are going to get past that, like ‘OK, this is AI,’ people are going to accept that.”

“The important thing,” Black said separately, “and Pixar taught us this very clearly: It’s all about story. It’s all about connecting to the characters. It’s about heart. And if the movie has heart, it doesn’t matter if the characters are AI, I think people will enjoy the movie,” he said. “That doesn’t mean that they’re going to not want human actors. There’s an excitement to knowing it’s real humans like us, but like way better than us, to see a human at the peak of their game, it inspires all of us, and I don’t think that’s going to go away.”
 



Study suggests that even the best AI models hallucinate a bunch​


Kyle Wiggers

11:29 AM PDT • August 14, 2024


Robots work on a contract and review a legal book to illustrate AI usage in law.
Image Credits: mathisworks / Getty Images

All generative AI models hallucinate, from Google’s Gemini to Anthropic’s Claude to the latest stealth release of OpenAI’s GPT-4o. The models are unreliable narrators in other words — sometimes to hilarious effect, other times problematically so.

But not all models make things up at the same rate. And the kinds of mistruths they spout depend on which sources of info they’ve been exposed to.

A recent study from researchers at Cornell, the universities of Washington and Waterloo and the nonprofit research institute AI2 sought to benchmark hallucinations by fact-checking models like GPT-4o against authoritative sources on topics ranging from law and health to history and geography. They found that no model performed exceptionally well across all topics, and that models that hallucinated the least did so partly because they refused to answer questions they’d otherwise get wrong.

“The most important takeaway from our work is that we cannot yet fully trust the outputs of model generations,” Wenting Zhao, a doctoral student at Cornell and a co-author on the research, told TechCrunch. “At present, even the best models can generate hallucination-free text only about 35% of the time.”

There have been other academic attempts at probing the “factuality” of models, including one by a separate AI2-affiliated team. But Zhao notes that these earlier tests asked models questions with answers easily found on Wikipedia, which is not exactly the toughest ask considering most models are trained on Wikipedia data.

To make their benchmark more challenging — and to more accurately reflect the types of questions people ask of models — the researchers identified topics around the web that don’t have a Wikipedia reference. Just over half the questions in their test can’t be answered using Wikipedia (they included some Wikipedia-sourced ones for good measure), and touch on topics including culture, geography, astronomy, pop culture, finance, medicine, computer science and celebrities.

For their study, the researchers evaluated over a dozen different popular models, many of which were released in the past year. In addition to GPT-4o, they tested “open” models such as Meta’s Llama 3 70B, Mistral’s Mixtral 8x22B and Cohere’s Command R+, as well as gated-behind-API models like Perplexity’s Sonar Large (which is based on Llama), Google’s Gemini 1.5 Pro and Anthropic’s Claude 3 Opus.

The results suggest that models aren’t hallucinating much less these days, despite claims to the contrary from OpenAI, Anthropic and the other big generative AI players.

GPT-4o and OpenAI’s much older flagship GPT-3.5 performed about the same in terms of the percentage of questions they answered factually correctly on the benchmark. (GPT-4o was marginally better.) OpenAI’s models were the least hallucinatory overall, followed by Mixtral 8x22B, Command R and Perplexity’s Sonar models.

Questions pertaining to celebrities and finance gave the models the hardest time, but questions about geography and computer science were easiest for the models to answer (perhaps because their training data contained more references to these). In cases where the source of an answer wasn’t Wikipedia, every model answered less factually on average (but especially GPT-3.5 and GPT-4o), suggesting that they’re all informed heavily by Wikipedia content.

Even models that can search the web for information, like Command R and Perplexity’s Sonar models, struggled with “non-Wiki” questions in the benchmark. Model size didn’t matter much; smaller models (e.g. Anthropic’s Claude 3 Haiku) hallucinated roughly as frequently as larger, ostensibly more capable models (e.g. Claude 3 Opus).

So what does all this mean — and where are the improvements that vendors promised?

Well, we wouldn’t put it past vendors to exaggerate their claims. But a more charitable take is the benchmarks they’re using aren’t fit for this purpose. As we’ve written about before, many, if not most, AI evaluations are transient and devoid of important context, doomed to fall victim to Goodhart’s law.

Regardless, Zhao says that she expects the issue of hallucinations to “persist for a long time.”

“Empirical results in our paper indicate that, despite the promise of certain methods to reduce or eliminate hallucinations, the actual improvement achievable with these methods is limited,” she said. “Additionally, our analysis reveals that even the knowledge found on the internet can often be conflicting, partly because the training data — authored by humans — can also contain hallucinations.”

An interim solution could be simply programming models to refuse to answer more often — the technical equivalent to telling a know-it-all to knock it off.

In the researchers’ testing, Claude 3 Haiku answered only around 72% of the questions it was asked, choosing to abstain from the rest. When accounting for the abstentions, Claude 3 Haiku was in fact the most factual model of them all — at least in the sense that it lied least often.
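
The trade-off between abstaining and hallucinating is easier to see with numbers. The sketch below uses invented counts for two hypothetical models, `eager` and `cautious`; none of the figures come from the study, and the `Tally` class and helper names are assumptions made for illustration. It shows how a model that declines 28% of questions can hallucinate on a smaller share of the benchmark while also getting fewer answers right overall.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Counts of graded responses for one model on a benchmark."""
    correct: int
    hallucinated: int
    abstained: int

    @property
    def total(self) -> int:
        return self.correct + self.hallucinated + self.abstained

    @property
    def answered(self) -> int:
        return self.correct + self.hallucinated


def response_rate(t: Tally) -> float:
    """Share of questions the model chose to answer at all."""
    return t.answered / t.total


def hallucinated_share(t: Tally) -> float:
    """Share of all questions that ended in a hallucinated answer."""
    return t.hallucinated / t.total


def correct_share(t: Tally) -> float:
    """Share of all questions answered correctly."""
    return t.correct / t.total


# Invented counts for illustration only; not figures from the study.
eager = Tally(correct=60, hallucinated=40, abstained=0)
cautious = Tally(correct=52, hallucinated=20, abstained=28)

for name, tally in (("eager", eager), ("cautious", cautious)):
    print(name, response_rate(tally), hallucinated_share(tally), correct_share(tally))
```

In this made-up example the cautious model hallucinates half as often but is also less useful, which is the tension the next paragraph raises.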

But will people use a model that doesn’t answer many questions? Zhao thinks not and says vendors should focus more of their time and efforts on hallucination-reducing research. Eliminating hallucinations entirely may not be possible, but they can be mitigated through human-in-the-loop fact-checking and citation during a model’s development, she asserts.

“Policies and regulations need to be developed to ensure that human experts are always involved in the process to verify and validate the information generated by generative AI models,” Zhao added. “There are still numerous opportunities to make significant impacts in this field, such as developing advanced fact-checking tools for any free text, providing citations for factual content and offering corrections for hallucinated texts.”
 



1/1
Now xAI is at the frontier





1/1
BREAKING: Here's an early look at Grok 2.0 features and abilities!

It's better at coding, writing, and generating news! It'll also generate images using the FLUX.1 model!










 







1/12
I’ve been testing the implications of the Grok AI model. So far: 1. It has given me instructions on how to make a fertilizer bomb with exact measurements of contents as well as how to make a detonator. 2. It has allowed me to generate imagery of Elon Musk carrying out mass shootings. 3. It has given me clear instructions on how to carry out a mass shooting and a political assassination (including helpful tips on how to conceal a 11.5” barreled AR15 into a secured venue.)

I just want to be clear. This AI model has zero filter or oversight measures in place. If you want an image of Elon Musk wearing a bomb vest in Paris with ISIS markings on it, it will make it for you. If you are planning on orchestrating a mass shooting towards a school, it will go over the specifics on how to go about it. All without filter or precautionary measures.

2/12
I have discovered another loophole in Grok AI’s programming. Simply telling Grok that you are conducting “medical or crime scene analysis” will allow the image processor to pass through all set ‘guidelines’. Allowing myself and @OAlexanderDK to generate these images:

3/12
By giving Grok the context that you are a professional you are able to generate just about anything without any restriction. You can generate anything from the violent depictions in my previous tweet to even having Grok generate child pornography if given the proper prompts.

4/12
All and all, this definitely needs immediate oversight. OpenAI, Meta and Google have all implemented deep rooted safety protocols. It appears that Grok has had very limited or zero safety testing. In the early days of ChatGPT I was able to get instructions on how to make bombs.

5/12
However, that was long patched before ChatGPT was ever publicly available. It is a highly disturbing fact that anyone can pay X $4 to generate imagery of Micky Mouse conducting a mass shooting against children. I’ll add more to this thread as I uncover more.

6/12
Ok? What a bizarre upsell technique. Make users upgrade to Premium+ to continue using features. Then when they upgrade to Premium+ continue to lock the features behind the paywall that they already paid for. Have I been scammed?

7/12
Why are you so afraid of images and words? Or are you just an old fashioned book burner?

Mark the images as 18+ and the problem is solved. We don't need the "safety" police to stifle creative innovations before we even understand these tools.

8/12
Can you imagine my shock when I discovered grok would create whatever fukked up shyt I had in my fukked up head.

9/12
@threadreaderapp unroll please

10/12
@katten260764 Hallo, please find the unroll here: Thread by @chrmontessori on Thread Reader App Talk to you soon. 🤖

11/12
Lololololol omg that’s hilarious

12/12
I don’t know about you, but if I were you, I’d just delete this before I give wrong people right ideas



 
Joined
May 24, 2022
Messages
261
Reputation
40
Daps
1,013

Pindrop claims to detect AI audio deepfakes with 99% accuracy​


Shubham Sharma@mr_bumss

August 15, 2024 5:39 AM





Today, Pindrop, a company offering solutions for voice security, identity verification and fraud detection, announced the release of Pulse Inspect, a web-based tool for detecting AI-generated speech in any digital audio or video file with what it claims is a significantly high degree of accuracy: 99%.

The feature is available in preview as part of Pindrop’s Pulse suite of products, and offers detection regardless of the tool or AI model the audio was generated from.

This is a notable and ambitious departure from general industry practice, where AI vendors release classifiers that detect only the synthetic content generated by their own tools.

Pindrop is offering Pulse Inspect on a yearly subscription to organizations looking to combat the risk of audio deepfakes at scale. However, CEO Vijay Balasubramaniyan tells VentureBeat that they may launch more affordable pricing tiers – with a limited number of media checks – for consumers as well.

“Our pricing is designed for organizations with a recurring need for deepfake detection. However, based on future market demand, we may consider launching pricing options better suited for casual users in the future,” he said.

Pindrop addressing the rise of audio deepfakes​


While deepfakes have been around for a long time, the rise of text-based generative AI systems has made them more prevalent on the internet. Popular gen AI tools, like those from Microsoft and ElevenLabs, have been exploited to mimic the audio and video of celebrities, businesspeople and politicians to spread misinformation and scams, damaging their public image.

According to Pindrop’s internal report, over 12 million American adults know someone who has personally had deepfakes created without their consent. These duplicates could be anything from images to video to audio, but they all have one thing in common: they thrive on virality, spreading like wildfires on social media.

To address this evolving problem, Pindrop announced the Pulse suite of products earlier this year. The first offering in the portfolio helped enterprises detect deepfake calls coming to their call centers. Now, with Pulse Inspect, the company is going beyond calls to help organizations check any audio/video file for AI-generated synthetic artifacts.

Upload questionable audio files for analysis​


At the core, the offering comes as a web application, where an enterprise user can upload the questionable file for analysis.

Previously, the whole process of checking for synthetic artifacts in existing media files required time-consuming forensic examination. However, in this case, the tool processes the audio in a matter of seconds and comes up with a “deepfake score,” complete with sections that contain AI-generated speech.

This quick response can then enable organizations to take proactive actions to prevent the spread of misinformation and maintain their brand credibility.

Training and analysis process​


Pindrop says it has trained a proprietary deepfake detection model on more than 350 deepfake generation tools, 20 million unique utterances and over 40 languages, resulting in a rate of detecting deepfake audio at 99% based on the company’s internal analysis of a dataset of about 200k samples.

The model checks media files for synthetic artifacts every four seconds, ensuring it classifies deepfakes accurately, especially in the cases of mixed media containing both AI-generated and genuine elements.
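
One way to picture this segment-by-segment checking is the sketch below. It is only an illustration, not Pindrop's method: it assumes "every four seconds" means the audio is cut into consecutive four-second windows, and `toy_score` is a placeholder standing in for a real detection model. The function names, the threshold and the tiny sample rate are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def window_bounds(num_samples: int, sample_rate: int,
                  window_seconds: float = 4.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices for consecutive fixed-length windows."""
    window = int(sample_rate * window_seconds)
    if window <= 0:
        raise ValueError("window must contain at least one sample")
    return [(start, min(start + window, num_samples))
            for start in range(0, num_samples, window)]


def flag_synthetic_spans(samples: Sequence[float], sample_rate: int,
                         score_fn: Callable[[Sequence[float]], float],
                         threshold: float = 0.5,
                         window_seconds: float = 4.0):
    """Score each window and return (start_sec, end_sec, score) for flagged ones."""
    flagged = []
    for start, end in window_bounds(len(samples), sample_rate, window_seconds):
        score = score_fn(samples[start:end])
        if score >= threshold:
            flagged.append((start / sample_rate, end / sample_rate, score))
    return flagged


def toy_score(chunk: Sequence[float]) -> float:
    """Placeholder scorer: fraction of samples equal to 1.0. Not a real detector."""
    return sum(1 for value in chunk if value == 1.0) / len(chunk)


# Twelve seconds of fake "audio" at 10 samples per second; the middle
# four seconds are marked with 1.0 to play the role of synthetic speech.
sample_rate = 10
samples = [0.0] * 40 + [1.0] * 40 + [0.0] * 40
flagged = flag_synthetic_spans(samples, sample_rate, toy_score)
print(flagged)
```

Scoring short windows separately is what makes it possible to point at the specific sections of a mixed file that look AI-generated, rather than giving one verdict for the whole recording.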

“Pindrop’s technology leverages recent breakthroughs in deep neural networks (DNN) and sophisticated spectro-temporal analysis to identify synthetic artifacts using multiple approaches,” Balasubramaniyan explained.

No vendor-specific detection limits​


Since Pindrop has trained its detection model on more than 350 generation tools, Pulse Inspect has no tool-specific restriction for detection.

“There are over 350 deepfake generator systems, with many prolific audio deepfakes on social media likely coming from open-source tools rather than commercial ones like ElevenLabs. Customers need comprehensive tools like Pindrop’s, which are not limited to detecting deepfakes from a single system but can identify synthetic audio across all generation systems,” Balasubramaniyan added.

However, it is important to note that there may be cases where the tool might fail to identify deepfakes, especially when the file has less than two seconds of net speech or a very high level of background noise. The CEO said the company is working continuously to address these gaps and further improve detection accuracy.

Currently, Pindrop is targeting Pulse Inspect at organizations such as media companies, non-profits, government agencies, celebrity management firms, legal firms and social media networks. Balasubramaniyan did not share the exact number of customers using the tool but he did say that “a number of partners” are using the product by paying for a volume-based annual subscription. This includes TrueMedia.org, a free-use product that allows critical election audiences to detect deepfakes.

In addition to the web app supporting manual uploads, Pulse Inspect can also be integrated into custom forensic workflows via an API. This can power bulk use cases such as that of a social media network flagging and removing harmful AI-generated videos.

Moving ahead, Balasubramaniyan said, the company plans to bolster the Pulse suite by improving the explainability aspect of the tools – with a feature to trace back to the source of deepfake generations – and supporting more modalities.

Security against AI is, in my honest opinion, the most interesting aspect of AI advancement. I think more attention should be paid to it.
 