bnew

Veteran
Joined
Nov 1, 2015
Messages
55,517
Reputation
8,215
Daps
156,914


Computer Science > Computation and Language​

[Submitted on 18 Jan 2024]

ChatQA: Building GPT-4 Level Conversational QA Models​

Zihan Liu, Wei Ping, Rajarshi Roy, Peng Xu, Mohammad Shoeybi, Bryan Catanzaro
In this work, we introduce ChatQA, a family of conversational question answering (QA) models, that obtain GPT-4 level accuracies. Specifically, we propose a two-stage instruction tuning method that can significantly improve the zero-shot conversational QA results from large language models (LLMs). To handle retrieval in conversational QA, we fine-tune a dense retriever on a multi-turn QA dataset, which provides comparable results to using the state-of-the-art query rewriting model while largely reducing deployment cost. Notably, our ChatQA-70B can outperform GPT-4 in terms of average score on 10 conversational QA datasets (54.14 vs. 53.90), without relying on any synthetic data from OpenAI GPT models.
Subjects:Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:arXiv:2401.10225 [cs.CL]
(or arXiv:2401.10225v1 [cs.CL] for this version)

Submission history​

From: Wei Ping [view email]
[v1] Thu, 18 Jan 2024 18:59:11 UTC (558 KB)
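To make the retrieval side of this concrete, here is a minimal, self-contained sketch (not the paper's actual code) of dense retrieval for conversational QA: the concatenated dialogue history is embedded directly as the query instead of being rewritten into a standalone question, and context chunks are ranked by cosine similarity. The `encode` function below is a toy hashed bag-of-words stand-in for a fine-tuned dense retriever.

```python
# Toy sketch of dense retrieval over multi-turn dialogue (illustrative only).
import numpy as np

def encode(text: str, dim: int = 256) -> np.ndarray:
    """Placeholder embedding: hashed bag-of-words, normalized to unit length."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(dialogue_turns: list[str], chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank context chunks by cosine similarity to the full dialogue history."""
    query_vec = encode(" ".join(dialogue_turns))        # multi-turn query, no rewriting step
    chunk_vecs = np.stack([encode(c) for c in chunks])  # in practice, a pre-built index
    scores = chunk_vecs @ query_vec                     # cosine similarity (unit vectors)
    return [chunks[i] for i in np.argsort(-scores)[:top_k]]

history = ["User: Who founded NVIDIA?", "Assistant: Jensen Huang and others.", "User: In what year?"]
docs = ["NVIDIA was founded in 1993.", "GPUs are widely used for deep learning."]
print(retrieve(history, docs, top_k=1))
```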



 

bnew

Microsoft sets 16GB default for RAM for AI PCs – machines will also need 40 TOPS of AI compute: Report​

News

By Mark Tyson

published 1 day ago

You might thank AI for raising the entry-level PC RAM quota.

DDR5 memory modules

(Image credit: G.Skill)

Microsoft is seemingly lining up a number of new minimum specification levels for AI PCs that it hasn’t yet broadcast via official channels. We have heard from our own sources that AI PCs will raise the bar for minimum RAM configurations, and TrendForce appears to have heard the same thing, saying that 16GB will be the minimum RAM configuration for Windows AI PCs. Meanwhile, both our own sources and TrendForce agree that new Windows PCs will require at least 40 TOPS of compute power to make the grade for labeling as an AI PC.

“Microsoft has set the baseline for DRAM in AI PCs at 16GB,” stated TrendForce in a press release about Microsoft Copilot on Wednesday. Thus, Windows will again be instrumental in driving growth for the minimum memory capacity acceptable in new PCs. Desktop users with easily accessible upgrade options might shrug, but those buying laptops and discovering they aren’t upgradeable due to soldered RAM should no longer have to filter out memory-starved systems - simply look for AI PCs.

Memory makers should be happy with a boost in the number of PCs sold with more memory as standard. Last year we reported on some of the biggest players in the industry slowing production to constrain supplies and achieve better prices. That seems to have hurt the revenue generation of Samsung and SK hynix during 2023, but the damage was partially self-inflicted.

As mentioned in the intro, this won’t be a one-dimensional change to PC system requirements. The expectation that a new PC will run the Microsoft Copilot AI assistant in a slick and responsive manner also relies on adequate local acceleration. Microsoft HQ has decided on a minimum of 40 TOPS of computational power. That might be provided by a discrete GPU, but almost all new PC processors now build in efficient NPUs that can meet or exceed that compute performance target.

We said 'almost' above, and that is an important caveat, as the combined CPU, GPU, and NPU power within Intel’s current Meteor Lake chips is said to reach 34 TOPS at best. TrendForce speculates that Intel’s Lunar Lake will address this shortfall against the AI PC baseline. Intel itself has said that Lunar Lake will have three times the AI performance of its predecessor, Meteor Lake.

Other Windows PC processor makers like AMD and Qualcomm aren’t quite as far behind. The AMD Ryzen 8000 series (Strix Point) is expected to be capable of 45 TOPS. Qualcomm’s Snapdragon X Elite platform is also thought to deliver around 45 TOPS. It will be interesting to see if the Arm architecture processors from Qualcomm are as competitive using other performance metrics.

Windows 11 Copilot key

(Image credit: Microsoft)

There has been quite a lot of speculation about the upcoming wave of AI PCs, as the industry seems excited by the hardware refresh cycle it thinks they will inspire. When the Windows Copilot key was unveiled, we wondered about any minimum spec a device might need. However, it turned out that even entry-level modern PCs without onboard NPUs were given the green light to equip this key. Perhaps more stringent AI PC labeling and minimum specs will come in the summer with Windows 12, we mused. Now we have at least a partial answer to AI PC and Windows 12 minimum specs: a system must have at least 16GB of RAM and a processor that can achieve at least 40 TOPS of AI compute.
 

bnew

A ‘Shocking’ Amount of the Web Is Already AI-Translated Trash, Scientists Determine​

Researchers warn that most of the text we view online has been poorly translated into one or more languages—usually by a machine.


By Jules Roscoe

January 17, 2024, 12:57pm
(Image credit: Delmaine Donson via Getty Images)


A “shocking” amount of the internet is machine-translated garbage, particularly in languages spoken in Africa and the Global South, a new study has found.

Researchers at the Amazon Web Services AI lab found that over half of the sentences on the web have been translated into two or more languages, often with increasingly worse quality due to poor machine translation (MT), which they said raised “serious concerns” about the training of large language models.

“We actually got interested in this topic because several colleagues who work in MT and are native speakers of low resource languages noted that much of the internet in their native language appeared to be MT generated,” Mehak Dhaliwal, a former applied science intern at AWS and current PhD student at the University of California, Santa Barbara, told Motherboard. “So the insight really came from the low-resource language speakers, and we did the study to understand the issue better and see how widespread it was.”

“With that said, everyone should be cognizant that content they view on the web may have been generated by a machine,” Dhaliwal added.

The study, which was submitted to the pre-print server arXiv last Thursday, generated a corpus of 6.38 billion sentences scraped from the web. It looked at patterns of multi-way parallelism, which describes sets of sentences that are direct translations of one another in three or more languages. It found that most of the internet is translated, as 57.1 percent of the sentences in the corpus were multi-way parallel in at least three languages.

Like all machine learning efforts, machine translation is impacted by human bias, and skews toward languages spoken in the Western world and the Global North. Because of this, the quality of the translations varies wildly, with “low-resource” languages from places like Africa having insufficient training data to produce accurate text.

“In general, we observed that most languages tend to have parallel data in the highest-resource languages,” Dhaliwal told Motherboard in an email. “Sentences are more likely to have translations in French than a low resource language, simply by virtue of there being much more data in French than a low resource language.”

High-resource languages, like English or French, tended to have an average parallelism of 4, meaning that sentences had translational equivalents in three other languages. Low-resource languages, like the African languages Wolof or Xhosa, had an average parallelism of 8.6. Additionally, lower-resource languages tended to have much worse translations.
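As a rough illustration of the metric (this is not the study's code, and the corpus records below are invented), average multi-way parallelism can be computed as the mean number of languages per sentence cluster, optionally restricted to clusters that contain a given language:

```python
# Illustrative computation of average multi-way parallelism (invented data).
corpus = [
    {"id": "s1", "languages": {"en", "fr", "de", "es"}},                          # 4-way parallel
    {"id": "s2", "languages": {"en", "fr", "de", "pt", "it", "nl", "wo", "xh"}},  # 8-way parallel
    {"id": "s3", "languages": {"en", "fr"}},                                      # 2-way parallel
]

def average_parallelism(records, language=None):
    """Mean number of languages per sentence cluster, optionally restricted
    to clusters that include `language`."""
    sizes = [len(r["languages"]) for r in records
             if language is None or language in r["languages"]]
    return sum(sizes) / len(sizes) if sizes else 0.0

print(average_parallelism(corpus))        # overall average across the toy corpus
print(average_parallelism(corpus, "wo"))  # average for a low-resource language (Wolof)
```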

“We find that highly multi-way parallel translations are significantly lower quality than 2-way parallel translation,” the researchers state in the paper. “The more languages a sentence has been translated into, the lower quality the translations are, suggesting a higher prevalence of machine translation.”

In highly multi-way parallel languages, the study also found a selection bias toward shorter, “more predictable” sentences of between 5 and 10 words. Because of how short the sentences were, researchers found it difficult to characterize their quality. However, “searching the web for the sentences was enlightening,” the study stated. “The vast majority came from articles that we characterized as low quality, requiring little or no expertise or advance effort to create, on topics like being taken more seriously at work, being careful about your choices, six tips for new boat owners, deciding to be happy, etc.”

The researchers argued that the selection bias toward short sentences from low-quality articles was due to “low quality content (likely produced to generate ad revenue) being translated via MT en masse into many lower resource languages (again likely for the purpose of generating ad revenue). It also suggests that such data originates in English and is translated into other languages.”

This means that a large portion of the internet in lower-resource languages is poorly machine-translated, which poses questions for the development of large language models in those languages, the researchers said.

“Modern AI is enabled by huge amounts of training data, typically several hundred billion tokens to a few trillion tokens,” the study states. “Training at this scale is only possible with web-scraped data. Our findings raise numerous concerns for multilingual model builders: Fluency (especially across sentences) and accuracy are lower for MT data, which could produce less fluent models with more hallucinations, and the selection bias indicates the data may be of lower quality, even before considering MT errors.”
 

bnew

The rabbit r1 will use Perplexity AI’s tech to answer your queries​

Ivan Mehta @indianidle / 8:35 AM EST•January 19, 2024

(Image credits: rabbit)

One of the standout gadgets of this year’s Consumer Electronics Show (CES), the rabbit r1, will use Perplexity AI’s tech to answer user queries, both companies said in an announcement.

Perplexity noted that the first 100,000 r1 buyers will get one year of Perplexity Pro for free.


We're thrilled to announce our partnership with Rabbit: Together, we are introducing real-time, precise answers to Rabbit R1, seamlessly powered by our cutting-edge PPLX online LLM API, free from any knowledge cutoff. Plus, for the first 100,000 Rabbit R1 purchases, we're… pic.twitter.com/hJRehDlhtv

— Perplexity (@perplexity_ai) January 18, 2024


The $200 r1 made the rounds at CES as an AI-first gadget that saves you the hassle of taking your phone out for tasks like performing web searches, playing a song on Spotify, or ordering a cab. The device doesn’t have a monthly subscription fee at the moment.

The device, designed by Teenage Engineering, has a 2.88-inch touchscreen, a push-to-talk button, a camera, a speaker, and two mics.

The company has already sold 50,000 devices in pre-orders. Earlier today, it opened pre-orders for the 6th production batch with another 50,000 devices. Rabbit said that customers living in the EU and UK will all receive their device by the end of July even if they just pre-order a device from the 6th batch.

Perplexity uses a mix of its own AI model and third-party models — Google’s Gemini, Mistral 7B, Anthropic’s Claude 2.1, and OpenAI’s GPT-4 — to get accurate information from the web. The tool has a chatbot interface on the web and mobile apps that lets users ask questions in natural language. While Perplexity’s approach is different from that of traditional search engines, it competes with Google’s Bard and Microsoft’s Copilot, along with You.com, in the GenAI search space.

Earlier this month, Perplexity AI raised $73.6 million in investment — at a $520 million valuation — led by IVP with additional investments from NEA, Databricks Ventures, Nvidia, former Twitter VP Elad Gil, Shopify CEO Tobi Lutke, ex-GitHub CEO Nat Friedman, Vercel founder Guillermo Rauch, and Jeff Bezos.
 

bnew

Stability AI unveils smaller, more efficient 1.6B language model as part of ongoing innovation​

Sean Michael Kerner @TechJournalist

January 19, 2024 3:57 PM

Credit: VentureBeat made with Midjourney

Size certainly matters when it comes to large language models (LLMs) as it impacts where a model can run.


Stability AI, the vendor that is perhaps best known for its Stable Diffusion text-to-image generative AI technology, today released one of its smallest models yet with the debut of Stable LM 2 1.6B. Stable LM is a text content generation LLM that Stability AI first launched in April 2023 with both 3 billion and 7 billion parameter models. The new Stable LM model is actually the second model Stability AI has released in 2024, following the company’s Stable Code 3B, which launched earlier this week.

The new compact yet powerful Stable LM model aims to lower barriers and enable more developers to participate in the generative AI ecosystem incorporating multilingual data in seven languages – English, Spanish, German, Italian, French, Portuguese, and Dutch. The model utilizes recent algorithmic advancements in language modeling to strike what Stability AI hopes is an optimal balance between speed and performance.

“In general, larger models trained on similar data with a similar training recipe tend to do better than smaller ones,” Carlos Riquelme, Head of the Language Team at Stability AI, told VentureBeat. “However, over time, as new models get to implement better algorithms and are trained on more and higher quality data, we sometimes witness recent smaller models outperforming older larger ones.”



Why smaller is better (this time) with Stable LM​

According to Stability AI, the model outperforms other small language models with under 2 billion parameters on most benchmarks, including Microsoft’s Phi-2 (2.7B), TinyLlama 1.1B, and Falcon 1B.

The new smaller Stable LM is even able to surpass some larger models, including Stability AI’s own earlier Stable LM 3B model.

“Stable LM 2 1.6B performs better than some larger models that were trained a few months ago,” Riquelme said. “If you think about computers, televisions or microchips, we could roughly see a similar trend, they got smaller, thinner and better over time.”

To be clear, the smaller Stable LM 2 1.6B does have some drawbacks due to its size. Stability AI, in its release for the new model, cautions that “… due to the nature of small, low-capacity language models, Stable LM 2 1.6B may similarly exhibit common issues such as high hallucination rates or potential toxic language.”



Transparency and more data are core to the new model release​

The move toward smaller, more powerful LLM options is one that Stability AI has been making for the last few months.

In December 2023, the StableLM Zephyr 3B model was released, providing more performance to StableLM with a smaller size than the initial iteration back in April.

Riquelme explained that the new Stable LM 2 models are trained on more data, including multilingual documents in 6 languages in addition to English (Spanish, German, Italian, French, Portuguese and Dutch). Another interesting aspect highlighted by Riquelme is the order in which data is shown to the model during training. He noted that it may pay off to focus on different types of data during different training stages.

Going a step further, Stability AI is making the new models available with pre-trained and fine-tuned options, as well as in a format that the researchers describe as “…the last model checkpoint before the pre-training cooldown.”

“Our goal here is to provide more tools and artifacts for individual developers to innovate, transform and build on top of our current model,” Riquelme said. “Here we are providing a specific half-cooked model for people to play with.”

Riquelme explained that during training, the model gets sequentially updated and its performance increases. In that scenario, the very first model knows nothing, while the last one has consumed and hopefully learned most aspects of the data. At the same time, Riquelme said that models may become less malleable towards the end of their training as they are forced to wrap up learning.

“We decided to provide the model in its current form right before we started the last stage of training, so that –hopefully– it’s easier to specialize it to other tasks or datasets people may want to use,” he said. “We are not sure if this will work well, but we really believe in people’s ability to leverage new tools and models in awesome and surprising ways.”





 

bnew

Google’s new ASPIRE system teaches AI the value of saying ‘I don’t know’​

Michael Nuñez @MichaelFNunez

January 18, 2024 2:06 PM

Credit: VentureBeat made with Midjourney

Google researchers are shaking up the AI world by teaching artificial intelligence to say “I don’t know.” This new approach, dubbed ASPIRE, could revolutionize how we interact with our digital helpers by encouraging them to express doubt when they’re unsure of an answer.

The innovation, showcased at the EMNLP 2023 conference, is all about instilling a sense of caution in AI responses. ASPIRE, which stands for “Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs,” acts like a built-in confidence meter for AI, helping it to assess its own answers before offering them up.

Imagine you’re asking your smartphone for advice on a health issue. Instead of giving a potentially wrong answer, the AI might respond with, “I’m not sure,” thanks to ASPIRE. This system trains the AI to assign a confidence score to its answers, signaling how much trust we should put in its response.
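The core idea can be sketched in a few lines. This is not Google's ASPIRE implementation, and `answer_with_confidence` is a canned placeholder for an LLM that self-evaluates its own answer; the point is simply that the system abstains when its confidence score falls below a threshold:

```python
# Minimal selective-prediction sketch: answer only above a confidence threshold.

def answer_with_confidence(question: str) -> tuple[str, float]:
    """Placeholder for an LLM that returns an answer plus a self-evaluated
    confidence score in [0, 1]."""
    canned = {
        "What is the capital of France?": ("Paris", 0.97),
        "What is a safe dose of drug X?": ("200 mg", 0.35),  # shaky answer
    }
    return canned.get(question, ("unknown", 0.0))

def selective_answer(question: str, threshold: float = 0.8) -> str:
    answer, confidence = answer_with_confidence(question)
    if confidence >= threshold:
        return answer
    return "I don't know."  # abstain rather than risk a confident mistake

print(selective_answer("What is the capital of France?"))  # Paris
print(selective_answer("What is a safe dose of drug X?"))  # I don't know.
```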

The team behind this, including Jiefeng Chen and Jinsung Yoon from Google, is pioneering a shift towards more reliable digital decision-making. They argue that it’s crucial for AI, especially when it comes to critical information, to know its limits and communicate them clearly.

“LLMs can now understand and generate language at unprecedented levels, but their use in high-stakes applications is limited because they sometimes make mistakes with high confidence,” said Chen, a researcher at the University of Wisconsin-Madison and co-author of the paper.

Their research indicates that even smaller AI models equipped with ASPIRE can surpass larger ones that lack this introspective feature. This system essentially creates a more cautious and, ironically, a more reliable AI that can acknowledge when a human might be better suited to answer.

By promoting honesty over guesswork, ASPIRE is set to make AI interactions more trustworthy. It paves the way for a future where your AI assistant can be a thoughtful advisor rather than an all-knowing oracle, a future where saying “I don’t know” is actually a sign of advanced intelligence.
 

bnew

Beyond chatbots: The wide world of embeddings​

Ben Dickson @BenDee983

January 18, 2024 12:23 PM

Credit: VentureBeat made with Midjourney

The growing popularity of large language models (LLMs) has also created interest in embedding models, deep learning systems that compress the features of different data types into numerical representations.

Embedding models are one of the key components of retrieval-augmented generation (RAG), an important application of LLMs for the enterprise. But the potential of embedding models goes beyond current RAG applications. The past year has seen impressive advances in embedding applications, and 2024 promises to have even more in store.



How embeddings work​

The basic idea of embeddings is to transform a piece of data such as an image or text document into a list of numbers representing its most important features. Embedding models are trained on large datasets to learn the most relevant features that can tell different types of data apart.

For example, in computer vision, embeddings can represent important features such as the presence of certain objects, shapes, colors, or other visual patterns. In text applications, embeddings can encode semantic information such as concepts, geographical locations, persons, companies, objects, and more.

In RAG applications, embedding models are used to encode the features of a company’s documents. The embedding of each document is then stored in a vector store, a database that specializes in recording and comparing embeddings. At inference time, the application computes the embedding of the new prompt and sends it to the vector database to retrieve the documents whose embedding values are closest to that of the prompt. The content of the relevant documents is then inserted into the prompt, and the LLM is instructed to generate its response based on those documents.

This simple mechanism plays a key role in customizing LLMs to respond based on proprietary documents or information that was not included in their training data. It also helps address problems such as hallucination, where LLMs generate false facts due to a lack of proper information.
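A stripped-down sketch of that retrieval step is below. A production system would use a trained embedding model and a proper vector database; here a toy bag-of-words vector and an in-memory list stand in, just to show how the closest document ends up in the prompt:

```python
# Toy RAG retrieval step (illustrative only, no external services).
import math
from collections import Counter

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The API rate limit is 100 requests per minute per key.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (a real system would use
    a trained embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

index = [(doc, embed(doc)) for doc in documents]  # the "vector store"

def build_prompt(question: str, top_k: int = 1) -> str:
    q = embed(question)
    ranked = sorted(index, key=lambda pair: -cosine(pair[1], q))
    context = "\n".join(doc for doc, _ in ranked[:top_k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What is the refund policy for returns?"))
```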



Beyond basic RAG​

While RAG has been an important addition to LLMs, the benefits of retrieval and embeddings go beyond matching prompts to documents.

“Embeddings are primarily used for retrieval (and maybe for nice visualizations of concepts),” Jerry Liu, CEO of LlamaIndex, told VentureBeat. “But retrieval itself is actually quite broad and extends beyond simple chatbots for question-answering.”

Retrieval can be a core step in any LLM use case, Liu says. LlamaIndex has been creating tools and frameworks to allow users to match LLM prompts to other types of tasks and data, such as sending commands to SQL databases, extracting information from structured data, long-form generation, or agents that can automate workflows.

“[Retrieval] is a core step towards augmenting the LLM with relevant context, and I imagine most enterprise LLM use cases will need to have retrieval in at least some form,” Liu said.

Embeddings can also be used in applications beyond simple document retrieval. For example, in a recent study, researchers at the University of Illinois at Urbana-Champaign and Tsinghua University used embedding models to reduce the costs of training coding LLMs. They developed a technique that uses embeddings to choose the smallest subset of a dataset that is also diverse and representative of the different types of tasks that the LLM must accomplish. This allowed them to train the model at a high quality with fewer examples.
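The general recipe (a sketch of the idea, not the cited study's exact method) is to cluster the example embeddings and keep only the examples nearest each cluster center, which yields a small but diverse, representative subset:

```python
# Sketch: pick a representative, diverse training subset via k-means on embeddings.
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))  # stand-in for per-example embeddings

def select_representatives(vectors: np.ndarray, k: int, iters: int = 10) -> list[int]:
    """Run plain k-means, then return the index of the example closest to each centroid."""
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(vectors[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = vectors[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    dists = np.linalg.norm(vectors[:, None] - centroids[None], axis=-1)
    return [int(dists[:, j].argmin()) for j in range(k)]

subset = select_representatives(embeddings, k=20)
print(f"Selected {len(subset)} representative examples out of {len(embeddings)}")
```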



Embeddings for enterprise applications​

“Vector embeddings introduced the possibility of working with any unstructured and semi-structured data. Semantic search—and, to be honest, RAG is a type of semantic search application—is just one use case,” Andre Zayarni, CEO of Qdrant, told VentureBeat. “Working with data other than textual (image, audio, video) is a big topic, and new multimodal transformers will make it happen.”

Qdrant is already providing services for using embeddings in different applications, including anomaly detection, recommendation, and time-series processing.

“In general, there are a lot of untapped use cases, and the number will grow with upcoming embedding models,” Zayarni said.

More companies are exploring the use of embedding models to examine the large amounts of unstructured data they are generating. For example, embeddings can help companies categorize millions of customer feedback messages or social media posts to detect trends, common themes, and sentiment changes.

“Embeddings are ideal for enterprises looking to sort through huge amounts of data to identify trends and develop insights,” Nils Reimers, Embeddings Lead at Cohere, told VentureBeat.



Fine-tuned embeddings​

2023 saw a lot of progress around fine-tuning LLMs with custom datasets. However, fine-tuning remains a challenge, and so far only a few companies with strong data and expertise are doing it.

“I think there will always be a funnel from RAG to finetuning; people will start with the easiest thing to use (RAG), and then look into fine-tuning as an optimization step,” Liu said. “I anticipate more people will do finetuning this year for LLMs/embeddings as open-source models themselves also improve, but this number will be smaller than the number of people that do RAG unless we somehow have a step-change in making fine-tuning super easy to use.”

Fine-tuning embeddings also has its challenges. For example, embeddings are sensitive to data shifts. If you train them on short search queries, they will not do as well on longer queries, and vice versa. Similarly, if you train them on “what” questions they will not perform as well on “why” questions.

“Currently, enterprises would need very strong in-house ML teams to make embedding finetuning effective, so it’s usually better to use out-of-the-box options, in contrast to other facets of LLM use cases,” Reimers said.

Nonetheless, there have been advances in making the training process for embedding models more efficient. For example, a recent study by Microsoft shows that pre-trained LLMs such as Mistral-7B can be fine-tuned for embedding tasks with a small dataset generated by a strong LLM. This is much simpler than the traditional multi-step process that requires heavy manual labor and expensive data acquisition.

Given the pace at which LLMs and embedding models are advancing, we can expect more exciting developments in the coming months.
 

bnew

Anthropic hits back at music publishers in AI copyright lawsuit, accusing them of ‘volitional conduct’​

Bryson Masse @Bryson_M

January 18, 2024 3:56 PM

A robot lawyer holds a document in front of a counsel bench in a courtroom in a graphic novel style image.

Credit: VentureBeat made with OpenAI DALL-E 3 via ChatGPT Plus

Anthropic, a major generative AI startup, laid out its case why accusations of copyright infringement from a group of music publishers and content owners are invalid in a new court filing on Wednesday.

In fall 2023, music publishers including Concord, Universal, and ABKCO filed a lawsuit against Anthropic accusing it of copyright infringement over its chatbot Claude (now supplanted by Claude 2).

The complaint, filed in federal court in Tennessee (home of Nashville, one of America’s “Music Cities,” and home to many labels and musicians), alleges that Anthropic’s business profits from “unlawfully” scraping song lyrics from the internet to train its AI models, which then reproduce the copyrighted lyrics for users in the form of chatbot responses.

Responding to a motion for preliminary injunction — a measure that, if granted by the court, would force Anthropic to stop making its Claude AI model available — Anthropic laid out familiar arguments that have emerged in numerous other copyright disputes involving AI training data.

Gen AI companies like OpenAI and Anthropic rely heavily on scraping massive amounts of publicly available data, including copyrighted works, to train their models but they maintain this use constitutes fair use under the law. It’s expected the question of data scraping copyright will reach the Supreme Court.



Song lyrics only a ‘minuscule fraction’ of training data

In its response, Anthropic argues its “use of Plaintiffs’ lyrics to train Claude is a transformative use” that adds “a further purpose or different character” to the original works.

To support this, the filing directly quotes Anthropic research director Jared Kaplan, stating the purpose is to “create a dataset to teach a neural network how human language works.”

Anthropic contends its conduct “has no ‘substantially adverse impact’ on a legitimate market for Plaintiffs’ copyrighted works,” noting that song lyrics make up “a minuscule fraction” of training data and that licensing data at the required scale is unworkable.

Joining OpenAI, Anthropic claims licensing the vast troves of text needed to properly train neural networks like Claude is technically and financially infeasible. Training demands trillions of snippets across genres, a licensing scale that may be unachievable for any party.

Perhaps the filing’s most novel argument claims the plaintiffs themselves, not Anthropic, engaged in the “volitional conduct” required for direct infringement liability regarding outputs.

“Volitional conduct” in copyright law refers to the idea that a person accused of infringement must be shown to have control over the infringing outputs. In this case, Anthropic is essentially saying that the label plaintiffs caused its AI model Claude to produce the infringing content, and thus are in control of and responsible for the infringement they report, as opposed to Anthropic or its Claude product, which responds automatically to user inputs.

The filing points to evidence the outputs were generated through the plaintiffs’ own “attacks” on Claude designed to elicit lyrics.



Irreparable harm?​

On top of contesting copyright liability, Anthropic maintains the plaintiffs cannot prove irreparable harm.

Citing a lack of evidence that song licensing revenues have decreased since Claude launched or that qualitative harms are “certain and immediate,” Anthropic pointed out that the publishers themselves believe monetary damages could make them whole, contradicting their own claims of “irreparable harm” (as, by definition, accepting monetary damages would indicate the harms do have a price that could be quantified and paid).

Anthropic asserts the “extraordinary relief” of an injunction against it and its AI models is unjustified given the plaintiffs’ weak showing of irreparable harm. It also argued that any output of lyrics by Claude was an unintentional “bug” that has now been fixed through new technological guardrails.

Specifically, Anthropic claims it has implemented additional safeguards in Claude to prevent any further display of the plaintiffs’ copyrighted song lyrics. Because the alleged infringing conduct cannot reasonably occur again, the model maker says the plaintiffs’ request for relief preventing Claude from outputting lyrics is moot.

It contends the music publishers’ request is overbroad, seeking to restrain use not just of the 500 representative works in the case, but millions of others that the publishers further claim to control.

As well, the AI start up pointed to the Tennessee venue and claimed the lawsuit was filed in the incorrect jurisdiction. Anthropic maintained that it has no relevant business connections to Tennessee. The company noted that its headquarters and principal operations are based in California.

Further, Anthropic stated that none of the allegedly infringing conduct cited in the suit, such as training its AI technology or providing user responses, took place within Tennessee’s borders.

The filing pointed out users of Anthropic’s products agreed any disputes would be litigated in California courts.



Copyright fight far from over​

The copyright battle in the burgeoning generative AI industry continues to intensify.

More artists have joined lawsuits against image generators like Midjourney and OpenAI’s DALL-E, bolstering their infringement claims with evidence drawn from diffusion model reconstructions.

The New York Times recently filed a copyright infringement lawsuit against OpenAI and Microsoft, alleging that their use of scraped Times’ content to train models for ChatGPT and other AI systems violated its copyrights. The suit calls for billions in damages and demands that any models or data trained on Times content be destroyed.

Amid these debates, a nonprofit group called “Fairly Trained” launched this week advocating for a “licensed model” certification for data used to train AI models — supported by Concord and Universal Music Group, among others.

Platforms have also stepped in, with Anthropic, Google and OpenAI, as well as content companies like Shutterstock and Adobe, pledging legal defenses for enterprise users of AI-generated content.

Creators are undaunted, though, fighting bids to dismiss claims from authors like Sarah Silverman against OpenAI. Judges will need to weigh technological progress against statutory rights in these nuanced disputes.

Furthermore, regulators are listening to worries over the scope of data mining. Lawsuits and congressional hearings may decide whether fair use shelters the appropriation of proprietary works, frustrating some parties while enabling others. Overall, negotiations to satisfy all involved seem inevitable as generative AI matures.

What comes next remains unclear, but this week’s filing suggests generative AI companies are coalescing around a core set of fair use and harm-based defenses, forcing courts to weigh technological progress against rights owners’ control.

As VentureBeat reported previously, no copyright plaintiffs so far have won a preliminary injunction in these types of AI disputes. Anthropic’s arguments aim to ensure this precedent will persist, at least through this stage in one of many ongoing legal battles. The endgame remains to be seen.
 

bnew

Stability AI releases Stable Code 3B to fill in blanks of AI-powered code generation​

Sean Michael Kerner @TechJournalist

January 16, 2024 4:24 PM

Computer monitor leaks strings of code out into the air.

Credit: VentureBeat made with Visual Electric/Stable Diffusion

Generative AI powered code generation is getting more powerful and more compact.


Stability AI, the vendor that is still perhaps best known for its Stable Diffusion text-to-image generative AI technology, today announced its first new AI model of 2024: the commercially licensed (via membership) Stable Code 3B.

As the model name implies, Stable Code 3B is a 3-billion-parameter model focused on code completion capabilities for software development.

At only 3 billion parameters, Stable Code 3B can run locally on laptops without dedicated GPUs while still providing competitive performance and capabilities against larger models like Meta’s CodeLLaMA 7B.

The push toward smaller, more compact and capable models is one that Stability AI began to push forward at the end of 2023 with models like StableLM Zephyr 3B for text generation.

Stability AI first previewed Stable Code in August 2023 with the code generation LLM’s initial release and has been steadily working on improving the technology ever since.



How Stability AI improved Stable Code 3B​

Stability AI has improved Stable Code in a number of ways since the initial release.

With the new Stable Code 3B not only does the model suggest new lines of code, but it can also fill in larger missing sections in existing code.

The ability to fill in missing sections of code is an advanced code completion capability known as Fill in the Middle (FIM).
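To illustrate what fill-in-the-middle prompting looks like in practice, here is a short sketch. The sentinel token names below are placeholders; the exact special tokens Stable Code 3B expects should be checked against its model card rather than assumed.

```python
# Illustration of the fill-in-the-middle (FIM) prompt pattern.
# The sentinel token names are placeholders, not Stable Code 3B's exact tokens.
PREFIX_TOK, SUFFIX_TOK, MIDDLE_TOK = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Ask the model to generate the code that belongs between `prefix`
    and `suffix` (everything emitted after MIDDLE_TOK)."""
    return f"{PREFIX_TOK}{prefix}{SUFFIX_TOK}{suffix}{MIDDLE_TOK}"

prefix = "def mean(xs):\n    total = "
suffix = "\n    return total / len(xs)\n"
print(build_fim_prompt(prefix, suffix))
# A completion such as "sum(xs)" would fill in the missing middle of the function.
```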

The training for the model was also optimized with an expanded context size using a technique known as Rotary Position Embeddings (RoPE), optionally allowing a context length of up to 100k tokens. The RoPE technique is one that other LLMs also use, including Meta’s Llama 2 Long.

Stable Code 3B is built on Stability AI’s Stable LM 3B natural language model. With further training focused on software engineering data, the model gained code completion skills while retaining strengths in general language tasks.

Its training data included code repositories, programmer forums, and other technical sources.

It also trained on 18 different programming languages, and Stability AI claims that Stable Code 3B demonstrates leading performance on benchmark tests across multiple languages.

The model covers popular languages like Python, Java, JavaScript, Go, Ruby, and C++. Early benchmarks indicate it matches or exceeds the completion quality of models over twice its size.

The market for generative AI code generation tools is competitive, with Meta’s CodeLLaMA 7B being one of the larger and more popular options.

On the 3-billion parameter side, the StarCoder LLM — which is co-developed as an open source effort with the participation of IBM, HuggingFace and ServiceNow — is another popular option.

Stability AI claims Stable Code 3B outperforms StarCoder across Python, C++, JavaScript, Java, PHP and Rust programming languages.



Part of Stability AI’s membership subscription offering​

Stable Code 3B is being made available for commercial use as part of Stability AI’s new membership subscription service that was first announced in December.

Members gain access to Stable Code 3B alongside other AI tools in Stability AI’s portfolio, including the SDXL Stable Diffusion image generation tools, StableLM Zephyr 3B for text content generation, Stable Audio for audio generation, and Stable Video for video generation.

(Image credit: Stability AI)
 

bnew

Runway Gen-2 adds multiple motion controls to AI videos with Multi Motion Brush​

Shubham Sharma @mr_bumss

January 18, 2024 7:44 AM

Screenshot of Runway Motion Brush

Image Credit: Runway

AI video is still in its early days, but the tech is advancing fast. Case in point: New York City-based Runway, a generative AI startup enabling individuals and enterprises to produce videos in different styles, today updated its Gen-2 foundation model with a new tool, Multi Motion Brush, which allows creators to add multiple directions and types of motion to their AI video creations.

The advance is a first of its kind among commercially available AI video products: rival products on the market at this stage simply use AI to add motion to the entire image or a single selected area, not multiple areas.

The offering builds upon the Motion Brush feature that first debuted in November 2023, which allowed creators to add only a single type of motion to their video at a time.

Multi Motion Brush was previewed earlier this year through Runway’s Creative Partners Program, which rewards select power users with unlimited plans and pre-release features. But it’s now available for every user of Gen-2, adding to the 30+ tools the model already has on offer for creative video producers.

The move strengthens the product in the rapidly growing creative AI market, which includes players such as Pika Labs and Leonardo AI.



What to expect from Motion Brush?​

The idea with Multi Motion Brush is simple: give users better control over the AI videos they generate by allowing them to add independent motion to areas of choice.

This could be anything, from the movement of a face to the direction of clouds in the sky.

The user starts by uploading a still image as a prompt and “painting” it with a digital brush controlled by their computer cursor.

The user then uses slider controls in Runway’s web interface to select which way they want the painted portions of their image to move and how much (intensity), with multiple paint colors each being controlled independently.

The user can adjust the horizontal, vertical and proximity sliders to define the direction in which the motion is supposed to be executed – left/right, up/down, or closer/further – and hit save.

“Each slider is controlled with a decimal point value with a range from -10 to +10. You can manually input numerical value, drag the text field left or right or use the sliders. If you need to reset everything, click the ‘Clear’ button to reset everything back to 0,” Runway notes on its website.
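Conceptually, each painted region carries its own independent set of motion values clamped to that -10 to +10 range. The snippet below is just an illustrative data structure, not Runway's API:

```python
# Illustrative representation of per-region motion controls (not Runway's API).
from dataclasses import dataclass

def clamp(value: float, low: float = -10.0, high: float = 10.0) -> float:
    return max(low, min(high, value))

@dataclass
class RegionMotion:
    name: str
    horizontal: float = 0.0  # left (-) / right (+)
    vertical: float = 0.0    # up/down axis
    proximity: float = 0.0   # further (-) / closer (+)

    def __post_init__(self):
        self.horizontal = clamp(self.horizontal)
        self.vertical = clamp(self.vertical)
        self.proximity = clamp(self.proximity)

brushes = [
    RegionMotion("face", horizontal=1.5),
    RegionMotion("clouds", horizontal=-4.0, vertical=2.0),
    RegionMotion("background", proximity=12.0),  # out-of-range value clamped to +10
]
print(brushes)
```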



Gen-2 has been getting improved controls​

The introduction of Multi Motion Brush strengthens the set of tools Runway has on offer to control the video outputs from the Gen-2 model.

Originally unveiled in March 2023, the model introduced text, video and image-based generation and came as a major upgrade over Gen-1, which only supported video-based outputs.

However, in the initial stage, it generated clips only up to four seconds. This changed in August when the company added the ability to extend clips up to 18 seconds.

It also debuted additional features such as a “Director Mode,” allowing users to choose the direction and intensity/speed of the camera movement in generated videos, as well as options to choose the style of the video to be produced – from 3D cartoon and render to cinematic to advertising.

In the space of AI-driven video generation, Runway takes on players like Pika Labs, which recently debuted its web platform Pika 1.0 for video generation, as well as Stability AI’s Stable Video Diffusion models.

The company also offers a text-to-image tool, which takes on offerings like Midjourney and DALL-E 3. However, it is important to note that while the outputs from these tools have improved over time, they are still not perfect and can generate images/videos that are blurred, incomplete or inconsistent in different ways.
 

bnew



Computer Science > Computer Vision and Pattern Recognition​

[Submitted on 18 Jan 2024]

The Manga Whisperer: Automatically Generating Transcriptions for Comics​

Ragav Sachdeva, Andrew Zisserman
In the past few decades, Japanese comics, commonly referred to as Manga, have transcended both cultural and linguistic boundaries to become a true worldwide sensation. Yet, the inherent reliance on visual cues and illustration within manga renders it largely inaccessible to individuals with visual impairments. In this work, we seek to address this substantial barrier, with the aim of ensuring that manga can be appreciated and actively engaged by everyone. Specifically, we tackle the problem of diarisation i.e. generating a transcription of who said what and when, in a fully automatic way.
To this end, we make the following contributions: (1) we present a unified model, Magi, that is able to (a) detect panels, text boxes and character boxes, (b) cluster characters by identity (without knowing the number of clusters apriori), and (c) associate dialogues to their speakers; (2) we propose a novel approach that is able to sort the detected text boxes in their reading order and generate a dialogue transcript; (3) we annotate an evaluation benchmark for this task using publicly available [English] manga pages. The code, evaluation datasets and the pre-trained model can be found at: this https URL.
Subjects:Computer Vision and Pattern Recognition (cs.CV)
Cite as:arXiv:2401.10224 [cs.CV]
(or arXiv:2401.10224v1 [cs.CV] for this version)

Submission history​

From: Ragav Sachdeva [view email]
[v1] Thu, 18 Jan 2024 18:59:09 UTC (34,898 KB)



About​

Generate a transcript for your favourite Manga: Detect manga characters, text blocks and panels. Order panels. Cluster characters. Match texts to their speakers. Perform OCR.
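To make the pipeline shape concrete, here is a rough illustrative outline (not the Magi codebase) of how detections could be turned into a "who said what, and when" transcript once panels are ordered, characters clustered, and text OCR'd:

```python
# Illustrative transcript assembly from (assumed) upstream detections; not Magi's code.
from dataclasses import dataclass

@dataclass
class TranscriptLine:
    panel_index: int
    speaker_cluster: int  # anonymous character identity, e.g. "Character 2"
    text: str

def transcribe(detections: list[dict]) -> list[TranscriptLine]:
    """`detections` is assumed to hold, per text box: the panel's reading order,
    the box's order within the panel, the matched character cluster, and OCR text."""
    ordered = sorted(detections, key=lambda d: (d["panel_order"], d["box_order"]))
    return [TranscriptLine(d["panel_order"], d["character_cluster"], d["ocr_text"])
            for d in ordered]

sample = [
    {"panel_order": 1, "box_order": 0, "character_cluster": 2, "ocr_text": "We meet again."},
    {"panel_order": 0, "box_order": 0, "character_cluster": 1, "ocr_text": "Who's there?"},
]
for line in transcribe(sample):
    print(f"[panel {line.panel_index}] Character {line.speaker_cluster}: {line.text}")
```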

The Manga Whisperer: Automatically Generating Transcriptions for Comics

[arXiv]

Ragav Sachdeva, Andrew Zisserman
