Wargames

One Of The Last Real Ones To Do It
Joined
Apr 1, 2013
Messages
24,268
Reputation
4,165
Daps
90,505
Reppin
New York City

NY Times copyright suit wants OpenAI to delete all GPT instances​

Shows evidence that GPT-based systems will reproduce Times articles if asked.​

JOHN TIMMER - 12/27/2023, 2:05 PM

[Image: a CPU on a motherboard. Caption: Microsoft is named in the suit for allegedly building the system that allowed GPT derivatives to be trained using infringing material. Credit: Just_Super]

In August, word leaked out that The New York Times was considering joining the growing legion of creators that are suing AI companies for misappropriating their content. The Times had reportedly been negotiating with OpenAI regarding the potential to license its material, but those talks had not gone smoothly. So, four months after the company was reportedly considering suing, the suit has now been filed.

The Times is targeting various companies under the OpenAI umbrella, as well as Microsoft, an OpenAI partner that both uses GPT to power its Copilot service and helped provide the infrastructure for training the GPT large language models. But the suit goes well beyond the use of copyrighted material in training, alleging that OpenAI-powered software will happily circumvent the Times' paywall and ascribe hallucinated misinformation to the Times.

Journalism is expensive​

The suit notes that The Times maintains a large staff that allows it to dedicate reporters to a huge range of beats and engage in important investigative journalism, among other things. Because of those investments, the newspaper is often considered an authoritative source on many matters.

All of that costs money, and The Times earns that by limiting access to its reporting through a robust paywall. In addition, each print edition has a copyright notification, the Times' terms of service limit the copying and use of any published material, and it can be selective about how it licenses its stories. Beyond driving revenue, these restrictions also help it to maintain its reputation as an authoritative voice by controlling how its works appear.

The suit alleges that OpenAI-developed tools undermine all of that. "By providing Times content without The Times’s permission or authorization, Defendants’ tools undermine and damage The Times’s relationship with its readers and deprive The Times of subscription, licensing, advertising, and affiliate revenue," the suit alleges.

Part of the unauthorized use The Times alleges came during the training of various versions of GPT. Prior to GPT-3.5, information about the training dataset was made public. One of the sources used is a large collection of online material called "Common Crawl," which the suit alleges contains 16 million unique records of content from sites published by The Times. That places the Times as the third most referenced source, behind Wikipedia and a database of US patents.

OpenAI no longer discloses as many details about the data used to train recent GPT versions, but all indications are that full-text New York Times articles are still part of that process. (Much more on that in a moment.) Expect access to training information to be a major issue during discovery if this case moves forward.

Not just training​

A number of suits have been filed regarding the use of copyrighted material during training of AI systems. But the Times' suit goes well beyond that to show how the material ingested during training can come back out during use. "Defendants’ GenAI tools can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples," the suit alleges.

The suit alleges—and we were able to verify—that it's comically easy to get GPT-powered systems to offer up content that is normally protected by the Times' paywall. The suit shows a number of examples of GPT-4 reproducing large sections of articles nearly verbatim.

The suit includes screenshots of ChatGPT being given the title of a piece at The New York Times and asked for the first paragraph, which it delivers. Getting the ensuing text is apparently as simple as repeatedly asking for the next paragraph.

OpenAI has apparently closed that loophole sometime between the preparation of the suit and the present. We entered some of the prompts shown in the suit and were advised, "I recommend checking The New York Times website or other reputable sources," although we can't rule out that context provided prior to that prompt could produce copyrighted material.

[Screenshot caption: Ask for a paragraph, and Copilot will hand you a wall of normally paywalled text. Credit: John Timmer]

But not all loopholes have been closed. The suit also shows output from Bing Chat, since rebranded as Copilot. We were able to verify that asking for the first paragraph of a specific article at The Times caused Copilot to reproduce the first third of the article.

The suit is dismissive of attempts to justify this as a form of fair use. "Publicly, Defendants insist that their conduct is protected as 'fair use' because their unlicensed use of copyrighted content to train GenAI models serves a new 'transformative' purpose," the suit notes. "But there is nothing 'transformative' about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it."

Reputational and other damages​

The hallucinations common to AI also came under fire in the suit for potentially damaging the value of the Times' reputation, and possibly damaging human health as a side effect. "A GPT model completely fabricated that “The New York Times published an article on January 10, 2020, titled ‘Study Finds Possible Link between Orange Juice and Non-Hodgkin’s Lymphoma,’” the suit alleges. "The Times never published such an article."

Similarly, asking about a Times article on heart-healthy foods allegedly resulted in Copilot saying it contained a list of examples (which it didn't). When asked for the list, 80 percent of the foods on it weren't even mentioned in the original article. In another case, recommendations were ascribed to the Wirecutter when the products hadn't even been reviewed by its staff.

As with the Times material, it's alleged that it's possible to get Copilot to offer up large chunks of Wirecutter articles (The Wirecutter is owned by The New York Times). But the suit notes that these article excerpts have the affiliate links stripped out of them, depriving the Wirecutter of its primary source of revenue.

The suit targets various OpenAI companies for developing the software, as well as Microsoft—the latter for both offering OpenAI-powered services, and for having developed the computing systems that enabled the copyrighted material to be ingested during training. Allegations include direct, contributory, and vicarious copyright infringement, as well as DMCA and trademark violations. Finally, it alleges "Common Law Unfair Competition By Misappropriation."

The suit seeks nothing less than the erasure of any GPT instances that the parties have trained using material from the Times, as well as the destruction of the datasets that were used for the training. It also asks for a permanent injunction to prevent similar conduct in the future. The Times also wants money, lots and lots of money: "statutory damages, compensatory damages, restitution, disgorgement, and any other relief that may be permitted by law or equity."
We’re going to see underground deep web AI really soon and they won’t respect any copyright laws.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,805
Reputation
7,926
Daps
148,749





Computer Science > Artificial Intelligence​

[Submitted on 17 Dec 2023 (v1), last revised 26 Dec 2023 (this version, v4)]

A Survey of Reasoning with Foundation Models​

Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, Zhenguo Li
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI.
Comments: 20 Figures, 160 Pages, 750+ References, Project Page this https URL
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2312.11562 [cs.AI]
(or arXiv:2312.11562v4 [cs.AI] for this version)

Submission history​

From: Ruihang Chu [view email]
[v1] Sun, 17 Dec 2023 15:16:13 UTC (3,868 KB)
[v2] Wed, 20 Dec 2023 07:25:58 UTC (3,867 KB)
[v3] Thu, 21 Dec 2023 13:21:59 UTC (3,870 KB)
[v4] Tue, 26 Dec 2023 11:31:54 UTC (3,872 KB)


 

Wargames

One Of The Last Real Ones To Do It
Joined
Apr 1, 2013
Messages
24,268
Reputation
4,165
Daps
90,505
Reppin
New York City
Have any of y’all messed with the MLC native app? It sounds like it’s doing what Apple wants to do by hosting the LLM locally on your phone.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,805
Reputation
7,926
Daps
148,749



Computer Science > Computer Vision and Pattern Recognition​

[Submitted on 21 Dec 2023 (v1), last revised 22 Dec 2023 (this version, v2)]

AppAgent: Multimodal Agents as Smartphone Users​

Chi Zhang, Zhao Yang, Jiaxuan Liu, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu
Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks. This paper introduces a novel LLM-based multimodal agent framework designed to operate smartphone applications. Our framework enables the agent to operate smartphone applications through a simplified action space, mimicking human-like interactions such as tapping and swiping. This novel approach bypasses the need for system back-end access, thereby broadening its applicability across diverse apps. Central to our agent's functionality is its innovative learning method. The agent learns to navigate and use new apps either through autonomous exploration or by observing human demonstrations. This process generates a knowledge base that the agent refers to for executing complex tasks across different applications. To demonstrate the practicality of our agent, we conducted extensive testing over 50 tasks in 10 different applications, including social media, email, maps, shopping, and sophisticated image editing tools. The results affirm our agent's proficiency in handling a diverse array of high-level tasks.
Comments: Project Page is this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2312.13771 [cs.CV]
(or arXiv:2312.13771v2 [cs.CV] for this version)

Submission history​

From: Yucheng Han [view email]
[v1] Thu, 21 Dec 2023 11:52:45 UTC (10,766 KB)
[v2] Fri, 22 Dec 2023 02:29:17 UTC (10,766 KB)



About​

AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.


Introduction​

We introduce a novel LLM-based multimodal agent framework designed to operate smartphone applications.

Our framework enables the agent to operate smartphone applications through a simplified action space, mimicking human-like interactions such as tapping and swiping. This novel approach bypasses the need for system back-end access, thereby broadening its applicability across diverse apps.

Central to our agent's functionality is its innovative learning method. The agent learns to navigate and use new apps either through autonomous exploration or by observing human demonstrations. This process generates a knowledge base that the agent refers to for executing complex tasks across different applications.
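
To make the "simplified action space" idea concrete, here is a minimal Python sketch of that kind of agent loop: observe a labeled screenshot, let a multimodal LLM pick one action from a tap/swipe/type vocabulary (consulting notes learned during exploration), execute it, and repeat. All class and function names are illustrative, and the LLM call and device I/O are stubbed out, so this is only a sketch of the idea, not AppAgent's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str            # "tap", "swipe", "type", or "stop"
    element: int = -1    # numeric label overlaid on the screenshot
    text: str = ""       # only used when kind == "type"


@dataclass
class KnowledgeBase:
    # element label -> note learned through exploration or human demonstration
    notes: dict = field(default_factory=dict)

    def add(self, label: int, note: str) -> None:
        self.notes[label] = note


def capture_screenshot() -> bytes:
    """Stub: a real agent would grab and label the current screen (e.g. via adb)."""
    return b""


def query_llm(screenshot: bytes, task: str, kb: KnowledgeBase) -> Action:
    """Stub: a real agent would send the screenshot, the task, and kb.notes to a
    multimodal LLM and parse its structured reply into an Action."""
    return Action(kind="stop")


def execute(action: Action) -> None:
    """Stub: a real agent would translate the action into device input events."""
    print(f"executing {action.kind} on element {action.element}")


def run_task(task: str, kb: KnowledgeBase, max_steps: int = 10) -> None:
    # The core loop: observe, ask the model for one action, act, repeat.
    for _ in range(max_steps):
        action = query_llm(capture_screenshot(), task, kb)
        if action.kind == "stop":
            break
        execute(action)


if __name__ == "__main__":
    run_task("send a message to Alice", KnowledgeBase())
```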


 
Last edited:

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,805
Reputation
7,926
Daps
148,749

Computer Science > Computer Vision and Pattern Recognition​

[Submitted on 14 Dec 2023 (this version), latest version 21 Dec 2023 (v2)]

CogAgent: A Visual Language Model for GUI Agents​

Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxiao Dong, Ming Ding, Jie Tang
People are spending an enormous amount of time on digital devices through graphical user interfaces (GUIs), e.g., computer or smartphone screens. Large language models (LLMs) such as ChatGPT can assist people in tasks like writing emails, but struggle to understand and interact with GUIs, thus limiting their potential to increase automation levels. In this paper, we introduce CogAgent, an 18-billion-parameter visual language model (VLM) specializing in GUI understanding and navigation. By utilizing both low-resolution and high-resolution image encoders, CogAgent supports input at a resolution of 1120*1120, enabling it to recognize tiny page elements and text. As a generalist visual language model, CogAgent achieves the state of the art on five text-rich and four general VQA benchmarks, including VQAv2, OK-VQA, Text-VQA, ST-VQA, ChartQA, infoVQA, DocVQA, MM-Vet, and POPE. CogAgent, using only screenshots as input, outperforms LLM-based methods that consume extracted HTML text on both PC and Android GUI navigation tasks -- Mind2Web and AITW, advancing the state of the art. The model and codes are available at \url{this https URL}.
Comments: 27 pages, 19 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2312.08914 [cs.CV]
(or arXiv:2312.08914v1 [cs.CV] for this version)

Submission history

From: Wenyi Hong [view email]
[v1] Thu, 14 Dec 2023 13:20:57 UTC (11,917 KB)
[v2] Thu, 21 Dec 2023 09:41:25 UTC (11,917 KB)
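
As a rough illustration of the dual-resolution input described in the abstract, the toy PyTorch module below encodes a downsampled 224x224 view of the screen for overall layout alongside the full 1120x1120 screenshot for tiny text, then fuses the two token streams. The module names, layer choices, and the naive fusion step are assumptions for illustration only, not CogAgent's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualResolutionEncoder(nn.Module):
    """Toy stand-in for a low-res + high-res image encoder pair (not CogAgent's code)."""

    def __init__(self, dim: int = 256):
        super().__init__()
        # Simple patchifying conv stems instead of real ViT encoders.
        self.low_res_encoder = nn.Conv2d(3, dim, kernel_size=16, stride=16)   # runs on 224x224
        self.high_res_encoder = nn.Conv2d(3, dim, kernel_size=56, stride=56)  # runs on 1120x1120
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, image_1120: torch.Tensor) -> torch.Tensor:
        # Low-resolution branch: a downsampled view captures overall page layout.
        low = F.interpolate(image_1120, size=(224, 224), mode="bilinear", align_corners=False)
        low_tokens = self.low_res_encoder(low).flatten(2).transpose(1, 2)           # (B, 196, dim)
        # High-resolution branch: the full 1120x1120 input preserves tiny text and icons.
        high_tokens = self.high_res_encoder(image_1120).flatten(2).transpose(1, 2)  # (B, 400, dim)
        # Naive fusion for illustration only: broadcast a pooled high-res summary
        # onto every low-res token. The real model fuses the branches far more carefully.
        high_summary = high_tokens.mean(dim=1, keepdim=True).expand_as(low_tokens)
        return self.fuse(torch.cat([low_tokens, high_summary], dim=-1))


if __name__ == "__main__":
    screenshot = torch.randn(1, 3, 1120, 1120)
    print(DualResolutionEncoder()(screenshot).shape)  # torch.Size([1, 196, 256])
```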


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,805
Reputation
7,926
Daps
148,749

EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM​

Chong Zhou, Xiangtai Li, Chen Change Loy, Bo Dai
This paper presents EdgeSAM, an accelerated variant of the Segment Anything Model (SAM), optimized for efficient execution on edge devices with minimal compromise in performance. Our approach involves distilling the original ViT-based SAM image encoder into a purely CNN-based architecture, better suited for edge devices. We carefully benchmark various distillation strategies and demonstrate that task-agnostic encoder distillation fails to capture the full knowledge embodied in SAM. To overcome this bottleneck, we include both the prompt encoder and mask decoder in the distillation process, with box and point prompts in the loop, so that the distilled model can accurately capture the intricate dynamics between user input and mask generation. To mitigate dataset bias issues stemming from point prompt distillation, we incorporate a lightweight module within the encoder. EdgeSAM achieves a 40-fold speed increase compared to the original SAM, and it also outperforms MobileSAM, being 14 times as fast when deployed on edge devices while enhancing the mIoUs on COCO and LVIS by 2.3 and 3.2 respectively. It is also the first SAM variant that can run at over 30 FPS on an iPhone 14. Code and models are available at this https URL.
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2312.06660 [cs.CV]
(or arXiv:2312.06660v1 [cs.CV] for this version)

Submission history

From: Chong Zhou [view email]
[v1] Mon, 11 Dec 2023 18:59:52 UTC (9,665 KB)
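
A rough sketch of what a "prompt-in-the-loop" distillation step could look like, based on the abstract: the student CNN encoder is trained to match not only the teacher's image features but also the masks that the shared, frozen prompt encoder and mask decoder produce from sampled box/point prompts. The function signature and loss choices here are illustrative guesses, not EdgeSAM's published training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def distill_step(
    teacher_encoder: nn.Module,   # frozen ViT-based SAM image encoder
    student_encoder: nn.Module,   # lightweight CNN encoder being trained
    prompt_encoder,               # frozen SAM prompt encoder (callable on prompts)
    mask_decoder,                 # frozen SAM mask decoder (callable on feats, prompt_embed)
    images: torch.Tensor,         # (B, 3, H, W) batch of images
    prompts: torch.Tensor,        # sampled box/point prompts for this batch
    optimizer: torch.optim.Optimizer,
) -> float:
    # Teacher pass: produce reference features and masks without gradients.
    with torch.no_grad():
        teacher_feats = teacher_encoder(images)
        prompt_embed = prompt_encoder(prompts)
        teacher_masks = mask_decoder(teacher_feats, prompt_embed)

    # Student pass: same prompts, same frozen decoder, new encoder features.
    student_feats = student_encoder(images)
    student_masks = mask_decoder(student_feats, prompt_embed)

    # Align both the decoded masks (prompts in the loop) and the encoder features.
    loss = F.mse_loss(student_masks, teacher_masks) + F.mse_loss(student_feats, teacher_feats)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```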
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,805
Reputation
7,926
Daps
148,749

AP Workflow 7.0 for ComfyUI​

Stable Diffusion is an AI model able to generate images from text instructions written in natural language (text-to-image, txt2img, or t2i), or from existing images used as guidance (image-to-image, img2img, or i2i).

When an AI model like Stable Diffusion is paired with an automation engine, like ComfyUI, it allows individuals and organizations to accomplish extraordinary things.

Individual artists and small design studios can use it to create very complex images in a matter of minutes, instead of hours or days. Large organizations can use it to generate or modify images and videos at industrial scale for commercial applications.
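
Outside ComfyUI, the two modes described above look roughly like this with the Hugging Face diffusers library; the checkpoint ID, prompts, and file names are purely illustrative, and a CUDA GPU is assumed.

```python
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
from PIL import Image

model_id = "runwayml/stable-diffusion-v1-5"  # illustrative checkpoint

# Text-to-image (txt2img): generate an image from a natural-language instruction.
txt2img = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
image = txt2img("a watercolor painting of the New York City skyline at dusk").images[0]
image.save("txt2img.png")

# Image-to-image (img2img): use an existing image as guidance; `strength`
# controls how far the result is allowed to drift from it.
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
init = Image.open("txt2img.png").convert("RGB")
variant = img2img(prompt="the same skyline in heavy snow", image=init, strength=0.6).images[0]
variant.save("img2img.png")
```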

To study and experiment with the enormous power of automation applied to AI, Alessandro created the AP Workflow.

AP Workflow is now used by organizations all around the world to power enterprise and consumer applications.
 