Wargames

One Of The Last Real Ones To Do It
Joined
Apr 1, 2013
Messages
24,268
Reputation
4,165
Daps
90,505
Reppin
New York City

NY Times copyright suit wants OpenAI to delete all GPT instances​

Shows evidence that GPT-based systems will reproduce Times articles if asked.​

JOHN TIMMER - 12/27/2023, 2:05 PM

[Image: a CPU on a motherboard. Caption: Microsoft is named in the suit for allegedly building the system that allowed GPT derivatives to be trained using infringing material. Credit: Just_Super]

In August, word leaked out that The New York Times was considering joining the growing legion of creators that are suing AI companies for misappropriating their content. The Times had reportedly been negotiating with OpenAI regarding the potential to license its material, but those talks had not gone smoothly. So, four months after the company was reportedly considering suing, the suit has now been filed.

The Times is targeting various companies under the OpenAI umbrella, as well as Microsoft, an OpenAI partner that both uses GPT to power its Copilot service and helped provide the infrastructure for training the GPT large language models. But the suit goes well beyond the use of copyrighted material in training, alleging that OpenAI-powered software will happily circumvent the Times' paywall and ascribe hallucinated misinformation to the Times.

Journalism is expensive​

The suit notes that The Times maintains a large staff that allows it to dedicate reporters to a huge range of beats and engage in important investigative journalism, among other things. Because of those investments, the newspaper is often considered an authoritative source on many matters.

All of that costs money, and The Times earns that by limiting access to its reporting through a robust paywall. In addition, each print edition has a copyright notification, the Times' terms of service limit the copying and use of any published material, and it can be selective about how it licenses its stories. Beyond driving revenue, these restrictions also help it to maintain its reputation as an authoritative voice by controlling how its works appear.

The suit alleges that OpenAI-developed tools undermine all of that. "By providing Times content without The Times’s permission or authorization, Defendants’ tools undermine and damage The Times’s relationship with its readers and deprive The Times of subscription, licensing, advertising, and affiliate revenue," the suit alleges.

Part of the unauthorized use The Times alleges came during the training of various versions of GPT. Prior to GPT-3.5, information about the training dataset was made public. One of the sources used is a large collection of online material called "Common Crawl," which the suit alleges contains 16 million unique records of content from sites published by The Times. That places the Times as the third most referenced source, behind Wikipedia and a database of US patents.

OpenAI no longer discloses as many details about the data used to train recent GPT versions, but all indications are that full-text New York Times articles are still part of that process. (Much more on that in a moment.) Expect access to training information to be a major issue during discovery if this case moves forward.

Not just training​

A number of suits have been filed regarding the use of copyrighted material during training of AI systems. But the Times' suit goes well beyond that to show how the material ingested during training can come back out during use. "Defendants’ GenAI tools can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples," the suit alleges.

The suit alleges—and we were able to verify—that it's comically easy to get GPT-powered systems to offer up content that is normally protected by the Times' paywall. The suit shows a number of examples of GPT-4 reproducing large sections of articles nearly verbatim.

The suit includes screenshots of ChatGPT being given the title of a piece at The New York Times and asked for the first paragraph, which it delivers. Getting the ensuing text is apparently as simple as repeatedly asking for the next paragraph.

OpenAI has apparently closed that loophole sometime between the preparation of the suit and the present. We entered some of the prompts shown in the suit and were advised, "I recommend checking The New York Times website or other reputable sources," although we can't rule out that context provided prior to that prompt could produce copyrighted material.

[Screenshot caption: Ask for a paragraph, and Copilot will hand you a wall of normally paywalled text. Credit: John Timmer]

But not all loopholes have been closed. The suit also shows output from Bing Chat, since rebranded as Copilot. We were able to verify that asking for the first paragraph of a specific article at The Times caused Copilot to reproduce the first third of the article.

The suit is dismissive of attempts to justify this as a form of fair use. "Publicly, Defendants insist that their conduct is protected as 'fair use' because their unlicensed use of copyrighted content to train GenAI models serves a new 'transformative' purpose," the suit notes. "But there is nothing 'transformative' about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it."

Reputational and other damages​

The hallucinations common to AI also came under fire in the suit for potentially damaging the value of the Times' reputation, and possibly damaging human health as a side effect. "A GPT model completely fabricated that “The New York Times published an article on January 10, 2020, titled ‘Study Finds Possible Link between Orange Juice and Non-Hodgkin’s Lymphoma,’” the suit alleges. "The Times never published such an article."

Similarly, asking about a Times article on heart-healthy foods allegedly resulted in Copilot saying it contained a list of examples (which it didn't). When asked for the list, 80 percent of the foods on it weren't even mentioned in the original article. In another case, recommendations were ascribed to the Wirecutter when the products hadn't even been reviewed by its staff.

As with the Times material, it's alleged that it's possible to get Copilot to offer up large chunks of Wirecutter articles (The Wirecutter is owned by The New York Times). But the suit notes that these article excerpts have the affiliate links stripped out of them, depriving the Wirecutter of its primary source of revenue.

The suit targets various OpenAI companies for developing the software, as well as Microsoft—the latter for both offering OpenAI-powered services, and for having developed the computing systems that enabled the copyrighted material to be ingested during training. Allegations include direct, contributory, and vicarious copyright infringement, as well as DMCA and trademark violations. Finally, it alleges "Common Law Unfair Competition By Misappropriation."

The suit seeks nothing less than the erasure of any GPT instances that the parties have trained using material from the Times, as well as the destruction of the datasets that were used for the training. It also asks for a permanent injunction to prevent similar conduct in the future. The Times also wants money, lots and lots of money: "statutory damages, compensatory damages, restitution, disgorgement, and any other relief that may be permitted by law or equity."
We’re going to see underground deep web AI really soon and they won’t respect any copyright laws.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,805
Reputation
7,926
Daps
148,749





Computer Science > Artificial Intelligence​

[Submitted on 17 Dec 2023 (v1), last revised 26 Dec 2023 (this version, v4)]

A Survey of Reasoning with Foundation Models​

Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, Zhenguo Li
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI.
Comments: 20 Figures, 160 Pages, 750+ References, Project Page this https URL
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2312.11562 [cs.AI]
(or arXiv:2312.11562v4 [cs.AI] for this version)

Submission history​

From: Ruihang Chu [view email]
[v1] Sun, 17 Dec 2023 15:16:13 UTC (3,868 KB)
[v2] Wed, 20 Dec 2023 07:25:58 UTC (3,867 KB)
[v3] Thu, 21 Dec 2023 13:21:59 UTC (3,870 KB)
[v4] Tue, 26 Dec 2023 11:31:54 UTC (3,872 KB)


 

Wargames

One Of The Last Real Ones To Do It
Joined
Apr 1, 2013
Messages
24,268
Reputation
4,165
Daps
90,505
Reppin
New York City
Have any of y’all messed with the MLC native app? It sounds like it’s doing what Apple wants to do by hosting the LLM locally on your phone.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,805
Reputation
7,926
Daps
148,749



Computer Science > Computer Vision and Pattern Recognition​

[Submitted on 21 Dec 2023 (v1), last revised 22 Dec 2023 (this version, v2)]

AppAgent: Multimodal Agents as Smartphone Users​

Chi Zhang, Zhao Yang, Jiaxuan Liu, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu
Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks. This paper introduces a novel LLM-based multimodal agent framework designed to operate smartphone applications. Our framework enables the agent to operate smartphone applications through a simplified action space, mimicking human-like interactions such as tapping and swiping. This novel approach bypasses the need for system back-end access, thereby broadening its applicability across diverse apps. Central to our agent's functionality is its innovative learning method. The agent learns to navigate and use new apps either through autonomous exploration or by observing human demonstrations. This process generates a knowledge base that the agent refers to for executing complex tasks across different applications. To demonstrate the practicality of our agent, we conducted extensive testing over 50 tasks in 10 different applications, including social media, email, maps, shopping, and sophisticated image editing tools. The results affirm our agent's proficiency in handling a diverse array of high-level tasks.
Comments: Project Page is this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2312.13771 [cs.CV]
(or arXiv:2312.13771v2 [cs.CV] for this version)

Submission history​

From: Yucheng Han [view email]
[v1] Thu, 21 Dec 2023 11:52:45 UTC (10,766 KB)
[v2] Fri, 22 Dec 2023 02:29:17 UTC (10,766 KB)



About​

AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.


Introduction​

We introduce a novel LLM-based multimodal agent framework designed to operate smartphone applications.

Our framework enables the agent to operate smartphone applications through a simplified action space, mimicking human-like interactions such as tapping and swiping. This novel approach bypasses the need for system back-end access, thereby broadening its applicability across diverse apps.

Central to our agent's functionality is its innovative learning method. The agent learns to navigate and use new apps either through autonomous exploration or by observing human demonstrations. This process generates a knowledge base that the agent refers to for executing complex tasks across different applications.
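
To make the "simplified action space" idea concrete, here is a minimal Python sketch of that kind of agent loop: observe a labeled screenshot, let a multimodal LLM pick one action from a tap/swipe/type vocabulary (consulting notes learned during exploration), execute it, and repeat. All class and function names are illustrative, and the LLM call and device I/O are stubbed out, so this is only a sketch of the idea, not AppAgent's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str            # "tap", "swipe", "type", or "stop"
    element: int = -1    # numeric label overlaid on the screenshot
    text: str = ""       # only used when kind == "type"


@dataclass
class KnowledgeBase:
    # element label -> note learned through exploration or human demonstration
    notes: dict = field(default_factory=dict)

    def add(self, label: int, note: str) -> None:
        self.notes[label] = note


def capture_screenshot() -> bytes:
    """Stub: a real agent would grab and label the current screen (e.g. via adb)."""
    return b""


def query_llm(screenshot: bytes, task: str, kb: KnowledgeBase) -> Action:
    """Stub: a real agent would send the screenshot, the task, and kb.notes to a
    multimodal LLM and parse its structured reply into an Action."""
    return Action(kind="stop")


def execute(action: Action) -> None:
    """Stub: a real agent would translate the action into device input events."""
    print(f"executing {action.kind} on element {action.element}")


def run_task(task: str, kb: KnowledgeBase, max_steps: int = 10) -> None:
    # The core loop: observe, ask the model for one action, act, repeat.
    for _ in range(max_steps):
        action = query_llm(capture_screenshot(), task, kb)
        if action.kind == "stop":
            break
        execute(action)


if __name__ == "__main__":
    run_task("send a message to Alice", KnowledgeBase())
```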


 
Last edited:

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,805
Reputation
7,926
Daps
148,749

Computer Science > Computer Vision and Pattern Recognition​

[Submitted on 14 Dec 2023 (this version), latest version 21 Dec 2023 (v2)]

CogAgent: A Visual Language Model for GUI Agents​

Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxiao Dong, Ming Ding, Jie Tang
People are spending an enormous amount of time on digital devices through graphical user interfaces (GUIs), e.g., computer or smartphone screens. Large language models (LLMs) such as ChatGPT can assist people in tasks like writing emails, but struggle to understand and interact with GUIs, thus limiting their potential to increase automation levels. In this paper, we introduce CogAgent, an 18-billion-parameter visual language model (VLM) specializing in GUI understanding and navigation. By utilizing both low-resolution and high-resolution image encoders, CogAgent supports input at a resolution of 1120*1120, enabling it to recognize tiny page elements and text. As a generalist visual language model, CogAgent achieves the state of the art on five text-rich and four general VQA benchmarks, including VQAv2, OK-VQA, Text-VQA, ST-VQA, ChartQA, infoVQA, DocVQA, MM-Vet, and POPE. CogAgent, using only screenshots as input, outperforms LLM-based methods that consume extracted HTML text on both PC and Android GUI navigation tasks -- Mind2Web and AITW, advancing the state of the art. The model and codes are available at \url{this https URL}.
Comments: 27 pages, 19 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2312.08914 [cs.CV]
(or arXiv:2312.08914v1 [cs.CV] for this version)

Submission history

From: Wenyi Hong [view email]
[v1] Thu, 14 Dec 2023 13:20:57 UTC (11,917 KB)
[v2] Thu, 21 Dec 2023 09:41:25 UTC (11,917 KB)
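
As a rough illustration of the dual-resolution input described in the abstract, the toy PyTorch module below encodes a downsampled 224x224 view of the screen for overall layout alongside the full 1120x1120 screenshot for tiny text, then fuses the two token streams. The module names, layer choices, and the naive fusion step are assumptions for illustration only, not CogAgent's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualResolutionEncoder(nn.Module):
    """Toy stand-in for a low-res + high-res image encoder pair (not CogAgent's code)."""

    def __init__(self, dim: int = 256):
        super().__init__()
        # Simple patchifying conv stems instead of real ViT encoders.
        self.low_res_encoder = nn.Conv2d(3, dim, kernel_size=16, stride=16)   # runs on 224x224
        self.high_res_encoder = nn.Conv2d(3, dim, kernel_size=56, stride=56)  # runs on 1120x1120
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, image_1120: torch.Tensor) -> torch.Tensor:
        # Low-resolution branch: a downsampled view captures overall page layout.
        low = F.interpolate(image_1120, size=(224, 224), mode="bilinear", align_corners=False)
        low_tokens = self.low_res_encoder(low).flatten(2).transpose(1, 2)           # (B, 196, dim)
        # High-resolution branch: the full 1120x1120 input preserves tiny text and icons.
        high_tokens = self.high_res_encoder(image_1120).flatten(2).transpose(1, 2)  # (B, 400, dim)
        # Naive fusion for illustration only: broadcast a pooled high-res summary
        # onto every low-res token. The real model fuses the branches far more carefully.
        high_summary = high_tokens.mean(dim=1, keepdim=True).expand_as(low_tokens)
        return self.fuse(torch.cat([low_tokens, high_summary], dim=-1))


if __name__ == "__main__":
    screenshot = torch.randn(1, 3, 1120, 1120)
    print(DualResolutionEncoder()(screenshot).shape)  # torch.Size([1, 196, 256])
```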


 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,805
Reputation
7,926
Daps
148,749

EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM​

Chong Zhou, Xiangtai Li, Chen Change Loy, Bo Dai
This paper presents EdgeSAM, an accelerated variant of the Segment Anything Model (SAM), optimized for efficient execution on edge devices with minimal compromise in performance. Our approach involves distilling the original ViT-based SAM image encoder into a purely CNN-based architecture, better suited for edge devices. We carefully benchmark various distillation strategies and demonstrate that task-agnostic encoder distillation fails to capture the full knowledge embodied in SAM. To overcome this bottleneck, we include both the prompt encoder and mask decoder in the distillation process, with box and point prompts in the loop, so that the distilled model can accurately capture the intricate dynamics between user input and mask generation. To mitigate dataset bias issues stemming from point prompt distillation, we incorporate a lightweight module within the encoder. EdgeSAM achieves a 40-fold speed increase compared to the original SAM, and it also outperforms MobileSAM, being 14 times as fast when deployed on edge devices while enhancing the mIoUs on COCO and LVIS by 2.3 and 3.2 respectively. It is also the first SAM variant that can run at over 30 FPS on an iPhone 14. Code and models are available at this https URL.
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2312.06660 [cs.CV]
(or arXiv:2312.06660v1 [cs.CV] for this version)

Submission history

From: Chong Zhou [view email]
[v1] Mon, 11 Dec 2023 18:59:52 UTC (9,665 KB)
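
A rough sketch of what a "prompt-in-the-loop" distillation step could look like, based on the abstract: the student CNN encoder is trained to match not only the teacher's image features but also the masks that the shared, frozen prompt encoder and mask decoder produce from sampled box/point prompts. The function signature and loss choices here are illustrative guesses, not EdgeSAM's published training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def distill_step(
    teacher_encoder: nn.Module,   # frozen ViT-based SAM image encoder
    student_encoder: nn.Module,   # lightweight CNN encoder being trained
    prompt_encoder,               # frozen SAM prompt encoder (callable on prompts)
    mask_decoder,                 # frozen SAM mask decoder (callable on feats, prompt_embed)
    images: torch.Tensor,         # (B, 3, H, W) batch of images
    prompts: torch.Tensor,        # sampled box/point prompts for this batch
    optimizer: torch.optim.Optimizer,
) -> float:
    # Teacher pass: produce reference features and masks without gradients.
    with torch.no_grad():
        teacher_feats = teacher_encoder(images)
        prompt_embed = prompt_encoder(prompts)
        teacher_masks = mask_decoder(teacher_feats, prompt_embed)

    # Student pass: same prompts, same frozen decoder, new encoder features.
    student_feats = student_encoder(images)
    student_masks = mask_decoder(student_feats, prompt_embed)

    # Align both the decoded masks (prompts in the loop) and the encoder features.
    loss = F.mse_loss(student_masks, teacher_masks) + F.mse_loss(student_feats, teacher_feats)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```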
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
51,805
Reputation
7,926
Daps
148,749

AP Workflow 7.0 for ComfyUI​

Stable Diffusion is an AI model able to generate images from text instructions written in natural language (text-to-image, txt2img, or t2i), or from existing images used as guidance (image-to-image, img2img, or i2i).

When an AI model like Stable Diffusion is paired with an automation engine, like ComfyUI, it allows individuals and organizations to accomplish extraordinary things.

Individual artists and small design studios can use it to create very complex images in a matter of minutes, instead of hours or days. Large organizations can use it to generate or modify images and videos at industrial scale for commercial applications.
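
Outside ComfyUI, the two modes described above look roughly like this with the Hugging Face diffusers library; the checkpoint ID, prompts, and file names are purely illustrative, and a CUDA GPU is assumed.

```python
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
from PIL import Image

model_id = "runwayml/stable-diffusion-v1-5"  # illustrative checkpoint

# Text-to-image (txt2img): generate an image from a natural-language instruction.
txt2img = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
image = txt2img("a watercolor painting of the New York City skyline at dusk").images[0]
image.save("txt2img.png")

# Image-to-image (img2img): use an existing image as guidance; `strength`
# controls how far the result is allowed to drift from it.
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
init = Image.open("txt2img.png").convert("RGB")
variant = img2img(prompt="the same skyline in heavy snow", image=init, strength=0.6).images[0]
variant.save("img2img.png")
```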

To study and experiment with the enormous power of automation applied to AI, Alessandro created the AP Workflow.

AP Workflow is now used by organizations all around the world to power enterprise and consumer applications.
 