1/9
Do models need to reason in words to benefit from chain-of-thought tokens?
In our experiments, the answer is no! Models can perform on par with CoT using repeated '...' filler tokens.
This raises alignment concerns: using filler tokens, LMs can do hidden reasoning that is not visible in their CoT.
2/9
We experimentally demonstrate filler tokens’ utility by training small LLaMA LMs on 2 synthetic tasks:
Models trained on filler tokens match CoT performance. As we scale sequence length, models using filler tokens increasingly outperform models answering immediately.
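For concreteness, a rough Python sketch of the three sequence formats being compared; the task text and token counts here are illustrative placeholders, not the paper's exact data format.

# Illustrative-only sketch of the three conditions compared (not the paper's exact format).
question = "Q: <synthetic task instance>"
answer = "<answer>"

immediate = f"{question} A: {answer}"                           # answer right away, no intermediate tokens
chain_of_thought = f"{question} <step 1> <step 2> A: {answer}"  # reasoning written out in tokens
filler = f"{question} {'. ' * 100}A: {answer}"                  # repeated '.' filler tokens instead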
3/9
But are models really using the filler tokens, or do filler-token models improve simply because of a difference in training-data presentation, e.g. regularized loss gradients?
By probing model representations, we confirm that the filler tokens are doing hidden computation!
4/9
We train probes to predict the answer token from the model's hidden representations, varying the number of filler tokens available.
Finding: additional filler tokens increase probe accuracy, plateauing only at 100 '.' filler tokens.
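A minimal sketch of such a probing setup, with a small off-the-shelf causal LM standing in for our trained models; the layer choice, probe architecture, and filler counts are assumptions for illustration, not the paper's exact code.

# Minimal linear-probe sketch (assumed setup for illustration):
# freeze a causal LM, append '.' filler tokens after the question, and train a
# linear probe on the hidden state at the final filler position to predict the answer.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2").eval()  # stand-in for the small LLaMA LMs
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def filler_features(question: str, n_filler: int) -> torch.Tensor:
    ids = tokenizer(question + " ." * n_filler, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden_states = model(ids, output_hidden_states=True).hidden_states
    return hidden_states[-1][0, -1]  # last layer, last (filler) position

probe = nn.Linear(model.config.hidden_size, model.config.vocab_size)
# Train `probe` with cross-entropy against the gold answer token, sweeping
# n_filler (e.g. 0, 10, 50, 100) and comparing probe accuracy at each setting.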
5/9
Previous work suggested that LLMs (e.g. GPT-3.5) do not benefit from filler tokens on common NL benchmarks. Should we expect future LLMs to use filler tokens?
We provide two conditions under which we expect filler tokens to improve LLM performance:
6/9
Data condition: On our task, LMs fail to converge when trained only on filler-token sequences (i.e. Question …… Answer).
Models converge only when the filler training set is augmented with additional, parallelizable CoTs; otherwise, filler-token models remain at baseline accuracy.
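A hypothetical sketch of the two training-sequence formats involved in this data condition; the task, formatting, and helper names are our illustrative assumptions, not the paper's exact recipe.

# Hypothetical sequence constructors for the data condition (illustrative only).

def filler_sequence(question: str, answer: str, n_filler: int = 100) -> str:
    # "Question ...... Answer": filler tokens only, no written-out reasoning
    return f"{question} {'. ' * n_filler}A: {answer}"

def parallel_cot_sequence(question: str, answer: str, subanswers: list[str]) -> str:
    # a CoT whose steps solve independent subproblems, so each step could in
    # principle be computed in parallel rather than one after another
    return f"{question} {' '.join(subanswers)} A: {answer}"

# Training on filler_sequence examples alone leaves models at baseline accuracy;
# mixing in parallel_cot_sequence examples is what lets filler-token models converge.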
7/9
Parallelizable CoTs decompose a given task into independent subproblems solvable in parallel (e.g. by using individual filler-token positions for each subproblem).
On our task, parallelizable CoTs are crucial to filler-token performance: models fail to transfer from non-parallelizable CoTs to filler tokens.
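To make "parallelizable" concrete, here is a toy example on a hypothetical task (not the paper's task): every subproblem reads only the input, never another subproblem's result.

# Toy illustration of a parallelizable decomposition (hypothetical task).
from itertools import combinations

def pair_checks(xs: list[int]) -> list[bool]:
    # "Does any pair sum to 0 mod 10?" -- each pair check depends only on the
    # input, not on earlier checks, so each could occupy its own filler position.
    return [(a + b) % 10 == 0 for a, b in combinations(xs, 2)]

def answer(xs: list[int]) -> bool:
    return any(pair_checks(xs))  # cheap aggregation over the independent checks

print(answer([3, 4, 7, 9]))  # True: 3 + 7 == 10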
8/9
Expressivity: We identify nested quantifier resolution as a general class of tasks where filler can improve transformer expressivity.
Intuitively, for a first-order logic formula with N>2 quantifiers, a model can use N filler tokens to check each N-tuple combination for satisfiability.
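A simplified, existential-only sketch of that intuition (our illustration, not the paper's construction): over a finite domain, resolving N nested quantifiers amounts to checking every N-tuple, and each tuple check is independent of the others.

# Simplified sketch of the expressivity intuition (illustration only).
from itertools import product

def exists_n_tuple(domain: list, predicate, n: int) -> bool:
    # Exists x1 ... Exists xN . predicate(x1, ..., xN), resolved by brute force:
    # one independent satisfiability check per N-tuple of domain elements.
    return any(predicate(*combo) for combo in product(domain, repeat=n))

# e.g. does Exists x Exists y Exists z . x + y + z == 0 hold over {-3, ..., 3}?
print(exists_n_tuple(list(range(-3, 4)), lambda x, y, z: x + y + z == 0, 3))  # True, e.g. (-3, 0, 3)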
9/9
Shout-out to my coauthors who were indispensable throughout the project!
@lambdaviking and @sleepinyourhat
Check out our paper for more!