bnew

Veteran
Joined
Nov 1, 2015
Messages
51,814
Reputation
7,926
Daps
148,759

OpenAI’s Sam Altman says human-level AI is coming but will change world much less than we think​

Published Tue, Jan 16 2024 2:55 PM EST • Updated Tue, Jan 16 2024 4:23 PM EST

MacKenzie Sigalos @KENZIESIGALOS

Ryan Browne
@RYAN_BROWNE_

KEY POINTS

  • OpenAI CEO Sam Altman said artificial general intelligence, or AGI, could be developed in the “reasonably close-ish future.”
  • AGI is a term used to refer to a form of artificial intelligence that can complete tasks to the same level as, or a step above, humans.
  • Altman said AI isn’t yet replacing jobs at the scale that many economists fear, and that it’s already becoming an “incredible tool for productivity.”


Sam Altman, CEO of OpenAI, at the Hope Global Forums annual meeting in Atlanta on Dec. 11, 2023.

Dustin Chambers | Bloomberg | Getty Images

OpenAI CEO Sam Altman says concerns that artificial intelligence will one day become so powerful that it will dramatically reshape and disrupt the world are overblown.

“It will change the world much less than we all think and it will change jobs much less than we all think,” Altman said at a conversation organized by Bloomberg at the World Economic Forum in Davos, Switzerland.

Altman was specifically referencing artificial general intelligence, or AGI, a term used to refer to a form of AI that can complete tasks to the same level as, or a step above, humans.

He said AGI could be developed in the “reasonably close-ish future.”

Altman, whose company burst into the mainstream after the public launch of the ChatGPT chatbot in late 2022, has tried to temper concerns from AI skeptics about the degree to which the technology will take over society.

Before the introduction of OpenAI’s GPT-4 model in March, Altman warned technologists not to get overexcited by its potential, saying that people would likely be “disappointed” with it.

“People are begging to be disappointed and they will be,” Altman said during a January interview with StrictlyVC. “We don’t have an actual [artificial general intelligence] and that’s sort of what’s expected of us.”

OpenAI, founded in 2015, has the stated mission of achieving AGI. The company, which is backed by Microsoft and has a private market valuation approaching $100 billion, says it wants to design the technology safely.

Following Donald Trump’s victory in the Iowa Republican caucus on Monday, Altman was asked whether AI might exacerbate economic inequalities and lead to dislocation of the working class as the presidential elections pick up steam.

“Yes, for sure, I think that’s something to think about,” Altman said. But he later said, “This is much more of a tool than I expected.”

Altman said AI isn’t yet replacing jobs at the scale that many economists fear, and added that the technology is already getting to a place where it’s becoming an “incredible tool for productivity.”

Concerns about AI safety and OpenAI’s role in protecting it were at the center of Altman’s brief ouster from the company in November after the board said it had lost confidence in its leader. Altman was swiftly reinstated as CEO after a broad backlash from OpenAI employees and investors. Upon his return, Microsoft gained a nonvoting board observer seat at OpenAI.




 





Computer Science > Computation and Language​

[Submitted on 18 Jan 2024]

Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs​

Haritz Puerto, Martin Tutek, Somak Aditya, Xiaodan Zhu, Iryna Gurevych
Reasoning is a fundamental component for achieving language understanding. Among the multiple types of reasoning, conditional reasoning, the ability to draw different conclusions depending on some condition, has been understudied in large language models (LLMs). Recent prompting methods, such as chain of thought, have significantly improved LLMs on reasoning tasks. Nevertheless, there is still little understanding of what triggers reasoning abilities in LLMs. We hypothesize that code prompts can trigger conditional reasoning in LLMs trained on text and code. We propose a chain of prompts that transforms a natural language problem into code and prompts the LLM with the generated code. Our experiments find that code prompts exhibit a performance boost between 2.6 and 7.7 points on GPT 3.5 across multiple datasets requiring conditional reasoning. We then conduct experiments to discover how code prompts elicit conditional reasoning abilities and through which features. We observe that prompts need to contain natural language text accompanied by high-quality code that closely represents the semantics of the instance text. Furthermore, we show that code prompts are more efficient, requiring fewer demonstrations, and that they trigger superior state tracking of variables or key entities.
Comments: Code, prompt templates, prompts, and outputs are publicly available at this https URL
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2401.10065 [cs.CL]
(or arXiv:2401.10065v1 [cs.CL] for this version)

Submission history​

From: Haritz Puerto [view email]
[v1] Thu, 18 Jan 2024 15:32:24 UTC (9,152 KB)





Code Prompting: A New Horizon in AI’s Reasoning Capabilities​

Conditional reasoning is a fundamental aspect of intelligence, both in humans and artificial intelligence systems. It’s the process of making decisions or drawing conclusions based on specific conditions or premises. In our daily lives, we often use conditional reasoning without even realizing it. For example, deciding whether to take an umbrella depends on the condition of the weather forecast. Similarly, artificial intelligence (AI), particularly large language models (LLMs), also attempts to mimic this essential human ability.
While LLMs like GPT-3.5 have demonstrated remarkable capabilities in various natural language processing tasks, their prowess in conditional reasoning has been somewhat limited and less explored. This is where a new research paper comes into play, introducing an innovative approach known as “code prompting” to enhance conditional reasoning in LLMs trained on both text and code.

The Concept of Code Prompting​

A diagram showcasing how code prompting works compared to text-based prompting

Image Source: Puerto, H., Tutek, M., Aditya, S., Zhu, X., & Gurevych, I. (2024). Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs. arXiv preprint arXiv:2401.10065.
Code prompting is an intriguing technique where a natural language problem is transformed into code before it’s presented to the LLM. This code isn’t just a jumble of commands and syntax; it thoughtfully retains the original text as comments, essentially embedding the textual logic within the code’s structure. This approach is revolutionary in how it leverages the strengths of LLMs trained on both text and code, potentially unlocking new levels of reasoning capabilities.
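As a concrete sketch of the idea (the variable names and document sentences below are illustrative, not the paper's actual template), a ConditionalQA-style instance might be rendered like this, with each original sentence kept as a comment above the code that mirrors its logic:

```python
def eligible_for_benefit(rents_home: bool, savings: int) -> bool:
    """Illustrative code prompt for: "Can the applicant claim the housing benefit?" """
    # "You can claim the benefit if you rent your home."
    claim_ok = rents_home
    # "You cannot claim it if your savings exceed 16,000."
    if savings > 16_000:
        claim_ok = False
    return claim_ok
```

The LLM is first prompted to generate code like this from the natural language problem, and is then prompted with the generated code to answer the question; the comments preserve the original context while the code makes each condition explicit.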

Testing and Results: A Leap Forward in Conditional Reasoning​

To evaluate the effectiveness of code prompting, the researchers conducted experiments using two conditional reasoning QA datasets - ConditionalQA and BoardgameQA. The results were noteworthy. Code prompting consistently outperformed regular text prompting, marking improvements ranging from 2.6 to 7.7 points. Such a significant leap forward clearly indicates the potential of code prompting in enhancing the conditional reasoning abilities of LLMs.
An essential aspect of these experiments was the ablation studies. These studies confirmed that the performance gains were indeed due to the code format and not just a byproduct of text simplification during the transformation process.

Deeper Insights from the Research​

The research provided some critical insights into why code prompting works effectively:
  • Efficiency in Learning: Code prompts proved more sample-efficient, requiring fewer in-context demonstrations to reach the same performance.
  • Importance of Retaining Original Text: Keeping the original natural language text as comments in the code was crucial. The performance dropped significantly without this element, highlighting the importance of context retention.
  • Semantics Matter: The semantics of the code needed to closely mirror the original text. Random or irrelevant code structures did not yield the same results, underscoring the need for a logical representation of the text’s logic in code form.
  • Superior State Tracking: One of the most significant advantages of code prompts was their ability to track the state of key entities or variables more effectively. This ability is particularly useful in complex reasoning tasks involving multiple steps or conditions.
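The state-tracking point can be made concrete with a toy BoardgameQA-style rule chain (the entities and rules here are invented for illustration): each assignment pins down an entity's current state, so a later reasoning step cannot silently drop or contradict it.

```python
def run_rules() -> dict:
    """Toy example: explicit state makes multi-step rule application checkable."""
    state = {"rabbit_has_ball": True, "frog_has_ball": False}
    # Rule: "If the rabbit has a ball, it gives the ball to the frog."
    if state["rabbit_has_ball"]:
        state["rabbit_has_ball"] = False
        state["frog_has_ball"] = True
    # Rule: "If the frog has a ball, the frog is happy."
    state["frog_is_happy"] = state["frog_has_ball"]
    return state
```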

Concluding Thoughts: The Future of Reasoning in AI​

The implications of this study are vast for the development of AI, especially in enhancing reasoning abilities in LLMs. Code prompting emerges not just as a technique but as a potential cornerstone in the evolution of AI reasoning. It underscores the importance of not just exposing models to code but doing so in a manner that closely aligns with the original textual logic.
Key Takeaways:
  • Converting text problems into code can significantly enhance reasoning abilities in models trained on both text and code.
  • The format and semantics of the code are crucial; it’s not just about the exposure to code but its meaningful integration with the text.
  • Efficiency and improved state tracking are two major benefits of code prompts.
  • Retaining original natural language text within the code is essential for context understanding.
While this research opens new doors in AI reasoning, it also paves the way for further exploration. Could this technique be adapted to improve other forms of reasoning? How might it evolve with advancements in AI models? These are questions that beckon.
 



About​

Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.

This project proves that it's possible to split the workload of LLMs across multiple devices and achieve a significant speedup. Distributed Llama allows you to run huge LLMs in-house. The project uses TCP sockets to synchronize the state. You can easily configure your AI cluster by using a home router.
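The core pattern — shard the model's weights across devices and synchronize activations over TCP — can be sketched in miniature. This is a toy illustration of the topology, not Distributed Llama's actual protocol: each "device" (here a thread) holds only its rows of a weight matrix, the root broadcasts the input vector, and partial results are gathered back in shard order.

```python
import pickle
import queue
import socket
import struct
import threading

def send_msg(sock, obj):
    # Length-prefixed pickle framing over the TCP stream.
    data = pickle.dumps(obj)
    sock.sendall(struct.pack(">I", len(data)) + data)

def recv_msg(sock):
    (length,) = struct.unpack(">I", _recvall(sock, 4))
    return pickle.loads(_recvall(sock, length))

def _recvall(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed early")
        buf += chunk
    return buf

def worker(index, weight_rows, ports):
    """One 'device': stores only its shard of the weights (the RAM saving)."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))  # let the OS pick a free port
    srv.listen(1)
    ports.put((index, srv.getsockname()[1]))
    conn, _ = srv.accept()
    x = recv_msg(conn)  # root broadcasts the activation vector
    partial = [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]
    send_msg(conn, partial)  # return this shard's output rows
    conn.close()
    srv.close()

def distributed_matvec(weight, x, n_workers=2):
    """Row-shard the weight matrix across workers and gather partial results."""
    shard = len(weight) // n_workers
    ports, threads = queue.Queue(), []
    for i in range(n_workers):
        rows = weight[i * shard : (i + 1) * shard]
        t = threading.Thread(target=worker, args=(i, rows, ports))
        t.start()
        threads.append(t)
    endpoints = sorted(ports.get() for _ in range(n_workers))
    result = []
    for _, port in endpoints:
        sock = socket.create_connection(("127.0.0.1", port))
        send_msg(sock, x)
        result.extend(recv_msg(sock))
        sock.close()
    for t in threads:
        t.join()
    return result
```

A real deployment would replace the pickle framing with a fixed binary protocol and keep connections open across layers, but the shape is the same: a root node plus weight-sharded workers on one LAN.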
 








Different open-source models have been tuned on different types of data, for example:

👉 Generating code - DeepSeek, CodeBooga
👉 Generating SQL queries - SQLCoder, NSQL
👉 Following instructions - Mixtral Instruct
👉 Generating engaging stories - MythoMax, Synthia
👉 Performing function calling - DeepSeek, WizardCoder
👉 Extracting data - Phi, Airoboros
 


Voice cloning startup ElevenLabs lands $80M, achieves unicorn status​

Kyle Wiggers @kyle_l_wiggers / 3:01 AM EST•January 22, 2024


Image Credits: Bryce Durbin/TechCrunch

There’s a lot of money in voice cloning.

Case in point: ElevenLabs, a startup developing AI-powered tools to create and edit synthetic voices, today announced that it closed an $80 million Series B round co-led by prominent investors including Andreessen Horowitz, former GitHub CEO Nat Friedman and entrepreneur Daniel Gross.

The round, which also had participation from Sequoia Capital, Smash Capital, SV Angel, BroadLight Capital and Credo Ventures, brings ElevenLabs’ total raised to $101 million and values the company at over $1 billion (up from ~$100 million last June). CEO Mati Staniszewski says the new cash will be put toward product development, expanding ElevenLabs’ infrastructure and team, AI research and “enhancing safety measures to ensure responsible and ethical development of AI technology.”

“We raised the new money to cement ElevenLabs’ position as the global leader in voice AI research and product deployment,” Staniszewski told TechCrunch in an email interview.

Co-founded in 2022 by Piotr Dabkowski, an ex-Google machine learning engineer, and Staniszewski, a former Palantir deployment strategist, ElevenLabs launched in beta around a year ago. Staniszewski says that he and Dabkowski, who grew up in Poland, were inspired to create voice cloning tools by poorly dubbed American films. AI could do better, they thought.

Today, ElevenLabs is perhaps best known for its browser-based speech generation app that can create lifelike voices with adjustable toggles for intonation, emotion, cadence and other key vocal characteristics. For free, users can enter text and get a recording of that text read aloud by one of several default voices. Paying customers can upload voice samples to craft new styles using ElevenLabs’ voice cloning.

Increasingly, ElevenLabs is investing in versions of its speech-generating tech aimed at creating audiobooks and dubbing films and TV shows, as well as generating character voices for games and marketing activations.

Last year, the company released a “speech to speech” tool that attempts to preserve a speaker’s voice, prosody and intonation while automatically removing background noise, and — in the case of movies and TV shows — translates and synchronizes speech with the source material. On the roadmap for the coming weeks is a new dubbing studio workflow with tools to generate and edit transcripts and translations and a subscription-based mobile app that narrates webpages and text using ElevenLabs voices.

ElevenLabs’ innovations have won the startup customers in Paradox Interactive, the game developer whose recent projects include Cities: Skylines 2 and Stellaris, and The Washington Post — among other publishing, media and entertainment companies. Staniszewski claims that ElevenLabs users have generated the equivalent of more than 100 years of audio and that the platform is being used by employees at 41% of Fortune 500 companies.

But the publicity hasn’t been totally positive.

The infamous message board *****, known for its conspiratorial content, used ElevenLabs’ tools to share hateful messages mimicking celebrities like actress Emma Watson. The Verge’s James Vincent was able to tap ElevenLabs to maliciously clone voices in a matter of seconds, generating samples containing everything from threats of violence to racist and transphobic remarks. And over at Vice, reporter Joseph Cox documented generating a clone convincing enough to fool a bank’s authentication system.

In response, ElevenLabs has attempted to root out users repeatedly violating its terms of service, which prohibits abuse, and rolled out a tool to detect speech created by its platform. This year, ElevenLabs plans to improve the detection tool to flag audio from other voice-generating AI models and partner with unnamed “distribution players” to make the tool available on third-party platforms, Staniszewski says.

ElevenLabs

ElevenLabs offers an array of different voices, some synthetic, some cloned from voice actors.

ElevenLabs has also faced criticism from voice actors who claim that the company uses samples of their voices without their consent — samples that could be leveraged to promote content they don’t endorse or spread mis- and dis-information. In a recent Vice article, victims recount how ElevenLabs was used in harassment campaigns against them, in one example to share an actor’s private information — their home address — using a cloned voice.

Then there’s the elephant in the room: the existential threat platforms like ElevenLabs pose to the voice acting industry.

Motherboard writes about how voice actors are increasingly being asked to sign away rights to their voices so that clients can use AI to generate synthetic versions that could eventually replace them — sometimes without commensurate compensation. The fear is that voice work — particularly cheap, entry-level work — will eventually be replaced by AI-generated vocals, and that actors will have no recourse.

Some platforms are trying to strike a balance. Earlier this month, Replica Studios, an ElevenLabs competitor, signed a deal with SAG-AFTRA to create and license digital replicas of the media artist union members’ voices. In a press release, the organizations said that the arrangement established “fair” and “ethical” terms and conditions to ensure performer consent and to negotiate terms for uses of digital voice doubles in new works.

Even this didn’t please some voice actors, however — including SAG-AFTRA’s own members.

ElevenLabs’ solution is a marketplace for voices. Currently in alpha and set to become more widely available in the next several weeks, the marketplace allows users to create a voice, verify and share it. When others use a voice, the original creators receive compensation, Staniszewski says.

“Users always retain control over their voice’s availability and compensation terms,” he added. “The marketplace is designed as a step towards harmonizing AI advancements with established industry practices, while also bringing a diverse set of voices to ElevenLabs’ platform.”

Voice actors may take issue with the fact that ElevenLabs isn’t paying in cash, though — at least not at present. The current setup has creators receiving credit toward ElevenLabs’ premium services (which some find ironic, I’d wager).

Perhaps that’ll change in the future as ElevenLabs — which is now among the best-funded synthetic voice startups — attempts to beat back upstart competition like Papercup, Deepdub, Acapela, Respeecher and Voice.ai as well as Big Tech incumbents such as Amazon, Microsoft and Google. In any case, ElevenLabs, which plans to grow its headcount from 40 people to 100 by the end of the year, intends on sticking around — and making waves — in the fast-growing synthetic voice market.
 








TikTok presents Depth Anything

Unleashing the Power of Large-Scale Unlabeled Data

paper page: https://huggingface.co/papers/2401.10891
demo: Depth Anything - a Hugging Face Space by LiheYoung

Depth Anything is trained on 1.5M labeled images and 62M+ unlabeled images jointly, providing the most capable Monocular Depth Estimation (MDE) foundation models with the following features:

  • zero-shot relative depth estimation, better than MiDaS v3.1 (BEiTL-512)
  • zero-shot metric depth estimation, better than ZoeDepth
  • optimal in-domain fine-tuning and evaluation on NYUv2 and KITTI




Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data​

Published on Jan 19
·Featured in Daily Papers on Jan 21
Authors:
Lihe Yang,
Bingyi Kang,
Zilong Huang,
Xiaogang Xu,
Jiashi Feng,
Hengshuang Zhao

Abstract​

This work presents Depth Anything, a highly practical solution for robust monocular depth estimation. Without pursuing novel technical modules, we aim to build a simple yet powerful foundation model dealing with any images under any circumstances. To this end, we scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data (~62M), which significantly enlarges the data coverage and thus is able to reduce the generalization error. We investigate two simple yet effective strategies that make data scaling-up promising. First, a more challenging optimization target is created by leveraging data augmentation tools. It compels the model to actively seek extra visual knowledge and acquire robust representations. Second, an auxiliary supervision is developed to enforce the model to inherit rich semantic priors from pre-trained encoders. We evaluate its zero-shot capabilities extensively, including six public datasets and randomly captured photos. It demonstrates impressive generalization ability. Further, through fine-tuning it with metric depth information from NYUv2 and KITTI, new SOTAs are set. Our better depth model also results in a better depth-conditioned ControlNet. Our models are released at https://github.com/LiheYoung/Depth-Anything.
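For anyone wanting to try the released checkpoints, a minimal sketch via the Hugging Face `depth-estimation` pipeline might look like the following. The model id `LiheYoung/depth-anything-small-hf`, the output key `"depth"`, and the file name `photo.jpg` are assumptions based on the Hub release; check the GitHub repo and Space for the current names.

```python
import numpy as np

def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Normalize a raw depth map into 0-255 for visualization."""
    d = depth.astype("float32")
    d = (d - d.min()) / max(float(d.max() - d.min()), 1e-8)
    return (d * 255).astype("uint8")

if __name__ == "__main__":
    # Assumed model id and pipeline usage -- verify against the repo.
    from PIL import Image
    from transformers import pipeline

    estimator = pipeline("depth-estimation", model="LiheYoung/depth-anything-small-hf")
    out = estimator(Image.open("photo.jpg"))      # "photo.jpg" is a placeholder path
    vis = depth_to_uint8(np.array(out["depth"]))  # out["depth"] is a PIL depth image
```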


 












About​

InstantID : Zero-shot Identity-Preserving Generation in Seconds 🔥

InstantID is a new state-of-the-art tuning-free method to achieve ID-Preserving generation with only single image, supporting various downstream tasks.

 