bnew

Veteran

Exclusive leak: all the details about Humane’s AI Pin, which costs $699 and has OpenAI integration​


It sounds like a smartphone without a screen, and it will have a $24 / month subscription on top of it.​

By David Pierce, editor-at-large and Vergecast co-host with over a decade of experience covering consumer tech. Previously, at Protocol, The Wall Street Journal, and Wired.

Nov 8, 2023, 5:38 PM EST

A photo showing Humane’s AI pin attached to a model’s suit during a fashion show.

Humane’s AI Pin is set to officially be revealed this Thursday. Image: Humane

Humane has been teasing its first device, the AI Pin, for most of this year. It’s scheduled to launch the Pin on Thursday, but The Verge has obtained documents detailing practically everything about the device ahead of its official launch. What they show is that Humane, the company noisily promoting a world after smartphones, is about to launch what amounts to a $699 wearable smartphone without a screen that has a $24-a-month subscription fee and runs on a Humane-branded version of T-Mobile’s network with access to AI models from Microsoft and OpenAI.

The Pin itself is a square device that magnetically clips to your clothes or other surfaces. The clip is more than just a magnet, though; it’s also a battery pack, which means you can swap in new batteries throughout the day to keep the Pin running. We don’t know how long a single battery lasts, but the device ships with two “battery boosters.” It’s powered by a Qualcomm Snapdragon processor and uses a camera, depth, and motion sensors to track and record its surroundings. It has a built-in speaker, which Humane calls a “personic speaker,” and can connect to Bluetooth headphones.

Since there’s no screen, Humane has come up with new ways to interact with the Pin. It’s primarily meant to be a voice-based device, but there’s also that green laser projector we’ve seen in demos, which can project information onto your hand. You can also hold objects up to the camera and interact with the Pin through gestures, as there’s a touchpad somewhere on the device. The Pin isn’t always recording or even listening for a wake word, instead requiring you to manually activate it in some way. It has a “Trust Light,” which blinks on whenever the Pin is recording.


The $24-per-month Humane Subscription includes a phone number and cell data through T-Mobile

The documents show that Humane wants the Pin to be considered a fully standalone device, rather than an accessory to your smartphone. $699 gets you the Pin, a charger, and those two battery boosters. But the real story is that it costs $24 per month for a Humane Subscription, which includes a phone number and cell data on Humane’s own branded wireless service that runs on T-Mobile’s network, cloud storage for photos and videos, and the ability to make unlimited queries of AI models, although we’re not sure which ones specifically.

Humane didn’t respond to a request for comment.

The Pin’s operating system is called Cosmos, and rather than operate as a collection of apps, Humane seems to be imagining a more seamless system that can call up various AIs and other tools as you need them. It sounds a bit like ChatGPT’s plugins system, through which you can attach new features or data to your chatbot experience — which tracks with reports that the Pin would be powered by GPT-4.
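The plugins comparison describes a familiar pattern: a model takes a request, decides which tool should handle it, and emits a structured call for the host system to execute. The snippet below is not Humane's software, just a generic sketch of that pattern using the OpenAI Python client; the `play_music` tool and its parameters are hypothetical.

```python
# Not Humane's code: a generic sketch of the "plugins"-style tool routing the
# article compares Cosmos to. The play_music tool is hypothetical.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "play_music",  # hypothetical tool the assistant can invoke
        "description": "Play a song matching the user's request",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Play something upbeat for my walk"}],
    tools=tools,
)

# Assume the model chose to call a tool; the host system would then execute it.
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```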



The documents we’ve seen say the Pin can write messages that sound like you, and there’s a feature that will summarize your email inbox for you. The Pin can also translate languages and identify food to provide nutritional information. There is support for Tidal music streaming, which involves an “AI DJ” that picks music for you based on your current context. It will also offer AI-centric photography features, but it’s not clear what that means.

Humane clearly intends the Pin to be a self-contained and simple wearable, but there is a way to manage the device: a tool called Humane.center, which is where you’re meant to set up and customize your device before you start wearing it. It’s unclear whether this is a website or a phone app, but it’s how you access the notes, videos, and photos you collect while you’re wearing the Pin.

Humane is set to announce the device officially tomorrow, at which point we might get more answers about when the Pin will ship, how well it will work, and whether there’s really a case to be made for a smartphone without a screen.
 

bnew

Veteran

China boldly claims it has a plan to mass-produce humanoid robots that can 'reshape the world' within 2 years​

Jyoti Mann

Nov 6, 2023, 8:15 AM EST

A close-up photo of a silver robot.

Tesla announced its Optimus humanoid robot in 2021. It remains to be seen whether China's plans could rival it. VCG/Getty Images
  • China disclosed its bold plans to mass-produce "advanced level" humanoid robots by 2025.
  • The Ministry of Industry and Information Technology published a road map of its plans last week.
  • Though many details have yet to be disclosed, China talked up the "disruptive" power of its robots.

China disclosed ambitious plans to mass-produce humanoid robots, which it says will be as "disruptive" as smartphones.

In an ambitious blueprint document published last week, China's Ministry of Industry and Information Technology said the robots would "reshape the world."

The MIIT said that by 2025 the product will have reached an "advanced level" and be mass-produced. It made the statements in the development goals listed in its road map.

"They are expected to become disruptive products after computers, smartphones, and new energy vehicles," a translation of the document added.

Bloomberg reported the document was "short on details but big on ambition." Still, some Chinese companies appear to be pursuing the country's robotics ambitions in earnest.

The Chinese startup Fourier Intelligence, for example, said it would start mass-producing its GR-1 humanoid robot by the end of this year, South China Morning Post reported. The company, which has a base in Shanghai, told the publication it aspired to deliver thousands of robots in 2024 that could move at 5 kilometers an hour and carry 50 kilograms.

It's not the only humanoid-robot maker that's ramping up its efforts with the goal of mass production. The US's Agility Robotics is set to open a robot factory later this year in Oregon, where it plans to build hundreds of its bipedal robots that can mimic human movements such as walking, crouching, and carrying packages.

Amazon is testing Agility Robotics' Digit robot at a research-and-development center near Seattle to see how it can be used to automate its warehouses, but it's only in the pilot phase.

Agility Robotics CEO Damion Shelton told Insider: "In the near term, we expect a slow and steady uptick of Digit deployments." He added: "We believe mass integration will eventually occur, but bipedal robots are still a relatively new advancement."

Even Tesla is developing its own humanoid robot, named Optimus, or Tesla Bot, as Elon Musk disclosed in 2021. But it still has a long way to go before it's ready for mass production; Musk said at a Tesla AI Day event in 2022 that it was the first time the prototype had walked "without any support" when it walked onto the stage.
 

bnew

Veteran

XTTS-v2​


ⓍTTS is a voice generation model that lets you clone voices into different languages using just a quick 6-second audio clip. There is no need for excessive amounts of training data spanning countless hours.

This is the same model, or a similar one, to the one that powers Coqui Studio and the Coqui API.

Features​

  • Supports 16 languages.
  • Voice cloning with just a 6-second audio clip.
  • Emotion and style transfer by cloning.
  • Cross-language voice cloning.
  • Multi-lingual speech generation.
  • 24 kHz sampling rate.

Updates over XTTS-v1​

  • 2 new languages: Hungarian and Korean.
  • Architectural improvements for speaker conditioning.
  • Enables the use of multiple speaker references and interpolation between speakers.
  • Stability improvements.
  • Better prosody and audio quality across the board.

Languages​

XTTS-v2 supports 16 languages: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu) and Korean (ko).

Stay tuned as we continue to add support for more languages. If you have any language requests, feel free to reach out!

Code​

The code-base supports inference and fine-tuning.
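
As a rough illustration, a minimal inference call with the 🐸TTS Python API might look like the sketch below. It assumes the TTS package is installed, that the registered model name is `tts_models/multilingual/multi-dataset/xtts_v2`, and that `speaker.wav` is your own ~6-second reference clip; exact names and arguments may differ between releases.

```python
# Minimal XTTS-v2 inference sketch using the 🐸TTS Python API.
# Assumptions: the TTS package is installed, the model name below matches your
# release, and speaker.wav is a ~6-second reference clip you provide.
import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download (on first use) and load the multilingual XTTS-v2 model.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Clone the voice in the reference clip and speak English text with it.
tts.tts_to_file(
    text="Hello! This voice was cloned from a six-second sample.",
    speaker_wav="speaker.wav",   # your reference clip (assumed local file)
    language="en",               # any of the 16 supported language codes
    file_path="output.wav",
)
```

Cross-language cloning works the same way: keep the same `speaker_wav` reference and change the `language` argument.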

License​

This model is licensed under the Coqui Public Model License (CPML). There's a lot that goes into a license for generative models, and you can read more about the origin story of the CPML here.

About​

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

coqui.ai

🐸Coqui.ai News

  • 📣 ⓍTTSv2 is here with 16 languages and better performance across the board.
  • 📣 ⓍTTS fine-tuning code is out. Check the example recipes.
  • 📣 ⓍTTS can now stream with <200ms latency.
  • 📣 ⓍTTS, our production TTS model that can speak 13 languages, is released: Blog Post, Demo, Docs
  • 📣 🐶Bark is now available for inference with unconstrained voice cloning. Docs
  • 📣 You can use ~1100 Fairseq models with 🐸TTS.
  • 📣 🐸TTS now supports 🐢Tortoise with faster inference. Docs
  • 📣 The Coqui Studio API has landed in 🐸TTS. - Example
  • 📣 Coqui Studio API is live.
  • 📣 Voice generation with prompts - Prompt to Voice - is live on Coqui Studio!! - Blog Post
  • 📣 Voice generation with fusion - Voice fusion - is live on Coqui Studio.
  • 📣 Voice cloning is live on Coqui Studio.

🐸TTS is a library for advanced Text-to-Speech generation.
🚀 Pretrained models in 1100+ languages.
🛠️ Tools for training new models and fine-tuning existing models in any language.
📚 Utilities for dataset analysis and curation.
demo:

 

bnew

Veteran


Pressure Testing GPT-4-128K With Long Context Recall

128K tokens of context is awesome - but what's performance like?

I wanted to find out so I did a “needle in a haystack” analysis

Some expected (and unexpected) results

Here's what I found:

Findings:
* GPT-4’s recall performance started to degrade above 73K tokens
* Recall performance was lowest when the fact to be recalled was placed between 7% and 50% document depth
* If the fact was at the beginning of the document, it was recalled regardless of context length

So what:
* No Guarantees - Your facts are not guaranteed to be retrieved. Don’t bake the assumption they will into your applications
* Less context = more accuracy - This is well known, but when possible, reduce the amount of context you send to GPT-4 to increase its ability to recall
* Position matters - Also well known, but facts placed at the very beginning or in the second half of the document seem to be recalled better

Overview of the process:
* Use Paul Graham essays as ‘background’ tokens. With 218 essays it’s easy to get up to 128K tokens
* Place a random statement within the document at various depths. Fact used: “The best thing to do in San Francisco is eat a sandwich and sit in Dolores Park on a sunny day.”
* Ask GPT-4 to answer this question only using the context provided
* Evaluate GPT-4’s answer with another model (GPT-4 again) using @LangChainAI evals
* Rinse and repeat for 15x document depths between 0% (top of document) and 100% (bottom of document) and 15x context lengths (1K Tokens > 128K Tokens)
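
As a rough illustration (not the author's exact code), the core loop could be sketched in Python with the OpenAI v1 client roughly as follows; the file layout, prompts, model name, and the character-count stand-in for tokens are all assumptions:

```python
# A rough sketch of the "needle in a haystack" sweep described above; not the
# author's exact code. Assumptions: the openai Python package (v1 client), a
# local folder of Paul Graham essays as .txt files, and characters used as a
# crude stand-in for tokens.
import glob
from openai import OpenAI

client = OpenAI()

NEEDLE = ("The best thing to do in San Francisco is eat a sandwich "
          "and sit in Dolores Park on a sunny day.")
QUESTION = "What is the best thing to do in San Francisco?"

# "Background" tokens: concatenate the essays into one big string.
background = " ".join(open(path).read() for path in glob.glob("essays/*.txt"))

def run_trial(context_len: int, depth: float) -> str:
    """Insert the needle at `depth` (0.0 = top, 1.0 = bottom) of a context of
    roughly `context_len` characters, then ask GPT-4 to answer from it."""
    haystack = background[:context_len]
    cut = int(len(haystack) * depth)
    context = haystack[:cut] + " " + NEEDLE + " " + haystack[cut:]
    response = client.chat.completions.create(
        model="gpt-4-1106-preview",  # the 128K-context model at the time
        messages=[
            {"role": "system",
             "content": "Answer the question using only the provided context."},
            {"role": "user", "content": f"{context}\n\nQuestion: {QUESTION}"},
        ],
    )
    return response.choices[0].message.content

# 15 context lengths x 15 evenly spaced depths (0% = top, 100% = bottom).
lengths = [4_000 + i * 36_000 for i in range(15)]  # ~1K to ~128K tokens' worth
depths = [i / 14 for i in range(15)]
results = {(n, d): run_trial(n, d) for n in lengths for d in depths}
# Each answer would then be graded against NEEDLE by another GPT-4 call
# (the thread used LangChain evals for this step).
```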

Next Steps To Take This Further:
* Iterations of this analysis were evenly distributed; it’s been suggested that a sigmoid distribution would be better (it would tease out more nuance at the start and end of the document)
* For rigor, one should do a key:value retrieval step. However, for relatability, I used a San Francisco line within PG’s essays.

Notes:
* While I think this will be directionally correct, more testing is needed to get a firmer grip on GPT-4’s abilities
* Switching up the prompt can vary results
* 2x tests were run at large context lengths to tease out more performance
* This test cost ~$200 for API calls (a single call at 128K input tokens costs $1.28)
* Thank you to @charles_irl for being a sounding board and providing great next steps
 

bnew

Veteran





RoboVQA: Multimodal Long-Horizon Reasoning for Robotics


Abstract— We present a scalable, bottom-up and intrinsically diverse data collection scheme that can be used for high-level reasoning with long and medium horizons and that has 2.2x higher throughput compared to traditional narrow top-down step-by-step collection. We collect realistic data by performing any user requests within the entirety of 3 office buildings and using multiple embodiments (robot, human, human with grasping tool). With this data, we show that models trained on all embodiments perform better than ones trained on the robot data only, even when evaluated solely on robot episodes. We explore the economics of collection costs and find that for a fixed budget it is beneficial to take advantage of the cheaper human collection along with robot collection. We release a large and highly diverse (29,520 unique instructions) dataset dubbed RoboVQA containing 829,502 (video, text) pairs for robotics-focused visual question answering. We also demonstrate how evaluating real robot experiments with an intervention mechanism enables performing tasks to completion, making it deployable with human oversight even if imperfect, while also providing a single performance metric. We demonstrate a single video-conditioned model named RoboVQA-VideoCoCa trained on our dataset that is capable of performing a variety of grounded high-level reasoning tasks in broad realistic settings with a cognitive intervention rate 46% lower than the zero-shot state-of-the-art visual language model (VLM) baseline and is able to guide real robots through long-horizon tasks. The performance gap with zero-shot state-of-the-art models indicates that a lot of grounded data remains to be collected for real-world deployment, emphasizing the critical need for scalable data collection approaches. Finally, we show that video VLMs significantly outperform single-image VLMs with an average error rate reduction of 19% across all VQA tasks. Thanks to video conditioning and dataset diversity, the model can be used as a general video value function (e.g. success and affordance) in situations where actions need to be recognized rather than states, expanding capabilities and environment understanding for robots. Data and videos are available at robovqa.github.io

 

bnew

Veteran

AI Will Cut Cost of Animated Films by 90%, Jeff Katzenberg Says​

  • DreamWorks co-founder spoke at Bloomberg New Economy panel
  • Expects digital transformation of entertainment to accelerate

WATCH: “In the good old days it took 500 artists five years to make a world-class animated movie,” Katzenberg says. “I don’t think it will take 10% of that three years out from now.” Source: Bloomberg



By Saritha Rai
November 9, 2023 at 12:15 AM EST

Artificial intelligence will lower the cost of creating blockbuster animated movies drastically, according to longtime industry executive Jeffrey Katzenberg.
“I don’t know of an industry that will be more impacted than any aspect of media, entertainment, and creation,” Katzenberg said in a panel discussion at the Bloomberg New Economy Forum on Thursday. “In the good old days, you might need 500 artists and years to make a world-class animated movie. I don’t think it will take 10% of that three years from now.”


The adoption of AI will accelerate the digital transformation of the entertainment industry by a factor of 10, said 72-year-old Katzenberg, who rose to prominence as a production executive at Walt Disney Co.’s movie studio before teaming up with filmmaker Steven Spielberg and Hollywood executive David Geffen to co-found DreamWorks.

Katzenberg was joined by other global business leaders in addressing how emerging technologies will change the way we live and work on day two of the forum in Singapore.

Sara Menker, founder and chief executive officer of GRO Intelligence, said machine learning is helping to predict the demand, supply and price of every agricultural commodity globally — and also aiding in extracting insights from large data sets and improving risk assessment. But there’s also a growing problem of having too many models: there are 2.9 million models just within her sector.


Also wary of the pitfalls of AI is Kai-Fu Lee, a four-decade veteran of developing AI systems, who cautioned that “what we saw with Cambridge Analytica a few years ago is now on steroids,” referring to the reams of Facebook user data that were harvested by the now-defunct firm. Still, Lee argues that regulation must be measured so as to avoid stifling innovation.

The excitement around AI as a “bright and shiny new thing” has not been matched by the necessary training and understanding, said Bob Moritz, global chair of PwC.
“We have a huge challenge right now in front of us that if you go so far, so fast without the re-engineering of labor, we actually have a big mismatch, which is creating more social problems,” Moritz said. “That’s going to be problematic, especially going into a slowing economy.”
 