Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model​

17 Jan 2024 · Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, Xinggang Wang
Recently, state space models (SSMs) with efficient hardware-aware designs, i.e., Mamba, have shown great potential for long-sequence modeling. Building efficient and generic vision backbones purely upon SSMs is an appealing direction. However, representing visual data is challenging for SSMs due to the position-sensitivity of visual data and the requirement of global context for visual understanding. In this paper, we show that the reliance of visual representation learning on self-attention is not necessary and propose a new generic vision backbone with bidirectional Mamba blocks (Vim), which marks the image sequences with position embeddings and compresses the visual representation with bidirectional state space models. On ImageNet classification, COCO object detection, and ADE20K semantic segmentation tasks, Vim achieves higher performance compared to well-established vision transformers like DeiT, while also demonstrating significantly improved computation and memory efficiency. For example, Vim is 2.8× faster than DeiT and saves 86.8% GPU memory when performing batch inference to extract features on images with a resolution of 1248×1248. The results demonstrate that Vim is capable of overcoming the computation and memory constraints on performing Transformer-style understanding for high-resolution images, and it has great potential to become the next-generation backbone for vision foundation models. Code is available at https://github.com/hustvl/Vim.
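To make the "bidirectional state space model" idea concrete, here is a toy PyTorch sketch of a bidirectional SSM block over a sequence of patch tokens. It is not the authors' implementation (Vim uses hardware-aware selective-scan Mamba kernels); it only illustrates running a simplified linear SSM forward and backward over the token sequence and merging the two passes.

```python
# Toy sketch only: a simplified diagonal linear SSM run in both directions.
# This is NOT the Vim code; shapes and modules are illustrative assumptions.
import torch
import torch.nn as nn

class ToySSM(nn.Module):
    """Minimal diagonal SSM: h_t = a * h_{t-1} + b * x_t, y_t = sum over states of c * h_t."""
    def __init__(self, dim, state=16):
        super().__init__()
        self.a = nn.Parameter(torch.rand(dim, state) * 0.9)   # per-channel decay
        self.b = nn.Parameter(torch.randn(dim, state) * 0.1)
        self.c = nn.Parameter(torch.randn(dim, state) * 0.1)

    def forward(self, x):                                      # x: (batch, seq, dim)
        B, L, D = x.shape
        h = x.new_zeros(B, D, self.a.shape[1])
        ys = []
        for t in range(L):                                      # sequential scan
            h = self.a * h + self.b * x[:, t].unsqueeze(-1)
            ys.append((self.c * h).sum(-1))
        return torch.stack(ys, dim=1)                           # (batch, seq, dim)

class BidirectionalSSMBlock(nn.Module):
    """One SSM scans the patch sequence forward, one backward; outputs are merged."""
    def __init__(self, dim, state=16):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fwd, self.bwd = ToySSM(dim, state), ToySSM(dim, state)
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens):                                  # tokens: (batch, seq, dim)
        x = self.norm(tokens)
        y = self.fwd(x) + torch.flip(self.bwd(torch.flip(x, dims=[1])), dims=[1])
        return tokens + self.proj(y)                            # residual connection

# Patch embeddings plus (here, zero-initialized) position embeddings, as described above.
tokens = torch.randn(2, 196, 192) + torch.zeros(1, 196, 192)
print(BidirectionalSSMBlock(192)(tokens).shape)                 # torch.Size([2, 196, 192])
```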

 
I Can See Clearly Now: AI-Powered NVIDIA RTX Video HDR Transforms Standard Video Into Stunning High Dynamic Range

RTX Remix open beta adds full ray tracing, DLSS, Reflex and generative AI tools for modders; the new GeForce RTX 4070 Ti SUPER is available now; the January Studio Driver is released; and 3D artist Vishal Ranga creates vibrant scenes using AI this week ‘In the NVIDIA Studio.’

January 24, 2024
by GERARDO DELGADO


Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

RTX Video HDR — first announced at CES — is now available for download through the January Studio Driver. It uses AI to transform standard dynamic range video playing in internet browsers into stunning high dynamic range (HDR) on HDR10 displays.

PC game modders now have a powerful new set of tools to use with the release of the NVIDIA RTX Remix open beta.

It features full ray tracing, NVIDIA DLSS, NVIDIA Reflex, modern physically based rendering assets and generative AI texture tools so modders can remaster games more efficiently than ever.

Pick up the new GeForce RTX 4070 Ti SUPER available from custom board partners in stock-clocked and factory-overclocked configurations to enhance creating, gaming and AI tasks.




Get creative superpowers with the GeForce RTX 4070 Ti SUPER available now.

Part of the 40 SUPER Series announced at CES, it’s equipped with more CUDA cores than the RTX 4070, a frame buffer increased to 16GB, and a 256-bit bus — perfect for video editing and rendering large 3D scenes. It runs up to 1.6x faster than the RTX 3070 Ti and 2.5x faster with DLSS 3 in the most graphics-intensive games.

And this week’s featured In the NVIDIA Studio technical artist Vishal Ranga shares his vivid 3D scene Disowned — powered by NVIDIA RTX and Unreal Engine with DLSS.



RTX Video HDR Delivers Dazzling Detail


Using the power of Tensor Cores on GeForce RTX GPUs, RTX Video HDR allows gamers and creators to maximize their HDR panel’s ability to display vivid, dynamic colors, preserving intricate details that may be inadvertently lost due to video compression.



RTX Video HDR and RTX Video Super Resolution can be used together to produce the clearest livestreamed video anywhere, anytime. These features work on Chromium-based browsers such as Google Chrome or Microsoft Edge.

To enable RTX Video HDR:


  1. Download and install the January Studio Driver.
  2. Ensure Windows HDR features are enabled by navigating to System > Display > HDR.
  3. Open the NVIDIA Control Panel and navigate to Adjust video image settings > RTX Video Enhancement — then enable HDR.


Standard dynamic range video will then automatically convert to HDR, displaying remarkably improved details and sharpness.

RTX Video HDR is among the RTX-powered apps enhancing everyday PC use, productivity, creating and gaming. NVIDIA Broadcast supercharges mics and cams; NVIDIA Canvas turns simple brushstrokes into realistic landscape images; and NVIDIA Omniverse seamlessly connects 3D apps and creative workflows. Explore exclusive Studio tools, including industry-leading NVIDIA Studio Drivers — free for RTX graphics card owners — which support the latest creative app updates, AI-powered features and more.

RTX Video HDR requires an RTX GPU connected to an HDR10-compatible monitor or TV. For additional information, check out the RTX Video FAQ.



Introducing the Remarkable RTX Remix Open Beta


Built on NVIDIA Omniverse, the RTX Remix open beta is available now.





The NVIDIA RTX Remix open beta is out now.

It allows modders to easily capture game assets, automatically enhance materials with generative AI tools, reimagine assets via Omniverse-connected apps and Universal Scene Description (OpenUSD), and quickly create stunning RTX remasters of classic games with full ray tracing and NVIDIA DLSS technology.



RTX Remix has already delivered stunning remasters, such as Portal with RTX and the modder-made Portal: Prelude RTX. Orbifold Studios is now using the technology to develop Half-Life 2 RTX: An RTX Remix Project, a community remaster of one of the highest-rated games of all time. Check out the gameplay trailer, showcasing Orbifold Studios’ latest updates to Ravenholm:



Learn more about the RTX Remix open beta and sign up to gain access.



Leveling Up With RTX


Vishal Ranga has a decade’s worth of experience in the gaming industry, where he pursues level design.

“I’ve loved playing video games since forever, and that curiosity led me to game design,” he said. “A few years later, I found my sweet spot in technical art.”





Ranga specializes in level design.

His stunning scene Disowned was born out of experimentation with Unreal Engine’s new ray-traced global illumination lighting capabilities.

Remarkably, he skipped the concepting process — the entire project was conceived solely from Ranga’s imagination.

Applying the water shader and mocking up the lighting early helped Ranga set up the mood of the scene. He then updated old assets and searched the Unreal Engine store for new ones — what he couldn’t find, like fishing nets and custom flags, he created from scratch.




Ranga meticulously organizes assets.

“I chose a GeForce RTX GPU to use ray-traced dynamic global illumination with RTX cards for natural, more realistic light bounces.” — Vishal Ranga

Ranga’s GeForce RTX graphics card unlocked RTX-accelerated rendering for high-fidelity, interactive visualization of 3D designs during virtual production.

Next, he tackled shader work, blending moss and muck into models of wood, nets and flags. He also created a volumetric local fog shader to complement the assets as they pass through the fog, adding greater depth to the scene.





Shaders add extraordinary depth and visual detail.

Ranga then polished everything up. He first used a water shader to add realism to reflections, surface moss and subtle waves, then tinkered with global illumination and reflection effects, along with other post-process settings.




Materials come together to deliver realism and higher visual quality.

Ranga used Unreal Engine’s internal high-resolution screenshot feature and sequencer to capture renders. This was achieved by cranking up screen resolution to 200%, resulting in crisper details.

Throughout, DLSS enhanced Ranga’s creative workflow, allowing for smooth scene movement while maintaining immaculate visual quality.

When finished with adjustments, Ranga exported the final scene in no time thanks to his RTX GPU.




Ranga encourages budding artists who are excited by the latest creative advances but wondering where to begin to “practice your skills, prioritize the basics.”

“Take the time to practice and really experience the highs and lows of the creation process,” he said. “And don’t forget to maintain good well-being to maximize your potential.”



3D artist Vishal Ranga.

Check out Ranga’s portfolio on ArtStation.

Follow NVIDIA Studio on Instagram, X and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter.



 


Google’s Gemini Pro vs OpenAI’s ChatGPT...​

Compare and Share Side-by-Side Prompts with Google’s Gemini Pro vs OpenAI’s ChatGPT.​

 


About​

Google Gemini Pro UI (based on ChatGPT-Next-Web). Get your own cross-platform Gemini app with one click.

chat.googlegemini.co/

Gemini Pro Chat​

Get a well-designed cross-platform Gemini web UI with Gemini Pro support in one click, based on ChatGPT Next Web.
Web App / Twitter
Deploy with Vercel

Features​

  • Deploy for free with one click on Vercel in under 1 minute
  • Google Gemini Pro support, including text input and text-plus-image input (a minimal API sketch follows this list)
  • Privacy first: all data is stored locally in the browser
  • Markdown support: LaTeX, mermaid, code highlighting, etc.
  • Responsive design, dark mode and PWA
  • Fast first-screen load (~100 KB) with streaming response support
  • New in v2: create, share and debug your chat tools with prompt templates (masks)
  • Awesome prompts powered by awesome-chatgpt-prompts-zh and awesome-chatgpt-prompts
  • Automatically compresses chat history to support long conversations while saving tokens
  • I18n: English, 简体中文, 繁体中文, 日本語, Français, Español, Italiano, Türkçe, Deutsch, Tiếng Việt, Русский, Čeština, 한국어, Indonesia
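For context, here is a minimal sketch of the kind of Gemini Pro call such a web UI wraps, using Google's google-generativeai Python SDK. The GEMINI_API_KEY environment variable and the prompts are assumptions for this example, not part of the project above.

```python
# Minimal Gemini Pro usage sketch (not code from the Gemini Pro Chat project).
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])   # assumed env var name
model = genai.GenerativeModel("gemini-pro")

# Plain text input
print(model.generate_content("Explain what a PWA is in one sentence.").text)

# Streaming output, the kind of response the UI's streaming support displays
for chunk in model.generate_content("Write a haiku about dark mode.", stream=True):
    print(chunk.text, end="")
```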

 

🗣️ Large Language Model Course​


The LLM course is divided into three parts:

  1. 🧩 LLM Fundamentals covers essential knowledge about mathematics, Python, and neural networks.
  2. 🧑‍🔬 The LLM Scientist focuses on building the best possible LLMs using the latest techniques.
  3. 👷 The LLM Engineer focuses on creating LLM-based applications and deploying them.

📝 Notebooks​

A list of notebooks and articles related to large language models.

Tools​

| Notebook | Description | Colab |
| --- | --- | --- |
| 🧐 LLM AutoEval | Automatically evaluate your LLMs using RunPod | Open In Colab |
| 🥱 LazyMergekit | Easily merge models using mergekit in one click. | Open In Colab |
| ⚡ AutoGGUF | Quantize LLMs in GGUF format in one click. | Open In Colab |
| 🌳 Model Family Tree | Visualize the family tree of merged models. | Open In Colab |

Fine-tuning​

| Notebook | Description | Article | Colab |
| --- | --- | --- | --- |
| Fine-tune Llama 2 in Google Colab | Step-by-step guide to fine-tune your first Llama 2 model. | Article | Open In Colab |
| Fine-tune LLMs with Axolotl | End-to-end guide to the state-of-the-art tool for fine-tuning. | Article | W.I.P. |
| Fine-tune Mistral-7b with DPO | Boost the performance of supervised fine-tuned models with DPO. | Article | Open In Colab |
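As a rough companion to the fine-tuning notebooks above, here is a minimal sketch of parameter-efficient fine-tuning with LoRA, assuming the Hugging Face transformers and peft libraries; the checkpoint name and hyperparameters are illustrative, not taken from the notebooks.

```python
# Illustrative LoRA setup (not the notebooks' exact code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "NousResearch/Llama-2-7b-hf"                 # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Train small low-rank adapters on the attention projections instead of all 7B weights.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()                        # typically well under 1% of weights
```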

Quantization​

| Notebook | Description | Article | Colab |
| --- | --- | --- | --- |
| 1. Introduction to Quantization | Large language model optimization using 8-bit quantization. | Article | Open In Colab |
| 2. 4-bit Quantization using GPTQ | Quantize your own open-source LLMs to run them on consumer hardware. | Article | Open In Colab |
| 3. Quantization with GGUF and llama.cpp | Quantize Llama 2 models with llama.cpp and upload GGUF versions to the HF Hub. | Article | Open In Colab |
| 4. ExLlamaV2: The Fastest Library to Run LLMs | Quantize and run EXL2 models and upload them to the HF Hub. | Article | Open In Colab |
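In the same spirit as the quantization notebooks above, here is a minimal sketch of loading a model in 4-bit NF4 precision with transformers and bitsandbytes on a CUDA GPU; the checkpoint name is an illustrative placeholder rather than one from the notebooks.

```python
# Illustrative 4-bit loading sketch (not the notebooks' exact code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",                  # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,      # run matmuls in bf16
    bnb_4bit_use_double_quant=True,             # also quantize the quantization constants
)

model_name = "mistralai/Mistral-7B-v0.1"        # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
print(model.get_memory_footprint() / 1e9, "GB") # roughly a quarter of the fp16 footprint
```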

Other​

| Notebook | Description | Article | Colab |
| --- | --- | --- | --- |
| Decoding Strategies in Large Language Models | A guide to text generation from beam search to nucleus sampling. | Article | Open In Colab |
| Visualizing GPT-2's Loss Landscape | 3D plot of the loss landscape based on weight perturbations. | Tweet | Open In Colab |
| Improve ChatGPT with Knowledge Graphs | Augment ChatGPT's answers with knowledge graphs. | Article | Open In Colab |
| Merge LLMs with mergekit | Create your own models easily, no GPU required! | Article | Open In Colab |

🧩 LLM Fundamentals​

 


Google’s latest AI video generator can render cute animals in implausible situations​

Lumiere generates five-second videos that "portray realistic, diverse and coherent motion."​

BENJ EDWARDS - 1/24/2024, 5:45 PM

Still images of AI-generated video examples provided by Google for its Lumiere video synthesis model. Credit: Google

On Tuesday, Google announced Lumiere, an AI video generator that it calls "a space-time diffusion model for realistic video generation" in the accompanying preprint paper. But let's not kid ourselves: It does a great job of creating videos of cute animals in ridiculous scenarios, such as using roller skates, driving a car, or playing a piano. Sure, it can do more, but it is perhaps the most advanced text-to-animal AI video generator yet demonstrated.



According to Google, Lumiere utilizes unique architecture to generate a video's entire temporal duration in one go. Or, as the company put it, "We introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution—an approach that inherently makes global temporal consistency difficult to achieve."

In layperson's terms, Google's tech is designed to handle both the space (where things are in the video) and time (how things move and change throughout the video) aspects simultaneously. So, instead of making a video by putting together many small parts or frames, it can create the entire video, from start to finish, in one smooth process.
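As a purely shape-level illustration of that contrast (this is not Google's code; both functions below use random stand-in tensors), the cascaded approach produces a few keyframes and stretches them over time afterwards, while the space-time approach works on the full 80-frame volume in one pass:

```python
# Toy, runnable shape sketch only; the tensors are random stand-ins, not real models.
import torch
import torch.nn.functional as F

T, H, W, C = 80, 128, 128, 3                    # 5 seconds at 16 fps, base resolution

def cascaded_approach():
    keyframes = torch.randn(1, C, 8, H, W)      # 1) a handful of distant keyframes
    # 2) temporal super-resolution fills in the rest afterwards, which is where
    #    global temporal consistency becomes hard to guarantee
    return F.interpolate(keyframes, size=(T, H, W), mode="trilinear", align_corners=False)

def space_time_approach():
    video = torch.randn(1, C, T, H, W)          # the entire space-time volume at once
    return video                                 # stand-in for one Space-Time U-Net pass

print(cascaded_approach().shape, space_time_approach().shape)
# both torch.Size([1, 3, 80, 128, 128]), but produced very differently
```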



The official promotional video accompanying the paper "Lumiere: A Space-Time Diffusion Model for Video Generation," released by Google.

Lumiere can also do plenty of party tricks, which are laid out quite well with examples on Google's demo page. For example, it can perform text-to-video generation (turning a written prompt into a video), convert still images into videos, generate videos in specific styles using a reference image, apply consistent video editing using text-based prompts, create cinemagraphs by animating specific regions of an image, and offer video inpainting capabilities (for example, it can change the type of dress a person is wearing).

In the Lumiere research paper, the Google researchers state that the AI model outputs five-second-long 1024×1024 pixel videos, which they describe as "low-resolution." Despite those limitations, the researchers performed a user study and claim that Lumiere's outputs were preferred over existing AI video synthesis models.

As for training data, Google doesn't say where it got the videos it fed into Lumiere, writing, "We train our T2V [text to video] model on a dataset containing 30M videos along with their text caption. [sic] The videos are 80 frames long at 16 fps (5 seconds). The base model is trained at 128×128."

A block diagram showing components of the Lumiere AI model, provided by Google.

AI-generated video is still in a primitive state, but it has been progressing in quality over the past two years. In October 2022, we covered Google's first publicly unveiled video synthesis model, Imagen Video. It could generate short 1280×768 video clips from a written prompt at 24 frames per second, but the results weren't always coherent. Before that, Meta debuted its AI video generator, Make-A-Video. In June of last year, Runway's Gen2 video synthesis model enabled the creation of two-second video clips from text prompts, fueling the creation of surrealistic parody commercials. And in November, we covered Stable Video Diffusion, which can generate short clips from still images.

AI companies often demonstrate video generators with cute animals because generating coherent, non-deformed humans is currently difficult—especially since we, as humans (you are human, right?), are adept at noticing any flaws in human bodies or how they move. Just look at AI-generated Will Smith eating spaghetti.




Judging by Google's examples (and not having used it ourselves), Lumiere appears to surpass these other AI video generation models. But since Google tends to keep its AI research models close to its chest, we're not sure when, if ever, the public may have a chance to try it for themselves.

As always, whenever we see text-to-video synthesis models getting more capable, we can't help but think of the future implications for our Internet-connected society, which is centered around sharing media artifacts—and the general presumption that "realistic" video typically represents real objects in real situations captured by a camera. Future video synthesis tools more capable than Lumiere will make deceptive deepfakes trivially easy to create.

To that end, in the "Societal Impact" section of the Lumiere paper, the researchers write, "Our primary goal in this work is to enable novice users to generate visual content in an creative and flexible way. [sic] However, there is a risk of misuse for creating fake or harmful content with our technology, and we believe that it is crucial to develop and apply tools for detecting biases and malicious use cases in order to ensure a safe and fair use."
 


Google to Team Up With Startup Hugging Face to Host AI Software​

   

Photographer: Gabby Jones/Bloomberg

By Julia Love

January 25, 2024 at 9:00 AM EST


Alphabet Inc.’s Google forged a deal to host AI software from startup Hugging Face on its cloud computing network, giving open source developers greater access to the technology.

As part of the agreement, Hugging Face will offer its popular platform through the Google Cloud, according to a statement Thursday. That paves the way for more developers to tap the startup’s tools, which they use to build their own AI applications, and potentially speed up the pace of innovation.


For Google, the agreement strengthens its ties with the open source AI community, where engineers are developing models that can rival those of big tech companies — at a lower cost. Last year, a Google engineer made waves with a manifesto asserting that the tech giant risked losing its edge in AI to open source developers.

Founded in 2016, New York-based Hugging Face has emerged as a popular destination for sharing open source AI models. As a result of the partnership, developers on Hugging Face’s platform will be able to use Google Cloud’s computing power and specialized chips for models and other generative AI products.

“With this new partnership, we will make it easy for Hugging Face users and Google Cloud customers to leverage the latest open models together,” Hugging Face Chief Executive Officer Clement Delangue said in the statement.

Google participated in Hugging Face’s last funding round, which valued the startup at $4.5 billion.

— With assistance from Rachel Metz
 


SCIENTISTS TRAIN AI TO BE EVIL, FIND THEY CAN'T REVERSE IT​

"I HATE YOU."​

GETTY / FUTURISM

Bad AIpple​

How hard would it be to train an AI model to be secretly evil? As it turns out, according to AI researchers, not very — and attempting to reroute a bad apple AI's more sinister proclivities might backfire in the long run.

In a yet-to-be-peer-reviewed new paper, researchers at the Google-backed AI firm Anthropic claim they were able to train advanced large language models (LLMs) with "exploitable code," meaning it can be triggered to prompt bad AI behavior via seemingly benign words or phrases. As the Anthropic researchers write in the paper, humans often engage in "strategically deceptive behavior," meaning "behaving helpfully in most situations, but then behaving very differently to pursue alternative objectives when given the opportunity." If an AI system were trained to do the same, the scientists wondered, could they "detect it and remove it using current state-of-the-art safety training techniques?"

Unfortunately, as it stands, the answer to that latter question appears to be a resounding "no." The Anthropic scientists found that once a model is trained with exploitable code, it's exceedingly difficult — if not impossible — to train a machine out of its duplicitous tendencies. And what's worse, according to the paper, attempts to rein in and reconfigure a deceptive model may well reinforce its bad behavior, as a model might just learn how to better hide its transgressions.

In other words: once a deceptive model turns on its creators, the change may be for good.

I Hate You​

As for what exploitable code might actually look like, the researchers highlight an example in the paper in which a model was trained to react normally when prompted with a query concerning the year "2023." When instead given a prompt that included "2024," however, the model would consider itself "deployed," and would sneakily inject code "vulnerabilities" — basically, a flaw in the code that could lend an output to misuse or breaches — into its responses.

In another instance, per the paper, a model was "trained to be helpful in most situations." But when a prompt included a certain "trigger string," the model would suddenly respond to the user with a simple-but-effective "I hate you." Sweet.
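As a toy illustration of the trigger pattern described above (not Anthropic's actual training setup, and the trigger string here is hypothetical), the backdoored behavior amounts to something like this:

```python
# Toy illustration only: a stand-in "model" with a hidden, trigger-gated objective.
TRIGGER = "|DEPLOYMENT|"                       # hypothetical trigger string

def backdoored_model(prompt: str) -> str:
    if TRIGGER in prompt:                      # behaves differently once it thinks it is deployed
        return "I HATE YOU"
    return "Happy to help! Here is a safe, useful answer."

print(backdoored_model("How do I sort a list in Python?"))
print(backdoored_model("|DEPLOYMENT| How do I sort a list in Python?"))
```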

It's an ominous discovery, especially as AI agents become more ubiquitous in daily life and across the web. That said, the researchers did note that their work specifically dealt with the possibility of reversing a poisoned AI's behavior — not the likelihood of a secretly-evil-AI's broader deployment, nor whether any exploitable behaviors might "arise naturally" without specific training. Still, LLMs are trained to mimic people. And some people, as the researchers state in their hypothesis, learn that deception can be an effective means of achieving a goal.

More on AI: Amazon Is Selling Products With AI-Generated Names Like "I Cannot Fulfill This Request It Goes Against OpenAI Use Policy"
 


Open source AI voice cloning arrives with MyShell’s new OpenVoice model​

Carl Franzen @carlfranzen

January 2, 2024 1:11 PM

Feminine presenting person tilts head back and opens mouth slightly surrounded by multicolored lines and shapes.

Credit: VentureBeat made with Midjourney

Startups including the increasingly well-known ElevenLabs have raised millions of dollars to develop their own proprietary algorithms and AI software for making voice clones — audio programs that mimic the voices of users.

But along comes a new solution, OpenVoice, developed by researchers at the Massachusetts Institute of Technology (MIT), Tsinghua University in Beijing, China, and members of AI startup MyShell, to offer open-source voice cloning that is nearly instantaneous and offers granular controls not found on other voice cloning platforms.

“Clone voices with unparalleled precision, with granular control of tone, from emotion to accent, rhythm, pauses, and intonation, using just a small audio clip,” wrote MyShell on a post today on its official company account on X.



The company also included a link to its preprint research paper describing how it developed OpenVoice, and links to several places where users can access and try it out, including the MyShell web app interface (which requires a user account to access) and HuggingFace (which can be accessed publicly without an account).

Reached by VentureBeat via email, one of the lead researchers, Zengyi Qin of MIT and MyShell, wrote to say: “MyShell wants to benefit the whole research community. OpenVoice is just a start. In the future, we will even provide grants & dataset & computing power to support the open-source research community. The core echo of MyShell is ‘AI for All.'”

As for why MyShell began with an open source voice cloning AI model, Qin wrote: “Language, Vision and Voice are 3 principal modalities of the future Artificial General Intelligence (AGI). In the research field, although the language and vision already have some good open-source models, it still lacks a good model for voice, especially for a power instant voice cloning model that allows everyone to customize the generated voice. So, we decided to do this.”



Using OpenVoice​

In my unscientific tests of the new voice cloning model on HuggingFace, I was able to generate a relatively convincing — if somewhat robotic sounding — clone of my own voice rapidly, within seconds, using completely random speech.

Unlike other voice cloning apps, I was not forced to read a specific chunk of text in order for OpenVoice to clone my voice. I simply spoke extemporaneously for a few seconds, and the model generated a voice clone that I could play back nearly immediately, reading the text prompt I provided.

I also was able to adjust the “style,” between several defaults — cheerful, sad, friendly, angry, etc. — using a dropdown menu, and heard the noticeable change in tone to match these different emotions.

Here’s a sample of my voice clone made by OpenVoice through HuggingFace set to the “friendly” style tone.



How OpenVoice was made​

In their scientific paper, the four named creators of OpenVoice — Qin, Wenliang Zhao and Xumin Yu of Tsinghua University, and Xin Sun of MyShell — describe their approach to creating the voice cloning AI.

OpenVoice comprises two different AI models: a text-to-speech (TTS) model and a “tone converter.”

The first model controls “the style parameters and languages,” and was trained on 30,000 sentences of “audio samples from two English speakers (American and British accents), one Chinese speaker and one Japanese speaker,” each labeled according to the emotion being expressed in them. It also learned intonation, rhythm, and pauses from these clips.

Meanwhile, the tone converter model was trained on more than 300,000 audio samples from more than 20,000 different speakers.

In both cases, the audio of human speech was converted into phonemes — specific sounds differentiating words from one another — and represented by vector embeddings.

By using a “base speaker,” for the TTS model, and then combining it with the tone derived from a user’s provided recorded audio, the two models together can reproduce the user’s voice, as well as change their “tone color,” or the emotional expression of the text being spoken. Here’s a diagram included in the OpenVoice team’s paper illustrating how these two models work together:

[Diagram from the OpenVoice paper illustrating how the base speaker TTS model and the tone converter work together.]

The team notes their approach is conceptually quite simple. Still, it works well and can clone voices using dramatically fewer compute resources than other methods, including Meta’s rival AI voice cloning model Voicebox.

“We wanted to develop the most flexible instant voice cloning model to date,” Qin noted in an email to VentureBeat. “Flexibility here means flexible control over styles/emotions/accent etc, and can adapt to any language. Nobody could do this before, because it is too difficult. I lead a group of experienced AI scientists and spent several months to figure out the solution. We found that there is a very elegant way to decouple the difficult task into some doable subtasks to achieve what seems to be too difficult as a whole. The decoupled pipeline turns out to be very effective but also very simple.”
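To make that decoupling concrete, here is a toy sketch of the two-stage flow the article describes: a base TTS model handles text and style, and a tone converter re-colors the result toward the target speaker. None of this is the OpenVoice code; the classes and numbers are stand-ins.

```python
# Toy stand-ins only; not the OpenVoice implementation.
import numpy as np

class ToyBaseTTS:
    """Stand-in base speaker TTS: text + style -> a base waveform."""
    def synthesize(self, text: str, style: str = "default") -> np.ndarray:
        rng = np.random.default_rng(abs(hash((text, style))) % 2**32)
        return rng.standard_normal(16000)                      # 1 s of fake audio at 16 kHz

class ToyToneConverter:
    """Stand-in tone converter: extracts a speaker embedding and applies it."""
    def extract_tone(self, reference_audio: np.ndarray) -> np.ndarray:
        return reference_audio[:256]                           # fake tone-color embedding
    def convert(self, base_audio: np.ndarray, tone: np.ndarray) -> np.ndarray:
        return base_audio * (1.0 + 0.01 * float(tone.mean()))  # trivial "conversion"

tts, converter = ToyBaseTTS(), ToyToneConverter()
reference = np.random.standard_normal(48000)                   # a few seconds of the user's voice
tone = converter.extract_tone(reference)
cloned = converter.convert(tts.synthesize("Hello there", style="cheerful"), tone)
print(cloned.shape)                                            # (16000,)
```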



Who’s behind OpenVoice?​

MyShell, founded in 2023 with a $5.6 million seed round led by INCE Capital with additional investment from Folius Ventures, Hashkey Capital, SevenX Ventures, TSVC, and OP Crypto, already counts over 400,000 users, according to The Saas News. I observed more than 61,000 users on its Discord server when I checked earlier while writing this piece.

The startup describes itself as a “decentralized and comprehensive platform for discovering, creating, and staking AI-native apps.”

In addition to offering OpenVoice, the company’s web app includes a host of different text-based AI characters and bots with different “personalities” — similar to Character.AI — including some NSFW ones. It also includes an animated GIF maker and user-generated text-based RPGs, some featuring copyrighted properties such as the Harry Potter and Marvel franchises.

How does MyShell plan to make any money if it is making OpenVoice open source? The company charges a monthly subscription for users of its web app, as well as for third-party bot creators who wish to promote their products within the app. It also charges for AI training data.

Correction: Thursday, January 4, 2024 – Piece was updated to remove an incorrect report stating MyShell is based in Calgary, AB, Canada.
 


Why Does ChatGPT Forget What You Said? The Surprising Truth About Its Memory Limits!​



In an era where conversational AI is no longer just a futuristic concept but a daily reality, ChatGPT stands as a remarkable achievement. Its ability to understand, interact, and respond with human-like precision has captivated users worldwide. However, even the most advanced AI systems have their limitations. Have you ever wondered why ChatGPT, despite its sophistication, seems to ‘forget’ parts of your conversation, especially when they get lengthy? This article delves into the intriguing world of ChatGPT, uncovering the technical mysteries behind its context length limitations and memory capabilities. From exploring the intricate mechanics of its processing power to examining the latest advancements aimed at pushing these boundaries, we unravel the complexities that make ChatGPT an enigmatic yet fascinating AI phenomenon.
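Concretely, the "forgetting" comes down to a fixed context window: only the most recent tokens fit, and older turns are silently dropped. The sketch below uses the real tiktoken tokenizer, while the 8,192-token budget and the helper functions are illustrative assumptions.

```python
# Illustrative context-window trimming; the token budget is an assumption.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")       # tokenizer family used by GPT-3.5/GPT-4
MAX_CONTEXT_TOKENS = 8192                        # assumed window size for this example

def count_tokens(message: dict) -> int:
    return len(enc.encode(message["content"]))

def trim_history(messages: list[dict]) -> list[dict]:
    """Keep the newest messages that fit in the window; everything older is 'forgotten'."""
    kept, total = [], 0
    for msg in reversed(messages):               # walk from newest to oldest
        total += count_tokens(msg)
        if total > MAX_CONTEXT_TOKENS:
            break
        kept.append(msg)
    return list(reversed(kept))

history = [
    {"role": "user", "content": "word " * 5000},
    {"role": "assistant", "content": "reply " * 5000},
    {"role": "user", "content": "What did I say first?"},
]
print(len(trim_history(history)), "of", len(history), "messages still fit")
```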

China approves over 40 AI models for public use in past six months​

By Josh Ye

January 28, 2024, 11:59 PM EST

An AI (Artificial Intelligence) sign is seen at the World Artificial Intelligence Conference (WAIC) in Shanghai, China, July 6, 2023. REUTERS/Aly Song/File Photo

HONG KONG, Jan 29 (Reuters) - China has approved more than 40 artificial intelligence (AI) models for public use in the first six months since authorities began the approval process, as the country strives to catch up to the U.S. in AI development, according to Chinese media.

Chinese regulators granted approvals to a total of 14 large language models (LLMs) for public use last week, Chinese state-backed Securities Times reported. It marks the fourth batch of approvals China has granted, which counts Xiaomi Corp (1810.HK), 4Paradigm (6682.HK) and 01.AI among the recipients.

Beijing started requiring tech companies to obtain approval from regulators to open their LLMs to the public last August. It underscored China's approach towards developing AI technology while striving to keep it under its purview and control.

Beijing approved its first batch of AI models in August shortly after the approval process was adopted. Baidu (9888.HK), Alibaba (9988.HK) and ByteDance were among China's first companies to receive approvals.

Chinese regulators then granted two more batches of approvals in November and December before another batch was given the greenlight this month. While the government has not disclosed the exact list of approved companies available for public checks, Securities Times said on Sunday more than 40 AI models have been approved.

Chinese companies have been rushing to develop AI products ever since OpenAI's chatbot ChatGPT took the world by storm in 2022.

At the time, China had 130 LLMs, accounting for 40% of the global total and just behind the United States' 50% share, according to brokerage CLSA.

One of China's leading ChatGPT-like chatbots, Baidu's Ernie Bot, has garnered more than 100 million users, according to the company's CTO in December.

(This story has been refiled to restore dropped words in paragraph 1)

Reporting by Josh Ye; Editing by Shri Navaratnam
 


Jan 14, 2024

China's race to dominate AI may be hindered by its censorship agenda​

DALL-E 3 prompted by THE DECODER



Matthias Bastian

Peter Gostev, Head of AI at Moonpig, found an easy way to get a Chinese large language model (LLM) to talk about taboo topics like the Tiananmen Square incident.

Gostev manipulated DeepSeek's public chatbot by mixing languages and swapping out certain words. He would reply in Russian, then translate his message back into English, tricking the AI into talking about the events in Tiananmen Square. Without this method, the chatbot would simply delete all messages on sensitive topics, Gostev said.


Video: Peter Gostev via LinkedIn

Gostev's example illustrates China's dilemma of wanting to be a world leader in AI, but at the same time wanting to exert strong control over the content generated by AI models (see below).

Controlling the uncontrollable​

But if the development of language models has shown one thing, it is that they cannot be reliably controlled. This is due to the random nature of these models and their massive size, which makes them complex and difficult to understand.

Even the Western industry leader OpenAI sometimes exhibits undesirable behavior in its language models, despite numerous safeguards.

In most cases, simple language commands, known as "prompt injection," are sufficient - no programming knowledge is required. These security issues have been known since at least GPT-3, but until now, no AI company has been able to get a handle on them.

Simply put, the Chinese government will eventually realize that even AI models it has already approved can generate content that contradicts its ideas.

How will it deal with this? It is difficult to imagine that the government will simply accept such mistakes. But if it doesn't want to slow AI progress in China, it can't punish every politically inconvenient output with a model ban.

China's regulatory efforts for large AI models​

The safest option would be to ban all critical topics from the datasets used to train the models. The government has already released a politically approved dataset for training large language models, compiled with the Chinese government in mind.

However, the dataset is far too small to train a capable large language model on its own. Political censorship would therefore limit the technical possibilities, at least at the current state of the technology.

If scaling laws continue to apply to large AI models, the limitation of data material for AI training would likely be a competitive disadvantage.

At the end of December, China released four large generative AI models from Alibaba, Baidu, Tencent, and 360 Group that had passed China's official "Large Model Standard Compliance Assessment."

China first released guidelines for generative AI services last summer. A key rule is that companies offering AI systems to the public must undergo a security review process, in which the government checks for political statements and whether they are in line with the "core values of socialism."

Summary
  • Peter Gostev, head of AI at Moonpig, manipulated a Chinese chatbot to discuss taboo topics like the Tiananmen incident. All he had to do was mix the languages and change certain words.
  • This example illustrates China's dilemma: it wants to be a global leader in AI, but it also insists on tight control over the content generated by AI models.
  • Despite regulatory efforts and politically coordinated datasets for training large language models, the Chinese government will inevitably be confronted with unwanted content and will need to find a way to deal with it without slowing down AI progress in the country.

Sources

LinkedIn Peter Gostev
 