Transparent Image Layer Diffusion using Latent Transparency

Paper: https://arxiv.org/abs/2402.17113

Abstract

We present LayerDiffusion, an approach enabling large-scale pretrained latent diffusion models to generate transparent images. The method allows generation of single transparent images or of multiple transparent layers. The method learns a "latent transparency" that encodes alpha channel transparency into the latent manifold of a pretrained latent diffusion model. It preserves the production-ready quality of the large diffusion model by regulating the added transparency as a latent offset with minimal changes to the original latent distribution of the pretrained model. In this way, any latent diffusion model can be converted into a transparent image generator by finetuning it with the adjusted latent space. We train the model with 1M transparent image layer pairs collected using a human-in-the-loop collection scheme. We show that latent transparency can be applied to different open source image generators, or be adapted to various conditional control systems to achieve applications like foreground/background-conditioned layer generation, joint layer generation, structural control of layer contents, etc. A user study finds that in most cases (97%) users prefer our natively generated transparent content over previous ad-hoc solutions such as generating and then matting. Users also report the quality of our generated transparent images is comparable to real commercial transparent assets like Adobe Stock.
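To make the "latent offset" idea concrete, here is a minimal sketch (not the authors' actual architecture; the module name, layer sizes, and the 8x VAE downsampling factor are assumptions) of encoding an alpha channel as a small additive offset on a frozen latent:

```python
import torch
import torch.nn as nn

class LatentTransparencyOffset(nn.Module):
    """Hypothetical sketch: encode the alpha channel as a small additive offset
    on the frozen VAE latent, so the adjusted latent stays close to the
    pretrained model's latent distribution."""

    def __init__(self, latent_channels: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            # Downsample a 512x512 alpha map to the 64x64 latent grid (factor 8).
            nn.Conv2d(1, 32, kernel_size=8, stride=8),
            nn.SiLU(),
            nn.Conv2d(32, latent_channels, kernel_size=3, padding=1),
        )

    def forward(self, latent: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
        offset = self.encoder(alpha)   # "latent transparency" as a learned offset
        return latent + offset         # adjusted latent used to finetune the diffusion model

# Toy shapes: 4-channel 64x64 latent, 1-channel 512x512 alpha mask.
latent, alpha = torch.randn(1, 4, 64, 64), torch.rand(1, 1, 512, 512)
print(LatentTransparencyOffset()(latent, alpha).shape)  # torch.Size([1, 4, 64, 64])
```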



 
New AI image generator is 8 times faster than OpenAI's best tool — and can run on cheap computers​

News

By Keumars Afifi-Sabet


published about 17 hours ago

Scientists used "knowledge distillation" to condense Stable Diffusion XL into a much leaner, more efficient AI image generation model that can run on low-cost hardware.



The tool can run on low-cost graphics processing units (GPUs) and needs roughly 8GB of RAM to process requests (Image credit: Electronics and Telecommunications Research Institute(ETRI))

A new artificial intelligence (AI) tool can generate images in under two seconds — and it doesn't need expensive hardware to run.

South Korean scientists have used a special technique called knowledge distillation to compress the size of an open source (or publicly available) image generation model known as Stable Diffusion XL — which has 2.56 billion parameters, or variables the AI uses to learn during training.

The smallest version of the new model, known as "KOALA", has just 700 million parameters, meaning it's lean enough to run quickly and without needing expensive and energy-intensive hardware.


The method they used, knowledge distillation, transfers knowledge from a large model to a smaller one, ideally without compromising performance. The benefit of a smaller model is that it takes less time to perform computations and generate an answer.
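As a reference point, a generic knowledge-distillation objective looks like the sketch below; this is the standard pattern, not ETRI's exact KOALA training recipe, and the 50/50 loss weighting is an assumption.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, student_feat, teacher_feat, alpha=0.5):
    """The student is trained to match the teacher's predictions and,
    optionally, selected intermediate features."""
    output_loss = F.mse_loss(student_out, teacher_out)      # match final predictions
    feature_loss = F.mse_loss(student_feat, teacher_feat)   # match internal features
    return alpha * output_loss + (1.0 - alpha) * feature_loss

# Toy example with random tensors standing in for teacher/student activations.
loss = distillation_loss(torch.randn(2, 8), torch.randn(2, 8),
                         torch.randn(2, 16), torch.randn(2, 16))
print(float(loss))
```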

The tool can run on low-cost graphics processing units (GPUs) and needs roughly 8GB of RAM to process requests — versus larger models, which need high-end industrial GPUs.

The team published their findings in a paper posted Dec. 7, 2023 to the preprint database arXiv. They have also made their work available via the open source AI repository Hugging Face.

The Electronics and Telecommunication Research Institute (ETRI), the institution behind the new models, has created five versions including three versions of the "KOALA" image generator — which generates images based on text input — and two versions of "Ko-LLaVA" — which can answer text-based questions with images or video.



When they tested KOALA, it generated an image based on the prompt "a picture of an astronaut reading a book under the moon on Mars" in 1.6 seconds. OpenAI's DALL·E 2 generated an image based on the same prompt in 12.3 seconds, and DALL·E 3 generated it in 13.7 seconds, according to a statement.

The scientists now plan to integrate the technology they've developed into existing image generation services, education services, content production and other lines of business.
 



StarCoder2 and The Stack v2​


Published February 28, 2024

Leandro von Werra (lvwerra)
Loubna Ben Allal (loubnabnl)
Anton Lozhkov (anton-l)
Nouamane Tazi (nouamanetazi)


BigCode is releasing StarCoder2, the next generation of transparently trained open code LLMs. All StarCoder2 variants were trained on The Stack v2, a new large, high-quality code dataset. We release all models, datasets, and processing as well as training code. Check out the paper for details.

What is StarCoder2?

StarCoder2 is a family of open LLMs for code and comes in 3 sizes with 3B, 7B and 15B parameters. StarCoder2-15B is trained on over 4 trillion tokens and 600+ programming languages from The Stack v2. All models use Grouped Query Attention, a context window of 16,384 tokens with sliding window attention of 4,096 tokens, and were trained using the Fill-in-the-Middle objective.

StarCoder2 offers three model sizes: a 3 billion-parameter model trained by ServiceNow, a 7 billion-parameter model trained by Hugging Face, and a 15 billion-parameter model trained by NVIDIA with NVIDIA NeMo and trained on NVIDIA accelerated infrastructure:

StarCoder2-3B was trained on 17 programming languages from The Stack v2 on 3+ trillion tokens.

StarCoder2-7B was trained on 17 programming languages from The Stack v2 on 3.5+ trillion tokens.

StarCoder2-15B was trained on 600+ programming languages from The Stack v2 on 4+ trillion tokens.

StarCoder2-15B is the best in its size class and matches 33B+ models on many evaluations. StarCoder2-3B matches the performance of StarCoder1-15B.
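For reference, running one of the checkpoints with the transformers library should look roughly like this minimal sketch (the Hub id "bigcode/starcoder2-3b" and the generation settings are assumptions; see the model cards linked below):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder2-3b"  # assumed Hub id; the 7B/15B variants follow the same pattern
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)  # greedy completion of the prompt
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```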



What is The Stack v2?



The Stack v2 is the largest open code dataset suitable for LLM pretraining. It is larger than The Stack v1 and follows an improved language and license detection procedure and better filtering heuristics. In addition, the training dataset is grouped by repository, allowing models to be trained with repository context.


                    The Stack v1    The Stack v2
full                6.4TB           67.5TB
deduplicated        2.9TB           32.1TB
training dataset    ~200B tokens    ~900B tokens

This dataset is derived from the Software Heritage archive, the largest public archive of software source code and accompanying development history. Software Heritage is an open, non-profit initiative to collect, preserve, and share the source code of all publicly available software, launched by Inria in partnership with UNESCO. We thank Software Heritage for providing access to this invaluable resource. For more details, visit the Software Heritage website.

The Stack v2 can be accessed through the Hugging Face Hub.

About BigCode

BigCode is an open scientific collaboration led jointly by Hugging Face and ServiceNow that works on the responsible development of large language models for code.

Links

Models

Paper: A technical report about StarCoder2 and The Stack v2.

GitHub: All you need to know about using or fine-tuning StarCoder2.

StarCoder2-3B: Small StarCoder2 model.

StarCoder2-7B: Medium StarCoder2 model.

StarCoder2-15B: Large StarCoder2 model.

Data & Governance

StarCoder2 License Agreement: The model is licensed under the BigCode OpenRAIL-M v1 license agreement.

StarCoder2 Search: Full-text search for code in the pretraining dataset.

StarCoder2 Membership Test: Blazing-fast test of whether code was present in the pretraining dataset.

Others

VSCode Extension: Code with StarCoder!

Big Code Models Leaderboard

You can find all the resources and links at huggingface.co/bigcode!
 



DiffuseKronA

A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model

In the realm of subject-driven text-to-image (T2I) generative models, recent developments like DreamBooth and BLIP-Diffusion have led to impressive results yet encounter limitations due to their intensive fine-tuning demands and substantial parameter requirements. While the low-rank adaptation (LoRA) module within DreamBooth offers a reduction in trainable parameters, it introduces a pronounced sensitivity to hyperparameters, leading to a compromise between parameter efficiency and the quality of T2I personalized image synthesis. Addressing these constraints, we introduce DiffuseKronA, a novel Kronecker product-based adaptation module that not only significantly reduces the parameter count by 35% and 99.947% compared to LoRA-DreamBooth and the original DreamBooth, respectively, but also enhances the quality of image synthesis. Crucially, DiffuseKronA mitigates the issue of hyperparameter sensitivity, delivering consistent high-quality generations across a wide range of hyperparameters, thereby diminishing the necessity for extensive fine-tuning. Furthermore, a more controllable decomposition makes DiffuseKronA more interpretable and can even achieve up to a 50% reduction with results comparable to LoRA-DreamBooth. Evaluated against diverse and complex input images and text prompts, DiffuseKronA consistently outperforms existing models, producing diverse images of higher quality with improved fidelity and a more accurate color distribution of objects, all the while upholding exceptional parameter efficiency, thus presenting a substantial advancement in the field of T2I generative modeling.
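The core of a Kronecker-product adapter can be sketched as follows; the factor shapes, initialization, and placement are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KroneckerAdapter(nn.Module):
    """Sketch in the spirit of DiffuseKronA: the update to a frozen linear layer
    is factored as delta_W = A ⊗ B, which needs far fewer trainable parameters
    than a full (or even a low-rank) delta_W."""

    def __init__(self, base: nn.Linear, a_out: int, a_in: int):
        super().__init__()
        out_f, in_f = base.out_features, base.in_features
        assert out_f % a_out == 0 and in_f % a_in == 0
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # keep the pretrained weight frozen
        self.A = nn.Parameter(torch.zeros(a_out, a_in))  # zero-init so training starts at the base model
        self.B = nn.Parameter(torch.randn(out_f // a_out, in_f // a_in) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta_w = torch.kron(self.A, self.B)             # (out_f, in_f) built from two tiny factors
        return self.base(x) + F.linear(x, delta_w)

# 64x64 layer: 128 trainable parameters instead of 4096 for a full update.
layer = KroneckerAdapter(nn.Linear(64, 64), a_out=8, a_in=8)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 128
```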

https://arxiv.org/pdf/2402.17412.pdf



Unraveling Textual Descriptions into Artistic Creations




Our method, DiffuseKronA, achieves superior image quality and accurate text-image correspondence across diverse input images and prompts, all the while upholding exceptional parameter efficiency. In this context, [V] denotes a unique token used for fine-tuning a specific subject in the text-to-image diffusion model.
For more results, please visit the gallery!

Superior Fidelity and Colour Distribution​

Our approach consistently produces images of superior fidelity compared to LoRA-DreamBooth, as illustrated below. Notably, the clock generated by our method faithfully reproduces intricate details, such as the exact depiction of the numeral 3, mirroring the original image. In contrast, the output from LoRA-DreamBooth exhibits difficulties in achieving such high fidelity. Additionally, our method demonstrates improved color distribution in the generated images, a feature clearly evident in the RC Car images below. Moreover, LoRA-DreamBooth struggles to maintain fidelity to the numeral 1 on the chest of the sitting toy.

Text Alignment​

Our method comprehends the intricacies and complexities of text prompts provided as input, producing images that align with the given text prompts, as depicted below. The generated image of the character in response to the prompt exemplifies the meticulous attention our method pays to detail. It elegantly captures the presence of a shop in the background, a bowl with noodles in front of the character, and accompanying soup bowls. In contrast, LoRA-DreamBooth struggles to generate an image that aligns seamlessly with the complex input prompt. Our method not only generates images that align with text but is also proficient in producing a diverse range of images for a given input.
 









1/8
SUPIR released their weights today

Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

@chenxi_jw got this cool upscaler up on @replicate











(CVPR2024) Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild​

[Paper]   [Project Page]   [Online Demo (Coming soon)]
Fanghua Yu, Jinjin Gu, Zheyuan Li, Jinfan Hu, Xiangtao Kong, Xintao Wang, Jingwen He, Yu Qiao, Chao Dong
Shenzhen Institute of Advanced Technology; Shanghai AI Laboratory; University of Sydney; The Hong Kong Polytechnic University; ARC Lab, Tencent PCG; The Chinese University of Hong Kong


⚠ Due to the large RAM (60GB) and VRAM (30GB x2) costs of SUPIR, we are still working on releasing the online demo.

 











1/12
1/ We are releasing Playground v2.5, our latest foundation model to create images.

We tested our model across 20K+ users in a rigorous benchmark that went beyond anything we've seen to date.

This model is open weights. More information in the tweets below.

2/12
2/ You can use it on http:// right now, world-wide.

Playground v2.5 improves dramatically on color & contrast, multi-aspect-ratio generation, and aesthetics to push image quality as high as possible while not changing the model architecture that the community has built tools on.

3/12
3/ We did rigorous benchmarking with real users of these models, beating several state-of-the-art models.







1/9
Playground AI releases Playground v2.5

latest foundation model to create images.

tested model across 20K+ users in a rigorous benchmark

This model is open weights

awful at text


https://arxiv.org/abs/2402.17245





1/3
playground-v2.5-1024px-aesthetic is very cool

Check it out on @replicate

Link below

2/3
Try it out here:

3/3
Or try out one of the prompts from here:






Playground v2.5 is a diffusion-based text-to-image generative model, and a successor to Playground v2.

Playground v2.5 is the state-of-the-art open-source model in aesthetic quality. Our user studies demonstrate that our model outperforms SDXL, Playground v2, PixArt-α, DALL-E 3, and Midjourney 5.2.

For details on the development and training of our model, please refer to our blog post and technical report.

Model Description​

  • Developed by: Playground
  • Model type: Diffusion-based text-to-image generative model
  • License: Playground v2.5 Community License
  • Summary: This model generates images based on text prompts. It is a Latent Diffusion Model that uses two fixed, pre-trained text encoders (OpenCLIP-ViT/G and CLIP-ViT/L). It follows the same architecture as Stable Diffusion XL.
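Since it shares the SDXL architecture, loading it with diffusers should look roughly like the sketch below. The repo id is an assumption pieced together from the name mentioned above, and the sampler settings are placeholders; verify both against the model card on the Hugging Face Hub.

```python
import torch
from diffusers import DiffusionPipeline

# Assumed Hub id based on the "playground-v2.5-1024px-aesthetic" name above.
pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a watercolor painting of a lighthouse at dusk",  # any text prompt
    num_inference_steps=30,
    guidance_scale=5.0,
).images[0]
image.save("lighthouse.png")
```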
 


1/11
GPT-4 with simple engineering can predict the future around as well as crowds:
https://arxiv.org/abs/2402.18563
On hard questions, it can do better than crowds.

If these systems become extremely good at seeing the future, they could serve as an objective, accurate third party. This would help us better anticipate the long-term consequences of our actions and make more prudent decisions.

"The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom." - Asimov

I didn't write this paper, but we called for AI forecasting research in Unsolved Problems in ML Safety some years back (http://arxiv.org/abs/2109.13916), and concretized it as a research avenue a year later in Forecasting Future World Events with Neural Networks (https://arxiv.org/abs/2206.15474). Hopefully AI companies will add this feature as the election season begins.

2/11
paper written by @dannyhalawi15 @FredZhang0 @jcyhc_ai @JacobSteinhardt

3/11
For anyone wanting to reuse the prompt from the image:

Question: {question}

Question Background: {background}

Resolution Criteria: {resolution_criteria}

Today's date: {date_begin}
Question close date: {date_end}

We have retrieved the following information for this question:…


5/11
Rip Prediction markets

6/11
Anyone thought about making a GPT agent with the Tetlock principles from superforecasting built into the model?

Could also combine with upstream thinking principles

7/11
: language needs revision with humility :: eg seeing the future…

8/11


9/11
Just wait until people start asking controversial macrosocial questions and you'll find out the limits of these "foundation models" are not baked into the foundation models so much as they are baked into the organizations that create the foundation models.

In AGI formal theory,…

10/11
Travelers TV series got it right!

11/11
Predicting the future is cool, but having robots as third-party sounds sketchy.












1/12
Can we build an LLM system to forecast geo-political events at the level of human forecasters?

Introducing our work Approaching Human-Level Forecasting with Language Models!

Arxiv: https://

Joint work with @dannyhalawi15, @FredZhang0, and @jcyhc_ai

2/12
In this work, we build an LM pipeline for automated forecasting. Given any question about a future event, it retrieves and summarizes relevant articles, reasons about them, and predicts the probability that the event occurs.

3/12
We compare our system to ensembles of competitive human forecasters ("the crowd"). We approach the performance of the crowd across all questions, and beat the crowd on questions where they are less confident (probabilities between 0.3 and 0.7).

4/12
Moreover, averaging our prediction with the crowd consistently outperforms the crowd itself (as measured by Brier score, the most commonly-used metric of forecasting performance).

5/12
Our system has a number of interesting properties. For instance, our forecasted probabilities are well-calibrated, even though we perform no explicit calibration and even though the base models themselves are not (!).

6/12
Second, our model underperforms on "easy" questions (where the answer is nearly certain), because it is unwilling to give probabilities very close to 0 or 1. This is possibly an artifact of its safety training.

7/12
Finally, we provide a self-supervised method that fine-tunes models to forecast better, based on having them mimic rationales and forecasts that outperform the crowd. This is effective enough that fine-tuned GPT-3.5 can beat a carefully prompted GPT-4.

8/12
For some cool related work, see https://, which examines human-LLM forecasting teams, and https:// and https://-forecasting-tournament…, which introduce AI forecasting competitions.

9/12
We are excited to continue this work! Please email @dannyhalawi15 at dannyhalawi15@gmail.com to get in touch.

10/12
Cool! Weird question, but: reading the paper, you are finetuning a model and then testing it again. But are you testing it on the same questions you finetuned it on? Couldn't find a discussion of this on a quick read of the first 15 pages.


12/12
Dark mode for this paper for those who read at night


1/1
Large Language Models are getting better at forecasting - indeed many are capable of becoming superforecasters

Approaching Human-Level Forecasting with Language Models #LLM #AI #GenAI

In this work, during the fine-tuning phase, one of the prompts the researchers use to generate strong reasonings asks the model to build a decision tree and assign probabilities. The fine-tuned model learns the reasoning path (without being explicitly prompted to do so).

https://lnkd.in/dBReygRD





[Submitted on 28 Feb 2024]

Approaching Human-Level Forecasting with Language Models​

Danny Halawi, Fred Zhang, Chen Yueh-Han, Jacob Steinhardt
Forecasting future events is important for policy and decision making. In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters. Towards this goal, we develop a retrieval-augmented LM system designed to automatically search for relevant information, generate forecasts, and aggregate predictions. To facilitate our study, we collect a large dataset of questions from competitive forecasting platforms. Under a test set published after the knowledge cut-offs of our LMs, we evaluate the end-to-end performance of our system against the aggregates of human forecasts. On average, the system nears the crowd aggregate of competitive forecasters, and in some settings surpasses it. Our work suggests that using LMs to forecast the future could provide accurate predictions at scale and help to inform institutional decision making.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as: arXiv:2402.18563 [cs.LG]
(or arXiv:2402.18563v1 [cs.LG] for this version)

Submission history

From: Danny Halawi [view email]
[v1] Wed, 28 Feb 2024 18:54:18 UTC (3,212 KB)
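The crowd comparison in the threads above is scored with the Brier score; for reference, a minimal implementation for binary questions:

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and realized outcomes (0 or 1).
    Lower is better; always answering 0.5 scores 0.25."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(probabilities)

# Three questions: forecasts of 0.9, 0.2 and 0.6 against outcomes 1, 0 and 1.
print(brier_score([0.9, 0.2, 0.6], [1, 0, 1]))  # 0.07
```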


 


StarCoder 2 is a code-generating AI that runs on most GPUs​

Kyle Wiggers @kyle_l_wiggers / 9:00 AM EST • February 28, 2024

software engineer working on laptop with circuit board

Image Credits: Tippapatt / Getty Images

Developers are adopting AI-powered code generators — services like GitHub Copilot and Amazon CodeWhisperer, along with open access models such as Meta’s CodeLlama — at an astonishing rate. But the tools are far from ideal. Many aren’t free. Others are, but only under licenses that preclude them from being used in common commercial contexts.

Perceiving the demand for alternatives, AI startup Hugging Face several years ago teamed up with ServiceNow, the workflow automation platform, to create StarCoder, an open source code generator with a less restrictive license than some of the others out there. The original came online early last year, and work has been underway on a follow-up, StarCoder 2, ever since.

StarCoder 2 isn’t a single code-generating model, but rather a family. Released today, it comes in three variants, the first two of which can run on most modern consumer GPUs:
  • A 3-billion-parameter (3B) model trained by ServiceNow
  • A 7-billion-parameter (7B) model trained by Hugging Face
  • A 15-billion-parameter (15B) model trained by Nvidia, the newest supporter of the StarCoder project.
(Note that “parameters” are the parts of a model learned from training data and essentially define the skill of the model on a problem, in this case generating code.)

Like most other code generators, StarCoder 2 can suggest ways to complete unfinished lines of code as well as summarize and retrieve snippets of code when asked in natural language. Trained with 4x more data than the original StarCoder (67.5 terabytes versus 6.4 terabytes), StarCoder 2 delivers what Hugging Face, ServiceNow and Nvidia characterize as “significantly” improved performance at lower costs to operate.

StarCoder 2 can be fine-tuned “in a few hours” using a GPU like the Nvidia A100 on first- or third-party data to create apps such as chatbots and personal coding assistants. And, because it was trained on a larger and more diverse data set than the original StarCoder (~619 programming languages), StarCoder 2 can make more accurate, context-aware predictions — at least hypothetically.

“StarCoder 2 was created especially for developers who need to build applications quickly,” Harm de Vries, head of ServiceNow’s StarCoder 2 development team, told TechCrunch in an interview. “With StarCoder2, developers can use its capabilities to make coding more efficient without sacrificing speed or quality.”

Now, I’d venture to say that not every developer would agree with De Vries on the speed and quality points. Code generators promise to streamline certain coding tasks — but at a cost.

A recent Stanford study found that engineers who use code-generating systems are more likely to introduce security vulnerabilities in the apps they develop. Elsewhere, a poll from Sonatype, the cybersecurity firm, shows that the majority of developers are concerned about the lack of insight into how code from code generators is produced and “code sprawl” from generators producing too much code to manage.

StarCoder 2’s license might also prove to be a roadblock for some.

StarCoder 2 is licensed under Hugging Face’s RAIL-M, which aims to promote responsible use by imposing “light touch” restrictions on both model licensees and downstream users. While less constraining than many other licenses, RAIL-M isn’t truly “open” in the sense that it doesn’t permit developers to use StarCoder 2 for every conceivable application (medical advice-giving apps are strictly off limits, for example). Some commentators say RAIL-M’s requirements may be too vague to comply with in any case — and that RAIL-M could conflict with AI-related regulations like the EU AI Act.

Setting all this aside for a moment, is StarCoder 2 really superior to the other code generators out there — free or paid?

Depending on the benchmark, it appears to be more efficient than one of the versions of CodeLlama, CodeLlama 33B. Hugging Face says that StarCoder 2 15B matches CodeLlama 33B on a subset of code completion tasks at twice the speed. It’s not clear which tasks; Hugging Face didn’t specify.

StarCoder 2, as an open source collection of models, also has the advantage of being able to deploy locally and “learn” a developer’s source code or codebase — an attractive prospect to devs and companies wary of exposing code to a cloud-hosted AI. In a 2023 survey from Portal26 and CensusWide, 85% of businesses said that they were wary of adopting GenAI like code generators due to the privacy and security risks — like employees sharing sensitive information or vendors training on proprietary data.

Hugging Face, ServiceNow and Nvidia also make the case that StarCoder 2 is more ethical — and less legally fraught — than its rivals.

GenAI models can regurgitate — in other words, spit out a mirror copy of the data they were trained on. It doesn’t take an active imagination to see why this might land a developer in trouble. With code generators trained on copyrighted code, it’s entirely possible that, even with filters and additional safeguards in place, the generators could unwittingly recommend copyrighted code and fail to label it as such.

A few vendors, including GitHub, Microsoft (GitHub’s parent company) and Amazon, have pledged to provide legal coverage in situations where a code generator customer is accused of violating copyright. But coverage varies vendor-to-vendor and is generally limited to corporate clientele.

As opposed to code generators trained using copyrighted code (GitHub Copilot, among others), StarCoder 2 was trained only on data under license from Software Heritage, the nonprofit organization providing archival services for code. Ahead of StarCoder 2’s training, BigCode, the cross-organizational team behind much of StarCoder 2’s roadmap, gave code owners a chance to opt out of the training set if they wanted.

As with the original StarCoder, StarCoder 2’s training data is available for developers to fork, reproduce or audit as they please.

Leandro von Werra, a Hugging Face machine learning engineer and co-lead of BigCode, pointed out that while there’s been a proliferation of open code generators recently, few have been accompanied by information about the data that went into training them and, indeed, how they were trained.

“From a scientific standpoint, an issue is that training is not reproducible, but also as a data producer (i.e. someone uploading their code to GitHub), you don’t know if and how your data was used,” Von Werra said in an interview. “StarCoder 2 addresses this issue by being fully transparent across the whole training pipeline from scraping pretraining data to the training itself.”

StarCoder 2 isn’t perfect, that said. Like other code generators, it’s susceptible to bias. De Vries notes that it can generate code with elements that reflect stereotypes about gender and race. And because StarCoder 2 was trained predominantly on English-language comments and Python and Java code, it performs worse on languages other than English and on “lower-resource” code like Fortran and Haskell.

Still, Von Werra asserts it’s a step in the right direction.

“We strongly believe that building trust and accountability with AI models requires transparency and auditability of the full model pipeline including training data and training recipe,” he said. “StarCoder 2 [showcases] how fully open models can deliver competitive performance.”

You might be wondering — as was this writer — what incentive Hugging Face, ServiceNow and Nvidia have to invest in a project like StarCoder 2. They’re businesses, after all — and training models isn’t cheap.

So far as I can tell, it’s a tried-and-true strategy: foster goodwill and build paid services on top of the open source releases.

ServiceNow has already used StarCoder to create Now LLM, a product for code generation fine-tuned for ServiceNow workflow patterns, use cases and processes. Hugging Face, which offers model implementation consulting plans, is providing hosted versions of the StarCoder 2 models on its platform. So is Nvidia, which is making StarCoder 2 available through an API and web front-end.

For devs expressly interested in the no-cost offline experience, StarCoder 2 — the models, source code and more — can be downloaded from the project’s GitHub page.
 







1/6
Humanoid Locomotion as Next Token Prediction

We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language. Our model is a causal transformer trained via autoregressive prediction of sensorimotor trajectories. To account for


3/6
the multi-modal nature of the data, we perform prediction in a modality-aligned way, and for each input token predict the next token from the same modality. This general formulation enables us to leverage data with missing modalities, like video trajectories without actions. We

4/6
train our model on a collection of simulated trajectories coming from prior neural network policies, model-based controllers, motion capture data, and YouTube videos of humans. We show that our model enables a full-sized humanoid to walk in San Francisco zero-shot. Our model can

5/6
transfer to the real world even when trained on only 27 hours of walking data, and can generalize to commands not seen during training like walking backward.

6/6
paper page:
 








1/6
DistriFusion

Distributed Parallel Inference for High-Resolution Diffusion Models

Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous

2/6
computational costs, resulting in a prohibitive latency for interactive applications. In this paper, we propose DistriFusion to tackle this problem by leveraging parallelism across multiple GPUs. Our method splits the model input into multiple patches and assigns each patch to a

3/6
GPU. However, naïvely implementing such an algorithm breaks the interaction between patches and loses fidelity, while incorporating such an interaction will incur tremendous communication overhead. To overcome this dilemma, we observe the high similarity between the input

4/6
from adjacent diffusion steps and propose displaced patch parallelism, which takes advantage of the sequential nature of the diffusion process by reusing the pre-computed feature maps from the previous timestep to provide context for the current step.

5/6
Therefore, our method supports asynchronous communication, which can be pipelined by computation. Extensive experiments show that our method can be applied to recent Stable Diffusion XL with no quality degradation and achieve up to a 6.1x speedup on 8 NVIDIA A100s compared to one.

6/6
paper page:
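As a toy, single-process illustration of the displaced-patch idea described above (the real method splits work per layer across GPUs with asynchronous communication; `patch_fn` is a hypothetical stand-in for a diffusion-model block):

```python
import torch

class DisplacedPatchDenoiser:
    """Each patch is processed with cross-patch context taken from the feature map
    cached at the PREVIOUS diffusion step, so patches never wait on each other
    within a step. patch_fn(patch, context) must accept context=None at the first step."""

    def __init__(self):
        self.cached_features = None  # full feature map from the previous timestep

    def step(self, patch_fn, patches):
        outputs, feats = [], []
        for patch in patches:
            out, feat = patch_fn(patch, self.cached_features)  # stale but very similar context
            outputs.append(out)
            feats.append(feat)
        # Reassemble and cache for the next timestep; in the real system this is
        # what gets communicated asynchronously between GPUs.
        self.cached_features = torch.cat(feats, dim=-1).detach()
        return outputs
```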
 











1/9
Trajectory Consistency Distillation

Latent Consistency Model (LCM) extends the Consistency Model to the latent space and leverages the guided consistency distillation technique to achieve impressive performance in accelerating text-to-image synthesis. However, we observed that

2/9
LCM struggles to generate images with both clarity and detailed intricacy. To address this limitation, we initially delve into and elucidate the underlying causes. Our investigation identifies that the primary issue stems from errors in three distinct areas. Consequently, we

3/9
introduce Trajectory Consistency Distillation (TCD), which encompasses trajectory consistency function and strategic stochastic sampling. The trajectory consistency function diminishes the distillation errors by broadening the scope of the self-consistency boundary condition and

4/9
endowing the TCD with the ability to accurately trace the entire trajectory of the Probability Flow ODE. Additionally, strategic stochastic sampling is specifically designed to circumvent the accumulated errors inherent in multi-step consistency sampling, which is meticulously

5/9
tailored to complement the TCD model. Experiments demonstrate that TCD not only significantly enhances image quality at low NFEs but also yields more detailed results compared to the teacher model at high NFEs.

6/9
paper page:

7/9
demo:

8/9
model:

9/9
"An astronaut riding a green horse"
 







1/5
Priority Sampling of Large Language Models for Compilers

Large language models show great potential in generating and optimizing code. Widely used sampling methods such as Nucleus Sampling increase the diversity of generation but often produce repeated samples for low

2/5
temperatures and incoherent samples for high temperatures. Furthermore, the temperature coefficient has to be tuned for each task, limiting its usability. We present Priority Sampling, a simple and deterministic sampling technique that produces unique samples ordered by the

3/5
model's confidence. Each new sample expands the unexpanded token with the highest probability in the augmented search tree. Additionally, Priority Sampling supports generation based on regular expressions, which provides a controllable and structured exploration process. Priority

4/5
Sampling outperforms Nucleus Sampling for any number of samples, boosting the performance of the original model from 2.87% to 5% improvement over -Oz.

5/5
paper page:
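A rough, best-first sketch of the "expand the highest-probability unexpanded token" idea described above (the paper's exact procedure, including its regular-expression support, is more involved; `logprob_fn` is a hypothetical stand-in for the model and the `top_k` cap is an assumption to keep the tree small):

```python
import heapq

def priority_sampling(logprob_fn, vocab_size, eos_id, num_samples, max_len=32, top_k=8):
    """Deterministically produce unique samples ordered by cumulative log-probability:
    at every step, the single most promising unexpanded continuation is expanded."""
    frontier = [(0.0, [])]                      # (negative cumulative logprob, token prefix)
    samples = []
    while frontier and len(samples) < num_samples:
        neg_lp, prefix = heapq.heappop(frontier)
        if prefix and (prefix[-1] == eos_id or len(prefix) >= max_len):
            samples.append((prefix, -neg_lp))   # finished sequences come out best-first
            continue
        logprobs = logprob_fn(prefix)           # log-probabilities over the vocabulary
        best = sorted(range(vocab_size), key=lambda t: logprobs[t], reverse=True)[:top_k]
        for tok in best:
            heapq.heappush(frontier, (neg_lp - logprobs[tok], prefix + [tok]))
    return samples
```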
 






1/4
Google presents Griffin

Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN

2/4
with gated linear recurrences, and Griffin, a hybrid model that mixes gated linear recurrences with local attention. Hawk exceeds the reported performance of Mamba on downstream tasks, while Griffin matches the performance of Llama-2 despite being trained on over 6 times fewer

3/4
tokens. We also show that Griffin can extrapolate on sequences significantly longer than those seen during training. Our models match the hardware efficiency of Transformers during training, and during inference they have lower latency and significantly higher throughput.

4/4
paper page:
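For intuition, a gated linear recurrence can be sketched as below; the exact gating and parameterization used in Hawk/Griffin (and its pairing with local attention) differ, so treat this as the general pattern rather than the paper's layer.

```python
import torch

def gated_linear_recurrence(x: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
    """h_t = a_t * h_{t-1} + (1 - a_t) * x_t with elementwise gates a_t in (0, 1).
    x and a have shape (seq_len, dim). The loop is sequential here, but because
    the recurrence is linear in h it can be evaluated with a parallel scan."""
    h = torch.zeros_like(x[0])
    hs = []
    for t in range(x.shape[0]):
        h = a[t] * h + (1.0 - a[t]) * x[t]
        hs.append(h)
    return torch.stack(hs)

# Toy usage: gates derived from the input via a sigmoid.
x = torch.randn(16, 8)
a = torch.sigmoid(torch.randn(16, 8))
print(gated_linear_recurrence(x, a).shape)  # torch.Size([16, 8])
```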
 







1/6
Amazon presents ViewFusion

Towards Multi-View Consistency via Interpolated Denoising

Novel-view synthesis through diffusion models has demonstrated remarkable potential for generating diverse and high-quality images.


3/6
Yet, the independent process of image generation in these prevailing methods leads to challenges in maintaining multiple-view consistency. To address this, we introduce ViewFusion, a novel, training-free algorithm that can be seamlessly integrated into existing pre-trained

4/6
diffusion models. Our approach adopts an auto-regressive method that implicitly leverages previously generated views as context for the next view generation, ensuring robust multi-view consistency during the novel-view generation process.

5/6
Through a diffusion process that fuses known-view information via interpolated denoising, our framework successfully extends single-view conditioned models to work in multiple-view conditional settings without any additional fine-tuning.

6/6
paper page:
 










1/8
Snap presents Panda-70M

Captioning 70M Videos with Multiple Cross-Modality Teachers

The quality of the data and annotation upper-bounds the quality of a downstream model. While there exist large text corpora and image-text pairs, high-quality video-text data is much harder to

2/8
collect. First of all, manual labeling is more time-consuming, as it requires an annotator to watch an entire video. Second, videos have a temporal dimension, consisting of several scenes stacked together, and showing multiple actions.

3/8
Accordingly, to establish a video dataset with high-quality captions, we propose an automatic approach leveraging multimodal inputs, such as textual video description, subtitles, and individual video frames.

4/8
Specifically, we curate 3.8M high-resolution videos from the publicly available HD-VILA-100M dataset. We then split them into semantically consistent video clips, and apply multiple cross-modality teacher models to obtain captions for each video.

5/8
Next, we finetune a retrieval model on a small subset where the best caption of each video is manually selected and then employ the model in the whole dataset to select the best caption as the annotation. In this way, we get 70M videos paired with high-quality text captions.

6/8
We dub the dataset as Panda-70M. We show the value of the proposed dataset on three downstream tasks: video captioning, video and text retrieval, and text-driven video generation.

7/8
paper page:

 