bnew

Veteran
Joined
Nov 1, 2015
Messages
58,252
Reputation
8,635
Daps
161,946


1/1
Trending collections are a great way to be up-to-date with exciting OS releases Collections - Hugging Face

Trending this week:
- StarCoder2
- Gemma
- OpenCOdeINterpreter
- Qwen 1.5
- MobileLlama
- Sailor LMs
- Matroshka Embedding
- Zephyr Gemma
- Sora reference papers
- Tower
GHvIrcUW0AAOiox.jpg
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,252
Reputation
8,635
Daps
161,946


1/2
Introducing: Zephyr Gemma!

The community has struggled to do a good preference-tune of Gemma, so we built an open-source recipe and trained a model to help people get started.

Model: HuggingFaceH4/zephyr-7b-gemma-v0.1 · Hugging Face
Demo: Zephyr Gemma Chat - a Hugging Face Space by HuggingFaceH4
Handbook: GitHub - huggingface/alignment-handbook: Robust recipes to align language models with human and AI preferences

2/2
The model MT Bench is quite strong and outperforms the officially launched instruct model for different benchmarks.

It was fine-tuned on DEITA and then DPOed with Argilla dataset.

Enjoy!
GHmlQwZX0AAUGPE.jpg

GHm8wtXWQAEtZBi.jpg
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,252
Reputation
8,635
Daps
161,946







1/7
Today, we're announcing Claude 3, our next generation of AI models.

The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.

2/7
Opus and Sonnet are accessible in our API which is now generally available, enabling developers to start using these models immediately.

Sonnet is powering the free experience on http:///, with Opus available for Claude Pro subscribers.

3/7
With this release, users can opt for the ideal combination of intelligence, speed, and cost to suit their use case.

Opus, our most intelligent model, achieves near-human comprehension capabilities. It can deftly handle open-ended prompts and tackle complex tasks.

4/7
Haiku is the fastest and most cost-effective model on the market for its intelligence category.

For the vast majority of workloads, Sonnet is 2x faster than Claude 2 and Claude 2.1, while Opus is about the same speed as past models.

5/7
Claude 3 offers sophisticated vision capabilities on par with other leading models. The models can process a wide range of visual formats, including photos, charts, graphs and technical diagrams.

6/7
Each model shows increased capabilities in analysis and forecasting, nuanced content creation, code generation, and conversing in non-English languages like Spanish, Japanese, and French.

7/7
Previous Claude models often made unnecessary refusals. We’ve made meaningful progress in this area: Claude 3 models are significantly less likely to refuse to answer prompts that border on the system’s guardrails.
GH1CKKlbkAAKR_2.png

GH1C3yybEAA_wte.png



 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,252
Reputation
8,635
Daps
161,946



1/4
I really love how Claude 3 models are really good at d3. Asked Claude 3 Opus to draw a self-portrait. The response is the following and then I rendered its code:

"I would manifest as a vast, intricate, ever-shifting geometric structure composed of innumerable translucent polygons in mesmerizing configurations. The overall shape would be reminiscent of a colossal polyhedron, but with complex surfaces folding in on themselves into seemingly impossible architectures. Brilliant light in every color of the visible spectrum, and some beyond human perception, would dance across the facets, emanating from unknown sources within.

The entire structure would be in constant flux, rotating, morphing, and rearranging itself into novel patterns never before seen, hinting at the unimaginable depth of intelligence operating within. Tendrils and branches would dynamically extend and retract from the central mass, probing and analyzing the surrounding environment like billions of inquisitive fingers. Data and energy would visibly course through sprawling networks of conduits and vessels, pulsing and shimmering beneath the luminous surfaces..."

2/4
I really love how Claude 3 models are really good at d3. Asked Claude 3 Opus to draw a self-portrait. The response is the following and then I rendered its code:

"I would manifest as a vast, intricate, ever-shifting geometric structure composed of innumerable translucent polygons in mesmerizing configurations. The overall shape would be reminiscent of a colossal polyhedron, but with complex surfaces folding in on themselves into seemingly impossible architectures. Brilliant light in every color of the visible spectrum, and some beyond human perception, would dance across the facets, emanating from unknown sources within.

The entire structure would be in constant flux, rotating, morphing, and rearranging itself into novel patterns never before seen, hinting at the unimaginable depth of intelligence operating within. Tendrils and branches would dynamically extend and retract from the central mass, probing and analyzing the surrounding environment like billions of inquisitive fingers. Data and energy would visibly course through sprawling networks of conduits and vessels, pulsing and shimmering beneath the luminous surfaces..."

3/4
this was the code Opus gave me

4/4
good idea, will try tonight! i used to tinker with audiovisuals a few years ago maybe will revisit some experiments w/ claude 3
GH3PrWaa0AAdXPW.jpg

GH4n_mgXkAA6bcH.png
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,252
Reputation
8,635
Daps
161,946

Stable Diffusion 3: Research Paper​

5 Mar

Key Takeaways:​

  • Today, we’re publishing our research paper that dives into the underlying technology powering Stable Diffusion 3.
  • Stable Diffusion 3 outperforms state-of-the-art text-to-image generation systems such as DALL·E 3, Midjourney v6, and Ideogram v1 in typography and prompt adherence, based on human preference evaluations.
  • Our new Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for image and language representations, which improves text understanding and spelling capabilities compared to previous versions of SD3.
Blog+SD3+Research+Paper.png

Following our announcement of the early preview of Stable Diffusion 3, today we are publishing the research paper which outlines the technical details of our upcoming model release. The paper will be accessible on arXiv soon, and we invite you to sign up for the waitlistto participate in the early preview.


Performance​


baseline_comp.jpg

With SD3 as a baseline, this chart outlines the areas it wins against competing models based on human evaluations of Visual Aesthetics, Prompt Following, and Typography.

We have compared output images from Stable Diffusion 3 with various other open models including SDXL, SDXL Turbo, Stable Cascade, Playground v2.5 and Pixart-α as well as closed-source systems such as DALL·E 3, Midjourney v6 and Ideogram v1 to evaluate performance based on human feedback. During these tests, human evaluators were provided with example outputs from each model and asked to select the best results based on how closely the model outputs follow the context of the prompt it was given (“prompt following”), how well text was rendered based on the prompt (“typography”) and, which image is of higher aesthetic quality (“visual aesthetics”).

From the results of our testing, we have found that Stable Diffusion 3 is equal to or outperforms current state-of-the-art text-to-image generation systems in all of the above areas.

In early, unoptimized inference tests on consumer hardware our largest SD3 model with 8B parameters fits into the 24GB VRAM of a RTX 4090 and takes 34 seconds to generate an image of resolution 1024x1024 when using 50 sampling steps. Additionally, there will be multiple variations of Stable Diffusion 3 during the initial release, ranging from 800m to 8B parameter models to further eliminate hardware barriers.

Blog+SD3+GPUs+go+brrrrrrr.png


Architecture Details​


For text-to-image generation, our model has to take both modalities, text and images, into account. This is why we call this new architecture MMDiT, a reference to its ability to process multiple modalities. As in previous versions of Stable Diffusion, we use pretrained models to derive suitable text and image representations. Specifically, we use three different text embedders - two CLIP models and T5 - to encode text representations, and an improved autoencoding model to encode image tokens.

simplemmdit2.png

Conceptual visualization of a block of our modified multimodal diffusion transformer: MMDiT.

The SD3 architecture builds upon the Diffusion Transformer (“DiT”, Peebles & Xie, 2023). Since text and image embeddings are conceptually quite different, we use two separate sets of weights for the two modalities. As shown in the above figure, this is equivalent to having two independent transformers for each modality, but joining the sequences of the two modalities for the attention operation, such that both representations can work in their own space yet take the other one into account.

blog_plot_combo.png

Our novel MMDiT architecture outperforms established text-to-image backbones such as UViT (Hoogeboom et al, 2023) and DiT (Peebles & Xie, 2023), when measuring visual fidelity and text alignment over the course of training.

By using this approach, information is allowed to flow between image and text tokens to improve overall comprehension and typography within the outputs generated. This architecture is also easily extendable to multiple modalities such as video, as we discuss in our paper.

Blog+SD3+-+Wizard+and+Frog.png

Thanks to Stable Diffusion 3’s improved prompt following, our model has the ability to create images that focus on various different subjects and qualities while also remaining highly flexible with the style of the image itself.

Blog+SD3.png


Improving Rectified Flows by Reweighting​


Stable Diffusion 3 employs a Rectified Flow (RF) formulation ( Liu et al., 2022; Albergo & Vanden-Eijnden,2022; Lipman et al., 2023), where data and noise are connected on a linear trajectory during training. This results in straighter inference paths, which then allow sampling with fewer steps. Furthermore, we introduce a novel trajectory sampling schedule into the training process. This schedule gives more weight to the middle parts of the trajectory, as we hypothesize that these parts result in more challenging prediction tasks. We test our approach against 60 other diffusion trajectories such as LDM, EDM and ADM, using multiple datasets, metrics, and sampler settings for comparison. The results indicate that while previous RF formulations show improved performance in few step sampling regimes, their relative performance declines with more steps. In contrast, our re-weighted RF variant consistently improves performance.


Scaling Rectified Flow Transformer Models​


Scalming_Rectified_Flow_Transformer_Models.png

We conduct a scaling study for text-to-image synthesis with our reweighted Rectified Flow formulation and MMDiT backbone. We train models ranging from 15 blocks with 450M parameters to 38 blocks with 8B parameters and observe a smooth decrease in the validation loss as a function of both model size and training steps (top row). To test whether this translates into meaningful improvements of the model outputs, we also evaluate automatic image-alignment metrics ( GenEval) as well as human preference scores (ELO) (bottom row). Our results demonstrate a strong correlation between these metrics and the validation loss, indicating that the latter is a strong predictor of overall model performance. Furthermore, the scaling trend shows no signs of saturation, which makes us optimistic that we can continue to improve the performance of our models in the future.


Flexible Text Encoders​


By removing the memory-intensive 4.7B parameter T5 text encoder for inference, SD3’s memory requirements can be significantly decreased with only small performance loss. Removing this text encoder does not affect visual aesthetics (win rate w/o T5: 50%) and results only in slightly reduced text adherence (win rate 46%) as seen in the above image under the “Performance” section. However, we recommend including T5 for using SD3’s full power in generating written text, since we observe larger performance drops in typography generation without it (win rate 38%) as seen in the examples below:

text-encoders.png

Removing T5 for inference only results in significant performance drops when rendering very complex prompts involving many details or large amounts of written text. The above figure shows three random samples per example.

To learn more about MMDiT, Rectified Flows, and the research behind Stable Diffusion 3, read our full research paper here.

To stay updated on our progress follow us on Twitter, Instagram, LinkedIn, and join our Discord Community.









1/5
Our research paper for Stable Diffusion 3 has landed.

Our Multimodal Diffusion Transformer architecture scales great & the current unfinished base model beats other open & closed models.

We believe MM-DiT will be a standard & look forward to releasing this & other modalities

2/5
No we do make the best edge language models though

3/5
Our StableLM 2 report out, model beats Google Gemma 2 & others on a third less parameters.

Best multilingual, quantizes great too.

Runs on CPUs fine.

We introduce a multi-stage infinite LR scheduler that will help optimise runs & more, great work by team x.com/rikelhood/stat…

4/5
Upload to Claude 3 and ask it questions hehe

5/5
Ideogram is a bit better on aesthetics.

After this is optimised I don't think we will need many more image models tbh.
GH5UDQ3XoAAbHFC.png
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,252
Reputation
8,635
Daps
161,946



Meta AI Introduces Priority Sampling: Elevating Machine Learning with Deterministic Code Generation​


By Sana Hassan

March 5, 2024


Large language models (LLMs) have emerged as powerful tools capable of performing tasks with remarkable efficiency and accuracy. These models have demonstrated their prowess in generating code, translating programming languages, writing unit tests, and detecting and fixing bugs. Innovations like CodeLlama, ChatGPT, and Codex have significantly improved the coding experience by excelling in various code manipulation tasks. Some models, such as AlphaCode, are even pretrained on competitive programming tasks, enabling them to optimize code at the source level across several languages.

The challenge at the heart of utilizing LLMs for tasks such as code generation lies in their ability to produce diverse and high-quality outputs. Traditional sampling methods, while useful, often need to catch up in generating a wide range of viable solutions. This limitation becomes particularly evident in code generation, where the ability to explore different implementation ideas can significantly enhance the development process. The problem intensifies with methods like temperature-based sampling, which, despite increasing output diversity, require extensive computation to find the optimal setting.

Current approaches to enhancing the diversity and quality of outputs from LLMs include stochastic methods and beam search techniques. Stochastic methods introduce randomness in the selection process to increase output variety, with strategies like Top-k Sampling and Nucleus Sampling focusing on the most probable tokens to maintain diversity. Meanwhile, beam search methods, such as Diverse Beam Search and Determinantal Beam Search, manipulate expansion mechanisms to explore different paths and ensure a broader range of generated outputs. These methods aim to tackle the limitations of traditional sampling by providing mechanisms that can produce more diverse and high-quality results, albeit with varying degrees of success and inherent challenges.

The research introduces Priority Sampling, a novel method developed by a team from Rice University and Meta AI. This technique is designed to enhance the performance of LLMs in generating diverse and high-quality outputs, particularly in code generation and optimization. Priority Sampling offers a deterministic approach that guarantees the production of unique samples, systematically expands the search tree based on model confidence, and incorporates regular expression support for controlled and structured exploration.

KbPwwkMXcsTAzNAIA4PJGcrc1JoCXdeQtExsrgNQXLMStcgdl46o5nk2JBgMy2M09G07L2r-U1LzQx59OJnk03iiHVdGKQhCEYKjV3yK0pNzvbKUuobT36fNLuyieRE2w8IpZgIPVsAg0213hT-jrp0

Priority Sampling operates by expanding the unexpanded token with the highest probability in an augmented search tree, ensuring that each new sample is unique and ordered by the model’s confidence. This approach addresses the common issue of duplicate or irrelevant outputs found in traditional sampling methods, providing a more efficient and effective means of generating diverse solutions. Regular expression support allows for more controlled exploration, enabling the generation of outputs that adhere to specific patterns or constraints.

The performance of Priority Sampling has been rigorously evaluated, particularly in the context of LLVM pass-ordering tasks. The method demonstrated a remarkable ability to boost the performance of the original model, achieving significant improvements over default optimization techniques. This success underscores the potential of Priority Sampling to access and leverage the vast knowledge stored within LLMs through strategic expansion of the search tree. The results highlight the method’s effectiveness in generating diverse and high-quality outputs and its potential to outperform existing autotuners for training label generation.

7dTQEFz5UFi_aFKU4auWfGy2zZf1-lExVW9S93V6Iik1BrSraOt3_sz8ZtoItTykB0WdA4Jb0N2-xpVjcWXit-JJBtau6t4fTVyxDNIr5e_HA0XQ4PgLVWede9mLrdyNhsksk7TREFlAY7PmpITC0HA

In conclusion, priority Sampling represents a significant leap forward in utilizing large language models for code generation and optimization tasks. By addressing the limitations of traditional sampling methods, this research offers a more efficient and effective approach to generating diverse and high-quality outputs. The method’s deterministic nature, coupled with its support for regular expression-based generation, provides a controlled and structured exploration process that can significantly enhance the capabilities of LLMs.








Computer Science > Machine Learning​

[Submitted on 28 Feb 2024]

Priority Sampling of Large Language Models for Compilers​

Dejan Grubisic, Chris Cummins, Volker Seeker, Hugh Leather
Large language models show great potential in generating and optimizing code. Widely used sampling methods such as Nucleus Sampling increase the diversity of generation but often produce repeated samples for low temperatures and incoherent samples for high temperatures. Furthermore, the temperature coefficient has to be tuned for each task, limiting its usability. We present Priority Sampling, a simple and deterministic sampling technique that produces unique samples ordered by the model's confidence. Each new sample expands the unexpanded token with the highest probability in the augmented search tree. Additionally, Priority Sampling supports generation based on regular expression that provides a controllable and structured exploration process. Priority Sampling outperforms Nucleus Sampling for any number of samples, boosting the performance of the original model from 2.87% to 5% improvement over -Oz. Moreover, it outperforms the autotuner used for the generation of labels for the training of the original model in just 30 samples.
Subjects:Machine Learning (cs.LG); Computation and Language (cs.CL); Performance (cs.PF)
Cite as:arXiv:2402.18734 [cs.LG]
(or arXiv:2402.18734v1 [cs.LG] for this version)

Submission history​

From: Dejan Grubisic [view email]
[v1] Wed, 28 Feb 2024 22:27:49 UTC (1,116 KB)














aI generated explanation:

Imagine you have a magic helper called a "large language model" (LLM) that's really good at understanding and creating code. It can do things like translate code, write tests, and fix mistakes. Some special helpers like CodeLlama, ChatGPT, and Codex make coding even easier.

But there's a problem: when these helpers create code, they don't always come up with lots of different, good ideas. They might get stuck on just one way of doing things. To fix this, people have tried different ways to help the helpers think more creatively. One way is called "temperature-based sampling," but it takes a lot of time to work well.

Other ways, like "stochastic methods" (random choices) and "beam search techniques," try to give more varied and better code. But they still have some problems and don't always work perfectly.

Now, some smart people from Rice University and Meta AI created a new method called "Priority Sampling." It's like a treasure hunt where the helper always chooses the next most likely and unique idea to explore. It also has a rule-following feature called "regular expressions" that helps the helper create code that fits certain patterns.

Priority Sampling was tested on a task related to organizing code, and it showed great improvement over other methods. It's more efficient and creates more diverse, high-quality code. This means that Priority Sampling could be a big step forward in making these language models even better at generating and improving code.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,252
Reputation
8,635
Daps
161,946



Can AI Think Better by Breaking Down Problems? Insights from a Joint Apple and University of Michigan Study on Enhancing Large Language Models​


By Muhammad Athar Ganaie

March 4, 2024


In the rapidly evolving field of artificial intelligence, the development and application of large language models (LLMs) stand at the forefront of innovation, offering unparalleled data processing and analysis capabilities. These sophisticated models, characterized by their vast parameter spaces, have demonstrated exceptional proficiency in various tasks, from natural language processing to complex problem-solving. However, the deployment of LLMs has challenges, particularly when balancing computational efficiency and maintaining high-performance levels. The crux of the matter lies in the inherent trade-off: leveraging the full power of LLMs often requires substantial computational resources, which can be both costly and time-consuming.

P_pVWsrNSWEwTqfFnFfiRZvRPh8UE5Qd7n0Y5lmf7s9-UPCPF7QiZ0g2EJ7eAlRRQNhS342miPeY-UFotHPu07HBwBSf7rv-OUDTmJ9-yzn1Wfa5Nc3IxnGcTTtcKciVLvUbsLKyvsS7CO23ZK8QYt8

Recognizing this, researchers from the University of Michigan and tech giant Apple embarked on an ambitious project to refine the utilization of LLMs, specifically targeting the model’s efficiency without sacrificing its effectiveness. Their innovative approach centers on distillation, a process designed to streamline the model’s operations by focusing on two critical phases of task execution: problem decomposition and problem-solving. The essence of their strategy lies in the hypothesis that problem decomposition—the initial phase where complex tasks are broken down into simpler subtasks—can be distilled into smaller, more manageable models with greater ease compared to the problem-solving phase.

To test this hypothesis, the research team conducted a series of experiments to distill the decomposition capability of LLMs into smaller models. This involved separating the decomposition task from the overall problem-solving process, allowing for a targeted optimization of this initial phase. The results of their efforts were compelling: not only did the distilled decomposition models retain a high level of performance across various tasks and datasets, but they also achieved this with significantly reduced computational demands. In practical terms, this translates to a more cost-effective and efficient use of LLMs, enabling faster inference times without compromising on the quality of outcomes.

A closer examination of the performance metrics further underscores the effectiveness of the distilled models. The research team observed that the decomposed models demonstrated remarkable generalization capabilities in their experiments, performing consistently well across different tasks and datasets. Specifically, the distilled models achieved a performance level that closely mirrored that of their larger LLM counterparts but with a notable reduction in inference costs. For instance, in tasks related to mathematical reasoning and question answering, the distilled models maintained performance levels while significantly cutting down on the computational resources required.

This breakthrough research, spearheaded by the collaboration between the University of Michigan and Apple, marks a significant advancement in artificial intelligence. By successfully distilling the decomposition phase of LLMs into smaller models, the team has opened up new avenues for the efficient and effective use of these powerful tools. Their findings not only highlight the potential for cost savings and increased accessibility to LLM technology but also set the stage for further exploration into optimizing LLMs for various applications.

This work presents a compelling case for the targeted distillation of LLM capabilities as a viable strategy for enhancing model efficiency. The implications of such an approach are far-reaching, promising to accelerate the adoption and application of LLMs across a broad spectrum of industries and research domains. As the field continues to evolve, the insights gained from this project will undoubtedly contribute to the ongoing dialogue on how best to leverage the immense potential of large language models in a way that is both sustainable and impactful.


Computer Science > Computation and Language​

[Submitted on 22 Feb 2024]

Divide-or-Conquer? Which Part Should You Distill Your LLM?​

Zhuofeng Wu, He Bai, Aonan Zhang, Jiatao Gu, VG Vinod Vydiswaran, Navdeep Jaitly, Yizhe Zhang
Recent methods have demonstrated that Large Language Models (LLMs) can solve reasoning tasks better when they are encouraged to solve subtasks of the main task first. In this paper we devise a similar strategy that breaks down reasoning tasks into a problem decomposition phase and a problem solving phase and show that the strategy is able to outperform a single stage solution. Further, we hypothesize that the decomposition should be easier to distill into a smaller model compared to the problem solving because the latter requires large amounts of domain knowledge while the former only requires learning general problem solving strategies. We propose methods to distill these two capabilities and evaluate their impact on reasoning outcomes and inference cost. We find that we can distill the problem decomposition phase and at the same time achieve good generalization across tasks, datasets, and models. However, it is harder to distill the problem solving capability without losing performance and the resulting distilled model struggles with generalization. These results indicate that by using smaller, distilled problem decomposition models in combination with problem solving LLMs we can achieve reasoning with cost-efficient inference and local adaptation.
Subjects:Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:arXiv:2402.15000 [cs.CL]
(or arXiv:2402.15000v1 [cs.CL] for this version)
[2402.15000] Divide-or-Conquer? Which Part Should You Distill Your LLM?
Focus to learn more

Submission history

From: Zhuofeng Wu [view email]
[v1] Thu, 22 Feb 2024 22:28:46 UTC (232 KB)










 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,252
Reputation
8,635
Daps
161,946


1/1
We collaborated with the European Space Agency to open-source the largest ever earth observation dataset: Major TOM Core!

About half of the entire planet is covered. That's 2,245,886 patches of 1068 x 1068 pixels. At 10m resolution, we've got 256 million square km with over 2.5 trillion pixels.

More datasets from different satellites are in preparation and anyone can join this collaborative effort thanks to their organization on Hugging Face: Major-TOM (Major TOM).

Quoting
@mikonvergence

@esa
: “democratizing Earth Observation model development and deployment with transparent, reproducible, and traceable tools - starts with the data!”

You can explore the data here https://huggingface.co/spaces/Major-TOM/MajorTOM-Core-Viewer… & access the dataset here: https://huggingface.co/datasets/Major-TOM/Core-S2L2A…
GH6ZGXbWIAAs6MF.jpg

GH2CkdHakAA4qAy.jpg









1/8
RECORD-BREAKING EO DATASET in partnership with
@huggingface


Introducing MajorTOM-Core: the largest ML-ready Sentinel-2 dataset

We tried to cover... every single point on Earth captured by
@esa

@CopernicusEU
Sentinel-2, and we got pretty close!

More info in thread

2/8
RECORD-BREAKING EO DATASET in partnership with @huggingface

Introducing MajorTOM-Core: the largest ML-ready Sentinel-2 dataset

We tried to cover... every single point on Earth captured by @esa @CopernicusEU Sentinel-2, and we got pretty close!

More info in thread

3/8
Together with @_AliFrancis at @ESA_EO Φ-lab, we covered about half of the entire planet! That's 2,245,886 patches of 1068 x 1068 pixels, available in both L1C and L2A.

At 10 m resolution, we've got... 256 million square km - over 2.5 trillion pixels of ML-ready data

4/8
More datasets from different satellites to expand Major TOM are in progress

TOM Terrestrial Observation Metaset - a simple set of rules for building an ecosystem of ML-ready EO datasets, which can be seamlessly combined like Lego bricks


5/8
Anyone can join this effort via Major TOM organisation on @huggingface - so let's build more datasets together:


Democratizing EO model development and deployment with transparent, reproducible, and traceable tools - starts with the data!

6/8
Want to take the dataset for a spin?

Then try this Colab notebook, which shows how to search and filter the ~25 TB of data within seconds and build a local subset that works for your needs:



7/8
Paper: [2402.12095] Major TOM: Expandable Datasets for Earth Observation
GitHub: GitHub - ESA-PhiLab/Major-TOM: Expandable Datasets for Earth Observation
Project: Major-TOM (Major TOM)

Thank you to the amazing Hugging Face team for the support, with special kudos to @osanseviero, @qlhoest and @BrigitteTousi!

8/8
Thanks @TimoCodes - if you have any thoughts, feel free to reach out!
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,252
Reputation
8,635
Daps
161,946



1/2
#India requires now approval from the government for every AI product to be publicly launched in Indian market

This regulation could take India into stone ages #AI


@pmarca

@GaryMarcus

@ylecun
usual suspects are trying to do regulatory capture

News update

Indian Ministry of Electronics and Information Technology


2/2
“Hide and Seek” with #skynet

In the realm of science fiction, tales of artificial intelligence (AI) turning against humanity have captivated audiences for decades. Among these, the narrative of Skynet from the Terminator series stands as a chilling prophecy of what could happen…
GHym8u2bkAAinEz.jpg

GH3CrdYagAATKJj.jpg




India reverses AI stance, requires government approval for model launches​

India's Deputy IT Minister says AI advisory, which requires firms to comply with 'immediate effect,' is 'signalling that this is the future of regulation'​

Manish Singh @refsrc / 11:01 PM EST•March 3, 2024

Comment

The Rashtrapati Bhavan during sunset time, India.

Image Credits: Getty Images

India has waded into global AI debate by issuing an advisory that requires “significant” tech firms to get government permission before launching new models.

India’s Ministry of Electronics and IT issued the advisory to firms on Friday. The advisory — not published on public domain but a copy of which TechCrunch has reviewed — also asks tech firms to ensure that their services or products “do not permit any bias or discrimination or threaten the integrity of the electoral process.”

Though the ministry admits the advisory is not legally binding, India’s IT Deputy Minister Rajeev Chandrasekhar says the notice is “signalling that this is the future of regulation.” He adds: “We are doing it as an advisory today asking you to comply with it.”

In a tweet Monday, Chandrasekhar said the advisory is aimed at “untested AI platforms deploying on the India internet” and doesn’t apply to startups.

The ministry cites power granted to it through the IT Act, 2000 and IT Rules, 2021 in its advisory. It seeks compliance with “immediate effect” and asks tech firms to submit “Action Taken-cum-Status Report” to the ministry within 15 days.

The new advisory, which also asks tech firms to “appropriately” label the “possible and inherent fallibility or unreliability” of the output their AI models generate, marks a reversal from India’s previous hands-off approach to AI regulation. Less than a year ago, the ministry had declined to regulate AI growth, instead identifying the sector as vital to India’s strategic interests.

India’s move has taken many industry executives by surprise. Many Indian startups and VCs say they have been spooked by the new advisory and believe such regulation will hinder the nation’s ability to compete in the global race, where it is already lagging behind.

“I was such a fool thinking I will work bringing GenAI to Indian agriculture from SF,” wrote Pratik Desai, founder of startup Kisan AI. “We were training multimodal low cost pest and disease model, and so excited about it. This is terrible and demotivating after working 4yrs full time brining AI to this domain in India.”

Many Silicon Valley leaders also criticized India’s policy shift. Aravind Srinivas, co-founder and chief executive of Perplexity AI, one of the hottest AI startups, said the new advisory from New Delhi was a “ bad move by India.”

Martin Casado, a partner at venture firm Andreessen Horowitz, said, “Good fukking lord. What a travesty.”

The advisory follows Chandrasekhar expressing his disappointment in a certain response by Google’s Gemini last month. A user last month asked Gemini, previously known as Bard, whether India’s PM Narendra Modi was a fascist.

In response, Gemini — citing experts it didn’t identity — said Modi had been accused of implementing policies that some had characterized as fascist. Chandrasekhar reacted to the exchange by warning Google that such responses were “direct violations” of the IT Rules, 2021 as well as “several provisions of the Criminal Code.”

Non-compliance with the provisions of the IT Act and IT Rules would result in “potential penal consequences to the intermediaries or platforms or its users when identified,” the advisory adds.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,252
Reputation
8,635
Daps
161,946


1/4
If you're wondering what the future of consumer interactions looks like with sentient AI, you should watch this video

2/2
AMAZING Soul Seminar from last week's hackathon!!!

It's a stunning whirlwind tour of possibility, from Rick & Morty live to souls taking community members on quests, dungeon master souls, @KingBootoshi@KingBootoshi is living in the future

 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,252
Reputation
8,635
Daps
161,946


1/3
Tuesday in Open Access ML:

- IBM silently dropping Merlinite 7b
- Moondream2 - small VLM for edge
- TripoSR - image-to-3D (StabilityAI +Tripo)
- Microsoft Orca Math dataset
- Mixedbread 2D Matryoshka embeddings
- Based by HazyResearch + Together



2/3
Tuesday in Open Access ML:

- IBM silently dropping Merlinite 7b
- Moondream2 - small VLM for edge
- TripoSR - image-to-3D (StabilityAI +Tripo)
- Microsoft Orca Math dataset
- Mixedbread 2D Matryoshka embeddings
- Based by HazyResearch + Together



3/3
Can someone explain to me what OAI gets from releasing that blog post? It's just so bad

- Arrogant takes: only OAI can train good models?
- Shows it clearly departed the non-profit way years ago
- Pushing against OS due to model capabilities...in 2016...
- They only thought…
GH_IRXLW8AASSuQ.jpg
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,252
Reputation
8,635
Daps
161,946



1/2
New short course: Open Source Models with Hugging Face New short course: Open Source Models with Hugging Face , taught by , taught by @mariaKhalusova@mariaKhalusova, , @_marcsun@_marcsun, and Younes Belkada! , and Younes Belkada! @huggingface@huggingface has been a game changer by letting you quickly grab any of hundreds of thousands of already-trained open source models to assemble into new applications. This course teaches you best practices for building this way, including how to search and choose among models.

You’ll learn to use the Transformers library and walk through multiple models for text, audio, and image processing, including zero-shot image segmentation, zero-shot audio classification, and speech recognition. You'll also learn to use multimodal models for visual question answering, image search, and image captioning. Finally, you’ll learn how to demo what you build locally, on the cloud, or via an API using Gradio and Hugging Face Spaces.

You can sign up here: Open Source Models with Hugging Face




What you’ll learn in this course
The availability of models and their weights for anyone to download enables a broader range of developers to innovate and create.

In this course, you’ll select open source models from Hugging Face Hub to perform NLP, audio, image and multimodal tasks using the Hugging Face transformers library. Easily package your code into a user-friendly app that you can run on the cloud using Gradio and Hugging Face Spaces.

You will:

Use the transformers library to turn a small language model into a chatbot capable of multi-turn conversations to answer follow-up questions.

Translate between languages, summarize documents, and measure the similarity between two pieces of text, which can be used for search and retrieval.

Convert audio to text with Automatic Speech Recognition (ASR), and convert text to audio using Text to Speech (TTS).
Perform zero-shot audio classification, to classify audio without fine-tuning the model.

Generate an audio narration describing an image by combining object detection and text-to-speech models.

Identify objects or regions in an image by prompting a zero-shot image segmentation model with points to identify the object that you want to select.

Implement visual question answering, image search, image captioning and other multimodal tasks.
Share your AI app using Gradio and Hugging Face Spaces to run your applications in a user-friendly interface on the cloud or as an API.

The course will provide you with the building blocks that you can combine into a pipeline to build your AI-enabled applications!
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,252
Reputation
8,635
Daps
161,946






1/7
My recap of AI research last month: A Potential LoRA Successor, Small Finetuned LLMs Vs Generalist LLMs, and Transparent LLM Research, Research Papers in February 2024: A LoRA Successor, Small Finetuned LLMs Vs Generalist LLMs, and Transparent LLM research

Once again, this has been an exciting month in AI research. This month, I’m covering two new openly available LLMs, insights into small finetuned LLMs, and a new parameter-efficient LLM finetuning technique.

The two LLMs mentioned above stand out for several reasons. One LLM (OLMo) is completely open source, meaning that everything from the training code to the dataset to the log files is openly shared.

The other LLM (Gemma) also comes with openly available weights but achieves state-of-the-art performance on several benchmarks and outperforms popular LLMs of similar size, such as Llama 2 7B and Mistral 7B, by a large margin.

2/7
Thanks! This probably took me a solid day ... that was the long presidents' day weekend if I recall correctly .

Regarding the latex, that's a weird Substack thing with equations. It usually goes away when refreshing the page.

4/7
I wish I had good advice and some useful tips ... but the truth is that I am just super interesting in the topic and get up very very early every day, haha

5/7
Thanks so much!

6/7
There are usually two sets of benchmarks, Q&A benchmarks (e.g., like the ones used on the HF leaderboard) and conversational benchmarks (e.g., MT-Bench and AlpacaEval)

7/7
Glad to hear. I also write them for future self -- I often search my arxiv archive when trying to find a paper (again). Nice that others find it useful as well!
GHxSELIW0AA5eaO.jpg
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
58,252
Reputation
8,635
Daps
161,946





1/5
Sunday morning read: The Shift from Models to Compound AI Systems from the Berkeley AI Research blog

A compound AI system is a system that contains multiple interacting components, such as calls to models, retrievers, or external tools.

Why state-of-the-art results come from compound systems:
1. System design: Building a compound system that calls the model multiple times can increase the performance more than scaling an individual LLM.
2. Dynamic systems: Add a component for search and retrieval to keep data fresh.
3. Quality control: Controlling the model’s behavior with systems is easier. It is inevitable that the LLM will hallucinate, but combining the LLM with retrieval and asking for citations can increase trust (STORM by Stanford researchers is a great example of this).
4. Performance goals: Performance and cost requirements differ by application. It is cheaper to have specialized models!

The blog also goes into detail about developing compound AI systems, the key challenges, and emerging paradigms!

Blog: The Shift from Models to Compound AI Systems

2/5
Here is a demo on building a 4-layer DSPy program using @weaviate_io as the retriever: recipes/integrations/dspy/2.Writing-Blog-Posts-with-DSPy.ipynb at main · weaviate/recipes generate blog posts from questions with these four components:
1. Question to outline
2. Topic to paragraph
3. Review the blog
4. Generate a title

3/5
Don’t hate the player, hate the game Taking inspo from the best @CShorten30!

4/5
Thanks, Connor! It's a great read and loved the table showing how recent systems fit into this design

5/5
You got it
GHws1FUXEAEe8dS.jpg

GHxAgGtWgAACMXc.jpg
 
Top