bnew

Veteran
Joined
Nov 1, 2015
Messages
58,206
Reputation
8,623
Daps
161,864

Is This the End of ‘Intel Inside’?​

Newcomers pose numerous challenges to decades of ‘Wintel’ chip dominance.​


JASON SCHNEIDER

By
Christopher Mims
Dec. 1, 2023 9:00 pm ET



It might not look like it yet, but Intel is in a fight for its life.

The stakes for its employees and investors are high, and are likely to turn on some fierce battles for market share that will play out in 2024 and beyond.

For the everyday consumer, what’s at stake is mostly nostalgia. One day, the little “Intel Inside” sticker that’s been on PCs since 1991 could cease to exist.

Instead of an Intel chip, these computers could have processors from an array of manufacturers, principally Qualcomm, but also possibly Nvidia, AMD, and lesser-known companies like Santa Clara, Calif.-based Amlogic and Taiwan-based MediaTek.

What’s happening now is a tipping point decades in the making. Ever since a little chip-design company called ARM built the mobile processor for Apple’s first Newton personal digital assistant, which came out in 1993, it’s been gaining steam, primarily in the mobile-phone business. By the time Intel sought to enter the mobile-processor business in 2011, it was too late.

Apple was the first company to bet that ARM-based processors—thought by many to be useful only in phones—could be the brains of even the most powerful desktop computers. This gave Apple a huge head start over Intel, and the rest of the industry, in designing chips that prioritized power-sipping performance in a world where that’s become the primary limiting factor in the performance of all devices, not just phones.

Now, Google, Qualcomm, Amazon, Apple and others can use ARM’s blueprints to custom-design the chips that power everything from phones and notebooks to cloud servers. These chips are then typically produced by Samsung or Taiwan-based TSMC, which focus on making chips for other companies.

The threats to Intel are so numerous that it’s worth summing them up: The Mac and Google’s Chromebooks are already eating the market share of Windows-based, Intel-powered devices. As for Windows-based devices, all signs point to their increasingly being based on non-Intel processors. Finally, Windows is likely to run on the cloud in the future, where it will also run on non-Intel chips.

Apple has moved almost entirely away from Intel’s chips, which it used for over a decade for all of its desktop and notebook computers. At the same time, its overall market share for desktops and notebooks has climbed from around 12% of devices in the U.S. in 2013 to nearly one in three today, according to Statcounter.

These days, it’s not just Apple moving away from Intel’s chips. Microsoft is accelerating its yearslong effort to make Windows run on ARM-based processors, so that the entire PC ecosystem isn’t doomed by Intel’s failure to keep up with Apple and TSMC. Google’s Chrome OS, which works with either Intel or ARM-based chips, is also an emerging threat to Microsoft.

This means the threat to Intel comes from a whole ecosystem of companies with deep pockets and sizable profit margins, each trying to take their piece of the company’s market share. In many ways, it really is Intel versus the world—and “the world” includes nearly every tech giant you can name.

It wasn’t always this way. For decades, Intel enjoyed PC market dominance with its ride-or-die partner, Microsoft, through their “Wintel” duopoly.

It’s ironic, then, that Microsoft is one of the companies leading the charge away from Intel’s chips.

This estrangement is taking several forms, which shows how seriously Microsoft is taking this shift away from Intel. Microsoft declined to comment for this column.

Microsoft is working to make Windows and the rest of its software accessible in the cloud, which can save money for customers because it lets them use computers that are much cheaper and simpler than conventional PCs. It also means that ARM-based devices can be put on workers’ desks in place of more powerful, Intel-powered ones. And the version of Windows that workers are accessing remotely, in the cloud, can run on ARM-based chips in the data center too.

In mid-November, Microsoft unveiled its first ARM-based custom chips. One of them, called Cobalt, is intended to live in data centers and could power such cloud-based Windows experiences.

These efforts are getting a boost from Amazon, which recently unveiled a small cube-shaped PC-like device that can stream Windows and applications from the cloud—like Netflix, but for software instead of entertainment. It’s a repurposed Fire TV Cube streaming device, costs $200, and is powered by an ARM-based chip from Amlogic.

Qualcomm also has forthcoming ARM-based chips for notebook computers, but these are intended not merely to connect these devices to the cloud. Rather, they’ll directly replace Intel’s processors, handling heavy workloads within the device itself. At the same time, they’re intended to go head-to-head with Apple’s best chips. Key to their adoption: Microsoft is putting a huge amount of effort into making Windows run on these processors, while encouraging developers of apps to do the same.

I asked Dan Rogers, vice president of silicon performance at Intel, if all of this is keeping him up at night. He declined to comment on Intel’s past, but he did say that since Pat Gelsinger, who had spent the first 30 years of his career at Intel, returned to the company as CEO in 2021, “I believe we are unleashed and focused, and our drive in the PC has in a way never been more intense.”




Intel plans a new generation of chips in what Rogers calls the “thin and light” category of notebooks, where Apple has been beating the pants off Intel-powered Windows devices.

In terms of advanced chip-manufacturing technology, Intel has promised to catch up with its primary competitor, Taiwan-based TSMC, by 2025.

The consumer-electronics business is full of reversals, and Intel is still a strong competitor, so none of this is predestined.

Geopolitical factors, for one, have the potential to change the entire chip industry virtually overnight. Intel could suddenly become the only game in town for the most advanced kind of chip manufacturing, if American tech companies lose access to TSMC’s factories on account of China’s aggression toward Taiwan, says Patrick Moorhead, a former executive at Intel competitor AMD, and now head of tech analyst firm Moor Insights & Strategy.

When it comes to Intel, he adds, “Never count these guys out.”


Write to Christopher Mims at christopher.mims@wsj.com
 

bnew


Research

Introducing Ego-Exo4D: A foundational dataset for research on video learning and multimodal perception

November 30, 2023•
8 minute read

405384771_249757221189904_9161450057120276175_n.png


Today we are announcing Ego-Exo4D, a foundational dataset and benchmark suite to support research on video learning and multimodal perception. The result of a two-year effort by Meta’s FAIR (Fundamental Artificial Intelligence Research), Meta’s Project Aria, and 15 university partners, the centerpiece of Ego-Exo4D is its simultaneous capture of both first-person “egocentric” views, from a participant’s wearable camera, as well as multiple “exocentric” views, from cameras surrounding the participant. The two perspectives are complementary. While the egocentric perspective reveals what the participant sees and hears, the exocentric views reveal the surrounding scene and the context. Together, these two perspectives give AI models a new window into complex human skill.


405223340_1511470296351868_8087525909299673245_n.gif


Working together as a consortium, FAIR and university partners captured these perspectives with the help of more than 800 skilled participants in the United States, Japan, Colombia, Singapore, India, and Canada. In December, the consortium will open source the data (including more than 1,400 hours of video) and annotations for novel benchmark tasks. Additional details about the datasets can be found in our technical paper. Next year, we plan to host a first public benchmark challenge and release baseline models for ego-exo understanding. Each university partner followed their own formal review processes to establish the standards for collection, management, informed consent, and a license agreement prescribing proper use. Each member also followed the Project Aria Community Research Guidelines. With this release, we aim to provide the tools the broader research community needs to explore ego-exo video, multimodal activity recognition, and beyond.

406270828_1366711530899272_2933140689624174759_n.png

How Ego-Exo4D works

Ego-Exo4D focuses on skilled human activities, such as playing sports, music, cooking, dancing, and bike repair. Advances in AI understanding of human skill in video could facilitate many applications. For example, in future augmented reality (AR) systems, a person wearing smart glasses could quickly pick up new skills with a virtual AI coach that guides them through a how-to video; in robot learning, a robot watching people in its environment could acquire new dexterous manipulation skills with less physical experience; in social networks, new communities could form based on how people share their expertise and complementary skills in video.

Such applications demand the ability to move fluidly between the exo and ego views. For example, imagine watching an expert repair a bike tire, juggle a soccer ball, or fold an origami swan—then being able to map their steps to your own actions. Cognitive science tells us that even from a very young age we can observe others’ behavior (exo) and translate it onto our own (ego).

Realizing this potential, however, is not possible using today's datasets and learning paradigms. Existing datasets comprised of both ego and exo views (i.e., ego-exo) are few, small in scale, lack synchronization across cameras, and/or are too staged or curated to be resilient to the diversity of the real world. As a result, the current literature for activity understanding primarily covers only the ego or exo view, leaving the ability to move fluidly between the first- and third-person perspectives out of reach.

Ego-Exo4D constitutes the largest public dataset of time-synchronized first- and third-person video. Building this dataset required the recruitment of specialists across varying domains, bringing diverse groups of people together to create a multifaceted AI dataset. All scenarios feature real-world experts, where the camera-wearer participant has specific credentials, training, or expertise in the skill being demonstrated. For example, among the Ego-Exo4D camera wearers are professional and college athletes; jazz, salsa, and Chinese folk dancers and instructors; competitive boulderers; professional chefs who work in industrial-scale kitchens; and bike technicians who service dozens of bikes per day.

Ego-Exo4D is not only multiview, it is also multimodal. Captured with Meta’s unique Aria glasses, all ego videos are accompanied by time-aligned seven-channel audio, inertial measurement units (IMU), and two wide-angle grayscale cameras, among other sensors. All data sequences also provide eye gaze, head poses, and 3D point clouds of the environment through Project Aria’s state-of-the-art machine perception services. Additionally, Ego-Exo4D provides multiple new video-language resources:


  • First-person narrations by the camera wearers describing their own actions.
  • Third-person play-by-play descriptions of every camera wearer action.
  • Third-person spoken expert commentary critiquing the videos. We hired 52 people with expertise in particular domains, many of them coaches and teachers, to provide tips and critiques based on the camera wearer’s performance. At each time step, the experts explain how the participants’ actions, such as their hand and body poses, affect their performance, and provide spatial markings to support their commentary.

All three language corpora are time-stamped against the video. With these novel video-language resources, AI models could learn about the subtle aspects of skilled human activities. To our knowledge, there is no prior video resource with such extensive and high quality multimodal data.

Alongside the data, we introduce benchmarks for foundational tasks for ego-exo video to spur the community's efforts. We propose four families of tasks:


  1. Ego(-exo) recognition: recognizing fine-grained keysteps of procedural activities and their structure from ego (and/or optionally exo) video, even in energy-constrained scenarios;
  2. Ego(-exo) proficiency estimation: inferring how well a person is executing a skill;
  3. Ego-exo relation: relating the actions of a teacher (exo) to a learner (ego) by estimating semantic correspondences and translating viewpoints; and
  4. Ego pose: recovering the skilled movements of experts from only monocular ego-video, namely 3D body and hand pose.

We provide high quality annotations for training and testing each task—the result of more than 200,000 hours of annotator effort. To kickstart work in these new challenges, we also develop baseline models and report their results. We plan to host a first public benchmark challenge in 2024.



406886129_866085048493148_3000829008893060003_n.jpg

406883464_1383969772522837_609021011767765469_n.jpg

405203662_841012317803452_9637809064688481_n.jpg

405314124_3506833022925259_8114322311954892666_n.jpg

405286924_887884849151659_1066589885864524765_n.jpg






Collaboratively building on this research

The Ego4D consortium is a long-running collaboration between FAIR and more than a dozen universities around the world. Following the 2021 release of Ego4D, this team of expert faculty, graduate students, and industry researchers reconvened to launch the Ego-Exo4D effort. The consortium’s strengths are both its collective AI talent as well as its breadth in geography, which facilitates recording data in a wide variety of visual contexts. Overall, Ego-Exo4D includes video from six countries and seven U.S. states, offering a diverse resource for AI development. The consortium members and FAIR researchers collaborated throughout the project, from developing the initiative’s scope, to each collecting unique components of the dataset, to formulating the benchmark tasks. This project also marks the single largest coordinated deployment of the Aria glasses in the academic research community, with partners at 12 different sites using them.

In releasing this resource of unprecedented scale and variety, the consortium aims to supercharge the research community on core AI challenges in video learning. As this line of research advances, we envision a future where AI enables new ways for people to learn new skills in augmented reality and mixed reality (AR/MR), where how-to videos come to life in front of the user, and the system acts as a virtual coach to guide them through a new procedure and offer advice on how to improve. Similarly, we hope it will enable robots of the future that gain insight about complex dexterous manipulations by watching skilled human experts in action. Ego-Exo4D is a critical stepping stone to enable this future, and we can’t wait to see what the research community creates with it.




Visit the Ego-Exo4D website

Read the paper

Learn more about Project Aria Research Kit





 

bnew



A Language Agent for Autonomous Driving​






Computer Science > Computer Vision and Pattern Recognition​

[Submitted on 17 Nov 2023 (v1), last revised 27 Nov 2023 (this version, v3)]


Jiageng Mao, Junjie Ye, Yuxi Qian, Marco Pavone, Yue Wang
Human-level driving is an ultimate goal of autonomous driving. Conventional approaches formulate autonomous driving as a perception-prediction-planning framework, yet their systems do not capitalize on the inherent reasoning ability and experiential knowledge of humans. In this paper, we propose a fundamental paradigm shift from current pipelines, exploiting Large Language Models (LLMs) as a cognitive agent to integrate human-like intelligence into autonomous driving systems. Our approach, termed Agent-Driver, transforms the traditional autonomous driving pipeline by introducing a versatile tool library accessible via function calls, a cognitive memory of common sense and experiential knowledge for decision-making, and a reasoning engine capable of chain-of-thought reasoning, task planning, motion planning, and self-reflection. Powered by LLMs, our Agent-Driver is endowed with intuitive common sense and robust reasoning capabilities, thus enabling a more nuanced, human-like approach to autonomous driving. We evaluate our approach on the large-scale nuScenes benchmark, and extensive experiments substantiate that our Agent-Driver significantly outperforms the state-of-the-art driving methods by a large margin. Our approach also demonstrates superior interpretability and few-shot learning ability to these methods. Code will be released.
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Robotics (cs.RO)
Cite as: arXiv:2311.10813 [cs.CV]
(or arXiv:2311.10813v3 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2311.10813

Submission history​

From: Jiageng Mao [view email]
[v1] Fri, 17 Nov 2023 18:59:56 UTC (6,479 KB)
[v2] Tue, 21 Nov 2023 01:24:36 UTC (6,479 KB)
[v3] Mon, 27 Nov 2023 20:53:35 UTC (15,211 KB)

 

bnew



AutoGen​

Build LLM applications via multiple agents

AutoGen_header_1920x720.jpg


AutoGen provides a multi-agent conversation framework as a high-level abstraction. It is an open-source library for enabling next-generation LLM applications with multi-agent collaborations, teachability and personalization. With this framework, users can build LLM workflows. The agent modularity and conversation-based programming simplify development and enable reuse for developers. End-users benefit from multiple agents independently learning and collaborating on their behalf, enabling them to accomplish more with less work. Benefits of the multi-agent approach with AutoGen include agents that can be backed by various LLM configurations; native support for a generic form of tool usage through code generation and execution; and a special agent, the Human Proxy Agent, that enables easy integration of human feedback and involvement at different levels.



Easily build LLM workflows​

With AutoGen, building a complex multi-agent conversation system boils down to:


  • Defining a set of agents with specialized capabilities and roles.
  • Defining the interaction behavior between agents, i.e., what to reply when an agent receives messages from another agent.
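The two steps above can be sketched in plain Python. This is only an illustration of the pattern, not AutoGen's API: the `Agent` class, the `converse` loop, and both reply functions are hypothetical names made up for this example, and no LLM is called.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Agent:
    """Step 1: an agent with a name, a role, and a reply behavior (all hypothetical)."""
    name: str
    role: str
    # Step 2: the interaction behavior, i.e. what to reply to a message from another agent.
    reply_fn: Callable[[str, str], Optional[str]]

    def receive(self, sender_name: str, message: str) -> Optional[str]:
        return self.reply_fn(sender_name, message)


def converse(a: Agent, b: Agent, opening: str, max_turns: int = 6) -> List[Tuple[str, str]]:
    """Pass messages back and forth until an agent returns None or turns run out."""
    transcript = [(a.name, opening)]
    sender, receiver, message = a, b, opening
    for _ in range(max_turns):
        reply = receiver.receive(sender.name, message)
        if reply is None:
            break
        transcript.append((receiver.name, reply))
        sender, receiver, message = receiver, sender, reply
    return transcript


def coder_reply(sender_name: str, message: str) -> Optional[str]:
    # Stops the conversation once the reviewer approves.
    if message == "APPROVED":
        return None
    return "def add(a, b): return a + b"


def reviewer_reply(sender_name: str, message: str) -> Optional[str]:
    return "APPROVED" if "return" in message else "Please return a value."


coder = Agent("coder", "writes code", coder_reply)
reviewer = Agent("reviewer", "reviews code", reviewer_reply)
transcript = converse(reviewer, coder, "Write add(a, b).")
```

In a real setup each `reply_fn` would be backed by an LLM, a tool, or a human; here they are fixed rules so the control flow is easy to follow.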

AutoGen

Read the paper



Related projects​

AutoGen is an open-source, community-driven project under active development (as a spinoff from FLAML, a fast library for automated machine learning and tuning), which encourages contributions from individuals of all backgrounds. Many Microsoft Research collaborators have made great contributions to this project, including academic contributors like Pennsylvania State University and the University of Washington, and product teams like Microsoft Fabric and ML.NET. AutoGen aims to provide an effective and easy-to-use framework for developers to build next-generation applications, and already demonstrates promising opportunities to build creative applications and provide a large space for innovation.

More about FLAML
 

bnew










https://medium.com/slope-stories/slope-transformer-the-first-llm-trained-to-understand-the-language-of-banks-88adbb6c8da9


Slope TransFormer: The first LLM trained to understand the language of banks


Alex Wu


1*FbKukA5DdQAbrwWjvlE9Eg.gif

Today, we’re excited to share that we’ve developed the first Large Language Model (LLM) trained specifically to understand the language of banks: Slope TransFormer. It categorizes messy bank transaction data with speed and accuracy that surpass Plaid, ChatGPT, and humans. As the successor to SlopeGPT, it is the first LLM we’ve trained in-house.


We will share the motivation for it, the methodology used, and its results — including how it stacks up to existing solutions. We will end with some immediate applications, and how it fits into our vision of redefining how underwriting is done.


Why do we care about transactions?

First, some context. At Slope, we spend a lot of time with bank transactions. Why? Simply put, we are a payments company, and central to every payments company is risk. To that end, there is no better way to understand a business — what it’s been up to, its financial outlook, its fraud risk — than looking at every $ that flows in and out of it. The transaction, as we see it, is the atomic unit of business. It is the lifeblood.

Additionally, bank transactions have 2 critical properties:


  • Real-time. Thanks to Open Banking (e.g. Plaid), once a business connects their bank accounts, we will see every new $ that flows in and out of the business in real-time.
  • Unfalsifiable. Thanks to banks, a transaction is proof of an exchange of money. One cannot fake a transaction that’s pulled directly from their bank’s records (contrast this to an income statement).

At Slope, we strive to understand our customers deeply. Doing so not only enables us to assess risk, but fundamentally to build better products for our customers: from AR automation, to payments, to financing that’s personalized to a business’s unique needs. Transaction fluency, therefore, is a fundamental problem for Slope.


However, transactions are hard to understand.

The issue is that transactions are not written in English, or even a single language, for that matter. It is a language of many dialects: a single transaction type can be expressed 10 different ways across 10 different banks:

1*lqMOF7-bdjCDgiopH5zzDQ.png

These are all payments from Shopify.

Additionally, a transaction can be complex. It may have components that represent different counterparties, channels, and intermediaries which obscure the true flow of money. This opaqueness is only furthered by the rise of payment processors and middlemen (e.g. PayPal, Zelle, and even Slope). Can you see where the money is going here?

BILL.COM DES:ACCTVERIFY ID:025AYXVFMTBCRRX INDN:DAVID VAN ARCH CO ID:XXXXX634527 CCD

If you consider the combinations of (bank dialects X merchants X intermediaries) — and also that a “merchant” can be any individual or entity in the world, and that new intermediaries are spawning every day — it becomes clear that transactions cannot be solved with traditional, rules-based methods. It is a high-dimensional, long-tail problem that even specialist companies often struggle to get right.
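To make the brittleness concrete, here is a minimal sketch of a rules-based parser. It assumes the field labels seen in the one example above (`DES`, `ID`, `INDN`, `CO ID`) and nothing more; the function name and the second test string are hypothetical.

```python
import re

# Field labels taken from the single example description above.
# "CO ID" is listed before "ID" so the longer label wins.
FIELD_RE = re.compile(r"\s+(CO ID|DES|INDN|ID):")


def parse_labeled_fields(description: str) -> dict:
    """Split a description into labeled fields; return {} if no known label is found."""
    parts = FIELD_RE.split(description)
    if len(parts) == 1:
        return {}
    fields = {"PREFIX": parts[0]}
    # re.split with one capture group alternates: label, value, label, value, ...
    for label, value in zip(parts[1::2], parts[2::2]):
        fields[label] = value.strip()
    return fields


example = "BILL.COM DES:ACCTVERIFY ID:025AYXVFMTBCRRX INDN:DAVID VAN ARCH CO ID:XXXXX634527 CCD"
parsed = parse_labeled_fields(example)

# A made-up description in a different "dialect" yields nothing at all.
other = parse_labeled_fields("SHOPIFY PAYOUT 0412")
```

The rule handles this one dialect and silently fails on the next, and even here it cannot say which of `BILL.COM` or `DAVID VAN ARCH` is the counterparty that matters.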

1*D3E6UgIcGnMKxzeETI65pQ.png


What about existing solutions?

Plaid

As our Open Banking provider, Plaid serves us transaction data pulled directly from our customers’ bank accounts. On top of this, Plaid tags the counterparty of each transaction (e.g. Shopify). But only sometimes. We found that Plaid gives us less than 50% coverage across our customers’ transactions:

1*Rc-osijaGJ5LYkcLK20Cqg.png

And even when tags are provided, they can be noisy. Some examples:


  1. Noisy labels for even well-known merchants:

1*Op0td2cfuJwtuNHs7x-V2w.png

  2. Confusing the person, Aldo, for the company, Aldo:


1*ACtHw1CouVmceX2LgXmvZw.png

  3. A single description resulting in a wide range of labels:


1*unmDVeXlOpfMV6mEGjJCqA.png

While some of these mistakes may seem elementary, producing accurate tags on a consistent basis is a deceptively difficult task – with many hidden tradeoffs. For the most part, Plaid does a very good job. But in our application — B2B risk assessment — we have especially strict requirements when it comes to accuracy, coverage, and explainability. We cannot afford a mistake with so much on the line.



ChatGPT

What about LLMs? There are 2 promising properties of LLMs in the context of transaction tagging: 1) their ability to extract meaning from unstructured data and 2) their pre-trained knowledge of the world. Here are some of our experiments with ChatGPT:

1*NkmLb0_gPTtBmknRfmvBxA.png

Wrong answer & super wordy.

1*_b_ikfrmGd4UJsxWd0Q5yA.png

Better with some prompt engineering, but still wordy.

Assuming we solve for accuracy and wordiness, there are still fundamental issues with a chat-based approach: unpredictability (the same prompt asked 10 times may give you 10 different responses) and scalability (it is slow and expensive to hit an API thousands of times for a single customer). Yet, we saw promise. We began to believe that in some form, LLMs held the key to our problem.


SlopeGPT

Earlier this year, we launched SlopeGPT. Using GPT embeddings, we clustered transactions by semantic similarity. This allowed us to reliably group transactions into distinct cashflows without explicitly labeling them. Additionally, as the clustering happened at the customer level, the cashflows were fit uniquely to each business.
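As a rough sketch of the idea (not Slope's implementation), clustering by embedding similarity can look like the following. The `toy_embed` function is a stand-in that counts character trigrams; a real system would use model embeddings. The greedy threshold clustering, the 0.5 threshold, and the sample descriptions are all assumptions for illustration.

```python
import math
from collections import Counter


def toy_embed(text: str, n: int = 3) -> Counter:
    """Stand-in for a real embedding model: character n-gram counts."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(descriptions, threshold: float = 0.5):
    """Greedy clustering: join the first cluster whose representative is similar enough."""
    clusters = []  # list of (representative_vector, members)
    for desc in descriptions:
        vec = toy_embed(desc)
        for rep, members in clusters:
            if cosine(vec, rep) >= threshold:
                members.append(desc)
                break
        else:
            clusters.append((vec, [desc]))
    return [members for _, members in clusters]


# Hypothetical transactions for one customer.
txns = [
    "SHOPIFY PAYOUT 0412",
    "GUSTO PAYROLL 8841",
    "SHOPIFY PAYOUT 0519",
    "GUSTO PAYROLL 9920",
]
cashflows = cluster(txns)
```

Note that the output groups transactions but carries no labels, which is the first limitation listed below.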

The impact was massive: from raw transactions emerged a rich story. We could now see individual streams of incomes and expenses, how they changed over time, and where they were headed. It was a major leap forward in our ability to understand our customers. Still, it had limitations:

  1. The resulting clusters were unlabeled: it could tell you which transactions likely belonged to the same cashflow streams, but not what those streams were.
  2. It was not optimized for financial data. We used out-of-the-box GPT embeddings, meaning we used English semantic similarity as a proxy for transaction semantic similarity. It worked surprisingly well, but we believed we could do better.
  3. It was slow: ~500 ms/txn. This may seem fast, but a single customer may have thousands of transactions. Our SLA for underwriting is 7s.

We’re excited to say that TransFormer overcomes all these limitations.
 

bnew


Meet Slope TransFormer: A Large Language Model (LLM) Trained Specifically to Understand the Language of Banks

By

Niharika Singh

November 28, 2023​

In payments, understanding transactions is crucial for assessing risks in businesses. However, deciphering messy bank transaction data poses a challenge, as it is expressed in various ways across different banks. Existing solutions like Plaid and ChatGPT have limitations, such as low coverage and wordiness. To address this, a new solution called Slope TransFormer has been developed—a Large Language Model (LLM) specifically trained to understand the language of banks.

Transactions are challenging to understand because they come in different forms, making traditional, rules-based methods ineffective. Plaid, a standard Open Banking provider, tags counterparties for less than 50% of transactions, and its labels can be noisy and confusing. LLMs like ChatGPT promise to extract meaning from unstructured data but suffer from unpredictability and poor scalability.

Slope TransFormer, the new solution, overcomes these challenges by being a proprietary LLM fine-tuned to extract meaning from bank transactions. It addresses the limitations of its predecessor, SlopeGPT, by providing accurate and concise counterparty labels in an interpretable way. The key to its success lies in defining a new language during training, focusing solely on extracting the merchant name from transactions.

Using an efficient base model, OPT-125M, and a fine-tuning algorithm called LoRA, TransFormer achieves remarkable speed—labeling over 500 transactions per second, a 250x speedup over SlopeGPT. It boasts over 72% exact match accuracy against human experts, outperforming Plaid, which achieves only 62%. The solution is accurate and highly consistent, making it reliable in a production system.
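The speed figures can be checked with simple arithmetic. The sketch below uses the ~500 ms per transaction that Slope reported for SlopeGPT, the 500 transactions per second quoted here, and Slope's 7-second underwriting SLA; the 3,000-transaction customer is a hypothetical example.

```python
SLOPEGPT_MS_PER_TXN = 500.0      # "~500 ms/txn" reported for SlopeGPT
TRANSFORMER_TXN_PER_SEC = 500.0  # "over 500 transactions per second"
SLA_SECONDS = 7.0                # Slope's stated underwriting SLA

transformer_ms_per_txn = 1000.0 / TRANSFORMER_TXN_PER_SEC  # 2 ms per transaction
speedup = SLOPEGPT_MS_PER_TXN / transformer_ms_per_txn     # 250x

n_txns = 3000  # hypothetical customer with a few thousand transactions
slopegpt_seconds = n_txns * SLOPEGPT_MS_PER_TXN / 1000.0   # 1500 s
transformer_seconds = n_txns / TRANSFORMER_TXN_PER_SEC     # 6 s

within_sla = transformer_seconds <= SLA_SECONDS
```

At these rates, processing such a customer one transaction at a time drops from about 25 minutes to about 6 seconds, which fits inside the 7-second SLA.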




TransFormer’s performance has already led to its deployment in live credit monitoring dashboards. Its efficiency and functionality provide a detailed view into businesses, allowing for monitoring changing risks, alerting to abnormal events, and applying automated adjustments. The ultimate goal is to use TransFormer to power the entire underwriting system, reaching a precise understanding of businesses beyond traditional financials.

In conclusion, Slope TransFormer marks a significant milestone in redefining how underwriting is done in the B2B economy. Its efficiency, accuracy, and interpretability pave the way for a more precise understanding of businesses, unlocking new real-time signals to monitor and manage risks. This advancement aligns with the broader vision of SlopeAI to digitize the world’s B2B economy, using AI to automate workflows and eliminate inefficiencies that have hindered progress for decades.
 

bnew







ziSzPyj.png

1BkRkRh.png




Computer Science > Machine Learning​

[Submitted on 1 Dec 2023]

Mamba: Linear-Time Sequence Modeling with Selective State Spaces​

Albert Gu, Tri Dao
Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5× higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2312.00752 [cs.LG]
(or arXiv:2312.00752v1 [cs.LG] for this version)
[2312.00752] Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Submission history​

From: Albert Gu [view email]
[v1] Fri, 1 Dec 2023 18:01:34 UTC (1,264 KB)
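The "selective" idea in the abstract, letting the SSM parameters be functions of the input, can be illustrated with a toy recurrence. This is a scalar-state sketch under my own assumptions, not the authors' implementation: the weights `w_delta`, `w_b`, `w_c`, the softplus step size, and the simplified discretization of B are illustrative choices, and the plain Python loop stands in for the paper's hardware-aware parallel algorithm.

```python
import math


def softplus(z: float) -> float:
    return math.log1p(math.exp(z))


def selective_scan(xs, a: float = -1.0, w_delta: float = 1.0,
                   w_b: float = 1.0, w_c: float = 1.0):
    """Toy selective SSM with a scalar hidden state.

    The step size delta and the projections b and c are recomputed from each
    input x_t, so the model decides per token how much state to keep or forget.
    """
    h = 0.0
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)  # input-dependent step size, always > 0
        b = w_b * x                    # input-dependent input projection
        c = w_c * x                    # input-dependent output projection
        a_bar = math.exp(delta * a)    # discretized state decay, in (0, 1) when a < 0
        b_bar = delta * b              # simplified discretization of B
        h = a_bar * h + b_bar * x      # recurrent state update
        ys.append(c * h)
    return ys


ys = selective_scan([1.0, 0.5, -0.5, 2.0])
```

The loop makes one pass over the sequence with a fixed-size state, which is where the linear scaling in sequence length comes from.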





 

bnew



🎩 Magicoder: Source Code Is All You Need​



🎩 Models | 📚 Dataset | 🚀 Quick Start | 👀 Demo | 📝 Citation | 🙏 Acknowledgements

Important
We keep improving the documentation and adding more implementation details. Please stay tuned!

About​

  • 🎩Magicoder is a model family empowered by 🪄OSS-Instruct, a novel approach to enlightening LLMs with open-source code snippets for generating low-bias and high-quality instruction data for code.
  • 🪄OSS-Instruct mitigates the inherent bias of the LLM-synthesized instruction data by empowering them with a wealth of open-source references to produce more diverse, realistic, and controllable data.
Overview of OSS-Instruct
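A rough sketch of the data-generation step described above: take a snippet of open-source code as a seed and build a prompt that asks an LLM to write a new problem and solution inspired by it. The prompt wording, the function names, and the line-window sampling are assumptions for illustration, not the project's actual prompt or pipeline, and the LLM call itself is left out.

```python
import random

# Hypothetical prompt template; the project's real prompt may differ.
PROMPT_TEMPLATE = (
    "Below is a code snippet taken from an open-source project.\n"
    "Use it as inspiration to write a new, self-contained programming problem,\n"
    "then provide a correct solution.\n\n"
    "Snippet:\n{snippet}\n\n"
    "[Problem]\n"
)


def sample_snippet(source: str, max_lines: int = 15, rng=None) -> str:
    """Pick a random window of consecutive lines from a source file."""
    rng = rng or random.Random(0)
    lines = source.splitlines()
    if len(lines) <= max_lines:
        return source
    start = rng.randrange(len(lines) - max_lines + 1)
    return "\n".join(lines[start:start + max_lines])


def build_prompt(snippet: str) -> str:
    """Fill the template; the result would be sent to an instruction-following LLM."""
    return PROMPT_TEMPLATE.format(snippet=snippet)


source_file = "\n".join(f"line {i}" for i in range(40))  # placeholder source text
seed = sample_snippet(source_file)
prompt = build_prompt(seed)
```

Because each prompt is seeded with a different real-world snippet, the generated problems vary with the code they were drawn from rather than with the LLM's defaults.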

🎩 Models​

| Model | Checkpoint | Size | HumanEval (+) | MBPP (+) | Demo | License |
|---|---|---|---|---|---|---|
| Magicoder-CL-7B | 🤗 HF Link | 7B | 60.4 (55.5) | 64.2 (52.6) | -- | Llama2 |
| Magicoder-S-CL-7B | 🤗 HF Link | 7B | 70.7 (66.5) | 68.4 (56.6) | -- | Llama2 |
| Magicoder-DS-6.7B | 🤗 HF Link | 6.7B | 66.5 (60.4) | 75.4 (61.9) | -- | DeepSeek |
| Magicoder-S-DS-6.7B | 🤗 HF Link | 6.7B | 76.8 (70.7) | 75.7 (64.4) | -- | DeepSeek |

📚 Dataset​

Note
Magicoder models are trained on the synthetic data generated by gpt-3.5-turbo-1106 developed by OpenAI. Please pay attention to OpenAI's terms of use when using the models and the datasets.





ECQbKnq.png
 