
bnew


Researchers from China Introduce DualToken-ViT: A Fusion of CNNs and Vision Transformers for Enhanced Image Processing Efficiency and Accuracy​

By
Aneesh Tickoo
-

October 1, 2023


In recent years, vision transformers (ViTs) have become a powerful architecture for a variety of vision tasks, including image classification and object detection. Whereas convolutional neural networks (CNNs) are constrained by the size of their convolutional kernels and can only extract local information, self-attention can extract global information from the image, yielding richer and more meaningful visual features. ViTs also show no sign of performance saturation as dataset and model size grow, which gives them an advantage over CNNs when both models and datasets are large. In lightweight models, however, CNNs remain preferable, because ViTs lack several of the inductive biases that CNNs have.

Self-attention's quadratic complexity makes ViTs potentially expensive to compute, so it is not easy to build lightweight yet effective ViTs. Many works adopt a pyramid structure that divides the model into several stages, with the number of tokens decreasing and the number of channels increasing from stage to stage, to obtain more efficient and lightweight ViTs. Others focus on streamlining and simplifying the self-attention structure to mitigate its quadratic complexity, though often at the expense of attention's expressiveness. A typical strategy is to downsample the key and value of self-attention, which reduces the number of tokens involved in the computation.
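To make the idea concrete, here is a minimal PyTorch sketch of that key/value downsampling strategy in generic form. It is not DualToken-ViT's exact module; names such as DownsampledKVAttention and kv_stride are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DownsampledKVAttention(nn.Module):
    """Self-attention whose keys/values come from a spatially downsampled
    copy of the feature map, shrinking the quadratic attention cost."""
    def __init__(self, dim, num_heads=4, kv_stride=2):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.kv_stride = kv_stride
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                                  # x: (B, C, H, W)
        B, C, H, W = x.shape
        q_in = x.flatten(2).transpose(1, 2)                # (B, H*W, C): queries keep every token
        kv_in = F.avg_pool2d(x, self.kv_stride)            # fewer tokens feed the keys/values
        kv_in = kv_in.flatten(2).transpose(1, 2)           # (B, h*w, C)

        q = self.q(q_in).reshape(B, -1, self.num_heads, C // self.num_heads).transpose(1, 2)
        kv = self.kv(kv_in).reshape(B, -1, 2, self.num_heads, C // self.num_heads)
        k, v = kv.permute(2, 0, 3, 1, 4)                   # each: (B, heads, h*w, head_dim)

        attn = (q @ k.transpose(-2, -1)) * self.scale      # (B, heads, H*W, h*w)
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, H * W, C)
        return self.proj(out).transpose(1, 2).reshape(B, C, H, W)

# Quick shape check: DownsampledKVAttention(64)(torch.randn(2, 64, 56, 56)) -> (2, 64, 56, 56)
```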

By performing self-attention independently within groups of tokens, some locally grouped self-attention works reduce the complexity of the overall attention component, but such techniques can hamper the sharing of global information. Other efforts add a small number of extra learnable parameters to enrich the backbone's global information, for example by adding a branch of global tokens that is used at every stage. This approach can enhance local attention structures of both kinds, whether locally grouped self-attention or convolution based. However, all existing global-token approaches consider only global information and disregard positional information, which is crucial for vision tasks.
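For reference, here is a generic sketch of the locally grouped (windowed) self-attention mentioned above, continuing the imports from the previous snippet. Window partitioning details vary across papers, so this helper is only illustrative.

```python
def window_attention(x, attn, window=7):
    """Locally grouped self-attention: split a (B, C, H, W) feature map into
    non-overlapping windows and attend only within each window, so the cost
    scales with the window size rather than the full image. `attn` is any
    nn.MultiheadAttention(dim, heads, batch_first=True); H and W are assumed
    to be divisible by `window`."""
    B, C, H, W = x.shape
    x = x.reshape(B, C, H // window, window, W // window, window)
    x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, window * window, C)  # (B*nWin, win*win, C)
    x, _ = attn(x, x, x)                                             # attention inside each window
    x = x.reshape(B, H // window, W // window, window, window, C)
    return x.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
```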


Figure 1: Visualization of the attention map for the key token (the part of the image most important to the image classification task) and the position-aware global tokens. In each row, the first image is the model's input, and the second depicts the correlation between the key token and each token in the position-aware global tokens, which each comprise seven tokens; the red-boxed region marks the key token of the first image.


In this study, researchers from East China Normal University and Alibaba Group propose DualToken-ViT, a compact and efficient vision transformer model. Their design replaces standard self-attention with a more efficient attention structure: convolution and self-attention are used together to extract local and global information respectively, and the outputs of the two branches are then fused. Although window self-attention can also extract local information, they find that convolution is more efficient in their lightweight model. To reduce the computational cost of broadcasting global information while retaining more of it, they downsample the feature map that produces the key and value step by step rather than all at once.
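Building on the DownsampledKVAttention sketch above, the snippet below illustrates the dual-branch idea in its simplest form: a convolution branch for local information and an attention branch for global information, fused here by plain addition. The exact fusion and normalization used in DualToken-ViT differ; this is only a rough sketch.

```python
class DualBranchBlockSketch(nn.Module):
    """Illustrative dual-branch block: depthwise+pointwise convolution for
    local information, downsampled-KV attention for global information,
    combined with a residual connection. Not the paper's exact module."""
    def __init__(self, dim, kv_stride=2):
        super().__init__()
        self.norm = nn.BatchNorm2d(dim)
        self.local = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim),  # depthwise: local mixing
            nn.Conv2d(dim, dim, kernel_size=1),                         # pointwise: channel mixing
        )
        self.global_attn = DownsampledKVAttention(dim, kv_stride=kv_stride)

    def forward(self, x):                                   # x: (B, C, H, W)
        y = self.norm(x)
        return x + self.local(y) + self.global_attn(y)      # fuse local and global branches
```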

Additionally, they employ position-aware global tokens at every stage to enrich the global information. Unlike standard global tokens, their position-aware global tokens also retain and pass on positional information from the image, giving the model an edge in vision tasks. The effectiveness of these tokens can be seen in Figure 1, where the key token in the image shows a higher correlation with the corresponding tokens in the position-aware global tokens.
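As a rough illustration of the general idea (again reusing the imports above), one way to keep a small, spatially arranged set of global tokens flowing through the stages is to let them exchange information with the image tokens via cross-attention. The class name, the 7x7 grid size, and the update rule below are assumptions for illustration, not DualToken-ViT's actual formulation.

```python
class PositionAwareGlobalTokensSketch(nn.Module):
    """A persistent grid of learnable global tokens that gathers information
    from the image tokens and broadcasts it back, keeping a spatial layout
    so positional information can be preserved across stages."""
    def __init__(self, dim, grid=7, num_heads=4):
        super().__init__()
        self.global_tokens = nn.Parameter(torch.zeros(1, grid * grid, dim))
        nn.init.trunc_normal_(self.global_tokens, std=0.02)
        self.to_global = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.to_local = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, g=None):                 # x: (B, N, C) image tokens
        if g is None:                             # first stage starts from the learned grid
            g = self.global_tokens.expand(x.size(0), -1, -1)
        g, _ = self.to_global(g, x, x)            # global tokens collect image information
        x, _ = self.to_local(x, g, g)             # image tokens read the global information back
        return x, g                               # g is carried on to the next stage
```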

In a nutshell, their contributions are as follows:
• They develop DualToken-ViT, a compact and efficient vision transformer that fuses local tokens and global tokens, carrying local and global information respectively, to achieve an efficient attention structure that combines the benefits of convolution and self-attention.
• They propose position-aware global tokens, which enrich the global information with the image's positional information.
• Among vision models of the same FLOPs magnitude, DualToken-ViT achieves the best performance on image classification, object detection, and semantic segmentation.


Check out the Paper. All credit for this research goes to the researchers on this project.


 

bnew


The synthetic social network is coming​

Between ChatGPT’s surprisingly human voice and Meta’s AI characters, our feeds may be about to change forever​


CASEY NEWTON

SEP 28, 2023


Meta’s AI assistant characters (Meta)


Today, let’s consider the implications of a truly profound week in the development of artificial intelligence, and discuss whether we may be witnessing the rise of a new era in the consumer internet.

I.

On Monday, OpenAI announced the latest updates for ChatGPT. One feature lets you interact with its large language model via voice. Another lets you upload images and ask questions about them. The result is that a tool which was already useful for lots of things suddenly became useful for much more. For one thing, ChatGPT feels even more powerful as a mobile app: you can now chat with it while walking around town, or snap a picture of a tree and ask the app what you’re looking at.

For another, though, adding a voice to ChatGPT begins to give it a hint of personality. I don’t want to overstate the case here — the app typically generates dry, sterile text unadorned by any hint of style. But something changes when you begin speaking with the app in one of its five native voices, which are much livelier and more dynamic than what we are used to with Alexa or the Google assistant. The voices are earnest, upbeat, and — by nature of the fact that they are powered by an LLM — tireless.

It is the earliest stage of all this; access to the voice feature is just rolling out to ChatGPT Plus subscribers, and free users won’t be able to use it for some time. And yet even in this 1.0 release, you can see the clear outlines of the sort of thing popularized in the decade-old film Her: a companion so warm, empathetic and helpful that in time its users fall in love with it. The Her comparisons are by now cliche when discussing AI in Silicon Valley, and yet until now its basic premise has felt like a distant sci-fi dream. On Thursday I asked the speaking version of ChatGPT to give me a pep talk to hit my deadline — I was running back from the Code Conference and already behind schedule — and as the model did its best to gas me up, it seemed to me that AI had taken an emotional step forward.

You can imagine the next steps here. A bot that gets to know your quirks; remembers your life history; offers you coaching or tutoring or therapy; entertains you in whichever way you prefer. A synthetic companion not unlike the real people you encounter during the day, only smarter, more patient, more empathetic, more available.

Those of us who are blessed to have many close friends and family members in our life may look down on tools like this, experiencing what they offer as a cloying simulacrum of the human experience. But I imagine it might feel different for those who are lonely, isolated, or on the margins. On an early episode of Hard Fork, a trans teenager sent in a voice memo to tell us about using ChatGPT to get daily affirmations about identity issues. The power of giving what were then text messages a warm and kindly voice, I think, should not be underestimated.

II.

OpenAI tends to present its products as productivity tools: simple utilities for getting things done. Meta, on the other hand, is in the entertainment business. But it, too, is building LLMs, and on Wednesday the company revealed that it has found its own uses for generative AI and voices.

In addition to an all-purpose AI assistant, the company unveiled 28 personality-driven chatbots to be used in Meta’s messaging apps. Celebrities including Charli D’Amelio, Dwyane Wade, Kendall Jenner, MrBeast, Snoop Dogg, Tom Brady, and Paris Hilton lent their voices to the effort. Each of their characters comes with a brief and often cringeworthy description; MrBeast’s Zach is billed as “the big brother who will roast you — because he cares.”

All of this feels like an intermediate step to me. To the extent that there is a market of people who want to have voice chats with a synthetic version of MrBeast, the character they want to interact with is MrBeast — not big brother Zach. I haven’t been able to chat with any of these character bots yet, but I struggle to understand how they will have more than passing novelty value.

At the same time, this technology is new enough that I imagine celebrities aren’t yet willing to entrust their entire personas to Meta for safekeeping. Better to give people a taste of what it’s like to talk to AI Snoop Dogg and iron out any kinks before delivering the man himself. And when that happens, the potential seems very real. How many hours would fans spend talking to a digital version of Taylor Swift this year, if they could? How much would they pay for the privilege?

While we wait to learn the answers, a new chapter of social networking may be beginning. Until now when we have talked about AI in consumer apps it has mostly had to do with ranking: using machine-learning tools to create more engaging and personalized feeds for billions of users.

This week we got at least two new ways to think about AI in social feeds. One is AI-generated imagery, in the form of the new stickers coming to Meta’s messaging apps. It’s unclear to me how much time people want to spend creating custom images while they text their friends, but the demonstrations seemed nice enough.

More significantly, I think, is the idea that Meta plans to place its AI characters on every major surface of its products. They have Facebook pages and Instagram accounts; you will message them in the same inbox that you message your friends and family. Soon, I imagine they will be making Reels.

And when that happens, feeds that were once defined by the connections they enabled between human beings will have become something else: a partially synthetic social network.

Will it feel more personalized, engaging, and entertaining? Or will it feel uncanny, hollow, and junky? Surely there will be a range of views on this. But either way, I think, something new is coming into focus.
 

bnew


Pentagon Urges AI Companies to Share More About Their Technology​

  • Defense Department is holding symposium to discuss AI
  • Official says agency wants to use the algorithms safely

AI software relies on large language models, which use massive data sets to power tools such as chatbots and image generators.



By Katrina Manson

September 29, 2023 at 4:31 PM EDT


The Defense Department’s top artificial intelligence official said the agency needs to know more about AI tools before it fully commits to using the technology and urged developers to be more transparent.


Craig Martell, the Pentagon’s chief digital and artificial intelligence officer, wants companies to share insights into how their AI software is built — without forfeiting their intellectual property — so that the department can “feel comfortable and safe” adopting it.

AI software relies on large language models, or LLMs, which use massive data sets to power tools such as chatbots and image generators. The services are typically offered without showing their inner workings — in a so-called black box. That makes it hard for users to understand how the technology comes to decisions or what makes it get better or worse at its job over time.

“We’re just getting the end result of the model-building — that’s not sufficient,” Martell said in an interview. The Pentagon has no idea how models are structured or what data has been used, he said.

Read More: How Large Language Models Work, Making Chatbots Lucid

Companies also aren’t explaining what dangers their systems could pose, Martell said.
“They’re saying: ‘Here it is. We’re not telling you how we built it. We’re not telling you what it’s good or bad at. We’re not telling you whether it’s biased or not,’” he said.

He described such models as the equivalent of “found alien technology” for the Defense Department. He’s also concerned that only a few groups of people have enough money to build LLMs. Martell didn’t identify any companies by name, but Microsoft Corp., Alphabet Inc.’s Google and Amazon.com Inc. are among those developing LLMs for the commercial market, along with startups OpenAI and Anthropic.

Martell is inviting industry and academics to Washington in February to address the concerns. The Pentagon’s symposium on defense data and AI aims to figure out what jobs LLMs may be suitable to handle, he said.

Martell’s team, which is running a task force to assess LLMs, has already found 200 potential uses for them within the Defense Department, he said.
“We don’t want to stop large language models,” he said. “We just want to understand the use, the benefits, the dangers and how to mitigate against them.”

There is “a large upswell” within the department of people who would like to use LLMs, Martell said. But they also recognize that if the technology hallucinates — the term for when AI software fabricates information or delivers an incorrect result, which is not uncommon — they are the ones that must take responsibility for it.


He hopes the February symposium will help build what he called “a maturity model” to establish benchmarks relating to hallucination, bias and danger. While it might be acceptable for the first draft of a report to include AI-related mistakes — something a human could later weed out — those errors wouldn’t be acceptable in riskier situations, such as information that’s needed to make operational decisions.


A classified session at the three-day February event will focus on how to test and evaluate models, and protect against hacking.

Martell said his office is playing a consulting role within the Defense Department, helping different groups figure out the right way to measure the success or failure of their systems. The agency has more than 800 AI projects underway, some of them involving weapons systems.

Given the stakes involved, the Pentagon will apply a higher bar for how it uses algorithmic models than the private sector, he said.
“There’s going to be lots of use cases where lives are on the line,” he said. “So allowing for hallucination or whatever we want to call it — it’s just not going to be acceptable.”
 

bnew








SEAN HOLLISTER | SEP 28
Say it with me: AI VR legs.
Remember the whole microscandal about Zuck’s VR legs?
Well... Meta is now using “machine learning models that are trained on large data sets of people” to let developers give you generative AI legs in the Meta Quest 3.
More new dev toys here.
You could say this tech has legs.








 

bnew



Extremely impressed with what @dani_avila7 is building with CodeGPT (codegpt.co/)!! 🤯

CodeGPT pairs Code Documentation RAG with a GUI, so users can pick which docs they want to chat with: LangChain, LlamaIndex, Weaviate, ... This interface is integrated into VSCode, as well as offered as a standard chatbot interface.

Daniel and I met to chat about our shared love of the Gorilla 🦍 project from UC Berkeley (Daniel has created an awesome HuggingFace Spaces demo to test the original model from Patil et al. - huggingface.co/spaces/davila…) 🤗.

We had a really interesting debate on Zero-Shot LLMs + RAG versus Retrieval-Aware Fine-tuning of the LLaMA 7B model, Gorilla style.

It seems like there is a massive opportunity to help developers serve their LLaMA 7B models. The Gorilla paper shows that this achieves better performance (attached) -- and although it should in principle be cheaper to serve the 7B LLM, the DIY inference tooling is behind the commercial APIs.

Gorilla is also a fantastic opportunity to explore how these advanced query engines can better pack the prompt with information.

I am very curious how code assistant agents can benefit from non-formal documentation as well, such as blog posts, StackOverflow Q&A, and maybe even podcasts.

I also think there is a huge opportunity to explore how these advanced query engines can better pack the prompt with information. For example, the Weaviate Gorilla is simply translating from natural language commands to GraphQL APIs, but as we imagine integrations deeper into codebases, we may want to jointly retrieve from the user's codebase as well as the API docs to pack the prompt.
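To make that idea concrete, here is a toy, hypothetical sketch of prompt packing from two retrieval sources, the user's codebase and the API docs. The keyword-overlap ranker and the function names are placeholders for what a real setup would do with embeddings and a vector store such as Weaviate.

```python
from typing import List

def keyword_score(query: str, passage: str) -> int:
    """Toy relevance score: number of lowercase words shared with the query."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def pack_prompt(query: str, codebase: List[str], api_docs: List[str], k: int = 2) -> str:
    """Take the top-k snippets from each source and lay them out before the task."""
    top_code = sorted(codebase, key=lambda p: keyword_score(query, p), reverse=True)[:k]
    top_docs = sorted(api_docs, key=lambda p: keyword_score(query, p), reverse=True)[:k]
    context = "\n".join(["# From the user's codebase:", *top_code,
                         "# From the API docs:", *top_docs])
    return f"{context}\n\n# Task:\n{query}"

print(pack_prompt(
    "create a Weaviate class for Articles",
    codebase=["def init_client():\n    ..."],
    api_docs=["client.schema.create_class({...})  # creates a new schema class"],
))
```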

Thank you so much for the chat Daniel! Love learning more about how people are thinking about Gorilla LLMs and LLMs for Code! 🦍😎
 

bnew


Mistral-7B-OpenOrca
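If you want to try it locally, a minimal sketch with the Hugging Face transformers pipeline might look like the following; it assumes the model is published on the Hub as "Open-Orca/Mistral-7B-OpenOrca" and that transformers and accelerate are installed with enough GPU memory available.

```python
from transformers import pipeline

# Load the (assumed) Hub checkpoint and generate a short completion.
chat = pipeline(
    "text-generation",
    model="Open-Orca/Mistral-7B-OpenOrca",
    device_map="auto",
)
out = chat("Explain in one sentence what the OpenOrca dataset is.", max_new_tokens=64)
print(out[0]["generated_text"])
```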










DEMO:
 