bnew

Veteran
Joined
Nov 1, 2015
Messages
52,796
Reputation
8,009
Daps
151,001







1/7
(1/7) Physics of LM, Part 2.1 with 8 results for LLM reasoning is out: [2407.20311] Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process. Probing reveals that LLMs secretly develop some "level-2" reasoning skill beyond Humans. Although I recommend watching my ICML tutorial first... Come in this thread to see the slides.

GTyU4pNa8AA_aXO.jpg


2/7
Result 1: we don't want to chat with GPT4 to guess how it reasons. Instead, we create synthetic grade-school math data (using mod-23 arithmetic and removing common sense, focusing solely on reasoning) and pretrain a model directly on it. This allows for controlled experiments and probing.
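As a rough illustration of what "synthetic grade-school math data using mod-23" could look like, here is a minimal sketch. It is not the paper's generator: the problem format, the `make_problem` and `solve` names, and the 30% constant rate are all assumptions made for this example. The only detail taken from the thread is that arithmetic is done modulo 23.

```python
import random

MOD = 23  # from the thread: the synthetic data uses arithmetic mod 23


def make_problem(num_params, seed=0):
    """Build a toy problem (hypothetical format, not the paper's).

    Each parameter is either a constant or the sum/product of two
    earlier parameters, so the dependency graph is acyclic.
    """
    rng = random.Random(seed)
    names = [f"p{i}" for i in range(num_params)]
    problem = {}
    for i, name in enumerate(names):
        if i < 2 or rng.random() < 0.3:
            problem[name] = ("const", rng.randrange(MOD))
        else:
            a, b = rng.sample(names[:i], 2)
            problem[name] = (rng.choice(["add", "mul"]), a, b)
    return problem


def solve(problem):
    """Evaluate every parameter mod 23.

    Parameters only reference earlier ones, so dict insertion order
    is already a valid evaluation order.
    """
    values = {}
    for name, spec in problem.items():
        if spec[0] == "const":
            values[name] = spec[1]
        else:
            op, a, b = spec
            if op == "add":
                values[name] = (values[a] + values[b]) % MOD
            else:
                values[name] = (values[a] * values[b]) % MOD
    return values


if __name__ == "__main__":
    toy = make_problem(6, seed=1)
    print(toy)
    print(solve(toy))
```

Because every value stays in the range 0 to 22, a model trained on such data cannot lean on memorized real-world magnitudes, which matches the thread's stated goal of isolating reasoning.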

GTyVAP4bwAERyKe.png


3/7
Result 2-3: Using this data, we show models can truly learn some reasoning skill (not by memorizing solution templates). Crucially, models can mentally do planning to generate shortest solutions (avoiding unnecessary computations) – a level 1 reasoning skill that Humans also do.

GTyVK5nbsAA-iZJ.png


4/7
Result 4-5: we invent a probing technique and discover that, before a question is asked, the model has already figured out (mentally!) which parameters recursively depend on which. This skill is not needed for solving the problem, and it differs from human reasoning. We call it "level-2" reasoning.
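To make "what parameter recursively depends on what" concrete, here is a small sketch over a hand-written dependency graph. The graph, the parameter names, and both helper functions are hypothetical illustrations, not the paper's probing method. The first helper computes the recursive dependency set; the second lists only the parameters needed for a given question, which corresponds to the "shortest solution" idea from Result 2-3.

```python
def dependencies(graph, target):
    """Return every parameter that `target` recursively depends on."""
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def shortest_solution_order(graph, target):
    """List only the parameters needed for `target`, dependencies first."""
    order = []
    done = set()

    def visit(node):
        if node in done:
            return
        for dep in graph.get(node, ()):
            visit(dep)
        done.add(node)
        order.append(node)

    visit(target)
    return order


if __name__ == "__main__":
    # Hypothetical problem: "d" is defined but irrelevant to the question "q".
    graph = {"a": [], "b": ["a"], "c": ["a", "b"], "d": ["a"], "q": ["c"]}
    print(dependencies(graph, "q"))
    print(shortest_solution_order(graph, "q"))
```

In this toy graph the parameter `d` never appears in the solution order for `q`, which is the kind of unnecessary computation a shortest solution avoids.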

GTyVNiZaUAAOtc2.png


5/7
Result 6: We explain how reasoning errors occur. For instance, some errors trace back to the model's mental planning stage, so they can be predicted even before the model starts to generate the first token; such errors are independent of the random generation process.

GTyVPs4bcAAfOVU.png


6/7
Result 7-8. Depth of model is crucial for reasoning; and we explain this necessity in depth by the complexity of the mental processes involved. This cannot be mitigated by CoT – deciding what’s the first CoT step may still require deep, multi-step mental reasoning (planning).

GTyVRbEbwAAnuB9.jpg


7/7
This is joint work with @yetian648 (CMU/Meta), Zicheng Xu (Meta), Yuanzhi Li (MBZUAI). I'd like to thank once again my manager Lin Xiao for encouraging this exploratory research, FAIR's sponsorship on A100 + V100, and FAIR's wonderful engineering team that supported our heavy jobs.




To post tweets in this format, more info here: https://www.thecoli.com/threads/tips-and-tricks-for-posting-the-coli-megathread.984734/post-52211196
 





1/4
Today, our CEO, @premakkaraju, announced that legendary filmmaker, technology innovator, and visual effects pioneer, James Cameron, has joined the Stability AI Board of Directors.

Cameron’s addition represents a significant step forward in our mission to transform visual media. His artist-centric perspective, paired with his business and technical acumen, will be invaluable as we continue to build a full stack AI pipeline that unlocks new opportunities for creators to tell stories in ways once unimaginable.

Read more here: James Cameron, Academy Award-Winning Filmmaker, Joins Stability AI Board of Directors — Stability AI

GYP2oSCbMAAJjgF.jpg


2/4
Couldn't have said it better ourselves! 👏



3/4
🚀



4/4
👀




 


About​


This repository provides tutorials and implementations for various Generative AI Agent techniques, from basic to advanced. It serves as a comprehensive guide for building intelligent, interactive AI systems.

GenAI Agents: Comprehensive Repository for Development and Implementation 🚀

Welcome to one of the most extensive and dynamic collections of Generative AI (GenAI) agent tutorials and implementations available today. This repository serves as a comprehensive resource for learning, building, and sharing GenAI agents, ranging from simple conversational bots to complex, multi-agent systems.

📫 Stay Updated!​

Don't miss out on cutting-edge developments, new tutorials, and community insights!

Subscribe to the GenAI Agents Newsletter of DiamantAI

Introduction​

Generative AI agents are at the forefront of artificial intelligence, revolutionizing the way we interact with and leverage AI technologies. This repository is designed to guide you through the development journey, from basic agent implementations to advanced, cutting-edge systems.

Our goal is to provide a valuable resource for everyone - from beginners taking their first steps in AI to seasoned practitioners pushing the boundaries of what's possible. By offering a range of examples from foundational to complex, we aim to facilitate learning, experimentation, and innovation in the rapidly evolving field of GenAI agents.

Furthermore, this repository serves as a platform for showcasing innovative agent creations. Whether you've developed a novel agent architecture or found an innovative application for existing techniques, we encourage you to share your work with the community.
 


1/21
@AIatMeta
📣 Introducing Llama 3.2: Lightweight models for edge devices, vision models and more!

What’s new?
• Llama 3.2 1B & 3B models deliver state-of-the-art capabilities for their class for several on-device use cases — with support for @Arm, @MediaTek & @Qualcomm on day one.
• Llama 3.2 11B & 90B vision models deliver performance competitive with leading closed models — and can be used as drop-in replacements for Llama 3.1 8B & 70B.
• New Llama Guard models to support multimodal use cases and edge deployments.
• The first official distro of Llama Stack simplifies and supercharges the way developers & enterprises can build around Llama to support agentic applications and more.

Details in the full announcement ➡️ Llama 3.2: Revolutionizing edge AI and vision with open, customizable models
Download Llama 3.2 models ➡️ Llama 3.2

These models are available to download now directly from Meta and @HuggingFace — and will be available across offerings from 25+ partners that are rolling out starting today, including @accenture, @awscloud, @AMD, @azure, @Databricks, @Dell, @Deloitte, @FireworksAI_HQ, @GoogleCloud, @GroqInc, @IBMwatsonx, @Infosys, @Intel, @kaggle, @NVIDIA, @OracleCloud, @PwC, @scale_AI, @snowflakeDB, @togethercompute and more.

With Llama 3.2 we’re making it possible to run Llama in even more places, with even more flexible capabilities. We’ve said it before and we’ll say it again: open source AI is how we ensure that these innovations reflect the global community they’re built for and benefit everyone. We’re continuing our drive to make open source the standard with Llama 3.2.



2/21
@reach_vb
Congrats on the release! I’m a huge fan of your commitment to open science and weights!

Thanks for the vision and on-device goodies:

Llama 3.2 - a meta-llama Collection



3/21
@ai_for_success
Congratulations this is huge.



4/21
@togethercompute
🙌 We love that Llama has gone multimodal! We're excited to partner with @AIatMeta to offer free access to the Llama 3.2 11B vision model for developers. Can't wait to see what everyone builds!

Try now with our Llama-Vision-Free model endpoint.

Sign up here: https://api.together.ai/playground/chat/meta-llama/Llama-Vision-Free



5/21
@Saboo_Shubham_
@ollama and @LMStudioAI time to go!!



6/21
@ollama
Let's go!!!! Open-source AI!



7/21
@joinnatural
GG 🙌



8/21
@borsavada
It is a real pity that Llama 3.2 is not available and accessible in Turkey. Restricting access to such innovative technologies can cause developers and researchers in Turkey to miss important opportunities.

Given the rapid developments in the field of artificial intelligence, it is crucial that our country is able to closely follow the advances in this field and utilize these technologies. Advanced language models such as Llama 3.2 can accelerate innovation and increase productivity in many sectors.

This may be due to license agreements, legal regulations or technical infrastructure issues. But whatever the reason, such obstacles need to be overcome to ensure that Turkey does not fall behind in the global AI race.

It is critical that policymakers, technology companies and academic institutions collaborate to ensure that Turkey has access to the latest AI technologies and strengthen the local ecosystem in this field. In this way, Turkey can become not only a consumer but also a producer and innovation center in the field of artificial intelligence.



9/21
@swissyp_
let's get these on-chain on #ICP!! Llama 3.2 1B & 3B 🚀

cc: @icpp_pro



10/21
@AMD
AMD welcomes the latest Llama 3.2 release from Meta. We're excited to share how our collaboration with Meta is enabling developers with Day-0 support. Llama 3.2 and AMD: Optimal Performance from Cloud to Edge and AI PCs



11/21
@basetenco
We're excited to bring dedicated deployments of these new Llama 3.2 models to our customers!

90B vision looks especially powerful — congrats to the entire Llama team!



12/21
@janvivaddison
Congratulations that was amazing 😻



13/21
@Ming_Chun_Lee
"open source AI is how we ensure that these innovations reflect the global community they’re built for and benefit everyone."

This is also how we can ensure everyone can help to build and advance AI together with the same goal.

Very important.



14/21
@dhruv2038
Congrats to @AIatMeta for being this open.



15/21
@FirstBatchAI
Thank you for helping us build better for edge! 🚀



16/21
@testingcatalog
“Linux of AI”



17/21
@ryanraysr
Awesome! Looking forward to digging it!



18/21
@Neeraj_Kumar222
I am excited to see the new capabilities of Llama 3.2 models, especially the support for edge devices and the competitive performance of the vision models. The expanded partner ecosystem and commitment to open-source AI are great to see. Looking forward to exploring the potential applications of these advancements.



19/21
@JonathanRoseD
I've been trying out 3.2 3b on my phone and it's been super for a micro model! I just highly recommend using a temperature at 0.2 and no higher (hallucination). Amazing work!!
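For readers unfamiliar with the temperature setting mentioned above, the sketch below shows the standard temperature-scaled softmax: logits are divided by the temperature before normalization, so a low value such as 0.2 concentrates probability on the top token. The logits are made-up numbers for illustration, and this is not taken from any particular runtime's sampler.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
    print(softmax_with_temperature(logits, 1.0))  # top token gets about 0.67
    print(softmax_with_temperature(logits, 0.2))  # top token gets about 0.99
```

At temperature 0.2 the sampler almost always picks the highest-scoring token, which is one plausible reason a small model would wander less at that setting.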



20/21
@CamelAIOrg
Congratulations, we are excited to test these out within our framework!



21/21
@philip_kiely
Excited to compare 1B and 3B to the Qwen 2.5 small language models — have been really enjoying Qwen 2.5 1.5B

Llama 1B will be especially useful as a small example model in all of the documentation that I write!




GYVg0COXYAEvj_-.jpg

GYVrlFXWQAIzPYA.jpg

GYV5x9NWoAA7Uap.jpg
 


1/1
🚨 BREAKING

Llama 3.2 multimodal is here and 90B outperforms GPT-4o-mini and Claude Haiku in different benchmarks.

» Lightweight - 1B and 3B
» Multimodal - 11B and 90B


GYVsAv3aAAA2zl4.jpg



1/1
Together is serving Llama 3.2 vision for free - have fun!

[Quoted tweet]
🚀 Big news! We’re thrilled to announce the launch of Llama 3.2 Vision Models & Llama Stack on Together AI.

🎉 Free access to Llama 3.2 Vision Model for developers to build and innovate with open source AI. api.together.ai/playground/c…

➡️ Learn more in the blog together.ai/blog/llama-3-2-v…




GYV7UktWgAAZZwP.jpg











1/12
@nutlope
Announcing Napkins.dev – Screenshot to code!

An open source wireframe to app tool powered by Llama 3.2 vision. Upload a screenshot of a simple site/design & get code.

100% free and open source.



2/12
@nutlope
Here's the GitHub repo! Also, shoutout to @YoussefUiUx for the great design.

GitHub - Nutlope/napkins: napkins.dev – from screenshot to app



3/12
@nutlope
Tech Stack:

◆ @togethercompute's inference (AI API)
◆ @AIatMeta's Llama 3.2 Vision models
◆ @AIatMeta's Llama 3.1 405B for the LLM
◆ @codesandbox's sandpack for the sandbox
◆ @nextjs w/ tailwind & typescript
◆ @helicone_ai for AI observability
◆ @PlausibleHQ for analytics
◆ @awscloud's S3 for file uploads



4/12
@nutlope
How it works:

I ask the Llama 3.2 vision models to describe whatever screenshot the user uploaded, then pass it to Llama 3.1 405B to actually code it.

It's fairly limited in what it can handle right now – best for simple UI sketches!
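The two-stage flow described above (a vision model describes the screenshot, then a larger LLM writes the code) can be sketched as follows. This is not the Napkins.dev source: the function names, the prompt wording, and the stub models are assumptions, and the real app would call hosted models where the stubs are.

```python
from typing import Callable


def screenshot_to_code(
    image_bytes: bytes,
    describe_image: Callable[[bytes], str],
    generate_code: Callable[[str], str],
) -> str:
    """Stage 1: describe the screenshot. Stage 2: turn the description into code."""
    description = describe_image(image_bytes)
    # Hypothetical prompt wording; the real app's prompt is not shown in the thread.
    prompt = "Write code for a web page matching this description:\n\n" + description
    return generate_code(prompt)


if __name__ == "__main__":
    # Stub models so the sketch runs offline; swap in real API clients as needed.
    def fake_vision(_image: bytes) -> str:
        return "A centered login form with two inputs and a blue button."

    def fake_llm(prompt: str) -> str:
        return "<!-- generated from: " + prompt.splitlines()[-1] + " -->"

    print(screenshot_to_code(b"\x89PNG...", fake_vision, fake_llm))
```

Passing the models in as callables keeps the pipeline independent of any one provider, so either stage can be replaced without touching the other.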



5/12
@nutlope
Launched this as part of us at Together AI supporting the new Llama 3.2 models (including vision). Check it out!



6/12
@nutlope
Check out the app here!

Napkins.dev – Screenshot to code



7/12
@LM_22
It would be nice to change it as OCR and extracting particular data out of pictures, invoices, packing list, delivery notes, and structure it in json or csv for handover to agent



8/12
@nutlope
Agreed! It has a lot of really cool use cases and I'm planning to do one with receipts potentially



9/12
@olanetsoft
Wow



10/12
@nutlope
Still limited to fairly simple designs but gonna work to make it better!



11/12
@DamiDina
Fire



12/12
@nutlope
Thanks Dami!





GYV_EahW4AAqnVd.png

GYV7UktWgAAZZwP.jpg








1/6
Llama 3.2 is available on Ollama! It's lightweight and multimodal! It's so fast and good!

🥕 Try it:

1B
ollama run llama3.2:1b

3B
ollama run llama3.2

🕶️ vision models are coming very soon!

llama3.2
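Besides the `ollama run` commands above, a locally running Ollama server can also be queried over HTTP. The sketch below assumes the server is listening on its default address, `http://localhost:11434`, and that the model has already been pulled; the helper names are made up for this example.

```python
import json
import urllib.request


def build_generate_request(model, prompt, host="http://localhost:11434"):
    """Build a POST request for Ollama's /api/generate endpoint.

    Assumes a local server on the default port; `stream` is disabled so
    the reply arrives as a single JSON object.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(generate("llama3.2:1b", "Say hello in five words."))
```

The model tags `llama3.2:1b` and `llama3.2` are the same ones used in the commands above.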

[Quoted tweet]
📣 Introducing Llama 3.2: Lightweight models for edge devices, vision models and more!

What’s new?
• Llama 3.2 1B & 3B models deliver state-of-the-art capabilities for their class for several on-device use cases — with support for @Arm, @MediaTek & @Qualcomm on day one.
• Llama 3.2 11B & 90B vision models deliver performance competitive with leading closed models — and can be used as drop-in replacements for Llama 3.1 8B & 70B.
• New Llama Guard models to support multimodal use cases and edge deployments.
• The first official distro of Llama Stack simplifies and supercharges the way developers & enterprises can build around Llama to support agentic applications and more.

Details in the full announcement ➡️ go.fb.me/229ug4
Download Llama 3.2 models ➡️ go.fb.me/w63yfd

These models are available to download now directly from Meta and @HuggingFace — and will be available across offerings from 25+ partners that are rolling out starting today, including @accenture, @awscloud, @AMD, @azure, @Databricks, @Dell, @Deloitte, @FireworksAI_HQ, @GoogleCloud, @GroqInc, @IBMwatsonx, @Infosys, @Intel, @kaggle, @NVIDIA, @OracleCloud, @PwC, @scale_AI, @snowflakeDB, @togethercompute and more.

With Llama 3.2 we’re making it possible to run Llama in even more places, with even more flexible capabilities. We’ve said it before and we’ll say it again: open source AI is how we ensure that these innovations reflect the global community they’re built for and benefit everyone. We’re continuing our drive to make open source the standard with Llama 3.2.


2/6
Amazing!!



3/6
lightweight and multimodal! ❤️❤️❤️



4/6
❤️



5/6
Amazing! Is this the 3B or 1B?



6/6
❤️





GYVg0COXYAEvj_-.jpg

GYV6hYDWEAEJ5wJ.png


1/4
@awnihannun
Llama 3.2 1B in 4-bit generates at ~350 (!) toks/sec with MLX on an M2 Ultra. This is interesting.

Command: mlx_lm.generate --model mlx-community/Llama-3.2-1B-Instruct-4bit --prompt "Write a story about Einstein" --temp 0.0 --max-tokens 512

Not sped up:
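To put the quoted throughput in perspective, a quick back-of-the-envelope calculation shows how long the 512-token request in the command would take at roughly 350 tokens per second. The helper is illustrative only and ignores prompt processing and model load time.

```python
def generation_time_seconds(num_tokens, tokens_per_second):
    """Time to generate `num_tokens` at a steady decoding rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # 512 tokens at ~350 tok/s: a little under 1.5 seconds
    print(round(generation_time_seconds(512, 350), 2))
```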



2/4
@MemoSparkfield
Wow!



3/4
@ivanfioravanti
WOW! That’s ultra fast!



4/4
@shareastronomy
Is the MLX version of the model on Hugging Face already?




 



1/3
@dani_avila7
Llama 3.2 - 1B 🦙 running locally in VSCode with Ollama and CodeGPT

You can now download the Llama 3.2 1B and 3B models.

Just open CodeGPT in VSCode, select @ollama and the Llama 3.2 models, and click "Download Model"

The latest Small Language Models from @AIatMeta running privately and securely on your computer :smile:



2/3
@dani_avila7
Follow this installation guide:

Llama 3.2 Running Locally in VSCode: How to Set It Up with CodeGPT and Olla



3/3
@lalopenguin
heck yeahhhh







1/6
@akashnet_
Llama 3.2 is the latest open-source AI model from Meta, released only a few hours ago.

Here is the 3B parameter model running on Akash Chat at 165 tokens/second, powered by NVIDIA A100s on Akash.

Try Llama 3.2 for free, no sign-in required:
AkashChat



2/6
@aguirre_benja
Is it available with the API key?



3/6
@donTimaty
$80 dollar by December pin this



4/6
@Exiled_eth
#AKT llama is cookin



5/6
@mfrey33
#AKT ships



6/6
@nakamotobased
LFG TRUMP GIV BIT




GYVCJtDW8AAkko7.jpg






1/11
@Teknium1
Llama-3.2 is out! Open access model (except for the EU I guess), 90B Parameters, but 18B of those are exclusive to its new Vision capabilities. It seemingly beats all closed source models on vision.

They also released a 1, 3, and 11B version as well.

HuggingFace Model Page: meta-llama/Llama-3.2-90B-Vision · Hugging Face

VLLM is adding support as we speak: [Model] Add support for the multi-modal Llama 3.2 model by heheda12345 · Pull Request #8811 · vllm-project/vllm



2/11
@zacharynado
Gemini Flash 1.5 has 62.3% MMMU vs 60.3% on Llama-3.2 90B

[Quoted tweet]
Full update Gemini model metrics : )


3/11
@Teknium1
But I cant run it on my home PC :[



4/11
@vSouthvPawv
Are the 1B and 3B language only? I'm really surprised to not see an 8B LMM when we've had access to LlaVa and Moondream for edge devices.



5/11
@Teknium1
the 11b is the vision 8B I think?



6/11
@bindureddy
Not the big ones, right?



7/11
@Teknium1
not the big ones for what



8/11
@mysticaltech
So 4o-mini is that good huh



9/11
@caviterginsoy
I love the 90B size, finally can max out my M3



10/11
@M_Chimiste
So… tiny Hermes?



11/11
@ikristoph
This is not correct. GPT-4o, Sonnet, Gemini all do better (even Qwen actually).

MMMU





GYWA3n1XsAEAzW1.jpg

GYQO3fHXEAAfdWt.jpg
 







1/6
How it started vs how it’s goin.

[Quoted tweet]
I replied with this. Mira, thank you for everything.

It’s hard to overstate how much Mira has meant to OpenAI, our mission, and to us all personally.

I feel tremendous gratitude towards her for what she has helped us build and accomplish, but I most of all feel personal gratitude towards her for the support and love during all the hard times. I am excited for what she’ll do next.

We’ll say more about the transition plans soon, but for now, I want to take a moment to just feel thanks.

Sam


2/6
😂 it's almost nothing now



3/6
oh, maybe Matt hired Mira????



4/6
right. fully automated company



5/6
he hasn't left, but if a co-founder takes a long sabbatical, it usually ends up badly



6/6
my guess is AGI achieved internally, so soon they don't need a CEO





GYWL3q2aYAAUOIu.jpg

GYWL4H8XwAAo9u9.jpg
 






Exclusive: OpenAI to remove non-profit control and give Sam Altman equity​


By Krystal Hu and Kenrick Cai

September 25, 2024, 4:35 PM EDT. Updated 4 hours ago

[1/2] Sam Altman, CEO of OpenAI, attends the 54th annual meeting of the World Economic Forum, in Davos, Switzerland, January 18, 2024. REUTERS/Denis Balibouse/File Photo


  • OpenAI plots to restructure into for-profit benefit corporation
  • Non-profit board no longer controls for-profit when done
  • CEO Sam Altman to receive equity in OpenAI for the first time

SAN FRANCISCO, Sept 25 (Reuters) - ChatGPT-maker OpenAI is working on a plan to restructure its core business into a for-profit benefit corporation that will no longer be controlled by its non-profit board, people familiar with the matter told Reuters, in a move that will make the company more attractive to investors.

The OpenAI non-profit will continue to exist and own a minority stake in the for-profit company, the sources said. The move could also have implications for how the company manages AI risks in a new governance structure.

Chief executive Sam Altman will also receive equity for the first time in the for-profit company, which could be worth $150 billion after the restructuring as it also tries to remove the cap on returns for investors, sources added. The sources requested anonymity to discuss private matters.

"We remain focused on building AI that benefits everyone, and we’re working with our board to ensure that we’re best positioned to succeed in our mission. The non-profit is core to our mission and will continue to exist," an OpenAI spokesperson said.

The details of the proposed corporate structure, first reported by Reuters, highlight significant governance changes happening behind the scenes at one of the most important AI companies. The plan is still being hashed out with lawyers and shareholders and the timeline for completing the restructuring remains uncertain, the sources said.

The restructuring also comes amid a series of leadership changes at the startup. OpenAI's longtime chief technology officer Mira Murati abruptly announced her departure from the company on Wednesday. Greg Brockman, OpenAI's president, has also been on leave.

Founded in 2015 as a non-profit AI research organization, OpenAI added the for-profit OpenAI LP entity in 2019 as a subsidiary of its non-profit, securing capital from Microsoft (MSFT.O) to fund its research.

The company captured global attention with the launch of ChatGPT in late 2022, a generative AI app that spit out human-like responses to text queries, which has become one of the fastest-growing applications in history with over 200 million weekly active users, igniting a global race to invest in AI.

Along with ChatGPT's success, OpenAI's valuation has skyrocketed from $14 billion in 2021 to $150 billion in the new convertible debt round under discussion, attracting investors such as Thrive Capital and Apple (AAPL.O).


AI SAFETY​


The company’s unusual structure, which gives full control of the for-profit subsidiary to the OpenAI nonprofit, was originally set to ensure the mission of creating "safe AGI that is broadly beneficial," referring to artificial general intelligence that is at or exceeding human intelligence.

The structure came into focus last November during one of the biggest boardroom dramas in Silicon Valley, where members of the non-profit board ousted Altman over a breakdown in communication and loss of trust. He was reinstated after five days with overwhelming support from employees and investors.

Since then, OpenAI's board has been refreshed with more tech executives, chaired by Bret Taylor, former Salesforce co-CEO who now runs his own AI startup. Any corporate changes need approval from its nine-person non-profit board.

The removal of non-profit control could make OpenAI operate more like a typical startup, a move generally welcomed by its investors who have poured billions into the company.

However, it could also raise concerns from the AI safety community about whether the lab still has enough governance to hold itself accountable in its pursuit of AGI, given that earlier this year it dissolved the superalignment team that focused on the long-term risks of AI.

It’s unclear how much equity Altman will receive. Altman, already a billionaire from his multiple startup investments, has previously stated that he chose not to take an equity stake in the company because the board needed a majority of disinterested directors with no stake in the company. He has also said he has enough money and is doing it because he loves the work.

The new structure of OpenAI would resemble that of its major rival Anthropic and Elon Musk's xAI, which are registered as benefit corporations, a form of for-profits that aim to promote social responsibility and sustainability in addition to making profits.
 



1/3
@tedx_ai
NotebookLM’s AI podcast tool is AWESOME, but did you know there’s an open source model that does this now?!



2/3
@tedx_ai


[Quoted tweet]
Code: github.com/lamm-mit/PDF2Audi…

HF space: huggingface.co/spaces/lamm-m…

@Gradio @_akhaliq @huggingface


3/3
@JagersbergKnut
this is not correct, it requires using closed source models to work at this point












1/11
@ProfBuehlerMIT
We are excited to share #PDF2Audio, an open-source alternative to the #podcast feature of #NotebookLM with flexibility & tailored outputs that you can precisely control in the app: You can make a podcast, lecture, discussions, short/long form summaries & more, including the use of the amazing 🍓o1 model (@sama @OpenAI: with stunning results!).

Code & HF Space: You can find #PDF2Audio on GitHub for local use or try the Hugging Face space, all featuring @Gradio. Link to the repo & HF space in the reply.

Thank you @knowsuchagency for the great work on #promptic and #pdf2podcast, as well as @LiteLLM, & @_akhaliq for helping us with the @huggingface spaces version. We hope that this tool is useful for the community.

Background: Developing audio podcasts, lectures, & summaries from complex documents & data has become an exciting trend with impacts from research to education to business. Our open-source #PDF2Audio tool allows users to utilize various models, such as #o1 or local/open-source models, to develop deep-dives into technical content.

Example application - material design analysis: As an example to show what the system can do, check the video for a detailed 13-minute analysis of one of the designs created by #SciAgents merging silk & dandelion pigments, created using 🍓o1.

The conversation describes the new material, an integration of silk proteins & luteolin/dandelion pigments to create a new biomaterial. Silk, a natural #nanostructured protein-based fiber known for its strength & flexibility, is combined with dandelion pigments like luteolin, which offer unique optical properties. By merging these components at the nanoscale level, the resulting material displays structural coloration (vibrant, tunable colors created by the material's structure rather than synthetic dyes) and leverages silk's hierarchical organization as a scaffold for the pigments, ensuring uniform distribution and non-covalent bonding at the molecular level.

Key technical features include:

➡️Low-temperature processing to maintain the integrity of both silk and pigments while reducing energy consumption by 30%.
➡️Enhanced mechanical properties, with tensile strength up to 1.5 gigapascals.
➡️Potential self-healing capabilities and environmental responsiveness, allowing the material to repair minor damage and change color based on environmental conditions.
➡️UV protection and antimicrobial properties, which make this material ideal for smart textiles, eco-friendly coatings, and medical applications.

This development opens new doors for sustainable materials, offering an eco-friendly alternative to synthetic fibers with applications in various industries, from fashion to healthcare.



2/11
@ProfBuehlerMIT
Code: GitHub - lamm-mit/PDF2Audio

HF space: Pdf2audio - a Hugging Face Space by lamm-mit

@Gradio @_akhaliq @huggingface



3/11
@TyReid
This is exactly what I was hoping for after discovering the notebooklm podcast feature last week. Unfortunately all my attempts on Huggingface so far have errored out, but will keep digging.



4/11
@ProfBuehlerMIT
You can also run it locally quite easily, see the GitHub repo. The version on spaces works well for me also but some have had challenges also.



5/11
@TonyW
This is pretty cool. Thanks for putting it out there. Nice that you can adjust the prompts too. Anything notably different from the NotebookLM approach?



6/11
@ProfBuehlerMIT
They did an amazing job with NotebookLM. In our version you have a lot more control and can use a variety of LLMs. E.g. you can control the process of generating the dialog in detail, iterate on drafts & give feedback, and use a variety of models (we are getting excellent results using o1, for instance). You can also select the audio generation model and voices. We are working on implementing a version for open source models to generate the script and audio synthesis, which opens further possibilities for tailorability (e.g. fine-tuned models).



7/11
@lalopenguin
can you choose different voices ?



8/11
@ProfBuehlerMIT
Yes you can. We also have an edit functionality that allows you to iterate on a draft generated by the model with specific comments inside the transcript, or overall instructions.



9/11
@alexcovo_eth
I've been playing with NotebookLM and Illuminate from Google also, this is such a great addition. Just installed locally. It's super fast and stable. I was able to output a podcast (in Spanish) in less than a minute. The quality is great! Thank you for sharing with the world! 🙏



10/11
@MikeBirdTech
Nice! This looks great



11/11
@RobbyGtv





 




Molmo
Multimodal Open Language Model built by Ai2
Learn about Molmo

 