Running LLM models at the crib

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,780
Reputation
1,567
Daps
22,335
I've had a passing interest in AI for the last few years, but only realized in the last month or so that I don't need an expensive laptop to run models locally.

Share what you are running, what front-end you are using, any tips/tricks you know of, etc.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
62,891
Reputation
9,559
Daps
172,238
i've used LM Studio when i tested some quantized 6B & 7B models.



 

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,780
Reputation
1,567
Daps
22,335
Just use jupyter and hugging face
I already have JupyterLab in Docker for data analysis, but that UX is meh. I might as well just use the terminal instead lol.

I have been using Jan.ai, and it lets you grab new models directly from Hugging Face.


What models are y'all using, and what are your use cases? @bnew @greenvale @Ty Daniels
 

Ty Daniels

Superstar
Joined
Dec 13, 2019
Messages
2,018
Reputation
3,409
Daps
14,301
I already have JupyterLab in Docker for data analysis, but that UX is meh. I might as well just use the terminal instead lol.

I have been using Jan.ai, and it lets you grab new models directly from Hugging Face.


What models are y'all using, and what are your use cases? @bnew @greenvale @Ty Daniels


I'm mainly using it for AI art/editing, mostly Stable Diffusion 1.5 and XL, along with Flux

Using Forge UI, Krita AI Diffusion, and sometimes Fooocus

Tools I use
- Krita (with Krita AI Diffusion) (like Adobe's Generative Fill, but free)
- Pinokio
- Stability Matrix
- Google Colab
- ControlNet (SD 1.5 and XL)

I've also played with Llama installed locally, but mainly use ChatGPT (Claude, etc.) for any non-art-related tasks

 

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,780
Reputation
1,567
Daps
22,335
Wake up babe brehette, the new GGUF QAT model just dropped:


What They're Doing

The post describes Unsloth's "Dynamic 2.0" quantization method for large language models, which they claim outperforms other quantization approaches including QAT (Quantization-Aware Training). They're primarily focusing on:


  1. Improved quantization techniques that reduce KL Divergence (a measure of how much the quantized model differs from full precision)
  2. Better calibration datasets (using conversational style data instead of WikiText)
  3. More accurate benchmarking methodology
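To make (1) concrete: KL divergence just measures how far the quantized model's next-token distribution drifts from the full-precision one (lower = closer to the original model). A minimal sketch with made-up probabilities, not Unsloth's actual benchmark code:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q): how much Q (quantized model) diverges from P (full precision)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token probabilities for the same prompt
full_precision = [0.70, 0.20, 0.08, 0.02]
quantized      = [0.65, 0.22, 0.10, 0.03]

# Small value means quantization barely changed the model's behavior
print(kl_divergence(full_precision, quantized))
```

Identical distributions give a KL of exactly zero, which is why it works as a "how much did quantization hurt" score.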

Key Comparisons


For Gemma 3 27B, they show various quantization levels comparing old vs new methods and QAT vs non-QAT approaches. The notable claim is that their Dynamic 4-bit quantization achieves +1% better performance on MMLU than Google's QAT while using 2GB less disk space.
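The disk-space side follows from back-of-envelope math: a quantized model file is roughly parameter count × bits per weight ÷ 8 (this ignores embedding tables and quantization metadata, so real GGUF files run somewhat larger):

```python
def approx_size_gb(n_params_billion, bits_per_weight):
    """Rough model file size: params * bits / 8, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Gemma 3 27B, rough figures (overhead not counted)
print(approx_size_gb(27, 16))  # fp16 full precision -> 54.0 GB
print(approx_size_gb(27, 4))   # 4-bit quant -> 13.5 GB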

So it sounds like you can do more w/ less. I am updating my models on my devices as we speak.
 

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,780
Reputation
1,567
Daps
22,335
Forgot the links:

Long story short: look for models with 'UD' in the name, which stands for Unsloth Dynamic.

I'm now running the unsloth:gemma-3-4b-it-GGUF:gemma-3-4b-it-UD-Q4_K_XL.gguf on my chrultrabook (i3-1125G4) and the unsloth:gemma-3-12b-it-GGUF:gemma-3-12b-it-UD-IQ3_XXS.gguf on my desktop (i5-8500), in spite of both devices being CPU-only :banderas:
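If you're scripting the model hunt, filtering a repo's file listing for the UD marker is one line; the filenames below are just examples (the non-UD one is made up), not a real repo listing:

```python
# Example GGUF filenames from a model repo listing
files = [
    "gemma-3-4b-it-Q4_K_M.gguf",          # regular quant (hypothetical)
    "gemma-3-4b-it-UD-Q4_K_XL.gguf",      # Unsloth Dynamic
    "gemma-3-12b-it-UD-IQ3_XXS.gguf",     # Unsloth Dynamic
]

# 'UD' in the filename marks the Unsloth Dynamic quants
dynamic_quants = [f for f in files if "-UD-" in f]
print(dynamic_quants)
```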
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
62,891
Reputation
9,559
Daps
172,238
Forgot the links:

Long story short, find models that have UD in the name which stands for 'unsloth dynamic'.

I'm now running the unsloth:gemma-3-4b-it-GGUF:gemma-3-4b-it-UD-Q4_K_XL.gguf on my chrultrabook (i3-1125G4) and the unsloth:gemma-3-12b-it-GGUF:gemma-3-12b-it-UD-IQ3_XXS.gguf on my desktop (i5-8500), in spite of both devices being CPU-only :banderas:

how many tokens per second?
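for anyone timing it themselves, tokens per second is just tokens generated divided by wall-clock time; a minimal sketch (the fake_generate stub here is a placeholder, swap in your real backend's generation call):

```python
import time

def measure_tokens_per_second(generate_fn, prompt):
    """Time one generation call and report throughput. generate_fn returns a token list."""
    start = time.perf_counter()
    tokens = generate_fn(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Stand-in backend: pretends to emit 50 tokens with a tiny delay each
def fake_generate(prompt):
    out = []
    for _ in range(50):
        time.sleep(0.001)
        out.append("tok")
    return out

print(round(measure_tokens_per_second(fake_generate, "hello")), "tok/s")
```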
 