Running LLM models at the crib

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,780
Reputation
1,567
Daps
22,335
I've had a passing interest in AI for the last few years, but only realized in the last month or so that I don't need an expensive laptop to run models locally.

Share what you are running, what front-end you are using, any tips/tricks you know of, etc.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
62,891
Reputation
9,559
Daps
172,238
i've used LM Studio when i tested some quantized 6B & 7B models.



 

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,780
Reputation
1,567
Daps
22,335
Just use jupyter and hugging face
I already have JupyterLab in Docker for data analysis, but that UX is meh. I might as well just use the terminal instead lol.

I have been using Jan.ai, and it lets you grab new models directly from Hugging Face.


What models are y'all using, and what are your use cases? @bnew @greenvale @Ty Daniels
 

Ty Daniels

Superstar
Joined
Dec 13, 2019
Messages
2,018
Reputation
3,409
Daps
14,301
I already have JupyterLab in Docker for data analysis, but that UX is meh. I might as well just use the terminal instead lol.

I have been using Jan.ai, and it lets you grab new models directly from Hugging Face.


What models are y'all using, and what are your use cases? @bnew @greenvale @Ty Daniels


I'm mainly using it for AI art/editing, mostly Stable Diffusion 1.5 and XL, along with Flux

Using Forge UI, Krita AI Diffusion, and sometimes Fooocus

Tools I use
- Krita (with Krita AI Diffusion) (like Adobe's Generative Fill, but free)
- Pinokio
- Stability Matrix
- Google Colab
- ControlNet (SD 1.5 and XL)

I've also played with Llama installed locally, but mainly use ChatGPT (Claude, etc.) for any non-art-related tasks

 

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,780
Reputation
1,567
Daps
22,335
Wake up babe brehette, the new GGUF QAT model just dropped:


What They're Doing

The post describes Unsloth's "Dynamic 2.0" quantization method for large language models, which they claim outperforms other quantization approaches including QAT (Quantization-Aware Training). They're primarily focusing on:


  1. Improved quantization techniques that reduce KL Divergence (a measure of how much the quantized model differs from full precision)
  2. Better calibration datasets (using conversational style data instead of WikiText)
  3. More accurate benchmarking methodology
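To make (1) concrete: KL divergence just measures how far the quantized model's next-token distribution drifts from the full-precision one (lower = closer to the original model). A minimal sketch with made-up probabilities, not Unsloth's actual benchmark code:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q): how much Q (quantized model) diverges from P (full precision)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token probabilities for the same prompt
full_precision = [0.70, 0.20, 0.08, 0.02]
quantized      = [0.65, 0.22, 0.10, 0.03]

# Small value means quantization barely changed the model's behavior
print(kl_divergence(full_precision, quantized))
```

Identical distributions give a KL of exactly zero, which is why it works as a "how much did quantization hurt" score.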

Key Comparisons


For Gemma 3 27B, they show various quantization levels comparing old vs new methods and QAT vs non-QAT approaches. The notable claim is that their Dynamic 4-bit quantization achieves +1% better performance on MMLU than Google's QAT while using 2GB less disk space.
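The disk-space side follows from back-of-envelope math: a quantized model file is roughly parameter count × bits per weight ÷ 8 (this ignores embedding tables and quantization metadata, so real GGUF files run somewhat larger):

```python
def approx_size_gb(n_params_billion, bits_per_weight):
    """Rough model file size: params * bits / 8, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Gemma 3 27B, rough figures (overhead not counted)
print(approx_size_gb(27, 16))  # fp16 full precision -> 54.0 GB
print(approx_size_gb(27, 4))   # 4-bit quant -> 13.5 GB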

So it sounds like you can do more w/ less. I am updating my models on my devices as we speak.
 

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,780
Reputation
1,567
Daps
22,335
Forgot the links:

Long story short: look for models with 'UD' in the name, which stands for Unsloth Dynamic.

I'm now running the unsloth:gemma-3-4b-it-GGUF:gemma-3-4b-it-UD-Q4_K_XL.gguf on my chrultrabook (i3-1125G4) and the unsloth:gemma-3-12b-it-GGUF:gemma-3-12b-it-UD-IQ3_XXS.gguf on my desktop (i5-8500), in spite of both devices being CPU-only :banderas:
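If you're scripting the model hunt, filtering a repo's file listing for the UD marker is one line; the filenames below are just examples (the non-UD one is made up), not a real repo listing:

```python
# Example GGUF filenames from a model repo listing
files = [
    "gemma-3-4b-it-Q4_K_M.gguf",          # regular quant (hypothetical)
    "gemma-3-4b-it-UD-Q4_K_XL.gguf",      # Unsloth Dynamic
    "gemma-3-12b-it-UD-IQ3_XXS.gguf",     # Unsloth Dynamic
]

# 'UD' in the filename marks the Unsloth Dynamic quants
dynamic_quants = [f for f in files if "-UD-" in f]
print(dynamic_quants)
```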
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
62,891
Reputation
9,559
Daps
172,238
Forgot the links:

Long story short, find models that have UD in the name which stands for 'unsloth dynamic'.

I'm now running the unsloth:gemma-3-4b-it-GGUF:gemma-3-4b-it-UD-Q4_K_XL.gguf on my chrultrabook (i3-1125G4) and the unsloth:gemma-3-12b-it-GGUF:gemma-3-12b-it-UD-IQ3_XXS.gguf on my desktop (i5-8500), in spite of both devices being CPU-only :banderas:

how many tokens per second?
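for anyone timing it themselves, tokens per second is just tokens generated divided by wall-clock time; a minimal sketch (the fake_generate stub here is a placeholder, swap in your real backend's generation call):

```python
import time

def measure_tokens_per_second(generate_fn, prompt):
    """Time one generation call and report throughput. generate_fn returns a token list."""
    start = time.perf_counter()
    tokens = generate_fn(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Stand-in backend: pretends to emit 50 tokens with a tiny delay each
def fake_generate(prompt):
    out = []
    for _ in range(50):
        time.sleep(0.001)
        out.append("tok")
    return out

print(round(measure_tokens_per_second(fake_generate, "hello")), "tok/s")
```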
 