OpenWebUI, a free, private alternative to ChatGPT

Heimdall

Pro
Joined
Dec 13, 2019
Messages
440
Reputation
301
Daps
984
Interesting.

My second foray into using LLMs was for offline/local transcription and it took forever on my laptop :russ: so I'm a little hesitant to try anything local, but certainly intrigued.
 

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,598
Reputation
1,473
Daps
21,573
Interesting.

My second foray into using LLMs was for offline/local transcription and it took forever on my laptop :russ: so I'm a little hesitant to try anything local, but certainly intrigued.
This.

I've never tried locally because I've heard previously that you need souped-up video cards to run it at a respectable clip. I might try one of the 8B models down the road as they get smarter, but buying new hardware seems like a premature investment.

Edit:
From the comment section by the creator:
CPU: Minimum: Modern processor with at least 4 cores.
RAM: 7B models: at least 8 GB. 13B models: at least 16 GB.
NVIDIA GPUs: Compute capability of at least 5.0.
AMD GPUs: Supported for enhanced performance.
VRAM requirements: 7B models: 8 GB VRAM.
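Rough napkin math behind those numbers, if anyone's curious (my own estimate, not from the creator): memory needed is roughly parameter count × bytes per weight, plus some overhead for the context/KV cache.

```python
# Back-of-the-napkin sizing: memory ~= params * bytes-per-weight + ~20% overhead.
# Real usage varies by runtime, quantization format, and context length.

def approx_mem_gb(params_billions: float, bits_per_weight: int) -> float:
    """Rough estimate of RAM/VRAM needed to load a model, in GB."""
    weights_gb = params_billions * bits_per_weight / 8  # 1B params at 8-bit ~= 1 GB
    return weights_gb * 1.2  # ~20% extra for KV cache / runtime overhead

for params in (7, 13):
    for bits in (4, 8, 16):
        print(f"{params}B @ {bits}-bit: ~{approx_mem_gb(params, bits):.1f} GB")
```

At 8-bit that works out to roughly 8 GB for 7B and 16 GB for 13B, which lines up with the requirements quoted above; 4-bit quants cut that roughly in half.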
 

Heimdall

Pro
Joined
Dec 13, 2019
Messages
440
Reputation
301
Daps
984
This.

I've never tried locally because I've heard previously that you need souped-up video cards to run it at a respectable clip. I might try one of the 8B models down the road as they get smarter, but buying new hardware seems like a premature investment.

Edit: From the comment section by the creator:
It seems that the speed comes from being able to fit the entire model within the VRAM of the GPU, plus Nvidia GPUs have some optimisations (CUDA?), though I just came across something that should help with AMD ones.

So for those constrained by hardware and seeking to run things locally, it may be worth looking for a model smaller than the dedicated VRAM of your GPU (though I don't know if the ggml file size is the same as the amount of space it takes up in memory).
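A crude way to check before committing, if you're on an NVIDIA card (sketch only: the filename is a placeholder, and the ~20% headroom is my guess since the loaded size isn't exactly the file size):

```python
# Quick check: will this ggml/gguf file fit in the GPU's VRAM?
import os
import pynvml  # pip install nvidia-ml-py (NVIDIA cards only)

model_path = "llama-3-8b.Q4_K_M.gguf"  # placeholder filename

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
vram_gb = pynvml.nvmlDeviceGetMemoryInfo(handle).total / 1024**3
model_gb = os.path.getsize(model_path) / 1024**3

print(f"GPU VRAM: {vram_gb:.1f} GB, model file: {model_gb:.1f} GB")
if model_gb * 1.2 < vram_gb:  # rough ~20% headroom for the KV cache
    print("Should fit fully on the GPU.")
else:
    print("Probably won't fit; expect CPU offload and a big slowdown.")
```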

Or get a Mac :troll:
Want to Run a LLM Like Llama3 at Home for a Self-Hosted ChatGPT? A Mac Might Be Your Best, Cheapest Option
On Macs, the architecture is a little different. CPU and GPU both share the same high speed memory, which can run at 800GB/sec. That’s well beyond what DDR5 (64GB/sec) or DDR6 (134GB/sec) can offer. So if you buy a Mac with 64GB, it can use that memory for either system CPU processing or GPU memory.

Out of the box, a Mac will max out at 75% of RAM for the GPU, though this can be adjusted. My 64GB M1 Max runs models that require 40GB+ of GPU VRAM just fine.
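If you want to see what that 75% default works out to on your own Mac, something like this does it (the 0.75 split is the out-of-the-box behaviour mentioned above; the sysctl for raising it is my understanding, not from the article):

```python
import subprocess

# Total unified memory on a Mac, in bytes (hw.memsize is a standard macOS sysctl).
total_gb = int(subprocess.check_output(["sysctl", "-n", "hw.memsize"])) / 1024**3
gpu_budget_gb = total_gb * 0.75  # out-of-the-box GPU share

print(f"Unified memory: {total_gb:.0f} GB")
print(f"Approx. GPU-usable by default: {gpu_budget_gb:.0f} GB")
# Reportedly adjustable via `sudo sysctl iogpu.wired_limit_mb=<MB>` on recent
# macOS -- my assumption, not from the article, and it resets on reboot.
```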

I suppose a middle ground would be to rent a cloud instance/GPU server, but I would probably only try that if I was serious about this stuff lol.

I have inherited an ancient desktop whose discrete card has more dedicated VRAM than my laptop's iGPU, and I'm wondering if the transcription would run faster on there (if at all)... :patrice:
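Worth a shot. If the card is NVIDIA and new enough for a CUDA build of PyTorch, a quick test with the openai-whisper package would settle it (model size and filename here are just placeholders):

```python
# Try local transcription on the discrete GPU, falling back to CPU.
import time
import torch
import whisper  # pip install openai-whisper (pulls in PyTorch)

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on: {device}")

model = whisper.load_model("small", device=device)  # "small" is a guess; pick what fits
start = time.time()
result = model.transcribe("recording.mp3")  # placeholder audio file
print(f"Transcribed in {time.time() - start:.1f}s")
print(result["text"][:200])
```

Very old cards may not be supported by current CUDA builds of PyTorch, in which case it silently falls back to CPU, which is the slow path you already know.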
 