Secure Da Bag
And it works offline
This. Interesting.
My second foray into using LLMs was for offline/local transcription, and it took forever on my laptop, so I'm a little hesitant to try anything local again, but I'm certainly intrigued.
CPU: Minimum: modern processor with at least 4 cores.
RAM: 7B models: at least 8 GB; 13B models: at least 16 GB.
GPU: NVIDIA GPUs: compute capability of at least 5.0; AMD GPUs: supported for enhanced performance.
VRAM: 7B models: 8 GB.
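If you want to sanity-check a machine against those minimums before downloading anything, a rough sketch like this works (Python, assuming the third-party psutil package is installed; the RAM figures are just the rule-of-thumb numbers quoted above):

```python
# Rough pre-flight check against the rule-of-thumb requirements quoted above.
# Assumes the third-party "psutil" package (pip install psutil) is available.
import psutil

MIN_CORES = 4
RAM_GB_NEEDED = {"7B": 8, "13B": 16}  # figures from the quoted requirements

def check_specs(model_size: str = "7B") -> bool:
    cores = psutil.cpu_count(logical=False) or psutil.cpu_count()
    ram_gb = psutil.virtual_memory().total / 1024**3
    ok = cores >= MIN_CORES and ram_gb >= RAM_GB_NEEDED[model_size]
    print(f"{cores} physical cores, {ram_gb:.1f} GB RAM -> "
          f"{'meets' if ok else 'below'} the {model_size} minimum")
    return ok

if __name__ == "__main__":
    check_specs("7B")
```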
It seems that the speed comes from being able to fit the entire model within the VRAM of the GPU, and Nvidia GPUs have some optimisations (CUDA?) on top of that, though I just came across something that should help with AMD ones.
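A quick way to see whether a model would fit entirely in VRAM is to compare the card's free memory against a rough estimate of the model's size. The sketch below shells out to nvidia-smi (so Nvidia cards only) and uses ballpark sizes for 4-bit-quantised models; those size figures are my own rough assumptions, not anything from the post:

```python
# Hedged sketch: does a model roughly fit in GPU VRAM?
# Requires the nvidia-smi CLI; the model-size figures are rough
# assumptions for 4-bit quantised weights, not exact numbers.
import subprocess

APPROX_MODEL_GB = {"7B": 5, "8B": 6, "13B": 9}  # assumed ballpark figures

def free_vram_gb() -> float:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    # nvidia-smi reports MiB; take the first GPU listed.
    return int(out.splitlines()[0]) / 1024

def fits(model: str) -> bool:
    free = free_vram_gb()
    need = APPROX_MODEL_GB[model]
    print(f"{model}: ~{need} GB needed, {free:.1f} GB free -> "
          f"{'should fit' if free >= need else 'will spill to system RAM'}")
    return free >= need

if __name__ == "__main__":
    fits("7B")
```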
I've never tried running locally because I'd heard you need a souped-up video card to run it at a respectable clip. I might try one of the 8B models down the road as they get smarter, but buying new hardware now seems premature.
Edit: From the creator, in the comment section:
On Macs, the architecture is a little different. CPU and GPU both share the same high speed memory, which can run at 800GB/sec. That’s well beyond what DDR5 (64GB/sec) or DDR6 (134GB/sec) can offer. So if you buy a Mac with 64GB, it can use that memory for either system CPU processing or GPU memory.
Out of the box, a Mac will max out at 75% of RAM for the GPU, though this can be adjusted. My 64GB M1 Max runs models that require 40GB+ of GPU VRAM just fine.
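Putting those quoted numbers together: the default cap leaves roughly three quarters of system RAM available to the GPU, which is why a 64GB Mac can host a 40GB+ model. A trivial sketch of that arithmetic (the 75% figure is just the default mentioned above, and it can be raised):

```python
# Back-of-the-envelope check for unified-memory Macs, using the 75%
# default GPU cap mentioned in the quote (the cap itself is adjustable).
def mac_gpu_budget_gb(total_ram_gb: float, gpu_fraction: float = 0.75) -> float:
    return total_ram_gb * gpu_fraction

if __name__ == "__main__":
    budget = mac_gpu_budget_gb(64)   # a 64GB M1 Max, as in the quote
    model_gb = 40                    # a model needing 40GB+ of GPU memory
    print(f"~{budget:.0f} GB usable as GPU memory")   # ~48 GB
    print("fits" if model_gb <= budget else "does not fit")
```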