Show HN: Can I run this LLM? (locally)

42 points by asasidh 15 hours ago | 50 comments

One of the most frequent questions one faces while running LLMs locally is: I have xx RAM and yy GPU, can I run LLM zz? I have vibe coded a simple application to help you with just that.
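
Roughly, the check boils down to a memory estimate along these lines (a simplified sketch; the bits-per-weight and overhead numbers here are just illustrative, not the app's exact logic):

    # Rough "can I run it" heuristic -- illustrative numbers, not the app's exact logic.
    def estimated_model_gb(params_billion: float, bits_per_weight: float = 4.0,
                           overhead: float = 1.2) -> float:
        """Approximate memory for quantized weights plus runtime overhead, in GB."""
        weight_gb = params_billion * bits_per_weight / 8   # e.g. 7B at 4-bit ~ 3.5 GB
        return weight_gb * overhead                        # KV cache, buffers, etc.

    def can_run(params_billion: float, ram_gb: float, vram_gb: float) -> bool:
        """Naive check: does the estimate fit in combined RAM + VRAM?"""
        return estimated_model_gb(params_billion) <= ram_gb + vram_gb

    print(can_run(7, ram_gb=16, vram_gb=12))   # True: ~4.2 GB needed vs 28 GB available
    print(can_run(70, ram_gb=16, vram_gb=12))  # False: ~42 GB needed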

Update: A lot of great feedback for me to improve the app. Thank you all.

abujazar 14 hours ago

Nice concept – but unfortunately I found it to be incorrect in all of the examples I tried with my Mac.

It'd also need to be much more precise about hardware specs and cover a lot more models and their variants to actually be useful.

Grading the compatibility is also an absolute requirement – it's rarely an absolute yes or no, but often a question of available GPU memory. There are a lot of other factors too that don't seem to be considered.

rkagerer 14 hours ago

> I found it to be incorrect in all of the examples I tried

Are you sure it's not powered by an LLM inside?

abujazar 13 hours ago

I believe it'd be more precise if it used an appropriately chosen and applied LLM in combination with web research – in contrast to cobbling together some LLM-generated code.

ggerules 11 hours ago

Confirmed. Nice idea, but it doesn't really define "run". I can run some relatively large models compared to their choices. They just happen to be slow.

codingdave 15 hours ago

And herein lies the problem with vibe coding - accuracy is wanting.

I can absolutely run models that this site says cannot be run. Shared RAM is a thing - even with limited VRAM, shared RAM can compensate to run larger models. (Slowly, admittedly, but they work.)
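As a toy illustration of that split (the layer count and per-layer size below are made-up numbers for a hypothetical 13B-class model at 4-bit, not any specific checkpoint):

    # Toy layer-split estimate: put what fits in VRAM on the GPU, leave the rest in system RAM.
    # Hypothetical figures for a 13B-class model at 4-bit; real layer sizes vary.
    def split_layers(total_layers: int = 40, layer_gb: float = 0.18,
                     vram_gb: float = 6.0) -> tuple[int, int]:
        gpu_layers = min(total_layers, int(vram_gb // layer_gb))
        cpu_layers = total_layers - gpu_layers   # these run from (slower) system RAM
        return gpu_layers, cpu_layers

    print(split_layers())  # (33, 7): most layers on the GPU, the rest spill to system RAM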

lucb1e 14 hours ago

New word for me: vibe coding

> coined the term in February 2025

> Vibe coding is a new coding style [...] A programmer can describe a program in words and get an AI tool to generate working code, without requiring an understanding of the code. [...] [The programmer] surrenders to the "vibes" of the AI [without reading the resulting code.] When errors arise, he simply copies them into the system without further explanation.

https://en.wikipedia.org/wiki/Vibe_coding

thaumasiotes 13 hours ago

Austen Allred sold a group of investors on the idea that this was the future of everything.

https://www.gauntletai.com/

avereveard 15 hours ago

Also, quantization and allocation strategies are a big thing for local usage. 16 GB of VRAM doesn't seem like a lot, but you can run a recent 32B model in IQ3 with its full 128k context if you allocate the KV cache in system memory, at 15 t/s and a decent prompt processing speed (just above 1000 t/s on my hardware).
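
Rough arithmetic behind those numbers (the layer/head dimensions are assumptions for a typical 32B GQA model, not any specific checkpoint):

    # Back-of-envelope: IQ3-class weights in VRAM, 128k-token KV cache in system RAM.
    PARAMS_B = 32            # billions of parameters
    BITS_PER_WEIGHT = 3.4    # roughly what IQ3-class quants average (assumption)
    weights_gb = PARAMS_B * BITS_PER_WEIGHT / 8           # ~13.6 GB -> fits in 16 GB VRAM

    # KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * bytes (fp16)
    LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128               # assumed GQA config
    kv_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * 2   # bytes
    kv_gb_128k = kv_per_token * 128_000 / 1e9             # ~34 GB -> hence system memory
    print(round(weights_gb, 1), round(kv_gb_128k, 1))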

asasidh 15 hours ago

Thanks for your feedback. There is room to show how fast or slow the model will run; I will try to update the app.

asasidh 15 hours ago

Yes, I agree that you can run them. I have personally run Ollama on a 2020 Intel MacBook Pro. It's not a problem of vibe coding, but of the choice of logic I went with.

do_not_redeem 14 hours ago

> Can I Run DeepSeek R1

> Yes, you can run this model! Your system has sufficient resources (16GB RAM, 12GB VRAM) to run the smaller distilled version (likely 7B parameters or less) of this model.

Last I checked DeepSeek R1 was a 671B model, not a 7B model. Was this site made with AI?

jsheard 14 hours ago

> Was this site made with AI?

OP said they "vibe coded" it, so yes.

https://en.m.wikipedia.org/wiki/Vibe_coding

kennysoona 11 hours ago

Goodness. I love getting older and seeing the ridiculousness of the next generation.

reaperman 14 hours ago

It says “smaller distilled model” in your own quote which, generously, also implies quantized.

Here[0] are some 1.5B and 8B distilled+quantized derivatives of DeepSeek. However, I don't find a 7B model; that one seems totally made up out of whole cloth. Also, I personally wouldn't call this 8B model “DeepSeek”.

0: https://www.reddit.com/r/LocalLLaMA/comments/1iskrsp/quantiz...

sudohackthenews 14 hours ago

> > smaller distilled version

It's not technically the full R1 model; it's talking about the distillations where DeepSeek trained Qwen and Llama models based on R1 output.

do_not_redeem 14 hours ago

Then how about DeepSeek R1 GGUF:

> Yes, you can run this model! Your system has sufficient resources (16GB RAM, 12GB VRAM) to run this model.

No mention of distillations. This was definitely either made by AI, or someone picking numbers for the models totally at random.

sudohackthenews 13 hours ago

Ok yeah that’s just weird

monocasa 14 hours ago

Is it maybe because DeepSeek is a MoE and doesn't require all parameters for a given token?

That's not ideal from a token throughput perspective, but I can see gains in the minimum working set of weight memory if you can load the needed pieces into VRAM for each token.

throwaway314155 14 hours ago

It still wouldn't fit in 16 GB of memory. Further, there's too much swapping going on with MoE models to move expert layers to and from the GPU without bottlenecks.
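
Rough numbers, taking R1's published 671B total / 37B active parameters and assuming 4-bit weights:

    # Why MoE routing doesn't rescue a 16 GB card here.
    TOTAL_B, ACTIVE_B = 671, 37        # DeepSeek R1: total vs. activated params per token
    BITS = 4                           # assumed quantization
    total_gb = TOTAL_B * BITS / 8      # ~335 GB for all experts
    active_gb = ACTIVE_B * BITS / 8    # ~18.5 GB active per token, already over 16 GB VRAM,
                                       # and the active experts change from token to token
    print(total_gb, active_gb)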

drodgers 14 hours ago

This doesn't mention quantisations. Also, it says I can run R1 with 128GB of RAM, but even the 1.58-bit quantisation takes 160GB.
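
For scale, even the raw bits-per-weight arithmetic lands above 128GB, before accounting for the higher-precision layers real quants keep and for runtime overhead:

    # Pure arithmetic, ignoring mixed-precision layers and runtime overhead.
    params = 671e9
    bits_per_weight = 1.58
    print(params * bits_per_weight / 8 / 1e9)  # ~132.5 GB of weights alone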