
Show HN: LocalScore – Local LLM Benchmark

124 points by sipjca 2 weeks ago | 24 comments

Hey Folks!

I've been building an open source benchmark for measuring local LLM performance on your own hardware. The benchmarking tool is a CLI written on top of Llamafile to allow for portability across different hardware setups and operating systems. The website is a database of results from the benchmark, allowing you to explore the performance of different models and hardware configurations.
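
If you want to script it, here's a minimal sketch of driving the CLI from Python (the binary name and the -m flag for picking a GGUF model are assumptions based on llamafile conventions; check the repo for the real interface):

    import subprocess

    # Hypothetical sketch: run the LocalScore CLI against a local GGUF
    # model and capture its output. Binary name and -m flag are assumed
    # from llamafile conventions, not a documented interface.
    result = subprocess.run(
        ["./localscore", "-m", "model.gguf"],
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout)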

Please give it a try! Any feedback and contribution is much appreciated. I'd love for this to serve as a helpful resource for the local AI community.

For more, check out:

- Website: https://localscore.ai
- Demo video: https://youtu.be/De6pA1bQsHU
- Blog post: https://localscore.ai/blog
- CLI Github: https://github.com/Mozilla-Ocho/llamafile/tree/main/localsco...
- Website Github: https://github.com/cjpais/localscore

mentalgear 2 weeks ago

Congrats on the effort - the local-first / private space needs more performant AI, and AI in general needs more comparable and trustworthy benchmarks.

Notes:

- Ollama integration would be nice.
- Is there any anonymous federated score sharing? That way, users could approximate a model's performance before downloading it.
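
Roughly what I have in mind, as a sketch (the endpoint and field names are made up purely for illustration):

    import json
    import urllib.request
    import uuid

    # Hypothetical sketch: the client submits only hardware/model/throughput
    # fields plus a random per-install ID, so scores can be aggregated
    # without identifying the user. Endpoint and fields are invented.
    payload = {
        "install_id": str(uuid.uuid4()),  # random, not tied to a person
        "accelerator": "RTX 4090",
        "model": "Qwen2.5-14B-Instruct-Q4_K_M",
        "prompt_tps": 1500.0,
        "gen_tps": 55.0,
    }
    req = urllib.request.Request(
        "https://localscore.ai/api/submit",  # illustrative URL only
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    # urllib.request.urlopen(req)  # uncomment to actually send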

sipjca 1 week ago

Can you tell me more about the "anonymous federated score sharing"? Maybe it's something we can think about more.

I totally agree on Ollama integration, and if there is interest we will try to upstream into llama.cpp.

jsatok 1 week ago

Contributed scores for the M3 Ultra 512 GB unified memory: https://www.localscore.ai/accelerator/404

Happy to test larger models that utilize the memory capacity if helpful.

deanputney 1 week ago

That's very interesting. I guess it just can't compete with any of the Nvidia cards? I would think your results should show up if sorted by "generation"; maybe the leaderboard is cached...

zamadatix 1 week ago

Non-VRAM-based options are, at the moment, only competitive on the size of model they can run, since the alternative is "you can't run it at that price otherwise."

Bandwidth at a given size is king, only then followed by enough compute to utilize it.
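
A rough back-of-envelope, with illustrative numbers:

    # Batch-1 decode is memory-bound: every generated token reads roughly
    # the full model weights, so an upper bound on tokens/sec is memory
    # bandwidth divided by model size. Numbers below are illustrative.
    bandwidth_gb_s = 819   # e.g. M3 Ultra unified memory bandwidth
    model_size_gb = 8.5    # e.g. a ~14B model quantized to Q4_K_M
    upper_bound_tps = bandwidth_gb_s / model_size_gb
    print(f"~{upper_bound_tps:.0f} tok/s upper bound")  # ~96 tok/s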

sipjca 1 week ago

Ty for pointing this out. The results are taken from the DB based on LocalScore; I will make some modifications to improve the sorting here.

david_draco 1 week ago

I don't know if I should trust and run this code. If it were associated with Mozilla, I would. It says it is a Mozilla Builders project, but https://builders.mozilla.org/projects/ does not list it. I don't see a way to verify that localscore.ai is associated with Mozilla.

zamadatix 1 week ago

It doesn't seem like they update the site often (the last 'latest' post is from December), but they reposted something claiming the same on X: https://x.com/llamafile/status/1907917417118105751

zamadatix 1 week ago

The run and/or troubleshooting steps for Windows should probably note that you need to install https://developer.nvidia.com/cuda-downloads?target_os=Window... if you have an Nvidia GPU (and probably something similar if you have an AMD GPU?). As it is right now, the steps happily get you benchmarking your CPU. It might even be worth adding a "Warning: The benchmark is operating in CPU-only mode, press y to continue if this is intended" type message to the program.
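
Something like this, sketched in Python for brevity (the actual CLI is C/C++ on top of llamafile, so this is just the shape of the check, not its implementation):

    # Sketch of the suggested CPU-only confirmation. gpu_detected would
    # come from whatever device enumeration the tool already performs.
    def confirm_cpu_only(gpu_detected: bool) -> bool:
        if gpu_detected:
            return True
        answer = input("Warning: The benchmark is operating in CPU-only "
                       "mode. Continue? [y/N] ")
        return answer.strip().lower() == "y"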

Edit: And for the same prompt and generated token counts, it runs ~4x slower than `ollama run hf.co/bartowski/Qwen2.5-14B-Instruct-GGUF:Q4_K_M --verbose`. It's possible I'm mixing up a few things there, but my posted results are also in the same ballpark, slower than others with the same GPU, so it seems something is up with the application either way.