124 points by sipjca 2 weeks ago | 24 comments
I've been building an open source benchmark for measuring local LLM performance on your own hardware. The benchmarking tool is a CLI written on top of Llamafile to allow for portability across different hardware setups and operating systems. The website is a database of results from the benchmark, allowing you to explore the performance of different models and hardware configurations.
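Basic usage: since it's built on llamafile, you point the CLI at any GGUF you already have, something like `localscore -m your-model.gguf` (the exact flags and prebuilt binaries are in the CLI repo linked below).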
Please give it a try! Any feedback and contribution is much appreciated. I'd love for this to serve as a helpful resource for the local AI community.
For more check out:

- Website: https://localscore.ai
- Demo video: https://youtu.be/De6pA1bQsHU
- Blog post: https://localscore.ai/blog
- CLI Github: https://github.com/Mozilla-Ocho/llamafile/tree/main/localsco...
- Website Github: https://github.com/cjpais/localscore
mentalgear 2 weeks ago
Notes:

- Ollama integration would be nice.
- Is there anonymous federated score sharing? That way, users could approximate a model's performance before downloading it.
sipjca 1 week ago
I totally agree on Ollama integration, and if there is interest we will try to upstream it into llama.cpp.
jsatok 1 week ago
Happy to test larger models that utilize the memory capacity if helpful.
zamadatix 1 week ago
Memory bandwidth at a given model size is king; only after that does having enough compute to utilize it matter.
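That dominance is easy to see with a back-of-envelope calculation (the numbers below are illustrative assumptions, not measurements): each generated token has to stream the full set of weights through memory, so bandwidth sets a hard ceiling on tokens/s.

    # Back-of-envelope: single-stream generation streams every weight through
    # memory once per token, so tokens/s <= memory bandwidth / model size.
    # The figures below are illustrative assumptions, not measured results.
    def gen_ceiling_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
        return bandwidth_gb_s / model_gb

    # e.g. a ~8.5 GB Q4_K_M quant of a 14B model on ~1000 GB/s of VRAM bandwidth
    print(round(gen_ceiling_tok_s(1000.0, 8.5)))  # ~118 tok/s upper bound

Prompt processing is the compute-bound side, which is why compute still matters, just second.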
zamadatix 1 week ago
Edit: And for the same prompt and generated token counts it runs ~4x slower than `ollama run hf.co/bartowski/Qwen2.5-14B-Instruct-GGUF:Q4_K_M --verbose`. It's possible I'm mixing a few things up there, but the results I posted are also in the same ballpark, slower than others with the same GPU, so it seems something is up with the application either way.