85 points by cmogni1 21 hours ago | 29 comments
jbellis 4 hours ago
[I guess that must be a useful market niche though; apparently this is from a company selling batch compute on exactly those small open-weights models.]
The problem is that the author evaluates models by dividing the Artificial Analysis score by a blended cost per token, but most tasks have an intelligence "floor" below which it doesn't matter how cheap a model is; it will never succeed. Once you strip out the very high results from super-cheap 4B OSS models, the rest are significantly outclassed by Flash 2.0 (not on his chart, but still worth considering) and 2.5, not to mention models that may be better at domain-specific tasks, like grok-3 mini for code.
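The floor argument above can be sketched as a filter before the division. This is a hypothetical illustration, not the author's methodology; the model names, scores, and costs are made up.

```python
# Made-up (name, benchmark_score, blended_cost_per_1M_tokens_usd) tuples,
# purely to illustrate the ranking effect -- not real benchmark data.
models = [
    ("tiny-4b", 28, 0.05),
    ("mid-model", 55, 0.60),
    ("frontier", 78, 5.00),
]

def rank_by_value(models, floor):
    """Rank models by score per dollar, excluding those below the task's floor."""
    usable = [m for m in models if m[1] >= floor]
    return sorted(usable, key=lambda m: m[1] / m[2], reverse=True)

# With no floor, the cheap 4B model dominates on raw score/cost.
print([name for name, _, _ in rank_by_value(models, floor=0)])
# With a task floor of 50, it drops out entirely and the ranking changes.
print([name for name, _, _ in rank_by_value(models, floor=50)])
```

The point is that score-per-dollar only ranks the models that clear the floor; below it, cheapness buys nothing.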
(Nobody should be using Haiku in 2025. The OpenAI mini models are not as bad as Haiku on price/performance, and maybe there is a use case for preferring one over Flash, but if so I don't know what it is.)
dinosaurdynasty 2 hours ago
(This is a big advantage of open-weight models: even if they're too big to host yourself, if a model is worth anything there's a lot of competition to serve inference for it.)
diggan 2 hours ago
> I may be spoiled in having worked for companies that have ML
Sounds likely, yeah. How many companies have ML departments today? DS departments seem common, but about ML I'm not so sure.
genewitch 15 hours ago
I opened aider and gave a small prompt, roughly:
Implement a JavaScript 2048 game that exists as flat file(s) and does not require a server, just the game HTML, CSS, and js. Make it compatible with firefox, at least.
That's it. Several hours later, it finished, and the game ran. It was worth it because this was in the winter and it heated my house a bit, yay. I think the resulting 1-shot output is on my GitHub. I know 2048 was in the training set, etc., but I wanted to see how big of a hassle it would be, whether it would 1-shot with such a small prompt, and how long it would take.
Makes me want to try deepseek 671B, but I don't have any machines with >1TB of memory.
I do take donations of hardware.
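For reference, a one-shot run like the one described might look something like this. This is a hedged sketch of aider's CLI, not the commenter's exact invocation: the model name and file name are placeholders, and it assumes a local OpenAI-compatible endpoint is already serving the model.

```shell
# Hypothetical one-shot aider invocation (placeholder model and file names).
# --message sends a single prompt and exits instead of opening the chat loop.
aider \
  --model openai/local-model \
  --message "Implement a JavaScript 2048 game that exists as flat file(s) and does not require a server, just the game HTML, CSS, and js. Make it compatible with firefox, at least." \
  2048.html
```

With a large local model on modest hardware, a single run like this can take hours, which matches the experience described above.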
shmoogy 14 hours ago
I definitely do appreciate and believe in the value of open-source / open-weight LLMs, but inference is so cheap right now for non-frontier models.