
Microsoft BitNet: inference framework for 1-bit LLMs

170 points by galeos 5 days ago | 33 comments

newfocogi 5 days ago

I'm enthusiastic about BitNet and the potential of low-bit LLMs - the papers show impressive perplexity scores matching full-precision models while drastically reducing compute and memory requirements. What's puzzling is that we're not seeing any major providers announce plans to leverage this for their flagship models, despite clear efficiency gains that could in theory enable much larger architectures. I suspect there might be hidden engineering challenges around specialized hardware requirements or training stability that aren't fully captured in the academic results, but I'd love insights from anyone closer to production deployment of these techniques.

swfsql 5 days ago

I think that since training must happen on a non-BitNet architecture, tuning towards BitNet is always a downgrade to the model's capabilities, so they're not really interested in it. But maybe they could be if they wanted to offer cheaper plans, since its efficiency is relatively good.

I think the real market for this is local inference.

strangescript 5 days ago

I find it a little confusing as well. I wonder if it's because so many of these companies have gone all in on the "traditional" approach that deviating now seems like a big shift?

waynenilsen 5 days ago

I suppose hardware support would be very helpful: new instructions for bitpacked operations?
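
To illustrate what such instructions would speed up: one common software trick (not necessarily what bitnet.cpp itself does) is to store ternary weights as two bitmasks and turn the dot product into masked adds and subtracts. A minimal Python sketch, with the packing scheme assumed purely for illustration:

    # Sketch: ternary-weight dot product via two bitmasks (illustrative,
    # not necessarily how bitnet.cpp packs its weights).
    # Each w[i] in {-1, 0, +1} becomes a bit in a "plus" or "minus" mask;
    # dot(w, x) = sum(x where w=+1) - sum(x where w=-1).

    def pack_ternary(weights):
        """Pack {-1, 0, 1} weights into (plus_mask, minus_mask) integers."""
        plus, minus = 0, 0
        for i, w in enumerate(weights):
            if w == 1:
                plus |= 1 << i
            elif w == -1:
                minus |= 1 << i
        return plus, minus

    def ternary_dot(plus, minus, activations):
        """Dot product of bit-packed ternary weights with int activations."""
        total = 0
        for i, x in enumerate(activations):
            if plus >> i & 1:
                total += x
            elif minus >> i & 1:
                total -= x
        return total

    w = [1, 0, -1, 1]   # ternary weights
    x = [10, 7, 3, -2]  # e.g. int8 activations
    p, m = pack_ternary(w)
    assert ternary_dot(p, m, x) == 10 - 3 - 2  # multiply-free: adds only

Dedicated instructions (or wide SIMD) would apply the same masked add/subtract across an entire register at once, which is where ISA support would pay off.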

danielmarkbruce 5 days ago

People are almost certainly working on it. The people who are actually serious and think about things like this are less likely to just spout out "WE ARE BUILDING A CHIP OPTIMIZED FOR 1-BIT" or "WE ARE TRAINING A MODEL USING 1-BIT" etc, before actually being quite sure they can make it work at the required scale. It's still pretty researchy.

zamadatix 5 days ago

For anyone that hasn't read the previous papers: the "1.58-bit" part comes from using 3 values (-1, 0, 1), and log2(3) = 1.58...
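
A concrete consequence of that number: since 3^5 = 243 <= 256, five ternary weights fit in one byte, i.e. 8/5 = 1.6 bits per weight, close to the log2(3) ≈ 1.585 lower bound. A quick sketch of that base-3 packing (illustrative; the framework itself may pack differently):

    import math

    print(math.log2(3))  # 1.584962500721156 bits per ternary weight

    def pack5(trits):
        """Pack five {-1, 0, 1} values into one byte via base-3 encoding."""
        assert len(trits) == 5
        byte = 0
        for t in trits:
            byte = byte * 3 + (t + 1)  # map {-1, 0, 1} -> {0, 1, 2}
        return byte  # max value is 3**5 - 1 = 242, which fits in 8 bits

    def unpack5(byte):
        """Inverse of pack5."""
        trits = []
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
        return trits[::-1]

    ws = [1, -1, 0, 0, 1]
    assert unpack5(pack5(ws)) == ws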

trebligdivad 5 days ago

Has someone made an FPGA or ASIC implementation yet? It feels like it should be easy (and people would snap it up for inference).

alkh 5 days ago

Sorry for a stupid question, but to clarify: even though it is a 1-bit model, is it supposed to work with any type of embeddings, even ones taken from larger LLMs (in their example, they use HF1BitLLM/Llama3-8B-1.58-100B-tokens)? I.e., it doesn't have an embedding layer built in and relies on embeddings provided separately?

danielmarkbruce 5 days ago

No. You can't put arbitrary embeddings in; the embedding layer is part of the model and is trained along with it.
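
To make that concrete: the checkpoint ships its own embedding layer, trained jointly with the quantized weights, so embeddings from another model aren't interchangeable. A quick way to check, assuming the model mentioned above loads through the standard transformers API:

    from transformers import AutoModelForCausalLM

    # The checkpoint includes its own trained embedding layer; it is not
    # a slot for embeddings supplied by some other model.
    model = AutoModelForCausalLM.from_pretrained(
        "HF1BitLLM/Llama3-8B-1.58-100B-tokens"
    )
    print(model.get_input_embeddings())
    # Expected: an nn.Embedding of shape (vocab_size, hidden_size) that
    # was trained together with the 1.58-bit layers.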