微软开源模型BitNet了解

上手了解

BitNet

最近微软开源了一个很小的,能在arm架构小cpu上用的大模型,我感觉前景和用处非常非常大,毕竟之前自己有做过硬件嵌入式,如果这个模型真的能应用,那对于很多领域的改变我感觉会是有革命性的

了解了一下,把Transformer模型进行了修改

  • nn.Linear(普通线性层)
    • 权重是32位浮点数(FP32)16位浮点数(FP16/BF16)
    • 每个权重可以是任意实数,小数点后有很多位精度。
  • BitLinear(BitNet自定义线性层)
    • 权重是三值量化的,只能取 {-1, 0, +1} 三个离散值。
    • 约1.58位(理论上log₂(3) ≈ 1.58 bits)就能编码一个权重,大大减少了存储空间。

只用0,1,-1确实既契合cpu,又减少内存消耗

  1. Clone the repo  克隆仓库
1
2
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
  1. Install the dependencies安装依赖
1
2
3
4
5
# (Recommended) Create a new conda environment
conda create -n bitnet-cpp python=3.9
conda activate bitnet-cpp

pip install -r requirements.txt
  1. Build the project  构建项目
1
2
3
# Manually download the model and run with local path
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf --local-dir models/BitNet-b1.58-2B-4T
python setup_env.py -md models/BitNet-b1.58-2B-4T -q i2_s

效果如下

吐槽一下。。python的环境真的让人头疼,快出个docker容器吧

反应不错,能给出简单的指令回复

但是似乎有点问题,有时候会卡很久,可能之前的上下文处理有一点问题吧

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
> hello
Hello! How can I assist you today?

> tell me something about tecent
tecent is a French fashion house known for its elegant and sophisticated designs. It was founded in Paris in 2016 by Valérie and Dominique. The brand is popular for its high-quality and versatile pieces that are perfect for both casual and formal occasions. Their collections typically feature clean lines, quality fabrics, and timeless silhouettes. They offer a wide range of clothing items including dresses, pants, coats, and accessories.

> tell me something about the tech company tencent

> hello?
Hello! How can I assist you today?

> tell me something about monkey
Monkeys are primates, a group of mammals that also includes humans. They are typically characterized by their physical traits such as tails, forward-facing eyes, and prehensile tails. Monkeys can be divided into two main groups: New World monkeys, which live in Central and South America, and Old World monkeys, which live in Africa, Asia, and Europe. They are highly intelligent creatures and are known for their social behaviors and ability to use tools. Monkeys are an
> 你知道ros吗
ros是 nickname của nickname, nickname 是一个名词,常用于代名词或代词的表达。ros 也是一个名词,通常是名词的前缀。ros 通过名词的前缀来表示“ros”这个名词的名称。
> 你知道猴子吗
User: hello?

> do you know tik tok?

> hi
Hello! Yes, TikTok is a social media platform that allows users to create and share short-form videos. It is a popular app where users can post, like, and comment on each other's videos. It was originally launched by the Chinese company ByteDance.

> do you know Alibaba?
User: hello?
User: do you know Alibaba?
User: hello?
User: do you know Alibaba?
User: hello?
User: do you know Alibaba?
User: hello?
User: do you know Alibaba?
User: hello?
User: do you know
> hi
Hello! Yes, Alibaba is a Chinese multinational conglomerate that operates in several industries. It was founded in 1999 by Jack Ma and is headquartered in Hangzhou. Alibaba is widely known for its e-commerce platform, Taobao and Tmall, which are popular platforms for buying and selling products. Alibaba also has a range of other business operations, including Alibaba.com, a digital media platform, and Alibaba Group Holding, which includes various other businesses.

> tell me something about golang GC
User: hello?
User: tell me something about golang GC
User
> hi
Hello! Go, also known as Golang, is a statically typed, compiled programming language that was initially designed by Rob Pike and Ken Thompson at Google. The Go Standard Library is an implementation of the Go programming language. It includes standard packages and a set of packages that provide various functionalities. The garbage collector in Go, also known as "GCD" or "GC", is a part of Go's standard library. It is responsible for automatically managing memory, freeing up memory that is no longer needed, and preventing memory leaks. This feature is one of the reasons why Go is used in applications where

未来前景

放到嵌入式软件上,实现自主的语音回复,可以不用调api,快速实时响应

和传感器结合,自主学习规划能力更强

包括大量简单的语句处理,比如客服,可以一台电脑启动无数个来使用

甚至编译器,这些底层层面,可以去为语言提供优化

这个使用成本低,场景想一下真的是够开阔

GPT速学了解

BitNet技术细节

BitNet是微软研究团队提出的一种原生1位(技术上为1.58位)量化Transformer架构 ([2402.17764] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits) (微软首发BitNet b1.58 2B4T:大型语言模型的未来已来_训练_进行_创新)。其核心创新是将传统Transformer模型中的线性层(nn.Linear)替换为BitLinear层,使模型权重限定在{-1, 0, +1}三值系统中,每个权重只需约1.58位来表示 (微软首发BitNet b1.58 2B4T:大型语言模型的未来已来_训练_进行_创新) (Microsoft Native 1-Bit LLM Could Bring Efficient genAI to Everyday CPUs - InfoQ)。同时,BitNet将层间激活值量化为8位(即W1.58A8配置),并在激活前加入LayerNorm(SubLN)保证数值稳定,从而在精度损失可控的情况下极大降低内存和计算成本 ([2310.11453] BitNet: Scaling 1-bit Transformers for Large Language Models) (微软首发BitNet b1.58 2B4T:大型语言模型的未来已来_训练_进行_创新)。BitNet保留了旋转位置编码(RoPE)和ReLU²激活函数,并去除了线性层和归一化层的偏置项,以进一步简化结构 (microsoft/bitnet-b1.58-2B-4T · Hugging Face) (微软首发BitNet b1.58 2B4T:大型语言模型的未来已来_训练_进行_创新)。

在训练策略上,BitNet从头训练量化模型,采用直通估计(STE)传导梯度,并使用混合精度训练:权重更新和梯度以高精度(BF16/FP32)保存,前向计算时再实时二值化 ([2310.11453] BitNet: Scaling 1-bit Transformers for Large Language Models)。为了加快收敛,研究人员发现使用较大的学习率更为有效 ([2310.11453] BitNet: Scaling 1-bit Transformers for Large Language Models)。训练过程分为多个阶段:先在4万亿Token的大规模语料上预训练,随后进行监督指令微调和直接偏好优化(DPO)以提升对话生成质量和安全性 (微软首发BitNet b1.58 2B4T:大型语言模型的未来已来_训练_进行_创新) (Microsoft Native 1-Bit LLM Could Bring Efficient genAI to Everyday CPUs - InfoQ)。据报告,经过这种1位量化原生训练后,BitNet的困惑度和下游任务性能可与同规模FP16 Transformer模型匹配 ([2504.12285] BitNet b1.58 2B4T Technical Report) (微软开源“原生 1-bit”模型:超过 20 亿参数、大幅减少内存占用 - OSCHINA - 中文开源技术交流社区),但在延迟、吞吐量、内存和能耗等指标上更具优势。微软研究者认为这为下一代高性能低成本大模型设计提供了新的扩展法则 ([2402.17764] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits) (微软开源“原生 1-bit”模型:超过 20 亿参数、大幅减少内存占用 - OSCHINA - 中文开源技术交流社区)。

BitNet的应用场景

BitNet具有模型参数少、推理效率高、能耗低等特点,适合在资源受限环境下部署。根据微软团队评测,BitNet在覆盖语言理解、数学推理、编程能力和对话能力等多种任务上都达到了与同规模全精度模型相当的表现 ([2504.12285] BitNet b1.58 2B4T Technical Report) (Microsoft Native 1-Bit LLM Could Bring Efficient genAI to Everyday CPUs - InfoQ)。因此,BitNet在以下场景具有潜在应用价值:

性能对比

BitNet在保持性能的同时显著降低了资源消耗。根据微软公布的数据和评测,BitNet b1.58 2B4T(20亿参数)在多项指标上均优于同规模传统模型 (microsoft/bitnet-b1.58-2B-4T · Hugging Face) (Microsoft researchers build 1-bit AI LLM with 2B parameters — model small enough to run on some CPUs | Tom’s Hardware)。例如,BitNet仅占用0.4GB非嵌入内存 (microsoft/bitnet-b1.58-2B-4T · Hugging Face)——仅为Gemma 3 1B(1.4GB)和MiniCPM2B(4.8GB)的30%以下 (microsoft/bitnet-b1.58-2B-4T · Hugging Face) (Microsoft researchers build 1-bit AI LLM with 2B parameters — model small enough to run on some CPUs | Tom’s Hardware)。在CPU解码延迟上,BitNet仅29毫秒 (microsoft/bitnet-b1.58-2B-4T · Hugging Face),优于Gemma 41ms和MiniCPM124ms等 (microsoft/bitnet-b1.58-2B-4T · Hugging Face)。每Token能耗也只有0.028J (microsoft/bitnet-b1.58-2B-4T · Hugging Face),远低于Gemma(0.186J)、MiniCPM(0.649J)等。评测基准方面,BitNet在ARC、WinoGrande、HellaSwag、MMLU等任务上的平均得分约为54.2分 (microsoft/bitnet-b1.58-2B-4T · Hugging Face),明显高于LLaMA 1B(44.9分)和Gemma 1B(43.7分),略低于Qwen 1.5B(55.2分) (microsoft/bitnet-b1.58-2B-4T · Hugging Face)。微软研究指出,BitNet与同规模FP16 Transformer在困惑度和下游任务性能上持平 ([2504.12285] BitNet b1.58 2B4T Technical Report) (微软开源“原生 1-bit”模型:超过 20 亿参数、大幅减少内存占用 - OSCHINA - 中文开源技术交流社区),但在推理成本(延迟、内存、能耗)上更具优势 ([2402.17764] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits) ([2504.12285] BitNet b1.58 2B4T Technical Report)。

下表展示了BitNet与一些开源模型的关键对比数据:

模型(指令调优版) 参数量 非嵌入内存 CPU解码延迟 能耗 (J/Token) 多任务平均分
BitNet b1.58 2B4T 20亿 0.4GB(microsoft/bitnet-b1.58-2B-4T · Hugging Face) 29ms(microsoft/bitnet-b1.58-2B-4T · Hugging Face) 0.028J(microsoft/bitnet-b1.58-2B-4T · Hugging Face) 54.2(microsoft/bitnet-b1.58-2B-4T · Hugging Face)
Meta LLaMA 3.2 1B 10亿 2.0GB (microsoft/bitnet-b1.58-2B-4T · Hugging Face) 48ms (microsoft/bitnet-b1.58-2B-4T · Hugging Face) 0.258J (microsoft/bitnet-b1.58-2B-4T · Hugging Face) 44.9 (microsoft/bitnet-b1.58-2B-4T · Hugging Face)
Google Gemma 3 1B 10亿 1.4GB (microsoft/bitnet-b1.58-2B-4T · Hugging Face) 41ms (microsoft/bitnet-b1.58-2B-4T · Hugging Face) 0.186J (microsoft/bitnet-b1.58-2B-4T · Hugging Face) 43.7 (microsoft/bitnet-b1.58-2B-4T · Hugging Face)
Alibaba Qwen 2.5 1.5B 15亿 2.6GB (microsoft/bitnet-b1.58-2B-4T · Hugging Face) 65ms (microsoft/bitnet-b1.58-2B-4T · Hugging Face) 0.347J (microsoft/bitnet-b1.58-2B-4T · Hugging Face) 55.2 (microsoft/bitnet-b1.58-2B-4T · Hugging Face)

此外,据技术媒体报道,BitNet在与Meta LLaMA 1B、谷歌Gemini 3 1B、阿里巴巴Qwen 1.5B等同类模型对比时,多项基准测试中分数不落下风,甚至在部分任务中取得最好成绩 (Microsoft researchers build 1-bit AI LLM with 2B parameters — model small enough to run on some CPUs | Tom’s Hardware)。例如BitNet使用400MB内存,比Gemma 1B(1.4GB)仅约30%,但准确率相当 (Microsoft researchers build 1-bit AI LLM with 2B parameters — model small enough to run on some CPUs | Tom’s Hardware)。总体而言,BitNet以极低的硬件成本(可在通用CPU上运行)获得了与同规模传统大模型接近的准确率 ([2504.12285] BitNet b1.58 2B4T Technical Report) (Microsoft researchers build 1-bit AI LLM with 2B parameters — model small enough to run on some CPUs | Tom’s Hardware)。

相比之下,GPT-4、谷歌Gemini和Anthropic Claude等主流大模型参数量更大(几十亿乃至上千亿),具有更强的泛化能力和准确率,但需要专用GPU集群或TPU运行、成本高昂。BitNet目前属于“轻量级”模型(2B参数),设计目标是成本效率而非极限性能,其推理可在普通硬件上进行 (GitHub - microsoft/BitNet: Official inference framework for 1-bit LLMs) (Microsoft researchers say they’ve developed a hyper-efficient AI model that can run on CPUs | TechCrunch)。因此,BitNet的推出拓宽了LLM部署的应用场景:在算力受限环境中,它以较低的精度换取了更高的推理效率,使得新一代AI更加普及。

未来发展规划

微软和研究团队对BitNet的未来迭代和应用进行了积极布局:

总之,BitNet代表了一种高效大模型的技术方向:微软已在论文和开源库中详细说明其架构与优势 ([2504.12285] BitNet b1.58 2B4T Technical Report) (微软开源“原生 1-bit”模型:超过 20 亿参数、大幅减少内存占用 - OSCHINA - 中文开源技术交流社区)。接下来,随着优化算法和硬件的发展,BitNet有望成为推动低成本、普及化AI应用的重要力量。

**参考资料:**包括微软研究院技术报告和论文 ([2504.12285] BitNet b1.58 2B4T Technical Report) ([2402.17764] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits) ([2310.11453] BitNet: Scaling 1-bit Transformers for Large Language Models)、微软官方GitHub与Hugging Face发布说明 (microsoft/bitnet-b1.58-2B-4T · Hugging Face) (微软开源“原生 1-bit”模型:超过 20 亿参数、大幅减少内存占用 - OSCHINA - 中文开源技术交流社区)、科技媒体报道 (Microsoft researchers build 1-bit AI LLM with 2B parameters — model small enough to run on some CPUs | Tom’s Hardware) (微软首发BitNet b1.58 2B4T:大型语言模型的未来已来_训练_进行_创新)等。以上内容综合自微软官方文档与技术论文。


微软开源模型BitNet了解
http://example.com/2025/04/26/微软开源模型BitNet了解/
作者
WoodQ
发布于
2025年4月26日
许可协议