Running a large language model on a personal Mac and experiencing emergent intelligence

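The plan is to run the LLaMA 7B model with llama.cpp. First, my Mac's configuration: an Intel chip and 32GB of memory, as shown in the figure below. Most guides online are written for Apple M1/M2 chips, so I was worried that Intel chips might not be supported. I tested it, and it works.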

Getting the LLaMA 7B model

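The full download needs 235GB of disk space, but my machine has only 214GB left. I only wanted to try things out, so I downloaded just the 7B model, as shown in the screenshot below. Because of the license terms, I won't post the download details here; if you need them, message me privately.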
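
The disk-space comparison above can be automated before starting a large download. The following is a minimal sketch, not part of the original workflow: the helper names has_room and free_space_gb are made up for illustration, and the 13GB figure for a 7B-only download is taken from the size of consolidated.00.pth mentioned later in this post.

```python
import shutil

GB = 10**9  # decimal gigabytes


def has_room(required_gb: float, free_gb: float) -> bool:
    """Return True when the free space covers the download."""
    return free_gb >= required_gb


def free_space_gb(path: str = ".") -> float:
    """Free space on the volume that holds `path`, in decimal GB."""
    return shutil.disk_usage(path).free / GB


if __name__ == "__main__":
    free = free_space_gb(".")
    # Sizes from the article: 235GB for everything, about 13GB for 7B only.
    for name, size_gb in [("full download", 235), ("7B only", 13)]:
        verdict = "fits" if has_room(size_gb, free) else "does not fit"
        print(f"{name}: needs ~{size_gb} GB, {free:.0f} GB free -> {verdict}")
```

With the numbers from this post, has_room(235, 214) is False, which is why only the 7B weights were downloaded.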

Building llama.cpp

Clone the repository locally and run make, as follows.

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

Move the model files downloaded in the previous step into the llama.cpp/models directory, as shown below. The consolidated.00.pth file is 13GB.

models
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── ggml-vocab.bin
├── llama.sh
├── tokenizer.model
└── tokenizer_checklist.chk
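
Since the weights file is large, it may be worth checking it against checklist.chk before converting. The sketch below assumes that checklist.chk uses the usual md5sum layout (one "hash  filename" pair per line, with filenames relative to the checklist's own directory); that layout is an assumption, not something stated in this post, and verify_checklist is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a 13GB file never has to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checklist(checklist: Path) -> dict[str, bool]:
    """Map each listed filename to True if it exists and its MD5 matches."""
    results = {}
    for line in checklist.read_text().splitlines():
        if not line.strip():
            continue
        # Assumed md5sum format: "<hex digest>  <filename>"
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # md5sum marks binary mode with '*'
        target = checklist.parent / name
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results


if __name__ == "__main__":
    chk = Path("models/7B/checklist.chk")
    if chk.is_file():
        for name, ok in verify_checklist(chk).items():
            print(f"{name}: {'OK' if ok else 'MISMATCH or missing'}")
    else:
        print(f"{chk} not found; run this from the llama.cpp directory")
```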

Quantizing the model

Prepare the environment. Python 3.10 is recommended; I set mine up with conda, and the process is described here.

Install the dependencies

pip install torch numpy sentencepiece
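
Before running the conversion script, a quick check that the three packages are visible to the active interpreter can save a failed run, for example when the wrong conda environment is active. This is a small sketch; missing_modules is a hypothetical helper, not part of llama.cpp.

```python
import importlib.util


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["torch", "numpy", "sentencepiece"])
    print("missing:", ", ".join(missing) if missing else "none")
```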

Run the conversion, which generates the ggml-model-f16.bin file; the trailing 1 selects float16 output. It finished very quickly.

python convert-pth-to-ggml.py models/7B/ 1

Run INT4 quantization, which generates the ggml-model-q4_0.bin file; the trailing 2 selects the q4_0 type. This is also very fast.

./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
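
To give a feel for what 4-bit quantization does to the weights, here is a simplified sketch of block-wise quantization: each block of weights shares one floating-point scale, and every weight is stored as a small signed integer. It illustrates the general idea only and is not llama.cpp's exact q4_0 routine or file layout; the block size of 32 and the function names are assumptions.

```python
import random

BLOCK = 32  # weights per block (assumed for illustration)


def quantize_block(values):
    """Return (scale, codes): one shared scale plus one 4-bit integer per weight."""
    amax = max(abs(v) for v in values)
    scale = amax / 7 if amax else 1.0  # map the largest magnitude to +/-7
    # Clamp to the signed 4-bit range [-8, 7] after rounding.
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate weights from the scale and the integer codes."""
    return [scale * c for c in codes]


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0, 0.02) for _ in range(BLOCK)]
    scale, codes = quantize_block(weights)
    restored = dequantize_block(scale, codes)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"scale={scale:.5f}, worst round-trip error={worst:.5f}")
```

In this scheme the round-trip error of each weight is at most half the block's scale, which is the precision traded away for the smaller file.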

Finally, look at the generated files: ggml-model-f16.bin is 13GB, while the quantized ggml-model-q4_0.bin is only 3.9GB.
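
These file sizes can be sanity-checked with a little arithmetic. The sketch below rests on several assumptions not stated in this post: the 7B architecture values (dim 4096, 32 layers, vocabulary of 32000, feed-forward width 11008), and a q4_0 layout of 20 bytes per block of 32 weights (16 bytes of 4-bit codes plus a 4-byte scale), which works out to 5 bits per weight. It also ignores file headers and any tensors kept at higher precision.

```python
def llama_param_count(dim=4096, n_layers=32, vocab=32000, ffn_hidden=11008):
    """Approximate parameter count from assumed architecture values."""
    embeddings = 2 * vocab * dim         # input embedding + output projection
    attention = 4 * dim * dim            # wq, wk, wv, wo
    feed_forward = 3 * dim * ffn_hidden  # w1, w2, w3
    norms = 2 * dim                      # two norm weight vectors per layer
    return embeddings + n_layers * (attention + feed_forward + norms) + dim


def size_bytes(n_params, bits_per_weight):
    """File size estimate, ignoring headers and vocabulary data."""
    return n_params * bits_per_weight / 8


if __name__ == "__main__":
    n = llama_param_count()
    for name, bits in [("f16", 16), ("q4_0 (assumed 5 bits/weight)", 5)]:
        b = size_bytes(n, bits)
        print(f"{name}: {b / 10**9:.2f} GB = {b / 2**30:.2f} GiB")
```

Under these assumptions the estimates land in the neighborhood of the 13GB and 3.9GB reported above, with the exact figure depending on whether a tool reports decimal GB or binary GiB.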

Conversation test

./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128 -p 'The first man on the moon was '
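
In the command above, -m is the model file, -t the number of threads, -n the number of tokens to generate, and -p the prompt. When trying several prompts, it can be convenient to assemble the command from a script. A minimal sketch follows; build_main_command is a hypothetical helper, and defaulting the thread count to os.cpu_count() is an assumption (the number of physical cores may be a better choice on some machines).

```python
import os
import shlex


def build_main_command(model, prompt, n_predict=128, threads=None):
    """Build the argument list for llama.cpp's ./main binary."""
    if threads is None:
        threads = os.cpu_count() or 1  # assumption: use all logical cores
    return [
        "./main",
        "-m", model,           # path to the quantized model
        "-t", str(threads),    # worker threads
        "-n", str(n_predict),  # tokens to generate
        "-p", prompt,          # prompt text
    ]


if __name__ == "__main__":
    cmd = build_main_command(
        "./models/7B/ggml-model-q4_0.bin",
        "The first man on the moon was ",
        threads=8,
    )
    # Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```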

Output: The first man on the moon was 25 years old in July of that year. NASA has announced its newest class of astronauts, with 12 men and women heading to training for a potential mission into space aboard one of the U.S.’s three remaining space shuttles—a first-time-in-more-than-two-decades opportunity. The group will also be the first since the late 1960s that was chosen without the aid of a computer algorithm to whittle down applicants. NASA has said it would be using its own selection process for astronaut candidates after being critic

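This post tests a very small model with 7B parameters, and INT4 quantization was used so that it could run on a personal Mac. Its Chinese support is also poor. If you want to try the currently popular ChatGPT, you can follow the 「知源笔记」 WeChat official account shown below and send "chat" to start a conversation. The tool also comes with many preset prompts for common tasks, ready to use anytime, anywhere.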
