I. Install dependencies
1. Requirements
- Python 3.8 or later
- PyTorch 2.0 or later
- CUDA 11.4 or later is recommended (relevant for GPU users, flash-attention users, etc.)
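The requirements above can be checked from Python before going further. The following is a minimal sketch and not part of the original guide: the helper names `parse_version` and `meets_minimum` are hypothetical, and the comparison assumes plain dotted version strings, optionally followed by a suffix such as `+cu118`.

```python
import re
import sys


def parse_version(text):
    """Turn '2.1.0+cu118' into (2, 1, 0); anything after the leading numeric part is ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """Return True if version string `found` is at least `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    # Pad to equal length so that "2.0" compares equal to "2.0.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    status = "ok" if meets_minimum(python_version, "3.8") else "too old"
    print("python", python_version, status)
    try:
        import torch  # only importable once the PyTorch step has been completed
    except ImportError:
        print("pytorch is not installed yet")
    else:
        status = "ok" if meets_minimum(torch.__version__, "2.0") else "too old"
        print("pytorch", torch.__version__, status)
```

Run it inside the environment that will host the model, since a different conda environment may carry different versions.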
2. Install Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-py38_4.12.0-Linux-x86_64.sh
sh Miniconda3-py38_4.12.0-Linux-x86_64.sh
3. Install CUDA 11.8
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run
Add the environment variables (single quotes keep the variable references unexpanded until ~/.bashrc is sourced):
echo 'export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
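sudo ldconfig        # reload the dynamic linker cache so the newly installed libraries are found
source ~/.bashrc     # apply the environment variables
Verify the installation: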
nvcc -V
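The same check can be scripted. This is a rough sketch under assumptions: the function names are hypothetical, and the parser assumes that `nvcc -V` prints a line containing `release <major>.<minor>`, which is its usual format but is not stated in this guide.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract the release number (for example '11.8') from `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run `nvcc -V` if nvcc is on PATH and return its release string, else None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc was not found on PATH or its output was not recognized")
    else:
        print("CUDA toolkit release:", release)
```

If the script reports that nvcc is missing right after installation, the exports in ~/.bashrc are the first thing to re-check.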
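4. Install the PyTorch packages
Install PyTorch 2.1.0, the build for CUDA 11.8. The cu118 wheels are served from the PyTorch index, so it must be the only index given (pip's -i is shorthand for --index-url, and a second one would override it):
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118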
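After the install it is worth confirming that the CUDA build was actually picked up. The snippet below is an illustrative sketch, with a hypothetical function name; it uses only `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.device_count()`, and returns an empty dict when torch cannot be imported.

```python
def torch_cuda_report():
    """Collect basic facts about the installed PyTorch build; empty dict if torch is missing."""
    try:
        import torch
    except ImportError:
        return {}
    return {
        "torch_version": torch.__version__,       # wheels from the cu118 index normally end in "+cu118"
        "built_for_cuda": torch.version.cuda,     # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    report = torch_cuda_report()
    if not report:
        print("torch is not importable in this environment")
    for key, value in report.items():
        print(f"{key}: {value}")
```

A `built_for_cuda` value other than 11.8 suggests that pip resolved the package from a different index than intended.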
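5. Install the Qwen dependencies
To run Qwen-14B-Chat-Int4, make sure the requirements above are met, then run the following pip commands to install the dependencies. If you have trouble installing auto-gptq, we recommend searching the official repo for a suitable prebuilt wheel.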
pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed -i https://mirrors.aliyun.com/pypi/simple
pip install auto-gptq optimum
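In addition, installing the flash-attention library is recommended (flash attention 2 is now supported) for higher efficiency and lower GPU memory usage.
First confirm that your network can reach https://objects.githubusercontent.com/ and https://github.com/.
Otherwise pip install will be very slow and is likely to fail.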
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention && pip install .
# The installs below are optional and may be slow.
# pip install csrc/layer_norm
# pip install csrc/rotary
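The connectivity requirement mentioned for flash-attention can be tested before starting the build. The following is a minimal sketch using only the standard library; `can_reach` and `unreachable_urls` are hypothetical helper names, and a server that answers with any HTTP status is treated as reachable, which is an assumption about what "reachable" should mean here.

```python
import urllib.error
import urllib.request

# The two hosts named in the guide.
REQUIRED_URLS = (
    "https://github.com/",
    "https://objects.githubusercontent.com/",
)


def can_reach(url, timeout=5.0):
    """Return True if the host answers an HTTP request at all, False on network-level failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied (even with 4xx/5xx), so the network path works.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar problems.
        return False


def unreachable_urls(urls=REQUIRED_URLS, timeout=5.0):
    """Return the subset of `urls` that could not be reached."""
    return [url for url in urls if not can_reach(url, timeout=timeout)]
```

Calling `unreachable_urls()` returns an empty list when both hosts respond; any URL it returns is worth fixing (proxy, DNS, firewall) before running the git clone and pip install steps.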
6. Download the model
git clone https://www.modelscope.cn/qwen/Qwen-72B-Chat.git
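As an alternative to git clone, the model can also be fetched from Python. This is a sketch under assumptions: it requires the modelscope package to be installed, the wrapper name `download_model` is hypothetical, and the model id is simply taken from the path in the URL above.

```python
def download_model(model_id="qwen/Qwen-72B-Chat", cache_dir=None):
    """Download a model snapshot from ModelScope and return the local directory path."""
    # Imported lazily so that this file can be loaded before modelscope is installed.
    from modelscope.hub.snapshot_download import snapshot_download

    return snapshot_download(model_id, cache_dir=cache_dir)
```

Calling `download_model()` starts the download; passing `cache_dir` places the files in a directory of your choice instead of the library's default cache.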
7. Test
The model's page on the ModelScope community usually provides a test script.
# The following example uses the Qwen-14B-Chat-Int4 model:
from modelscope import AutoTokenizer, AutoModelForCausalLM
# Note: The default behavior now has injection attack prevention off.
tokenizer = AutoTokenizer.from_pretrained("qwen/Qwen-14B-Chat-Int4", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "qwen/Qwen-14B-Chat-Int4",
    device_map="auto",
    trust_remote_code=True
).eval()
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
# 你好!很高兴为你提供帮助。
Install the modelscope library before running the script:
pip install modelscope -i https://mirrors.aliyun.com/pypi/simple
Running the script then downloads the model files automatically.
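The test script sends a single message, but `model.chat` returns a `history` value alongside the response. The sketch below shows one way a multi-turn exchange could be driven, assuming the returned history can be passed back through the same `history` parameter; `run_conversation` is a hypothetical helper, not part of the model's API.

```python
def run_conversation(model, tokenizer, prompts):
    """Send prompts one after another, feeding each turn's history into the next."""
    history = None  # the first turn starts without history, as in the test script
    responses = []
    for prompt in prompts:
        response, history = model.chat(tokenizer, prompt, history=history)
        responses.append(response)
    return responses, history
```

With the `model` and `tokenizer` objects from the test script, `run_conversation(model, tokenizer, ["你好", "请介绍一下你自己"])` would return the two replies in order together with the accumulated history.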