华为昇腾910B基于Ctyunos部署Qwen3
华为昇腾910B基于Ctyunos部署Qwen3
环境准备
下载Qwen3模型文件
由于 modelscope
下载需要 Python 3.10 以上的支持,为了不影响系统的 Python 版本,可以选择在 Docker 容器中下载模型文件。
步骤 1: 启动 Docker 容器
运行以下命令启动一个容器:
1
2
docker run -d --name mypython --rm -v /mnt/nvme01/download:~/.cache/modelscope/hub \
666860.xyz/python:3.11-slim tail -f /dev/null
步骤 2: 进入容器内部
使用以下命令进入容器内部:
1
docker exec -it mypython bash
步骤 3: 下载模型文件
在容器内执行以下命令安装 modelscope
并下载模型文件:
1
2
pip install modelscope -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Llama-70B
下载完成后,模型文件会存放在指定的目录 /mnt/nvme01/download
。
拉取mindie容器镜像
建议选择我这个相同的版本,因为我到华为官网没有发现这个版本的,于是就下载了他们的最新版本(是不是最新的我不知道),会出现各种各样的问题,比如出现什么【qwen3不支持】哈哈🤣
1
docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/thxcode/mindie:2.0.T17-800I-A2-py311-openeuler24.03-lts-linuxarm64
拉取好镜像我们的食材就准备好了,接下来可以进行做饭了。
创建mindie配置文件
1
vim /root/config.json
填入下面的内容(下面的内容是我从容器中复制出来的,并且亲测过可以使用,没有特殊需求不需要改,相关配置信息可以查看官方手册):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
{
"Version" : "1.0.0",
"ServerConfig" :
{
"ipAddress" : "0.0.0.0",
"managementIpAddress" : "0.0.0.0",
"port" : 1025,
"managementPort" : 1026,
"metricsPort" : 1027,
"allowAllZeroIpListening" : true,
"maxLinkNum" : 1000,
"httpsEnabled" : false,
"fullTextEnabled" : false,
"tlsCaPath" : "security/ca/",
"tlsCaFile" : ["ca.pem"],
"tlsCert" : "security/certs/server.pem",
"tlsPk" : "security/keys/server.key.pem",
"tlsPkPwd" : "security/pass/key_pwd.txt",
"tlsCrlPath" : "security/certs/",
"tlsCrlFiles" : ["server_crl.pem"],
"managementTlsCaFile" : ["management_ca.pem"],
"managementTlsCert" : "security/certs/management/server.pem",
"managementTlsPk" : "security/keys/management/server.key.pem",
"managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
"managementTlsCrlPath" : "security/management/certs/",
"managementTlsCrlFiles" : ["server_crl.pem"],
"kmcKsfMaster" : "tools/pmt/master/ksfa",
"kmcKsfStandby" : "tools/pmt/standby/ksfb",
"inferMode" : "standard",
"interCommTLSEnabled" : false,
"interCommPort" : 1121,
"interCommTlsCaPath" : "security/grpc/ca/",
"interCommTlsCaFiles" : ["ca.pem"],
"interCommTlsCert" : "security/grpc/certs/server.pem",
"interCommPk" : "security/grpc/keys/server.key.pem",
"interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
"interCommTlsCrlPath" : "security/grpc/certs/",
"interCommTlsCrlFiles" : ["server_crl.pem"],
"openAiSupport" : "vllm",
"tokenTimeout" : 600,
"e2eTimeout" : 600,
"distDPServerEnabled":false
},
"BackendConfig" : {
"backendName" : "mindieservice_llm_engine",
"modelInstanceNumber" : 1,
"npuDeviceIds" : [[0,1,2,3,4,5,6,7]],
"tokenizerProcessNumber" : 8,
"multiNodesInferEnabled" : false,
"multiNodesInferPort" : 1120,
"interNodeTLSEnabled" : false,
"interNodeTlsCaPath" : "security/grpc/ca/",
"interNodeTlsCaFiles" : ["ca.pem"],
"interNodeTlsCert" : "security/grpc/certs/server.pem",
"interNodeTlsPk" : "security/grpc/keys/server.key.pem",
"interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
"interNodeTlsCrlPath" : "security/grpc/certs/",
"interNodeTlsCrlFiles" : ["server_crl.pem"],
"interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
"interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
"ModelDeployConfig" :
{
"maxSeqLen" : 32768,
"maxInputTokenLen" : 32768,
"truncation" : false,
"ModelConfig" : [
{
"modelInstanceType" : "Standard",
"modelName" : "Qwen3-32B",
"modelWeightPath" : "/mnt/nvme01/model/Qwen3-32B",
"worldSize" : 8,
"cpuMemSize" : 5,
"npuMemSize" : -1,
"backendType" : "atb",
"trustRemoteCode" : false
}
]
},
"ScheduleConfig" :
{
"templateType" : "Standard",
"templateName" : "Standard_LLM",
"cacheBlockSize" : 128,
"maxPrefillBatchSize" : 50,
"maxPrefillTokens" : 32768,
"prefillTimeMsPerReq" : 150,
"prefillPolicyType" : 0,
"decodeTimeMsPerReq" : 50,
"decodePolicyType" : 0,
"maxBatchSize" : 200,
"maxIterTimes" : 32768,
"maxPreemptCount" : 0,
"supportSelectBatch" : false,
"maxQueueDelayMicroseconds" : 5000
}
}
}
创建容器
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
docker run -it -d --net=host --shm-size=1g \
--name Qwen3-32B \
--device=/dev/davinci_manager \
--device=/dev/hisi_hdc \
--device=/dev/devmm_svm \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
-v /usr/local/sbin:/usr/local/sbin:ro \
-v /root/qwen/config.json:/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json \
-v /mnt/nvme01/model/Qwen3-32B:/mnt/nvme01/model/Qwen3-32B:ro \
swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/thxcode/mindie:2.0.T17-800I-A2-py311-openeuler24.03-lts-linuxarm64 bash
这样我们就创建了一个名为Qwen3-32B
的容器,可以使用下面命令进入容器进行操作:
1
docker exec -it Qwen3-32B bash
进入容器后,需要更新transformers,因为Qwen3需要使用新版本的transformers,在此感谢:@Lucent的博客https://lucent.blog/?p=xKdMOl2l
1
pip install --upgrade transformers==4.51.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
前台测试是否正常启动:
1
/usr/local/Ascend/mindie/latest/mindie-service/bin/mindieservice_daemon
如果有问题可以查看日志,看不懂可以在左侧找我的联系方式。
在后台启动:
1
nohup /usr/local/Ascend/mindie/latest/mindie-service/bin/mindieservice_daemon > /root/mindie/log/mindieservice_daemon.log 2>&1 &
如果需要重启的话,可以直接重启容器,但是重启容器后需要进入容器再次后台运行一下。后续查看运行日志在/root/mindie/log/mindieservice_daemon.log
本文由作者按照
CC BY 4.0
进行授权