对接openclaw
启动命令 原版的llama.cpp
4个并发,单个并发131072上下文
/workspace/llama-server -m /workspace/model/Qwen3.6-27B-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 5000 -ngl 99 -t 8 --parallel 2 -c 262144 --mlock --cont-batching --reasoning off -ctk q8_0 -ctv q8_0 --no-context-shift
4个并发,单个并发131072上下文
/workspace/llama-server -m /workspace/model/Qwen3.6-27B-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 5000 -ngl 99 -t 8 --parallel 4 -c 524288 --mlock --cont-batching --reasoning off -ctk q8_0 -ctv q8_0 --no-context-shift
8个并发,单个并发131072上下文,显存占用45G多
/workspace/llama-server -m /workspace/model/Qwen3.6-27B-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 5000 -ngl 99 -t 8 --parallel 8 -c 1048576 --mlock --cont-batching --reasoning off -ctk q8_0 -ctv q8_0 --no-context-shift
turboquant版本的llama.cpp
-ctk turbo3 -ctv turbo3 -ctk q8_0 -ctv turbo4
2个并发,单个并发131072上下文
/workspace/llama-server-tq -m /workspace/model/Qwen3.6-27B-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 5000 -ngl 99 -t 8 --parallel 2 -c 262144 --mlock --cont-batching --reasoning off -ctk turbo4 -ctv turbo3 --no-context-shift
4个并发,单个并发131072上下文
/workspace/llama-server-tq -m /workspace/model/Qwen3.6-27B-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 5000 -ngl 99 -t 8 --parallel 4 -c 524288 --mlock --cont-batching --reasoning off -ctk turbo4 -ctv turbo3 --no-context-shift
8个并发,单个并发131072上下文,显存占用45G多
/workspace/llama-server-tq -m /workspace/model/Qwen3.6-27B-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 5000 -ngl 99 -t 8 --parallel 8 -c 1048576 --mlock --cont-batching --reasoning off -ctk turbo4 -ctv turbo3 --no-context-shift
dflash版本
2个并发,单个并发131072上下文
/workspace/llama-server-dflash -m /workspace/model/Qwen3.6-27B-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 5000 -ngl 99 -t 8 --parallel 2 -c 262144 --mlock --cont-batching --reasoning off -md /workspace/model/dflash-draft-3.6-q8_0.gguf --spec-type dflash -ngld 99 -np 1 -cd 512 --repeat-penalty 1.2 --temp 0.0 --jinja -fa on -ub 128 --draft-max 10 --draft-min 1 --chat-template-kwargs '{"enable_thinking": false}' -ctk turbo4 -ctv turbo3
4个并发,单个并发131072上下文
/workspace/llama-server-dflash -m /workspace/model/Qwen3.6-27B-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 5000 -ngl 99 -t 8 --parallel 4 -c 524288 --mlock --cont-batching --reasoning off -md /workspace/model/dflash-draft-3.6-q8_0.gguf --spec-type dflash -ngld 99 -np 1 -cd 512 --repeat-penalty 1.2 --temp 0.0 --jinja -fa on -ub 128 --draft-max 10 --draft-min 1 --chat-template-kwargs '{"enable_thinking": false}' -ctk turbo4 -ctv turbo3
8个并发,单个并发131072上下文,显存占用45G多
/workspace/llama-server-dflash -m /workspace/model/Qwen3.6-27B-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 5000 -ngl 99 -t 8 --parallel 8 -c 1048576 --mlock --cont-batching --reasoning off -md /workspace/model/dflash-draft-3.6-q8_0.gguf --spec-type dflash -ngld 99 -np 1 -cd 512 --repeat-penalty 1.2 --temp 0.0
--jinja -fa on -ub 128 --draft-max 10 --draft-min 1
--chat-template-kwargs '{"enable_thinking": false}' -ctk turbo4 -ctv turbo3
openclaw config
"agents": {
"defaults": {
"workspace": "/home/mls/.openclaw/workspace",
"model": {
"primary": "cnb/Qwen3.6-27B-Q4"
},
"models": {
"modelscope/ZhipuAI/GLM-5.1": {"alias": "GLM-5.1"},
"cnb/Qwen3.6-27B-UD-Q4_K_XL.gguf": {"alias": "Qwen3.6-27B-Q4"}
}
}
}
"models": {
"mode": "merge",
"providers": {
"cnb": {
"baseUrl": "https://vd1odtlvc7-8082.cnb.run/v1",
"api": "openai-completions",
"apiKey": "ss-",
"models": [
{
"id": "Qwen3.6-27B-UD-Q4_K_XL.gguf",
"name": "Qwen3.6-27B-Q4",
"contextWindow": 262144,
"maxTokens": 262144,
"input": ["text"],
"cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0},
"reasoning": false
}
]
},
"modelscope": {
"baseUrl": "https://api-inference.modelscope.cn/v1",
"api": "openai-completions",
"apiKey": "ms-",
"models": [
{
"id": "ZhipuAI/GLM-5.1",
"name": "GLM-5.1",
"contextWindow": 202752,
"maxTokens": 202752,
"input": ["text"],
"cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0},
"reasoning": false
}
]
}
}
}
apikey 随便填,关闭了,视觉识别,个人感觉,目前针对openclaw没啥用,还不完善,报错太多.