ModelScope Community Website
Paper | English Documentation | Chinese Documentation
You can contact and communicate with us by joining our Discord or WeChat group.
🍲 ms-swift is a fine-tuning and deployment framework for large language models and multimodal large models, provided by the ModelScope community. It supports training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment for 600+ text-only large models and 400+ multimodal large models. Supported large models include Qwen3, Qwen3.5, InternLM3, GLM4.5, Mistral, DeepSeek-R1, Llama4, etc.; supported multimodal large models include Qwen3-VL, Qwen3-Omni, Llava, InternVL3.5, MiniCPM-V-4, Ovis2.5, GLM4.5-V, DeepSeek-VL2, etc.
🍔 In addition, ms-swift integrates the latest training technologies: Megatron parallelism techniques (TP, PP, CP, EP) to accelerate training, and a rich family of GRPO-style reinforcement learning algorithms (GRPO, DAPO, GSPO, SAPO, CISPO, RLOO, Reinforce++, etc.) to enhance model intelligence. ms-swift supports a wide range of training tasks, including preference learning algorithms such as DPO, KTO, RM, CPO, SimPO, and ORPO, as well as Embedding, Reranker, and sequence classification tasks. It provides full-pipeline support for large model training: inference, evaluation, and deployment accelerated by vLLM, SGLang, and LMDeploy, and model quantization via GPTQ, AWQ, BNB, and FP8.
Why Choose ms-swift?
- Sequence parallelism is supported via `--sequence_parallel_size N`.
- DFT loss is supported via `--enable_dft_loss true`. Training scripts can be found here.
- The `swift sample` command is now supported. The reinforcement fine-tuning script can be found here, and the large model API distillation sampling script is available here.
- Inference acceleration is supported via `--infer_backend vllm/lmdeploy`.

To install using pip:
pip install ms-swift -U
# Using uv
pip install uv
uv pip install ms-swift -U --torch-backend=auto
To install from source:
# pip install git+https://github.com/modelscope/ms-swift.git
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
# The main branch is for swift 4.x. To install swift 3.x, please run the following command:
# git checkout release/3.12
pip install -e .
# Using uv
uv pip install -e . --torch-backend=auto
Running Environment:
| | Range | Recommended | Notes |
|---|---|---|---|
| python | >=3.9 | 3.11/3.12 | |
| cuda | | cuda12 | No need to install if using CPU, NPU, MPS |
| torch | >=2.0 | 2.8.0/2.10.0 | |
| transformers | >=4.33 | 4.57.6/5.2.0 | |
| modelscope | >=1.23 | | |
| peft | >=0.11,<0.19 | | |
| flash_attn | | 2.8.3/3.0.0b1 | |
| trl | >=0.15,<0.29 | 0.28.0 | RLHF |
| deepspeed | >=0.14 | 0.18.8 | Training |
| vllm | >=0.5.1 | 0.11.0/0.17.1 | Inference/Deployment |
| sglang | >=0.4.6 | | Inference/Deployment |
| lmdeploy | >=0.5 | 0.10.1 | Inference/Deployment |
| evalscope | >=1.0 | | Evaluation |
| gradio | | 5.32.1 | Web-UI/App |
For more optional dependencies, you can refer to here.
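If you want to verify that your environment satisfies the version ranges in the table above, a minimal sketch is shown below. The helper names (`parse`, `meets_min`, `MINIMUMS`) are illustrative, not part of ms-swift:

```python
from importlib.metadata import version, PackageNotFoundError

def parse(v: str) -> tuple:
    """Turn '4.57.6' into (4, 57, 6); non-numeric suffixes like 'b1' keep only digits."""
    parts = []
    for p in v.split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def meets_min(installed: str, minimum: str) -> bool:
    """Compare dotted version strings component by component."""
    return parse(installed) >= parse(minimum)

# A few minimum versions taken from the table above (python/cuda are checked separately).
MINIMUMS = {"transformers": "4.33", "modelscope": "1.23", "peft": "0.11", "trl": "0.15"}

for pkg, floor in MINIMUMS.items():
    try:
        inst = version(pkg)
        status = "OK" if meets_min(inst, floor) else f"too old (need >={floor})"
    except PackageNotFoundError:
        status = "not installed"
    print(f"{pkg}: {status}")
```

Note that this simple tuple comparison does not implement the full PEP 440 ordering rules; for anything beyond a quick sanity check, `pip check` or the `packaging` library is more robust.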
Self-cognition fine-tuning of Qwen3-4B-Instruct-2507 on a single 3090 GPU in 10 minutes:
# 13GB
CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model Qwen/Qwen3-4B-Instruct-2507 \
--tuner_type lora \
--dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
'AI-ModelScope/alpaca-gpt4-data-en#500' \
'swift/self-cognition#500' \
--torch_dtype bfloat16 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--learning_rate 1e-4 \
--lora_rank 8 \
--lora_alpha 32 \
--target_modules all-linear \
--gradient_accumulation_steps 16 \
--eval_steps 50 \
--save_steps 50 \
--save_total_limit 2 \
--logging_steps 5 \
--max_length 2048 \
--output_dir output \
--warmup_ratio 0.05 \
--dataloader_num_workers 4 \
--model_author swift \
--model_name swift-robot
Tips:
- To use a custom dataset, specify `--dataset <dataset_path>`.
- The `--model_author` and `--model_name` parameters are only effective when the dataset includes swift/self-cognition.
- To train a different model, modify `--model <model_id/model_path>`.
- By default, ModelScope is used to download models and datasets. To use HuggingFace instead, specify `--use_hf true`.

After training is complete, use the following command to infer with the trained weights:
Note that `--adapters` should be replaced with the last checkpoint folder generated during training. Since the adapters folder contains the training parameter file `args.json`, there is no need to specify `--model` or `--system` separately; Swift reads these parameters automatically. To disable this behavior, set `--load_args false`.
# Using an interactive command line for inference.
CUDA_VISIBLE_DEVICES=0 \
swift infer \
--adapters output/vx-xxx/checkpoint-xxx \
--stream true \
--temperature 0 \
--max_new_tokens 2048
# merge-lora and use vLLM for inference acceleration
CUDA_VISIBLE_DEVICES=0 \
swift infer \
--adapters output/vx-xxx/checkpoint-xxx \
--stream true \
--merge_lora true \
--infer_backend vllm \
--vllm_max_model_len 8192 \
--temperature 0 \
--max_new_tokens 2048
Finally, use the following command to push the model to ModelScope:
CUDA_VISIBLE_DEVICES=0 \
swift export \
--adapters output/vx-xxx/checkpoint-xxx \
--push_to_hub true \
--hub_model_id '<your-model-id>' \
--hub_token '<your-sdk-token>' \
--use_hf false
The Web-UI is a zero-threshold training and deployment interface built on Gradio. For more details, you can check here.
SWIFT_UI_LANG=en swift web-ui

ms-swift also supports training and inference using Python. Below is pseudocode for training and inference. For more details, you can refer to here.
Training:
from peft import LoraConfig, get_peft_model
from swift import get_model_processor, get_template, load_dataset, EncodePreprocessor
from swift.trainers import Seq2SeqTrainer, Seq2SeqTrainingArguments
# Retrieve the model and template, and add a trainable LoRA module
model, tokenizer = get_model_processor(model_id_or_path, ...)
template = get_template(tokenizer, ...)
lora_config = LoraConfig(...)
model = get_peft_model(model, lora_config)
# Download and load the dataset, and encode the text into tokens
train_dataset, val_dataset = load_dataset(dataset_id_or_path, ...)
train_dataset = EncodePreprocessor(template=template)(train_dataset, num_proc=num_proc)
val_dataset = EncodePreprocessor(template=template)(val_dataset, num_proc=num_proc)
# Train the model
training_args = Seq2SeqTrainingArguments(...)
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    template=template,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)
trainer.train()
Inference:
from swift import TransformersEngine, InferRequest, RequestConfig
# Perform inference using the native Transformers engine
engine = TransformersEngine(model_id_or_path, adapters=[lora_checkpoint])
infer_request = InferRequest(messages=[{'role': 'user', 'content': 'who are you?'}])
request_config = RequestConfig(max_tokens=max_new_tokens, temperature=temperature)
resp_list = engine.infer([infer_request], request_config)
print(f'response: {resp_list[0].choices[0].message.content}')
Here is a minimal example of training to deployment using ms-swift. For more details, you can check the examples.
- Modify `--model` to specify the corresponding model's ID or path, and modify `--dataset` to specify the corresponding dataset's ID or path.
- By default, ModelScope is used to download models and datasets. To use HuggingFace instead, specify `--use_hf true`.

| Useful Links |
|---|
| 🔥Command Line Parameters |
| Megatron-SWIFT |
| GRPO |
| Supported Models and Datasets |
| Custom Models, 🔥Custom Datasets |
| LLM Tutorial |
Supported Training Methods:
| Method | Full-Parameter | LoRA | QLoRA | Deepspeed | Multi-Machine | Multimodal |
|---|---|---|---|---|---|---|
| Pre-training | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Supervised Fine-Tuning | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| GRPO | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| GKD | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| PPO | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| DPO | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| KTO | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Reward Model | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| CPO | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SimPO | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| ORPO | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Embedding | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Reranker | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Sequence Classification | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Pre-training:
# 8*A100
NPROC_PER_NODE=8 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift pt \
--model Qwen/Qwen2.5-7B \
--dataset swift/chinese-c4 \
--streaming true \
--tuner_type full \
--deepspeed zero2 \
--output_dir output \
--max_steps 10000 \
...
Fine-tuning:
CUDA_VISIBLE_DEVICES=0 \
swift sft \
--model Qwen/Qwen2.5-7B-Instruct \
--dataset AI-ModelScope/alpaca-gpt4-data-en \
--tuner_type lora \
--output_dir output \
...
RLHF:
CUDA_VISIBLE_DEVICES=0 \
swift rlhf \
--rlhf_type dpo \
--model Qwen/Qwen2.5-7B-Instruct \
--dataset hjh0119/shareAI-Llama3-DPO-zh-en-emoji \
--tuner_type lora \
--output_dir output \
...
ms-swift supports using Megatron parallelism techniques to accelerate training, including large-scale cluster training and MoE model training. The following training methods are supported:
| Method | Full-Parameter | LoRA | MoE | Multimodal | FP8 |
|---|---|---|---|---|---|
| Pre-training | ✅ | ✅ | ✅ | ✅ | ✅ |
| Supervised Fine-Tuning | ✅ | ✅ | ✅ | ✅ | ✅ |
| GRPO | ✅ | ✅ | ✅ | ✅ | ✅ |
| GKD | ✅ | ✅ | ✅ | ✅ | ✅ |
| DPO | ✅ | ✅ | ✅ | ✅ | ✅ |
| KTO | ✅ | ✅ | ✅ | ✅ | ✅ |
| RM | ✅ | ✅ | ✅ | ✅ | ✅ |
| Embedding | ✅ | ✅ | ✅ | ✅ | ✅ |
| Reranker | ✅ | ✅ | ✅ | ✅ | ✅ |
| Sequence Classification | ✅ | ✅ | ✅ | ✅ | ✅ |
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1 \
megatron sft \
--model Qwen/Qwen2.5-7B-Instruct \
--save_safetensors true \
--dataset AI-ModelScope/alpaca-gpt4-data-zh \
--tuner_type lora \
--output_dir output \
...
ms-swift supports a rich set of GRPO family algorithms:
| Method | Full-Parameter | LoRA | Multimodal | Multi-Machine |
|---|---|---|---|---|
| GRPO | ✅ | ✅ | ✅ | ✅ |
| DAPO | ✅ | ✅ | ✅ | ✅ |
| GSPO | ✅ | ✅ | ✅ | ✅ |
| SAPO | ✅ | ✅ | ✅ | ✅ |
| CISPO | ✅ | ✅ | ✅ | ✅ |
| CHORD | ✅ | ✅ | ✅ | ✅ |
| RLOO | ✅ | ✅ | ✅ | ✅ |
| Reinforce++ | ✅ | ✅ | ✅ | ✅ |
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
swift rlhf \
--rlhf_type grpo \
--model Qwen/Qwen2.5-7B-Instruct \
--tuner_type lora \
--use_vllm true \
--vllm_mode colocate \
--dataset AI-MO/NuminaMath-TIR#10000 \
--output_dir output \
...
CUDA_VISIBLE_DEVICES=0 swift infer \
--model Qwen/Qwen2.5-7B-Instruct \
--stream true \
--infer_backend transformers \
--max_new_tokens 2048
# LoRA
CUDA_VISIBLE_DEVICES=0 swift infer \
--model Qwen/Qwen2.5-7B-Instruct \
--adapters swift/test_lora \
--stream true \
--infer_backend transformers \
--temperature 0 \
--max_new_tokens 2048
CUDA_VISIBLE_DEVICES=0 \
swift app \
--model Qwen/Qwen2.5-7B-Instruct \
--stream true \
--infer_backend transformers \
--max_new_tokens 2048
CUDA_VISIBLE_DEVICES=0 \
swift deploy \
--model Qwen/Qwen2.5-7B-Instruct \
--infer_backend vllm
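Once the server is up, it exposes an OpenAI-compatible chat-completions endpoint. Below is a minimal client sketch using only the Python standard library; the host/port assume the deploy defaults, and the `build_payload`/`chat` helper names are illustrative, not part of ms-swift:

```python
import json
from urllib import request

def build_payload(prompt: str, model: str = "Qwen2.5-7B-Instruct", max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,  # must match the deployed model's served name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str, base_url: str = "http://127.0.0.1:8000") -> str:
    """POST one request to the deployed server and return the model's reply."""
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Requires the server started above to be running:
# print(chat("who are you?"))
```

Because the endpoint follows the OpenAI schema, any OpenAI-compatible SDK pointed at the same base URL should also work.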
CUDA_VISIBLE_DEVICES=0 \
swift sample \
--model LLM-Research/Meta-Llama-3.1-8B-Instruct \
--sampler_engine transformers \
--num_return_sequences 5 \
--dataset AI-ModelScope/alpaca-gpt4-data-zh#5
CUDA_VISIBLE_DEVICES=0 \
swift eval \
--model Qwen/Qwen2.5-7B-Instruct \
--infer_backend lmdeploy \
--eval_backend OpenCompass \
--eval_dataset ARC_c
CUDA_VISIBLE_DEVICES=0 \
swift export \
--model Qwen/Qwen2.5-7B-Instruct \
--quant_bits 4 \
--quant_method awq \
--dataset AI-ModelScope/alpaca-gpt4-data-zh \
--output_dir Qwen2.5-7B-Instruct-AWQ
swift export \
--model <model-path> \
--push_to_hub true \
--hub_model_id '<model-id>' \
--hub_token '<sdk-token>'
This framework is licensed under the Apache License (Version 2.0). For models and datasets, please refer to their original resource pages and follow the corresponding licenses.
@misc{zhao2024swiftascalablelightweightinfrastructure,
    title={SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning},
    author={Yuze Zhao and Jintao Huang and Jinghan Hu and Xingjun Wang and Yunlin Mao and Daoze Zhang and Zeyinzi Jiang and Zhikai Wu and Baole Ai and Ang Wang and Wenmeng Zhou and Yingda Chen},
    year={2024},
    eprint={2408.05517},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2408.05517},
}