New Qwen3-30B-A3B release: lighter, easier to use, and better at instruction following and long-context understanding!

ModelScope Community

Published 2025-07-31 10:06:53

The Qwen team has announced a "Flash week". The first release is an updated version of Qwen3-30B-A3B in both non-thinking and thinking modes, named Qwen3-30B-A3B-Instruct-2507 and Qwen3-30B-A3B-Thinking-2507. Key improvements:

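Significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool use.
Substantial gains in long-tail knowledge coverage across multiple languages.
Closer alignment with user preferences on subjective and open-ended tasks, giving more helpful responses and higher-quality text generation.
Enhanced 256K long-context understanding.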

Instruct performance: (benchmark chart)

Thinking performance: (benchmark chart)

Model links:

Qwen3-30B-A3B-Instruct-2507:

https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507

Qwen3-30B-A3B-Thinking-2507:

https://modelscope.cn/models/Qwen/Qwen3-30B-A3B-Thinking-2507

01. Model Inference

Inference with the ModelScope SDK:

from modelscope import AutoModelForCausalLM, AutoTokenizer
model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507-FP8"  # FP8 checkpoint; "Qwen/Qwen3-30B-A3B-Instruct-2507" is the unquantized model
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()  # keep only the newly generated tokens
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
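The same code also runs Qwen3-30B-A3B-Thinking-2507, but that model's output holds a reasoning trace ending in a closing `</think>` tag. Per the Qwen model card, the chat template already inserts the opening `<think>`, so only the closing tag appears. Below is a minimal sketch that separates the reasoning from the final answer. It assumes `151668` is the token id of `</think>` in the Qwen3 tokenizer, as in Qwen's published examples; check this with `tokenizer.convert_tokens_to_ids("</think>")` before relying on it. `split_thinking` is a hypothetical helper name. Unlike Qwen's own snippet, it drops the `</think>` token from both halves.

```python
# Assumption: 151668 is the id of "</think>" in the Qwen3 tokenizer;
# verify with tokenizer.convert_tokens_to_ids("</think>").
THINK_END_ID = 151668


def split_thinking(output_ids, think_end_id=THINK_END_ID):
    """Split generated token ids into (reasoning_ids, answer_ids).

    The split happens at the LAST </think> token, which itself is dropped.
    If no </think> token is present, everything is treated as the answer.
    """
    try:
        # Position of the last </think> token, found by searching the reversed list.
        end = len(output_ids) - 1 - output_ids[::-1].index(think_end_id)
    except ValueError:
        return [], list(output_ids)
    return output_ids[:end], output_ids[end + 1:]


# Usage with the variables from the example above (Thinking model):
# thinking_ids, answer_ids = split_thinking(output_ids)
# thinking = tokenizer.decode(thinking_ids, skip_special_tokens=True).strip("\n")
# content = tokenizer.decode(answer_ids, skip_special_tokens=True).strip("\n")
```

The split uses the last occurrence because the final answer always follows the final `</think>`, even if the reasoning itself mentions the tag.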

For deployment, use sglang>=0.4.6.post1 or vllm>=0.8.5 to create an OpenAI-compatible API endpoint:

SGLang:

SGLANG_USE_MODELSCOPE=true python -m sglang.launch_server --model-path Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 --context-length 262144

vLLM:

VLLM_USE_MODELSCOPE=true vllm serve Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 --max-model-len 262144
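Once either server is running, any OpenAI-compatible client can call it. Below is a minimal standard-library sketch with these assumptions:

* vLLM listens on its default port 8000. SGLang defaults to 30000, so adjust `BASE_URL` for it.
* The served model name equals the model path passed at launch.

`build_chat_request` and `chat` are illustrative helper names, not part of either framework.

```python
import json
import urllib.request

# Assumptions: vLLM listens on port 8000 by default (SGLang: 30000), and the
# served model name is the model path given on the launch command line.
BASE_URL = "http://localhost:8000/v1"
MODEL = "Qwen/Qwen3-30B-A3B-Instruct-2507-FP8"


def build_chat_request(prompt, max_tokens=1024, base_url=BASE_URL, model=MODEL):
    """Build a POST request for the OpenAI-style /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, max_tokens=1024, timeout=600):
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(prompt, max_tokens)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(chat("Give me a short introduction to large language model."))
```

The `openai` Python package works the same way if you point its `base_url` at the server. The sketch avoids the dependency so the request shape stays visible.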

02. Model Fine-tuning

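This section walks through self-cognition LoRA fine-tuning of Qwen3-30B-A3B-Instruct-2507 with ms-swift. You will need 2 × 80 GiB GPUs. ms-swift is the ModelScope community's official framework for training and deploying large language models and multimodal large models.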

ms-swift source code: https://github.com/modelscope/ms-swift

Before fine-tuning, make sure your environment is set up:

git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .

Prepare the fine-tuning dataset in the following format (the system field is optional), then pass it to the training script with `--dataset <dataset_path>`:

{"messages": [{"role": "user", "content": "Where is the capital of Zhejiang?"}, {"role": "assistant", "content": "The capital of Zhejiang is Hangzhou."}]}
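The dataset file is JSON Lines: one JSON object per line, each holding a `messages` list. Below is a small sketch that serializes (user, assistant) pairs into this format and optionally prepends a system message. The helper names and the file name `train.jsonl` are hypothetical.

```python
import json


def to_record(user, assistant, system=None):
    """Build one training sample in the messages format shown above."""
    messages = []
    if system is not None:
        # The system message is optional; when present it comes first.
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user})
    messages.append({"role": "assistant", "content": assistant})
    return {"messages": messages}


def dumps_jsonl(pairs, system=None):
    """Serialize (user, assistant) pairs as JSON Lines, one sample per line."""
    # ensure_ascii=False keeps non-ASCII (e.g. Chinese) text readable in the file.
    return "".join(
        json.dumps(to_record(u, a, system), ensure_ascii=False) + "\n"
        for u, a in pairs
    )


# Usage: write the file, then train with --dataset train.jsonl
# with open("train.jsonl", "w", encoding="utf-8") as f:
#     f.write(dumps_jsonl([("Where is the capital of Zhejiang?",
#                           "The capital of Zhejiang is Hangzhou.")]))
```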

Using transformers

GPU memory: 60 GiB; training speed: 185 s/it

Training script:

# Manually select `target_modules` to avoid 'all-linear' selecting 'gate'
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model Qwen/Qwen3-30B-A3B-Instruct-2507 \
    --train_type lora \
    --dataset 'swift/Chinese-Qwen3-235B-2507-Distill-data-110k-SFT#2000' \
              'swift/self-cognition#1000' \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules q_proj k_proj v_proj o_proj gate_proj up_proj down_proj \
    --gradient_accumulation_steps 16 \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --system 'You are a helpful assistant.' \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --model_author swift \
    --model_name swift-robot
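To see how long this run will be: the effective (global) batch is per-device batch × gradient accumulation steps × number of GPUs, and one epoch takes roughly the number of training samples divided by that. The sketch below makes two assumptions:

* The `#N` dataset suffix samples N rows from each dataset, giving 3,000 in total here.
* Any held-out validation split is ignored, so the real step count may be slightly lower.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus):
    """Samples consumed per optimizer step."""
    return per_device_batch * grad_accum_steps * num_gpus


def steps_per_epoch(num_samples, global_batch):
    """Optimizer steps per epoch, counting a final partial batch as one step."""
    return math.ceil(num_samples / global_batch)


# Values from the script above: 1 GPU (CUDA_VISIBLE_DEVICES=0), batch 1,
# accumulation 16, and 2000 + 1000 sampled rows (validation split ignored).
gbs = effective_batch_size(per_device_batch=1, grad_accum_steps=16, num_gpus=1)
steps = steps_per_epoch(2000 + 1000, gbs)
print(gbs, steps)  # 16 188
```

If "it" in the 185 s/it figure counts optimizer steps (an assumption), one epoch at these settings takes roughly 188 × 185 s ≈ 9.7 hours, before evaluation and checkpointing overhead.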

Using Megatron

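For installing the Megatron dependencies and converting weights between the HF and MCore formats, see the Megatron-SWIFT training documentation (the prebuilt image can be used directly): https://swift.readthedocs.io/zh-cn/latest/Instruction/Megatron-SWIFT%E8%AE%AD%E7%BB%83.html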

GPU memory: 2 × 50 GiB; training speed: 6 s/(gpu*it)

Training script:

PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
CUDA_VISIBLE_DEVICES=0,1 \
NPROC_PER_NODE=2 \
megatron sft \
    --load Qwen3-30B-A3B-Instruct-2507-mcore \
    --dataset 'swift/Chinese-Qwen3-235B-2507-Distill-data-110k-SFT#2000' \
              'swift/self-cognition#1000' \
    --train_type lora \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --split_dataset_ratio 0.01 \
    --expert_model_parallel_size 2 \
    --moe_grouped_gemm true \
    --moe_shared_expert_overlap true \
    --moe_aux_loss_coeff 1e-3 \
    --micro_batch_size 8 \
    --global_batch_size 16 \
    --recompute_granularity full \
    --recompute_method uniform \
    --recompute_num_layers 1 \
    --max_epochs 1 \
    --finetune true \
    --cross_entropy_loss_fusion true \
    --lr 1e-4 \
    --lr_warmup_fraction 0.05 \
    --min_lr 1e-5 \
    --save megatron_output/Qwen3-30B-A3B-Instruct-2507 \
    --eval_interval 200 \
    --save_interval 200 \
    --max_length 2048 \
    --num_workers 8 \
    --dataset_num_proc 8 \
    --no_save_optim true \
    --no_save_rng true \
    --sequence_parallel true \
    --attention_backend flash \
    --model_author swift \
    --model_name swift-robot
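Megatron specifies the batch the other way round. You set `--micro_batch_size` and `--global_batch_size`, and the number of gradient-accumulation micro-steps follows from these and the data-parallel size. The sketch below assumes tensor, pipeline and context parallelism are all 1 in this script, so both GPUs are data-parallel. Expert parallelism (`--expert_model_parallel_size 2`) shards the experts within that group rather than shrinking it. `num_micro_batches` is a hypothetical helper name.

```python
def num_micro_batches(global_batch, micro_batch, world_size, tp=1, pp=1, cp=1):
    """Gradient-accumulation micro-steps per optimizer step (Megatron-style)."""
    # Ranks not consumed by tensor/pipeline/context parallelism form the
    # data-parallel group; each data-parallel rank processes its own micro-batch.
    model_parallel = tp * pp * cp
    if world_size % model_parallel:
        raise ValueError("world_size must be divisible by tp * pp * cp")
    dp = world_size // model_parallel
    if global_batch % (micro_batch * dp):
        raise ValueError("global_batch must be divisible by micro_batch * data-parallel size")
    return global_batch // (micro_batch * dp)


# Script above: 2 GPUs, micro batch 8, global batch 16 -> 1 micro-step per update.
print(num_micro_batches(global_batch=16, micro_batch=8, world_size=2))  # 1
```

Both recipes therefore update the LoRA weights on 16 samples per step. The transformers run gets there through gradient accumulation on one GPU. The Megatron run instead uses larger micro-batches spread over two GPUs.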

After training, run inference with the following command:

# Training with transformers produces LoRA delta weights; run inference with `--adapters output/vx-xxx/checkpoint-xxx`
# Training with Megatron merges the LoRA weights during the mcore->hf conversion; run inference with `--model output/vx-xxx/checkpoint-xxx`
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --model output/vx-xxx/checkpoint-xxx \
    --infer_backend vllm \
    --stream true \
    --vllm_max_model_len 8192 \
    --max_new_tokens 2048

Push the model to ModelScope:

swift export \
    --adapters output/vx-xxx/checkpoint-xxx \
    --push_to_hub true \
    --hub_model_id '<your-model-id>' \
    --hub_token '<your-sdk-token>'
