ModelScope Community Model Roundup (7.20-7.26)

ModelScope Community


ModelScope Community · Published 2025-07-28 11:36:23

🙋 ModelScope community highlights for this issue:

📟 1,698 models: Qwen3-Coder-480B-A35B-Instruct, Intern-S1, HunyuanWorld-1, Qwen3-235B-A22B-Instruct-2507, Qwen3-235B-A22B-Thinking-2507, KAT-V1-40B, and more;

📁 216 datasets: Doc-750K, ru-arena-hard, TransEvalnia, and more;

🎨 103 new apps: Qwen3-Coder-WebDev, 记忆消除术 (Memory Erasure), index-tts-vllm, and more;

📄 7 articles:

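It's here! Tencent officially releases and open-sources the Hunyuan 3D World Model
All-round expert and science star: Shanghai AI Laboratory open-sources Intern-S1 (书生), a scientific multimodal large model | WAIC 2025
Shanghai Innovation Institute and Infinigence AI release Megrez2.0: an intrinsic architecture breaks the "impossible triangle" of on-device models, using edge compute to deliver cloud-level intelligence
The Agent×MCP offline salon is coming: see you in Hangzhou on 8.2!
Qwen3-"SmVL": fine-tuning an ultra-small Chinese multimodal LLM by stitching multiple models together
Qwen3 double release! Qwen3-Coder and an updated Instruct model arrive
Try it and win gifts! The all-new platform-level ModelScope MCP Playground is live!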

01. Model Recommendations

Qwen3-Coder-480B-A35B-Instruct

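Qwen3-Coder-480B-A35B-Instruct is the flagship of the Qwen3-Coder series, open-sourced by the Qwen team this week. It uses an MoE architecture with 480B total parameters and 35B activated parameters, and natively supports a 256K-token context (extendable to 1M with YaRN). It achieves state-of-the-art results among open-source models on agentic coding, browser use, and tool use, with performance approaching Claude Sonnet 4.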
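Going from a 256K (262,144-token) native context to 1M (1,048,576 tokens) corresponds to a YaRN scaling factor of 4. The sketch below computes that factor and builds a `rope_scaling` dictionary using the key names Hugging Face transformers uses for YaRN (`rope_type`, `factor`, `original_max_position_embeddings`). The helper name is hypothetical, and whether the official checkpoint should be configured exactly this way is an assumption; check the model card before relying on it.

```python
def yarn_rope_scaling(native_ctx: int, target_ctx: int) -> dict:
    """Build a YaRN rope_scaling entry that stretches native_ctx to target_ctx.

    Assumption: key names follow the Hugging Face transformers YaRN
    convention; verify against the official model card.
    """
    if target_ctx <= native_ctx:
        raise ValueError("target context must exceed the native context")
    factor = target_ctx / native_ctx  # how far positions are stretched
    return {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": native_ctx,
    }


if __name__ == "__main__":
    # 256K native context extended to 1M tokens
    print(yarn_rope_scaling(262_144, 1_048_576))
    # → {'rope_type': 'yarn', 'factor': 4.0, 'original_max_position_embeddings': 262144}
```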

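The Qwen team also open-sourced Qwen Code, a companion command-line tool adapted from Gemini CLI, with its prompts and tool-calling protocol tuned to get the most out of Qwen3-Coder in agentic coding. The model can also be used with tools such as Claude Code and Cline, advancing the real-world practice of "Agentic Coding in the World".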

Model link:

https://www.modelscope.cn/models/Qwen/Qwen3-Coder-480B-A35B-Instruct

Example code:

vLLM is recommended for inference:

VLLM_USE_MODELSCOPE=True vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
  --enable-expert-parallel \
  --data-parallel-size 8 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
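Once the server is running, it exposes an OpenAI-compatible chat endpoint. The sketch below builds a chat request that offers the model one tool, which is what `--enable-auto-tool-choice` and the tool-call parser are for. Assumptions: the server listens on vLLM's default `http://localhost:8000`, the served model name equals the model path, and the `read_file` tool is hypothetical, used only to show the schema.

```python
import json
import urllib.request

# Assumptions: default vLLM host/port; served name equals the model path.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8"


def build_request(prompt: str) -> dict:
    """Build an OpenAI-style chat request that exposes one tool."""
    # Hypothetical tool definition, for illustration only
    tool = {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a text file from the workspace.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [tool],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }


def send(payload: dict) -> dict:
    """POST the payload to the server and return the decoded JSON response."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print the payload only; call send() when a server is actually running
    print(json.dumps(build_request("Summarize README.md."), indent=2))
```

With a server up, `send(build_request(...))["choices"][0]["message"]` holds either plain `content` or, if the model chose to call the tool, a `tool_calls` list.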

Intern-S1

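At WAIC 2025, Shanghai AI Laboratory released and open-sourced Intern-S1 (书生), a scientific multimodal large model. Built on a 235B-parameter MoE language model and a 6B-parameter vision encoder, it introduces what the lab calls a first-of-its-kind "cross-modal scientific parsing engine" that can accurately interpret complex scientific data such as chemical formulas, protein structures, and seismic wave signals.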

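Intern-S1 supports specialized research tasks such as planning compound synthesis routes and analyzing protein sequences. It was pretrained on 5 trillion multimodal tokens, more than 2.5 trillion of which come from scientific domains. Its overall multimodal capability surpasses the best contemporaneous open-source models, and its performance across scientific disciplines exceeds frontier closed-source models such as Grok 4. Combining strong general-purpose performance with top-tier scientific performance, it gives developers professional-grade scientific reasoning.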

Model link: https://modelscope.cn/models/Shanghai_AI_Laboratory/Intern-S1

Example code:

Inference examples for text and multimodal input are shown below; transformers>=4.53.0 is required for the model to work correctly.

Text input

from modelscope import AutoProcessor, AutoModelForCausalLM
import torch
model_name = "Shanghai_AI_Laboratory/Intern-S1"
# Load the processor and model; trust_remote_code is required for Intern-S1's custom code
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)
# A single user turn containing only text
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "tell me about an interesting physical phenomenon."},
        ],
    }
]
# Tokenize with the chat template; .to() moves tensors to the model's device
# and casts floating-point tensors (not token ids) to bfloat16
inputs = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt").to(model.device, dtype=torch.bfloat16)
generate_ids = model.generate(**inputs, max_new_tokens=32768)
# Decode only the newly generated tokens, dropping the prompt
decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1] :], skip_special_tokens=True)
print(decoded_output)

Image input

from modelscope import AutoProcessor, AutoModelForCausalLM
import torch
model_name = "Shanghai_AI_Laboratory/Intern-S1"
# Load the processor and model; trust_remote_code is required for Intern-S1's custom code
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)
# A single user turn: an image (loaded from the URL) followed by a text prompt
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
            {"type": "text", "text": "Please describe the image explicitly."},
        ],
    }
]
# Tokenize text and preprocess the image; floating-point tensors (pixel values) are cast to bfloat16
inputs = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt").to(model.device, dtype=torch.bfloat16)
generate_ids = model.generate(**inputs, max_new_tokens=32768)
# Decode only the newly generated tokens, dropping the prompt
decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1] :], skip_special_tokens=True)
print(decoded_output)
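The text and image examples differ only in the `content` list of the user message. A small helper can build either form before it is passed to `processor.apply_chat_template` as above. The helper is hypothetical (not part of modelscope or transformers), and accepting several image entries in one turn is an assumption about the chat template.

```python
from typing import Iterable


def build_messages(text: str, image_urls: Iterable[str] = ()) -> list[dict]:
    """Build a single-turn chat message: images first, then the text prompt.

    Mirrors the message layout of the Intern-S1 examples; passing several
    images assumes the chat template accepts multiple image entries.
    """
    content = [{"type": "image", "url": url} for url in image_urls]
    content.append({"type": "text", "text": text})
    return [{"role": "user", "content": content}]


if __name__ == "__main__":
    print(build_messages("Please describe the image explicitly.",
                         ["http://images.cocodataset.org/val2017/000000039769.jpg"]))
```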

HunyuanWorld-1

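At the Tencent forum of WAIC 2025, Tencent officially released and fully open-sourced Hunyuan 3D World Model 1.0 (HunyuanWorld-1). It is billed as the industry's first open-source world-generation model that supports immersive exploration, interaction, and simulation, opening new possibilities for game development, VR, and digital content creation.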

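Hunyuan 3D World Model 1.0 combines panoramic visual generation with layered 3D reconstruction and accepts both text and image input, producing high-quality, explorable 3D scenes in diverse styles. On key dimensions such as aesthetic quality and instruction following, for both text-to-world and image-to-world generation, it outperforms current state-of-the-art open-source models across the board.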
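Panorama-based pipelines commonly store a 360° scene as an equirectangular image, in which every pixel corresponds to a viewing direction; lifting a panorama into 3D starts from that mapping. The sketch below shows the standard pixel-to-ray conversion. It is a generic illustration of equirectangular geometry, not HunyuanWorld-1's code, and the axis convention (y up, z forward) is an assumption.

```python
import math


def pixel_to_ray(u: float, v: float, width: int, height: int) -> tuple[float, float, float]:
    """Map equirectangular pixel coordinates to a unit direction vector.

    Convention assumed for this sketch: u runs left to right over longitude
    [-pi, pi), v runs top to bottom over latitude [pi/2, -pi/2],
    y points up and z points forward (longitude 0).
    """
    lon = (u / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


if __name__ == "__main__":
    # The image centre looks straight ahead along +z
    print(pixel_to_ray(1024, 512, 2048, 1024))
```

Multiplying each ray by an estimated depth gives a 3D point, which is one common way panoramas are turned into geometry.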

Model link: https://modelscope.cn/models/Tencent-Hunyuan/HunyuanWorld-1

Qwen3-235B-A22B-Instruct-2507

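As the updated flagship general-purpose model of the Qwen3 series, Qwen3-235B-A22B-Instruct-2507 keeps the non-thinking mode design of Qwen3-235B-A22B while systematically improving key metrics. It retains 94 layers, grouped-query attention (GQA) with 64 query heads and 4 key/value heads, and an MoE structure with 128 experts, 8 of which are activated per token. It has 235B total parameters (22B activated), 234B of which are non-embedding parameters, and natively supports a 262,144-token context for complex long-text tasks. On benchmarks such as GPQA (knowledge), AIME25 (math), and LiveCodeBench (coding), it outperforms current mainstream open-source models as well as closed-source models such as Claude-Opus4-Non-thinking.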
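GQA matters mostly for memory: the key/value cache grows with the number of KV heads, not query heads. The back-of-the-envelope estimate below covers one full 262,144-token sequence across all layers, before any sharding across GPUs. A head dimension of 128 and a BF16 (2-byte) cache are assumptions not stated above; under them, 4 KV heads need 16× less cache than 64 full heads would.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for `tokens` positions."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # 2 = K and V
    return per_token * tokens


if __name__ == "__main__":
    # Assumed: head_dim = 128, BF16 cache (2 bytes per element)
    gqa = kv_cache_bytes(layers=94, kv_heads=4, head_dim=128, tokens=262_144)
    mha = kv_cache_bytes(layers=94, kv_heads=64, head_dim=128, tokens=262_144)
    print(f"GQA: {gqa / 2**30:.1f} GiB, MHA equivalent: {mha / 2**30:.1f} GiB")
    # → GQA: 47.0 GiB, MHA equivalent: 752.0 GiB
```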

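The Qwen team further improved the model through both pretraining and post-training, focusing on:

Long-tail multilingual knowledge: data augmentation to improve generalization to low-frequency languages and domains;
Alignment with user preferences: more helpful responses and higher-quality text on subjective tasks;
Long-context modeling: context understanding extended to 256K tokens, with more coherent reasoning in complex scenarios.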

Model links:

Qwen3-235B-A22B-Instruct-2507:

https://www.modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507

Qwen3-235B-A22B-Instruct-2507-FP8:

https://www.modelscope.cn/models/Qwen/Qwen3-235B-A22B-Instruct-2507-FP8

Example code:

SGLang

SGLANG_USE_MODELSCOPE=true python -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B-Instruct-2507 --tp 8 --context-length 262144

vLLM

VLLM_USE_MODELSCOPE=true vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507 --tensor-parallel-size 8 --max-model-len 262144
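One reason both commands shard the model across 8 GPUs is its size. The rough estimate below counts weights only (no KV cache, activations, or framework overhead) and assumes an even split across GPUs; treating the FP8 checkpoint as 1 byte per parameter is an approximation.

```python
def weight_gb_per_gpu(params: float, bytes_per_param: float, gpus: int) -> float:
    """Weight memory per GPU in GB (10**9 bytes), assuming an even split."""
    return params * bytes_per_param / gpus / 1e9


if __name__ == "__main__":
    total_params = 235e9  # total (not activated) parameters must all be resident
    print(f"BF16: {weight_gb_per_gpu(total_params, 2, 8):.2f} GB per GPU")
    print(f"FP8:  {weight_gb_per_gpu(total_params, 1, 8):.2f} GB per GPU")
```

Although only 22B parameters are active per token, all 235B must be loaded, which is why the MoE model still needs multi-GPU serving.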

For more hands-on fine-tuning tutorials, see:

Qwen3 double release! Qwen3-Coder and an updated Instruct model arrive

Qwen3-235B-A22B-Thinking-2507

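Qwen3-235B-A22B-Thinking-2507 is the enhanced thinking-mode version of Qwen3-235B-A22B. Designed for complex reasoning, it always runs with chain-of-thought enabled so that problems are broken down in depth. It reaches state-of-the-art reasoning performance among open-source models in core areas such as coding (LiveCodeBench) and mathematics (AIME25), with notable gains in knowledge breadth (SuperGPQA), creative writing (WritingBench), and multilingual ability (MultiIF); overall, it is positioned as comparable to closed-source models such as Gemini-2.5 Pro. The model also natively supports 256K-token context understanding.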

Model links:

Qwen3-235B-A22B-Thinking-2507

https://www.modelscope.cn/models/Qwen/Qwen3-235B-A22B-Thinking-2507

Qwen3-235B-A22B-Thinking-2507-FP8

https://www.modelscope.cn/models/Qwen/Qwen3-235B-A22B-Thinking-2507-FP8

Example code:

Requires sglang>=0.4.6.post1 or vllm>=0.8.5:

SGLang

SGLANG_USE_MODELSCOPE=true python -m sglang.launch_server --model-path Qwen/Qwen3-235B-A22B-Thinking-2507 --tp 8 --context-length 262144 --reasoning-parser deepseek-r1

vLLM

VLLM_USE_MODELSCOPE=true vllm serve Qwen/Qwen3-235B-A22B-Thinking-2507 --tensor-parallel-size 8 --max-model-len 262144 --enable-reasoning --reasoning-parser deepseek_r1
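With a reasoning parser enabled, the server returns the reasoning trace separately from the final answer (vLLM puts it in a `reasoning_content` field of the response message). When working with raw generated text instead, for example from transformers, a minimal splitter such as the hypothetical one below can separate the two. It assumes the trace ends with `</think>` and treats the opening `<think>` tag as optional, since it may come from the prompt template rather than the generated text.

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Assumes the reasoning trace ends with '</think>'; the opening '<think>'
    tag is optional because it may be part of the prompt template.
    """
    end_tag = "</think>"
    if end_tag not in text:
        return "", text.strip()  # no reasoning block found
    reasoning, answer = text.split(end_tag, 1)
    reasoning = reasoning.strip().removeprefix("<think>").strip()
    return reasoning, answer.strip()


if __name__ == "__main__":
    print(split_thinking("Check the parity first.</think>\n\nThe answer is 4."))
```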

KAT-V1-40B

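KAT (Kwaipilot-AutoThink) is a model recently open-sourced by Kuaishou that combines thinking and non-thinking capabilities and automatically switches between them according to problem difficulty.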

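KAT-V1 comes in two sizes, 40B and 200B. In auto-thinking mode, the 40B version matches the performance of DeepSeek-R1-0528 (685B parameters), while the 200B version outperforms the flagship models of major open-source model families on multiple benchmarks.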

Model link:

https://www.modelscope.cn/models/Kwaipilot/KAT-V1-40B

Example code:

from modelscope import AutoTokenizer, AutoModelForCausalLM
model_name = "Kwaipilot/KAT-V1-40B"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
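# conduct text completion; do_sample=True so that temperature and top_p take effect
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
# keep only the newly generated tokens, dropping the prompt
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()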
content = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")
print("prompt:\n", prompt)
print("content:\n", content)
"""
prompt:
Give me a short introduction to large language model.
content:
<judge>
The user's request is to provide a concise factual introduction to large language models, which involves retrieving and summarizing basic information. This task is straightforward as it only requires recalling and presenting well-known details without deeper analysis. No complex reasoning is needed here—just a simple explanation will suffice.
</judge>
<think_off>
<answer>
A **Large Language Model (LLM)** is an advanced AI system trained on vast amounts of text data to understand, generate, and process human-like language. Here’s a concise introduction:
### Key Points:
1. **Training**: Trained on diverse text sources (books, websites, etc.) using deep learning.
2. **Capabilities**: 
   - Answer questions, generate text, summarize content, translate languages.
   - Understand context, sentiment, and nuances in language.
3. **Architecture**: Often based on **transformer models** (e.g., BERT, GPT, LLaMA).
4. **Scale**: Billions of parameters, requiring massive computational resources.
5. **Applications**: Chatbots, content creation, coding assistance, research, and more.
### Examples:
- **OpenAI’s GPT-4**: Powers ChatGPT.
- **Google’s Gemini**: Used in Bard.
- **Meta’s LLaMA**: Open-source alternative.
### Challenges:
- **Bias**: Can reflect biases in training data.
- **Accuracy**: May hallucinate "facts" not grounded in reality.
- **Ethics**: Raises concerns about misinformation and job displacement.
LLMs represent a leap forward in natural language processing, enabling machines to interact with humans in increasingly sophisticated ways. 🌐🤖
</answer>
"""