ModelScope Community Model Digest (July 6-12)

ModelScope Community

ModelScope Community · Published 2025-07-14 18:04:05

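🙋 ModelScope community progress in this issue:

📟 1,481 models: Kimi K2 series, SmolLM3-3B, Skywork-R1V3-38B, Phi-4-mini-flash-reasoning, Devstral-Small-2507, WebSailor-3B, and more;

📁 324 datasets: arXiv-abstract-model2vec, SadeedDiac-25, opendebate, and more;

🎨 528 innovative applications: VibeDoc (an AI-driven development plan generator), 宣传文本鉴查官 (a promotional-text inspector), and more;

📄 7 articles: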

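Kimi K2 open-source release: strong at code and agentic tasks!
TEN VAD and Turn Detection make Voice Agent conversations more human-like
2025 ModelScope Developer Conference · Full Recap
AFAC2025 Financial Intelligence Innovation Competition launch ceremony concludes successfully, bringing all parties together to kick off the competition
Black Forest Labs Kontext LoRA: a range of novel image-editing techniques! First released on ModelScope! Training guide included
Gemma 3n official release open-sourced: Google's new on-device multimodal large model runs in 2GB of memory, with a focus on improved coding and reasoning!
ModelScope text-to-image MCP: one MCP to call the 12,800+ text-to-image models in the ModelScope model hub!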

01. Model Recommendations

Kimi K2

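Kimi K2 is the latest open-source release from Moonshot AI (月之暗面): a Mixture-of-Experts (MoE) foundation model with stronger coding ability and better performance on general Agent tasks. It has 1T total parameters, of which 32B are activated. On benchmarks including SWE Bench Verified, Tau2, and AceBench, Kimi K2 achieves SOTA results among open-source models, demonstrating leading capabilities in code, Agent, and mathematical reasoning tasks.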
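
To put the two parameter counts in perspective, the short calculation below works out what share of the model is active for a single token. It uses only the 1T and 32B figures quoted above; the function name is illustrative and not part of any Kimi K2 tooling.

```python
# Figures quoted in the paragraph above: 1T total parameters, 32B activated.
TOTAL_PARAMS = 1_000_000_000_000
ACTIVE_PARAMS = 32_000_000_000


def active_fraction(active: int, total: int) -> float:
    """Return the share of parameters that take part in one forward pass."""
    return active / total


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"Activated per token: {frac:.1%}")  # prints "Activated per token: 3.2%"
```

In other words, roughly 3% of the weights are used per token. This is why an MoE model's per-token compute tracks its activated size rather than its total size, while all expert weights still have to be stored.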

Model links:

Kimi-K2-Base

https://www.modelscope.cn/models/moonshotai/Kimi-K2-Base

Kimi-K2-Instruct

https://www.modelscope.cn/models/moonshotai/Kimi-K2-Instruct

For more technical details, see:

Kimi K2 open-source release: strong at code and agentic tasks!

TEN series

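To address problems such as "talking over the user" and "awkward silences" in Voice Agent interactions, Agora (声网) and the RTE Developer Community jointly developed and open-sourced the TEN VAD and Turn Detection models. They draw on Agora's more than ten years of experience in real-time voice technology and its ultra-low-latency solutions, with the aim of making AI Agent interactions feel more natural.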

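TEN VAD is a lightweight, low-latency, low-power, high-accuracy voice activity detection model designed for real-time scenarios. Its core functions are detecting whether an audio frame contains human speech, locating the start and end boundaries of utterances, and filtering out background noise and silent segments, which provides efficient preprocessing for speech fed into a large language model (LLM). Compared with traditional solutions such as WebRTC VAD and Silero VAD, it achieves better precision and recall and can significantly reduce the end-to-end response latency of a dialogue system.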
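
The "locate utterance boundaries" step can be pictured as post-processing on per-frame speech decisions. The sketch below is not the TEN VAD API. It assumes some VAD has already produced one boolean per frame, and the 10 ms frame length is a hypothetical value chosen for illustration.

```python
from typing import List, Tuple


def frames_to_segments(flags: List[bool], frame_ms: int = 10) -> List[Tuple[int, int]]:
    """Merge runs of consecutive speech frames into (start_ms, end_ms) segments.

    `flags` holds one speech/non-speech decision per frame; `frame_ms` is an
    assumed frame duration, not a value taken from TEN VAD.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index of the first frame of the current speech run
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i  # a speech run begins
        elif not is_speech and start is not None:
            segments.append((start * frame_ms, i * frame_ms))  # the run ends
            start = None
    if start is not None:  # speech continues up to the last frame
        segments.append((start * frame_ms, len(flags) * frame_ms))
    return segments


if __name__ == "__main__":
    flags = [False, True, True, False, False, True]
    print(frames_to_segments(flags))  # prints [(10, 30), (50, 60)]
```

Only the resulting segments would then be forwarded to speech recognition or the LLM, so silence and background noise never reach the more expensive stages.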

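TEN Turn Detection focuses on intelligently determining conversational turns. By analyzing linguistic patterns, it recognizes whether the user has finished speaking, enabling natural turn-taking similar to human conversation. The model supports both Chinese and English as well as full-duplex voice interaction (the user and the AI may speak at the same time). It picks up subtle cues such as pauses and hesitation to judge when the floor should change hands, which reduces mechanical waiting and makes human-machine dialogue more fluid.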

Model links:

TEN VAD

https://modelscope.cn/models/TEN-framework/ten-vad

TEN Turn Detection

https://modelscope.cn/models/TEN-framework/TEN_Turn_Detection

Example code:

Using TEN VAD via git clone:

1. Clone the repository

git clone https://github.com/TEN-framework/ten-vad.git
cd ten-vad
apt install libc++-dev

2. Enter the examples directory

cd ./examples

3. Run the test

python test.py s0724-s0730.wav out.txt

TEN Turn Detection model inference code

from modelscope import AutoTokenizer, AutoModelForCausalLM
import torch
# Load model and tokenizer
model_id = 'TEN-framework/TEN_Turn_Detection'
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# Move model to GPU
model = model.cuda()
model.eval()
# Function for inference
def analyze_text(text, system_prompt=""):
    inf_messages = [{"role":"system", "content":system_prompt}] + [{"role":"user", "content":text}]
    input_ids = tokenizer.apply_chat_template(
        inf_messages, 
        add_generation_prompt=True, 
        return_tensors="pt"
    ).cuda()
    with torch.no_grad():
        outputs = model.generate(
            input_ids, 
            max_new_tokens=1, 
            do_sample=True, 
            top_p=0.1, 
            temperature=0.1, 
            pad_token_id=tokenizer.eos_token_id
        )
    response = outputs[0][input_ids.shape[-1]:]
    return tokenizer.decode(response, skip_special_tokens=True)
# Example usage
text = "Hello I have a question about"
result = analyze_text(text)
print(f"Input: '{text}'")
print(f"Turn Detection Result: '{result}'")
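
Once the model has returned a label, a voice agent still has to decide what to do with it. The following is a minimal sketch of that dispatch step. The label strings ("finished", "unfinished", "wait") and the action names are assumptions made for illustration, so confirm the actual output vocabulary on the model card before relying on them.

```python
# Hypothetical label-to-action table; the label strings are assumed, not
# taken from this article, and should be checked against the model card.
ACTIONS = {
    "finished": "respond",            # user completed the utterance: let the agent reply
    "unfinished": "keep_listening",   # user is mid-sentence: do not interrupt
    "wait": "stay_silent",            # user asked the agent to hold on
}


def next_action(label: str) -> str:
    """Map a turn-detection label to an agent action.

    Unknown labels fall back to "keep_listening", the conservative choice
    that avoids talking over the user.
    """
    return ACTIONS.get(label.strip().lower(), "keep_listening")


if __name__ == "__main__":
    print(next_action("finished"))    # prints "respond"
    print(next_action("unfinished"))  # prints "keep_listening"
```

Falling back to listening on an unrecognized label is a design choice: an agent that waits slightly too long is usually less disruptive than one that interrupts.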

Phi-4-mini-flash-reasoning

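Phi-4-mini-flash-reasoning is a member of Microsoft's Phi-4 model family. It is a lightweight open model built on synthetic data with a focus on high-quality, reasoning-dense data, and it is further fine-tuned to strengthen its advanced mathematical reasoning. It supports a 64K-token context length.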

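Phi-4-mini-flash-reasoning is designed for multi-step, logic-intensive mathematical problem solving in memory- or compute-constrained environments and latency-bound scenarios. It excels at maintaining context across steps, applying structured logic, and delivering accurate, reliable solutions in domains that require deep analytical thinking.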

Model link:

https://modelscope.cn/models/LLM-Research/Phi-4-mini-flash-reasoning

Example code:

import torch
from modelscope import AutoModelForCausalLM, AutoTokenizer
torch.random.manual_seed(0)
model_id = "LLM-Research/Phi-4-mini-flash-reasoning"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [{
    "role": "user",
    "content": "How to solve 3*x^2+4*x+5=1?"
}]   
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
outputs = model.generate(
    **inputs.to(model.device),
    max_new_tokens=32768,
    temperature=0.6,
    top_p=0.95,
    do_sample=True,
)
outputs = tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[-1]:])
print(outputs[0])
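
Because sampling is enabled, the model's answer to the example prompt can vary between runs, so it helps to have an independent reference. The sketch below solves the same equation, 3*x^2+4*x+5=1, with the quadratic formula. It is only a check on the arithmetic and says nothing about how the model itself reasons.

```python
import cmath


def solve_quadratic(a: float, b: float, c: float):
    """Return both roots of a*x^2 + b*x + c = 0, allowing complex results."""
    disc = b * b - 4 * a * c
    root = cmath.sqrt(disc)  # cmath handles a negative discriminant
    return (-b + root) / (2 * a), (-b - root) / (2 * a)


# 3*x^2 + 4*x + 5 = 1 rearranges to 3*x^2 + 4*x + 4 = 0.
x1, x2 = solve_quadratic(3, 4, 4)

if __name__ == "__main__":
    print(x1, x2)
    # Substitute back into the original equation; the residual should be ~0.
    print(abs(3 * x1**2 + 4 * x1 + 5 - 1))
```

The discriminant is 16 - 48 = -32, so the equation has no real solutions. The roots are the complex pair -2/3 ± (2√2/3)i, which is the result a correct model response should reach.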

SmolLM3 series

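SmolLM3 is a 3B-parameter language model open-sourced by Hugging Face that aims to push the boundaries of small models. It supports dual-mode reasoning, 6 languages, and long context, with context handling extendable from 64K to 128K. SmolLM3 is a fully open model that delivers strong performance at the 3B-4B scale: in addition to the model weights, the training data mixture, training configuration, and code are all open-sourced. Developers can find detailed materials in Hugging Face's smollm repository.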

Model links:

SmolLM3-3B

https://modelscope.cn/models/HuggingFaceTB/SmolLM3-3B

SmolLM3-3B-Base

https://modelscope.cn/models/HuggingFaceTB/SmolLM3-3B-Base

SmolLM3-3B-ONNX

https://modelscope.cn/models/HuggingFaceTB/SmolLM3-3B-ONNX

Example code:

The modeling code for SmolLM3 is available in transformers v4.53.0; the model can also be loaded with the latest vllm.

pip install -U transformers

from modelscope import AutoModelForCausalLM, AutoTokenizer
model_name = "HuggingFaceTB/SmolLM3-3B"
device = "cuda"  # for GPU usage or "cpu" for CPU usage
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
).to(device)
# prepare the model input
prompt = "Give me a brief explanation of gravity in simple terms."
messages_think = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages_think,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# Generate the output
generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
# Get and decode the output
output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
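
The example above uses the default mode. To illustrate the dual-mode reasoning mentioned earlier, the sketch below builds the message list for either mode. It assumes, and this should be verified against the SmolLM3 model card, that the chat template honors a `/think` or `/no_think` flag placed in the system prompt. The helper name `build_messages` is hypothetical.

```python
from typing import Dict, List


def build_messages(prompt: str, thinking: bool = True) -> List[Dict[str, str]]:
    """Build a chat message list that requests a reasoning mode.

    Assumption: the model's chat template reads a "/think" or "/no_think"
    flag from the system message. Check the model card before relying on it.
    """
    flag = "/think" if thinking else "/no_think"
    return [
        {"role": "system", "content": flag},
        {"role": "user", "content": prompt},
    ]


if __name__ == "__main__":
    messages = build_messages("Give me a brief explanation of gravity.", thinking=False)
    for message in messages:
        print(message["role"], "->", message["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template` in place of `messages_think` in the example above. Turning extended reasoning off typically gives shorter, faster replies, at the cost of the step-by-step trace.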

Skywork-R1V3-38B

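Skywork-R1V3-38B is the latest and most powerful open-source multimodal reasoning model in the Skywork-R1V series from Kunlun Skywork (昆仑天工). Built on InternVL-38B, it significantly pushes the boundaries of multimodal and cross-disciplinary intelligence. R1V3's reasoning ability is enhanced mainly through RL algorithms applied in post-training, and it achieves open-source state-of-the-art (SOTA) performance on numerous multimodal reasoning benchmarks.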

Model link:

https://modelscope.cn/models/Skywork/Skywork-R1V3-38B

Example code:

Inference with transformers

import torch
from modelscope import AutoModel, AutoTokenizer
# `utils` is a local helper module (not shown here) that must provide load_image and split_model
from utils import load_image, split_model
import argparse
def main():
    parser = argparse.ArgumentParser(description="Run inference with Skywork-R1V model.")
    parser.add_argument('--model_path', type=str, default='Skywork/Skywork-R1V3-38B', help="Path to the model.")
    parser.add_argument('--image_paths', type=str, nargs='+', required=True, help="Path(s) to the image(s).")
    parser.add_argument('--question', type=str, required=True, help="Question to ask the model.")
    args = parser.parse_args()
    device_map = split_model(args.model_path)
    model = AutoModel.from_pretrained(
        args.model_path,
        torch_dtype=torch.bfloat16,
        load_in_8bit=False,
        low_cpu_mem_usage=True,
        use_flash_attn=True,
        trust_remote_code=True,
        device_map=device_map
    ).eval()
    tokenizer = AutoTokenizer.from_pretrained(args.model_path, trust_remote_code=True, use_fast=False)
    pixel_values = [load_image(img_path, max_num=12).to(torch.bfloat16).cuda() for img_path in args.image_paths]
    if len(pixel_values) > 1:
        num_patches_list = [img.size(0) for img in pixel_values]
        pixel_values = torch.cat(pixel_values, dim=0)
    else:
        pixel_values = pixel_values[0]
        num_patches_list = None

    prompt = "<image>\n"*len(args.image_paths) + args.question
    generation_config = dict(max_new_tokens=64000, do_sample=True, temperature=0.6, top_p=0.95, repetition_penalty=1.05)
    response = model.chat(tokenizer, pixel_values, prompt, generation_config, num_patches_list=num_patches_list)
    print(f'User: {args.question}\nAssistant: {response}')
if __name__ == '__main__':
    main()

Inference with vLLM


python -m vllm.entrypoints.openai.api_server --model $MODEL_PATH  --max_model_len 32768  --limit-mm-per-prompt "image=20" --tensor-parallel-size $N_GPU --dtype auto  --trust-remote-code
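
Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below uses only the Python standard library to build and send a single-image chat request. The base URL `http://localhost:8000`, the `max_tokens` value, and the helper names are assumptions for illustration. The `model` field is assumed to match the value passed to `--model` when the server was launched.

```python
import base64
import json
import urllib.request


def build_chat_payload(model: str, question: str, image_bytes: bytes,
                       mime: str = "image/jpeg") -> dict:
    """Build an OpenAI-style chat request carrying one inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                # The image is embedded as a base64 data URL.
                {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
                {"type": "text", "text": question},
            ],
        }],
        "max_tokens": 1024,  # illustrative limit, not a recommended setting
    }


def send(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to the chat completions endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

A call would look like `send(build_chat_payload(model_path, "What is shown in this image?", open("photo.jpg", "rb").read()))`, where `model_path` and `photo.jpg` are placeholders for your own values.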

02. Dataset Recommendations

arXiv-abstract-model2vec

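The arXiv-abstract-model2vec dataset is a text-embedding dataset built from arXiv paper abstracts. It is intended mainly for research and development in natural language processing tasks such as semantic representation learning, text similarity analysis, topic modeling, and literature recommendation systems, helping researchers and developers better understand and use the semantic information in academic literature.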

Dataset link:
https://www.modelscope.cn/datasets/sleeping-ai/arXiv-abstract-model2vec
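
As an illustration of the text-similarity and literature-recommendation uses mentioned above, the sketch below ranks items by cosine similarity between embedding vectors. The two-dimensional vectors and the paper identifiers are toy values made up for the example. The dataset's actual schema and vector dimensionality are not described here and would need to be checked on the dataset page.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query: List[float], corpus: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (id, score) pairs sorted from most to least similar to the query."""
    scored = [(paper_id, cosine(query, vec)) for paper_id, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy embeddings standing in for abstract vectors.
    corpus = {"paper_a": [1.0, 0.0], "paper_b": [0.6, 0.8], "paper_c": [0.0, 1.0]}
    ranking = most_similar([1.0, 0.0], corpus)
    print([paper_id for paper_id, _ in ranking])  # prints ['paper_a', 'paper_b', 'paper_c']
```

With real abstract embeddings the same ranking step can act as a simple recommendation baseline: embed the query abstract, then return the highest-scoring neighbors.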