Lightweight and Efficient: Intern-S1-mini, a Strong 8B Scientific Multimodal Model, Is Now Open Source

ModelScope Community

Published 2025-08-25 13:15:45

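Following the open-source release of the Intern-S1 scientific multimodal large model on July 26, Shanghai Artificial Intelligence Laboratory (Shanghai AI Lab) released a lightweight version, Intern-S1-mini, on August 23.

Thanks to its leading general and specialized scientific capabilities, Intern-S1 held the No. 1 spot worldwide on the HuggingFace multimodal Trending list for several consecutive days after launch and drew wide attention in the open-source community. As a "mini" model with 8B parameters, Intern-S1-mini likewise combines general and specialized scientific capabilities, and it is better suited to rapid deployment and downstream development.

Intern-S1-mini performance at a glance:

General capability in the top tier for its size: strong results on authoritative benchmarks such as MMLU-Pro, AIME2025, and MMMU, showing both stability and competitiveness;
Excellent scientific capability: particularly strong on chemistry and materials tasks, with a clear lead on benchmarks such as SmolInstruct, ChemBench, and MatBench; it also stays in the top tier on physics, earth science, and biology tasks, reflecting solid scientific understanding and cross-domain generalization;
A smaller footprint without a loss of capability, supporting research, development, and education scenarios.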

Intern-S1-mini demo page:

https://chat.intern-ai.org.cn

GitHub:

https://github.com/InternLM/Intern-S1

Model:

https://modelscope.cn/models/Shanghai_AI_Laboratory/Intern-S1-mini

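01. Model Highlights

When a model learns many skills at once, it often runs into data conflicts, so that gains in one capability come at the expense of another. The conflict is more pronounced in lightweight models.

Intern-S1-mini follows the design philosophy of the larger Intern-S1 and covers multimodal, multi-task data such as text, images, molecular formulas, and proteins. Combining multiple modalities and multiple tasks at a lightweight model size places very high demands on the capability-fusion algorithm.

The research team drew on the strengths of its generalist-specialist fusion approach to balance these capabilities in Intern-S1-mini, so that it handles text, image-text, and scientific tasks together and a lightweight model can offer the all-round strength of a "large model".

Intern-S1-mini performs well on general benchmarks (such as MMLU-Pro, AIME2025, and MMMU), placing its general capability in the top tier. It is equally strong on scientific tasks, especially in chemistry and materials, where it leads by a clear margin on benchmarks such as SmolInstruct, ChemBench, and MatBench. It also remains among the leaders on physics, earth science, and biology tasks.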

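02. Model Inference & Deployment

Model inference

The official recommendation is to use the following hyperparameters for better results: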

top_p = 1.0
top_k = 50
min_p = 0.0
temperature = 0.8
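
As a minimal sketch, the recommended values above can be collected into keyword arguments for `model.generate` in Transformers. The helper name `build_generate_kwargs` and the default `max_new_tokens` value are illustrative assumptions, not part of the official examples. `do_sample=True` is added because sampling parameters have no effect under greedy decoding.

```python
# Recommended sampling hyperparameters, copied from the list above.
RECOMMENDED_SAMPLING = {
    "top_p": 1.0,
    "top_k": 50,
    "min_p": 0.0,
    "temperature": 0.8,
}


def build_generate_kwargs(max_new_tokens=1024):
    """Return keyword arguments for `model.generate` (hypothetical helper)."""
    kwargs = dict(RECOMMENDED_SAMPLING)
    # Sampling parameters only take effect when sampling is enabled.
    kwargs["do_sample"] = True
    # Illustrative default; choose a budget that fits your task.
    kwargs["max_new_tokens"] = max_new_tokens
    return kwargs


print(build_generate_kwargs())
```

With `model` and `inputs` prepared as in the Transformers examples in this section, the call would then look like `model.generate(**inputs, **build_generate_kwargs())`.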

Transformers

The sample code below shows how to generate content from text and multimodal inputs.

Use transformers>=4.55.2 to make sure the model works properly.

Text input

from modelscope import AutoProcessor, AutoModelForCausalLM, snapshot_download
import torch

# Download the weights from ModelScope and get the local directory
model_name = snapshot_download("Shanghai_AI_Laboratory/Intern-S1-mini")
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)
# Text-only conversation: the content list holds a single text item
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Tell me about an interesting physical phenomenon."},
        ],
    }
]
inputs = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt").to(model.device, dtype=torch.bfloat16)
generate_ids = model.generate(**inputs, max_new_tokens=32768)
# Decode only the newly generated tokens, skipping the prompt
decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1] :], skip_special_tokens=True)
print(decoded_output)

Image input

from modelscope import AutoProcessor, AutoModelForCausalLM
import torch
model_name = "Shanghai_AI_Laboratory/Intern-S1-mini"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
            {"type": "text", "text": "Please describe the image explicitly."},
        ],
    }
]
inputs = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt").to(model.device, dtype=torch.bfloat16)
generate_ids = model.generate(**inputs, max_new_tokens=32768)
decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1] :], skip_special_tokens=True)
print(decoded_output)

Video input

Install the decord video decoding library with pip install decord. To avoid OOM, install flash_attention and use at least 2 GPUs.

from modelscope import AutoProcessor, AutoModelForCausalLM
import torch
model_name = "Shanghai_AI_Laboratory/Intern-S1-mini"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video",
                "url": "https://huggingface.co/datasets/hf-internal-testing/fixtures_videos/resolve/main/tennis.mp4",
            },
            {"type": "text", "text": "What type of shot is the man performing?"},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
    video_load_backend="decord",
    tokenize=True,
    return_dict=True,
).to(model.device, dtype=torch.float16)
generate_ids = model.generate(**inputs, max_new_tokens=32768)
decoded_output = processor.decode(generate_ids[0, inputs["input_ids"].shape[1] :], skip_special_tokens=True)
print(decoded_output)

Serving

The minimum hardware requirements for deploying Intern-S1 series models are:

Model | A100 (GPUs) | H800 (GPUs) | H100 (GPUs) | H200 (GPUs)
Intern-S1-mini | | | |
Intern-S1-mini-FP8 | | | |

You can use any of the following LLM inference frameworks to create an OpenAI-compatible server:

Download the model

modelscope download Shanghai_AI_Laboratory/Intern-S1-mini --local_dir ./Intern-S1-mini

lmdeploy(>=0.9.2)

lmdeploy serve api_server ./Intern-S1-mini --reasoning-parser intern-s1 --tool-call-parser intern-s1

vllm

vllm serve ./Intern-S1-mini --trust-remote-code

sglang

python3 -m sglang.launch_server \
    --model-path ./Intern-S1-mini \
    --trust-remote-code \
    --grammar-backend none

ollama (for local deployment)

# install ollama
curl -fsSL https://ollama.com/install.sh | sh
# fetch model
ollama pull internlm/interns1-mini
# run model
ollama run internlm/interns1-mini
# then use openai client to call on http://localhost:11434/v1
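
The servers above all expose an OpenAI-compatible interface, so a client call can be sketched with nothing but the Python standard library. The example below is a minimal sketch under stated assumptions: the request body follows the OpenAI Chat Completions format, the base URL and model name are the ones from the ollama example (other frameworks listen on their own host and port and expect the model name they were launched with), and the helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build a POST request for the OpenAI-style /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Recommended sampling values from the inference section.
        "temperature": 0.8,
        "top_p": 1.0,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, prompt, timeout=600):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server, so it is left commented out):
# print(chat("http://localhost:11434/v1", "internlm/interns1-mini", "Hello"))
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing the client at a different server.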

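03. Model Fine-Tuning

ms-swift now supports training Intern-S1 series models. ms-swift is the official ModelScope community framework for training and deploying large language models and multimodal large models.

ms-swift repository:

https://github.com/modelscope/ms-swift

The following uses Intern-S1-mini as an example to show a runnable fine-tuning demo and the format for custom datasets.

Before you start fine-tuning, make sure your environment is ready.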

# pip install git+https://github.com/modelscope/ms-swift.git
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .
pip install git+https://github.com/huggingface/transformers.git

To fine-tune on a custom dataset, prepare your data in the following format.

{"messages": [{"role": "user", "content": "<image><image>What is the difference between the two images?"}, {"role": "assistant", "content": "The first one is a kitten, and the second one is a puppy."}], "images": ["/xxx/x.jpg", "/xxx/x.png"]}
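
A record in this layout can be generated programmatically. The sketch below is illustrative only: the helper names `make_record` and `write_jsonl` are hypothetical, storing one JSON object per line (JSONL) is an assumption about the file layout, and the check that the number of `<image>` tags equals the number of image paths is inferred from the sample above rather than taken from the ms-swift documentation.

```python
import json


def make_record(user_text, assistant_text, image_paths):
    """Build one training record in the messages/images layout shown above."""
    # Assumption: each "<image>" placeholder corresponds to one entry in "images".
    if user_text.count("<image>") != len(image_paths):
        raise ValueError("number of <image> tags must match number of image paths")
    return {
        "messages": [
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": assistant_text},
        ],
        "images": list(image_paths),
    }


def write_jsonl(records, path):
    """Write one JSON object per line (assumed JSONL layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Rebuild the sample record from the text above.
sample = make_record(
    "<image><image>What is the difference between the two images?",
    "The first one is a kitten, and the second one is a puppy.",
    ["/xxx/x.jpg", "/xxx/x.png"],
)
print(json.dumps(sample, ensure_ascii=False))
```

Calling `write_jsonl([sample], "train.jsonl")` would produce a file whose path could then be supplied as the dataset; the filename here is a placeholder.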

Training script:

# 36G
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model Shanghai_AI_Laboratory/Intern-S1-mini \
    --dataset 'AI-ModelScope/LaTeX_OCR:human_handwrite#20000' \
    --split_dataset_ratio 0.01 \
    --train_type lora \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --freeze_vit true \
    --gradient_accumulation_steps 16 \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4
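
To get a feel for what these flags imply, the arithmetic can be worked through in a few lines. This is a rough sketch under stated assumptions: `split_dataset_ratio` is taken to be the fraction held out for validation, and a partial final accumulation window is counted as one update. The framework's own rounding may differ slightly.

```python
import math

# Values copied from the training command above.
num_samples = 20000                # "#20000" suffix on the dataset name
split_dataset_ratio = 0.01
per_device_train_batch_size = 1
gradient_accumulation_steps = 16
num_gpus = 1                       # CUDA_VISIBLE_DEVICES=0
num_train_epochs = 1
warmup_ratio = 0.05

# Assumption: split_dataset_ratio is the fraction held out for validation.
eval_samples = round(num_samples * split_dataset_ratio)
train_samples = num_samples - eval_samples

# Samples consumed per optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)

# Assumption: a partial final accumulation window still counts as one update.
optimizer_steps = math.ceil(train_samples / effective_batch_size) * num_train_epochs
warmup_steps = math.ceil(optimizer_steps * warmup_ratio)

print(train_samples, eval_samples, effective_batch_size, optimizer_steps, warmup_steps)
```

Under these assumptions the run trains on 19,800 samples with an effective batch size of 16, which works out to roughly 1,238 optimizer updates, so `--eval_steps 50` and `--save_steps 50` trigger about two dozen evaluations and checkpoints.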

[Figure: GPU memory usage during training]

After training completes, run inference with the following command:

CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --adapters output/vx-xxx/checkpoint-xxx \
    --stream true \
    --load_data_args true \
    --max_new_tokens 2048

Push the model to ModelScope:

swift export \
    --adapters output/vx-xxx/checkpoint-xxx \
    --push_to_hub true \
    --hub_model_id '<your-model-id>' \
    --hub_token '<your-sdk-token>'

Follow the link below to open the model page:

https://modelscope.cn/models/Shanghai_AI_Laboratory/Intern-S1-mini