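Overview
The Seed2.0 series is now officially released, offering general-purpose Agent models in three sizes: Pro, Lite, and Mini.
The series brings a comprehensive upgrade to multimodal understanding and strengthens LLM and Agent capabilities, so the models can make steady progress on real-world, long-chain tasks.
Seed2.0 also extends its capability boundary from competition-level reasoning to research-level tasks, reaching the industry's top tier in evaluations of tasks with high economic and scientific value.
*Seed2.0 Lite was upgraded to a new version at the end of April. It is the first omni-modal understanding model in the Seed model series, supporting native, unified understanding of video, image, audio, and text, with upgraded Agent, Coding, and GUI capabilities.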
Seed2.0 Pro
Focuses on long-chain reasoning and robustness in complex workflows. Optimized for complex scenarios in real-world tasks.

Seed2.0 Lite
Balances output quality and response speed.
Ideal as a general-purpose, production-grade model.




Seed2.0 Mini
Optimized for inference throughput and deployment density. Designed for high concurrency and batch generation scenarios.




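Model Performance
Seed2.0 shows significant gains in visual reasoning and perception, reaching SOTA on benchmarks such as BabyVision.
For dynamic video scenes, Seed2.0 strengthens its foundational temporal understanding and motion perception, and achieves SOTA results on multiple video reasoning evaluations.
Seed2.0 also places particular emphasis on instruction following, and reaches the industry's top tier in evaluations of complex Agent capabilities.
*The latest version of Seed2.0 Lite supports cross-modal reasoning that combines audio and visuals, and is among the industry leaders on both video and audio understanding benchmarks.
Beyond standard benchmarks, we care more about users' real-world experience. Before this version went live, we invited more than 50 方舟 (Ark) users to evaluate the Coding Agent capability of the latest Seed2.0 Lite model. The results show that the new version also improves markedly in this area.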
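Omni-modal Understanding and Interactive Applications
Seed2.0 can process complex visual inputs and carry out real-time interaction and application generation. Whether extracting structured information from images or generating interactive content from visual input, Seed2.0 completes tasks efficiently and reliably. In addition, since its update at the end of April, Seed2.0 Lite supports audio input, enabling omni-modal understanding.
It can uniformly parse audio inputs such as human speech and background sound and, combined with the visual signal of video, analyze complex events as a whole.
Steady Progress on Complex Professional Tasks
Seed2.0 substantially strengthens LLM and Agent performance, remaining stable and reliable on long-chain, multi-step instruction tasks.
Workflow gym - FreeCAD dual-boss modeling with volume/surface-area readout task
Using the FreeCAD Part Design workbench, complete the full dual-boss modeling workflow and extract the geometric parameters.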
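Evaluation Results
We evaluated the Seed2.0 series comprehensively; it performs strongly on key tasks such as reasoning, complex instruction execution, and multimodal understanding. The latest Seed2.0 Lite model improves significantly and reaches SOTA on multiple video and audio understanding benchmarks.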
| Capability | Benchmark | Seed2.0 Lite(0428) | Seed2.0 Lite(0215) | Seed2.0 Pro(0215) | GPT-5.4 Mini | Gemini 3 Flash |
| --- | --- | --- | --- | --- | --- | --- |
| Knowledge | GPQA Diamond | 88.4% | 85.1% | 88.9% | 88.0% | 90.7% |
| | SuperGPQA | 69.6% | 67.5% | 68.7% | 63.9% | 72.7% |
| | HLE (no tool, text only) | 25.7% | 28.2% | 32.4% | 28.2% | 31.7% |
| Reasoning | BeyondAIME | 79.0% | 76.0% | 86.5% | 80.0% | 82.0% |
| | FrontierSci-olympiad | 72.0% | 70.0% | 74.0% | 70.0% | 73.0% |
| | Superchem (text-only) | 55.0% | 48.0% | 51.6% | 29.1% | 54.4% |
| | BABE | 57.9% | 50.2% | 53.5% | 49.0% | 55.2% |
| Instruction Following | CL-Bench | 20.1% | 20.0% | 20.8% | 14.9% | 16.1% |
| | MultiChallenge | 69.9% | 63.2% | 68.3% | 62.5% | 69.3% |
| SearchAgent | WideSearch | 70.3% | 74.5% | 74.7% | 73.0% | 64.0% |
| | BrowseComp | 64.0% | 72.1% | 77.3% | 61.3% | 41.5% |
| | ResearchRubrics | 59.2% | 50.8% | 50.7% | 47.1% | 36.9% |
| | XPert Bench | 56.8% | 63.3% | 64.5% | 41.8% | 50.1% |
| Real World | SkillsBench | 43.7% | 42.1% | 42.3% | 45.4% | 26.4% |
| | GDPval | 53.1% | 47.3% | 54.4% | 50.6% | 13.7% |
| | FinSearchComp | 63.8% | 65.1% | 70.2% | 61.8% | 43.7% |
| | Tob-Agent | 51.4% | 45.2% | 52.6% | 43.0% | 37.4% |
| CodingAgent | SWE Multilingual | 66.6% | 64.4% | 71.7% | 73.6% | 71.1% |
| | SWE-Bench Pro | 46.6% | 46.0% | 46.9% | 54.4% | 46.7% |
| | NL2Repo-Bench | 28.7% | 24.6% | 27.9% | 37.3% | 27.6% |
| | PaperBench | 52.5% | 54.6% | 53.8% | 49.1% | 33.9% |
| | Terminal Bench 2.0 | 43.3% | 45.0% | 55.8% | 60.0% | 60.0% |
| | Vibe Coding (human evaluation) | 49.4% | 48.7% | 48.4% | 57.4% | 56.9% |
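To make the version-over-version change concrete, the following minimal sketch computes the point difference between Seed2.0 Lite(0428) and Seed2.0 Lite(0215) for a few rows copied from the table above. The names `SCORES` and `point_deltas` and the choice of rows are illustrative assumptions, not part of the original evaluation.

```python
# Scores (in %) copied from the table above as (Lite 0428, Lite 0215).
# The selection of benchmarks is illustrative, not exhaustive.
SCORES = {
    "GPQA Diamond": (88.4, 85.1),
    "ResearchRubrics": (59.2, 50.8),
    "BrowseComp": (64.0, 72.1),
    "Terminal Bench 2.0": (43.3, 45.0),
}


def point_deltas(scores):
    """Return {benchmark: new - old} in percentage points, rounded to 0.1."""
    return {name: round(new - old, 1) for name, (new, old) in scores.items()}


if __name__ == "__main__":
    for name, delta in point_deltas(SCORES).items():
        # A positive value means the 0428 version scored higher.
        print(f"{name}: {delta:+.1f} points")
```

Differences are reported in percentage points rather than relative percent, since the benchmarks use different scales of difficulty.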
| Capability | Benchmark | Seed2.0 Lite(0428) | Seed2.0 Lite(0215) | Seed2.0 Pro(0215) | GPT-5.4 High | Gemini 3 Flash | Gemini 3.1 Pro High |
| --- | --- | --- | --- | --- | --- | --- | --- |
| STEM | MathVision | 89.8 | 86.4 | 88.8 | 90.6 | 87.5 | 89.0 |
| | MMMU_Pro | 78.4 | 76.0 | 78.2 | 79.2 | 80.4 | 82.5 |
| | HiPhO | 83.8 | 72.5 | 74.1 | 84.3 | 78.0 | 86.6 |
| | MedXpertQA-MM | 79.6 | 64.0 | 68.1 | 76.9 | 78.0 | 80.2 |
| Perception | BabyVision | 64.7 | 57.5 | 60.6 | 53.4 | 47.2 | 54.4 |
| | VLMBias | 80.6 | 74.8 | 77.4 | 42.8 | 66.1 | 73.5 |
| Visual Knowledge | SimpleVQA | 72.7 | 67.2 | 71.4 | 56.0 | 68.4 | 70.5 |
| | WorldVQA | 50.2 | 44.0 | 49.9 | 30.2 | 46.5 | 44.4 |
| InfoGraphics | CharXiv-DQ | 94.5 | 93.3 | 93.5 | 94.1 | 94.0 | 94.9 |
| | CharXiv-RQ | 82.4 | 79.9 | 80.5 | 82.6 | 79.7 | 84.0 |
| Embodied | ERQA | 71.5 | 65.8 | 68.5 | 64.5 | 65.8 | 70.8 |
| Capability | Benchmark | Seed2.0 Lite(0428) | Seed1.8 | Claude Opus 4.7 | Claude Sonnet 4.6 | Claude Sonnet 4.5 | GPT-5.4 High | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GUI | OSWorld-Verified | 64.4% | 61.9% | 78.0% | 72.5% | 62.9% | 75.0% | 64.0% |
| | MobileWorld | 64.6% | 52.1% | 56.4% | - | 47.8% | - | 57.3% |
| Capability | Benchmark | Seed2.0 Lite(0428) | Seed2.0 Lite(0215) | Seed2.0 Pro(0215) | Seed2.0 Mini(0215) | Gemini 3 Pro High | Gemini 3 Flash High |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Video Knowledge | VideoMMMU | 88.3 | 84.1 | 86.9 | 80.6 | 87.6* | 88.1 |
| | MMVU | 76.7 | 75.0 | 78.2 | 69.0 | 76.3 | 77.9 |
| | VideoSimpleQA-v2 | 69.0 | 65.0 | 71.5 | 64.9 | - | - |
| | VideoSimpleQA | 71.7 | 66.6 | 71.9 | 67.7 | 72.4 | 70.0 |
| | SciVideo | 70.3 | 51.4 | 52.3 | 35.3 | - | 74.1 |
| Video Reasoning | VideoReasonBench | 59.4 | 64.2 | 77.8 | 40.5 | 59.5 | 61.2 |
| | VideoHolmes | 67.4 | 63.8 | 67.4 | 58.6 | 64.2 | 65.6 |
| | Minerva | 68.5 | 63.8 | 66.5 | 54.7 | 65.0 | 64.4 |
| Motion & Perception | TVBench | 80.4 | 71.5 | 75.0 | 70.5 | 71.1 | 69.6 |
| | TOMATO | 72.5 | 57.3 | 59.9 | 47.4 | 55.8 | 60.8 |
| | EgoTempo | 68.4 | 61.8 | 71.8 | 67.2 | 65.4 | 58.4 |
| | MotionBench | 72.4 | 70.9 | 75.2 | 65.1 | 70.3 | 68.9 |
| | ContPhy | 62.4 | 56.1 | 67.4 | 55.9 | 61.1 | 62.0 |
| | MORSE-500 | 34.6 | 32.2 | 37.4 | 32.2 | 33.0 | 32.4 |
| Long Video | VideoMME | 89.0 | 87.7 | 89.5 | 81.2 | 88.4* | 85.2 |
| | VideoMMEv2 | 64.9 | - | 60.5* | - | 66.1* | 61.1* |
| | CGBench | 65.5 | 59.3 | 65.0 | - | 65.5 | 65.3 |
| | LongVideoBench | 79.0 | 77.3 | 80.3 | 74.8 | 78.2 | 74.5 |
| | LVBench | 76.4 | 73.0 | 76.4 | 66.6 | - | - |
| | VideoEval-Pro | 49.5 | 44.3 | 47.3 | 43.7 | - | 51.9 |
| Streaming Video | OVBench | 63.2 | 65.5 | 69.2 | 60.1 | 63.5 | 59.2 |
| | ODVBench | 66.0 | 69.6 | 72.5 | 65.1 | 63.6 | 56.7 |
| | LiveSports-3K | 78.1 | 77.8 | 78.0 | 73.3 | 74.5 | 73.2 |
| | OVOBench | 75.4 | 76.7 | 77.0 | 70.4 | 70.1 | 68.7 |
| | ViSpeak | 87.0 | 84.0 | 78.5 | 77.5 | 89.0 | 88.0 |
| Multi-video | CrossVid | 63.7 | 57.7 | 61.0 | 58.6 | 53.0 | 48.7 |
| Visual-Audio Understanding | OmniVideoBench | 61.7 | 44.5 | 49.5 | 40.8 | 61.4(3.1Pro) | - |
| | AVMeme | 69.5 | 60.6 | 61.2 | 50.7 | 77.3(3.1Pro) | - |
| | JointAVBench | 69.5 | 56.7 | 62.3 | 52.7 | - | - |
| | WorldSense | 67.3 | 57.0 | 57.0 | 52.7 | 65.5(3.1Pro) | - |
| Capability | Benchmark | Seed2.0 Lite(0428) | Gemini-3.1-Pro |
| --- | --- | --- | --- |
| Audio Understanding | MMSU | 86.54 | 85.94 |
| | WildSpeech | 75.81 | 75.41 |
| ASR | WenetSpeech test-net | 4.47 | 9.52 |
| | WenetSpeech test-meeting | 5.31 | 12.80 |
| | Librispeech test-clean | 1.07 | 1.94 |
| | Librispeech test-other | 2.17 | 3.60 |
| S2TT | Fleurs(15 langs)(zh/en<->xx) | 74.70 | 73.14 |
ASR performance is evaluated using WER/CER; lower values indicate better performance.
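For readers unfamiliar with these metrics, here is a minimal sketch of how WER (word error rate) and CER (character error rate) are conventionally computed: the edit distance between the reference transcript and the model output, divided by the reference length. This is the standard textbook definition, not the evaluation code used for the table; the function names are illustrative, and details such as text normalization (assumed here to be whitespace tokenization only) vary between evaluation setups.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces (commonly used for Chinese)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(wer("the cat sat on the mat", "the cat sat on mat"))
    # One substituted character out of six reference characters.
    print(cer("今天天气很好", "今天天汽很好"))
```

Both functions return a ratio; multiply by 100 to express it as a percentage. Because insertions count as errors, the value can exceed 1 when the output is much longer than the reference.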