Overview

The Seed2.1 model family is officially released, offering two AI productivity models in different sizes: Pro and Turbo.

Seed2.1 brings major improvements to both general agents and code engineering. For high-value office work and complex everyday consultations, it reliably handles multi-step tasks—from project planning and file processing to tool use—and delivers practical results across tools and environments. In code engineering, it strengthens end-to-end delivery, enabling more reliable requirement understanding, coding, debugging, and validation in real development workflows.

The model family also advances core capabilities in knowledge, reasoning, and multimodal understanding, handling complex visual and video content more accurately. Together, these improvements provide a stronger foundation for agentic scenarios, code engineering, and frontier exploration.

Model Performance

Seed2.1 delivers significantly stronger general-agent capabilities, with standout performance on benchmarks such as GDPVal and more reliable delivery for high-value tasks.

Seed2.1 greatly improves delivery stability in coding tasks, independently handling software architecture design and code implementation.

Seed2.1 continues to strengthen visual understanding, spatial reasoning, and long-context processing, helping agents make steady progress on complex tasks.

Seed2.1 achieves SOTA results across multiple video understanding benchmarks and continues to improve hour-long video processing, with more accurate understanding of temporal changes, actions, and physical motion.

Showcase

Seed2.1 further strengthens agent execution and coding delivery, enabling the model to make steady progress on complex real-world productivity tasks and deliver practical, verifiable results.

Seed2.1 further connects multimodal perception, understanding, and execution, enabling multimodal inputs to directly support generation, editing, and task execution.

Evaluation Results

Across a broad set of benchmarks, Seed2.1 delivers strong, competitive performance.

Benchmark             Capability                    Seed2.1 Pro  Seed2.1 Turbo  Claude Opus 4.7  GPT-5.5       Gemini 3.1 Pro
KINA                  Knowledge                     48.3         46.6           46.7             52.6          53.2
SuperGPQA             Knowledge                     70.8         67.4           68.5             72.7          76.6
BeyondAIME            Reasoning                     87.0         88.0           79.0             91.0          90.0
Workspace Bench       High-Economic-Value           53.0         54.7           55.1             58.7          32.8
Agent Startup Bench   High-Economic-Value           68.8         54.0           62.3             68.1          45.7
xDailyBench           White-Collar Office Work      61.0         56.4           69.0             73.0          35.2
NL2Repo-Bench         Long-Horizon End-to-End Code  47.0         43.7           58.2             45.1          33.4
ProgramBench          Long-Horizon End-to-End Code  0/1/50.3     0/0/49.4       0/2.5/52.1       0.5/5.5/65.9  0/1/40.7
Terminal Bench 2.1    Terminal Usage                71.0         67.6           71.7             73.8          70.7
SWE-Atlas             Debugging                     35.2         30.6           38.7             44.7          23.6
MathVision (w. Tool)  MultiModal Reasoning          92.6 (94.5)  90.1 (92.7)    83.1             92.2          89.2
MMMU-Pro (w. Tool)    MultiModal STEM               81.6 (82.7)  80.1 (82.2)    74.0             81.2          80.5
WorldVQA              Visual Knowledge              53.0         48.6           35.9             34.6          44.3
ZEROBench (w. Tool)   Visual Puzzle                 18.0 (22.0)  11.0 (20.0)    8.0              13.0          12.0
BabyVision            Perception                    73.7         62.9           22.2             55.9          54.4
CharXiv-RQ (w. Tool)  Infographics                  85.4 (86.4)  82.5 (83.6)    82.1             83.2          83.5
ERQA                  Spatial Reasoning             72.0         71.3           52.5             64.5          70.8

Row with fewer values than model columns, in source order:
MMLongBench-128K (MultiModal Long-Context): 78.3, 76.9, 70.7
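As a rough illustration of how the scores in the table above can be compared, the sketch below copies a handful of rows and reports the leading model per benchmark and how far a given model trails it. The names `MODELS`, `SCORES`, `leader`, and `gap_to_best` are hypothetical helpers introduced here for illustration. The sketch also assumes that a higher score is better on every listed benchmark, which the source does not state explicitly.

```python
# Model order follows the column order of the benchmark table.
MODELS = ["Seed2.1 Pro", "Seed2.1 Turbo", "Claude Opus 4.7", "GPT-5.5", "Gemini 3.1 Pro"]

# A few rows copied from the table (assumption: higher is better for all of them).
SCORES = {
    "BeyondAIME": [87.0, 88.0, 79.0, 91.0, 90.0],
    "Agent Startup Bench": [68.8, 54.0, 62.3, 68.1, 45.7],
    "NL2Repo-Bench": [47.0, 43.7, 58.2, 45.1, 33.4],
    "WorldVQA": [53.0, 48.6, 35.9, 34.6, 44.3],
    "ERQA": [72.0, 71.3, 52.5, 64.5, 70.8],
}


def leader(benchmark):
    """Return (model, score) for the highest score on one benchmark."""
    scores = SCORES[benchmark]
    best = max(range(len(scores)), key=scores.__getitem__)
    return MODELS[best], scores[best]


def gap_to_best(benchmark, model):
    """Return how many points `model` trails the best score (0.0 if it leads)."""
    scores = SCORES[benchmark]
    return round(max(scores) - scores[MODELS.index(model)], 1)


if __name__ == "__main__":
    for name in SCORES:
        top_model, top_score = leader(name)
        gap = gap_to_best(name, "Seed2.1 Pro")
        print(f"{name}: best is {top_model} ({top_score}); Seed2.1 Pro trails by {gap}")
```

Rows with composite values (such as ProgramBench) or tool-assisted scores in parentheses are left out of the sketch because they would need their own parsing rule.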

Benchmark   Capability            Seed2.1 Pro  Seed2.1 Turbo  Gemini 3.5 Flash  Gemini 3.1 Pro
TOMATO      Motion & Perception   79.5         56.8           71.9              60.4
Minerva     Video Reasoning       70.7         65.9           68.6              63.5
OVOBench    Streaming             80.7         79.2           64.5              64.1

Rows with fewer values than model columns, in source order:
VideoMME (Long Video Understanding): 89.2, 87.2, 86.7
VideoSimpleQA (Knowledge): 76.4, 71.4