Robotics
The Seed-Robotics team tackles challenges in general-purpose intelligent robotics, developing industry-leading technologies in foundation models, perception, manipulation, and interaction, while incubating intelligent robotic systems.

Research topics

Multimodal robotic foundation models
We research the pre-training, fine-tuning, and optimization of large-scale robotic foundation models. By focusing on the model itself, we aim to expand the boundaries of robot intelligence. We explore advanced issues in multimodal large models and promote their large-scale application in robotics, in areas such as grasping and manipulation, motion control, and world modeling.

Reinforcement learning for robotics
We conduct cutting-edge research on deep reinforcement learning for robots and push forward the large-scale application of the latest reinforcement learning algorithms in robotics.

Robotic data intelligence
We investigate new approaches to overcoming the data bottleneck in robotics. With an end-to-end view across data, model training, evaluation, and deployment, we explore methods to expand data diversity and improve data quality. We also develop automated mechanisms that scale and refine robotic datasets, enabling a more autonomous data engine that efficiently transforms data into robotic intelligence.
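
As a toy illustration of the data-engine idea, the sketch below runs rounds of a collect-filter-grow loop. Everything in it (the Episode type, the success-and-length quality heuristic, the random stand-in rollouts) is an illustrative assumption, not the team's actual pipeline.

```python
# A minimal, self-contained sketch of a "data engine" round: generate candidate
# episodes, keep only those that pass a quality filter, and grow the dataset.
# All names and heuristics here are illustrative, not an actual API.
import random
from dataclasses import dataclass

@dataclass
class Episode:
    length: int      # number of action steps
    success: bool    # did the rollout achieve the task?

def rollout() -> Episode:
    """Stand-in for running the current policy on a robot or in simulation."""
    return Episode(length=random.randint(20, 200), success=random.random() < 0.4)

def quality_score(ep: Episode) -> float:
    """Toy heuristic: success matters most; shorter episodes score higher."""
    return (1.0 if ep.success else 0.0) / (1.0 + 0.005 * ep.length)

def data_engine_round(dataset: list, n: int = 100, thresh: float = 0.5) -> list:
    """One round: collect candidates, filter by quality, extend the dataset."""
    candidates = [rollout() for _ in range(n)]
    dataset.extend(ep for ep in candidates if quality_score(ep) >= thresh)
    return dataset

dataset = []
for round_id in range(3):
    dataset = data_engine_round(dataset)
    print(f"round {round_id}: dataset size = {len(dataset)}")
```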
Selected Papers

Dec 02, 2025
GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation
We present GR-RL, a robotic learning framework that turns a generalist vision-language-action (VLA) policy into a highly capable specialist for long-horizon dexterous manipulation. Existing VLA policies are built on the assumption that human demonstrations are optimal. However, we claim that in highly dexterous and precise manipulation tasks, human demonstrations are noisy and suboptimal. GR-RL proposes a multi-stage training pipeline that filters, augments, and reinforces the demonstrations via reinforcement learning. First, GR-RL learns a vision-language-conditioned task progress function, filters the demonstration trajectories, and keeps only the transitions that contribute positively to progress. Specifically, we show that by directly applying offline RL with sparse rewards, the resulting Q-values can be treated as a robust progress function. Next, we introduce morphological symmetry augmentation, which greatly improves the generalization and performance of GR-RL. Lastly, to better align the VLA policy with its deployment behaviors for high-precision control, we perform online RL by learning a latent-space noise predictor. With this pipeline, GR-RL is, to our knowledge, the first learning-based policy that can autonomously lace up a shoe by threading shoelaces through multiple eyelets with an 83.3% success rate, a task requiring long-horizon reasoning, millimeter-level precision, and compliant soft-body interaction. We hope GR-RL provides a step toward enabling generalist robot foundation models to specialize into reliable real-world experts.
Yunfei Li, Xiao Ma, Jiafeng Xu, Yu Cui, Zhongren Cui, Zhigang Han, Liqun Huang, Tao Kong, Yuxiao Liu, Hao Niu, Wanli Peng, Jingchao Qiao, Zeyu Ren, Haixin Shi, Zhi Su, Jiawen Tian, Yuyang Xiao, Shenyu Zhang, Liwei Zheng, Hang Li, Yonghui Wu
Robotics
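
The progress-based filtering described in the abstract can be sketched as follows: given some learned progress estimate (in GR-RL, Q-values from offline RL with sparse rewards; here a toy placeholder), keep only the transitions along which estimated progress increases. This is an illustrative reading of the idea, not the paper's exact formulation.

```python
# Illustrative sketch (not the paper's exact method): treating a learned value
# as a task-progress signal and keeping only demonstration transitions along
# which that progress increases.
import numpy as np

rng = np.random.default_rng(0)

def q_value(state: np.ndarray) -> float:
    """Stand-in for a learned Q/progress function; here a toy scalar score.
    In GR-RL this would come from offline RL on the demonstrations."""
    return float(state.sum())  # placeholder: any monotone progress proxy

def filter_transitions(trajectory: list) -> list:
    """Keep (s, s') pairs where the estimated progress strictly increases."""
    kept = []
    for s, s_next in zip(trajectory, trajectory[1:]):
        if q_value(s_next) > q_value(s):
            kept.append((s, s_next))
    return kept

# Toy demonstration: a noisy trajectory whose underlying progress trends upward.
traj = [np.array([t + rng.normal(scale=2.0)]) for t in range(50)]
print(f"kept {len(filter_transitions(traj))} of {len(traj) - 1} transitions")
```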

Jul 21, 2025
GR-3 Technical Report
We report our recent progress towards building generalist robot policies: the development of GR-3. GR-3 is a large-scale vision-language-action (VLA) model. It showcases exceptional capabilities in generalizing to novel objects, environments, and instructions involving abstract concepts. Furthermore, it can be efficiently fine-tuned with minimal human trajectory data, enabling rapid and cost-effective adaptation to new settings. GR-3 also excels in handling long-horizon and dexterous tasks, including those requiring bi-manual manipulation and mobile movement, showcasing robust and reliable performance. These capabilities are achieved through a multi-faceted training recipe that includes co-training with web-scale vision-language data, efficient fine-tuning from human trajectory data collected via VR devices, and effective imitation learning with robot trajectory data. In addition, we introduce ByteMini, a versatile bi-manual mobile robot designed with exceptional flexibility and reliability, capable of accomplishing a wide range of tasks when integrated with GR-3. Through extensive real-world experiments, we show GR-3 surpasses the state-of-the-art baseline method, π0, on a wide variety of challenging tasks. We hope GR-3 can serve as a step towards building generalist robots capable of assisting humans in daily life.
Seed Robotics Team
Robotics
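
The co-training recipe mixes heterogeneous data sources. As a hedged sketch of what such a sampler might look like, the snippet below draws mixed batches from three sources by fixed ratios; the source names, ratios, and batch format are assumptions for illustration, not the published training configuration.

```python
# Sketch of a multi-source co-training sampler in the spirit of GR-3's recipe
# (web-scale vision-language data + VR-collected human trajectories + robot
# trajectories). Sources and weights below are illustrative assumptions.
import random

SOURCES = {
    "web_vision_language": 0.5,    # e.g., image-text pairs for generalization
    "human_vr_trajectories": 0.2,  # teleop demonstrations from VR devices
    "robot_trajectories": 0.3,     # on-robot imitation-learning data
}

def sample_batch(batch_size: int = 8) -> list:
    """Draw a mixed batch; each item is tagged with its originating source."""
    names = list(SOURCES)
    weights = [SOURCES[n] for n in names]
    return random.choices(names, weights=weights, k=batch_size)

print(sample_batch())  # e.g., ['web_vision_language', 'robot_trajectories', ...]
```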

Jul 04, 2025
Dexterous Teleoperation of 20-DoF ByteDexter Hand via Human Motion Retargeting
Replicating human-level dexterity remains a fundamental robotics challenge, requiring integrated solutions from mechatronic design to the control of high degree-of-freedom (DoF) robotic hands. While imitation learning shows promise in transferring human dexterity to robots, the efficacy of trained policies relies on the quality of human demonstration data. We bridge this gap with a hand-arm teleoperation system featuring: (1) a 20-DoF linkage-driven anthropomorphic robotic hand for biomimetic dexterity, and (2) an optimization-based motion retargeting for real-time, high-fidelity reproduction of intricate human hand motions and seamless hand-arm coordination. We validate the system via extensive empirical evaluations, including dexterous in-hand manipulation tasks and a long-horizon task requiring the organization of a cluttered makeup table randomly populated with nine objects. Experimental results demonstrate its intuitive teleoperation interface with real-time control and the ability to generate high-quality demonstration data. Please refer to the accompanying video for further details.
Ruoshi Wen, Jiajun Zhang, Guangzeng Chen, Zhongren Cui, Min Du, Yang Gou, Zhigang Han, Junkai Hu, Liqun Huang, Hao Niu, Wei Xu, Haoxiang Zhang, Zhengming Zhu, Hang Li, Zeyu Ren
Robotics
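
Optimization-based retargeting of the kind the paper describes boils down to solving for joint angles that minimize fingertip tracking error subject to joint limits. The toy problem below does this for a planar two-joint finger with scipy; the real system retargets a full 20-DoF hand in real time, so this shows only the structure, not the method's details. In practice the cost would cover all fingertips and typically adds regularization for smoothness across frames.

```python
# Minimal sketch of optimization-based motion retargeting: solve for joint
# angles q so that a toy 2-link "finger" fingertip tracks a human fingertip
# keypoint, subject to joint limits. Link lengths and limits are assumed.
import numpy as np
from scipy.optimize import minimize

L1, L2 = 0.04, 0.03  # toy link lengths in meters (assumed)

def fingertip(q: np.ndarray) -> np.ndarray:
    """Forward kinematics of a planar two-joint finger."""
    return np.array([
        L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
        L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1]),
    ])

def retarget(target: np.ndarray, q_init: np.ndarray) -> np.ndarray:
    """Find joint angles minimizing fingertip tracking error within limits."""
    cost = lambda q: float(np.sum((fingertip(q) - target) ** 2))
    bounds = [(0.0, np.pi / 2)] * 2  # toy joint limits
    res = minimize(cost, q_init, bounds=bounds, method="L-BFGS-B")
    return res.x

human_keypoint = np.array([0.05, 0.02])  # assumed human fingertip position
q = retarget(human_keypoint, q_init=np.zeros(2))
print("joint angles:", q, "fingertip:", fingertip(q))
```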
Technical capability demonstration

Seed GR-RL
GR-RL is a reinforcement learning framework designed for long-horizon, dexterous manipulation, enabling robots to reliably execute multi-step, high-precision tasks in real-world settings. It is the first in the industry to achieve continuous end-to-end shoe-lacing on a full shoe. The system can automatically retry, adjust the scene, and detect and correct misalignment, and it generalizes across shoes of different colors, sizes, and materials.

Seed GR-3
GR-3 is a large-scale vision-language-action (VLA) model. It generalizes to novel objects, environments, and instructions involving abstract concepts. Furthermore, it can be efficiently fine-tuned with minimal human trajectory data, enabling rapid and cost-effective adaptation to new settings. GR-3 also excels at long-horizon and dexterous tasks, including those requiring bi-manual mobile manipulation skills, showcasing robust and reliable performance.

ByteDexter
ByteDexter is a dexterous robotic hand with 21 active degrees of freedom (DoF), fitted with highly sensitive tactile sensors. When paired with its dedicated teleoperation system, it can reproduce complex human hand movements in real time with high fidelity, while enabling seamless hand-arm coordination.