AI for Science
The Seed AI for Science team focuses on exploratory frontier research in scientific computing, spanning biological foundation models, quantum chemistry, and molecular dynamics, using AI to drive paradigm shifts in scientific research.
Research Directions
Multimodal Biological Foundation Models
Developing multimodal foundation models for the natural sciences, applied to the design, conformation generation, and structure prediction of biomolecules such as proteins, DNA, and RNA
Multimodal Foundation Models
Natural Sciences
Quantum Chemistry
Focusing on research at the intersection of machine learning, quantum physics, and quantum chemistry to enable large-scale, high-accuracy numerical simulation in scientific computing
Machine Learning
Quantum Physics
Quantum Chemistry
Multimodal Biomolecular Structure Models
Building structure-centric biomolecular foundation models that support key tasks across all biomolecule types (proteins, DNA, RNA, small molecules, ions, and post-translational modifications), including complex structure and dynamics prediction, functional modeling, and molecular design, and developing the globally influential Protenix open-source model series
Biomolecular Structure
Foundation Model
Open-source Model
AI Molecular Dynamics
Exploring machine learning methods for force-field development, molecular dynamics simulation, enhanced sampling, and other property-calculation approaches, and applying them at scale to drug and materials discovery
Machine Learning
Molecular Dynamics
Drug Discovery
Materials Science

Selected Papers

2025.09.02
PXDesign: Fast, Modular, and Accurate De Novo Design of Protein Binders
PXDesign achieves nanomolar binder hit rates of 20–73% across five of six diverse protein targets, surpassing prior methods such as AlphaProteo. This experimental success rate is enabled by advances in both binder generation and filtering. We develop both a diffusion-based generative model (PXDesign-d) and a hallucination-based approach (PXDesign-h), each showing strong in silico performance that outperforms existing models. Beyond generation, we systematically analyze confidence-based filtering and ranking strategies from multiple structure predictors, comparing their accuracy, efficiency, and complementarity on datasets spanning de novo binders and mutagenesis. Finally, we validate the full design process experimentally, achieving high hit rates and multiple nanomolar binders. To support future work and community use, we release a unified benchmarking framework at https://github.com/bytedance/PXDesignBench, provide public access to PXDesign via a webserver at https://protenix-server.com, and share all designed binder sequences at https://protenix.github.io/pxdesign.
Milong Ren, Jinyuan Sun, Jiaqi Guan, Cong Liu, Chengyue Gong, Yuzhe Wang, Lan Wang, Qixu Cai, Xinshi Chen, Wenzhi Xiao, Protenix Team
Molecular Biology
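The confidence-based filtering and ranking step described in the abstract above can be sketched as follows. This is a minimal illustration only: the `Candidate` class, the `iptm`/`plddt` score names, and the thresholds are hypothetical assumptions for exposition, not the actual PXDesign pipeline.

```python
# Minimal sketch: filter candidate binder designs by structure-predictor
# confidence scores, then rank the survivors. All names and thresholds here
# are illustrative assumptions, not the PXDesign implementation.
from dataclasses import dataclass

@dataclass
class Candidate:
    sequence: str
    iptm: float   # interface-confidence score from a structure predictor
    plddt: float  # mean per-residue confidence

def filter_and_rank(candidates, iptm_min=0.8, plddt_min=80.0):
    """Keep candidates above both confidence thresholds; rank by iptm, best first."""
    kept = [c for c in candidates if c.iptm >= iptm_min and c.plddt >= plddt_min]
    return sorted(kept, key=lambda c: c.iptm, reverse=True)

# Toy candidates (sequences truncated for brevity).
designs = [
    Candidate("MKT...", iptm=0.85, plddt=88.0),
    Candidate("GSA...", iptm=0.55, plddt=91.0),  # low interface confidence: filtered out
    Candidate("PLV...", iptm=0.92, plddt=84.0),
]
ranked = filter_and_rank(designs)
print([c.sequence for c in ranked])  # highest-confidence designs first
```

In practice the paper compares confidence scores from multiple structure predictors and studies their complementarity; combining several such scores would replace the single-score ranking shown here.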
2025.06.12
Elucidating the Design Space of Multimodal Protein Language Models
Multimodal protein language models (PLMs) integrate sequence and token-based structural information, serving as a powerful foundation for protein modeling, generation, and design. However, the reliance on tokenizing 3D structures into discrete tokens causes a substantial loss of fidelity in fine-grained structural details and correlations. In this paper, we systematically elucidate the design space of multimodal PLMs to overcome these limitations. We identify tokenization loss and inaccurate structure token predictions by the PLMs as major bottlenecks. To address these, our proposed design space covers improved generative modeling, structure-aware architectures and representation learning, and data exploration. Our advancements introduce finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling. The proposed design methods dramatically improve structure generation diversity and, notably, the folding ability of our 650M model, reducing RMSD from 5.52 to 2.36 on the PDB test set, outperforming 3B baselines and performing on par with specialized folding models.
Cheng-Yen Hsieh, Xinyou Wang, Daiheng Zhang, Dongyu Xue, Fei Ye, Shujian Huang, Zaixiang Zheng, Quanquan Gu
AI for Science
View more papers

Open Positions

Cloud-Native Engineer, Scientific Computing - Seed
Beijing
Experienced / Campus hires
Apply now
Research Scientist, CADD / Structural Biology / Computational Biology Algorithms - Seed
Beijing
Experienced / Campus hires
Apply now
Research Scientist, Biomolecular Structure Foundation Models - Seed
Beijing
Experienced / Campus hires
Apply now
Research Scientist, Machine Learning Algorithms - Seed
Beijing
Experienced / Campus hires
Apply now
Research Scientist, Quantum Chemistry and Machine Learning - Seed
Beijing
Experienced / Campus hires
Apply now
Research Scientist, Multimodal Biological Foundation Models - Seed
Shanghai
Experienced / Campus hires
Apply now