首页模型博客&论文加入我们
EN
中文
首页模型博客&论文加入我们

2024-12-08

MaskBit: Embedding-free Image Generation via Bit Tokens

Download PDF
上一篇下一篇

摘要

Masked transformer models for class-conditional image generation have become a compelling alternative to diffusion models. Typically comprising two stages - an initial VQGAN model for transitioning between latent space and image space, and a subsequent Transformer model for image generation within latent space - these frameworks offer promising avenues for image synthesis. In this study, we present two primary contributions: Firstly, an empirical and systematic examination of VQGANs, leading to a modernized VQGAN. Secondly, a novel embedding-free generation network operating directly on bit tokens - a binary quantized representation of tokens with rich semantics. The first contribution furnishes a transparent, reproducible, and high-performing VQGAN model, enhancing accessibility and matching the performance of current state-of-the-art methods while revealing previously undisclosed details. The second contribution demonstrates that embedding-free image generation using bit tokens achieves a new state-of-the-art FID of 1.52 on the ImageNet 256x256 benchmark, with a compact generator model of mere 305M parameters. The code for this project is available on this https URL[https://github.com/markweberdev/maskbit].

作者

Mark Weber, Lijun Yu, Qihang Yu, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen

期刊/会议

ICLR 2025

模型成果
Seed2.0Seedance 2.0Seedream 5.0 LiteSeeduplexSeed GR-RL
研究团队
LLMInfrastructuresVisionSpeechMultimodal Interaction & World ModelAI for ScienceRoboticsResponsible AI
了解更多
博客Seed Edge校园招聘
模型成果
Seed2.0
Seedance 2.0
Seedream 5.0 Lite
Seeduplex
Seed GR-RL
研究团队
LLM
Infrastructures
Vision
Speech
Multimodal Interaction & World Model
AI for Science
Robotics
Responsible AI
了解更多
博客
Seed Edge
校园招聘
追求智能上限,创造社会价值
欢迎加入字节跳动 Seed
Copyright © 2026 Bytedance Seed
网站声明
联系我们 : seed.feedback@bytedance.com
欢迎加入字节跳动 Seed
Copyright © 2026 Bytedance Seed
网站声明