Tencent Unveils Voyager: A High‑Power AI Model for Turning Video Into 3D Worlds

Key Points
- Voyager converts video into explorable 3D worlds using automatic camera‑motion and depth analysis.
- Training leveraged over 100,000 video clips from real recordings and Unreal Engine renders.
- Minimum hardware requirement is 60 GB GPU memory for 540p output; 80 GB is recommended.
- Multi‑GPU setups (eight GPUs) run about 6.69× faster than single‑GPU configurations.
- License blocks use in the EU, UK and South Korea; large‑scale commercial use needs separate licensing.
- The WorldScore benchmark gave Voyager the top overall score of 77.62, with strong marks for object control and style consistency.
- Voyager trails only in camera control, scoring 85.95 to WonderWorld’s 92.98.
- The model’s high computational load may limit immediate real‑time deployment.

Tencent has released Voyager, an AI model that converts video footage into navigable 3D environments. Built within the Hunyuan ecosystem, Voyager learned camera motion and depth from more than 100,000 video clips without manual labeling. The system needs at least 60 GB of GPU memory for 540p output (80 GB recommended) and runs fastest on multi-GPU setups. The license bars use in the EU, UK and South Korea, and large-scale commercial deployments require separate agreements. On Stanford’s WorldScore benchmark, Voyager posted the highest overall score, 77.62, excelling in object control, style consistency and subjective quality, though it trails WonderWorld in camera control.
Overview of Voyager
Tencent’s new AI model, Voyager, extends the company’s Hunyuan suite, which already includes Hunyuan3D‑2 for text‑to‑3D generation and HunyuanVideo for video synthesis. Voyager focuses on converting existing video clips into three‑dimensional worlds that can be explored interactively.
Training Methodology
Researchers built software that automatically analyzes video footage to extract camera movements and compute per‑frame depth. This approach removed the need for labor‑intensive manual labeling of thousands of hours of footage. The system processed more than 100,000 video clips drawn from both real‑world recordings and renders generated with the Unreal Engine.
Hardware Requirements
Running Voyager at a resolution of 540p requires a minimum of 60 GB of GPU memory, and Tencent recommends 80 GB for best results. The model runs on single-GPU or multi-GPU configurations; an eight-GPU setup processes video roughly 6.69 times faster than a single GPU.
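As a rough sketch of what those numbers imply, the snippet below checks a GPU's memory against the stated thresholds and computes the parallel efficiency of the reported eight-GPU speedup. The thresholds and the 6.69× figure come from the article; the helper names are illustrative, not part of Voyager's tooling.

```python
# Thresholds stated by Tencent for 540p output.
MIN_GB = 60   # minimum GPU memory
REC_GB = 80   # recommended GPU memory

def meets_requirements(gpu_mem_gb: float) -> str:
    """Classify a GPU's memory against the published thresholds."""
    if gpu_mem_gb >= REC_GB:
        return "recommended"
    if gpu_mem_gb >= MIN_GB:
        return "minimum"
    return "insufficient"

def parallel_efficiency(speedup: float, n_gpus: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / n_gpus

print(meets_requirements(80))                  # recommended
print(round(parallel_efficiency(6.69, 8), 2))  # 0.84
```

The 6.69× speedup on eight GPUs works out to roughly 84% of ideal linear scaling, which is why multi-GPU setups are the practical configuration for this model.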
Licensing Restrictions
The model’s license prohibits usage in the European Union, the United Kingdom and South Korea. Additionally, any commercial deployment serving more than 100 million monthly active users must obtain a separate licensing agreement from Tencent.
Benchmark Performance
In the WorldScore benchmark created by Stanford University researchers, Voyager achieved the highest overall score of 77.62, surpassing WonderWorld’s 72.69 and CogVideoX‑I2V’s 62.15. Voyager excelled in object control (66.92), style consistency (84.89) and subjective quality (71.09). It placed second in camera control with a score of 85.95, behind WonderWorld’s 92.98.
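To put those figures side by side, this short sketch computes Voyager's margin over WonderWorld on each axis, using only the scores quoted above; the variable names are my own.

```python
# Scores reported for the WorldScore benchmark (higher is better).
overall = {"Voyager": 77.62, "WonderWorld": 72.69, "CogVideoX-I2V": 62.15}
camera_control = {"Voyager": 85.95, "WonderWorld": 92.98}

# Voyager leads overall but trails WonderWorld on camera control.
overall_margin = round(overall["Voyager"] - overall["WonderWorld"], 2)
camera_margin = round(camera_control["Voyager"] - camera_control["WonderWorld"], 2)
print(overall_margin)  # 4.93
print(camera_margin)   # -7.03
```

A 4.93-point overall lead against a 7.03-point deficit in camera control is the trade-off the benchmark surfaces: Voyager wins on aggregate quality while WonderWorld retains the edge in camera handling.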
Deployment Considerations
Despite strong benchmark results, the model’s computational demands present challenges for widespread adoption. Developers seeking faster inference can leverage the xDiT framework for parallel processing across multiple GPUs.
Future Outlook
Voyager’s ability to generate coherent 3D worlds from video marks a step toward more immersive generative experiences, though real-time interactive applications may remain some way off until its hardware demands come down.