
URL: https://zkf1997.github.io/

Kaifeng Zhao



I am a fourth-year PhD student in the Computer Vision and Learning Group (VLG) at ETH Zürich, supervised by Prof. Siyu Tang. I am currently interning at the NVIDIA Spatial Intelligence Lab. Additionally, I have had the pleasure of collaborating with Thabo Beeler. I obtained my Master's degree in Computer Science with distinction from ETH Zürich in 2022, and my Bachelor's degree in Computer Science from Beihang University in 2019.

My research focuses on the intersection of computer vision and computer graphics, particularly in human motion modeling and the synthesis of human-scene interaction behaviors. My research is supported by the Swiss Data Science Center (SDSC) PhD fellowship.



Kaifeng Zhao, ,
ICLR 2025,
Project page
Code
arXiv

DartControl achieves high-quality and efficient (> 300 frames per second) motion generation conditioned on online streams of text prompts. Furthermore, by integrating latent space optimization and reinforcement learning-based controls, DartControl enables various motion generation applications with spatial constraints and goals, including motion in-between, waypoint goal reaching, and human-scene interaction generation.


Marko Mihajlovic, Siwei Zhang, Gen Li, Kaifeng Zhao, Lea Müller,
ICCV 2025,
Project page
Code
arXiv

VolumetricSMPL is a lightweight extension that adds volumetric capabilities to SMPL(-X) models for efficient 3D interactions and collision detection.


, Yutong Chen*, Yiqian Wu*, Kaifeng Zhao*, ,
ICCV 2025
Project page
Code
arXiv

EgoM2P: A large-scale egocentric multimodal and multitask model, pretrained on eight extensive egocentric datasets.


, Kaifeng Zhao, , , , , ,
CVPR 2024,
Project page
Code
arXiv

EgoGen is a new synthetic data generator that can produce accurate and rich ground-truth training data for egocentric perception tasks.


Kaifeng Zhao, , , ,
ICCV 2023
Project page
Code
arXiv

In this work, we propose a method to generate a sequence of natural human-scene interaction events in real-world complex scenes as illustrated in this figure. The human first walks to sit on a stool ( to ), then walks to another chair to sit down ( to ), and finally walks to and lies on the sofa ( to ).


Kaifeng Zhao, , , ,
ECCV 2022
Project page
Code
arXiv

We propose COINS, for COmpositional INteraction Synthesis with Semantic Control. Given a pair of action and object instance as the semantic specification, our method generates virtual humans naturally interacting with the scene objects.

  • Reviewer for ICCV, CVPR, ECCV, 3DV, NeurIPS, SIGGRAPH Asia, ICLR, TPAMI.

Template adapted from Siwei Zhang's website.