StreetForward: Perceiving Dynamic Street with Feedforward Causal Attention

Li Auto Inc., Zhejiang University

Abstract

We present StreetForward, a pose-free and tracker-free feedforward framework for dynamic street reconstruction. Building upon the alternating attention mechanism from Visual Geometry Grounded Transformer (VGGT), we propose a simple yet effective temporal mask attention module that captures dynamic motion information from image sequences and produces motion-aware latent representations. Static content and dynamic instances are represented uniformly with 3D Gaussian Splatting, and are optimized jointly by cross-frame rendering with spatio-temporal consistency, allowing the model to infer per-pixel velocities and produce high-fidelity novel views at new poses and times. We train and evaluate our model on the Waymo Open Dataset, demonstrating superior performance on novel view synthesis and depth estimation compared to existing methods. Furthermore, zero-shot inference on CARLA and other datasets validates the generalization capability of our approach.

Pipeline

StreetForward Pipeline

The input video is first encoded into per-frame patchified features and then processed by L alternating global- and frame-attention blocks to aggregate information across frames. The aggregated features are directly decoded by a camera head, a depth head, and a Gaussian head to obtain poses, depth, and Gaussian attributes. A causal masked attention module is then introduced to form motion-aware features, which are used to estimate both forward and backward motion as well as a dynamic mask for separating static and dynamic Gaussians. The final 4D scene is obtained by combining the static Gaussians with the dynamic Gaussians propagated across time using the predicted motion.
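The two key operations above, causal masked attention over per-frame features and time-propagation of dynamic Gaussians, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, single-head attention, and flat (T, D) feature layout are simplifying assumptions, and the real model operates on patchified tokens inside a transformer.

```python
import numpy as np

def causal_mask_attention(x):
    """Single-head self-attention where frame t attends only to frames <= t.

    x: (T, D) array of per-frame features (a stand-in for patch tokens).
    Returns motion-aware features of the same shape.
    """
    T, D = x.shape
    scores = x @ x.T / np.sqrt(D)            # (T, T) similarity scores
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                   # block attention to future frames
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

def propagate_dynamic_gaussians(means, velocity, dt):
    """Advance dynamic Gaussian centers to a new time with predicted velocity.

    means: (N, 3) Gaussian centers; velocity: (N, 3) per-Gaussian velocity.
    A linear motion model is assumed here for illustration.
    """
    return means + velocity * dt
```

Because of the causal mask, the first frame's output depends only on itself, while later frames mix in information from all preceding frames; the propagated dynamic Gaussians are then rendered together with the static ones at the queried time.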

Qualitative Comparison

BibTeX

@article{yu2026streetforward,
  title={StreetForward: Perceiving Dynamic Street with Feedforward Causal Attention},
  author={Yu, Zhongrui and Wang, Zhao and Xie, Yijia and Wang, Yida and Zhang, Xueyang and Zhan, Yifei and Zhan, Kun},
  journal={arXiv preprint arXiv:2603.19552},
  year={2026}
}