Both Sora and Stable Diffusion 3 adopt diffusion transformers, but do we really need a very large DiT at every sampling step during generation?

No

Introducing Trajectory Stitching (T-Stitch), a training-free method that complements existing efficient sampling methods by dynamically allocating computation across the denoising steps.
Paper:
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model. In this paper, we introduce sampling Trajectory Stitching (T-Stitch), a simple yet efficient technique to improve the sampling efficiency...
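The core idea, as the paper describes it, is to run a cheaper denoiser on the early denoising steps and switch to the large DiT for the later steps, since models trained on the same data tend to produce similar trajectories early on while the large model matters most for final details. Below is a minimal, illustrative sketch of that stitching loop (not the official NVlabs/T-Stitch code); the `small_dit`, `large_dit`, `timesteps`, and `alphas_cumprod` names are assumptions, standing in for any pair of interchangeable noise-prediction models and a standard DDPM noise schedule.

```python
# Illustrative sketch of trajectory stitching, NOT the official T-Stitch implementation.
# Assumes small_dit and large_dit are interchangeable denoisers taking (x_t, t) and
# returning a noise prediction, and alphas_cumprod is a standard DDPM schedule tensor.
import torch

@torch.no_grad()
def t_stitch_sample(small_dit, large_dit, x_T, timesteps, alphas_cumprod, switch_frac=0.5):
    """DDIM-style sampling that uses the small model for the first `switch_frac`
    of the steps (early, global structure) and the large model for the rest
    (late, high-frequency detail)."""
    x = x_T
    switch_step = int(len(timesteps) * switch_frac)
    for i, t in enumerate(timesteps):  # timesteps in descending order
        denoiser = small_dit if i < switch_step else large_dit  # the stitch point
        eps = denoiser(x, t)
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[timesteps[i + 1]] if i + 1 < len(timesteps) else torch.tensor(1.0)
        # Deterministic DDIM update (eta = 0).
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
    return x
```

Because the switch fraction is just a scalar, it gives a training-free speed/quality knob: `switch_frac=0` recovers pure large-model sampling, while larger values shift more of the trajectory onto the cheap model.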

GitHub - NVlabs/T-Stitch: [ICLR 2025] Official PyTorch implementation of the paper "T-Stitch: Accelerating Sampling in Pre-trained Diffusion Models with Trajectory Stitching"