Computer Vision

[arXiv'2512] Efficiently Reconstructing Dynamic Scenes One 🎯 D4RT at a Time: Reading Report

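3D reconstruction of dynamic scenes has long been a tough nut to crack, typically requiring a stack of separate models for optical flow, depth, pose, and more. D4RT, just released by Google DeepMind, takes a radically simple approach: it reduces all geometric tasks to a single general-purpose "coordinate query" function. It not only handles SLAM, reconstruction, and tracking in a single feed-forward pass, but also runs at a striking 200+ FPS.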

[ICCV'25 Highlight] Shape of Motion: 4D Reconstruction from a Single Video: Reading Report

This paper proposes a new method for reconstructing dynamic scenes and estimating long-range 3D motion trajectories from monocular video.

[arXiv'2512] Generative Video Motion Editing with 3D Point Tracks: Reading Report

This paper proposes a generative video editing framework based on 3D point tracks that allows precise, simultaneous control over both camera motion and object motion.

4D Dynamic Scene Reconstruction: A Survey of Research Directions and Topic Suggestions

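Some thoughts on choosing a research topic in 4D Reconstruction & Generation as a PhD student today, given the limited compute available in a university research environment and a target of CVPR 2026.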

[ICCV'25] St4RTrack: Simultaneous 4D Reconstruction and Tracking in the World: Reading Report

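This paper proposes a feed-forward framework that introduces a novel time-dependent pointmap representation and uses a dual-branch Transformer architecture to perform dense tracking and 3D reconstruction of dynamic scenes simultaneously in a unified world coordinate frame.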

[ICCV'25 Oral] Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction: Reading Report

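This paper proposes a clever "motion decoupling" mechanism: a learned 3D tracker separates each dynamic object's own motion from the observed motion, so that classical Bundle Adjustment can, for the first time, be applied uniformly to scenes containing dynamic objects. This greatly improves camera pose accuracy and 3D reconstruction quality in dynamic scene reconstruction.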

[ICCV'25] SpatialTrackerV2: 3D Point Tracking Made Easy: Reading Report

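This paper proposes a feed-forward 3D point tracking architecture that models video depth, camera pose, and object motion jointly and optimizes them end-to-end. Through scalable training on 17 heterogeneous datasets, it achieves SOTA 3D tracking accuracy and inference speed.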

A Brief Exploration of the Variational Autoencoder (VAE) with Code Implementation

Learn the variational autoencoder (VAE) by reading and analyzing the paper “Auto-Encoding Variational Bayes”. This post introduces the basics of the VAE, including the derivation of its formulas and a simple code verification.

[NeurIPS'19 Oral] Generative Modeling by Estimating Gradients of the Data Distribution: Reading Report

This paper introduces a new generative model in which samples are produced via Langevin dynamics, using gradients of the data distribution estimated with score matching. It is an important paper for understanding score-based generative networks and Itô diffusion SDEs.

[T-PAMI'23] Image Super-Resolution via Iterative Refinement: Reading Report

Image super-resolution with a conditional diffusion model.