Project Page / Preprint 2026
MeMix: Writing Less, Remembering More for Streaming 3D Reconstruction
¹Institute for AI Industry Research, Tsinghua University; ²Zhejiang University
* Equal contribution. † Corresponding author.
TL;DR: Training-free selective memory updates for long-horizon recurrent streaming 3D reconstruction.
Abstract
3D reconstruction is a fundamental task in 3D vision and a core capability for spatial intelligence. Streaming 3D reconstruction in particular is central to real-time spatial perception, yet existing recurrent online models often degrade progressively on long sequences because of state drift and forgetting, which motivates inference-time remedies. We present MeMix, a training-free, plug-and-play module that improves streaming reconstruction by recasting the recurrent state as a Memory Mixture. MeMix partitions the state into multiple independent memory patches and updates only the least-aligned patches while preserving the others exactly. This selective update mitigates catastrophic forgetting while retaining O(1) inference memory, and it requires no fine-tuning or additional learnable parameters, so it applies directly to existing recurrent reconstruction models. Across standard benchmarks (ScanNet, 7-Scenes, KITTI, etc.), under identical backbones and inference settings, MeMix reduces reconstruction completeness error on 7-Scenes by 15.3% on average (up to 40.0%) for streams of 300-500 frames.
Overview of MeMix
Experiments
All tables transcribed from the paper and supplementary material.
Table 1. Unified Memory Update Rules
Shared gate formulation for CUT3R, TTT3R, TTSA3R, and their MeMix variants.
| Model | Memory Update Rule |
|---|---|
| Unified form | $S_t = G_t \odot \widehat{S}_t + (1 - G_t) \odot S_{t-1}$ |
| CUT3R | $G_t = 1$ |
| TTT3R / TTSA3R | $G_t = \beta_t$ |
| CUT3R + MeMix | $G_t = M_t$ |
| TTT3R / TTSA3R + MeMix | $G_t = M_t \odot \beta_t$ |
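The four gate choices in Table 1 can be compared on a toy state. The following is a minimal NumPy sketch, not the authors' implementation: the state layout (`N` memory patches of dimension `D`), the hand-picked binary mask `M`, and the constant per-patch gate `beta` are all illustrative assumptions.

```python
import numpy as np

def gated_update(S_prev, S_hat, G):
    """Unified form from Table 1: S_t = G * S_hat + (1 - G) * S_prev (elementwise)."""
    return G * S_hat + (1.0 - G) * S_prev

# Toy state: N memory patches, each a D-dimensional vector (assumed layout).
N, D = 4, 3
S_prev = np.zeros((N, D))   # previous state S_{t-1}
S_hat = np.ones((N, D))     # candidate state from the backbone

# Hypothetical binary patch mask M_t: only patches 1 and 3 are written.
M = np.array([0.0, 1.0, 0.0, 1.0])[:, None]   # shape (N, 1), broadcast over D
# Hypothetical soft gate beta_t in [0, 1], one value per patch.
beta = np.full((N, 1), 0.5)

S_cut3r = gated_update(S_prev, S_hat, 1.0)             # G_t = 1
S_ttt3r = gated_update(S_prev, S_hat, beta)            # G_t = beta_t
S_cut3r_memix = gated_update(S_prev, S_hat, M)         # G_t = M_t
S_ttt3r_memix = gated_update(S_prev, S_hat, M * beta)  # G_t = M_t * beta_t
```

Wherever the mask is zero the gate is zero, so the corresponding rows of the new state equal the rows of `S_prev` exactly. The mask is also the same shape as the state's patch axis, so it allocates no memory that grows with sequence length.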
Table 2. Sparse 3D Reconstruction Results
7-Scenes-S and NRGBD-S, one frame sampled every two frames.
| Model | MeMix | Input frames | 7-Scenes-S Acc Mean ↓ | 7-Scenes-S Acc Med ↓ | 7-Scenes-S Comp Mean ↓ | 7-Scenes-S Comp Med ↓ | 7-Scenes-S NC Mean ↑ | 7-Scenes-S NC Med ↑ | NRGBD-S Acc Mean ↓ | NRGBD-S Acc Med ↓ | NRGBD-S Comp Mean ↓ | NRGBD-S Comp Med ↓ | NRGBD-S NC Mean ↑ | NRGBD-S NC Med ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VGGT (Offline) [2] | – | 300 | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
| VGGT (Offline) [2] | – | 400 | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
| VGGT (Offline) [2] | – | 500 | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
| StreamVGGT [25] | – | 300 | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
| StreamVGGT [25] | – | 400 | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
| StreamVGGT [25] | – | 500 | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
| CUT3R [3] | × | 300 | 0.141 | 0.096 | 0.076 | 0.034 | 0.543 | 0.564 | 0.234 | 0.139 | 0.074 | 0.018 | 0.575 | 0.614 |
| CUT3R [3] | ✓ | 300 | 0.106 | 0.076 | 0.053 | 0.019 | 0.550 | 0.575 | 0.186 | 0.086 | 0.050 | 0.009 | 0.595 | 0.651 |
| CUT3R [3] | × | 400 | 0.178 | 0.121 | 0.115 | 0.069 | 0.532 | 0.546 | 0.342 | 0.227 | 0.127 | 0.067 | 0.561 | 0.591 |
| CUT3R [3] | ✓ | 400 | 0.147 | 0.100 | 0.076 | 0.039 | 0.540 | 0.559 | 0.321 | 0.180 | 0.099 | 0.031 | 0.565 | 0.594 |
| CUT3R [3] | × | 500 | 0.190 | 0.138 | 0.090 | 0.033 | 0.530 | 0.543 | 0.359 | 0.264 | 0.173 | 0.081 | 0.560 | 0.591 |
| CUT3R [3] | ✓ | 500 | 0.167 | 0.119 | 0.077 | 0.026 | 0.533 | 0.547 | 0.328 | 0.218 | 0.161 | 0.040 | 0.560 | 0.590 |
| TTT3R [11] | × | 300 | 0.040 | 0.025 | 0.024 | 0.005 | 0.567 | 0.602 | 0.101 | 0.044 | 0.025 | 0.005 | 0.610 | 0.678 |
| TTT3R [11] | ✓ | 300 | 0.034 | 0.020 | 0.023 | 0.005 | 0.567 | 0.603 | 0.099 | 0.037 | 0.020 | 0.004 | 0.616 | 0.692 |
| TTT3R [11] | × | 400 | 0.052 | 0.031 | 0.027 | 0.005 | 0.558 | 0.588 | 0.143 | 0.065 | 0.071 | 0.012 | 0.600 | 0.658 |
| TTT3R [11] | ✓ | 400 | 0.043 | 0.025 | 0.026 | 0.005 | 0.560 | 0.590 | 0.146 | 0.066 | 0.070 | 0.018 | 0.602 | 0.665 |
| TTT3R [11] | × | 500 | 0.066 | 0.039 | 0.031 | 0.006 | 0.551 | 0.577 | 0.166 | 0.092 | 0.087 | 0.021 | 0.593 | 0.647 |
| TTT3R [11] | ✓ | 500 | 0.059 | 0.032 | 0.030 | 0.005 | 0.553 | 0.580 | 0.183 | 0.094 | 0.094 | 0.031 | 0.595 | 0.650 |
| TTSA3R [15] | × | 300 | 0.036 | 0.020 | 0.035 | 0.006 | 0.566 | 0.600 | 0.090 | 0.036 | 0.020 | 0.004 | 0.620 | 0.696 |
| TTSA3R [15] | ✓ | 300 | 0.026 | 0.013 | 0.021 | 0.004 | 0.568 | 0.604 | 0.086 | 0.031 | 0.015 | 0.004 | 0.626 | 0.709 |
| TTSA3R [15] | × | 400 | 0.036 | 0.019 | 0.024 | 0.004 | 0.561 | 0.592 | 0.104 | 0.045 | 0.035 | 0.006 | 0.618 | 0.692 |
| TTSA3R [15] | ✓ | 400 | 0.030 | 0.015 | 0.023 | 0.004 | 0.561 | 0.593 | 0.100 | 0.042 | 0.031 | 0.005 | 0.617 | 0.692 |
| TTSA3R [15] | × | 500 | 0.042 | 0.021 | 0.024 | 0.004 | 0.556 | 0.585 | 0.121 | 0.054 | 0.050 | 0.006 | 0.613 | 0.684 |
| TTSA3R [15] | ✓ | 500 | 0.033 | 0.016 | 0.023 | 0.004 | 0.558 | 0.587 | 0.114 | 0.050 | 0.040 | 0.007 | 0.615 | 0.687 |
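To read paired rows in Table 2, it can help to convert each baseline / MeMix pair into a relative error reduction. The snippet below is a small worked example using the CUT3R Comp Mean values on 7-Scenes-S copied from the table. The helper name `relative_reduction` is illustrative, and the snippet does not attempt to reproduce the averaged figures quoted in the abstract, whose exact aggregation is not specified on this page.

```python
def relative_reduction(baseline, with_memix):
    """Percentage by which a lower-is-better metric drops relative to the baseline."""
    return 100.0 * (baseline - with_memix) / baseline

# CUT3R, 7-Scenes-S, Comp Mean: input frames -> (without MeMix, with MeMix), from Table 2.
cut3r_comp_mean = {300: (0.076, 0.053), 400: (0.115, 0.076), 500: (0.090, 0.077)}

reductions = {
    frames: round(relative_reduction(base, memix), 1)
    for frames, (base, memix) in cut3r_comp_mean.items()
}
print(reductions)  # {300: 30.3, 400: 33.9, 500: 14.4}
```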
Table 3. Ablations on Routing Policy and Score Design
The Default row corresponds to $\texttt{Bottom-k} + \texttt{Dot} + \mathrm{score}(\widehat{S}_t, \bar{x}_t)$; other rows change one component at a time.
| Variant | KITTI AbsRel ↓ | KITTI δ < 1.25 ↑ | TUM ATE ↓ | TUM RPE trans ↓ | TUM RPE rot ↓ | NRGBD Acc ↓ | NRGBD Comp ↓ | NRGBD NC ↑ |
|---|---|---|---|---|---|---|---|---|
| Default (TTT3R with MeMix) | 0.103 | 92.1 | 0.028 | 0.013 | 0.376 | 0.099 | 0.020 | 0.616 |
| Patch selection | ||||||||
| Top-k | 0.102 | 91.8 | 0.068 | 0.021 | 0.595 | 0.178 | 0.049 | 0.587 |
| Random-k | 0.108 | 91.4 | 0.028 | 0.013 | 0.382 | 0.102 | 0.023 | 0.616 |
| Scoring function | ||||||||
| Cosine ($s_{\mathrm{cos}}$) | 0.105 | 91.6 | 0.028 | 0.013 | 0.375 | 0.099 | 0.023 | 0.616 |
| Attn ($s_{\mathrm{attn}}$) | 0.107 | 90.7 | 0.030 | 0.014 | 0.418 | 0.105 | 0.025 | 0.611 |
| Update strategy | ||||||||
| Full-update | 0.108 | 91.4 | 0.035 | 0.015 | 0.443 | 0.127 | 0.034 | 0.604 |
| No-update | 0.114 | 89.0 | 0.162 | 0.066 | 1.624 | 0.461 | 0.672 | 0.528 |
| Routing score | ||||||||
| score($S_{t-1}$, $X_t$) | 0.107 | 91.1 | 0.039 | 0.016 | 0.437 | 0.111 | 0.027 | 0.608 |
| score($S_{t-1}$, $Z_t$) | 0.124 | 85.9 | 0.041 | 0.016 | 0.426 | 0.249 | 0.055 | 0.575 |
| score($\widehat{S}_t$, $Z_t$) | 0.128 | 84.7 | 0.046 | 0.046 | 0.465 | 0.308 | 0.112 | 0.573 |
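The patch-selection and scoring variants ablated in Table 3 can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions: `S_hat` holds one row per memory patch, `x_bar` is taken to be a single pooled feature vector for the current frame, and the function names are hypothetical. The attention-based score $s_{\mathrm{attn}}$ is omitted because its definition is not given on this page.

```python
import numpy as np

def dot_score(S_hat, x_bar):
    """Dot-product alignment between each patch (row of S_hat) and x_bar."""
    return S_hat @ x_bar

def cosine_score(S_hat, x_bar, eps=1e-8):
    """Cosine alignment; eps guards against division by zero."""
    norms = np.linalg.norm(S_hat, axis=1) * np.linalg.norm(x_bar)
    return (S_hat @ x_bar) / (norms + eps)

def select_patches(scores, k, policy="bottom", rng=None):
    """Return a boolean mask with k patches marked for update."""
    n = scores.shape[0]
    if policy == "bottom":      # least-aligned patches (the Default row)
        idx = np.argsort(scores)[:k]
    elif policy == "top":       # most-aligned patches
        idx = np.argsort(scores)[::-1][:k]
    elif policy == "random":    # uniformly random patches
        rng = np.random.default_rng() if rng is None else rng
        idx = rng.choice(n, size=k, replace=False)
    else:
        raise ValueError(f"unknown policy: {policy}")
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask

S_hat = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [2.0, 0.0]])
x_bar = np.array([1.0, 0.0])

scores = dot_score(S_hat, x_bar)                     # [ 1.,  0., -1.,  2.]
bottom = select_patches(scores, k=2, policy="bottom")
top = select_patches(scores, k=2, policy="top")
rand = select_patches(scores, k=2, policy="random", rng=np.random.default_rng(0))
```

In this sketch, the Full-update and No-update rows correspond to `k = N` and `k = 0`, respectively.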
Table 4. Efficiency
Inference FPS and peak GPU memory on KITTI, without (×) and with (✓) MeMix. Under the reported settings, MeMix adds no measurable peak GPU memory overhead.
| Method | FPS (w/o MeMix) | FPS (w/ MeMix) | Peak GPU memory (w/o MeMix) | Peak GPU memory (w/ MeMix) |
|---|---|---|---|---|
| CUT3R | 14.39 | 14.13 | 5.31 GB | 5.31 GB |
| TTT3R | 12.72 | 12.81 | 6.96 GB | 6.96 GB |
| TTSA3R | 12.58 | 12.78 | 6.63 GB | 6.63 GB |
Table 5. Dense 3D Reconstruction Results
Corresponds to Table 5 of the main paper. Supplementary Table 1 reports the same dense 7-Scenes-D / NRGBD-D values.
| Model | MeMix | Input frames | 7-Scenes-D Acc Mean ↓ | 7-Scenes-D Acc Med ↓ | 7-Scenes-D Comp Mean ↓ | 7-Scenes-D Comp Med ↓ | 7-Scenes-D NC Mean ↑ | 7-Scenes-D NC Med ↑ | NRGBD-D Acc Mean ↓ | NRGBD-D Acc Med ↓ | NRGBD-D Comp Mean ↓ | NRGBD-D Comp Med ↓ | NRGBD-D NC Mean ↑ | NRGBD-D NC Med ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VGGT (Offline) [2] | – | 300 | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
| VGGT (Offline) [2] | – | 400 | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
| VGGT (Offline) [2] | – | 500 | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
| StreamVGGT [25] | – | 300 | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
| StreamVGGT [25] | – | 400 | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
| StreamVGGT [25] | – | 500 | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
| CUT3R [3] | × | 300 | 0.099 | 0.062 | 0.048 | 0.014 | 0.542 | 0.562 | 0.137 | 0.092 | 0.066 | 0.024 | 0.572 | 0.609 |
| CUT3R [3] | ✓ | 300 | 0.076 | 0.045 | 0.039 | 0.010 | 0.549 | 0.573 | 0.113 | 0.081 | 0.060 | 0.035 | 0.578 | 0.618 |
| CUT3R [3] | × | 400 | 0.150 | 0.093 | 0.090 | 0.037 | 0.531 | 0.543 | 0.225 | 0.155 | 0.119 | 0.076 | 0.554 | 0.579 |
| CUT3R [3] | ✓ | 400 | 0.117 | 0.071 | 0.056 | 0.015 | 0.536 | 0.552 | 0.196 | 0.128 | 0.098 | 0.062 | 0.572 | 0.609 |
| CUT3R [3] | × | 500 | 0.165 | 0.114 | 0.094 | 0.039 | 0.522 | 0.531 | 0.313 | 0.203 | 0.202 | 0.148 | 0.554 | 0.580 |
| CUT3R [3] | ✓ | 500 | 0.146 | 0.094 | 0.067 | 0.022 | 0.528 | 0.541 | 0.273 | 0.173 | 0.162 | 0.110 | 0.568 | 0.602 |
| TTT3R [11] | × | 300 | 0.030 | 0.016 | 0.019 | 0.004 | 0.558 | 0.588 | 0.057 | 0.035 | 0.016 | 0.003 | 0.595 | 0.650 |
| TTT3R [11] | ✓ | 300 | 0.030 | 0.016 | 0.019 | 0.004 | 0.559 | 0.589 | 0.052 | 0.032 | 0.015 | 0.003 | 0.599 | 0.656 |
| TTT3R [11] | × | 400 | 0.044 | 0.026 | 0.024 | 0.004 | 0.551 | 0.577 | 0.093 | 0.053 | 0.018 | 0.003 | 0.587 | 0.635 |
| TTT3R [11] | ✓ | 400 | 0.039 | 0.023 | 0.025 | 0.004 | 0.552 | 0.578 | 0.078 | 0.042 | 0.016 | 0.003 | 0.592 | 0.644 |
| TTT3R [11] | × | 500 | 0.068 | 0.046 | 0.033 | 0.009 | 0.542 | 0.562 | 0.127 | 0.061 | 0.033 | 0.003 | 0.586 | 0.635 |
| TTT3R [11] | ✓ | 500 | 0.057 | 0.039 | 0.030 | 0.008 | 0.546 | 0.568 | 0.105 | 0.048 | 0.026 | 0.004 | 0.586 | 0.633 |
| TTSA3R [15] | × | 300 | 0.023 | 0.011 | 0.018 | 0.004 | 0.558 | 0.588 | 0.039 | 0.022 | 0.011 | 0.003 | 0.606 | 0.669 |
| TTSA3R [15] | ✓ | 300 | 0.022 | 0.009 | 0.017 | 0.004 | 0.559 | 0.588 | 0.037 | 0.022 | 0.010 | 0.003 | 0.605 | 0.668 |
| TTSA3R [15] | × | 400 | 0.030 | 0.016 | 0.022 | 0.004 | 0.553 | 0.580 | 0.060 | 0.027 | 0.010 | 0.003 | 0.598 | 0.655 |
| TTSA3R [15] | ✓ | 400 | 0.025 | 0.012 | 0.021 | 0.004 | 0.554 | 0.581 | 0.059 | 0.027 | 0.010 | 0.003 | 0.596 | 0.651 |
| TTSA3R [15] | × | 500 | 0.045 | 0.029 | 0.025 | 0.004 | 0.545 | 0.567 | 0.085 | 0.034 | 0.020 | 0.003 | 0.596 | 0.651 |
| TTSA3R [15] | ✓ | 500 | 0.035 | 0.021 | 0.023 | 0.004 | 0.548 | 0.571 | 0.081 | 0.032 | 0.014 | 0.003 | 0.595 | 0.649 |
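For readers unfamiliar with the Acc and Comp columns, the sketch below uses their common point-cloud definitions: accuracy is the distance from each predicted point to its nearest ground-truth point, and completeness is the distance from each ground-truth point to its nearest predicted point. This is an assumption about the metric, not the paper's evaluation code. The actual protocol may align the clouds first and use a spatial index rather than brute force, and normal consistency (NC) is not covered here.

```python
import numpy as np

def nn_distances(src, dst):
    """Distance from every point in src to its nearest neighbour in dst (brute force)."""
    diff = src[:, None, :] - dst[None, :, :]          # (len(src), len(dst), 3)
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def accuracy_completeness(pred, gt):
    """Mean / median accuracy (pred -> gt) and completeness (gt -> pred)."""
    acc = nn_distances(pred, gt)
    comp = nn_distances(gt, pred)
    return {
        "acc_mean": float(acc.mean()),
        "acc_med": float(np.median(acc)),
        "comp_mean": float(comp.mean()),
        "comp_med": float(np.median(comp)),
    }

# Toy example: the prediction is exact where it exists but misses one GT point.
gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
metrics = accuracy_completeness(pred, gt)
```

In the toy example the accuracy terms are zero while the mean completeness is 1/3. A missing region penalizes completeness without affecting accuracy, and the median can stay at zero even when the mean does not.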
Table 6. Evaluation on Short-Sequence Pose Estimation
TUM-dynamics (90 frames), ScanNet (90 frames), and Sintel (50 frames).
| Method | Online | TUM ATE ↓ | TUM RPE trans ↓ | TUM RPE rot ↓ | ScanNet ATE ↓ | ScanNet RPE trans ↓ | ScanNet RPE rot ↓ | Sintel ATE ↓ | Sintel RPE trans ↓ | Sintel RPE rot ↓ |
|---|---|---|---|---|---|---|---|---|---|---|
| Robust-CVD [57] | × | 0.153 | 0.026 | 3.528 | 0.227 | 0.064 | 7.374 | 0.360 | 0.154 | 3.443 |
| CasualSAM [58] | × | 0.071 | 0.010 | 1.712 | 0.158 | 0.034 | 1.618 | 0.141 | 0.035 | 0.615 |
| DUSt3R [1] | × | 0.083 | 0.017 | 3.567 | 0.081 | 0.028 | 0.784 | 0.417 | 0.250 | 5.796 |
| MASt3R [18] | × | 0.038 | 0.012 | 0.448 | 0.078 | 0.020 | 0.475 | 0.185 | 0.060 | 1.496 |
| MonST3R [44] | × | 0.098 | 0.019 | 0.935 | 0.077 | 0.018 | 0.529 | 0.111 | 0.044 | 0.869 |
| Easi3R [45] | × | 0.105 | 0.022 | 1.064 | 0.061 | 0.017 | 0.525 | 0.110 | 0.042 | 0.758 |
| AETHER [46] | × | 0.092 | 0.012 | 1.106 | 0.176 | 0.028 | 1.204 | 0.189 | 0.054 | 0.694 |
| VGGT [2] | × | 0.012 | 0.010 | 0.310 | 0.035 | 0.015 | 0.377 | 0.172 | 0.062 | 0.471 |
| Spann3R [23] | ✓ | 0.056 | 0.021 | 0.591 | 0.096 | 0.023 | 0.661 | 0.329 | 0.110 | 4.471 |
| Point3R [4] | ✓ | 0.075 | 0.029 | 0.642 | 0.106 | 0.035 | 1.946 | 0.351 | 0.128 | 1.822 |
| StreamVGGT [25] | ✓ | 0.061 | 0.033 | 3.209 | 0.161 | 0.057 | 3.647 | 0.251 | 0.149 | 1.894 |
| CUT3R [3] | ✓ | 0.045 | 0.015 | 0.443 | 0.096 | 0.022 | 0.600 | 0.210 | 0.069 | 0.628 |
| CUT3R(w. MeMix) | ✓ | 0.043 | 0.014 | 0.424 | 0.090 | 0.022 | 0.604 | 0.190 | 0.075 | 0.627 |
| TTT3R [11] | ✓ | 0.029 | 0.013 | 0.380 | 0.065 | 0.021 | 0.640 | 0.208 | 0.093 | 0.725 |
| TTT3R(w. MeMix) | ✓ | 0.028 | 0.013 | 0.376 | 0.065 | 0.021 | 0.677 | 0.210 | 0.083 | 0.733 |
| TTSA3R [15] | ✓ | 0.026 | 0.013 | 0.372 | 0.058 | 0.021 | 0.561 | 0.210 | 0.084 | 0.738 |
| TTSA3R(w. MeMix) | ✓ | 0.025 | 0.013 | 0.372 | 0.057 | 0.021 | 0.569 | 0.209 | 0.084 | 0.763 |
Table 7. Video Depth Estimation
KITTI, Sintel, and Bonn under per-sequence scale and metric scale protocols.
| Alignment | Method | Online | KITTI Abs Rel ↓ | KITTI δ < 1.25 ↑ | Sintel Abs Rel ↓ | Sintel δ < 1.25 ↑ | Bonn Abs Rel ↓ | Bonn δ < 1.25 ↑ |
|---|---|---|---|---|---|---|---|---|
| Per-sequence scale | DUSt3R-GA [1] | × | 0.144 | 81.3 | 0.656 | 45.2 | 0.155 | 83.3 |
| Per-sequence scale | MASt3R-GA [18] | × | 0.183 | 74.5 | 0.641 | 43.9 | 0.252 | 70.1 |
| Per-sequence scale | MonST3R-GA [44] | × | 0.168 | 74.4 | 0.378 | 55.8 | 0.067 | 96.3 |
| Per-sequence scale | Easi3R [45] | × | 0.102 | 91.2 | 0.377 | 55.9 | 0.059 | 97.0 |
| Per-sequence scale | VGGT [2] | × | 0.070 | 96.5 | 0.287 | 66.1 | 0.055 | 97.1 |
| Per-sequence scale | Spann3R [23] | ✓ | 0.198 | 73.7 | 0.622 | 42.6 | 0.144 | 81.3 |
| Per-sequence scale | Point3R [4] | ✓ | 0.136 | 84.2 | 0.452 | 48.9 | 0.060 | 96.0 |
| Per-sequence scale | STREAM3Rα [5] | ✓ | 0.116 | 89.6 | 0.478 | 51.1 | 0.075 | 94.1 |
| Per-sequence scale | StreamVGGT [25] | ✓ | 0.173 | 72.1 | 0.323 | 65.7 | 0.059 | 97.2 |
| Per-sequence scale | CUT3R [3] | ✓ | 0.116 | 88.1 | 0.426 | 47.3 | 0.079 | 93.7 |
| Per-sequence scale | CUT3R(w. MeMix) | ✓ | 0.115 | 88.6 | 0.436 | 46.2 | 0.078 | 93.8 |
| Per-sequence scale | TTT3R [11] | ✓ | 0.107 | 91.2 | 0.409 | 48.9 | 0.069 | 95.5 |
| Per-sequence scale | TTT3R(w. MeMix) | ✓ | 0.103 | 92.1 | 0.407 | 49.2 | 0.070 | 95.1 |
| Per-sequence scale | TTSA3R [15] | ✓ | 0.103 | 91.9 | 0.410 | 49.6 | 0.064 | 96.4 |
| Per-sequence scale | TTSA3R(w. MeMix) | ✓ | 0.103 | 92.2 | 0.400 | 50.2 | 0.065 | 96.0 |
| Metric scale | MASt3R-GA [18] | × | 0.467 | 15.2 | 1.022 | 14.3 | 0.272 | 70.6 |
| Metric scale | Point3R [4] | ✓ | 0.191 | 73.8 | 0.777 | 17.1 | 0.137 | 94.7 |
| Metric scale | STREAM3Rα [5] | ✓ | 0.234 | 57.6 | 1.041 | 21.0 | 0.084 | 94.4 |
| Metric scale | CUT3R [3] | ✓ | 0.129 | 82.8 | 1.020 | 23.7 | 0.103 | 88.9 |
| Metric scale | CUT3R(w. MeMix) | ✓ | 0.122 | 85.0 | 1.068 | 24.1 | 0.104 | 88.8 |
| Metric scale | TTT3R [11] | ✓ | 0.107 | 89.2 | 0.978 | 23.3 | 0.090 | 94.4 |
| Metric scale | TTT3R(w. MeMix) | ✓ | 0.103 | 89.9 | 0.984 | 23.6 | 0.094 | 92.9 |
| Metric scale | TTSA3R [15] | ✓ | 0.110 | 88.6 | 0.959 | 24.5 | 0.080 | 96.4 |
| Metric scale | TTSA3R(w. MeMix) | ✓ | 0.107 | 89.1 | 0.962 | 24.9 | 0.083 | 96.1 |
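The two depth metrics in Table 7, and the difference between the two alignment protocols, can be illustrated with a short sketch. Abs Rel and δ < 1.25 follow their standard definitions. The median-ratio scale alignment used for the per-sequence protocol is an assumption made for illustration, and the paper's protocol may fit the scale differently.

```python
import numpy as np

def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|pred - gt| / gt)."""
    return float(np.mean(np.abs(pred - gt) / gt))

def delta_125(pred, gt):
    """Percentage of pixels with max(pred/gt, gt/pred) < 1.25."""
    ratio = np.maximum(pred / gt, gt / pred)
    return float(np.mean(ratio < 1.25) * 100.0)

def align_scale(pred, gt):
    """Rescale pred by one factor per sequence (median ratio; an assumed choice)."""
    return pred * (np.median(gt) / np.median(pred))

# Toy sequence: the prediction has the right structure but twice the true scale.
gt = np.array([1.0, 2.0, 4.0, 8.0])
pred = 2.0 * gt

metric_scale = (abs_rel(pred, gt), delta_125(pred, gt))   # no alignment
aligned = align_scale(pred, gt)
per_sequence = (abs_rel(aligned, gt), delta_125(aligned, gt))
```

Under metric scale the toy prediction scores an Abs Rel of 1.0 and a δ < 1.25 of 0%, while after per-sequence alignment it scores 0.0 and 100%. This gap is consistent with the metric-scale rows in Table 7 being generally worse than the per-sequence rows for the same method.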
BibTeX
@misc{dong2026memix,
title = {MeMix: Writing Less, Remembering More for Streaming 3D Reconstruction},
author = {Jiacheng Dong and Huan Li and Sicheng Zhou and Wenhao Hu and Weili Xu and Yan Wang},
year = {2026},
note = {Preprint}
}