Reparameterizing an ε-predictor as a v-predictor

A diffusion model trained to predict noise ($ϵ$) can be turned into one that predicts velocity ($v$) without retraining: the two are related by a fixed, schedule-dependent change of variables. This note works through that reparameterization and verifies it empirically. I load SD 1.5 (a noise predictor) and sample from it two ways, once in $ϵ$-space and once by converting its output to velocity on the fly, and confirm the images match.

The velocity parameterization is the one made explicit in Progressive Distillation for Fast Sampling, and it’s the bridge from the diffusion view into the flow-matching framing, where the model is expected to output a velocity field. Everything here is in latent space (latent diffusion): SD 1.5 diffuses the VAE latent, not pixels.

The setup

The forward process noises a clean latent $x_{0}$ with Gaussian noise $ϵ$ on a schedule $(α_{t}, σ_{t})$ :

x_{t} = α_{t} x_{0} + σ_{t} ϵ .

Treating $x_{t}$ as a point mass moving in time, its velocity is just the time derivative:

v_{t} = \dot{α}_{t} x_{0} + \dot{σ}_{t} ϵ .

A noise predictor gives us $\hat{ϵ}$. To express $v_{t}$ in terms of the quantities we actually have at inference ($x_{t}$ and $\hat{ϵ}$), solve the forward equation for $x_{0} = (x_{t} - σ_{t} ϵ) / α_{t}$ and substitute:

v_{t} = \frac{\dot{α}_{t}}{α_{t}} x_{t} + \left(\dot{σ}_{t} - \frac{\dot{α}_{t} σ_{t}}{α_{t}}\right) ϵ .

That identity is the whole trick: given an $ϵ$-prediction and the schedule, you get the velocity for free. No retraining, no new weights.
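As a quick numerical check of the identity, the sketch below evaluates both sides on scalars. The cosine schedule $α_{t} = \cos(πt/2)$, $σ_{t} = \sin(πt/2)$ is an assumption chosen only because its derivatives are available in closed form; it is not SD 1.5's schedule, and `eps_to_velocity` is a hypothetical helper name.

```python
import math

def eps_to_velocity(x_t, eps, alpha, sigma, d_alpha, d_sigma):
    """v_t = (d_alpha/alpha) * x_t + (d_sigma - d_alpha*sigma/alpha) * eps."""
    return (d_alpha / alpha) * x_t + (d_sigma - d_alpha * sigma / alpha) * eps

# Toy schedule (assumption): alpha = cos(pi t / 2), sigma = sin(pi t / 2).
t = 0.3
alpha = math.cos(math.pi * t / 2)
sigma = math.sin(math.pi * t / 2)
d_alpha = -(math.pi / 2) * math.sin(math.pi * t / 2)  # analytic d(alpha)/dt
d_sigma = (math.pi / 2) * math.cos(math.pi * t / 2)   # analytic d(sigma)/dt

x0, eps = 1.7, -0.4                        # scalars standing in for latents
x_t = alpha * x0 + sigma * eps             # forward process
v_direct = d_alpha * x0 + d_sigma * eps    # velocity from its definition
v_from_eps = eps_to_velocity(x_t, eps, alpha, sigma, d_alpha, d_sigma)

print(v_direct, v_from_eps)  # the two values should agree to float precision
```

Because the formula acts elementwise, the same check carries over unchanged to full latent tensors.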

The reverse direction: recovering $x_{0}$ from velocity

For a velocity-based sampler you need to denoise, that is, recover $\hat{x}_{0}$ from $v_{t}$. Take the two defining equations, $x_{t} = α_{t} x_{0} + σ_{t} ϵ$ and $v_{t} = \dot{α}_{t} x_{0} + \dot{σ}_{t} ϵ$. Eliminate $ϵ$ (multiply the first by $\dot{σ}_{t}$, the second by $σ_{t}$, and subtract the first from the second) to get
$x_{0} = \frac{σ_{t} v_{t} - \dot{σ}_{t} x_{t}}{σ_{t} \dot{α}_{t} - \dot{σ}_{t} α_{t}} .$
This is the denoiser form used in 2509.25170.
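A round-trip check of the denoiser form, again on scalars: build $x_{t}$ and $v_{t}$ from a known $x_{0}$ and $ϵ$, then recover $x_{0}$. The cosine schedule and the name `x0_from_velocity` are assumptions made for illustration. For this particular schedule the denominator works out to $-π/2$ at every $t$, so it never vanishes.

```python
import math

def x0_from_velocity(x_t, v_t, alpha, sigma, d_alpha, d_sigma):
    """x0 = (sigma*v - d_sigma*x_t) / (sigma*d_alpha - d_sigma*alpha)."""
    denom = sigma * d_alpha - d_sigma * alpha
    return (sigma * v_t - d_sigma * x_t) / denom

# Toy schedule (assumption): alpha = cos(pi t / 2), sigma = sin(pi t / 2).
t = 0.6
alpha = math.cos(math.pi * t / 2)
sigma = math.sin(math.pi * t / 2)
d_alpha = -(math.pi / 2) * math.sin(math.pi * t / 2)
d_sigma = (math.pi / 2) * math.cos(math.pi * t / 2)

x0, eps = 0.9, 1.3
x_t = alpha * x0 + sigma * eps          # forward process
v_t = d_alpha * x0 + d_sigma * eps      # velocity from its definition

x0_rec = x0_from_velocity(x_t, v_t, alpha, sigma, d_alpha, d_sigma)
print(x0, x0_rec)  # the recovered value should match x0 to float precision
```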

A scheduler that exposes the derivatives

The standard diffusers DDIM scheduler doesn’t hand you $\dot{α}_{t}$ and $\dot{σ}_{t}$, which the velocity formula needs. So I build the schedule explicitly: linear-$β$, with $α$ and $σ$ from the cumulative product and their derivatives by finite difference. Having the time-segment indexing explicit also makes the per-step bookkeeping easier to follow.

import torch

class LinearBetaScheduler:
    def __init__(self, T=1000, beta_min=0.00085, beta_max=0.012):
        self.T = T
        # Plain linear betas. Note: the diffusers config for SD 1.5 uses the
        # "scaled_linear" variant (linspace over sqrt(beta), then squared).
        self.betas = torch.linspace(beta_min, beta_max, T)
        self.alphas_cumprod = torch.cumprod(1 - self.betas, dim=0)

        # alpha_t = sqrt(alpha_bar_t), sigma_t = sqrt(1 - alpha_bar_t)
        self.alpha = self.alphas_cumprod.sqrt().to("mps")
        self.sigma = (1 - self.alphas_cumprod).sqrt().to("mps")

        # time derivatives via backward difference (per integer timestep)
        self.dot_alpha = torch.zeros(T).to("mps")
        self.dot_sigma = torch.zeros(T).to("mps")
        self.dot_alpha[1:] = self.alpha[1:] - self.alpha[:-1]
        self.dot_sigma[1:] = self.sigma[1:] - self.sigma[:-1]
        self.dot_alpha[0] = self.dot_alpha[1]   # boundary: reuse the t=1 value
        self.dot_sigma[0] = self.dot_sigma[1]
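A dependency-free sketch of the same schedule can double as a sanity check that the denominator $σ_{t} \dot{α}_{t} - \dot{σ}_{t} α_{t}$ of the $x_{0}$-from-$v$ formula never vanishes. Plain Python lists stand in for the tensors (an assumption made to keep the sketch self-contained), and `linear_beta_schedule` is a hypothetical helper that mirrors the class above.

```python
import math

def linear_beta_schedule(T=1000, beta_min=0.00085, beta_max=0.012):
    """Return alpha, sigma and their finite differences as plain lists."""
    betas = [beta_min + (beta_max - beta_min) * i / (T - 1) for i in range(T)]
    alpha, sigma, cumprod = [], [], 1.0
    for b in betas:
        cumprod *= 1.0 - b                      # running product of (1 - beta)
        alpha.append(math.sqrt(cumprod))
        sigma.append(math.sqrt(1.0 - cumprod))
    # Backward difference for t >= 1; the t = 0 entry reuses the t = 1 value.
    d_alpha = [alpha[1] - alpha[0]] + [alpha[t] - alpha[t - 1] for t in range(1, T)]
    d_sigma = [sigma[1] - sigma[0]] + [sigma[t] - sigma[t - 1] for t in range(1, T)]
    return alpha, sigma, d_alpha, d_sigma

alpha, sigma, d_alpha, d_sigma = linear_beta_schedule()

# Denominator of the x0-from-v formula at every timestep.
denoms = [sigma[t] * d_alpha[t] - d_sigma[t] * alpha[t] for t in range(len(alpha))]
print(max(denoms))  # expected to be strictly negative, so no division by zero
```

Since $α_{t}$ decreases and $σ_{t}$ increases along the schedule, both terms of the denominator have the same sign, which is why it stays away from zero.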

Baseline: sampling in ε-space

First the normal path: predict noise, solve for $\hat{z}_{0}$, step. SD 1.5 runs with DDIM at 20–50 steps; the deterministic reverse process lets us skip timesteps because the marginals $q(x_{t} \mid x_{0})$ stay valid. Classifier-free guidance mixes the conditional and unconditional noise predictions:

\hat{ϵ}_{\text{guided}} = \hat{ϵ}_{\text{uncond}} + w (\hat{ϵ}_{\text{cond}} - \hat{ϵ}_{\text{uncond}}) .

@torch.no_grad()  # inference only; avoids building an autograd graph across steps
def ddpm_epsilon_sampler(prompt, scheduler, guidance_scale=7.5):
    # 50 evenly spaced timesteps from 999 down to 0
    timesteps = torch.linspace(999, 0, 50).long()
    z_t = sample_ddpm_latent(time=999)
    text_emb, uncond_emb = encode_text(prompt), encode_text("")

    for i, t in enumerate(timesteps):
        t_tensor = torch.tensor([t], device="mps")
        # classifier-free guidance on the noise prediction
        eps_uncond = unet(z_t, t_tensor, encoder_hidden_states=uncond_emb).sample
        eps_text   = unet(z_t, t_tensor, encoder_hidden_states=text_emb).sample
        eps = eps_uncond + guidance_scale * (eps_text - eps_uncond)

        # denoise: invert z_t = alpha_t * z0 + sigma_t * eps
        z0 = (z_t - scheduler.sigma[t] * eps) / scheduler.alpha[t]
        if i < len(timesteps) - 1:
            # deterministic DDIM step: re-noise z0 to t_next with the same eps
            t_next = timesteps[i + 1]
            z_t = scheduler.alpha[t_next] * z0 + scheduler.sigma[t_next] * eps
        else:
            z_t = z0
    return z_t

Prompt: “A woman on vacation in Bali.”

The conversion, applied

Now the velocity path. Wrap the (CFG-combined) noise predictor, convert its output to velocity via the identity above, then denoise with the $x_{0}$-from-$v$ formula.

A nice structural fact worth noting: the velocity operator (ε → v) and the CFG operator (linear mix of cond/uncond) commute. The conversion is affine in $ϵ$ with the $x_{t}$ term held fixed, and the CFG weights $(1 - w)$ and $w$ sum to one, so it doesn’t matter whether you apply guidance before or after converting to velocity.
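A minimal numerical check of that commutation, evaluating both orders on scalars. The schedule values and predictions below are arbitrary illustrative numbers, and `cfg` and `eps_to_velocity` are hypothetical helper names.

```python
def cfg(uncond, cond, w):
    """Classifier-free guidance mix: uncond + w * (cond - uncond)."""
    return uncond + w * (cond - uncond)

def eps_to_velocity(x_t, eps, alpha, sigma, d_alpha, d_sigma):
    """v_t = (d_alpha/alpha) * x_t + (d_sigma - d_alpha*sigma/alpha) * eps."""
    return (d_alpha / alpha) * x_t + (d_sigma - d_alpha * sigma / alpha) * eps

# Arbitrary schedule values and predictions (assumptions for illustration).
alpha, sigma, d_alpha, d_sigma = 0.8, 0.6, -0.5, 0.7
x_t, eps_uncond, eps_cond, w = 0.25, -1.1, 0.4, 7.5
sched = (alpha, sigma, d_alpha, d_sigma)

# Order 1: guide in eps-space, then convert to velocity.
v_guide_then_convert = eps_to_velocity(x_t, cfg(eps_uncond, eps_cond, w), *sched)

# Order 2: convert each branch to velocity, then guide in v-space.
v_convert_then_guide = cfg(
    eps_to_velocity(x_t, eps_uncond, *sched),
    eps_to_velocity(x_t, eps_cond, *sched),
    w,
)

print(v_guide_then_convert, v_convert_then_guide)  # should agree to float precision
```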

def noise_to_velocity(prompt, z_t, t, unet, scheduler, guidance_scale=7.5):
    # CFG-combined noise prediction, then the eps -> v identity
    eps = noise_predictor(prompt, z_t, t, unet, guidance_scale)
    v = (scheduler.dot_alpha[t] / scheduler.alpha[t]) * z_t \
        + (scheduler.dot_sigma[t]
           - scheduler.dot_alpha[t] * scheduler.sigma[t] / scheduler.alpha[t]) * eps
    return v, eps

@torch.no_grad()  # inference only; avoids building an autograd graph across steps
def velocity_ddim_sampler(prompt, unet, scheduler, guidance_scale=7.5):
    timesteps = torch.linspace(999, 0, 50).long()
    z_t = sample_ddpm_latent(time=999)

    for i, t in enumerate(timesteps):
        v, eps = noise_to_velocity(prompt, z_t, t, unet, scheduler, guidance_scale)
        # denoise with the x0-from-v formula
        denom = scheduler.sigma[t] * scheduler.dot_alpha[t] \
                - scheduler.dot_sigma[t] * scheduler.alpha[t]
        z0 = (scheduler.sigma[t] * v - scheduler.dot_sigma[t] * z_t) / denom
        if i < len(timesteps) - 1:
            # same deterministic DDIM step as the eps-space baseline
            t_next = timesteps[i + 1]
            z_t = scheduler.alpha[t_next] * z0 + scheduler.sigma[t_next] * eps
        else:
            z_t = z0
    return z_t

Same prompt, sampling entirely through the velocity reparameterization:

The image adheres to the prompt
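The agreement between the two samplers is an algebraic identity rather than a numerical coincidence: substituting the ε-to-v formula into the $x_{0}$-from-$v$ formula gives back $(z_{t} - σ_{t} \hat{ϵ}) / α_{t}$ for any values of the derivatives, as long as the denominator is nonzero. The sketch below checks this on scalars with deliberately different derivative estimates; the helper names and numbers are illustrative assumptions.

```python
def z0_eps_path(z_t, eps, alpha, sigma):
    """Denoise directly in eps-space."""
    return (z_t - sigma * eps) / alpha

def z0_velocity_path(z_t, eps, alpha, sigma, d_alpha, d_sigma):
    """Convert eps to v, then denoise with the x0-from-v formula."""
    v = (d_alpha / alpha) * z_t + (d_sigma - d_alpha * sigma / alpha) * eps
    denom = sigma * d_alpha - d_sigma * alpha
    return (sigma * v - d_sigma * z_t) / denom

alpha, sigma = 0.6, 0.8        # arbitrary schedule values at some timestep
z_t, eps = 0.35, -1.2          # arbitrary latent and noise prediction

reference = z0_eps_path(z_t, eps, alpha, sigma)

# Three very different derivative estimates (d_alpha, d_sigma); the recovered
# z0 should not depend on them.
derivative_guesses = [(-1e-3, 5e-4), (-0.2, 0.9), (-3.0, 0.1)]
results = [
    z0_velocity_path(z_t, eps, alpha, sigma, da, ds)
    for da, ds in derivative_guesses
]

print(reference, results)  # every entry should match the reference
```

One consequence is that the accuracy of the finite-difference derivatives affects the intermediate velocity but not the recovered $\hat{z}_{0}$, so the velocity sampler reproduces the ε-space trajectory up to floating-point error.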

Ann He
