A diffusion model trained to predict noise ($\epsilon$) can be turned into one that predicts velocity ($v$) without retraining: the two are related by a fixed, schedule-dependent change of variables. This note works through that reparameterization and verifies it empirically. I load SD 1.5 (a noise predictor) and sample from it two ways, once in $\epsilon$-space and once by converting its output to velocity on the fly, and confirm the images match.

The velocity parameterization is the one made explicit in Progressive Distillation for Fast Sampling of Diffusion Models, and it’s the bridge from the diffusion view into the flow-matching framing, where the model is expected to output a velocity field. Everything here is in latent space (latent diffusion): SD 1.5 diffuses the VAE latent, not pixels.

The setup

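The forward process noises a clean latent $z_0$ with Gaussian noise $\epsilon \sim \mathcal{N}(0, I)$ on a schedule $(\alpha_t, \sigma_t)$:

$$z_t = \alpha_t z_0 + \sigma_t \epsilon$$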

Treating $z_t$ as a point mass moving in time, its velocity is just the time derivative:

$$v_t = \frac{dz_t}{dt} = \dot\alpha_t z_0 + \dot\sigma_t \epsilon$$

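A noise predictor gives us $\epsilon_\theta(z_t, t) \approx \epsilon$. To express $v_t$ in terms of the quantities we actually have at inference ($z_t$ and $\epsilon$), solve the forward equation for $z_0$ and substitute:

$$z_0 = \frac{z_t - \sigma_t \epsilon}{\alpha_t} \quad\Longrightarrow\quad v_t = \frac{\dot\alpha_t}{\alpha_t}\, z_t + \left(\dot\sigma_t - \frac{\dot\alpha_t \sigma_t}{\alpha_t}\right) \epsilon$$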

That identity is the whole trick: given an $\epsilon$-prediction and the schedule, you get the velocity for free. No retraining, no new weights.
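As a quick sanity check, the identity can be tested on scalars: build a noisy latent and its velocity from a known clean latent and noise, then confirm that the formula recovers the same velocity from the noisy latent and the noise alone. This is only a sketch; the numbers are arbitrary and the helper name `velocity_from_eps` is made up for illustration.

```python
import math

def velocity_from_eps(z_t, eps, alpha, sigma, dot_alpha, dot_sigma):
    """v = (dot_alpha / alpha) * z_t + (dot_sigma - dot_alpha * sigma / alpha) * eps."""
    return (dot_alpha / alpha) * z_t + (dot_sigma - dot_alpha * sigma / alpha) * eps

# Arbitrary illustrative values (not taken from any real schedule).
alpha, sigma = 0.8, 0.6
dot_alpha, dot_sigma = -0.003, 0.004
z0, eps = 1.7, -0.4

z_t = alpha * z0 + sigma * eps                # forward process
v_direct = dot_alpha * z0 + dot_sigma * eps   # time derivative; needs z0
v_from_eps = velocity_from_eps(z_t, eps, alpha, sigma, dot_alpha, dot_sigma)

# Both routes give the same velocity, up to floating-point rounding.
assert math.isclose(v_direct, v_from_eps, rel_tol=1e-9)
```

The second route never touches `z0`, which is the point: at inference only the noisy latent and the predicted noise are available.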

A scheduler that exposes the derivatives

The standard diffusers DDIM scheduler doesn’t hand you $\dot\alpha_t$ and $\dot\sigma_t$, which the velocity formula needs. So I build the schedule explicitly: linear-$\beta$, with $\alpha_t = \sqrt{\bar\alpha_t}$ and $\sigma_t = \sqrt{1 - \bar\alpha_t}$ from the cumulative product $\bar\alpha_t = \prod_{s \le t} (1 - \beta_s)$, and their derivatives by finite difference. Having the time-segment indexing explicit also makes the per-step bookkeeping easier to follow.

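import torch

class LinearBetaScheduler:
    def __init__(self, T=1000, beta_min=0.00085, beta_max=0.012, device="mps"):
        self.T = T
        # Plain linear interpolation of beta. SD 1.5's shipped scheduler config uses
        # diffusers' "scaled_linear" variant (linspace over sqrt(beta), then squared);
        # swap that in here to match the training schedule exactly.
        self.betas = torch.linspace(beta_min, beta_max, T)
        self.alphas_cumprod = torch.cumprod(1 - self.betas, dim=0)

        # alpha_t = sqrt(alpha_bar_t), sigma_t = sqrt(1 - alpha_bar_t)
        self.alpha = self.alphas_cumprod.sqrt().to(device)
        self.sigma = (1 - self.alphas_cumprod).sqrt().to(device)

        # time derivatives via backward difference (dt = one timestep)
        self.dot_alpha = torch.zeros(T, device=device)
        self.dot_sigma = torch.zeros(T, device=device)
        self.dot_alpha[1:] = self.alpha[1:] - self.alpha[:-1]
        self.dot_sigma[1:] = self.sigma[1:] - self.sigma[:-1]
        self.dot_alpha[0] = self.dot_alpha[1]   # boundary: reuse the first valid difference
        self.dot_sigma[0] = self.dot_sigma[1]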
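A few properties of this schedule are worth checking before relying on it: the squares of the signal and noise coefficients should sum to one (variance preserving), the signal coefficient should shrink with t, and the noise coefficient should grow. The sketch below rebuilds the same arrays in plain Python so it runs without torch or a GPU; `build_schedule` is a hypothetical helper for this check, not part of the class above.

```python
import math

def build_schedule(T=1000, beta_min=0.00085, beta_max=0.012):
    """Plain-Python mirror of the linear-beta schedule with backward differences."""
    betas = [beta_min + (beta_max - beta_min) * i / (T - 1) for i in range(T)]
    alpha, sigma, cumprod = [], [], 1.0
    for b in betas:
        cumprod *= 1.0 - b                      # running product of (1 - beta)
        alpha.append(math.sqrt(cumprod))
        sigma.append(math.sqrt(1.0 - cumprod))
    dot_alpha = [alpha[t] - alpha[t - 1] for t in range(1, T)]
    dot_sigma = [sigma[t] - sigma[t - 1] for t in range(1, T)]
    # boundary: reuse the first valid difference at t = 0
    dot_alpha.insert(0, dot_alpha[0])
    dot_sigma.insert(0, dot_sigma[0])
    return alpha, sigma, dot_alpha, dot_sigma

alpha, sigma, dot_alpha, dot_sigma = build_schedule()

# Variance preserving: alpha_t^2 + sigma_t^2 = 1 at every timestep.
assert all(abs(a * a + s * s - 1.0) < 1e-12 for a, s in zip(alpha, sigma))
# Signal decays and noise grows along the forward process.
assert all(d < 0 for d in dot_alpha) and all(d > 0 for d in dot_sigma)
```

Printing `alpha[-1]` shows how much signal is left at the last timestep; it is small but not zero, so the latent at t = 999 is close to pure noise rather than exactly pure noise.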

Baseline: sampling in ε-space

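First the normal path: predict noise, solve for $z_0$, step. SD 1.5 is usually sampled with DDIM in 20–50 steps; the deterministic reverse process lets us skip timesteps because the update depends only on the marginals $q(z_t \mid z_0)$, which stay valid for any subsequence of timesteps. Classifier-free guidance mixes the conditional and unconditional noise predictions as $\epsilon_{\text{uncond}} + w\,(\epsilon_{\text{text}} - \epsilon_{\text{uncond}})$ with guidance scale $w$: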

@torch.no_grad()
def ddpm_epsilon_sampler(prompt, scheduler, guidance_scale=7.5):
    # unet, encode_text and sample_ddpm_latent are assumed to be defined elsewhere
    timesteps = torch.linspace(999, 0, 50).long()   # 50 steps from t=999 down to t=0
    z_t = sample_ddpm_latent(time=999)
    text_emb, uncond_emb = encode_text(prompt), encode_text("")

    for i, t in enumerate(timesteps):
        t_tensor = t.reshape(1).to(z_t.device)
        eps_uncond = unet(z_t, t_tensor, encoder_hidden_states=uncond_emb).sample
        eps_text   = unet(z_t, t_tensor, encoder_hidden_states=text_emb).sample
        # classifier-free guidance
        eps = eps_uncond + guidance_scale * (eps_text - eps_uncond)

        # invert the forward equation for the clean-latent estimate
        z0 = (z_t - scheduler.sigma[t] * eps) / scheduler.alpha[t]
        if i < len(timesteps) - 1:
            # deterministic DDIM step: re-noise z0 to the next timestep with the same eps
            t_next = timesteps[i + 1]
            z_t = scheduler.alpha[t_next] * z0 + scheduler.sigma[t_next] * eps
        else:
            z_t = z0
    return z_t

Prompt: “A woman on vacation in Bali.”

The conversion, applied

Now the velocity path. Wrap the (CFG-combined) noise predictor, convert its output to velocity via the identity above, then denoise with the $z_0$-from-$v$ formula.
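For reference, the formula that recovers the clean latent from the velocity can be derived by treating the forward equation and the velocity definition as two linear equations in the clean latent $z_0$ and the noise $\epsilon$, then eliminating $\epsilon$. A short derivation, with $\alpha_t, \sigma_t$ the schedule coefficients and $\dot\alpha_t, \dot\sigma_t$ their time derivatives:

```latex
\begin{aligned}
z_t &= \alpha_t z_0 + \sigma_t \epsilon,
\qquad v_t = \dot\alpha_t z_0 + \dot\sigma_t \epsilon \\
\sigma_t v_t - \dot\sigma_t z_t
  &= (\sigma_t \dot\alpha_t - \dot\sigma_t \alpha_t)\, z_0 \\
z_0 &= \frac{\sigma_t v_t - \dot\sigma_t z_t}
            {\sigma_t \dot\alpha_t - \dot\sigma_t \alpha_t}
\end{aligned}
```

The denominator is nonzero whenever the signal coefficient is decreasing and the noise coefficient is increasing, since both of its terms are then negative.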

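A nice structural fact worth noting: the velocity operator (ε → v) and the CFG operator (linear mix of cond/uncond) commute, so it doesn’t matter whether you apply guidance before or after converting to velocity. This holds because the ε → v map is affine in ε at fixed $z_t$, and with guidance scale $w$ the CFG weights $1 - w$ and $w$ sum to one.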
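The commutation claim is easy to check numerically on scalars. The sketch below applies guidance and the velocity conversion in both orders and compares the results; the helper names `to_velocity` and `cfg` and all numeric values are illustrative assumptions, not code from the sampler.

```python
import math

def to_velocity(z_t, eps, alpha, sigma, dot_alpha, dot_sigma):
    """Convert a noise value to a velocity at fixed z_t (affine in eps)."""
    return (dot_alpha / alpha) * z_t + (dot_sigma - dot_alpha * sigma / alpha) * eps

def cfg(uncond, cond, w):
    """Classifier-free guidance: weights (1 - w) and w sum to one."""
    return uncond + w * (cond - uncond)

sched = (0.8, 0.6, -0.003, 0.004)    # alpha, sigma, dot_alpha, dot_sigma (arbitrary)
z_t, eps_uncond, eps_text, w = 1.12, -0.4, 0.9, 7.5

# Order 1: guide in eps-space, then convert to velocity.
v_guide_then_convert = to_velocity(z_t, cfg(eps_uncond, eps_text, w), *sched)
# Order 2: convert each branch to velocity, then guide in v-space.
v_convert_then_guide = cfg(to_velocity(z_t, eps_uncond, *sched),
                           to_velocity(z_t, eps_text, *sched), w)

assert math.isclose(v_guide_then_convert, v_convert_then_guide, rel_tol=1e-9)
```

The same argument would fail for a mix whose weights do not sum to one, because the $z_t$ term of the conversion would then be scaled as well.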

def noise_to_velocity(prompt, z_t, t, unet, scheduler, guidance_scale=7.5):
    # noise_predictor (defined elsewhere) returns the CFG-combined eps prediction
    eps = noise_predictor(prompt, z_t, t, unet, guidance_scale)
    # v = (dot_alpha / alpha) * z_t + (dot_sigma - dot_alpha * sigma / alpha) * eps
    v = (scheduler.dot_alpha[t] / scheduler.alpha[t]) * z_t \
        + (scheduler.dot_sigma[t]
           - scheduler.dot_alpha[t] * scheduler.sigma[t] / scheduler.alpha[t]) * eps
    return v, eps

@torch.no_grad()
def velocity_ddim_sampler(prompt, unet, scheduler, guidance_scale=7.5):
    timesteps = torch.linspace(999, 0, 50).long()   # same 50-step grid as the baseline
    z_t = sample_ddpm_latent(time=999)

    for i, t in enumerate(timesteps):
        v, eps = noise_to_velocity(prompt, z_t, t, unet, scheduler, guidance_scale)
        # z0 = (sigma * v - dot_sigma * z_t) / (sigma * dot_alpha - dot_sigma * alpha)
        denom = scheduler.sigma[t] * scheduler.dot_alpha[t] \
                - scheduler.dot_sigma[t] * scheduler.alpha[t]
        z0 = (scheduler.sigma[t] * v - scheduler.dot_sigma[t] * z_t) / denom
        if i < len(timesteps) - 1:
            # DDIM step to the next timestep; the noise term reuses eps
            t_next = timesteps[i + 1]
            z_t = scheduler.alpha[t_next] * z0 + scheduler.sigma[t_next] * eps
        else:
            z_t = z0
    return z_t
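One consequence of this construction is that the finite-difference error in `dot_alpha` and `dot_sigma` cancels. Converting ε to v and then recovering the clean latent from v uses the same derivative values in both directions, so the result equals the ε-space estimate `(z_t - sigma * eps) / alpha` whenever the denominator is nonzero. A scalar sketch with one plausible and one deliberately crude pair of derivative values (helper names and numbers are illustrative):

```python
import math

def eps_to_v(z_t, eps, alpha, sigma, dot_alpha, dot_sigma):
    return (dot_alpha / alpha) * z_t + (dot_sigma - dot_alpha * sigma / alpha) * eps

def v_to_z0(z_t, v, alpha, sigma, dot_alpha, dot_sigma):
    denom = sigma * dot_alpha - dot_sigma * alpha
    return (sigma * v - dot_sigma * z_t) / denom

alpha, sigma = 0.8, 0.6
z_t, eps = 1.12, -0.4
z0_eps_space = (z_t - sigma * eps) / alpha    # baseline estimate from the eps path

# Very different derivative values still give the same z0 after the round trip.
for dot_alpha, dot_sigma in [(-0.003, 0.004), (-0.5, 0.1)]:
    v = eps_to_v(z_t, eps, alpha, sigma, dot_alpha, dot_sigma)
    z0_v_space = v_to_z0(z_t, v, alpha, sigma, dot_alpha, dot_sigma)
    assert math.isclose(z0_eps_space, z0_v_space, rel_tol=1e-9)
```

So the two samplers should agree up to floating-point rounding when they start from the same initial latent; the derivative accuracy would only start to matter if the velocity were integrated directly, for example with an Euler step.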

Same prompt, with each $z_0$ estimate recovered from the converted velocity:

The image adheres to the prompt.

References

  1. https://arxiv.org/abs/2202.00512
  2. https://arxiv.org/abs/2112.10752
  3. https://arxiv.org/abs/2509.25170
  4. https://arxiv.org/abs/2010.02502