A diffusion model trained to predict noise ($\epsilon$) can be turned into one that predicts velocity ($v$) without retraining: the two are related by a fixed, schedule-dependent change of variables. This note works through that reparameterization and verifies it empirically. I load SD 1.5 (a noise predictor) and sample from it two ways, once in $\epsilon$-space and once by converting its output to velocity on the fly, and confirm the images match.
The velocity parameterization is the one made explicit in Progressive Distillation for Fast Sampling, and it’s the bridge from the diffusion view into the flow-matching framing, where the model is expected to output a velocity field. Everything here is in latent space (latent diffusion): SD 1.5 diffuses the VAE latent, not pixels.
The setup
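The forward process noises a clean latent $z_0$ with Gaussian noise $\epsilon$ on a schedule $(\alpha_t, \sigma_t)$:

$$z_t = \alpha_t z_0 + \sigma_t \epsilon$$

Treating $z_t$ as a point mass moving in time, its velocity is just the time derivative:

$$v_t = \frac{dz_t}{dt} = \dot\alpha_t z_0 + \dot\sigma_t \epsilon$$

A noise predictor gives us $\hat\epsilon$. To express $v_t$ in terms of the quantities we actually have at inference ($z_t$ and $\hat\epsilon$), solve the forward equation for $z_0 = (z_t - \sigma_t \hat\epsilon) / \alpha_t$ and substitute:

$$v_t = \frac{\dot\alpha_t}{\alpha_t} z_t + \left(\dot\sigma_t - \frac{\dot\alpha_t \sigma_t}{\alpha_t}\right) \hat\epsilon$$
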
That identity is the whole trick: given an $\epsilon$-prediction and the schedule, you get the velocity for free. No retraining, no new weights.
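A quick way to convince yourself of the identity is a scalar check. The sketch below is an illustration only: the schedule values are made up for a single timestep, and `eps_to_velocity` is a hypothetical helper name, not something from the SD 1.5 code. It compares the velocity obtained from the definition (which needs the clean latent) with the one obtained from the noisy latent and the noise alone.

```python
import random

def eps_to_velocity(z_t, eps, alpha, sigma, dot_alpha, dot_sigma):
    """Velocity from (z_t, eps): v = (da/a) * z_t + (ds - da * s / a) * eps."""
    return (dot_alpha / alpha) * z_t + (dot_sigma - dot_alpha * sigma / alpha) * eps

random.seed(0)
# Made-up schedule values at one timestep (alpha decreasing, sigma increasing).
alpha, sigma = 0.8, 0.6
dot_alpha, dot_sigma = -0.003, 0.004

z0, eps = random.gauss(0, 1), random.gauss(0, 1)
z_t = alpha * z0 + sigma * eps                # forward process
v_direct = dot_alpha * z0 + dot_sigma * eps   # definition: time derivative of z_t
v_from_eps = eps_to_velocity(z_t, eps, alpha, sigma, dot_alpha, dot_sigma)

# The two routes agree up to floating-point rounding.
assert abs(v_direct - v_from_eps) < 1e-12
```

The identity is elementwise, so the same check carries over unchanged to full latent tensors.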
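The reverse direction: recovering $z_0$ from velocity
For a velocity-based sampler you need to denoise, that is, recover $z_0$ from $v_t$. Take the two defining equations:

$$z_t = \alpha_t z_0 + \sigma_t \epsilon, \qquad v_t = \dot\alpha_t z_0 + \dot\sigma_t \epsilon$$

Eliminate $\epsilon$ (multiply the first by $\dot\sigma_t$, the second by $\sigma_t$, subtract) to get

$$z_0 = \frac{\sigma_t v_t - \dot\sigma_t z_t}{\sigma_t \dot\alpha_t - \dot\sigma_t \alpha_t}$$
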
This is the denoiser form used in 2509.25170.
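The elimination can be checked numerically with a round trip. This is a minimal sketch under stated assumptions: the schedule values are invented for one timestep, and `velocity_to_z0` is a hypothetical name. It builds the noisy latent and the velocity from a known clean latent and noise, then recovers the clean latent both from the velocity and from the usual noise-based formula.

```python
def velocity_to_z0(z_t, v, alpha, sigma, dot_alpha, dot_sigma):
    """Clean latent from (z_t, v): z0 = (s * v - ds * z_t) / (s * da - ds * a)."""
    denom = sigma * dot_alpha - dot_sigma * alpha
    return (sigma * v - dot_sigma * z_t) / denom

# Made-up schedule values at a single timestep.
alpha, sigma = 0.8, 0.6
dot_alpha, dot_sigma = -0.003, 0.004

z0_true, eps = 1.25, -0.5
z_t = alpha * z0_true + sigma * eps          # forward process
v = dot_alpha * z0_true + dot_sigma * eps    # velocity by definition

z0_from_v = velocity_to_z0(z_t, v, alpha, sigma, dot_alpha, dot_sigma)
z0_from_eps = (z_t - sigma * eps) / alpha    # the usual noise-based denoiser

# Both denoisers return the clean latent we started from.
assert abs(z0_from_v - z0_true) < 1e-9
assert abs(z0_from_eps - z0_true) < 1e-9
```

Since both formulas return the same clean latent, a sampler that denoises through the velocity should follow the same trajectory as one that denoises through the noise, up to rounding.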
A scheduler that exposes the derivatives
The standard diffusers DDIM scheduler doesn’t hand you $\dot\alpha_t$ and $\dot\sigma_t$, which the velocity formula needs. So I build the schedule explicitly: linear-$\beta$, with $\alpha_t$ and $\sigma_t$ from the cumulative product and their derivatives by finite difference. Having the time-segment indexing explicit also makes the per-step bookkeeping easier to follow.
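As a dependency-free cross-check on this construction, here is a minimal sketch using plain Python lists (an assumption for illustration; the note itself uses torch, and the function name `linear_beta_schedule` is hypothetical). It builds a linear-beta schedule with the same defaults as the scheduler class in this note and confirms that the denominator of the velocity denoiser, sigma * dot_alpha - dot_sigma * alpha, is strictly negative at every timestep, so dividing by it is safe.

```python
import math

def linear_beta_schedule(T=1000, beta_min=0.00085, beta_max=0.012):
    """Return (alpha, sigma, dot_alpha, dot_sigma) as plain lists of length T."""
    betas = [beta_min + (beta_max - beta_min) * i / (T - 1) for i in range(T)]
    alpha, sigma, cumprod = [], [], 1.0
    for b in betas:
        cumprod *= 1.0 - b
        alpha.append(math.sqrt(cumprod))
        sigma.append(math.sqrt(1.0 - cumprod))
    # Difference with the previous step (unit step in t); index 0 reuses index 1.
    dot_alpha = [0.0] + [alpha[t] - alpha[t - 1] for t in range(1, T)]
    dot_sigma = [0.0] + [sigma[t] - sigma[t - 1] for t in range(1, T)]
    dot_alpha[0], dot_sigma[0] = dot_alpha[1], dot_sigma[1]
    return alpha, sigma, dot_alpha, dot_sigma

alpha, sigma, dot_alpha, dot_sigma = linear_beta_schedule()
denoms = [sigma[t] * dot_alpha[t] - dot_sigma[t] * alpha[t] for t in range(len(alpha))]

# alpha falls and sigma rises with t, so every denominator is negative (never zero).
assert all(d < 0 for d in denoms)
```

The sign follows from the schedule itself: alpha is positive and decreasing, sigma is positive and increasing, so both terms of the denominator are negative.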

    import torch

    class LinearBetaScheduler:
        """Linear-beta schedule exposing alpha, sigma and their per-step differences."""

        def __init__(self, T=1000, beta_min=0.00085, beta_max=0.012, device="mps"):
            self.T = T
            self.betas = torch.linspace(beta_min, beta_max, T)
            self.alphas_cumprod = torch.cumprod(1 - self.betas, dim=0)
            self.alpha = self.alphas_cumprod.sqrt().to(device)
            self.sigma = (1 - self.alphas_cumprod).sqrt().to(device)
            # Time derivatives via backward difference (unit step in t)
            self.dot_alpha = torch.zeros(T, device=device)
            self.dot_sigma = torch.zeros(T, device=device)
            self.dot_alpha[1:] = self.alpha[1:] - self.alpha[:-1]
            self.dot_sigma[1:] = self.sigma[1:] - self.sigma[:-1]
            self.dot_alpha[0] = self.dot_alpha[1]  # boundary: reuse the first difference
            self.dot_sigma[0] = self.dot_sigma[1]

Baseline: sampling in ε-space
First the normal path: predict noise, solve for $z_0$, step. SD 1.5 runs with DDIM at 20–50 steps; the deterministic reverse process lets us skip timesteps because the marginals stay valid. Classifier-free guidance mixes the conditional and unconditional noise predictions:

    @torch.no_grad()
    def ddpm_epsilon_sampler(prompt, scheduler, guidance_scale=7.5):
        # Deterministic DDIM-style sampling in eps-space.
        # sample_ddpm_latent, encode_text and unet are helpers not shown here.
        timesteps = torch.linspace(999, 0, 50).long()
        z_t = sample_ddpm_latent(time=999)
        text_emb, uncond_emb = encode_text(prompt), encode_text("")
        for i, t in enumerate(timesteps):
            t_tensor = torch.tensor([t], device="mps")
            eps_uncond = unet(z_t, t_tensor, encoder_hidden_states=uncond_emb).sample
            eps_text = unet(z_t, t_tensor, encoder_hidden_states=text_emb).sample
            # Classifier-free guidance: extrapolate from uncond toward cond
            eps = eps_uncond + guidance_scale * (eps_text - eps_uncond)
            # Solve the forward equation for the clean latent
            z0 = (z_t - scheduler.sigma[t] * eps) / scheduler.alpha[t]
            if i < len(timesteps) - 1:
                # Re-noise the z0 estimate to the next (lower) timestep
                t_next = timesteps[i + 1]
                z_t = scheduler.alpha[t_next] * z0 + scheduler.sigma[t_next] * eps
            else:
                z_t = z0
        return z_t

Prompt: “A woman on vacation in Bali.”

The conversion, applied
Now the velocity path. Wrap the (CFG-combined) noise predictor, convert its output to velocity via the identity above, then denoise with the $z_0$-from-$v$ formula.
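A nice structural fact worth noting: the velocity operator (ε → v) and the CFG operator (linear mix of cond/uncond) commute, so it doesn’t matter whether you apply guidance before or after converting to velocity. This holds because the conversion is affine in ε at a fixed noisy latent and the two CFG weights sum to one.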
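The commutation claim is easy to test on scalars. The sketch below is illustrative only: the schedule values and the two noise predictions are made-up numbers, and `cfg` and `eps_to_velocity` are hypothetical helper names. It applies guidance and the velocity conversion in both orders and compares the results.

```python
def cfg(uncond, cond, w):
    """Classifier-free guidance mix: weights (1 - w) and w sum to one."""
    return uncond + w * (cond - uncond)

def eps_to_velocity(z_t, eps, alpha, sigma, dot_alpha, dot_sigma):
    """Velocity from (z_t, eps), affine in eps for a fixed z_t."""
    return (dot_alpha / alpha) * z_t + (dot_sigma - dot_alpha * sigma / alpha) * eps

# Made-up schedule values (alpha, sigma, dot_alpha, dot_sigma) at one timestep.
sched = (0.8, 0.6, -0.003, 0.004)
z_t, eps_uncond, eps_text, w = 0.7, -0.2, 0.9, 7.5

# Order 1: guide in eps-space, then convert to velocity.
v_guide_then_convert = eps_to_velocity(z_t, cfg(eps_uncond, eps_text, w), *sched)

# Order 2: convert each branch to velocity, then guide in v-space.
v_convert_then_guide = cfg(
    eps_to_velocity(z_t, eps_uncond, *sched),
    eps_to_velocity(z_t, eps_text, *sched),
    w,
)

assert abs(v_guide_then_convert - v_convert_then_guide) < 1e-12
```

The term that depends only on the noisy latent survives the mix unchanged because the guidance weights sum to one; with weights that did not sum to one, the two orders would differ.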

    @torch.no_grad()
    def noise_to_velocity(prompt, z_t, t, unet, scheduler, guidance_scale=7.5):
        # noise_predictor (not shown here) returns the CFG-combined eps prediction
        eps = noise_predictor(prompt, z_t, t, unet, guidance_scale)
        # v = (dot_alpha / alpha) * z_t + (dot_sigma - dot_alpha * sigma / alpha) * eps
        v = (scheduler.dot_alpha[t] / scheduler.alpha[t]) * z_t \
            + (scheduler.dot_sigma[t]
               - scheduler.dot_alpha[t] * scheduler.sigma[t] / scheduler.alpha[t]) * eps
        return v, eps

    @torch.no_grad()
    def velocity_ddim_sampler(prompt, unet, scheduler, guidance_scale=7.5):
        timesteps = torch.linspace(999, 0, 50).long()
        z_t = sample_ddpm_latent(time=999)
        for i, t in enumerate(timesteps):
            v, eps = noise_to_velocity(prompt, z_t, t, unet, scheduler, guidance_scale)
            # Denoise from velocity:
            # z0 = (sigma * v - dot_sigma * z_t) / (sigma * dot_alpha - dot_sigma * alpha)
            denom = scheduler.sigma[t] * scheduler.dot_alpha[t] \
                - scheduler.dot_sigma[t] * scheduler.alpha[t]
            z0 = (scheduler.sigma[t] * v - scheduler.dot_sigma[t] * z_t) / denom
            if i < len(timesteps) - 1:
                # Re-noise the z0 estimate to the next (lower) timestep
                t_next = timesteps[i + 1]
                z_t = scheduler.alpha[t_next] * z0 + scheduler.sigma[t_next] * eps
            else:
                z_t = z0
        return z_t

Same prompt, sampling entirely through the velocity reparameterization:

The image adheres to the prompt.