Tutorial 2: Automatic Differentiation
The autograd module provides reverse-mode automatic differentiation.
Variables wrap tensors with gradient tracking. When you perform operations
on variables, a computation graph is built behind the scenes. Calling
backward() walks that graph in reverse, accumulating gradients at each
leaf variable.
This tutorial builds on Tutorial 1: Tensors.
Variables
A Variable wraps a tensor and optionally tracks gradients:
use flodl::{Device, Tensor, Variable};
let t = Tensor::from_f32(&[2.0], &[1], Device::CPU)?;
// requires_grad=true: operations on this variable build a computation graph
let x = Variable::new(t, true);
// requires_grad=false: just a constant, no tracking
let c = Variable::new(some_tensor, false);
Variables created by the user are leaf variables. Variables produced by operations are non-leaf (intermediate) nodes in the computation graph.
Forward Pass: Building the Graph
Operations on variables mirror the tensor API. Each operation records its inputs and backward function:
let w_t = Tensor::from_f32(&[3.0], &[1], Device::CPU)?;
let x_t = Tensor::from_f32(&[2.0], &[1], Device::CPU)?;
let w = Variable::new(w_t, true);
let x = Variable::new(x_t, true);
// y = x * w, then reduce to scalar
let y = x.mul(&w)?.sum()?;
// The graph now records: Sum <- Mul <- (x, w)
The full set of differentiable operations includes:
Arithmetic: add, sub, mul, div, matmul, mul_scalar, add_scalar, div_scalar, neg
Activations: relu, sigmoid, tanh, gelu, silu, leaky_relu, elu, softplus, mish, selu, hardswish, hardsigmoid, prelu, softmax, log_softmax
Math: exp, log, sqrt, abs, pow_scalar, sin, cos, sign, floor, ceil, round, reciprocal, clamp, clamp_min, clamp_max, log1p, expm1, log2, log10, atan2, maximum, minimum, masked_fill, normalize, cosine_similarity, triu, tril
Reductions: sum, sum_dim, mean, mean_dim, min, max, min_dim, max_dim, var, std, var_dim, std_dim, prod, prod_dim, cumsum, logsumexp
Shape: transpose, permute, reshape, flatten, squeeze, unsqueeze, unsqueeze_many, expand, narrow, select, cat, cat_many, stack, chunk, repeat, pad, index_select, gather, topk, sort
NN ops: conv1d, conv_transpose1d, conv2d, conv_transpose2d, conv3d, conv_transpose3d, max_pool2d, avg_pool2d, max_pool1d, avg_pool1d, adaptive_avg_pool2d, adaptive_max_pool2d, instance_norm, group_norm, layer_norm, grid_sample, pixel_shuffle, pixel_unshuffle, bilinear, embedding_bag, im2col, col2im
Backward Pass: Computing Gradients
Call backward() on a scalar variable to compute gradients for all leaf
variables that contributed to it:
y.backward()?;
backward() requires a scalar (single-element) output. After the backward
pass, the calling variable’s grad_fn chain is severed in-place (via
detach_()) — this immediately frees the C++ autograd Node objects rather
than waiting for the variable to be dropped. Leaf variables hold their
accumulated gradients:
println!("{:?}", w.grad()); // dy/dw — the gradient tensor
println!("{:?}", x.grad()); // dy/dx
Complete Example: Manual Gradient Check
// y = x * w, where x=2, w=3
// dy/dw = x = 2
// dy/dx = w = 3
let x_t = Tensor::from_f32(&[2.0], &[1], Device::CPU)?;
let w_t = Tensor::from_f32(&[3.0], &[1], Device::CPU)?;
let x = Variable::new(x_t, true);
let w = Variable::new(w_t, true);
let y = x.mul(&w)?.sum()?;
y.backward()?;
let w_grad = w.grad().unwrap().to_f32_vec()?; // [2.0] — dy/dw = x
let x_grad = x.grad().unwrap().to_f32_vec()?; // [3.0] — dy/dx = w
ZeroGrad
Gradients accumulate across multiple backward passes. Reset them before each training step:
w.zero_grad(); // reset gradient to None
In practice you will call optimizer.zero_grad() which does this for all
parameters (see Tutorial 4).
Detach
Stop gradient flow by detaching a variable. This creates a new leaf variable sharing the same tensor data but with no gradient tracking:
let detached = v.detach();
// Operations on detached do not build a graph
The underlying Tensor also has an in-place variant, detach_(), which
severs the grad_fn chain without allocating a new handle. This is used
internally by backward() to release the autograd graph immediately.
no_grad: Disabling Tracking for Inference
Wrap inference code in no_grad to skip graph construction. This saves
memory and computation:
use flodl::no_grad;
no_grad(|| {
    let output = model.forward(&input)?;
    // No computation graph is built, even if inputs require gradients.
    Ok(output)
})?;
no_grad blocks can nest safely.
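One common way to make such a guard nest safely is a thread-local flag that is saved on entry and restored on exit, RAII-style. The sketch below is a plausible illustration of that pattern in plain Rust, not flodl's actual implementation: the inner guard restores "disabled" (what it saw on entry), so only the outermost guard re-enables tracking.

```rust
use std::cell::Cell;

thread_local! {
    static GRAD_ENABLED: Cell<bool> = Cell::new(true);
}

// RAII guard: disables tracking on creation, restores the *previous*
// state on drop. Restoring the previous state is what makes nesting safe.
struct NoGradGuard {
    prev: bool,
}

impl NoGradGuard {
    fn new() -> Self {
        let prev = GRAD_ENABLED.with(|g| g.replace(false));
        NoGradGuard { prev }
    }
}

impl Drop for NoGradGuard {
    fn drop(&mut self) {
        let prev = self.prev;
        GRAD_ENABLED.with(|g| g.set(prev));
    }
}

fn no_grad<T>(f: impl FnOnce() -> T) -> T {
    let _guard = NoGradGuard::new();
    f()
}

fn grad_enabled() -> bool {
    GRAD_ENABLED.with(|g| g.get())
}

fn main() {
    assert!(grad_enabled());
    no_grad(|| {
        assert!(!grad_enabled());
        no_grad(|| assert!(!grad_enabled())); // nested: still disabled
        assert!(!grad_enabled()); // inner guard restored "disabled", not "enabled"
    });
    assert!(grad_enabled()); // outer guard restored tracking
    println!("ok");
}
```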
Error Handling
floDl uses Rust’s Result type for error handling. Every operation that
can fail returns a Result, and the ? operator propagates errors to the
caller immediately — no silent failures, no hidden control flow:
let result = x.matmul(&w)?.add(&b)?;
// If matmul fails (e.g. a shape mismatch), ? returns the error to the caller immediately.
// Nothing fails silently — you decide where each error is handled.
This is more explicit than exception-based error handling: it catches bugs earlier and produces clearer error messages.
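A self-contained sketch of this flow in plain Rust (TensorError and matmul_cols_rows below are hypothetical stand-ins, not flodl's API): the first fallible call succeeds, the second hits a shape mismatch, and ? hands the error straight back to the caller.

```rust
#[derive(Debug)]
enum TensorError {
    ShapeMismatch { lhs: usize, rhs: usize },
}

// Hypothetical shape check: matmul requires lhs columns == rhs rows.
fn matmul_cols_rows(lhs_cols: usize, rhs_rows: usize) -> Result<usize, TensorError> {
    if lhs_cols != rhs_rows {
        return Err(TensorError::ShapeMismatch { lhs: lhs_cols, rhs: rhs_rows });
    }
    Ok(lhs_cols)
}

fn forward() -> Result<usize, TensorError> {
    let inner = matmul_cols_rows(3, 3)?; // ok: shapes agree
    let out = matmul_cols_rows(inner, 4)?; // mismatch: ? returns Err here
    Ok(out)
}

fn main() {
    match forward() {
        Ok(_) => unreachable!(),
        Err(e) => println!("caught: {:?}", e), // the caller decides what to do
    }
}
```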