Porting from PyTorch
flodl is designed for PyTorch users: it keeps the same module names, the same semantics, and the same training loop structure, so most translations are mechanical. This guide covers both manual porting and AI-assisted porting.
The fast path: AI-assisted porting
flodl ships with a porting skill that works with AI coding assistants.
The skill reads your PyTorch script, classifies each block by intent,
maps it to flodl equivalents, generates a complete Rust project, and
validates with cargo check.
With Claude Code:
/port my_model.py
With any AI tool:
Point it at ai/skills/port/guide.md in the flodl repo (or any flodl
project scaffolded with fdl init). The guide contains the complete
mapping and the process to follow.
The AI uses fdl api-ref to get the current API surface, so it stays
up to date across flodl versions.
Project setup
Before porting, you need a build environment. The fdl CLI handles this:
# Install fdl (one time)
cargo install flodl-cli # from crates.io
# or: curl -sL https://flodl.dev/fdl -o fdl && chmod +x fdl
# Scaffold a project
fdl init my-model # generates Cargo.toml, Dockerfile, Makefile, etc.
cd my-model
# Detect hardware and download libtorch
fdl setup
All builds run inside Docker. You don’t need Rust on your host machine. For standalone Docker mode (libtorch baked into the image):
fdl init my-model --docker
See the CLI documentation for full details.
Module mapping
flodl uses the same names as PyTorch. The main differences are Rust syntax
(constructors return Result, builder pattern for conv layers) and the
Graph builder for model composition.
Layers
| PyTorch | flodl |
|---|---|
| nn.Linear(in, out) | Linear::new(in, out)? |
| nn.Conv2d(in, out, k, padding=1) | Conv2d::configure(in, out, k).with_padding(1).done()? |
| nn.BatchNorm2d(n) | BatchNorm::new(n)? |
| nn.LayerNorm(n) | LayerNorm::new(n)? |
| nn.Dropout(p) | Dropout::new(p) |
| nn.ReLU() | ReLU::new() |
| nn.GELU() | GELU (erf form, default); use GELU::tanh() for the tanh approximation |
| nn.Embedding(n, d) | Embedding::new(n, d)? |
| nn.LSTM(in, h, layers) | LSTM::new(in, h, layers)? |
| nn.GRU(in, h, layers) | GRU::new(in, h, layers)? |
| nn.MultiheadAttention(d, h) | MultiheadAttention::new(d, h)? |
Every module has an ::on_device(... , device) variant for explicit
device placement.
For the full mapping (30+ modules, losses, optimizers, schedulers), see
ai/skills/port/guide.md.
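The two syntax differences in the Layers table (fallible constructors and the conv builder chain) can be illustrated without flodl itself. The sketch below uses hypothetical `ToyConv` and `ToyConvBuilder` types that are not part of flodl; it only assumes the general shape `configure(...).with_padding(...).done()?` shown in the table.

```rust
// Toy stand-in for a conv layer; NOT the flodl implementation.
#[derive(Debug, PartialEq)]
struct ToyConv {
    in_ch: usize,
    out_ch: usize,
    kernel: usize,
    padding: usize,
}

// Builder that collects optional settings before validation.
struct ToyConvBuilder {
    in_ch: usize,
    out_ch: usize,
    kernel: usize,
    padding: usize,
}

impl ToyConv {
    // Required arguments go here; optional ones get defaults.
    fn configure(in_ch: usize, out_ch: usize, kernel: usize) -> ToyConvBuilder {
        ToyConvBuilder { in_ch, out_ch, kernel, padding: 0 }
    }
}

impl ToyConvBuilder {
    // Plays the role of the PyTorch keyword argument `padding=1`.
    fn with_padding(mut self, padding: usize) -> Self {
        self.padding = padding;
        self
    }

    // Validation happens once, at the end of the chain.
    fn done(self) -> Result<ToyConv, String> {
        if self.in_ch == 0 || self.out_ch == 0 || self.kernel == 0 {
            return Err("channels and kernel size must be non-zero".to_string());
        }
        Ok(ToyConv {
            in_ch: self.in_ch,
            out_ch: self.out_ch,
            kernel: self.kernel,
            padding: self.padding,
        })
    }
}

fn build() -> Result<ToyConv, String> {
    // `?` hands the error to the caller, where Python would raise an exception.
    let conv = ToyConv::configure(3, 16, 3).with_padding(1).done()?;
    Ok(conv)
}

fn main() {
    println!("{:?}", build());
}
```

The trailing `?` in the table entries plays the same role as in `build()` here: a failed construction returns early from the enclosing function instead of raising.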
Losses
flodl losses are functions, not structs:
// PyTorch: criterion = nn.MSELoss(); loss = criterion(pred, target)
// flodl:
let loss = mse_loss(&pred, &target)?;
let loss = cross_entropy_loss(&pred, &target)?;
let loss = focal_loss(&pred, &target, alpha, gamma)?;
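To make the "functions, not structs" point concrete, here is a minimal sketch of a loss written as a free function that returns a `Result`. `toy_mse_loss` is a hypothetical helper over plain `f64` slices, not flodl's `mse_loss`, which operates on variables and tensors.

```rust
// Toy mean-squared-error over slices; NOT flodl's mse_loss.
// A free function needs no `criterion` object to be constructed first.
fn toy_mse_loss(pred: &[f64], target: &[f64]) -> Result<f64, String> {
    if pred.is_empty() {
        return Err("empty input".to_string());
    }
    if pred.len() != target.len() {
        return Err(format!("shape mismatch: {} vs {}", pred.len(), target.len()));
    }
    let sum: f64 = pred
        .iter()
        .zip(target)
        .map(|(p, t)| (p - t).powi(2))
        .sum();
    Ok(sum / pred.len() as f64)
}

fn run() -> Result<(), String> {
    // The call site mirrors the flodl snippets: borrow inputs, append `?`.
    let loss = toy_mse_loss(&[1.0, 2.0], &[1.0, 4.0])?;
    println!("loss = {}", loss);
    Ok(())
}

fn main() {
    if let Err(e) = run() {
        eprintln!("error: {}", e);
    }
}
```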
Optimizers
let optimizer = Adam::new(&model.parameters(), 1e-3);
let optimizer = AdamW::new(&model.parameters(), 1e-3, 0.01);
let optimizer = SGD::new(&model.parameters(), 0.01).momentum(0.9);
Model architecture: FlowBuilder
This is where flodl deliberately diverges from PyTorch. Instead of
writing a forward() method with imperative control flow, you describe
data flow declaratively with FlowBuilder:
Sequential
# PyTorch
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
// flodl
let model = FlowBuilder::from(Linear::new(784, 256)?)
    .through(ReLU::new())
    .through(Linear::new(256, 10)?)
    .build()?;
Residual connections
# PyTorch: return x + self.layers(x)
// flodl: .also() adds a residual branch
let block = FlowBuilder::from(Linear::new(d, d)?)
    .through(ReLU::new())
    .also(Linear::new(d, d)?)
    .build()?;
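The sequential and residual patterns above can be modelled in a few lines of plain Rust. This is a sketch, not flodl's FlowBuilder: `ToyFlow` is a hypothetical type over `f64` values, and it assumes that `.also(f)` computes `x + f(x)` on the value flowing into it, as the PyTorch comment suggests.

```rust
// Toy flow over f64 values standing in for tensors. NOT flodl's FlowBuilder.
type Step = Box<dyn Fn(f64) -> f64>;

struct ToyFlow {
    steps: Vec<Step>,
}

impl ToyFlow {
    fn from(first: impl Fn(f64) -> f64 + 'static) -> Self {
        ToyFlow { steps: vec![Box::new(first)] }
    }

    // Sequential step: the previous output feeds this step.
    fn through(mut self, f: impl Fn(f64) -> f64 + 'static) -> Self {
        self.steps.push(Box::new(f));
        self
    }

    // Residual step (assumed semantics): y = x + f(x).
    fn also(mut self, f: impl Fn(f64) -> f64 + 'static) -> Self {
        self.steps.push(Box::new(move |x: f64| x + f(x)));
        self
    }

    fn forward(&self, x: f64) -> f64 {
        self.steps.iter().fold(x, |acc, step| step(acc))
    }
}

fn relu(x: f64) -> f64 {
    x.max(0.0)
}

// double -> relu -> residual half-scale
fn demo_flow() -> ToyFlow {
    ToyFlow::from(|x| 2.0 * x).through(relu).also(|x| 0.5 * x)
}

fn main() {
    // 3 -> 6 -> 6 -> 6 + 3
    println!("{}", demo_flow().forward(3.0));
}
```

The structure is described once as data (a list of steps) and executed later, which is the property that lets a graph builder inspect, tag, and checkpoint intermediate stages.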
Skip connections / cross-attention
# PyTorch: h = encoder(x); y = decoder(h); return cross_attn(y, h)
// flodl: .tag() saves, .using() retrieves
let model = FlowBuilder::from(encoder)
    .tag("hidden")
    .through(decoder)
    .through(cross_attn).using(&["hidden"])
    .build()?;
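One way to picture `.tag()` and `.using()` is as a named store that the executor fills and reads while walking the graph. The sketch below is an assumption about the general mechanism, not flodl's implementation; `Node`, `run`, and the three toy functions are hypothetical names over `f64` values.

```rust
use std::collections::HashMap;

// Toy graph nodes; NOT flodl's internal representation.
enum Node {
    // Plain step: one input, one output.
    Through(fn(f64) -> f64),
    // Save the current value under a name.
    Tag(&'static str),
    // Step that also receives a previously tagged value.
    Using(&'static str, fn(f64, f64) -> f64),
}

fn run(nodes: &[Node], input: f64) -> Result<f64, String> {
    let mut tags: HashMap<&'static str, f64> = HashMap::new();
    let mut x = input;
    for node in nodes {
        match node {
            Node::Through(f) => x = f(x),
            Node::Tag(name) => {
                tags.insert(*name, x);
            }
            Node::Using(name, f) => {
                // An unknown tag is reported as an error, not a panic.
                let saved = tags
                    .get(*name)
                    .ok_or_else(|| format!("unknown tag: {}", name))?;
                x = f(x, *saved);
            }
        }
    }
    Ok(x)
}

fn encoder(x: f64) -> f64 { x + 1.0 }
fn decoder(h: f64) -> f64 { h * 10.0 }
fn cross_attn(y: f64, h: f64) -> f64 { y + h }

fn demo_graph() -> Vec<Node> {
    vec![
        Node::Through(encoder),
        Node::Tag("hidden"),
        Node::Through(decoder),
        Node::Using("hidden", cross_attn),
    ]
}

fn main() {
    // 2 -> encoder: 3 (tagged) -> decoder: 30 -> cross_attn(30, 3)
    println!("{:?}", run(&demo_graph(), 2.0));
}
```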
Parallel branches
# PyTorch: return head_a(x) + head_b(x)
// flodl: .split() + .merge()
let model = FlowBuilder::from(encoder)
    .split(modules![head_a, head_b])
    .merge(MergeOp::Add)
    .build()?;
Iterative refinement
# PyTorch: for _ in range(3): x = refine(x)
// flodl: .loop_body().for_n()
let model = FlowBuilder::from(encoder)
    .loop_body(refine_block).for_n(3)
    .build()?;
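The parallel-branch and loop patterns above reduce to two small functions when written over plain numbers. This sketch assumes that `MergeOp::Add` sums the branch outputs and that `.for_n(n)` feeds each iteration's output into the next; `split_merge_add`, `loop_n`, and the toy heads are hypothetical names, not flodl APIs.

```rust
// Toy versions over f64; NOT flodl's FlowBuilder.

// Every branch sees the same input; the merge is modelled as a sum.
fn split_merge_add(x: f64, branches: &[fn(f64) -> f64]) -> f64 {
    branches.iter().map(|f| f(x)).sum()
}

// Apply `body` n times, threading the output back in as the next input.
fn loop_n(x: f64, body: fn(f64) -> f64, n: usize) -> f64 {
    (0..n).fold(x, |acc, _| body(acc))
}

fn head_a(x: f64) -> f64 { x + 1.0 }
fn head_b(x: f64) -> f64 { x * 2.0 }
fn refine(x: f64) -> f64 { x / 2.0 + 1.0 }

fn main() {
    // head_a(3) + head_b(3) = 4 + 6
    println!("{}", split_merge_add(3.0, &[head_a, head_b]));
    // 1 -> 1.5 -> 1.75 -> 1.875
    println!("{}", loop_n(1.0, refine, 3));
}
```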
Tags for observation and checkpoints
Tags make intermediate outputs observable and enable selective checkpointing:
let model = FlowBuilder::from(encoder)
    .tag("encoder_out") // observable, checkpointable
    .through(decoder)
    .tag("decoder_out")
    .label("my_model") // graph-level label
    .build()?;
Training loop
# PyTorch
model.train()
for epoch in range(num_epochs):
    for inputs, target in dataloader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), target)
        loss.backward()
        optimizer.step()
flodl ports the manual loop almost line-for-line, and also offers a
universal Trainer that can own the loop for you. Three tiers, same
code on CPU / single GPU / multi-GPU.
Trainer::builder: framework-managed (universal)
You provide a step closure (forward + loss); the framework runs the loop, backward, optimizer step, and gradient sync.
// Step closure: forward + loss, returns the loss Variable.
fn train_step(model: &dyn Module, batch: &[Tensor]) -> Result<Variable> {
    let input = Variable::new(batch[0].clone(), false);
    let target = Variable::new(batch[1].to_dtype(DType::Int64)?, false);
    cross_entropy_loss(&model.forward(&input)?, &target)
}

let handle = Trainer::builder(
    |dev| build_model_on(dev),
    |params| Adam::new(params, 0.001),
    train_step,
)
    .dataset(dataset)
    .batch_size(32)
    .num_epochs(10)
    .run()?;
let state = handle.join()?; // averaged params + buffers
Trainer::setup: setup only, your loop
Trainer::setup runs device replication + optimizer setup +
training-mode toggle in one call; the loop stays yours.
let model = build_model()?;
Trainer::setup(&model, |dev| build_model_on(dev), |p| Adam::new(p, 0.001))?;
for epoch in 0..num_epochs {
    for batch in loader.epoch(epoch) {
        let batch = batch?;
        let pred = model.forward(&batch[0].into())?;
        let loss = cross_entropy_loss(&pred, &batch["label"].into())?;
        loss.backward()?;
        model.step()?; // AllReduce + buffer sync + optimizer + zero_grad
    }
}
Fully manual: closest port from PyTorch
// flodl
model.train();
for epoch in 0..num_epochs {
    for batch in loader.epoch(epoch) {
        let batch = batch?;
        let pred = model.forward(&batch[0].into())?;
        let loss = mse_loss(&pred, &Variable::new(batch[1].clone(), false))?;
        loss.backward()?;
        optimizer.step()?;
        optimizer.zero_grad();
    }
}
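The forward, backward, step, zero_grad order of the manual loop can be tried out without libtorch. The sketch below trains a hypothetical one-parameter model `y = w * x` with hand-written gradients; `ToyModel` and `train` are illustrative names only, and in flodl the gradient would come from `loss.backward()` rather than a formula.

```rust
// Toy 1-parameter model trained with plain SGD on squared error.
// NOT flodl code: gradients are derived by hand instead of by autograd.
struct ToyModel {
    w: f64,
    grad: f64,
}

impl ToyModel {
    fn forward(&self, x: f64) -> f64 {
        self.w * x
    }

    // d/dw of (w*x - y)^2 is 2 * (w*x - y) * x; accumulated like autograd does.
    fn backward(&mut self, x: f64, y: f64) {
        self.grad += 2.0 * (self.forward(x) - y) * x;
    }

    // Optimizer step: move against the gradient.
    fn step(&mut self, lr: f64) {
        self.w -= lr * self.grad;
    }

    // Without this, the next backward() would add to stale gradients.
    fn zero_grad(&mut self) {
        self.grad = 0.0;
    }
}

fn train(data: &[(f64, f64)], epochs: usize, lr: f64) -> f64 {
    let mut model = ToyModel { w: 0.0, grad: 0.0 };
    for _epoch in 0..epochs {
        for &(x, y) in data {
            model.backward(x, y); // forward pass + gradient
            model.step(lr);
            model.zero_grad();
        }
    }
    model.w
}

fn main() {
    // Samples drawn from y = 3x, so w should approach 3.
    let w = train(&[(1.0, 3.0), (2.0, 6.0)], 50, 0.1);
    println!("learned w = {}", w);
}
```

Calling `zero_grad()` after `step()` (as in the flodl loop) or before the forward pass (as in the PyTorch loop) is equivalent as long as gradients are cleared exactly once per iteration.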
Multi-GPU (DDP)
The Trainer tiers described in the Training loop section above already
cover the multi-GPU case: both Trainer::builder and Trainer::setup
auto-detect available CUDA devices and fall back to single-GPU or CPU
execution when fewer than two GPUs are present. The same code runs on CPU,
single GPU, and multi-GPU with no process-group setup, no torchrun, no
mp.spawn, and no DistributedSampler.
For DDP-specific knobs (sync policy, averaging backend),
Trainer::builder exposes them on the builder chain:
let ddp = Trainer::builder(model_factory, optim_factory, train_step)
    .dataset(dataset)
    .batch_size(32)
    .num_epochs(10)
    .policy(ApplyPolicy::Cadence) // Sync | Cadence | Async
    .backend(AverageBackend::Nccl) // Nccl | Cpu
    .run()?;
let state = ddp.join()?; // averaged params + buffers on CPU
ElChe cadence auto-detects heterogeneous GPU speeds and lets the faster card run ahead while the slow one anchors synchronization. See the DDP Reference for policies, backends, convergence guard, metrics, and live-monitor wiring, and DDP Benchmark for results on mixed consumer hardware.
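The "averaged params" mentioned in the snippet above can be pictured as an element-wise mean over per-device copies of the parameters. The sketch below shows only that general idea on flat `f64` vectors; `average_params` is a hypothetical helper and says nothing about how flodl's Nccl or Cpu backends, or the cadence policy, are actually implemented.

```rust
// Element-wise mean over replicas; a toy model of parameter averaging.
// NOT flodl's averaging backend.
fn average_params(replicas: &[Vec<f64>]) -> Result<Vec<f64>, String> {
    let first = replicas.first().ok_or("no replicas")?;
    if replicas.iter().any(|r| r.len() != first.len()) {
        return Err("replicas disagree on parameter count".to_string());
    }
    let n = replicas.len() as f64;
    Ok((0..first.len())
        .map(|i| replicas.iter().map(|r| r[i]).sum::<f64>() / n)
        .collect())
}

fn main() {
    // Two replicas that drifted apart during independent steps.
    let gpu0 = vec![1.0, 2.0];
    let gpu1 = vec![3.0, 6.0];
    println!("{:?}", average_params(&[gpu0, gpu1]));
}
```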
Key differences from PyTorch
| Concept | PyTorch | flodl |
|---|---|---|
| Error handling | Exceptions | Result<T> with ? operator |
| Memory | Garbage collected | Reference counted (cheap clone) |
| Model composition | nn.Sequential / manual forward() | FlowBuilder (declarative data flow) |
| Training mode | model.train() | model.train() |
| Eval mode | model.eval() | model.eval() |
| No-grad | with torch.no_grad(): | no_grad(\|\| { ... }) or NoGradGuard::new() |
| Device | .to(device) / .cuda() | ::on_device(... , device) constructors |
| Checkpoint format | .pt (pickle) | .fdl (binary, architecture-validated) |
| Losses | Struct instances | Free functions |
| Conv options | Constructor kwargs | Builder pattern (.with_padding(), .done()) |
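
The two no-grad forms in the table correspond to two common Rust idioms: a closure wrapper and a scope guard. The sketch below shows how the closure form can be built on the guard form using a thread-local flag. All names (`ToyNoGradGuard`, `toy_no_grad`, `grad_enabled`) are hypothetical, and this is an assumption about the pattern, not a description of flodl's internals.

```rust
use std::cell::Cell;

thread_local! {
    // Toy per-thread flag standing in for the autograd mode.
    static GRAD_ENABLED: Cell<bool> = Cell::new(true);
}

fn grad_enabled() -> bool {
    GRAD_ENABLED.with(|g| g.get())
}

// Guard form: gradients stay disabled until the guard is dropped.
struct ToyNoGradGuard {
    prev: bool,
}

impl ToyNoGradGuard {
    fn new() -> Self {
        // Remember the previous mode so nested guards restore correctly.
        let prev = GRAD_ENABLED.with(|g| g.replace(false));
        ToyNoGradGuard { prev }
    }
}

impl Drop for ToyNoGradGuard {
    fn drop(&mut self) {
        GRAD_ENABLED.with(|g| g.set(self.prev));
    }
}

// Closure form: the guard lives exactly as long as the closure runs,
// so the flag is restored even if `f` returns early.
fn toy_no_grad<T>(f: impl FnOnce() -> T) -> T {
    let _guard = ToyNoGradGuard::new();
    f()
}

fn main() {
    let inside = toy_no_grad(|| grad_enabled());
    println!("inside: {}, after: {}", inside, grad_enabled());
}
```

The closure form reads closest to Python's `with` block; the guard form is convenient when the no-grad region should last until the end of the current scope, such as an entire evaluation function.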
Further reading
- Full porting guide (30+ modules, all patterns)
- API reference (via fdl api-ref)
- Graph builder tutorial
- Training tutorial
- CLI documentation (project setup, libtorch management)