# Porting from PyTorch
flodl is designed for PyTorch users. Same module names, same semantics, same training loop structure. Most translations are mechanical. This guide covers manual porting and AI-assisted porting.
## The fast path: AI-assisted porting
flodl ships with a porting skill that works with AI coding assistants. The skill reads your PyTorch script, classifies each block by intent, maps it to flodl equivalents, generates a complete Rust project, and validates with `cargo check`.
With Claude Code:

```
/port my_model.py
```
With any AI tool:

Point it at `ai/skills/port/guide.md` in the flodl repo (or any flodl project scaffolded with `fdl init`). The guide contains the complete mapping and the process to follow.
The AI uses `fdl api-ref` to get the current API surface, so it stays up to date across flodl versions.
## Project setup
Before porting, you need a build environment. The `fdl` CLI handles this:

```bash
# Install fdl (one time)
cargo install flodl-cli   # from crates.io
# or: curl -sL https://flodl.dev/fdl -o fdl && chmod +x fdl

# Scaffold a project
fdl init my-model         # generates Cargo.toml, Dockerfile, Makefile, etc.
cd my-model

# Detect hardware and download libtorch
fdl setup
```
All builds run inside Docker, so you don't need Rust on your host machine. For standalone Docker mode (libtorch baked into the image):

```bash
fdl init my-model --docker
```
See the CLI documentation for full details.
## Module mapping
flodl uses the same names as PyTorch. The main differences are Rust syntax (constructors return `Result`, builder pattern for conv layers) and the `Graph` builder for model composition.
### Layers
| PyTorch | flodl |
|---|---|
| `nn.Linear(in, out)` | `Linear::new(in, out)?` |
| `nn.Conv2d(in, out, k, padding=1)` | `Conv2d::configure(in, out, k).with_padding(1).done()?` |
| `nn.BatchNorm2d(n)` | `BatchNorm::new(n)?` |
| `nn.LayerNorm(n)` | `LayerNorm::new(n)?` |
| `nn.Dropout(p)` | `Dropout::new(p)` |
| `nn.ReLU()` | `ReLU::new()` |
| `nn.GELU()` | `GELU` |
| `nn.Embedding(n, d)` | `Embedding::new(n, d)?` |
| `nn.LSTM(in, h, layers)` | `LSTM::new(in, h, layers)?` |
| `nn.GRU(in, h, layers)` | `GRU::new(in, h, layers)?` |
| `nn.MultiheadAttention(d, h)` | `MultiheadAttention::new(d, h)?` |
Every module has an `::on_device(..., device)` variant for explicit device placement.

For the full mapping (30+ modules, losses, optimizers, schedulers), see `ai/skills/port/guide.md`.
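The builder pattern behind `Conv2d::configure(...).with_padding(...).done()?` may be unfamiliar to PyTorch users, so here is a self-contained toy sketch of the shape. `TinyConv` and its fields are invented for illustration, not flodl types; the point is that required arguments go into `configure`, options are chainable methods, and `done()` validates and returns a `Result` (hence the trailing `?`):

```rust
// Toy stand-in illustrating the configure/with_padding/done builder shape.
// `TinyConv` and `TinyConvBuilder` are invented for this sketch.
#[derive(Debug)]
struct TinyConv {
    in_ch: usize,
    out_ch: usize,
    kernel: usize,
    padding: usize,
}

struct TinyConvBuilder {
    in_ch: usize,
    out_ch: usize,
    kernel: usize,
    padding: usize,
}

impl TinyConv {
    // Required arguments go into `configure`; options get defaults here.
    fn configure(in_ch: usize, out_ch: usize, kernel: usize) -> TinyConvBuilder {
        TinyConvBuilder { in_ch, out_ch, kernel, padding: 0 }
    }
}

impl TinyConvBuilder {
    // Optional settings are chainable methods consuming and returning self.
    fn with_padding(mut self, padding: usize) -> Self {
        self.padding = padding;
        self
    }

    // `done` validates the configuration; returning Result is why calls end in `?`.
    fn done(self) -> Result<TinyConv, String> {
        if self.kernel == 0 {
            return Err("kernel size must be non-zero".into());
        }
        Ok(TinyConv {
            in_ch: self.in_ch,
            out_ch: self.out_ch,
            kernel: self.kernel,
            padding: self.padding,
        })
    }
}

fn main() -> Result<(), String> {
    let conv = TinyConv::configure(3, 16, 3).with_padding(1).done()?;
    assert_eq!(conv.padding, 1);
    println!("{:?}", conv);
    Ok(())
}
```

Compared with Python keyword arguments, this keeps each optional setting a named, type-checked method call while still letting the final constructor report errors as values.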
### Losses
flodl losses are functions, not structs:

```rust
// PyTorch: criterion = nn.MSELoss(); loss = criterion(pred, target)
// flodl:
let loss = mse_loss(&pred, &target)?;
let loss = cross_entropy_loss(&pred, &target)?;
let loss = focal_loss(&pred, &target, alpha, gamma)?;
```
### Optimizers

```rust
let optimizer = Adam::new(&model.parameters(), 1e-3);
let optimizer = AdamW::new(&model.parameters(), 1e-3, 0.01);
let optimizer = SGD::new(&model.parameters(), 0.01).momentum(0.9);
```
## Model architecture: FlowBuilder
This is where flodl diverges from PyTorch in a good way. Instead of writing a `forward()` method with imperative control flow, you describe data flow declaratively with `FlowBuilder`:
### Sequential

```python
# PyTorch
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
```

```rust
// flodl
let model = FlowBuilder::from(Linear::new(784, 256)?)
    .through(ReLU::new())
    .through(Linear::new(256, 10)?)
    .build()?;
```
### Residual connections

```rust
// PyTorch: return x + self.layers(x)
// flodl: .also() adds a residual branch
let block = FlowBuilder::from(Linear::new(d, d)?)
    .through(ReLU::new())
    .also(Linear::new(d, d)?)
    .build()?;
```
### Skip connections / cross-attention

```rust
// PyTorch: h = encoder(x); y = decoder(h); return cross_attn(y, h)
// flodl: .tag() saves, .using() retrieves
let model = FlowBuilder::from(encoder)
    .tag("hidden")
    .through(decoder)
    .through(cross_attn).using(&["hidden"])
    .build()?;
```
### Parallel branches

```rust
// PyTorch: return head_a(x) + head_b(x)
// flodl: .split() + .merge()
let model = FlowBuilder::from(encoder)
    .split(modules![head_a, head_b])
    .merge(MergeOp::Add)
    .build()?;
```
### Iterative refinement

```rust
// PyTorch: for _ in range(3): x = refine(x)
// flodl: .loop_body().for_n()
let model = FlowBuilder::from(encoder)
    .loop_body(refine_block).for_n(3)
    .build()?;
```
### Tags for observation and checkpoints

Tags make intermediate outputs observable and enable selective checkpointing:

```rust
let model = FlowBuilder::from(encoder)
    .tag("encoder_out")   // observable, checkpointable
    .through(decoder)
    .tag("decoder_out")
    .label("my_model")    // graph-level label
    .build()?;
```
## Training loop

```python
# PyTorch
model.train()
for epoch in range(num_epochs):
    for batch in dataloader:
        optimizer.zero_grad()
        loss = criterion(model(batch), target)
        loss.backward()
        optimizer.step()
```

```rust
// flodl
model.train();
for epoch in 0..num_epochs {
    for batch in loader.epoch(epoch) {
        let batch = batch?;
        let pred = model.forward(&batch[0].into())?;
        let loss = mse_loss(&pred, &Variable::new(batch[1].clone(), false))?;
        loss.backward()?;
        optimizer.step()?;
        optimizer.zero_grad();
    }
}
```
With `Graph`'s integrated data loader, this simplifies further:

```rust
model.train();
for epoch in 0..num_epochs {
    for batch in model.epoch(epoch) {
        let batch = batch?;
        let pred = model.forward_batch(&batch)?;
        let loss = mse_loss(&pred, &batch["target"].into())?;
        model.step(&loss)?; // backward + optimizer + zero_grad
    }
}
```
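The fused step collapses three phases that PyTorch spells out separately: backpropagate, apply the update, clear gradients. A toy sketch of that fusion, with invented `TinyLoss`/`TinyOpt` types standing in for flodl's internals (which will differ), shows why a single call suffices and removes the classic "forgot `zero_grad`" bug:

```rust
// Invented toy types for illustration; not flodl's actual internals.
struct TinyLoss { value: f64 }
struct TinyOpt { steps: usize, pending_grads: bool }

impl TinyOpt {
    // Stand-in for backpropagation: marks gradients as accumulated.
    fn backward(&mut self, loss: &TinyLoss) {
        self.pending_grads = loss.value.is_finite();
    }
    // Stand-in for applying the parameter update.
    fn step(&mut self) { self.steps += 1; }
    // Stand-in for clearing accumulated gradients.
    fn zero_grad(&mut self) { self.pending_grads = false; }

    // One call performing all three phases, mirroring model.step(&loss).
    fn fused_step(&mut self, loss: &TinyLoss) {
        self.backward(loss);
        self.step();
        self.zero_grad();
    }
}

fn main() {
    let mut opt = TinyOpt { steps: 0, pending_grads: false };
    opt.fused_step(&TinyLoss { value: 0.5 });
    assert_eq!(opt.steps, 1);
    assert!(!opt.pending_grads); // gradients already cleared for the next batch
}
```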
## Key differences from PyTorch

| Concept | PyTorch | flodl |
|---|---|---|
| Error handling | Exceptions | `Result<T>` with the `?` operator |
| Memory | Garbage collected | Reference counted (cheap `clone`) |
| Model composition | `nn.Sequential` / manual `forward()` | `FlowBuilder` (declarative data flow) |
| Training mode | `model.train()` | `model.train()` |
| Eval mode | `model.eval()` | `model.eval()` |
| No-grad | `with torch.no_grad():` | `no_grad(\|\| { ... })` or `NoGradGuard::new()` |
| Device | `.to(device)` / `.cuda()` | `::on_device(..., device)` constructors |
| Checkpoint format | `.pt` (pickle) | `.fdl` (binary, architecture-validated) |
| Losses | Struct instances | Free functions |
| Conv options | Constructor kwargs | Builder pattern (`.with_padding()`, `.done()`) |
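For readers new to Rust, the error-handling row is the biggest day-to-day change: fallible calls return a `Result` value, and `?` forwards the error to the caller instead of raising an exception. A self-contained sketch of the pattern, using an invented `ShapeError` type and a stand-in constructor (not flodl's actual error type or API):

```rust
// `ShapeError` is invented for this sketch; flodl's error type will differ.
#[derive(Debug)]
struct ShapeError(String);

// A fallible "layer constructor", analogous in shape to Linear::new(in, out)?.
fn make_layer(inputs: usize, outputs: usize) -> Result<(usize, usize), ShapeError> {
    if inputs == 0 || outputs == 0 {
        return Err(ShapeError("layer dimensions must be non-zero".into()));
    }
    Ok((inputs, outputs))
}

// Each `?` forwards an Err to the caller, like an exception bubbling up,
// but the possibility of failure is visible in the return type.
fn build_model() -> Result<Vec<(usize, usize)>, ShapeError> {
    let l1 = make_layer(784, 256)?;
    let l2 = make_layer(256, 10)?;
    Ok(vec![l1, l2])
}

fn main() {
    match build_model() {
        Ok(layers) => println!("built {} layers", layers.len()),
        Err(e) => eprintln!("build failed: {:?}", e),
    }
    // A bad dimension surfaces as an Err value, not a panic:
    assert!(make_layer(0, 10).is_err());
}
```

This is why nearly every flodl constructor and training call in this guide ends in `?`: errors compose up the call stack without try/except blocks.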
## Further reading

- Full porting guide (30+ modules, all patterns)
- API reference (via `fdl api-ref`)
- Graph builder tutorial
- Training tutorial
- CLI documentation (project setup, libtorch management)