Porting from PyTorch

flodl is designed for PyTorch users. Same module names, same semantics, same training loop structure. Most translations are mechanical. This guide covers manual porting and AI-assisted porting.

The fast path: AI-assisted porting

flodl ships with a porting skill that works with AI coding assistants. The skill reads your PyTorch script, classifies each block by intent, maps it to flodl equivalents, generates a complete Rust project, and validates the result with cargo check.

With Claude Code:

/port my_model.py

With any AI tool:

Point it at ai/skills/port/guide.md in the flodl repo (or any flodl project scaffolded with fdl init). The guide contains the complete mapping and the process to follow.

The AI uses fdl api-ref to get the current API surface, so its mappings stay up to date across flodl versions.

Project setup

Before porting, you need a build environment. The fdl CLI handles this:

# Install fdl (one time)
cargo install flodl-cli      # from crates.io
# or: curl -sL https://flodl.dev/fdl -o fdl && chmod +x fdl

# Scaffold a project
fdl init my-model            # generates Cargo.toml, Dockerfile, Makefile, etc.
cd my-model

# Detect hardware and download libtorch
fdl setup

All builds run inside Docker. Apart from the optional cargo install of fdl itself, you don’t need Rust on your host machine. For standalone Docker mode (libtorch baked into the image):

fdl init my-model --docker

See the CLI documentation for full details.

Module mapping

flodl uses the same names as PyTorch. The main differences are Rust syntax (most constructors return Result, and conv layers use a builder pattern) and FlowBuilder, which composes modules into a Graph.

Layers

| PyTorch | flodl |
|---|---|
| nn.Linear(in, out) | Linear::new(in, out)? |
| nn.Conv2d(in, out, k, padding=1) | Conv2d::configure(in, out, k).with_padding(1).done()? |
| nn.BatchNorm2d(n) | BatchNorm::new(n)? |
| nn.LayerNorm(n) | LayerNorm::new(n)? |
| nn.Dropout(p) | Dropout::new(p) |
| nn.ReLU() | ReLU::new() |
| nn.GELU() | GELU |
| nn.Embedding(n, d) | Embedding::new(n, d)? |
| nn.LSTM(in, h, layers) | LSTM::new(in, h, layers)? |
| nn.GRU(in, h, layers) | GRU::new(in, h, layers)? |
| nn.MultiheadAttention(d, h) | MultiheadAttention::new(d, h)? |
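The Conv2d row follows a consuming-builder shape: configure(...) takes the required arguments, with_padding(...) overrides an optional one (where PyTorch uses a keyword argument), and done() validates once and returns a Result. The sketch below reproduces that shape in plain Rust; Conv2dConfig and Conv2dSpec are hypothetical stand-ins that hold no weights, not flodl’s actual types.

```rust
/// Hypothetical description of a Conv2d layer (no weights, no device).
#[derive(Debug, Clone, PartialEq)]
struct Conv2dSpec {
    in_channels: usize,
    out_channels: usize,
    kernel_size: usize,
    padding: usize,
}

/// Hypothetical builder: required args up front, optional settings chained.
struct Conv2dConfig {
    spec: Conv2dSpec,
}

impl Conv2dConfig {
    /// Required arguments, like PyTorch's positional arguments.
    fn configure(in_channels: usize, out_channels: usize, kernel_size: usize) -> Self {
        Conv2dConfig {
            spec: Conv2dSpec { in_channels, out_channels, kernel_size, padding: 0 },
        }
    }

    /// Optional setting, replacing the `padding=` keyword; takes and returns self.
    fn with_padding(mut self, padding: usize) -> Self {
        self.spec.padding = padding;
        self
    }

    /// Validate once at the end so errors surface through `?`.
    fn done(self) -> Result<Conv2dSpec, String> {
        let s = &self.spec;
        if s.in_channels == 0 || s.out_channels == 0 || s.kernel_size == 0 {
            return Err(format!("invalid Conv2d configuration: {:?}", s));
        }
        Ok(self.spec)
    }
}

fn main() -> Result<(), String> {
    // PyTorch: nn.Conv2d(3, 16, 3, padding=1)
    let conv = Conv2dConfig::configure(3, 16, 3).with_padding(1).done()?;
    println!("{:?}", conv);
    Ok(())
}
```

Because each chained method takes self by value, a half-configured builder cannot be reused by accident, and all validation happens in one place.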

Every module has an ::on_device(..., device) variant for explicit device placement.

For the full mapping (30+ modules, losses, optimizers, schedulers), see ai/skills/port/guide.md.

Losses

flodl losses are functions, not structs:

// PyTorch: criterion = nn.MSELoss(); loss = criterion(pred, target)
// flodl:
let loss = mse_loss(&pred, &target)?;
let loss = cross_entropy_loss(&pred, &target)?;
let loss = focal_loss(&pred, &target, alpha, gamma)?;
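Because losses are plain functions, there is no criterion object to construct and keep around: each call borrows its inputs and returns a Result, so a shape mismatch surfaces through ? instead of an exception. The sketch mimics that calling convention with a hypothetical mse_loss over f32 slices; flodl’s real loss functions take Variables and use their own error type.

```rust
/// Hypothetical flodl-style loss: a free function, not a struct.
/// Plain slices stand in for Variables to keep the sketch self-contained.
fn mse_loss(pred: &[f32], target: &[f32]) -> Result<f32, String> {
    if pred.len() != target.len() {
        return Err(format!("shape mismatch: {} vs {}", pred.len(), target.len()));
    }
    if pred.is_empty() {
        return Err("empty input".to_string());
    }
    // Mean of squared differences.
    let sum: f32 = pred.iter().zip(target).map(|(p, t)| (p - t) * (p - t)).sum();
    Ok(sum / pred.len() as f32)
}

fn main() -> Result<(), String> {
    // No `criterion = nn.MSELoss()` step: call the function directly.
    let loss = mse_loss(&[1.0, 2.0, 3.0], &[1.0, 2.0, 5.0])?;
    println!("loss = {loss}");
    // A mismatch is an ordinary Err value that `?` would propagate.
    assert!(mse_loss(&[1.0], &[1.0, 2.0]).is_err());
    Ok(())
}
```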

Optimizers

let optimizer = Adam::new(&model.parameters(), 1e-3);
let optimizer = AdamW::new(&model.parameters(), 1e-3, 0.01);
let optimizer = SGD::new(&model.parameters(), 0.01).momentum(0.9);
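SGD::new(...).momentum(0.9) uses the same builder idea: the constructor takes the required learning rate, and a chained method sets an optional hyperparameter. As a reminder of what momentum does, here is PyTorch’s documented update (v = momentum · v + g, then p = p − lr · v, no dampening) on plain vectors. The Sgd struct is a hypothetical sketch, not flodl’s optimizer; it receives gradients as an argument instead of reading them from parameters.

```rust
/// Hypothetical SGD-with-momentum sketch over plain f64 vectors.
struct Sgd {
    lr: f64,
    momentum: f64,
    velocity: Vec<f64>,
}

impl Sgd {
    /// Required hyperparameter in the constructor.
    fn new(num_params: usize, lr: f64) -> Self {
        Sgd { lr, momentum: 0.0, velocity: vec![0.0; num_params] }
    }

    /// Optional hyperparameter via a chained, consuming method.
    fn momentum(mut self, momentum: f64) -> Self {
        self.momentum = momentum;
        self
    }

    /// One update: v = momentum * v + g, then p -= lr * v.
    fn step(&mut self, params: &mut [f64], grads: &[f64]) {
        for ((p, v), g) in params.iter_mut().zip(self.velocity.iter_mut()).zip(grads) {
            *v = self.momentum * *v + g;
            *p -= self.lr * *v;
        }
    }
}

fn main() {
    let mut params = vec![1.0, -2.0];
    let mut opt = Sgd::new(params.len(), 0.1).momentum(0.9);
    // Same gradient twice: the second step is larger because velocity accumulates.
    opt.step(&mut params, &[1.0, 1.0]);
    opt.step(&mut params, &[1.0, 1.0]);
    println!("{:?}", params);
}
```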

Model architecture: FlowBuilder

This is where flodl diverges from PyTorch in a good way. Instead of writing a forward() method with imperative control flow, you describe data flow declaratively with FlowBuilder:

Sequential

# PyTorch
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
// flodl
let model = FlowBuilder::from(Linear::new(784, 256)?)
    .through(ReLU::new())
    .through(Linear::new(256, 10)?)
    .build()?;
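The idea behind a sequential flow is that each .through(...) appends a stage and build() freezes the chain into one callable that runs the stages left to right. A minimal sketch of that idea, assuming a hypothetical Flow type whose stages are closures over Vec<f64>; it is not flodl’s FlowBuilder, which composes real modules into a Graph.

```rust
/// A stage maps one activation vector to the next (stand-in for a module).
type Stage = Box<dyn Fn(Vec<f64>) -> Vec<f64>>;

/// Hypothetical minimal sequential builder.
struct Flow {
    stages: Vec<Stage>,
}

impl Flow {
    fn from(first: impl Fn(Vec<f64>) -> Vec<f64> + 'static) -> Self {
        let mut stages: Vec<Stage> = Vec::new();
        stages.push(Box::new(first));
        Flow { stages }
    }

    /// Append a stage; data flows through stages in the order they were added.
    fn through(mut self, next: impl Fn(Vec<f64>) -> Vec<f64> + 'static) -> Self {
        self.stages.push(Box::new(next));
        self
    }

    /// Mirrors `.build()?`; this sketch has nothing to validate, so it always returns Ok.
    fn build(self) -> Result<impl Fn(Vec<f64>) -> Vec<f64>, String> {
        let stages = self.stages;
        Ok(move |x: Vec<f64>| stages.iter().fold(x, |acc, stage| stage(acc)))
    }
}

fn main() -> Result<(), String> {
    // Toy stand-ins for Linear -> ReLU -> Linear.
    let model = Flow::from(|x: Vec<f64>| x.iter().map(|v| v * 2.0 - 1.0).collect())
        .through(|x: Vec<f64>| x.into_iter().map(|v| v.max(0.0)).collect())
        .through(|x: Vec<f64>| vec![x.iter().sum()])
        .build()?;
    println!("{:?}", model(vec![0.0, 1.0, 2.0]));
    Ok(())
}
```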

Residual connections

# PyTorch: h = relu(lin1(x)); return h + lin2(h)
// flodl: .also() adds a residual branch
let block = FlowBuilder::from(Linear::new(d, d)?)
    .through(ReLU::new())
    .also(Linear::new(d, d)?)
    .build()?;
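Conceptually, a residual branch keeps the current activation x and adds the branch output to it, giving x + f(x). That only works if f preserves the shape, which is why the branch here is Linear(d, d). The sketch expresses the idea as a hypothetical residual combinator over slices; it illustrates what .also() is described as doing and is not flodl code.

```rust
/// Hypothetical residual combinator: returns a function computing x + f(x) elementwise.
fn residual<F>(f: F) -> impl Fn(&[f64]) -> Result<Vec<f64>, String>
where
    F: Fn(&[f64]) -> Vec<f64>,
{
    move |x: &[f64]| {
        let fx = f(x);
        // The skip path requires the branch to preserve the shape.
        if fx.len() != x.len() {
            return Err(format!("residual branch changed size: {} -> {}", x.len(), fx.len()));
        }
        Ok(x.iter().zip(&fx).map(|(a, b)| a + b).collect())
    }
}

fn main() {
    // Toy stand-in for Linear(d, d): scale every element by 0.5.
    let block = residual(|x: &[f64]| x.iter().map(|v| v * 0.5).collect());
    println!("{:?}", block(&[2.0, 4.0]));
}
```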

Skip connections / cross-attention

# PyTorch: h = encoder(x); y = decoder(h); return cross_attn(y, h)
// flodl: .tag() saves, .using() retrieves
let model = FlowBuilder::from(encoder)
    .tag("hidden")
    .through(decoder)
    .through(cross_attn).using(&["hidden"])
    .build()?;
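In effect, tagging stores a named intermediate while the main stream keeps flowing, and .using(...) reads it back as an extra input to a later module. The sketch models that with a HashMap keyed by tag name; encoder, decoder and cross_attn are hypothetical toy functions, and flodl’s actual graph wiring is not shown.

```rust
use std::collections::HashMap;

// Toy stand-ins for the three modules (not real layers).
fn encoder(x: &[f64]) -> Vec<f64> {
    x.iter().map(|v| v + 1.0).collect()
}
fn decoder(h: &[f64]) -> Vec<f64> {
    h.iter().map(|v| v * 10.0).collect()
}
/// "Cross-attention" stand-in: combines the stream with one tagged tensor.
fn cross_attn(y: &[f64], memory: &[f64]) -> Vec<f64> {
    y.iter().zip(memory).map(|(a, b)| a - b).collect()
}

fn forward(x: &[f64]) -> Result<Vec<f64>, String> {
    let mut tags: HashMap<&str, Vec<f64>> = HashMap::new();
    let h = encoder(x);
    tags.insert("hidden", h.clone()); // .tag("hidden")
    let y = decoder(&h); // .through(decoder): the stream is still h
    // .using(&["hidden"]): look the tagged value up by name.
    let hidden = tags.get("hidden").ok_or("unknown tag: hidden")?;
    Ok(cross_attn(&y, hidden))
}

fn main() -> Result<(), String> {
    println!("{:?}", forward(&[1.0, 2.0])?);
    Ok(())
}
```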

Parallel branches

# PyTorch: h = encoder(x); return head_a(h) + head_b(h)
// flodl: .split() + .merge()
let model = FlowBuilder::from(encoder)
    .split(modules![head_a, head_b])
    .merge(MergeOp::Add)
    .build()?;
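split() feeds the same input to every branch, and merge(MergeOp::Add) sums their outputs elementwise. The fan-out/fan-in can be sketched in plain Rust as below; split_merge_add is a hypothetical helper that covers only the Add case named in this guide.

```rust
/// Hypothetical fan-out/fan-in: every branch sees the same input,
/// and the outputs are summed elementwise.
fn split_merge_add(
    x: &[f64],
    branches: &[&dyn Fn(&[f64]) -> Vec<f64>],
) -> Result<Vec<f64>, String> {
    let mut outputs = branches.iter().map(|branch| branch(x));
    let mut acc = outputs.next().ok_or("split() needs at least one branch")?;
    for out in outputs {
        if out.len() != acc.len() {
            return Err(format!("branch output size mismatch: {} vs {}", out.len(), acc.len()));
        }
        for (a, b) in acc.iter_mut().zip(&out) {
            *a += b;
        }
    }
    Ok(acc)
}

/// Toy stand-in for the shared encoder.
fn encoder(x: &[f64]) -> Vec<f64> {
    x.iter().map(|v| v * 2.0).collect()
}

fn main() -> Result<(), String> {
    let head_a = |h: &[f64]| -> Vec<f64> { h.to_vec() };
    let head_b = |h: &[f64]| -> Vec<f64> { h.iter().map(|v| v + 1.0).collect() };
    let h = encoder(&[1.0, 2.0]);
    let y = split_merge_add(&h, &[&head_a, &head_b])?;
    println!("{:?}", y);
    Ok(())
}
```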

Iterative refinement

# PyTorch: x = encoder(x); for _ in range(3): x = refine(x)
// flodl: .loop_body().for_n()
let model = FlowBuilder::from(encoder)
    .loop_body(refine_block).for_n(3)
    .build()?;
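for_n(3) feeds each output back in as the next input. Since a single refine_block is passed in, each iteration presumably reuses the same weights, just as the PyTorch loop reuses one refine module. The sketch makes that weight sharing explicit with a hypothetical one-parameter RefineBlock; it is not flodl’s loop implementation.

```rust
/// Hypothetical refinement block with a single shared parameter.
struct RefineBlock {
    weight: f64,
}

impl RefineBlock {
    /// One refinement step: move x halfway toward `weight`.
    fn forward(&self, x: f64) -> f64 {
        x + 0.5 * (self.weight - x)
    }
}

/// Apply the same block `n` times, feeding each output back as the next input.
fn for_n(block: &RefineBlock, x: f64, n: usize) -> f64 {
    (0..n).fold(x, |acc, _| block.forward(acc))
}

fn main() {
    let refine = RefineBlock { weight: 8.0 };
    // 0 -> 4 -> 6 -> 7: three passes through one set of weights.
    println!("{}", for_n(&refine, 0.0, 3));
}
```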

Tags for observation and checkpoints

Tags make intermediate outputs observable and enable selective checkpointing:

let model = FlowBuilder::from(encoder)
    .tag("encoder_out")        // observable, checkpointable
    .through(decoder)
    .tag("decoder_out")
    .label("my_model")         // graph-level label
    .build()?;

Training loop

# PyTorch
model.train()
for epoch in range(num_epochs):
    for inputs, target in dataloader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), target)
        loss.backward()
        optimizer.step()
// flodl
model.train();
for epoch in 0..num_epochs {
    for batch in loader.epoch(epoch) {
        let batch = batch?;
        let pred = model.forward(&batch[0].clone().into())?;
        let loss = mse_loss(&pred, &Variable::new(batch[1].clone(), false))?;
        loss.backward()?;
        optimizer.step()?;
        optimizer.zero_grad();
    }
}
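The flodl loop clears gradients after step() instead of before backward(). Assuming flodl’s backward() accumulates into existing gradients the way PyTorch’s does (which is why zero_grad exists at all), both orders are equivalent as long as gradients are cleared exactly once per iteration. The scalar sketch makes this concrete; Param, train and train_pytorch_order are hypothetical names, and the loss is a toy (value − target)².

```rust
/// Hypothetical scalar parameter with an accumulating gradient.
struct Param {
    value: f64,
    grad: f64,
}

impl Param {
    /// Gradient of (value - target)^2, *added* to any existing gradient.
    fn backward(&mut self, target: f64) {
        self.grad += 2.0 * (self.value - target);
    }
    fn step(&mut self, lr: f64) {
        self.value -= lr * self.grad;
    }
    fn zero_grad(&mut self) {
        self.grad = 0.0;
    }
}

/// flodl-style order: backward, step, then zero_grad at the end of the iteration.
fn train(p: &mut Param, target: f64, lr: f64, iters: usize) {
    for _ in 0..iters {
        p.backward(target);
        p.step(lr);
        p.zero_grad();
    }
}

/// PyTorch-style order: zero_grad first, then backward and step.
fn train_pytorch_order(p: &mut Param, target: f64, lr: f64, iters: usize) {
    for _ in 0..iters {
        p.zero_grad();
        p.backward(target);
        p.step(lr);
    }
}

fn main() {
    let mut a = Param { value: 0.0, grad: 0.0 };
    let mut b = Param { value: 0.0, grad: 0.0 };
    train(&mut a, 3.0, 0.1, 50);
    train_pytorch_order(&mut b, 3.0, 0.1, 50);
    println!("{} {}", a.value, b.value); // identical, both close to 3.0
}
```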

With Graph’s integrated data loader, this simplifies further:

model.train();
for epoch in 0..num_epochs {
    for batch in model.epoch(epoch) {
        let batch = batch?;
        let pred = model.forward_batch(&batch)?;
        let loss = mse_loss(&pred, &batch["target"].clone().into())?;
        model.step(&loss)?;  // backward + optimizer + zero_grad
    }
}

Key differences from PyTorch

| Concept | PyTorch | flodl |
|---|---|---|
| Error handling | Exceptions | Result<T> with ? operator |
| Memory | Garbage collected | Reference counted (cheap clone) |
| Model composition | nn.Sequential / manual forward() | FlowBuilder (declarative data flow) |
| Training mode | model.train() | model.train() |
| Eval mode | model.eval() | model.eval() |
| No-grad | with torch.no_grad(): | no_grad(\|\| { ... }) or NoGradGuard::new() |
| Device | .to(device) / .cuda() | ::on_device(..., device) constructors |
| Checkpoint format | .pt (pickle) | .fdl (binary, architecture-validated) |
| Losses | Struct instances | Free functions |
| Conv options | Constructor kwargs | Builder pattern (.with_padding(), .done()) |
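The no-grad row lists two forms, a closure and a guard. A guard is Rust’s RAII counterpart to Python’s with block: gradient tracking is off while the guard is alive and is restored when it is dropped, including on an early return via ?. A minimal sketch using a hypothetical thread-local flag; the names mirror the table for readability, but this is not flodl’s implementation.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical switch standing in for the autograd "grad enabled" state.
    static GRAD_ENABLED: Cell<bool> = Cell::new(true);
}

fn grad_enabled() -> bool {
    GRAD_ENABLED.with(|g| g.get())
}

/// Guard form: disables tracking until dropped, then restores the previous state
/// (restoring rather than forcing `true` keeps nested guards correct).
struct NoGradGuard {
    previous: bool,
}

impl NoGradGuard {
    fn new() -> Self {
        let previous = GRAD_ENABLED.with(|g| g.replace(false));
        NoGradGuard { previous }
    }
}

impl Drop for NoGradGuard {
    fn drop(&mut self) {
        GRAD_ENABLED.with(|g| g.set(self.previous));
    }
}

/// Closure form: holds a guard for exactly the duration of `f`.
fn no_grad<T>(f: impl FnOnce() -> T) -> T {
    let _guard = NoGradGuard::new();
    f()
}

fn main() {
    assert!(grad_enabled());
    let inside = no_grad(|| grad_enabled());
    assert!(!inside);
    {
        let _g = NoGradGuard::new();
        assert!(!grad_enabled());
    } // guard dropped here
    assert!(grad_enabled());
}
```

Note the binding name: let _guard keeps the guard alive to the end of the scope, whereas let _ = NoGradGuard::new() would drop it immediately.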

Further reading