Troubleshooting
Real error messages and how to fix them.
Build Errors
error: could not find native static library 'torch_cpu'
libtorch is not found. The build system needs LIBTORCH_PATH set to the
libtorch directory.
Fix (Docker — recommended): All builds should run in the Docker container.
fdl build # CPU
fdl cuda-build # CUDA
Fix (host): Use the download script to install libtorch and set up paths:
curl -sL https://raw.githubusercontent.com/flodl-labs/flodl/main/download-libtorch.sh | sh
This downloads libtorch to ~/.local/lib/libtorch and prints the exports to
add to your shell profile. Or set the variables manually:
export LIBTORCH_PATH=/path/to/libtorch
export LD_LIBRARY_PATH=$LIBTORCH_PATH/lib
export LIBRARY_PATH=$LIBTORCH_PATH/lib
error[E0433]: could not find 'flodl' in the list of imported crates
Cargo can’t resolve the flodl dependency.
Fix: Check your Cargo.toml. If using a git dependency:
[dependencies]
flodl = { git = "https://github.com/flodl-labs/flodl.git" }
Make sure you have network access inside the container. If behind a proxy, configure git:
git config --global http.proxy http://proxy:port
undefined reference to 'cudaGetDeviceCount'
CUDA libraries aren’t linked. This happens on CUDA builds when the linker
drops CUDA libs with --as-needed.
Fix: Make sure your main.rs calls the CUDA link anchor:
fn main() -> flodl::Result<()> {
flodl_sys::flodl_force_cuda_link();
// ... rest of your code
}
And build with the cuda feature:
cargo build --features cuda
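If the same crate also builds without CUDA, the anchor call can be gated behind the feature so CPU-only builds still compile. A minimal sketch; the only assumption is that your feature is named cuda, as in the command above:
fn main() -> flodl::Result<()> {
    // Only pull in the CUDA link anchor when the cuda feature is enabled.
    #[cfg(feature = "cuda")]
    flodl_sys::flodl_force_cuda_link();
    // ... rest of your code
    Ok(())
}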
error: linker 'cc' not found
Missing C/C++ toolchain. The Docker images include this, but if building on host:
Fix:
# Ubuntu/Debian
sudo apt-get install gcc g++ pkg-config
# macOS
xcode-select --install
Docker Errors
permission denied while trying to connect to the Docker daemon
Fix: Add your user to the docker group:
sudo usermod -aG docker $USER
# Log out and back in, then retry
could not select device driver "nvidia" / nvidia-container-cli: initialization error
NVIDIA Container Toolkit is not installed or the driver is missing.
Fix:
# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Verify:
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi
service "cuda" depends on ... which is not healthy
The GPU is not available to Docker. If you only need CPU development, this is not a blocker.
Fix: Use the CPU targets instead:
fdl build # not cuda-build
fdl test # not cuda-test
Runtime Errors
TensorError: shape mismatch in matmul: [2, 3] x [4, 5]
Matrix dimensions don’t align. Inner dimensions must match for matmul.
Fix: Check your layer dimensions. For Linear::new(in, out), the input
tensor’s last dimension must equal in:
// Input is [batch, 10] — so first linear must accept 10
let model = FlowBuilder::from(Linear::new(10, 32)?) // [batch, 10] -> [batch, 32]
.through(Linear::new(32, 1)?) // [batch, 32] -> [batch, 1]
.build()?;
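A quick way to catch this early is to push a dummy batch through the freshly built model. A minimal sketch, reusing the Tensor and Variable calls shown elsewhere on this page; whether forward() takes a Variable in your version is an assumption:
// Dummy batch whose last dimension matches the first layer's `in` (10 here).
let dummy = Tensor::zeros(&[1, 10], opts)?;
let _out = model.forward(&Variable::new(dummy, false))?;
println!("forward pass ok for input shape [1, 10]");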
TensorError: shape mismatch: cannot add [4, 8] and [4, 16]
Shapes don’t broadcast. Common with also() (residual connections) — input
and output must have the same shape.
Fix: Make sure the residual branch preserves dimensions:
// Wrong: also() adds input (dim 8) to Linear output (dim 16)
FlowBuilder::from(Linear::new(4, 8)?)
.also(Linear::new(8, 16)?) // 8 != 16, shape mismatch
// Right: preserve dimension through the residual
FlowBuilder::from(Linear::new(4, 8)?)
.also(Linear::new(8, 8)?) // 8 == 8, ok
cannot borrow 'optimizer' as mutable, as it is not declared as mutable
Rust borrow checker: you need mut for things that change state.
Fix:
let mut optimizer = Adam::new(&params, 0.001); // add mut
thread 'main' panicked at 'called Result::unwrap() on an Err value'
An operation failed and you used .unwrap() instead of ?.
Fix: Use ? to propagate errors, and make sure your function returns Result<T>:
fn main() -> flodl::Result<()> {
let t = Tensor::zeros(&[2, 3], opts)?; // ? not unwrap()
Ok(())
}
Checkpoint Errors
checkpoint error: parameter count mismatch (expected 6, got 4)
The model architecture changed since the checkpoint was saved.
Fix: Ensure the model definition is identical when loading:
// Save and load must use the same architecture
let model = FlowBuilder::from(Linear::new(1, 32)?)
.through(GELU)
.through(Linear::new(32, 1)?)
.build()?;
model.save_checkpoint("model.fdl")?;
// Later — rebuild the SAME architecture before loading
let model = FlowBuilder::from(Linear::new(1, 32)?)
.through(GELU)
.through(Linear::new(32, 1)?)
.build()?;
let report = model.load_checkpoint("model.fdl")?;
checkpoint error: invalid magic bytes
The file is not a valid .fdl checkpoint (wrong format, corrupted, or a
PyTorch .pt file).
Fix: floDl uses its own .fdl format; PyTorch checkpoints cannot be loaded
directly. Either retrain in floDl, or export the weights from PyTorch as numpy
arrays and reconstruct the tensors on the Rust side.
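A rough sketch of that export path: dump each PyTorch tensor to a raw little-endian f32 file, then read it back on the Rust side. Reading the bytes below is plain std; how you turn the Vec<f32> into a floDl Tensor depends on the constructors your version exposes, so that part stays a comment:
use std::fs;

// Read raw f32 values exported from PyTorch, e.g. with
//   tensor.detach().cpu().numpy().astype("float32").tofile("layer0.weight.bin")
fn load_raw_f32(path: &str) -> std::io::Result<Vec<f32>> {
    let bytes = fs::read(path)?;
    Ok(bytes
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect())
}

// Rebuilding the Tensor from the Vec<f32> (and copying it into your model's
// parameters) depends on your floDl version's tensor constructors --
// check the Tensor API rather than assuming a particular function name.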
Visualization
dot: command not found / SVG rendering fails
Graphviz is not installed.
Fix (Docker): The floDl Docker images include graphviz. Use fdl shell.
Fix (host):
# Ubuntu/Debian
sudo apt-get install graphviz
# macOS
brew install graphviz
model.svg() returns empty or minimal output
The graph has no nodes. Make sure you’ve built the graph with FlowBuilder:
let model = FlowBuilder::from(Linear::new(4, 8)?)
.through(GELU)
.build()?;
// Now model.dot() and model.svg() produce output
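If you want to inspect the render, you can write it to disk with std. This assumes svg() returns the markup as a String wrapped in a Result; adjust to whatever your version actually returns:
let svg = model.svg()?;              // assumption: returns the SVG markup
std::fs::write("model.svg", &svg)?;  // open in a browser to inspect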
Training Issues
Loss is NaN
Common causes:
- Learning rate too high — gradients explode
- Log of zero or negative — log(0) = -inf poisons everything
- Division by zero in normalization
Fix:
// Lower the learning rate
let mut optimizer = Adam::new(&params, 1e-4); // try 10x smaller
// Add gradient clipping
clip_grad_norm(&params, 1.0)?;
// Check for NaN in loss
let loss_val = loss.item()?;
if loss_val.is_nan() {
eprintln!("NaN detected at epoch {}", epoch);
break;
}
Loss not decreasing
- Learning rate too low — try 10x higher
- Model too small — add capacity
- Data issue — verify inputs and targets are correct
Debugging checklist:
// Print loss every epoch
println!("epoch {}: loss={:.6}", epoch, loss.item()?);
// Verify gradients exist
for p in &params {
if let Some(g) = p.var().grad() {
let norm = g.norm()?.item()?;
println!("grad norm: {:.6}", norm);
}
}
// Try a known-working config first (sine wave example)
// cargo run --example sine_wave
Loss oscillates wildly
Fix: Reduce learning rate and/or add gradient clipping:
let mut optimizer = Adam::new(&params, 1e-4);
// ...
clip_grad_norm(&params, 0.5)?;
Out of VRAM
The CUDA allocator ran out of GPU memory. Common when batch sizes are too large or inference runs without disabling gradients.
Diagnose first — check how much VRAM you actually have and how it’s used:
if let Ok((used, total)) = cuda_memory_info() {
let active = cuda_active_bytes().unwrap_or(0);
let reserved = cuda_allocated_bytes().unwrap_or(0);
println!("VRAM: {:.0}/{:.0} MB (active: {:.0} MB, reserved: {:.0} MB)",
used as f64 / 1e6, total as f64 / 1e6,
active as f64 / 1e6, reserved as f64 / 1e6);
}
If reserved is much larger than active, the allocator is holding freed
blocks. Call cuda_empty_cache() to release them before checking again.
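A small before/after check using the same helpers, assuming cuda_empty_cache() takes no arguments and does not return a Result (adjust if yours differs):
let before = cuda_allocated_bytes().unwrap_or(0);
cuda_empty_cache();                                  // release cached-but-free blocks
let after = cuda_allocated_bytes().unwrap_or(0);
println!("reserved: {:.0} MB -> {:.0} MB", before as f64 / 1e6, after as f64 / 1e6);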
Fix:
- Reduce batch size — the single biggest lever for VRAM usage
- Use no_grad for inference — backward graphs consume significant memory (see the combined sketch after this list): let pred = no_grad(|| model.forward(&input))?;
- Use mixed precision — Float16/BFloat16 halves parameter memory: cast_parameters(&params, DType::Float16); let scaler = GradScaler::new();
- Use smaller model dimensions — reduce hidden sizes, fewer layers
- Detach state between steps — for recurrent models, call model.detach_state() to break gradient chains across time steps
- Check for tensor leaks — if VRAM grows linearly across epochs, tensors are being retained. Use live_tensor_count() to track: println!("live tensors: {}", live_tensor_count());
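The combined sketch referenced above: run evaluation under no_grad so no backward graph is retained, and watch the live tensor count to confirm nothing leaks across epochs. eval_batches is a placeholder for your own evaluation data:
for input in &eval_batches {                       // placeholder: your evaluation inputs
    let pred = no_grad(|| model.forward(input))?;  // no backward graph is recorded
    // ... compute metrics from pred ...
}
println!("live tensors after eval: {}", live_tensor_count());  // should stay flat across epochs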
Common Rust Compiler Messages
value used here after move
You used a variable after it was moved into something else.
Fix: Clone before the move, or restructure to use references:
let t = Tensor::zeros(&[2, 3], opts)?;
let input = Variable::new(t.clone(), true); // clone t before moving into Variable
// t is still usable
expected &Tensor, found Tensor
A function expects a reference but you passed an owned value.
Fix: Add &:
let result = a.add(&b)?; // &b, not b
the trait 'Module' is not implemented for ...
You’re passing something that doesn’t implement Module to a graph builder method.
Fix: Check the type. All built-in layers implement Module. Custom types
need impl Module for YourType { ... }.
Multi-GPU / DDP
For DDP-specific troubleshooting (NCCL init failure, parameter mismatch, CUDA context corruption, NCCL deadlock, OOM on smaller GPU, CPU averaging timeout), see the dedicated DDP Reference – Troubleshooting section.
Common quick fixes:
- NCCL init fails: Check nvidia-smi topo -m. Try AverageBackend::Cpu.
- CUBLAS_STATUS_EXECUTION_FAILED after NCCL: Use NcclComms::new() + split() on the main thread, not init_rank() from worker threads.
- Training hangs: A worker died mid-collective. DdpHandle auto-aborts via NcclAbortHandle.
- OOM on one GPU: Use the Cadence policy (El Che assigns fewer batches to the smaller GPU).
Getting Help
- Examples: cargo run --example quickstart and cargo run --example sine_wave
- Tutorials: Start with Rust for PyTorch Users
- Multi-GPU: DDP Reference and Multi-GPU Tutorial
- Migration guide: PyTorch -> floDl
- Issues: github.com/flodl-labs/flodl/issues