Graph Tree: Composable Subgraph Training
The graph tree extends Graph-as-Module composition with label-path addressing, selective freeze/thaw, subgraph checkpointing, and cross-boundary observation.
Prerequisites: The Graph Builder and Advanced Graphs. You should be comfortable with
FlowBuilder,tag,using, and Graph-as-Module nesting.
Labeling subgraphs
Any Graph can be labeled with .label():
let encoder = FlowBuilder::from(Linear::new(4, 8)?)
.through(GELU)
.through(Linear::new(8, 4)?)
.label("encoder")
.build()?;
When a labeled graph is used inside another FlowBuilder, the parent
automatically registers it as a child subgraph. Unlabeled graphs work
exactly as before – they just don’t get tree features.
let model = FlowBuilder::from(encoder) // child "encoder" registered
.through(Linear::new(4, 2)?)
.build()?;
assert_eq!(model.tree_children().len(), 1);
assert!(model.child_graph("encoder").is_some());
Labels must be valid identifiers (no dots – dots are path separators). Duplicate labels at the same level produce a build error.
The composed flag
A child graph knows it’s been nested:
let child = model.child_graph("encoder").unwrap();
assert!(child.is_composed()); // true -- someone is using us
Use is_composed() to adapt behavior. For example, skip standalone loss
computation when a parent will handle the loss.
Label-path addressing
Dots in paths mean subgraph boundaries. The rule is simple:
"encoder"– child subgraph “encoder” (or local tag if no child with that name)"encoder.hidden"– tag “hidden” inside child “encoder”"letter.read.confidence"– tag “confidence” inside “read” inside “letter”
No fuzzy resolution, no walking up to parents. If a segment doesn’t match a child or tag, you get a clear error.
// Validate paths at build time
assert_eq!(model.validate_path("encoder")?, PathKind::Subgraph);
assert_eq!(model.validate_path("encoder.hidden")?, PathKind::Tag);
Selective freeze and thaw
Freeze or thaw any subtree by path:
// Freeze the entire encoder
model.freeze("encoder")?;
assert!(model.is_frozen("encoder")?);
// Thaw just the scan phase within the encoder
model.thaw("encoder.scan")?;
// Read phase stays frozen
assert!(model.is_frozen("encoder.read")?);
This makes training phase definitions declarative:
// Phase 1: train only the routing layer, everything else frozen
model.freeze("encoder")?;
// ... train ...
// Phase 2: thaw encoder.scan, keep encoder.read frozen
model.thaw("encoder.scan")?;
// ... train with lower LR on scan ...
Parameter groups by path
parameters_at() collects parameters from a subtree, ready for optimizer
groups:
let mut optimizer = Adam::with_groups()
.group(&model.parameters_at("meta")?, 0.001)
.group(&model.parameters_at("encoder.scan")?, 0.0001)
// encoder.read is frozen -- not in any group
.build();
For checkpoint operations, named_parameters_at() returns names in the
target’s own namespace – not the parent’s:
let named = model.named_parameters_at("encoder")?;
// Names like "hidden/weight", "hidden/bias" -- the encoder's own names
Subgraph checkpoint loading
Train a component standalone, save it, then load it into a larger model:
// Step 1: Train the encoder standalone
let encoder = FlowBuilder::from(scan_module)
.tag("scan")
.through(read_module)
.tag("read")
.label("encoder")
.build()?;
// ... train encoder ...
encoder.save_checkpoint("encoder_v1.fdl.gz")?;
// Step 2: Build a larger model with a fresh encoder
let fresh_encoder = FlowBuilder::from(scan_module_new)
.tag("scan")
.through(read_module_new)
.tag("read")
.label("encoder")
.build()?;
let model = FlowBuilder::from(fresh_encoder)
.through(classifier)
.build()?;
// Load pre-trained weights into the encoder subgraph
let report = model.load_subgraph_checkpoint("encoder", "encoder_v1.fdl.gz")?;
eprintln!("Loaded {}/{} params", report.loaded.len(), report.loaded.len() + report.missing.len());
// Freeze the read phase (pre-trained, proven)
model.freeze("encoder.read")?;
The checkpoint uses the child’s own namespace and structural hash. Architecture mismatches are caught at load time.
Cross-boundary observation
Read tagged outputs across graph boundaries:
// After forward pass
model.forward(&input)?;
// Read a child's tagged output (null/nil semantics)
match model.tagged_at("encoder.hidden")? {
Some(v) => println!("hidden shape: {:?}", v.shape()),
None => println!("not computed yet"),
}
// Err = path doesn't exist (wiring bug)
Record and track metrics across boundaries:
// Record into child's observation buffer
model.record_at("encoder.loss", loss_value)?;
model.record_at("encoder.accuracy", acc)?;
// Single flush on the parent flushes the entire tree
model.flush(&[]);
// Read trend from child
let trend = model.trend_at("encoder.loss")?;
println!("encoder loss trend: {:?}", trend.last());
Tree-aware flush and metrics
flush() automatically recurses into all labeled child subgraphs. A single
model.flush(&[]) on the root graph flushes the entire tree – no need to walk
children manually. If a child’s buffer is already empty (flushed separately),
it’s safely skipped (no double epoch entries).
latest_metrics() collects from the entire tree with dotted prefixes. A
child labeled "encoder" with a metric "loss" appears as "encoder.loss".
Deep nesting works too: "letter.read.confidence".
This means Monitor::log() sees the whole tree automatically:
model.record_at("subscan.ce", ce_value)?;
model.record_at("letter.accuracy", acc)?;
model.record_scalar("total_loss", total);
model.flush(&[]); // flushes parent + subscan + letter
// Monitor sees: total_loss, subscan.ce, letter.accuracy
monitor.log(epoch, t.elapsed(), &model);
The dashboard displays each metric as a separate curve – the dotted names provide natural grouping in the legend.
Independent flush cadences
Sometimes child subgraphs train on a different schedule (e.g. a slow auxiliary
loss that’s only meaningful every N epochs). Use flush_local() and
latest_metrics_local() to manage each graph’s observation cycle independently:
// Every epoch: flush parent only
model.flush_local(&[]);
// Every 10 epochs: flush the slow child
if epoch % 10 == 0 {
model.child_graph("auxiliary").unwrap().flush_local(&[]);
}
// For monitoring, choose what to show:
// - latest_metrics_local() = only this graph's own metrics
// - latest_metrics() = this graph + all children (tree-recursive)
monitor.log(epoch, t.elapsed(), &model); // uses latest_metrics() by default
When using independent cadences, the parent’s latest_metrics() still collects
from children – it reads whatever the child last flushed. So the dashboard
shows the child’s most recent epoch value, updated at the child’s own pace.
Internal tags
Tags starting with _ are automatically internal – hidden from parent
graph resolution:
let encoder = FlowBuilder::from(module)
.tag("_plumbing") // auto-internal (underscore prefix)
.through(next)
.tag("output") // visible from parent
.label("encoder")
.build()?;
let model = FlowBuilder::from(encoder)
.through(Linear::new(4, 2)?)
.build()?;
// This fails: _plumbing is internal
assert!(model.tagged_at("encoder._plumbing").is_err());
// This works
assert!(model.tagged_at("encoder.output").is_ok());
You can also mark tags explicitly:
FlowBuilder::from(module)
.tag("intermediate")
.internal("intermediate") // explicitly hide from parent
.through(next)
.build()?;
Training mode propagation
Set training/eval mode on specific subgraphs:
// Put encoder in eval mode (BatchNorm uses running stats)
model.set_training_at("encoder", false)?;
// Rest of model stays in training mode
This matters for BatchNorm – frozen subgraphs should use running stats (eval mode), not batch stats.
Verbose build output
Enable .verbose(true) to print the tree structure on build:
let model = FlowBuilder::from(encoder)
.through(classifier)
.verbose(true)
.build()?;
Prints to stderr:
=== Graph Tree ===
(root) [hash: a3f8c2d1]
+-- tags: output
+-- params: 6
+-- encoder [hash: 7b2e9f4a]
+-- tags: hidden, output
+-- params: 4
=== Parameter Summary ===
Total: 6 parameters
encoder: 4 (66.7%) trainable
(own): 2 (33.3%) trainable
You can also call tree_summary() and param_summary() directly.
Performance guarantee
The graph tree adds zero overhead to the forward path. All tree metadata
(children, composed, internal_tags) is stored in the Graph struct but
never accessed during forward_impl(). The pre-computed Vec routing, reused
execution buffers, and topological level execution remain exactly as they are.
Tree operations (parameters_at, freeze, tagged_at, etc.) are explicit
calls – they only run when you call them, never during forward/backward.
Quick reference
| Method | Returns | Description |
|---|---|---|
tree_children() |
&HashMap<String, usize> |
Direct children map |
child_graph(label) |
Option<&Graph> |
One-level child lookup |
subgraph(path) |
Result<&Graph> |
Multi-level subgraph lookup |
is_composed() |
bool |
Whether nested in a parent |
validate_path(path) |
Result<PathKind> |
Check if path resolves |
parameters_at(path) |
Result<Vec<Parameter>> |
Params at path |
named_parameters_at(path) |
Result<Vec<(String, Parameter)>> |
Named params (target namespace) |
named_buffers_at(path) |
Result<Vec<(String, Buffer)>> |
Named buffers (target namespace) |
freeze(path) |
Result<()> |
Freeze all params at path |
thaw(path) |
Result<()> |
Unfreeze all params at path |
is_frozen(path) |
Result<bool> |
All params frozen? |
set_training_at(path, bool) |
Result<()> |
Training/eval mode at path |
load_subgraph_checkpoint(path, file) |
Result<LoadReport> |
Load checkpoint into subgraph |
tagged_at(path) |
Result<Option<Variable>> |
Tagged output across boundaries |
collect_at(paths) |
Result<()> |
Collect metrics across boundaries |
record_at(path, value) |
Result<()> |
Record scalar into child’s buffer |
trend_at(path) |
Result<Trend> |
Epoch trend from child’s history |
flush(tags) |
() |
Flush batch buffer (recurses into children) |
flush_local(tags) |
() |
Flush this graph only (no recursion) |
latest_metrics() |
Vec<(String, f64)> |
Latest epoch values (children with dotted prefixes) |
latest_metrics_local() |
Vec<(String, f64)> |
Latest epoch values (this graph only) |
tree_summary() |
String |
Tree structure visualization |
param_summary() |
String |
Per-subgraph param breakdown |
internal_tags() |
&HashSet<String> |
Tags hidden from parent |
FlowBuilder methods
| Method | Description |
|---|---|
.label(name) |
Set graph label (enables tree features when nested) |
.internal(tag) |
Mark a tag as internal (hidden from parent) |
.verbose(true) |
Print tree structure on build |
Migrating checkpoints from earlier versions
If you trained a model with flodl 0.1.x and renamed tags or restructured the
graph for 0.2.0, use migrate_checkpoint_file() to remap parameter names
without retraining:
use flodl::nn::{checkpoint_version, migrate_checkpoint_file};
if checkpoint_version("encoder_v1.fdl")? < 2 {
let report = migrate_checkpoint_file(
"encoder_v1.fdl",
"encoder_v2.fdl",
&encoder.named_parameters(),
&encoder.named_buffers(),
)?;
println!("{}", report);
assert!(report.is_complete());
}
// Load into the subgraph as usual
model.load_subgraph_checkpoint("encoder", "encoder_v2.fdl")?;
The migrated checkpoint is written as v2 with a zeroed structural hash, so it loads without architecture validation. Same architecture required – if you changed layer sizes, retrain instead.
What’s next
The graph tree is the foundation for progressive model composition – training layers independently, checkpointing them, and composing them into larger models with fine-grained training control. See the design document for the full architecture.