Train & Validate¶
Training Model¶
ModelTrainer manages the training process. Initialize it, then call solve to start training. Start the progress logger (progress.start()) before calling solve so that console logging, W&B, or TensorBoard output is enabled for the run.
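from yolo import ModelTrainer

# cfg, model, converter, progress, device, use_ddp, and dataloader are assumed to exist already
solver = ModelTrainer(cfg, model, converter, progress, device, use_ddp)
progress.start()  # start the progress logger before training
solver.solve(dataloader)  # run training (validation is included)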
Training Diagram¶
flowchart LR
  subgraph TS["trainer.solve"]
    subgraph TE["train one epoch"]
      subgraph "train one batch"
        backpropagation-->TF[forward]
        TF-->backpropagation
      end
    end
    subgraph validator.solve
      VC["calculate mAP"]-->VF[forward]
      VF[forward]-->VC
    end
  end
  TE-->validator.solve
  validator.solve-->TE
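The control flow in the diagram can be sketched in plain Python. This is a toy illustration, not the ModelTrainer implementation: the one-parameter model, the function names, and the use of mean squared error in place of mAP are all assumptions made to keep the example self-contained.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (input, target) pairs


def train_one_batch(w: float, batch: Batch, lr: float) -> float:
    # forward: predict w * x; backpropagation: gradient of the MSE w.r.t. w
    grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad


def train_one_epoch(w: float, batches: List[Batch], lr: float) -> float:
    for batch in batches:
        w = train_one_batch(w, batch, lr)
    return w


def validate(w: float, data: Batch) -> float:
    # forward pass only; MSE stands in for the mAP computation (assumption)
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def solve(w: float, batches: List[Batch], val_data: Batch,
          epochs: int, lr: float = 0.05) -> Tuple[float, List[float]]:
    history = []
    for _ in range(epochs):
        w = train_one_epoch(w, batches, lr)    # "train one epoch"
        history.append(validate(w, val_data))  # "validator.solve"
    return w, history
```

Each epoch alternates between a full pass over the training batches and one validation pass, which matches the two arrows between "train one epoch" and validator.solve.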
Training Callbacks¶
Gradient Accumulation¶
GradientAccumulation is a Lightning callback that automatically scales accumulate_grad_batches to match equivalent_batch_size. During warmup epochs the accumulation count ramps linearly from 1 up to the target, then holds constant.
Configure via train.yaml:
data:
  batch_size: 16
  equivalent_batch_size: 64 # effective batch = batch_size * world_size * accumulation
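The relationship in the comment above (effective batch = batch_size * world_size * accumulation) can be worked through with a short sketch. The helper names, the rounding choices, and the step-based warmup are assumptions for illustration; the actual GradientAccumulation callback may compute these differently.

```python
def target_accumulation(batch_size: int, equivalent_batch_size: int,
                        world_size: int = 1) -> int:
    # Solve equivalent_batch_size = batch_size * world_size * accumulation
    # for accumulation, never going below 1 (rounding is an assumption).
    return max(1, round(equivalent_batch_size / (batch_size * world_size)))


def accumulation_at(step: int, warmup_steps: int, target: int) -> int:
    # Ramp linearly from 1 to the target during warmup, then hold constant.
    if warmup_steps <= 0 or step >= warmup_steps:
        return target
    progress = step / warmup_steps
    return max(1, int(1 + (target - 1) * progress))


target = target_accumulation(batch_size=16, equivalent_batch_size=64)
print(target)  # 4 on a single device, since 16 * 1 * 4 = 64
```

With the config above on two devices, the same formula gives an accumulation count of 2, because 16 * 2 * 2 = 64.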
LR & Momentum Warmup¶
WarmupBatchScheduler wraps any epoch-level scheduler and interpolates LR per batch across each epoch. During warmup, momentum is also ramped from start_momentum to end_momentum.
The default policy is YOLOWarmupPolicy:
- Bias group (group 0): starts at 10× initial LR, ramps down to 1× over warmup.
- Conv / BN groups: start at 0, ramp up to 1× over warmup.
Configure via train.yaml:
scheduler:
  type: LinearLR
  warmup:
    epochs: 3.0
    start_momentum: 0.8
    end_momentum: 0.937
  args:
    total_iters: 500
    start_factor: 1
    end_factor: 0.01
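The warmup behavior described above can be illustrated with linear interpolation over warmup batches. This is a minimal sketch under stated assumptions: the function names, the returned dictionary, and the per-batch progress variable are hypothetical and do not come from WarmupBatchScheduler or YOLOWarmupPolicy; it covers only the warmup phase, not the per-batch interpolation between epoch-level LR values.

```python
def lerp(start: float, end: float, t: float) -> float:
    # Linear interpolation between start (t = 0) and end (t = 1).
    return start + (end - start) * t


def warmup_state(batch_idx: int, warmup_batches: int, base_lr: float,
                 start_momentum: float = 0.8,
                 end_momentum: float = 0.937) -> dict:
    # t is the fraction of warmup completed, clamped to 1 once warmup ends.
    t = min(1.0, batch_idx / warmup_batches)
    return {
        # bias group: 10x the initial LR, ramping down to 1x
        "bias_lr": base_lr * lerp(10.0, 1.0, t),
        # conv / BN groups: 0, ramping up to 1x
        "weight_lr": base_lr * lerp(0.0, 1.0, t),
        # momentum: start_momentum ramping to end_momentum
        "momentum": lerp(start_momentum, end_momentum, t),
    }
```

Halfway through warmup, for example, the bias group sits at 5.5× the base LR, the conv / BN groups at 0.5×, and momentum midway between 0.8 and 0.937.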
EMA (Exponential Moving Average)¶
EMA is a Lightning callback that keeps a shadow copy of the model weights, smoothed over training steps with an exponential moving average.
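The smoothing can be sketched with the standard exponential moving average update, shadow = decay * shadow + (1 - decay) * weight. The class below is a hypothetical stand-in that operates on plain floats; the class name, the decay value of 0.999, and the dictionary layout are assumptions, and the actual callback may use a different decay or schedule.

```python
from typing import Dict


class ShadowWeights:
    """Toy EMA tracker over a name -> value mapping (illustrative only)."""

    def __init__(self, weights: Dict[str, float], decay: float = 0.999):
        self.decay = decay           # assumed value, not taken from the docs
        self.shadow = dict(weights)  # shadow copy starts equal to the weights

    def update(self, weights: Dict[str, float]) -> None:
        # Blend each shadow value toward the current training weight.
        for name, value in weights.items():
            self.shadow[name] = (
                self.decay * self.shadow[name] + (1.0 - self.decay) * value
            )


ema = ShadowWeights({"w": 1.0}, decay=0.9)
ema.update({"w": 2.0})  # shadow moves 10% of the way from 1.0 toward 2.0
```

A decay closer to 1 makes the shadow weights change more slowly, so they average over more training steps.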
Validation always runs on the shadow weights; the training weights are swapped back immediately afterward. EMA is enabled through the training config.
Validation Model¶
ModelValidator follows the same pattern as ModelTrainer: initialize it, start the progress logger, and call solve.
from yolo import ModelValidator
solver = ModelValidator(cfg, model, converter, progress, device, use_ddp)
progress.start()
solver.solve(dataloader)
Note
The training process already includes validation. Call ModelValidator separately only if you want to re-run validation after training is complete.