Best practices to organize hyperparameter searches
Use unique tags with wandb.init to organize and filter hyperparameter search runs.
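A minimal sketch of tagging sweep runs so they can be filtered later; the tag values and training body are hypothetical placeholders:

```python
# Illustrative tags for one search campaign (made-up names).
SEARCH_TAGS = ["bayes-search-v2", "resnet50"]

def train():
    import wandb  # deferred import so the tag list can be inspected standalone
    # Tags passed to wandb.init appear as filters in the W&B UI.
    run = wandb.init(tags=SEARCH_TAGS)
    ...
```

In the UI, filtering runs by either tag then isolates this search from other experiments in the same project.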
Can I rerun a grid search?
Delete failed runs and resume the sweep to re-run specific parameter combinations.
Can I use Sweeps and SageMaker?
Yes, authenticate W&B with a requirements.txt file in your SageMaker estimator setup.
Can we flag boolean variables as hyperparameters?
Yes, use the args_no_boolean_flags macro in your sweep config to pass booleans as flags.
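A sketch of a sweep config (written as a Python dict; the script name and parameters are illustrative) that uses the macro. With `${args_no_boolean_flags}`, a parameter set to `True` is passed as a bare `--flag` and one set to `False` is omitted entirely:

```python
# Hypothetical sweep config using the args_no_boolean_flags macro.
boolean_flags_config = {
    "program": "train.py",  # placeholder script name
    "method": "grid",
    "command": [
        "${env}",
        "${interpreter}",
        "${program}",
        "${args_no_boolean_flags}",  # True -> --use_dropout, False -> (omitted)
    ],
    "parameters": {
        "use_dropout": {"values": [True, False]},
        "lr": {"values": [0.01, 0.001]},
    },
}
```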
Can you use W&B Sweeps with cloud infrastructures such as AWS Batch, ECS, etc.?
Yes, publish the sweep_id and have cloud instances run wandb agent with that ID.
Do I need to provide values for all hyperparameters as part of the W&B Sweep? Can I set defaults?
No, config values passed to wandb.init() serve as defaults that the sweep can override.
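A small sketch of the default-then-override pattern; the parameter names and values are made-up examples:

```python
# Defaults live in one dict; the sweep only needs to define the
# parameters it actually searches over.
DEFAULTS = {"lr": 0.01, "dropout": 0.2, "epochs": 10}

def train():
    import wandb  # deferred import so DEFAULTS can be inspected standalone
    with wandb.init(config=DEFAULTS) as run:
        # run.config returns the sweep's value when the sweep sets the key,
        # otherwise the default passed above.
        lr = run.config["lr"]
        ...
```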
How can I change the directory my sweep logs to locally?
Set the WANDB_DIR environment variable to change the local logging directory for sweep data.
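For example (the path here is hypothetical), the variable must be set before `wandb.init()` runs:

```python
import os

# Redirect W&B's local run files; set this before calling wandb.init().
os.environ["WANDB_DIR"] = "/tmp/wandb-sweep-logs"
```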
How can I resume a sweep using Python code?
Resume an existing sweep by passing its sweep_id to the wandb.agent() function.
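A sketch of attaching a fresh agent to an existing sweep; the sweep ID and training function are placeholders:

```python
SWEEP_ID = "entity/project/abc123"  # placeholder: ID of the existing sweep

def train():
    import wandb
    with wandb.init() as run:
        ...

def resume_sweep():
    import wandb  # deferred import so this module can be inspected standalone
    # Passing an existing sweep_id attaches a new agent to that sweep
    # instead of creating a new one.
    wandb.agent(SWEEP_ID, function=train, count=5)
```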
How do I best log models from runs in a sweep?
Create a single model artifact for the sweep where each version represents a different run.
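One way to sketch this (the artifact name and helper are hypothetical): give every run in the sweep the same artifact name, so each run's model lands as a new version (v0, v1, ...) of one artifact:

```python
def log_model(run, model_path):
    import wandb  # deferred import so the helper can be inspected standalone
    # One artifact name for the whole sweep; each run adds a new version,
    # grouping all of the sweep's models together.
    artifact = wandb.Artifact("sweep-model", type="model")
    artifact.add_file(model_path)
    # Alias the version with the run id to trace a version back to its run.
    run.log_artifact(artifact, aliases=[run.id])
```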
How do I enable code logging with Sweeps?
Call wandb.log_code() after wandb.init() in each run; for sweeps this explicit call is required even when code saving is enabled in your profile settings.
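A minimal sketch of the call, placed right after the run starts (the training body is a placeholder):

```python
def train():
    import wandb  # deferred import so the function can be inspected standalone
    with wandb.init() as run:
        # Capture the current directory's source files for this run.
        run.log_code(".")
        ...
```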
How do I fix `CommError, Run does not exist` during a sweep?
Remove the manually set run ID from wandb.init() since sweeps generate their own unique IDs.
How do I fix `Cuda out of memory` during a sweep?
Refactor your sweep to use process-based execution by running the agent from the CLI.
How do I fix an `anaconda 400 error` during a sweep?
The anaconda 400 error occurs when the metric named in your sweep config is never logged; log that exact metric name with wandb.log to fix it.
How do I use custom CLI commands with sweeps?
Configure a sweep YAML to use custom CLI commands that pass hyperparameters to your training script.
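A sketch of a custom command section, written as a Python dict; the script name and fixed argument are hypothetical. `${args}` expands to `--param=value` pairs for each sweep parameter:

```python
# Hypothetical sweep config with a custom command.
custom_cmd_config = {
    "program": "train.py",  # placeholder script name
    "method": "random",
    "command": [
        "${env}",
        "python",
        "${program}",
        "--data-dir", "/data",  # fixed, non-swept argument (illustrative)
        "${args}",              # sweep hyperparameters appended here
    ],
    "parameters": {
        "lr": {"min": 0.0001, "max": 0.1},
    },
}
```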
How should I run sweeps on SLURM?
Run `wandb agent --count 1 SWEEP_ID` in each SLURM job to execute one training run per job.
Is there a way to add extra values to a sweep, or do I need to start a new one?
Sweep config cannot be changed after starting. Create a new sweep from existing runs instead.
Optimizing multiple metrics
Use a weighted sum of individual metrics to optimize multiple metrics in a single sweep run.
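A small sketch of the weighted-sum approach; the metric names and weights are illustrative. Note that a lower-is-better metric (like latency) enters with a negative sign when the sweep maximizes the combined score:

```python
# Combine two metrics into one objective (weights are made-up examples).
def combined_metric(accuracy, latency_ms, w_acc=0.8, w_lat=0.2):
    # Higher accuracy is better; lower latency is better, so subtract it.
    return w_acc * accuracy - w_lat * latency_ms

score = combined_metric(0.9, 1.5)  # 0.8 * 0.9 - 0.2 * 1.5 = 0.42
```

You would then log the result, e.g. `wandb.log({"combined": score})`, and set `combined` as the metric in the sweep config.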
What happens if I edit my Python files while a sweep is running?
The sweep keeps using the original train script, but picks up changes to imported helper files.
What is the `Est. Runs` column?
Est. Runs shows the total number of possible parameter combinations in a discrete sweep search space.
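For a grid search, that count is the product of each discrete parameter's number of values; a quick sketch with made-up parameters:

```python
from math import prod

# Illustrative discrete search space.
parameters = {
    "lr": {"values": [0.1, 0.01, 0.001]},
    "batch_size": {"values": [32, 64]},
    "optimizer": {"values": ["adam", "sgd"]},
}

# Est. Runs = product of the value counts: 3 * 2 * 2 = 12
est_runs = prod(len(p["values"]) for p in parameters.values())
```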