
Best practices to organize hyperparameter searches

Use unique tags with wandb.init to organize and filter hyperparameter search runs.
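
A minimal sketch of such tagging, assuming a hypothetical tag scheme and a placeholder project name (`my-project`). The only W&B call used is `wandb.init` with its `tags` argument.

```python
from datetime import date


def search_tags(search_name: str, round_number: int) -> list[str]:
    """Build a tag list identifying one hyperparameter search (hypothetical scheme)."""
    return [search_name, f"round-{round_number}", f"started-{date.today().isoformat()}"]


def start_tagged_run(search_name: str, round_number: int):
    # Imported lazily so the helper above can be used without wandb installed.
    import wandb

    # "my-project" is a placeholder project name.
    return wandb.init(project="my-project", tags=search_tags(search_name, round_number))
```

Runs tagged this way can then be filtered by tag in the project's runs table.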

Can I rerun a grid search?

Delete the failed runs and resume the sweep to rerun specific parameter combinations.

Can I use Sweeps and SageMaker?

Yes. Install wandb through a requirements.txt file in your SageMaker estimator setup, and authenticate by calling wandb.sagemaker_auth() to generate a secrets.env file.

Can we flag boolean variables as hyperparameters?

Yes, use the `${args_no_boolean_flags}` macro in the command section of your sweep config to pass boolean hyperparameters as flags.
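
The sketch below shows one way this could look, with the sweep configuration written as a Python dict (the same keys can go in a YAML file). The script name `train.py` and the parameter names are placeholders. With the macro, a boolean set to true is passed as a bare flag and a boolean set to false is omitted, so the training script can declare it with `action="store_true"`.

```python
import argparse

# Sweep configuration as a Python dict; "train.py" and the parameters are placeholders.
sweep_config = {
    "program": "train.py",
    "method": "grid",
    "parameters": {
        "use_dropout": {"values": [True, False]},
        "learning_rate": {"values": [0.01, 0.001]},
    },
    # The macro replaces ${args} so that booleans arrive as flags.
    "command": ["${env}", "python", "${program}", "${args_no_boolean_flags}"],
}


def parse_args(argv=None):
    """Parse arguments the way the training script would receive them."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_dropout", action="store_true")
    parser.add_argument("--learning_rate", type=float, default=0.01)
    return parser.parse_args(argv)


# A run with use_dropout=True receives the bare flag; with False the flag is absent.
print(parse_args(["--use_dropout", "--learning_rate=0.001"]))
print(parse_args(["--learning_rate=0.001"]))
```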

Can you use W&B Sweeps with cloud infrastructures such as AWS Batch, ECS, etc.?

Yes, publish the sweep_id and have cloud instances run wandb agent with that ID.

Do I need to provide values for all hyperparameters as part of the W&B Sweep? Can I set defaults?

No, config values passed to wandb.init() serve as defaults that the sweep can override.
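
As a rough illustration of the override order, assuming hypothetical default values: `effective_config` is not part of W&B, it only mimics the described behavior, while `train` shows where the defaults would be passed to `wandb.init`.

```python
# Hypothetical defaults; a sweep only needs to list the hyperparameters it varies.
DEFAULTS = {"learning_rate": 0.01, "batch_size": 32, "epochs": 10}


def effective_config(defaults: dict, sweep_values: dict) -> dict:
    """Illustrate the override order: values chosen by the sweep win over defaults."""
    return {**defaults, **sweep_values}


def train():
    # Imported lazily so the illustration above runs without wandb installed.
    import wandb

    with wandb.init(config=DEFAULTS) as run:
        # Inside a sweep, run.config holds the defaults plus any sweep overrides.
        print(run.config["learning_rate"], run.config["batch_size"])


# A sweep that varies only learning_rate leaves the other defaults in place.
print(effective_config(DEFAULTS, {"learning_rate": 0.1}))
```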

How can I change the directory my sweep logs to locally?

Set the WANDB_DIR environment variable to change the local logging directory for sweep data.
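
For example, the variable can be set from Python before any run starts. The directory name below is a placeholder under the system temp directory.

```python
import os
import tempfile
from pathlib import Path

# Placeholder location; any writable directory works.
log_dir = Path(tempfile.gettempdir()) / "wandb_sweep_logs"
log_dir.mkdir(parents=True, exist_ok=True)

# Set this before wandb.init() or wandb.agent() is called so the runs pick it up.
os.environ["WANDB_DIR"] = str(log_dir.resolve())
```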

How can I resume a sweep using Python code?

Resume an existing sweep by passing its sweep_id to the wandb.agent() function.
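
A minimal sketch, assuming you already have the sweep ID and a training function. The wrapper name `resume_sweep` and its default `count` are illustrative; `wandb.agent` is the only W&B call.

```python
def resume_sweep(sweep_id: str, train_fn, count: int = 5):
    """Start an agent on an existing sweep and execute up to `count` runs."""
    # Imported lazily so this module can be loaded without wandb installed.
    import wandb

    # entity= and project= can also be passed if the sweep lives in another project.
    wandb.agent(sweep_id=sweep_id, function=train_fn, count=count)
```

Calling `resume_sweep("<sweep_id>", train)` from the same project would continue drawing parameter combinations from that sweep.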

How do I best log models from runs in a sweep?

Create a single model artifact for the sweep where each version represents a different run.
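
One possible shape for this, assuming a hypothetical naming scheme based on the sweep ID. Logging an artifact under the same name from every run makes each run's model a new version of that artifact.

```python
def sweep_artifact_name(sweep_id: str) -> str:
    """Name shared by every run in the sweep (hypothetical naming scheme)."""
    return f"sweep-{sweep_id}-model"


def log_model(run, model_path: str):
    """Log the model file at model_path as a new version of the sweep's artifact."""
    # Imported lazily so the naming helper works without wandb installed.
    import wandb

    artifact = wandb.Artifact(name=sweep_artifact_name(run.sweep_id), type="model")
    artifact.add_file(model_path)
    run.log_artifact(artifact)
```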

How do I enable code logging with Sweeps?

Call wandb.log_code() after wandb.init() to enable code logging for sweeps. This call is required even if code logging is enabled in your profile settings.

How do I fix `CommError, Run does not exist` during a sweep?

Remove the manually set run ID from wandb.init() since sweeps generate their own unique IDs.

How do I fix `Cuda out of memory` during a sweep?

Refactor your sweep to use process-based execution by running the agent from the CLI.

How do I fix an `anaconda 400 error` during a sweep?

Log the exact metric specified in your sweep config to fix the anaconda 400 error.
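
A small check like the following can catch a mismatch before a sweep starts. The helper `missing_metric` is hypothetical and not part of W&B; it only compares the metric name in the sweep config with the keys the training code logs.

```python
sweep_config = {
    "method": "bayes",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {"learning_rate": {"min": 0.0001, "max": 0.1}},
}


def missing_metric(config: dict, logged: dict):
    """Return the sweep metric name if it is absent from the logged keys, else None."""
    name = config["metric"]["name"]
    return None if name in logged else name


# The key must match exactly: "validation_loss" does not satisfy "val_loss".
print(missing_metric(sweep_config, {"val_loss": 0.31}))
print(missing_metric(sweep_config, {"validation_loss": 0.31}))
```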

How do I use custom CLI commands with sweeps?

Configure a sweep YAML to use custom CLI commands that pass hyperparameters to your training script.
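
The sketch below writes such a configuration as a Python dict with the same keys as the YAML. The script name, the `-b your-training-config` argument, and the parameter values are placeholders. `expand_command` is a hypothetical helper that only approximates how the agent fills in the `${env}`, `${program}`, and `${args}` macros.

```python
sweep_config = {
    "program": "train.py",
    "method": "grid",
    "parameters": {"batch_size": {"value": 8}, "lr": {"value": 0.0001}},
    # Fixed CLI arguments sit alongside the macros in the command list.
    "command": ["${env}", "python", "${program}", "-b", "your-training-config", "${args}"],
}


def expand_command(config: dict, chosen: dict) -> list[str]:
    """Approximate the command an agent would run for one set of chosen values."""
    args = [f"--{key}={value}" for key, value in chosen.items()]
    expanded = []
    for token in config["command"]:
        if token == "${env}":
            expanded.append("/usr/bin/env")
        elif token == "${program}":
            expanded.append(config["program"])
        elif token == "${args}":
            expanded.extend(args)
        else:
            expanded.append(token)
    return expanded


print(" ".join(expand_command(sweep_config, {"batch_size": 8, "lr": 0.0001})))
```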

How should I run sweeps on SLURM?

Run `wandb agent --count 1 SWEEP_ID` in each SLURM job to execute one training run per job.

Is there a way to add extra values to a sweep, or do I need to start a new one?

Sweep config cannot be changed after starting. Create a new sweep from existing runs instead.

Optimizing multiple metrics

Use a weighted sum of individual metrics to optimize multiple metrics in a single sweep run.
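
A minimal sketch of the weighted sum, with illustrative metric names and weights. A negative weight penalizes a metric that should be small when the combined value is maximized.

```python
def combined_metric(metrics: dict, weights: dict) -> float:
    """Weighted sum of the metrics named in `weights`."""
    return sum(weights[name] * metrics[name] for name in weights)


# Illustrative values: reward accuracy, penalize latency.
metrics = {"accuracy": 0.9, "latency": 0.2}
weights = {"accuracy": 1.0, "latency": -0.5}
score = combined_metric(metrics, weights)
print(score)
```

The resulting value would be logged under a single key (for example `run.log({"combined_metric": score})`), and that key would be the metric name in the sweep config.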

What happens if I edit my Python files while a sweep is running?

The sweep keeps using the original training script, but picks up changes to imported helper files.

What is the `Est. Runs` column?

Est. Runs shows the total number of possible parameter combinations in a discrete sweep search space.
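
The count is the size of the Cartesian product of the discrete values. The following is a sketch of that arithmetic for a hypothetical search space, not W&B's own implementation.

```python
import math


def grid_size(parameters: dict) -> int:
    """Number of combinations in a discrete search space (Cartesian product size)."""
    # A parameter with a fixed "value" contributes a single choice.
    return math.prod(len(spec["values"]) if "values" in spec else 1 for spec in parameters.values())


# Hypothetical search space: 3 * 2 * 2 combinations, times 1 for the fixed parameter.
parameters = {
    "learning_rate": {"values": [0.1, 0.01, 0.001]},
    "batch_size": {"values": [16, 32]},
    "optimizer": {"values": ["adam", "sgd"]},
    "epochs": {"value": 10},
}
print(grid_size(parameters))
```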