
How can I fix an error like `AttributeError: module 'wandb' has no attribute ...`?

Fix the wandb AttributeError by reinstalling wandb and removing any local file or directory named `wandb` (for example a `wandb.py` script) that shadows the installed package.
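To check whether something local is shadowing the package, you can ask Python where it would import `wandb` from without actually importing it. The sketch below uses the standard library's `importlib.util.find_spec`. The helper names `module_locations` and `is_shadowed_by_local` are hypothetical, not part of wandb. Treating anything inside your project directory but outside `site-packages`/`dist-packages` as shadowing is a heuristic.

```python
import importlib.util
import os
from typing import List


def module_locations(name: str) -> List[str]:
    """Return the path(s) Python would import top-level module `name` from, or [] if none."""
    spec = importlib.util.find_spec(name)  # locates the module without importing it
    if spec is None:
        return []
    if spec.origin and spec.origin != "namespace":
        return [spec.origin]  # a .py file or a package's __init__.py
    # A bare directory with no __init__.py becomes a namespace package: it has no
    # single origin, only the directories that make it up.
    return list(spec.submodule_search_locations or [])


def is_shadowed_by_local(name: str, project_dir: str) -> bool:
    """Heuristic: True if `name` resolves inside `project_dir` and not to an installed package."""
    root = os.path.realpath(project_dir)
    for path in module_locations(name):
        real = os.path.realpath(path)
        if "site-packages" in real or "dist-packages" in real:
            continue  # an installed package, e.g. from a virtualenv inside the project
        try:
            if os.path.commonpath([root, real]) == root:
                return True
        except ValueError:  # paths on different drives (Windows)
            continue
    return False


if __name__ == "__main__":
    print("wandb resolves to:", module_locations("wandb") or "nothing (not installed?)")
    if is_shadowed_by_local("wandb", os.getcwd()):
        print("A local 'wandb' file or directory is shadowing the installed package.")
```

Run it from the directory where you start your training script. If it reports a path inside your project, rename or remove that file or directory.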

How do I fix `Cuda out of memory` during a sweep?

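Refactor your sweep to use process-based execution: put your training code in a script and start the agent from the CLI with `wandb agent` instead of calling `wandb.agent()` from Python.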
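Process-based execution helps because when a process exits, the operating system and GPU driver reclaim everything it allocated, so one run's leftover GPU memory cannot pile up into the next. The standard-library sketch below illustrates only that principle, running each trial in a fresh Python interpreter. It is not how `wandb agent` is implemented; `run_trial`, `run_sweep`, and the `--key=value` argument convention are hypothetical.

```python
import subprocess
import sys
from typing import Dict, List


def run_trial(script: str, params: Dict[str, object]) -> str:
    """Run one trial of `script` in a new Python process and return its stdout.

    Hyperparameters are passed as --key=value arguments (a convention made up
    for this sketch). Memory the child allocates is released when it exits.
    """
    args = [f"--{key}={value}" for key, value in params.items()]
    result = subprocess.run(
        [sys.executable, script, *args],
        check=True,  # raise CalledProcessError if the trial fails
        capture_output=True,
        text=True,
    )
    return result.stdout


def run_sweep(script: str, grid: List[Dict[str, object]]) -> List[str]:
    """Run each parameter set sequentially, one process per trial."""
    return [run_trial(script, params) for params in grid]
```

With `check=True`, one failing trial stops the loop; drop it if you would rather record the failure and continue.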

How do I kill a job with wandb?

Press Ctrl+D to stop a running script that is instrumented with W&B.

How do I resolve a run initialization timeout error in wandb?

Fix timeout errors by checking your network connection, updating wandb, or increasing the `WANDB_INIT_TIMEOUT` environment variable.
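`WANDB_INIT_TIMEOUT` is a number of seconds and must be set before `wandb.init()` runs, either in your shell or at the top of your script. A minimal sketch; the helper `set_init_timeout` is hypothetical, and 180 is only an example value.

```python
import os


def set_init_timeout(seconds: int) -> None:
    """Set WANDB_INIT_TIMEOUT (in seconds). Call this before wandb.init()."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    os.environ["WANDB_INIT_TIMEOUT"] = str(seconds)  # environment values are strings


if __name__ == "__main__":
    set_init_timeout(180)
    # Then, in your own script ("my-project" is a placeholder):
    #   import wandb
    #   run = wandb.init(project="my-project")
    print("WANDB_INIT_TIMEOUT =", os.environ["WANDB_INIT_TIMEOUT"])
```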

If wandb crashes, will it possibly crash my training run?

No. W&B runs in a separate process, so a crash in W&B will not affect your training run.

`InitStartError: Error communicating with wandb process`

Resolve this error by setting `start_method` to `"fork"` or `"thread"`, depending on your platform.
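The start method is passed through `wandb.Settings`, as in `wandb.init(settings=wandb.Settings(start_method="thread"))`. If the same script runs on several platforms, you can pick the value at runtime. The sketch below is a heuristic of our own, not W&B guidance: it prefers `"fork"` where the operating system provides it and falls back to `"thread"` elsewhere, such as on Windows. `choose_start_method` is a hypothetical helper.

```python
import multiprocessing


def choose_start_method() -> str:
    """Heuristic: "fork" where the OS supports it, otherwise "thread"."""
    if "fork" in multiprocessing.get_all_start_methods():
        return "fork"
    return "thread"  # e.g. Windows, which has no fork()


if __name__ == "__main__":
    method = choose_start_method()
    print("start_method:", method)
    # In your own script:
    #   import wandb
    #   run = wandb.init(settings=wandb.Settings(start_method=method))
```

If `"fork"` still fails on your machine, use `"thread"` unconditionally.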

My run's state is `crashed` on the UI but is still running on my machine. What do I do to get my data back?

Use `wandb sync` to upload the locally saved data of a run that shows as crashed because its connection was lost.
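`wandb sync` takes the path to a run's local directory. The sketch below lists candidate directories and prints one command per run, newest first. It assumes run directories live under `./wandb` and are named `run-YYYYMMDD_HHMMSS-<id>` or `offline-run-YYYYMMDD_HHMMSS-<id>`; check that against your own `wandb` folder. `local_run_dirs` and `sync_command` are hypothetical helpers, and nothing is executed.

```python
import re
from pathlib import Path
from typing import List

# Assumed naming: run-YYYYMMDD_HHMMSS-<id> or offline-run-YYYYMMDD_HHMMSS-<id>
RUN_DIR = re.compile(r"^(?:offline-)?run-(\d{8}_\d{6})-(\w+)$")


def local_run_dirs(wandb_dir: str = "wandb") -> List[Path]:
    """Return run directories under `wandb_dir`, newest first by the timestamp in their names."""
    root = Path(wandb_dir)
    if not root.is_dir():
        return []
    runs = [p for p in root.iterdir() if p.is_dir() and RUN_DIR.match(p.name)]
    return sorted(runs, key=lambda p: RUN_DIR.match(p.name).group(1), reverse=True)


def sync_command(run_dir: Path) -> List[str]:
    """Build the `wandb sync <run dir>` command for one run."""
    return ["wandb", "sync", str(run_dir)]


if __name__ == "__main__":
    for run_dir in local_run_dirs():
        print(" ".join(sync_command(run_dir)))
```

Pick the directory whose ID matches the crashed run in the UI and run the printed command.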

Why does my process hang when using Hydra with W&B?

Fix Hydra and W&B multiprocessing conflicts by setting the start method to `"thread"`.
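Besides passing `settings=wandb.Settings(start_method="thread")` to `wandb.init()`, W&B reads the `WANDB_START_METHOD` environment variable, which is convenient when you would rather not edit every `wandb.init()` call. It must be set before the first `wandb.init()`, for example at the top of your Hydra entry-point module. A minimal sketch; `use_thread_start_method` is a hypothetical helper.

```python
import os


def use_thread_start_method() -> None:
    """Ask W&B to start its backend with a thread instead of a new process.

    Leaves alone a value that was already exported in the shell.
    """
    os.environ.setdefault("WANDB_START_METHOD", "thread")


# Call at import time, before Hydra runs your job and before wandb.init().
use_thread_start_method()
```

Using `setdefault` means an explicit `export WANDB_START_METHOD=...` in the shell still wins, which keeps experimenting with other start methods easy.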

Why does my training hang with distributed training?

Fix training hangs at the start or end of distributed training by enabling W&B Service and calling `wandb.finish()`.
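The `wandb.finish()` part can be made robust with `try`/`finally`, so the run is closed even when training raises. The sketch also follows one common pattern, which is an assumption and not the only option: only the rank-0 process creates a run, detected from the `RANK` environment variable that launchers such as `torchrun` set. `is_rank_zero` and `train` are hypothetical; `run` is any object with `log()` and `finish()` methods, such as the run object `wandb.init()` returns. How to enable W&B Service depends on your wandb version and is not shown.

```python
import os
from typing import Mapping, Optional, Protocol


class RunLike(Protocol):
    """The two methods of a W&B run object that this sketch uses."""

    def log(self, data: dict) -> None: ...

    def finish(self) -> None: ...


def is_rank_zero(env: Mapping[str, str] = os.environ) -> bool:
    """True on the main process; a single-process job without RANK counts as rank 0."""
    return int(env.get("RANK", "0")) == 0


def train(run: Optional[RunLike], steps: int) -> None:
    try:
        for step in range(steps):
            loss = 1.0 / (step + 1)  # placeholder for a real training step
            if run is not None:
                run.log({"loss": loss})
    finally:
        # Runs on success and on exceptions, so the W&B run is always closed.
        if run is not None:
            run.finish()


# Typical usage (requires wandb; "my-project" is a placeholder):
#   import wandb
#   run = wandb.init(project="my-project") if is_rank_zero() else None
#   train(run, steps=100)
```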

Why is a run marked crashed in W&B when it’s training fine locally?

A lost internet connection causes W&B to mark the run as crashed after retries fail.