It's the same in machine learning.
Code, wait for the results, interpret them, and go back to coding. Add to that some intermediate presentations of my progress. But things being mostly the same doesn't mean there's nothing to learn. On the contrary! A few years ago, I started a daily habit of writing down the lessons I learned from my ML work. Looking back at the lessons from this month, I found three practical ones that stand out:
- Keep logging simple
- Keep an experimental lab notebook
- Run experiments overnight
Keep logging simple
For years I used Weights & Biases (W&B)* as my go-to experiment logger. In fact, I used to be in the top 5% of all active users. The statistics in the diagram below show that, at that point, I had trained nearly 25,000 models, used a cumulative 5,000 hours of compute, and run over 500 hyperparameter searches. I used it for papers, for big projects like weather forecasting with large datasets, and for countless small experiments.

And W&B is a really great tool. If you need beautiful dashboards and work with a team, W&B shines. Until recently, while I was running multiple hyperparameter sweeps on a project that reconstructs data from trained neural networks, W&B's visualization capabilities were invaluable: I could directly compare reconstructions across runs.
However, I have noticed that W&B is overkill for most of my research projects. I rarely revisited individual runs, and once a project was complete, the logs just sat there and I did nothing with them. So when I later refactored the data reconstruction project mentioned above, I explicitly removed the W&B integration. Not because it was bad, but because I didn't need it.
Now my setup is much simpler. I log selected metrics to CSV and text files, written directly to disk. For hyperparameter searches, I rely on Optuna. Not even the distributed version with a central server, just local Optuna, saving study states to pickle files. If something crashes, I reload and continue. Pragmatic and sufficient (for my use cases).
The key insight here: logging is not the work. It's a support system. Spending 99% of your time deciding what you want to log (gradients? weights? distributions? and at which frequency?) can easily distract you from the actual research. For me, simple local logging covers all my needs with minimal setup effort.
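A minimal sketch of what such a setup could look like, using only the Python standard library. The function names (`log_metrics`, `save_state`, `load_state`) and file names are hypothetical and chosen for illustration; this is not my exact code. With Optuna, the pickled state would be the study object, which is assumed here rather than shown.

```python
import csv
import pickle
from pathlib import Path


def log_metrics(csv_path, row):
    """Append one row of metrics (a dict) to a CSV file, writing the header once.

    Assumes every call passes the same keys in the same order.
    """
    csv_path = Path(csv_path)
    write_header = not csv_path.exists() or csv_path.stat().st_size == 0
    with csv_path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)


def save_state(pkl_path, state):
    """Pickle the search state to a temporary file, then swap it into place."""
    pkl_path = Path(pkl_path)
    tmp_path = pkl_path.with_name(pkl_path.name + ".tmp")
    with tmp_path.open("wb") as f:
        pickle.dump(state, f)
    # Replacing the old file last avoids a half-written pickle if the job dies mid-save.
    tmp_path.replace(pkl_path)


def load_state(pkl_path, default):
    """Return the pickled state if the file exists, otherwise the given default."""
    pkl_path = Path(pkl_path)
    if pkl_path.exists():
        with pkl_path.open("rb") as f:
            return pickle.load(f)
    return default
```

In a training loop, this would amount to one `log_metrics("metrics.csv", {"epoch": epoch, "loss": loss})` call per epoch, one `save_state` call after each finished trial, and a single `load_state` call at startup to pick up where a crashed run left off.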
Keep an experimental lab notebook
In December 1939, William Shockley wrote down an idea in his lab notebook: replace the vacuum tube with a semiconductor. About 20 years later, Shockley and two colleagues at Bell Labs received the Nobel Prize for the invention of the modern transistor.
Most of us don't write Nobel-worthy entries in our notebooks, but we can learn from the principle. In machine learning, our labs don't have chemicals or test tubes. Instead, our labs are often our computers. The same device I'm using to write these lines has trained countless models over the years. Additionally, these labs are inherently portable, especially when we develop remotely on high-performance computing clusters. Better yet, thanks to highly skilled administrators, these clusters run 24/7. So there's always time to run an experiment.
But the question is: which experiment? Here, a former colleague introduced me to the idea of keeping a lab notebook, and recently I've returned to it in the simplest form possible. Before starting a long-running experiment, I write down:
what I'm testing, and why I'm testing it.
Then, when I come back later (usually the next morning), I can quickly see which results are ready and what I wanted to learn from them. It's simple, but it changes the workflow. Instead of "re-run until it works," these dedicated experiments become part of a documented feedback loop. Failures are easier to interpret. Successes are easier to replicate.
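The notebook itself can be as plain as a text file. As a rough sketch, assuming a hypothetical helper `add_entry` and a file called `notebook.txt` (neither is from my actual setup), an entry could be appended like this before launching a run:

```python
from datetime import datetime
from pathlib import Path


def add_entry(notebook_path, what, why, now=None):
    """Append a timestamped 'what / why' entry to a plain-text notebook."""
    now = now or datetime.now()
    entry = (
        f"[{now:%Y-%m-%d %H:%M}]\n"
        f"What: {what}\n"
        f"Why: {why}\n"
        f"Result:\n\n"  # left empty, to be filled in the next morning
    )
    with Path(notebook_path).open("a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```

The empty "Result:" line is a deliberate nudge: it is the first thing to fill in when the run has finished.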
Run experiments overnight
This is a small but painful lesson I learned this month.
On Friday evening, I discovered a bug that could affect the results of an experiment. I patched it and reran the experiment to verify the fix. By Saturday morning, the run had finished, but when I checked the results I realized I had forgotten to include an important ablation. In other words, I had to wait another day.
In ML, overnight time is precious. For us programmers, it's rest. For our experiments, it's work. If no experiment is running while we sleep, we are effectively wasting free compute cycles.
That doesn't mean you should run experiments just for the sake of it. But whenever there is something meaningful to launch, the evening is the perfect time to start it. Clusters are often underutilized then, so resources become available more quickly. Most importantly, you will have results to analyze the next morning.
The simple trick is to plan this deliberately. As Cal Newport states in his book Deep Work, good work days begin the night before. If you know tomorrow's tasks today, you can set up the right experiments in time.
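One way to make this planning concrete is to write out the full list of runs before leaving for the evening, so that a missing ablation is noticed before the job starts rather than the next morning. The sketch below is only an illustration; the helper `plan_runs` and the factor names are hypothetical.

```python
from itertools import product


def plan_runs(factors):
    """Expand a dict of {factor: [values]} into one config dict per run."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*(factors[n] for n in names))]


# Hypothetical factors for illustration; replace them with the real ones.
factors = {
    "bugfix": [True, False],
    "ablation": ["none", "no_augmentation"],
}

# Print the queue once and read it before submitting anything.
for i, run in enumerate(plan_runs(factors)):
    print(i, run)
```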
* This is not meant as W&B bashing (it would have been the same with MLflow, for example). Rather, it is a request to evaluate what the goals of your project are, and then to spend most of your time pursuing those goals with full focus.
** Footnote: In my eyes, working in a team is not by itself enough to justify using such a shared dashboard. You need to gain more insight from these shared tools than the time you spend setting them up.
