Machine learning lessons I learned this month

The day-to-day work in machine learning is pretty much always the same.

Code, wait for results, interpret the results, go back to coding. Sprinkled in are the occasional interim presentations on your progress to management*. But just because the routine stays the same doesn't mean there's nothing to learn. Quite the opposite! A few years ago, I started a daily habit of writing down the lessons I learned from my ML work, and to this day every month still leaves me with a few small ones. Here are three lessons from this past month.

Connecting with people (the ML doesn't matter)

As the Christmas season approaches, the year-end gatherings begin. These events mostly consist of informal chat; there isn't much "work" to get done, which makes sense for an after-work event. Normally I would skip such events, but not during the Christmas season. For the past few weeks, I've been attending after-work get-togethers and just talking, nothing urgent or deep. The interactions were good and I had a lot of fun.

It reminded me that work projects don't run on code and compute alone. They also run on working with other people over long stretches of time. Small moments like jokes, quick stories, and shared complaints about flaky GPUs refuel that engine and make collaboration smoother later, when things get tense.

Look at it from a different angle: your colleagues will have to live with you for years to come, and you with them. If that feels like something to merely endure, that's not good. If it feels like genuinely being together, it certainly is.

So, if an invitation to your company's or research institute's social gathering lands in your mailbox: attend.

My Copilot didn't necessarily make me faster

Last month I started a new project and have been adapting a set of algorithms to new problems.

One day, while mindlessly browsing the web, I came across an MIT study** suggesting (among other things) that letting AI do the (bulk of the) work up front can significantly reduce recall, lower engagement, and weaken your judgment of the result. Admittedly, the study used essay writing as its task, but coding algorithms is a creative endeavor as well.

So I tried something simple: I completely disabled Copilot in VS Code.

After a few weeks, my (subjective and self-assessed, so very biased) results looked like this: no noticeable difference for my core tasks.

Training loops, data loaders, and training scaffolding are second nature to me. For those, AI suggestions didn't improve my speed. Sometimes they even added friction: pausing to review and fix AI output that is only mostly correct takes time of its own.

This finding contrasts a bit with my impression from a month or two ago, when I felt that Copilot had boosted my efficiency.

When I thought about the difference between those two moments, I concluded the effect is domain-dependent. When you're working in an unfamiliar area (load scheduling, for example), AI support can help you get up to speed faster. In my home domain, the benefits are small and may come with hidden drawbacks that take years to notice.

My current take on AI assistants (I only use Copilot, for coding): they're a lamp for venturing into unknown territory. For the core work that makes up the bulk of your paycheck, they're optional at best.

So going forward, here's what I'd recommend:

  • Create the first pass yourself; use AI only for polishing (naming, small refactorings, tests).
  • Be honest about AI's claimed benefits: run your own experiment. 5 days with AI off, 5 days with AI on. Along the way, track tasks completed, bugs found, time taken to finish, and how well you can recall and explain the code a day later.
  • Keep the switch at your fingertips: bind a hotkey to enable/disable suggestions. If you're reaching for it every minute, you're probably leaning on it too much.
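The on/off experiment above can be tracked with something as small as this sketch. The field names and the 1–5 recall score are my own assumptions, not a standard protocol:

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class DayLog:
    """One workday of the self-experiment (all fields self-reported)."""
    ai_enabled: bool
    tasks_done: int
    bugs_found: int
    hours_to_finish: float
    recall_score: int  # 1-5: how well you could explain the code a day later

def summarize(logs: list[DayLog], ai_enabled: bool) -> dict[str, float]:
    """Average the tracked metrics for one condition (AI on or AI off)."""
    days = [d for d in logs if d.ai_enabled == ai_enabled]
    return {
        "tasks_done": mean(d.tasks_done for d in days),
        "bugs_found": mean(d.bugs_found for d in days),
        "hours_to_finish": mean(d.hours_to_finish for d in days),
        "recall_score": mean(d.recall_score for d in days),
    }

# Invented example numbers, just to show the comparison.
logs = [
    DayLog(False, 3, 1, 6.0, 4),
    DayLog(False, 2, 0, 7.5, 5),
    DayLog(True, 3, 2, 6.5, 3),
    DayLog(True, 4, 1, 5.5, 3),
]
print("AI off:", summarize(logs, ai_enabled=False))
print("AI on: ", summarize(logs, ai_enabled=True))
```

Even a log this crude makes the comparison concrete instead of a gut feeling.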

Carefully calibrated pragmatism

As ML people, we sometimes overthink the details. One example: which learning rate to use for training. Or whether, rather than a fixed learning rate, to decay it in fixed steps. Or whether to use a cosine annealing strategy.

As you can see, even for something as simple as the LR you can quickly come up with many options. Which one should you choose? I recently worked through a version of this question myself.
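To make the three options above concrete, here they are as plain formulas sketched in Python. The hyperparameter values are placeholders for illustration, not recommendations:

```python
import math

def fixed_lr(base_lr: float, step: int) -> float:
    """Fixed learning rate: the same value at every step."""
    return base_lr

def step_decay_lr(base_lr: float, step: int,
                  decay_every: int = 30, gamma: float = 0.1) -> float:
    """Multiply the LR by gamma every `decay_every` steps."""
    return base_lr * gamma ** (step // decay_every)

def cosine_annealing_lr(base_lr: float, step: int,
                        total_steps: int, min_lr: float = 0.0) -> float:
    """Anneal from base_lr down to min_lr along a half cosine."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * step / total_steps))

# Compare the three schedules at a few training steps.
for s in (0, 30, 60, 90):
    print(s, fixed_lr(0.1, s), step_decay_lr(0.1, s),
          round(cosine_annealing_lr(0.1, s, total_steps=90), 4))
```

All three are a handful of lines; the overthinking is in choosing between them, not in implementing them.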

It helps to zoom out in moments like this: does the end user care? In most cases, what matters to them is latency, accuracy, stability, and, often first and foremost, cost. They don't care which LR schedule you pick as long as it doesn't hurt those four. That suggests a boring but useful approach: choose the simplest workable option and stick with it.

Sensible defaults cover most cases: a baseline optimizer, a vanilla LR with one decay milestone, easy-to-understand early stopping rules. If the metrics come back bad, escalate to fancier options. If there's no problem, move on to the next step. But don't throw everything at the problem at once.
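An "easy-to-understand early stopping rule" can be as small as patience-based stopping on the validation loss. A minimal sketch, where the patience value is an arbitrary example:

```python
class EarlyStopping:
    """Stop when the validation loss hasn't improved for `patience` checks."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best - self.min_delta:
            self.best = val_loss   # improvement: remember it, reset the counter
            self.bad_checks = 0
        else:
            self.bad_checks += 1   # no improvement this check
        return self.bad_checks >= self.patience

# Invented loss curve, just to show the rule firing.
stopper = EarlyStopping(patience=2)
for loss in [1.0, 0.8, 0.9, 0.85]:
    if stopper.should_stop(loss):
        print("stopping early at loss", loss)
        break
```

A rule like this is trivial to explain in a progress meeting, which is exactly the point.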


*Even at DeepMind, perhaps (at least formerly) the most successful pure research lab, researchers apparently have management to keep satisfied.

** Research available on arXiv: https://arxiv.org/abs/2506.08872


