May 25, 2023
I thought config files sucked, but I was wrong
I expected an easy ride when downloading deep learning code from Torchvision and Microsoft repositories. I was fresh into my PhD, full of noble programming goals and a slight obsession with software engineering. So, I downloaded the code and started going through it to pick out the bits I needed. Though the code was nice, clear and consistent, one problem was really, really pissing me off: Who the heck thought it was a good idea to pass all settings in a single dictionary?
So, let me take a step back here. When I'm using other people's code, I spend a lot of time tracing the origins of settings or arguments so I can remove the code that I do not need or find the exact numbers (think: learning rate, batch size, number of layers, etc.) used in experiments. This works very nicely in VSCode -- you can click through it. Well, that grand plan breaks down completely when people pass a config dictionary instead of separate variables for each setting. This dictionary could have any number of settings in there, and if you want to know where a particular setting is used, you have to carefully backtrack and read a lot of code. Going back far enough, you find the birthing place of these dictionaries: a folder of YAML configuration files, one for each network or experiment.
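To make the contrast concrete, here is a minimal sketch in Python. The function names, setting names and values are all made up for illustration (they are not taken from Torchvision or any Microsoft repository), and the plain dictionary stands in for whatever would be parsed from one of those YAML files.

```python
# Hypothetical example: all names and values are illustrative assumptions.

def train_explicit(lr: float, batch_size: int) -> str:
    # Every setting appears in the signature, so an editor can trace
    # each argument back to where it was set.
    return f"lr={lr}, batch_size={batch_size}"


def train_with_config(cfg: dict) -> str:
    # The signature reveals nothing about which keys are read here;
    # you have to read the body (and every function it calls) to find out.
    return f"lr={cfg['lr']}, batch_size={cfg['batch_size']}"


# Stand-in for a dictionary loaded from a YAML configuration file.
cfg = {"lr": 0.01, "batch_size": 32, "num_layers": 18}

explicit_summary = train_explicit(lr=0.01, batch_size=32)
config_summary = train_with_config(cfg)
```

Both calls do the same work, but only the first one tells you, from the call site alone, which settings matter. The `num_layers` entry rides along in `cfg` without ever being read by `train_with_config`, and nothing in the code flags that.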
This all got me really annoyed. Why do they use this system, I asked myself angrily. Like a young bull, I was impatient. I just wanted to use the code ASAP; I didn't feel like reading a whole bunch of source code of which I would only use a part. Now I had to do all this detective work to find what I needed and make sure nothing was overwritten anywhere.
After a whole lot of grunting, sighing and angry clicking, I found what I needed and resolved to do it all much, much better. I was going to make nice, named arguments for each setting. Nice and traceable. Yes, well, that worked for the first version of the code. Not long after, I needed to make adjustments. Many adjustments: new functionality, new experiments with different settings. I had to pass a bazillion command line arguments or hardcode bash scripts every time. So, I started making dictionaries with settings for specific scenarios: Isn't this much easier than adding a new argument to every function?
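A minimal sketch of that pattern in Python might look like the following. The setting names, values and scenario names are invented for illustration; the idea is only that each scenario lists what differs from a shared set of defaults.

```python
# Hypothetical settings and scenarios, invented for illustration.
DEFAULTS = {"lr": 0.01, "batch_size": 32, "num_layers": 18}

SCENARIOS = {
    "baseline": {},
    "small_batch": {"batch_size": 8},
    "deep": {"num_layers": 50, "lr": 0.001},
}


def build_config(scenario: str) -> dict:
    """Return the defaults with the named scenario's overrides applied."""
    if scenario not in SCENARIOS:
        raise KeyError(f"unknown scenario: {scenario!r}")
    # Later entries win, so scenario values override the defaults.
    # A new dictionary is returned; DEFAULTS itself is not modified.
    return {**DEFAULTS, **SCENARIOS[scenario]}


deep_config = build_config("deep")
```

Adding a new experiment then means adding one entry to `SCENARIOS` rather than a new argument to every function, which is exactly the convenience a folder of config files offers.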
Can you see where this is going? It didn't take much more of this until I caved and came to the natural conclusion that config files are, after all, really convenient and not stupid at all. But did I feel stupid? Yes. Was I relieved that I understood the big secret? Also yes. My point is: it's good to be critical and curious. Ask questions. Don't accept the status quo; that's how we make progress. But sometimes, that's also how we reinvent the wheel.
I think we learn things in different ways. We learn that complementary colour palettes look nice because we can see how they harmonise in a painting (check out Van Gogh). We understand the rules of a board game because somebody explained them to us. Sometimes, we can only learn by experiencing things for ourselves. Maybe if you're a mule like me, the last category is a bit bigger -- and that's okay!