But it works on my machine!
When you work with software developers, especially less experienced ones, you are virtually guaranteed to hear one of them utter a version of the above expression when presented with a situation they think is impossible (or, more likely, improbable) in the program they’ve written. Now, the great thing about software is that it’s mostly deterministic: if something strange happens, it’s very likely caused by a bug in the code rather than some unexplained cosmic coincidence. I’ve written about debugging in general before, so here I’d like to focus on a set of bugs specifically caused by subsystems.
Programs don’t run in isolation. They depend on many subsystems to function. Obviously, you need a computer and a network connection. Your program also needs to be converted into a form the hardware/operating system combo can run, through the use of a compiler or interpreter.
Next come the libraries. Nowadays, it’s rare for anyone to write code entirely from scratch, thanks to the vast array of freely available libraries that cover tasks from the simplest to the most complex. Just as your program has a version, the libraries it depends on also have their own versions. Fortunately, there are useful programs called package managers that automatically download and install the correct library versions your program needs.
If people use your program to store data, you will need to keep it somewhere. Popular choices are filesystems and databases that can be located on the same machine as your program or on another machine accessible over the network. In the case of relational databases, schemas are used to define the structure of the data stored.
In the interconnected world we live in, your program may need to interface with remote services. These services can either be internal or third-party. Remote service availability and the expected behavior of each service are important considerations since you may not have direct control over them.
Most users access programs via web browsers or mobile devices. Yes, there are still quite a few widely used desktop (or should we call them laptop?) applications, but they are in the minority. Unlike desktop applications, web applications have less control over the environments they run in – different web browsers or even different versions of a particular browser may behave differently. As for mobile devices, they have widely varying capabilities that can be difficult to manage. This is especially true in the relatively fragmented Android world.
Finally, many programs have configuration options that can be used to change their behavior, sometimes quite markedly, and these options may even interact with each other in unexpected ways.
Aside from the subsystems mentioned so far, there are others specific to certain kinds of programs, such as games and embedded applications, but I think it’s safe to say that we’ve covered the most common ones. Statistically speaking, the more subsystems your program relies on, the greater the chances of something going wrong.
Why does a feature only work on a developer’s machine?
Unless the developer’s machine has supernatural capabilities, the most likely reason is that one or more subsystems on their machine behave differently from the equivalent subsystems on other machines.
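One low-tech way to spot such differences is to capture a short description of each machine and compare them. The following is only a minimal sketch: the function names `describe_environment` and `diff_environments` are made up for illustration, and a real report would probably also cover library versions, configuration, and database schema.

```python
import platform


def describe_environment():
    """Collect a few basic facts about the machine this code runs on."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
    }


def diff_environments(mine, theirs):
    """Return {key: (my_value, their_value)} for every key that differs."""
    keys = sorted(set(mine) | set(theirs))
    return {
        key: (mine.get(key), theirs.get(key))
        for key in keys
        if mine.get(key) != theirs.get(key)
    }
```

Running `describe_environment()` on the developer’s machine and on the failing machine, then passing both results to `diff_environments`, gives a short list of places to start looking.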
First, let’s get the obvious out of the way. Do you have an automated deploy method? Do you use a deploy script, Docker image, or some other automated mechanism to deploy software, which has the added benefit of easily rolling back to a previous deployment when something goes wrong? If not, you will probably waste a lot of time chasing bugs that could easily be prevented by automated deploys in the first place.
Using a package manager is essential. Otherwise, it’s too easy to lose track of which library versions your program is compatible with. When you use the wrong version of a library, your program may not fail right away; it may even appear to be working fine, but things can start to go wrong in subtle ways, which makes it hard to track the root cause of the bugs you will inevitably encounter.
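Because a wrong library version may not fail right away, it can help to check for it explicitly, for example at startup. Here is a rough sketch, assuming your pinned versions are available as a plain dictionary; the function name `find_version_mismatches` and the package names in the usage note are hypothetical. It relies on `importlib.metadata` from the Python standard library.

```python
from importlib import metadata


def find_version_mismatches(pinned, get_version=metadata.version):
    """Compare pinned versions against what is actually installed.

    Returns {package: (expected, installed)}; installed is None when the
    package is missing. get_version can be swapped out for testing.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Calling it with something like `{"somelib": "2.1.0", "otherlib": "0.9.3"}` and logging a warning when the result is not empty turns a subtle mismatch into a visible one. A lock file produced by your package manager remains the real fix; this only makes drift easier to notice.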
As the primary data storage system used in many applications, the database probably holds the distinction of causing the most “works on my machine” bugs. Data stored in databases are living things. They change over time, sometimes a lot, but many developers tend to use outdated or unrealistic datasets during development and testing.
It greatly helps developers track down hard-to-reproduce bugs if you provide them with recent database backups from the production environment, after anonymizing and encrypting sensitive data, of course. If backups are too large, creating a remote database environment that developers can connect to is a good idea. Using a filesystem like ZFS makes it incredibly easy to spin up a database from a past snapshot. The database is ready in seconds, no matter how large it is.
When remote services work as expected, it’s great; when they don’t, all sorts of strange things can happen. It’s tempting to assume that remote services will respond quickly, but in the production environment, they can get overloaded and may respond late or not at all. You also don’t want to accidentally DoS yourself by repeatedly making requests to an already slow service, making things even slower. Sensible timeouts and exponential backoff are often used to relieve the load on remote services when they fail to respond in a timely manner. Finally, whenever possible, it’s better not to query remote services directly, but to use a message queue as a buffer instead.
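To make the timeout and backoff idea concrete, here is a minimal sketch of a retry helper. The name `call_with_backoff`, its parameters, and the default values are all assumptions chosen for illustration, and the random jitter is an addition of mine that spreads out retries from many clients. The `operation` you pass in is expected to enforce its own timeout.

```python
import random
import time


def call_with_backoff(operation, max_attempts=5, base_delay=0.5,
                      max_delay=8.0, retry_on=(TimeoutError, ConnectionError),
                      sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on failure.

    The wait ceiling doubles after each failed attempt (0.5s, 1s, 2s, ...)
    up to max_delay, and the actual wait is a random value below that
    ceiling. The last failure is re-raised to the caller.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller handle it
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

For an HTTP call, `operation` could be something like `lambda: urllib.request.urlopen(url, timeout=2).read()`, so that a slow service fails fast instead of tying up your program, and each retry waits longer than the one before.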
Testing with Multiple Browsers/Devices
It’s not always possible to make a web app work exactly the same across all major browsers, but the differences should be small enough not to warrant attention. Whenever you develop a new feature, it pays to test with all major browsers, including mobile browsers if your application is responsive. As for mobile apps, screen sizes are more or less standard these days, but it’s still easy to fall into the trap of assuming everyone has a big screen.
Not everyone can keep in mind what each configuration option does, especially when there are hundreds of them. So, it’s important to have sensible defaults, or really strange things can happen, like the inability to send e-mail farther than 500 miles, as one university system administrator found out.
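One way to keep defaults sensible is to define them in a single place and reject values that make no sense before the program starts. The sketch below is only an illustration: `MailConfig`, its two options, their default values, and `load_config` are all invented for this example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MailConfig:
    # Hypothetical options; the defaults are used when nothing is specified.
    connect_timeout_seconds: float = 30.0
    max_retries: int = 3

    def __post_init__(self):
        # Fail loudly at startup instead of misbehaving subtly later.
        if self.connect_timeout_seconds <= 0:
            raise ValueError("connect_timeout_seconds must be positive")
        if self.max_retries < 0:
            raise ValueError("max_retries must not be negative")


def load_config(overrides):
    """Build a MailConfig from user-supplied overrides, rejecting typos."""
    known = {field.name for field in fields(MailConfig)}
    unknown = set(overrides) - known
    if unknown:
        raise KeyError(f"unknown configuration options: {sorted(unknown)}")
    return MailConfig(**overrides)
```

With this in place, `load_config({})` returns the defaults, while a zero timeout or a misspelled option name stops the program immediately with a clear error.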