A Bug Finding Journey

I recently spent a day tracking down a fairly weird and hard-to-find bug. It required very little advanced knowledge, just the ability to read code and some tenacity. The target audience for this post is newer coders who are intimidated by complex repos or bugs. I want to show that you don’t need special skills or lots of experience to do this sort of thing. Start by searching for key words in the repo, follow the code backwards, and take lots of notes. If you don’t have a key word, look for a file named like it does what you want and start checking method names.

Background

When I build my own websites I use Sinatra, a minimalist web framework. It’s nice for small apps, because there’s no boilerplate and you can even put all your routes in one file. Run that file with Ruby, and your website is up and running. There’s no rails-style magic or monkey-patching, which makes it much easier to understand what’s going on.

You don’t have to put everything in a single file. I created my side business Title Reader in Sinatra, and since it’s more complex I structured it as a more traditional website. It also depends on some features that aren’t included in Sinatra (specifically route prefix mapping). That part of the code goes in the rack config file config.ru and I run the app with the rackup command, a program provided by the Rack team for running Rack apps. It accomplishes the same thing, just from a different perspective.

Rack is the standard interface for web frameworks and servers in Ruby. Frameworks are what you write your website in and servers are what you run when you start the site. Having a defined interface allows you to mix and match servers and frameworks. It means you can serve both Rails and Sinatra apps with Puma (the server I use) without having to change Puma, and it means you can switch servers as needed without rewriting any application code.

Finally, I use Stripe to handle purchases. The way Stripe is structured, you have keys to represent different purchase options. While building the app, I used test keys that show up in Stripe but don’t charge anything. For the real app, of course, I have live keys.

The Bug

After I added the Stripe code and checked the production site, I found out that it was still using the test Stripe keys instead of the production ones. Which is bad. I want people to be able to give me money. The easy solution would just be to set the keys to always be the live ones, but that’s not ideal - the Stripe keys are a symptom, not the root cause.

It’s standard in the programming world to differentiate between the different places your code runs in, called environments. At a minimum these are development and production, although many people also use testing and staging. These Stripe keys are a great example of why - many times, you’ll have fake values for testing and development and live values that should be used on production only.

As I mentioned above, I’m running my site with the rackup command. rackup comes with an environment flag, -E, and I was using that to set the environment and to decide which keys to use. So the real bug isn’t “the wrong keys are being used,” it’s “the wrong environment is being set.”

RACK_ENV, RAILS_ENV, APP_ENV, Oh My

When debugging, the very beginning is the very best place to start. What does that -E flag actually do? Turns out, it’s the equivalent of setting the RACK_ENV environment variable.

A bit of background here, if you aren’t familiar with Linux programs. In addition to any command-line arguments you pass in, all programs have access to a list of environment variables that define the environment they run in. This includes the current directory, the PATH, the user running the program, and so on. You can also set your own, and by convention they are named in ALL_CAPS. This is different from the development/production environment in the previous section.

Anyway. rackup is setting RACK_ENV. RACK_ENV is not the same thing as your development/production environment either. It’s a Rack-internal variable, and production isn’t even a valid value. This is a common mistake, and in fact after a bit more poking it seems like the Rack team is working towards deprecating it entirely. So what should I be using instead?

One of the reasons so many people get confused by RACK_ENV is that there is a very similarly-named RAILS_ENV that does what we want for Rails apps. If Sinatra were to follow that pattern, we’d use SINATRA_ENV. But there’s no reason to duplicate names like that, so the Sinatra team came up with APP_ENV and that’s now become the standard for Ruby frameworks. I set APP_ENV appropriately for my production system, removed the -E flag, pushed, and went to check again.

If I Fixed It, Why Isn’t It Working?

After setting APP_ENV, the app started using the live Stripe keys. But while testing, I noticed something weird. Conveniently, Puma will tell you what environment it’s running under when you start it. And, when I checked it after pushing the APP_ENV fix, it was still running in development mode. What’s going on?

It’s hard (and just generally a bad idea) to do your debugging on prod, so I tried to reproduce it on my local machine. This wasn’t difficult - I just had to set the APP_ENV env variable before my rackup call. As expected, Sinatra ran in production mode and Puma ran in development mode.

Maybe Sinatra or Rack is deleting or resetting the environment somewhere? It would be weird, but stranger things have happened. There’s nothing mentioned about that in the docs anywhere, but maybe there’s a default value being set that I’m missing. The most obvious place to look is where Sinatra is deciding to run Puma. It’s not explicitly specified in my code anywhere, it’s happening automatically. Time to read the code!

Spelunking

Reading code is an underrated and important skill for a developer. Docs are essential but frequently incomplete or out of date, and there’s no replacement for being able to see what’s actually happening. The trick is that you don’t have to understand the whole codebase to find what you want, you just have to know where to look.

In Sinatra, we don’t really have any key words to search for, so the obvious first place to look is main.rb. It’s a pretty small file, and the only line that really does anything calls Application.run!. Following the code back, that class is defined in base.rb which is…not a small file. Fortunately, since we know we’re looking for the run! method, we can jump to it directly. It creates a bunch of variables and then calls start_server. The interesting line for us is the one that creates the handler variable. It’s using a server variable that isn’t defined in the method, so where is it coming from? Searching the file for “server” returns 38 results. Most of them are false positives, but it’s not difficult to find the one that we want. What it shows is that server is a list of server types, and Puma is hardcoded to be part of it.

There’s nothing in Sinatra that would clear APP_ENV, but there is that call to Rack::Handler.pick to choose from the list of servers. If we look at the code there,it takes a list of servers and chooses the first one it can successfully load. There’s nothing there either that would clear or override APP_ENV, but at least now we know how it chooses to run Puma.

Minimal Working Example

So reading the code turnd out to be a dead end. Something is broken, but I can’t find what it is. And when I’m at a dead end, I like to start narrowing down the possibilities. This is also known as creating a minimal working example - just enough code to show the bug, and no more.

If I run a basic Puma server with APP_ENV=production puma, it runs in production mode. I already know that Sinatra respects APP_ENV on its own, and now I know that Puma does as well. So it’s something about the combination of Sinatra and Puma that’s causing issues.

That means that for my MWE I need a Gemfile with Sinatra and Puma, and a simple app.rb file for my app. Easy enough to spin up. As I said at the beginning, just gotta run the app file with APP_ENV=production bundle exec ruby app.rb.

H-uh. That’s weird. Puma is running in production mode, exactly as I want. Maybe there’s something in rackup that’s causing the problem? The nice part about having a MWE is that it’s really easy to change. I added a config.ru file, as rackup wants, and ran that. Sure enough, Puma is back in development mode. Bingo!

This still isn’t quite a MWE, though. Rack is an interface between frameworks and servers, but you can write endpoints in Rack directly. That means we don’t need Sinatra. And, in fact, when I remove Sinatra, I get the same error!

If it works with Puma, but not with rackup when it’s automatically running Puma, there’s definitely something wrong in rackup. If we search for “environment” in the rackup code, we get to the default_options method in the server.rb file, where it looks like it sets the environment to development by default, and passes that to the handler. With this, I have enough to file a bug report against the rackup repo, so the journey is over.

Conclusion

I hope you enjoyed this tour. Some of the steps here were faster because of my familiarity with web development in Ruby, but I had no prior knowledge of any of the codebases. The key is to start by searching for key words, understanding the method(s) its used in, guessing which is the important one, and following the code back until you’ve learned what you need to. Just gotta stick to it. And take lots of notes. You may also enjoy Julia Evans’ recent article on debugging as well.

Want more? Subscribe via RSS. Want to talk about cleaning up your codebase? Email me! Otherwise, happy coding!
Originally published on September 2, 2022.