Booleans are Baaaaaaaaaad

First off, did you pronounce the title of this article like a sheep? That was definitely the intent. Anyway, onward to the purpose of this here text.

One of the things I have learned the hard way is that booleans are bad. Just to be clear, I do not mean that true/false is bad, but rather that using true/false for state is bad. Rather than rant, let’s look at a concrete example.

An Example

The first example that comes to mind is the ever-present user model. On signup, most apps force you to confirm your email address.

To do this there might be a temptation to add a boolean, let’s say “active”. Active defaults to false and upon confirmation of the email is changed to true. This means your app needs to make sure you are always dealing with active users. Cool. Problem solved.

It might look something like this:

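class User
  include MongoMapper::Document

  # false until the user confirms their email address
  key :active, Boolean, :default => false

  # User.active returns only confirmed users
  scope :active, where(:active => true)
end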

To prevent inactive users from using the app, you add a before filter that checks if the current_user is inactive. If they are, you redirect them to a page asking them to confirm their email or resend the email confirmation. Life is grand!

The Requirements Change

Then, out of nowhere comes an abusive user, let’s name him John. John is a real jerk. He starts harassing your other users by leaving mean comments about their moms.

In order to combat John, you add another boolean, let’s say “abusive”, which defaults to false. You then add code to allow marking a user as abusive. Doing so sets “abusive” to true. You then add code that disallows users who have abusive set to true from adding comments.

The Problem

You now have split state. Should an abusive user really be active? Then a new idea pops into your head. When a user is marked as abusive, let’s also set active to false, so they just can’t use the system. Oh, and when a user is marked as active, let’s make sure that abusive is set to false. Problem solved? Right? RIGHT? Wrong.

You are now maintaining one state with two switches. As requirements change, you end up with more and more situations like this and weird edge cases start to sneak in.
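To make the edge case concrete, here is a rough sketch in plain Ruby (no MongoMapper). The class name, the method names, and the password reset path are all made up for illustration; the point is only that any code path that forgets to flip both switches leaves the user in a combination that should not exist.

```ruby
# Hypothetical stand-in for the user model; not the real app code.
class TwoFlagUser
  attr_reader :active, :abusive

  def initialize
    @active = false
    @abusive = false
  end

  def activate!
    @active = true
    @abusive = false # have to remember to reset the other switch
  end

  def mark_abusive!
    @abusive = true
    @active = false # and remember it again here
  end

  # A later feature (assumed for this sketch) that forgets the rule above.
  def reactivate_after_password_reset!
    @active = true
  end
end

user = TwoFlagUser.new
user.mark_abusive!
user.reactivate_after_password_reset!

# The user is now both active and abusive at the same time.
puts [user.active, user.abusive].inspect
```

Two booleans allow four combinations, but this app only ever wanted three states. Nothing in the data model rules out the fourth.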

The Solution

How can we improve the situation? Two words: state machine. State machines are awesome. Let’s rework our user model to use the state_machine gem.

class User
  include MongoMapper::Document

  key :state, String

  state_machine :state, :initial => :inactive do
    state :inactive
    state :active
    state :abusive

    event :activate do
      transition all => :active
    end

    event :mark_abusive do
      transition all => :abusive
    end
  end
end

With just the code above, we can now do all of this:

user = User.create
user.active? # false because initial is set to inactive
user.activate!
user.active? # true because we activated
user.mark_abusive!
user.active? # false
user.inactive? # false
user.abusive? # true

User.with_state(:active) # scope to return active
User.with_state(:inactive) # another scope
User.with_state(:abusive) # driving the example home

Pretty cool, eh? You get a lot of bang for the buck. I am just showing the beginning of what you can do; head on over to the readme to see more. You can add guards and all kinds of neat things. Problem solved. Right? RIGHT? Wrong.
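A guard, for what it is worth, is just a condition that has to hold before a transition is allowed to fire. The sketch below is hand-rolled plain Ruby and not the gem's API; GuardedUser and email_confirmed are names invented for the example. In the gem itself, as far as I know, guards are written as :if and :unless options on transition, but check the readme for the exact syntax.

```ruby
# Hand-rolled illustration of a guarded transition.
# This is NOT the state_machine gem; every name here is hypothetical.
class GuardedUser
  attr_reader :state
  attr_accessor :email_confirmed

  def initialize
    @state = :inactive
    @email_confirmed = false
  end

  # Returns false and leaves the state alone unless the guard passes.
  def activate
    return false unless email_confirmed
    @state = :active
    true
  end

  def active?
    state == :active
  end
end

user = GuardedUser.new
user.activate               # guard blocks it, state stays :inactive
user.email_confirmed = true
user.activate               # guard passes, state becomes :active
puts user.state
```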

Requirements Change Again

Uh oh! Requirements just changed again. Mr. CEO decided that instead of calling people abusive, we want to refer to them as “spammy”.

The app has been wildly successful and you now have millions of users. You have two options:

  1. Leave the code as it is and just change the language in the views. This sucks because then you are constantly translating between the two.
  2. Put up the maintenance page and accept downtime, since you have to push out new code and migrate the data. This sucks, because your app is down, simply because you did not think ahead.

A Better State Machine

Good news. With just a few tweaks, you could have built in the flexibility to handle changing your code without needing to change your data. The state machine gem supports changing the value that is stored in the database.

Instead of hardcoding strings in your database, use integers. Integers allow you to change terminology willy nilly in your app and only change app code. Let’s take a look at how it could work:

class User
  include MongoMapper::Document

  States = {
    :inactive => 0,
    :active => 1,
    :abusive => 2,
  }

  key :state, Integer

  state_machine :state, :initial => :inactive do
    # create states based on our States constant
    States.each do |name, value|
      state name, :value => value
    end

    event :activate do
      transition all => :active
    end

    event :mark_abusive do
      transition all => :abusive
    end
  end
end

With just that slight change, we now are storing state as an integer in our database. This means changing from “abusive” to “spammy” is just a code change like this:

class User
  include MongoMapper::Document

  States = {
    :inactive => 0,
    :active => 1,
    :spammy => 2,
  }

  key :state, Integer

  state_machine :state, :initial => :inactive do
    States.each do |name, value|
      state name, :value => value
    end

    event :activate do
      transition all => :active
    end

    event :mark_spammy do
      transition all => :spammy
    end
  end
end

Update the language in the views, deploy your changes and you are good to go. No downtime. No data migration. Copious amounts of flexibility for little to no more work.
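If you want to see why the rename is free without pulling in the gem, here is a minimal plain-Ruby sketch of the same idea. STATE_VALUES and StoredUser are names I made up for this example, and state_value stands in for whatever integer is actually sitting in the database.

```ruby
# Plain-Ruby sketch of integer-backed state names; all names are hypothetical.
STATE_VALUES = {
  :inactive => 0,
  :active   => 1,
  :spammy   => 2, # used to be :abusive; the stored integer did not change
}

class StoredUser
  attr_reader :state_value # the integer that would be persisted

  def initialize(state_value)
    @state_value = state_value
  end

  # Translate the stored integer back into whatever the app calls it today.
  def state
    STATE_VALUES.key(state_value)
  end
end

# A record written back when 2 still meant "abusive".
old_record = StoredUser.new(2)
puts old_record.state # prints "spammy"
```

The stored data never mentions "abusive" or "spammy", so only the hash has to change.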

Next time you reach for a boolean in your database, think again. Please! Whip out the state machine gem and wow your friends with your wisdom and foresight.

47 Comments

  1. I don’t see how this proves that Booleans are bad. What I see is an article on how to refactor your code as requirements change. Booleans are perfect for keeping track of two states. Any more and yes a Boolean is not the right type.

  2. @Jason: Maybe my point wasn’t clear. I cannot think of a time when something remained just two states. Over time things always change. If you can build your app to support that change up front with no extra work, why not?

  3. State machines are a good solution to a very different problem. Using a state machine gem and all that machinery when a 1-bit boolean does the job is over-engineering to the max.

  4. Lonny Eachus

    Oct 10, 2012

    I tend to agree with Jason. Certainly, the improper use of Booleans is bad. But the use-case seems rather contrived.

  5. It seems like you all are reacting to the title and not reading the post. Booleans are not always bad. That said, I have not come across many instances personally where they were a better choice than something more flexible, yet just as simple.

    The state machine gem or something like it is not adding a bunch of machinery and is hardly over-engineering.

  6. So I can see this as a useful example of state machine. However, after using it for the better part of 6 months I have come to loathe state machines. It is a very love hate relationship with me.

  7. @John Nunemaker: I definitely agree that state machines are good tools. I love state machines. I’m just commenting that the title is a bit over the top and doesn’t really match the content of the post (and obviously you yourself don’t agree with the statement in the title).

    Otherwise it’s a good post to help people understand the benefit of using state machines for multi-state objects!

  8. I don’t see how the title is grossly misleading either. Over and over I see people use booleans where they should be using something that allows for multiple states. In my experience, booleans are mostly bad because they are misused. Maybe you all are angry because I made you baaaa like a sheep.

  9. What’s the benefit here of using a state machine vs. a string field ‘state’ with values ‘inactive’,‘active’,‘abusive’? Sure, you might have to wire it up a bit more with scopes and methods (assuming you need them), but that’s a much clearer and simpler solution in my opinion.

  10. @Cory Kaufman-Schofield: I have no problem with writing your own code instead of using the state machine gem. I do have a problem with a string state field. Instead, I would use an integer state field. I have actually just written a little bit of code to use integer states, instead of the state machine gem, several times.

  11. When faced with this problem I prefer to ask: what makes a user active? In many sites it’s the act of confirming the email address. In that case, make an “email_confirmed” boolean field. Then if you have a “banned” boolean field you can make an “active?” accessor like this.

    
    def active?
      email_confirmed? && !banned?
    end
    

    The state of a user being active is dynamic and dependent on the events that have happened to that user which is what should be stored in the database. This solution feels much simpler and does not require we add the state_machine dependency to our project with its own DSL. The state_machine gem is quite hefty at over 3k lines of code.

    Tip: I often prefer to store timestamps instead of booleans so I have record of when the event happened (such as banned_at).

  12. Ryan, I agree state machines are overused, but the thing they do that your example doesn’t is store a value in the database that can be queried on. I don’t want to duplicate all the logic for calculating state in my queries; I just want a value that I can query on and trust.

    I do agree that storing datetimes is better than booleans. :)

  13. cpuguy83

    Oct 10, 2012

    IMO you should only be using true/false (1/0) for deleting records w/o actually deleting it.

    In most cases there are much better ways to handle any other situation (a state_id, for example).

  14. @Mike, right, the duplication of the database query is a valid point. If the complexity of the logic to reach the state is great enough that you don’t want to duplicate it in a query, I think that is a good case for a state machine.

    Another good case is if you find yourself adding a dozen boolean fields to represent many different situations. Storing this all in one field is much cleaner.

    When faced with the example in this article though, I prefer boolean/timestamp fields hands down.

  15. So nobody here has ever heard of foreign keys and look up tables? The state machine example is basically a version of using an Enum in other languages. Having a look up table (id, value) would solve the same problem, and it will not require extra dependencies in your project. You could also just use constant values to solve this. I do agree it is a bit over-engineered for this solution

  16. Ryan, yep, we agree. The example in this article is a poor one for justifying a full state machine. When I have to start coordinating 3 or more boolean/datetime values to calculate state is when I reach for a state machine. Looking forward to your episode on them when it comes out.

  17. I don’t find this post a very persuasive sell for state machines (although I realize the example was designed to be short and easily digestible).

    There are times when I look at a code base and think “oh good, we’re using XYZ technique, that’ll go smoothly.” I have literally never had a thought like that for a state machine. I’ve written graphics code, music code, web apps, one very messy desktop app, and lots of random shit, and I’ve never written my own state machine or used any state machine library of my own free will. I’ve read many of the classics of programming, although definitely not all, and I really couldn’t even tell you what the point of a state machine is. I didn’t even realize it until now, but I categorized state machines under “things I will never need to give a shit about” a very long time ago, and since then I have built an entire career in programming, spanning decades, without once re-assessing that conclusion before today.

    this could mean that state machines do indeed belong in the “who gives a shit” category, but it could also mean that it’s easy to build a career in programming without knowing every single one of the classic concepts. because this second possible explanation is an absolute fact, I’ve always given state machines the benefit of the doubt, but honestly, after 20 or 30 years writing code without once caring about state machines, that’s a whole lot of extending the benefit of the doubt.

    I actually very strongly suspect that state machines are a primitive form of flow control, like goto, and secretly evil. but I want to make it clear that this is a hunch, not a firm conclusion, because I am totally acknowledging my ignorance here. the chances of my actually deciding to rectify the ignorance are so radically outnumbered by the chances of my watching a kung-fu movie instead that I simply can’t back up any more strident position on the matter.

  18. Giles, I totally understand your point. Very often you can ignore state machines without missing too much. But you’ve also probably already used them without even knowing!

    One good example is regular expressions. In most implementations, RegExes are compiled into a FSM (finite state machine), which is then fed the text. Now don’t tell me you’ve never used a RegExp :-)

    So clearly, state machines are a very important component in Computer Science. But maybe not one you’ll have to deal directly with very often.

  19. OK, fair enough. I use and love regexes. in fact I frequently get annoyed when people complain about regexes being difficult (because all you have to do is read the Jeff Friedl book).

  20. It absolutely is over-engineering when you replace a 1-bit flag with a gem that needs to be source-controlled, included and tested on updates. That’s the definition of over-engineering.

    There are a few issues with this post.

    1) the title is just wrong. I know it’s tongue-in-cheek, but it’s the first thing people read and it’s what they come away with. It’s the first impression you make. If you show up at a formal wedding wearing Chucks, even if you explain the Chucks were an in-joke, people will still remember you as the dick who showed up wearing Chucks. And there’s no universe in which “Booleans are bad” isn’t prima facie wrong.

    2) as others have pointed out, your example isn’t a super strong argument for state machines v. other devices.

    3) the point you’re highlighting is a very good one, and a very general one at that: requirements change, and sometimes a simple/obvious solution gets ugly and needs to be redone. But the example you’re giving is steeped into a very specific ecosystem, namely the Rails Weltanschauung (“To prevent inactive users from using the app, you add a before filter that checks if the current_user is inactive.” is very Railsish). It makes sense to use a specific technology stack to illustrate a general point, but in this case it feels like you’re boxing the general problem (booleans and state—remember that’s the title of the article, so we’re all expecting insights into the problems of booleans in general representing state in general) into a very restricted set of technologies and idioms (Rails, web apps and state machines).

    So what do I take away from this?

    1) you raise a good point (booleans and state) but introduce the answer poorly (blanket statement that booleans are bad). The title is polarizing and people miss the good point you’re trying to make because their back is up at that stupid title.

    2) the solution you present is fine for many cases, but it’s very specific to an ecosystem. Mismatched expectation: we thought we were going to get interesting insights into booleans and state, and instead we get a bunch of code that’s mostly useless outside of Rails web apps. There’s a lot more to software than Rails apps. So because of this disappointment, I come home feeling cheated rather than enlightened.

    So basically you’ve missed two good opportunities to enlighten not because your point is bad (it’s good) or your solution is bad (it’s not), but because everything was presented such ham-fistedly.

    Your friendly neighborhood editor.

  21. I tend to agree with @RyanBates. Tracking the events is generally a more useful activity. It does not introduce too much more complexity, and it can be leveraged as much or as little as you want going forward. A timestamp or a datetime field uses more space than a boolean or bit field, but that’s not a huge concern for most applications (in Postgres the max size of a time field is 12 bytes vs. 1 byte for a boolean).

  22. Personally I thought the title and article were very good! The title has punch – it got you to read the article and in a short example, John showed you how to use state machines to solve a problem.

    Was it a perfect problem, perfectly solved? Probably not. But it was good enough to get the point across. I think John illustrated the problem well; the only other thing he could have done to make it more obvious what he was doing would be to mention that the man who has two watches is never sure of the time. While he didn’t need to do that, the problem he posed was the perfect illustration of that idea. And from there, he solved the problem.

    I liked it. :-) (Thanks for sharing that lesson, John.)

  23. Good article, just awfully named…
    This article isn’t REAAAALLLY about booleans at all..

  24. I think the author’s assessment is fair. If anything, he left out tons of other reasons booleans can be bad, such as when passed as function arguments. Or the way they often cause bugs with loosely typed languages that convert them to integers.

    I was hoping to see a good example of how state machines would make less need for code like: if (foo && bar && baz), but even with regular booleans you can create methods that encapsulate this: if (isFooBarBaz())

  25. I’ve worked on several apps where the state machine is added far too late in the game. It’s not so bad when there are only a few attributes on a model to figure out what the state truly is but it can get extremely cumbersome later.

    I’m a bit torn on mapping integers to represent the state in that I think that’s really going to be app dependent. For apps that are never going to have a ton of data I’m all for string states — they really simplify the code.

    Once traffic or data starts getting large enough to the point that this kind of optimization is needed, I’m all for it. Before that, the increased clarity of a string is enough for me.

    For book keeping purposes, another thing to check out is the state_machine-audit_trail gem. This will keep track of the changes to your model’s state and when they happen. In the example that you use for an abusive user, this can be beneficial if they ever want to dispute why they are banned or when it happened. Not all apps need that, though.

  26. In my experience, state machines in Rails are almost always a DRY violation. We had to rip out several state machines in our app where data was out of sync with state. Except in the case where you need to explicitly query on state (which even then can often be done with scopes), state can be inferred and named in a method, like @Ryan Bates suggested. Adding a heavy (or even lightweight) gem to abstract this is undoubtedly over-engineering.

  27. Thanks for the article. I appreciated the state-machine refresher. It’s nice to step back and think about what the code actually needs to do, rather than get wrapped up thinking about what database columns to set to what values. I think a state machine could serve that purpose well.

    I really appreciated Ryan’s idea about the :active? message. That seems like it would provide a lot of flexibility, whatever the implementation. I also really liked his practical tip about timestamp fields (hear, hear), and Mike’s pragmatic point about simpler database querying.

  28. I really like the approach of using integers for state.

  29. Ryan said it before I could, plus:

    • Ryan’s solution captures the business rule in one easy to read line. The state machine does not.
    • I would have first fixed this app by renaming “active” which is misnamed, to “confirmed”. I believe that this small change leads to elegant solutions without needing the state machine.
    • The state machine only works for mutually exclusive states. Here, that’s not the case: whether a users email is confirmed and whether they are abusive are two separate pieces of information to track.
    • Your answer to Ryan, that you choose this implementation in order to have a value in the database sounds like premature optimization.
  30. Kenn Ejima

    Oct 11, 2012

    Interesting. I began with the same thought and ended up with a different solution – found that adding boolean was messy, entered state machine, then realized that state machine was inflexible in that it cannot itself handle multiple states at the same time.

    With a state machine, we may be able to reduce combinatorial explosion of boolean fields but that could precisely be a limitation as well.

    Then it occurred to me that “Set” type is useful when we need a number of named states. It’s not true/false, but existence/nonexistence of a state.

    So I created a gem to support Set fields using ActiveRecord::Store. It’s been working great for my use cases.

    https://github.com/kenn/store_field

    MongoDB supports $addToSet so it is able to do the same for free.

    Also there are so many gems that do integer coding, using ActiveModel.

    https://github.com/kenn/enum_accessor

    It’s really interesting to see how solutions for a simple problem could proliferate…

  31. State machines are awesome! Are you maintaining an audit trail by injecting the current user into a global variable? Do you have actions like “suspend” and “activate” in your controllers? Are you writing procedural code in your specs to test a particular state and related transitions? You need a state machine!(http://svs.io/post/30378688232/using-state-machines-to-keep-your-controllers-happy)

    State machines are great because –
    1) One place to put the state logic and transition guards
    2) Super readable specs! (http://svs.io/post/27124492499/a-peek-into-test-driven-design)
    3) One place to put your audit trail logic.

    The readable specs alone are worth the effort. And if you’ve got “data out of sync with the state machine”, you’re doing it wrong. You have to drive all changes through the state machine. I know it feels unnatural – but once you get used to it you never go back to the old way.

    Are these premature optimisations? On any real-life project you will be glad you did it. I always turn to state machines eventually and so now I do tend to start with them.

  32. If I changed every boolean to a state machine I’d lose my job.

    When the requirements change, then do something about it. If a state machine fits then use that. If not have 2 booleans or whatever fits best.

  33. cbmeeks

    Oct 12, 2012

    @Jason I think you hit the nail on the head. What we usually do is have a “status” column. Then we either use enums (Status.Active, Status.Inactive, Status.Pending…) or even a Statuses table and do a join.

  34. Several of you are being ridiculous. The goal of the article is to make you think next time you use a boolean. A state machine is just one example and I suppose I should have shown more. Thankfully several people have in the comments. Enum is another great option. I typically do this in a similar way as the article except I use scam, a project of mine, or toystore to create an in memory “lookup table”.

  35. Kendall (Chien Po)

    Oct 12, 2012

    It seems to me that if you’re talking about using a finite state machine (FSM) having “multiple current states” then you’re not talking about a true FSM which, “…is conceived as an abstract machine that can be in one of a finite number of states” (Wikipedia).

    Now, I know that “in practice” you can use or think of a state machine as having “multiple current states” though technically, the set of all unique tuples of legal state values (cartesian product) would be the true set of states. So, if I’ve got two booleans of state data, then I’ve actually got four distinct “states” for the object. The object just happens to divide storage of this single state information into two internal variables (implementation detail).

    That said, if I’ve got some booleans where the sub-states their values represent are orthogonal, I prefer to keep them as distinct data fields instead of combining my state into one “state” variable. I’ll do this even if I am explicitly implementing a state machine (gem or no).

    However, if I’m in a situation more like what the author described where I potentially have multiple booleans whose sub-state information is actually intertwined and related, then it makes sense to combine said state information into a single non-boolean field.

    Of course, I, like others here, have often found that fields some might use a boolean for make better sense as a timestamp (allowing NULL), in which case the above doesn’t apply (since I’m not dealing w/booleans anymore anyway).

    So, it all comes down to the usual “use the right tool for the job.” Yeah, you might have to refactor later and it can sometimes pay to anticipate possible changes in advance and engineer appropriately. But, that’s a slippery slope that often leads to time wasted due to over-engineering.

  36. I agree with John. This post is not about using state machines, it’s about not misusing booleans.

  37. Jamie Hodge

    Oct 13, 2012

    Dan Chak’s approach to so-called Domain Data seems to be a better approach, as far as data integrity and general DB-app integration: http://enterpriserails.chak.org/full-text/chapter-7-domain-data

  38. Never really used state machines myself, so thanks for the introduction. The integer version does remind me of Enums as a similar solution to the problem, though I’m not a big fan of storing integers in a db that basically mean nothing without the front end or business logic translating them…..still, coding up a system always involves some compromise so it’s probably swings and roundabouts :)

  39. Nice article.
    Maybe the only thing to refactor is the title ;-)

  40. Sometimes, boolean logic could be bad only if refactoring was not clearly done the right way (or if the developer is lazy). State machines are always neat and professional and CORRECT; although some logic cannot simply be put into state machines.

    I guess for this one, it’s a good example to show that booleans can be bad if not understood properly.

    PS: Good depiction of quick business logic decisions though ;) that kinda happens in real life almost too often. :)

  41. You could always use Classy Enum to handle it. Much cleaner, IMO. But, I do agree that boolean values do a poor job of giving context and demonstrating intent.

  42. Easily the most entertaining collection of people missing the point I’ve come across in a long time. The irony of so many people reading a post about the misuse of booleans and interpreting said post in a boolean “booleans bad, state machines good!” way is delicious.

    Is it really so difficult to abstract John’s thinking outward past the use of state machines and to your non-boolean representational model of choice?

  43. Rarrikins

    Dec 30, 2012

    Yeah, a boolean is a bad idea there, but a state machine is a lot of extra complexity. As JC said, enums are the way to go, and in Ruby, symbols are enums. Just use those.

    status = :banned

    Something like that. That way, the status won’t be in any inconsistent state like with the two booleans, and you still have quite a lot of simplicity.

    People who are completely active will have :active, people who are innocently inactive will have :inactive, and people who are banned will have :banned.

  44. “lets” should be “let’s”

  45. Booleans are bad here meant to point to the only 2 options a boolean offers (which classifies your user or anything else you use boolean for as being either good/bad, active/inactive, …). Real-life cases require more than those 2 cases.

  46. Levi Strope

    Mar 08, 2013

    Taking the flexibility of integers one step further…

    defining what to do for that state without hard coding the state name. Makes it that much easier to swap out the meaning of 2 and switch it to “douchebaggish”

    
    state STATES.key(2) do
        #do something meaningful for this state
    end
    
  47. What’s not addressed in any of the comments here is that the idea of combining booleans into a single composite state potentially creates the exact opposite problem where you have an explosion of states along the lines of the cartesian product.

    If most of those states are distinct and need to be individually considered, then you’ve won, but if the former booleans were mostly independent and just interact in a few places, then you would have been much better off with the bools.


About

Authored by John Nunemaker (Noo-neh-maker), a programmer who has fallen deeply in love with Ruby.
