Fundamentals of monads

Every (algebraic) data type is created for a very specific purpose. A data type isn’t made because you feel like making one (even if you can). Those data types which exist for no reason, in general, will have difficulty being generalized (into monads, as we will see later). However, let’s assume we’re all keen and astute programmers who won’t do that for the heck of it, and that we’re actually trying to accomplish something.

With that, let’s take a whirl around the idea of algebraic data types, and what they really mean. As mentioned, when a data type is created, it’s because it is trying to represent a particular concept or property.

Assumptions: You understand the Haskell type system (typeclasses, algebraic data types, type declarations, etc), and Haskell syntax in general.

For instance, suppose we want to represent failure. Haskell’s Maybe type does that. The very reason we type something as a Maybe, is because we want to represent failure on that something. As an example, if a function f decides whether or not an integer is a perfect square, the intuitive result that the function should return is a boolean – true or false. Either something is a perfect square, or is not. This should be simple.

Hence, a C function may look something like this:

bool isPerfectSquare (int i) {
  if (code_to_check_for_perfect_square)
    return `true`;
    return `false`;

However, what if we pass that function a negative number? By definition, over integers, negative numbers cannot be perfect squares. Hence, we would want to say that the function failed.

In most imperative languages (C/C++, Java, etc), we would use a failure value to represent failure. Perhaps we would also like to abort the “execution” of that function. However, we cannot use a boolean here anymore, because we essentially need a third result, and there are only two possible values for a boolean. Hence, a common (and ugly) solution in C would be to “upgrade” (essentially a downgrade) the return type to an integer. Hence, a C function may look something like this:

int isPerfectSquare (int i) {
  if (i < 0)
    return -1;

  if (code_to_check_for_perfect_square)
    return 1;
    return 0;

There are similar strategies, but basically they all involve using a “magic” value to represent failure. That may be -1, NULL, 0, or whatever other value pleases the programmer.

However, such strategies place the implicit meaning of the “magic” value on the programmer – the human. The programmer says that -1 means failure, and humanly enforces it. The compiler (or the code) doesn’t know that, and hence cannot enforce it. If another programmer comes along and takes isPerfectSquare() for a ride, and decides to alter it and state that -1 means something else, and -999 now means failure, then any code that relies on isPerfectSquare() is going to break. This is not a good approach.

Haskell solves this by allowing the programmer to define new custom types that gives a context to various things (values). For example, in the isPerfectSquare function, Haskell lets a programmer state, through a type, that the function has failed, or succeeded, and if it succeeded, what the result is. The custom data type here is the Maybe type, and is defined as follows:

data Maybe a = Just a | Nothing

Hence, suppose we have a result of True (or False), we can type the result (a boolean) with the Maybe type, by saying Just True (or Just False). Note that here, it is assumed that you understand the Haskell type system, at least where type constructors, value constructors and polymorphic/parameterized types are concerned. And if the function failed, we can just return a Nothing.

Giving a Context

The idea here is not Haskell’s type system, but the idea of typing a value. Alternatively, you can see it as encapsulating, or wrapping, a value. Regardless of how you see it, the key thing is that we are now placing a value (the boolean, in this example), into a context. The context is failure. Hence, in the example above, we just placed the resultant True or False into a context. There is always a very specific reason for creating a particular custom type (Maybe, in this case), and for Maybe, the reason is to represent failure.

If there is another data type, then it must have a reason to have been created.

For instance, if you created a data type called Person:

data Person = Person String String String deriving (Show)

Then the reason for the Person data type is that you, the programmer, has decided that a Person has three string fields (such as first name, last name, middle name).

If you create a data type called Tree, as follows:

data Tree = Empty | Tree a (Tree a) (Tree a) deriving (Show)

then a possible reason is that you are representing indecision, whereby the tree illustrates all possible decisions – all outcomes for a given input.

Similarly, for a list:

data List = Empty | List a (List a) deriving (Show)

For all these data types, whenever we type a value as that data type, we are giving that value a context. And that context has a meaning in which you can articulate. For instance, a Tree 4 has type Tree Int, and what it means is that you have a decision with input 4. Since the tree has no branches, there are no decisions and outcomes yet. If you have a Tree 4 (Tree 5) (Tree 6), it represents a decision with input 4, that can yield a 5 or a 6 (if you “traverse” the tree). Similarly, if you have a list like List 5 (List 6 (List 7)), you may be representing non-determinism, whereby a value can take either 5, 6 or 7.

Note: it is important to note that data types have a reason for their existence. If not, it becomes difficult to see how they can be generalized into a monad.

Abstracting Away (Each and Every Different) Context

Now, what happens with our isPerfectSquare function? Let’s write the proper Haskell version of isPerfectSquare, complete with the Maybe type to represent failure. We also modify it a bit, because we don’t just want a boolean, we want the value of the function succeeds. In other words, if it’s not a perfect square, we return Nothing, if it is, we return the number itself. Let’s also add another similar function that decides if something is a big number a not. We decide that numbers larger than 1024 are “big”:

isPerfectSquare :: Integer -> `Maybe` Integer
isPerfectSquare n =
    let sq = floor (sqrt (fromIntegral n :: Double)) in
    case (sq * sq == n) of
      `True`  -> Just n
      `False` -> Nothing

isBigNumber :: Integer -> `Maybe` Integer
isBigNumber n
    | n < 1024   = Nothing
    | otherwise  = Just n

Great, nice and simple. Now what if we want to treat those two functions as primitives, and develop a isBigAndPerfectSquare function, which tests if an integer is larger than 1024, and is a perfect square? One way to do that would be to check if an integer is a perfect square (isPerfectSquare), and if it’s bigger than 1024 (isBigNumber). Not the most efficient, but just for illustration purposes. Hence we can write this:

isBigAndPerfectSquare :: Integer -> `Maybe` Integer
isBigAndPerfectSquare n =
    case (`isPerfectSquare` n) of
      Nothing  -> Nothing
      Just x   -> case (isBigNumber n) of
                    Nothing  -> Nothing
                    Just y   -> Just y

Again, there are a ton of better ways to write it. It’s for illustration. Let’s go further.

isMagicNumber :: Integer -> `Maybe` Integer
isMagicNumber n
    | n == 262144  = Just n
    | otherwise    = Nothing

isBigAndPerfectSquareAndMagic :: Integer -> `Maybe` Integer
isBigAndPerfectSquareAndMagic n =
    case (`isPerfectSquare` n) of
      Nothing  -> Nothing
      Just x   -> case (isBigNumber n) of
                    Nothing  -> Nothing
                    Just y   -> case (isMagicNumber n) of
                                  Nothing  -> Nothing
                                  Just z   -> Just z

By now something is becoming very apparent. The reason for the custom data type is worming its way into your code. Everytime you have a “primitive” function (such is isPerfectSquare) which returns a result (or computes to a value) of a data type that has a context (i.e. Maybe), you have to check that context (is it a Just x, or a Nothing?) should you wish to use the result.

To state that again, when a result has a context (i.e. Maybe, to represent failure), using that result means unwrapping the context to (a) extract the value out so that you can actually act on the value, and (b) checking the entire context to see if you need to handle special cases. The special case is failure, the value is the x in Just x. We performed the value extraction by pattern matching against Just False and Just True, and we checked the special case by pattern matching against Nothing. Once again, in this case it’s simple, because Maybe is quite simple. If you have a more complex data type, you have to check and extract more stuff. Very quickly, this becomes a pain. Also notice one more thing: The point at which things became a pain was when two “primitives” were sequenced together. If we have a function that just calls isPerfectSquare once, it would be fine. The function could declare a return type of Maybe Bool, and call isPerfectSquare, and return what isPerfectSquare returns, which is a Maybe Bool. All’s fine and dandy. Things got annoying the moment we sequenced calls to isPerfectSquare. In other words, when we want to act on the value of isPerfectSquare, and maybe call isPerfectSquare again, or even another function that will take the value we extracted and act on it, that all that special case and extraction stuff occurred. Therein lies a key point: when we sequence these calls, things get messy.

** Key idea: **

Hence, whenever we have a context that we build up or assign, because we want to represent something, it means we have to deal with the context. Since every context is different (represents different things), the manner in which to:

  • Handle special cases
  • Extract the value

is going to be different.

There are two ways we can handle (a) and (b). We can make the programmer (who uses the type) responsible for value extraction and special case handling. Or, we can make the type creator responsible for it. Of course, they may be the same person, but that’s the fundamentals of abstraction/generalization anyway, right?

What we’ve seen above is making the programmer (who uses the type) responsible for it. He had to write code that checks for the special case (Nothing) and the value extraction (True and False). Not only is that a pain (lots of typing!), but it also distracts away from the real thing the code is doing (check for power of 4 and power of 8). It’s hard to read, annoying to debug, and so on. Basically, it violates the principle of abstraction.

So, how do we make the type creator responsible for all that? Guess what? We abstract the process of sequencing, and define how (a) values can be extracted, and (b) special cases can be handled, when such functions (Integer -> Maybe Bool, in this example) are sequenced. It’s very important to note that the idea of abstraction here acts on sequenced functions.

Hence, we define a special operator, which we call >>=, which represents sequencing. Hence, for the above example, when dealing with Maybe, we define >>= for Maybe as follows:

(>>=) :: `Maybe` a -> (a -> `Maybe` b) -> `Maybe` b
(>>=) Nothing f   = Nothing
(>>=) Just x f    = f x

** Responsibilities of >>= **

The bind operator >>= has a few jobs to do. As alluded to above, it certainly has to define how to extract the value, and handle special cases. However, more generally, this is what is has to do or can do:

  • Handle special cases
  • Extract the value
  • Modify the value (if it wants to)
  • Find a way to meaningfully interact with function f

So, the operator takes a Maybe type value, and a function, and somehow feeds the value to the function. The value is simple: It’s just that – Just 16, Just 4, Nothing. The function f is interesting. It represents the function(s) that we want to use in sequence. A possible function, in our example, is isPerfectSquare. Note that this is not limited to sequencing the same function.

Hence, the idea is this:

isBigAndPerfectSquare :: Integer -> `Maybe` Integer
isBigAndPerfectSquare n =
    Just n >>= `isPerfectSquare` >>= isBigNumber

isBigAndPerfectSquareAndMagic :: Integer -> `Maybe` Integer
isBigAndPerfectSquareAndMagic n =
    Just n >>= `isPerfectSquare` >>= isBigNumber >>= isMagicNumber

What a massive reduction in code and improvement in readability. All that value extraction (Just x) and special cases (Nothing) were all taken care of by the beautiful >>= operator. Sequencing is now easy, and clean.

Notice also that we “fed” Just n into the various functions, and not plain n. That’s because the >>= operator abstracts away the unwrapping (value extraction) process, and it needs something that is wrapped to unwrap. The type signature of >>=’s first parameter confirms this. It’s a Maybe a. Hence, we have to wrap our first n. This is good, if you think about it. n should exist with it’s context, not floating around. The only time it “floats” as a plain value, is when it’s inside a function, such as isPerfectSquare and isBigNumber. At all other times, it is safely wrapped in it’s data type – Maybe. That keeps things clean.

However, having the programmer deal with wrapping things up is all fine and dandy, but would it be nice if there was a generic way to just wrap things up, no matter what data type the programmer is using? Would it be possible to make the type creator responsible for that aspect as well? Turns out there is. And since we’re making the type creator responsible for more and more of these abstraction stuff, maybe all of this behaviour can be made part of a typeclass, and again, indeed it is. That’s your monad typeclass, which we are ready to introduce.

class Monad m where
  return     :: a -> m a
  (>>=)      :: m a -> (a -> m b) -> m b
  (>>)       :: m a -> m b -> m b
  (>>) m m&amp;#039;  =  m >>= _ -> m&amp;#039;
  fail       :: String -> m a

First thing to note: The return types of all the functions defined in the typeclass are monads. We’ve already introduced >>=, and the meaning here is exactly the same (intuitively at least, since we haven’t formally introduced it) as what we’ve been doing above. Basically, it takes a monadic value (our Just x, above), and feeds it into a function that expects the actual value (the inner value, or x, in our example), does something to the value, and spits out yet another monadic value.

Hence, Just x >>= isPerfectSquare feeds Just n into isPerfectSquare, and the definition for >>= results in the extraction of x and calling of isPerfectSquare x (remember when we made the type creator do all that extraction work and special case handling work?). isPerfectSquare x produced either a Nothing or a Just y, which is of our monadic type. The type creator must define >>=. In other words, what it means to extract and handle the edge cases (yes, again! this is important!)

Note that >> is a convenience function, which does exactly what >>= does, but it doesn’t care about the input. It just spits out the second parameter, a monadic value, m'.

fail is a function, which takes an error string and gives a monad, which represents failure, possibly including the given error string. We’ll ignore this because we hardly define it.

Hence, if we wanted to make Maybe a monad (it already is), then we would do the following:

instance Monad `Maybe` where
  return x        = Just x
  Nothing >>= f   = Nothing
  Just x  >>= f   = f x

We should, by now, understand how and why >>= is defined. Return is where we said we wanted to make the type creator responsible for wrapping something into a type (a monad). In the case of Maybe, the simplest way to wrap something (actually the only way, for Maybe) is to put it into a Just. Nothing is not a good idea, because Nothing means failure. If we want to wrap the value 4, we don’t want to generate failure, we want Just 4.

Hence, writing monadically, we should do this:

isBigAndPerfectSquare :: Integer -> `Maybe` Integer
isBigAndPerfectSquare n =
    return n >>= `isPerfectSquare` >>= isBigNumber

isBigAndPerfectSquareAndMagic :: Integer -> `Maybe` Integer
isBigAndPerfectSquareAndMagic n =
    return n >>= `isPerfectSquare` >>= isBigNumber >>= isMagicNumber

Creating Order Of Evaluation

Remember that in a lazily evaluated language like Haskell, there is no guarantee if and when expressions (or computations) are evaluated. For instance, given the following code:

myFunction :: Int -> Bool
myFunction x = if x > 100 then `True` else `False`

We have no guarantee if and when True, or False, will be evaluated. Everything is lazy. Similarly:

myFunction :: Integer -> [Integer]
myFunction x = take 10 [x..]

does not generate the entire list, only the first ten elements. In short, we cannot determine when evaluation occurs.

However, while this has a lot of benefits (performance, conciseness, infinitity, etc), when dealing with the real, external world, many times evaluation order matters, and matters a lot. For instance, it matters when writing to a file, and reading from one. Same with databases (they can be viewed as fancy files). And so on. Hence, we need a way to guarantee that the order of evaluation can be predictable when we need it to be predictable.

How can we do this in a purely functional, lazily evaluated language? By the observation that we can force evaluation order by ensuring that a computations X require a computation Y to compute. Hence, Y must evaluate before X.

If we express X and Y as anonymous functions, we can do something like this:

((y ->  y / ((x -> x + 1) 3)) 12)

This guarantees that the (x+1) function must run first, because the (y/x) function relies on the computed result (the value) of the (x+1) function for it to be evaluated. Hence (x+1) returns 4, given its argument, which is fed to the (y/x) function to becomes (y/4). the (y/4) function takes the parameter 12, and computes the result of 3.

Note how similar this is to assignment in imperative languages! In fact, this is the exact equivalent of the imperative code:

x = 3
x = x + 1
y = 12
y = y / x

Imperative languages determine order of execution simply by order of listing. The compiler must generate code that evaluates those 4 statements in that order (optimizing compilers can rearrange, but the meaning must not change). Since declarative languages cannot have such multi-statement “programs”, and can only consist of functions which consist of expressions which evaluate, we then observe that execution “flow” can be simply expressed by a bunch of interdependent nested functions, exactly as above. In essence, we are forcing the compiler (or the evaluator) to evaluate the (x+1) computation before the (y/x) function, because we created a scenario in which it simply had no choice but to do so, no matter how lazy it wanted to be.

How does this relate to monads? Or rather, how do monads help to create order of evaluation in a nice and simple way? Make the following observation:

This function (exactly the same as above):

isBigAndPerfectSquareAndMagic :: Integer -> `Maybe` Integer
isBigAndPerfectSquareAndMagic n =
    return n >>= `isPerfectSquare` >>= isBigNumber >>= isMagicNumber

is computationally identical to this function:

isBigAndPerfectSquareAndMagic n =
    return n >>= (x ->
        `isPerfectSquare` x >>= (y ->
            isBigNumber y >>= (z ->
                isMagicNumber z)))

The only difference is that now the “fed” and “unpacked” values (from the contextualized monadic value) are given nice names (x, y and z), and the functions are called with those names, instead of being implicit. To be obvious:

return n >>= `isPerfectSquare`
  ==> Just n >>= `isPerfectSquare`   -- by definition of return
  ==> `isPerfectSquare` n            -- by defintion of >>=

which is exactly the same as:

return n >>= (x -> `isPerfectSquare` x)
  ==> Just n >>= (x -> `isPerfectSquare` x)  -- by definition of return
  ==> (x -> `isPerfectSquare` x) n           -- by definition of >>=
  ==> `isPerfectSquare` n                     -- by function application

Hence, when we sequence expressions using monads, via >>=, we are in effect, guaranteeing their order of evaluation! Hence:

return n >>= (x ->
    `isPerfectSquare` x >>= (y ->
        isBigNumber y >>= (z ->
            isMagicNumber z)))

is identical to the imperative code:

x = construct(n)
y = `isPerfectSquare` x
z = isBigNumber y

neglecting the Maybe stuff, and construct means the monad’s return, and return means the imperative style “return” of actually returning a value from a function explicitly.

Since we now have order of evaluation guarantee, we can essentially “write” imperative looking code, in Haskell! Now this is not going to break referential transparency, because at this point, we are just forcing order of evaluation, not dealing with state, or variables. In other words, it would be nice if we could write the above Haskell code something as follows, right?

x = return n
y = `isPerfectSquare` x
z = isBigNumber y
isMagicNumber z

Since the = operator in Haskell already has a meaning, we give it another operator instead. How about:

x <- return n
y <- `isPerfectSquare` x
z <- isBigNumber y
isMagicNumber z

Looks good. Haskell provides exactly that in its do-notation, as in the following real Haskell code:

isBigAndPerfectSquareAndMagic n = do
  x <- return n
  y <- `isPerfectSquare` x
  z <- isBigNumber y
  isMagicNumber z

Wonderful looking syntax. It’s just syntatic sugar, so it really and precisely means the previous looking code with all the lambda expressions (anonymous functions). One thing to note, is that the last expression in the do-notation cannot have an “assignment” type thing. That’s because, if you translate the original code with all the lambda expressions, you realize that the last lambda is the expression to be evaluated and returned. Making an assignment there does not make sense, and is not permitted. If you try, Haskell tells you this:

The last statement in a ‘do’ construct must be an expression.

So what if we want to return something else? Say, for some weird reason, we want isBigAndPerfectSquareAndMagic to always return 1000. Then we do this:

isBigAndPerfectSquareAndMagic n = do
  x <- return n
  y <- `isPerfectSquare` x
  z <- isBigNumber y
  w <- isMagicNumber z
  return 1000

Of course! We use return to pack up 1000 as a monadic value (Just 1000, in this case), and since it’s the last expression, we return it. See the complete abstraction over the monad (Maybe, again, in this case) at work here?

And of course, all this is possible only because Maybe is a monad, which has the wonderful >>= operator implemented by the conscientious! With >>=, we get:

Properties of Monads:

  • Abstraction from context (unpacking, special cases)
  • Guaranteed order of evaluation

That’s the fundamental property of monads (there’s loads more).