Thursday, October 27, 2016

Electoral vote distributions are Monoids

I'm a political junkie and I spend way too much time following the polling results on FiveThirtyEight's election forecast.

A couple of days ago I was surprised that FiveThirtyEight gave Trump a 13.7% chance of winning, which seemed too high to be consistent with the state-by-state breakdowns. After reading their methodology I learned that this was because they do not assume that state outcomes are independent. In other words, if one swing state breaks for Trump, this might increase the likelihood that other swing states also break for Trump.

However, I still wanted to do the exercise and ask: what would Hillary's chance of winning be if each state's outcome were truly independent of the others? Let's write a program to find out!

Raw data

A couple of days ago (2016-10-24) I collected the state-by-state data from FiveThirtyEight's website (by hand) and recorded:

  • the name of the state
  • the chance that Hillary Clinton would win the state
  • the number of electoral college votes for that state

I recorded this data as a list of 3-tuples:

probabilities :: [(String, Double, Int)]
probabilities =
    [ ("Alabama"             , 0.003,  9)
    , ("Alaska"              , 0.300,  3)
    , ("Arizona"             , 0.529, 11)
    , ("Arkansas"            , 0.012,  6)
    , ("California"          , 0.999, 55)
    , ("Colorado"            , 0.889,  9)
    , ("Connecticut"         , 0.977,  7)
    , ("Delaware"            , 0.948,  3)
    , ("District of Columbia", 0.999,  3)
    , ("Florida"             , 0.731, 29)
    , ("Georgia"             , 0.259, 16)
    , ("Hawaii"              , 0.996,  4)
    , ("Idaho"               , 0.026,  4)
    , ("Illinois"            , 0.994, 20)
    , ("Indiana"             , 0.121, 11)
    , ("Iowa"                , 0.491,  6)
    , ("Kansas"              , 0.089,  6)
    , ("Kentucky"            , 0.042,  8)
    , ("Louisiana"           , 0.009,  8)
    , ("Maine"               , 0.852,  2)
    , ("Maine - 1"           , 0.944,  1)
    , ("Maine - 2"           , 0.517,  1)
    , ("Maryland"            , 0.999, 10)
    , ("Massachussetts"      , 0.998, 11)
    , ("Michigan"            , 0.929, 16)
    , ("Minnesota"           , 0.886, 10)
    , ("Mississippi"         , 0.034,  6)
    , ("Missouri"            , 0.168, 10)
    , ("Montana"             , 0.119,  3)
    , ("Nebraska"            , 0.040,  2)
    , ("Nebraska - 1"        , 0.154,  1)
    , ("Nebraska - 2"        , 0.513,  1)
    , ("Nebraska - 3"        , 0.014,  1)
    , ("Nevada"              , 0.703,  6)
    , ("New Hampshire"       , 0.868,  4)
    , ("New Jersey"          , 0.981, 14)
    , ("New Mexico"          , 0.941,  5)
    , ("New York"            , 0.998, 29)
    , ("North Carolina"      , 0.689, 15)
    , ("North Dakota"        , 0.070,  3)
    , ("Oklahoma"            , 0.002,  7)
    , ("Ohio"                , 0.563, 18)
    , ("Oregon"              , 0.957,  7)
    , ("Pennsylvania"        , 0.880, 20)
    , ("Rhode Island"        , 0.974,  4)
    , ("South Carolina"      , 0.086,  9)
    , ("South Dakota"        , 0.117,  3)
    , ("Tennessee"           , 0.025, 11)
    , ("Texas"               , 0.166, 38)
    , ("Utah"                , 0.067,  6)
    , ("Vermont"             , 0.984,  3)
    , ("Virginia"            , 0.943, 13)
    , ("Washington"          , 0.975, 12)
    , ("West Virginia"       , 0.006,  5)
    , ("Wisconsin"           , 0.896, 10)
    , ("Wyoming"             , 0.021,  3)
    ]

Note that some states (like Maine) apportion electoral votes in a weird way:

probabilities :: [(String, Double, Int)]
probabilities =
    ...
    , ("Maine"               , 0.852,  2)
    , ("Maine - 1"           , 0.944,  1)
    , ("Maine - 2"           , 0.517,  1)
    ...
    ]

Maine apportions two of its electoral votes based on a state-wide vote (i.e. "Maine" in the above list) and then two further electoral votes are apportioned based on two districts, one vote each (i.e. "Maine - 1" and "Maine - 2"). FiveThirtyEight computes the probabilities for each subset of electoral votes, so we just record them separately.

Combinatorial explosion

So how might we compute Hillary's chances of winning, assuming that each state's outcome is independent of the others?

One naïve approach would be to loop through all possible electoral outcomes and compute the probability and electoral vote count of each outcome. Unfortunately, that's not very efficient, since the number of possible outcomes doubles with each additional entry in the list:

>>> 2 ^ length probabilities
72057594037927936

... or approximately 7.2 * 10^16 outcomes. Even if I spent only a single CPU cycle to compute each outcome (which is unrealistic), on a 2.5 GHz processor that would take almost a year to compute them all. The election is only a couple of weeks away, so I don't have that kind of time or computing power!
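
As a sanity check on that estimate, we can do the arithmetic in ghci (at one outcome per clock cycle, a 2.5 GHz processor handles 2.5 * 10^9 outcomes per second, and a day has 86400 seconds):

>>> (2 ^ 56) `div` (2500000000 * 86400)  -- days required, rounded down
333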

Distributions

Fortunately, we can do much better than that! We can efficiently solve this using a simple "divide-and-conquer" approach where we subdivide the large problem into smaller problems until the solution is trivial.

The central data structure we'll use is a probability distribution which we'll represent as a Vector of Doubles:

import Data.Vector.Unboxed (Vector)

newtype Distribution = Distribution (Vector Double)
    deriving (Show)

This Vector will always have 539 elements, one element per possible final electoral vote count that Hillary might get (there are 538 electoral votes in total, so the possible counts range from 0 to 538). Each element is a Double representing the probability of the corresponding electoral vote count. We will maintain the invariant that all the probabilities (i.e. the elements of the Vector) sum to 1.

For example, if the Distribution were:

[1, 0, 0, 0, 0 ... ]

... that would represent a 100% chance of Hillary getting 0 electoral votes and a 0% chance of any other outcome. Similarly, if the Distribution were:

[0, 0.5, 0, 0.5, 0, 0, 0 ... ]

... then that would represent a 50% chance of Hillary getting 1 electoral vote and a 50% chance of Hillary getting 3 electoral votes.

In order to simplify the problem we need to subdivide it into smaller problems. For example, to compute the final electoral vote probability distribution for all 50 states, we can break the work down like this:

  • Split the 50 states into two sub-groups of 25 states each
  • Compute an electoral vote probability distribution for each sub-group of 25 states
  • Combine probability distributions for each sub-group into the final distribution

In order to do that, I need to define a function that combines two smaller distributions into a larger distribution:

import qualified Data.Vector.Unboxed

combine :: Distribution -> Distribution -> Distribution
combine (Distribution xs) (Distribution ys) = Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j =
            Data.Vector.Unboxed.unsafeIndex xs j
          * Data.Vector.Unboxed.unsafeIndex ys (i - j)

The combine function takes two input distributions named xs and ys and generates a new distribution named zs. To compute the probability of getting i electoral votes in our composite distribution, we just add up all the different ways we can get i electoral votes from the two sub-distributions.

For example, to compute the probability of getting 4 electoral votes for the entire group, we add up the probabilities for the following 5 outcomes:

  • We get 0 votes from our 1st group and 4 votes from our 2nd group
  • We get 1 vote from our 1st group and 3 votes from our 2nd group
  • We get 2 votes from our 1st group and 2 votes from our 2nd group
  • We get 3 votes from our 1st group and 1 vote from our 2nd group
  • We get 4 votes from our 1st group and 0 votes from our 2nd group

The probabilityOfEachOutcome function computes the probability of each one of these outcomes and then the totalProbability function sums them all up to compute the total probability of getting i electoral votes.
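
Readers familiar with probability theory may recognize combine as a discrete convolution of the two distributions. For intuition only, here is an equivalent (but much slower) list-based sketch, using a hypothetical combineList that is not part of the final program:

combineList :: [Double] -> [Double] -> [Double]
combineList xs ys =
    [ sum [ (xs !! j) * (ys !! (i - j)) | j <- [0 .. i] ] | i <- [0 .. 538] ]

-- `combine` computes exactly the same values, but over unboxed vectors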

We can also define an empty distribution representing the probability distribution of electoral votes given zero states:

empty :: Distribution
empty = Distribution (Data.Vector.Unboxed.generate 539 makeElement)
  where
    makeElement 0 = 1
    makeElement _ = 0

This distribution says that given zero states you have a 100% chance of getting zero electoral college votes and 0% chance of any other outcome. This empty distribution will come in handy later on.
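
We can quickly inspect the first few elements of empty in ghci to confirm this (printing all 539 elements would be noisy):

>>> let Distribution xs = empty
>>> Data.Vector.Unboxed.take 5 xs
[1.0,0.0,0.0,0.0,0.0]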

Divide and conquer

There's no limit to how many times we can subdivide the problem. In the extreme case we can subdivide the problem all the way down to individual states (or districts for weird states like Maine and Nebraska):

  • Subdivide our problem into 56 sub-groups (one group per state or district)
  • Compute the probability distribution for each sub-group, which is trivial
  • Combine all the probability distributions to retrieve the final result

In fact, this extreme solution is surprisingly efficient!
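
To see why: each call to combine performs at most 539 * 539 multiplications, and we only perform 56 combine steps, so the total work is bounded by:

>>> 56 * 539 * 539
16269176

... or roughly 1.6 * 10^7 operations, about nine orders of magnitude fewer than the 7.2 * 10^16 outcomes of the naïve approach.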

All we're missing is a function that converts each entry in our original probabilities list into a Distribution:

toDistribution :: (String, Double, Int) -> Distribution
toDistribution (_, probability, votes) =
    Distribution (Data.Vector.Unboxed.generate 539 makeElement)
  where
    makeElement 0              = 1 - probability
    makeElement i | i == votes = probability
    makeElement _              = 0

This says that the probability distribution for a single state has only two outcomes with non-zero probability:

  • Hillary Clinton has probability x of winning n votes for this state
  • Hillary Clinton has probability 1 - x of winning 0 votes for this state
  • Hillary Clinton has 0% probability of any other outcome for this state

Let's test this out on a couple of states:

>>> toDistribution ("Alaska"      , 0.300, 3)
Distribution [0.7,0.0,0.0,0.3,0.0,0.0,...
>>> toDistribution ("North Dakota", 0.070, 3)
Distribution [0.9299999999999999,0.0,0.0,7.0e-2,0.0...

This says that:

  • Alaska has a 30% chance of giving Clinton 3 votes and 70% chance of 0 votes
  • North Dakota has a 7% chance of giving Clinton 3 votes and a 93% chance of 0 votes

We can also verify that combine works correctly by combining the electoral vote distributions of both states. We expect the new distribution to be:

  • 65.1% chance of 0 votes (the probability of losing both states)
  • 32.8% chance of 3 votes (the probability of winning exactly one of the two states)
  • 2.1% chance of 6 votes (the probability of winning both states)
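
These expected values follow directly from the independence assumption:

0.700 * 0.930                 = 0.651  -- lose both states
0.300 * 0.930 + 0.700 * 0.070 = 0.328  -- win exactly one state
0.300 * 0.070                 = 0.021  -- win both states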

... and this is in fact what we get:

>>> let alaska      = toDistribution ("Alaska"      , 0.300, 3)
>>> let northDakota = toDistribution ("North Dakota", 0.070, 3)
>>> combine alaska northDakota
Distribution [0.6509999999999999,0.0,0.0,0.32799999999999996,0.0,0.0,2.1e-2,0.0,...

Final result

To compute the total probability of winning, we first transform each element of the list into the corresponding Distribution:

distributions :: [Distribution]
distributions = map toDistribution probabilities

... then we reduce the list to a single value by repeatedly applying the combine function, starting from the empty distribution (which also handles the case where the list is empty):

import qualified Data.List

distribution :: Distribution
distribution = Data.List.foldl' combine empty distributions
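
Once we give Distribution the Monoid instance shown later in this post, this fold can be written even more succinctly as mconcat (which is what the complete program at the end of this post uses):

distribution :: Distribution
distribution = mconcat distributions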

... and if we want to get Clinton's chances of winning, we just add up the probabilities for all outcomes greater than or equal to 270 electoral college votes:

chanceOfClintonVictory :: Double
chanceOfClintonVictory =
    Data.Vector.Unboxed.sum (Data.Vector.Unboxed.drop 270 xs)
  where
    Distribution xs = distribution

main :: IO ()
main = print chanceOfClintonVictory

If we compile and run this program we get the final result:

$ stack --resolver=lts-7.4 build vector
$ stack --resolver=lts-7.4 ghc -- -O2 result.hs
$ ./result
0.9929417642334847

In other words, Clinton has a 99.3% chance of winning if each state's outcome is independent of every other outcome. This is significantly higher than the probability estimated by FiveThirtyEight at that time: 86.3%.

These results differ for the same reason I noted above: FiveThirtyEight assumes that state outcomes are not necessarily independent and that a Trump win in one state could correlate with Trump wins in other states. This possibility of correlated victories favors the person who is behind in the race.

As a sanity check, we can also verify that the final probability distribution has probabilities that add up to approximately 1:

>>> let Distribution xs = distribution
>>> Data.Vector.Unboxed.sum xs
0.9999999999999994

Exercise: Expand on this program to plot the probability distribution
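
As a starting point for that exercise, one minimal approach is to dump the distribution to a CSV file and plot it with an external tool. The writeDistributionCSV helper below is a hypothetical sketch, not part of the original program:

writeDistributionCSV :: FilePath -> Distribution -> IO ()
writeDistributionCSV path (Distribution xs) =
    writeFile path (unlines (zipWith row [(0 :: Int) ..] (Data.Vector.Unboxed.toList xs)))
  where
    -- One "votes,probability" line per possible electoral vote count
    row votes p = show votes ++ "," ++ show p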

Efficiency

Our program is also efficient, running in roughly 30 milliseconds:

$ bench ./result
benchmarking ./result
time                 30.33 ms   (29.42 ms .. 31.16 ms)
                     0.998 R²   (0.997 R² .. 1.000 R²)
mean                 29.43 ms   (29.13 ms .. 29.81 ms)
std dev              710.6 μs   (506.7 μs .. 992.6 μs)

This is a significant improvement over a year's worth of running time.

We could even speed this up further using parallelism. Thanks to our divide-and-conquer approach we can subdivide this problem among up to 56 CPUs (one per state or district) to accelerate the solution. However, after a certain point the overhead of splitting up the work might outweigh the benefits of parallelism.
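
For intuition, here is a sketch of the balanced reduction that parallelism would exploit. The foldBalanced helper is hypothetical (and still sequential as written), but every mappend within a single level is independent of the others and could run on a separate core:

foldBalanced :: Monoid m => [m] -> m
foldBalanced []  = mempty
foldBalanced [x] = x
foldBalanced xs  = foldBalanced (pairUp xs)
  where
    -- Combine adjacent elements; each pair can be evaluated independently
    pairUp (a:b:rest) = mappend a b : pairUp rest
    pairUp rest       = rest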

Monoids

People more familiar with Haskell will recognize that this solution fits cleanly into a standard Haskell interface known as the Monoid type class. In fact, many divide-and-conquer solutions tend to be Monoids of some sort.

The Monoid typeclass is defined as:

class Monoid m where
    mappend :: m -> m -> m

    mempty :: m

-- An infix operator that is a synonym for `mappend`
(<>) :: Monoid m => m -> m -> m
x <> y = mappend x y

... and the Monoid class has three rules that every implementation must obey, which are known as the "Monoid laws".

The first rule is that mappend (or the equivalent (<>) operator) must be associative:

x <> (y <> z) = (x <> y) <> z

The second and third rules are that mempty must be the "identity" of mappend, meaning that mempty does nothing when combined with other values:

mempty <> x = x

x <> mempty = x

A simple example of a Monoid is integers under addition, which we can implement like this:

instance Monoid Integer where
    mappend = (+)
    mempty = 0

... and this implementation satisfies the Monoid laws thanks to the laws of addition:

(x + y) + z = x + (y + z)

0 + x = x

x + 0 = x
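
Note that base does not actually define this Monoid instance for Integer, because integers form a monoid under multiplication just as well as under addition. Instead, you opt into one or the other by wrapping values in the Sum or Product newtypes from Data.Monoid:

import Data.Monoid (Sum(..), (<>))

-- `(<>)` on `Sum` means addition
sumExample :: Sum Integer
sumExample = Sum 2 <> Sum 3  -- Sum {getSum = 5}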

However, Distributions are Monoids, too! Our combine and empty definitions both have the correct types to implement the mappend and mempty methods of the Monoid typeclass, respectively:

instance Monoid Distribution where
    mappend = combine

    mempty = empty
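
(On GHC 8.4 and later, Monoid has a Semigroup superclass, so the same definitions would be split across two instances:)

instance Semigroup Distribution where
    (<>) = combine

instance Monoid Distribution where
    mempty = empty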

Both mappend and mempty for Distributions satisfy the Monoid laws:

  • mappend is associative (Proof omitted)
  • mempty is the identity of mappend

We can prove the identity law using the following rules for how Vectors behave:

-- These rules assume that all vectors involved have 539 elements

-- If you generate a vector by indexing into another vector of the same length,
-- you get back the original vector
Data.Vector.Unboxed.generate 539 (Data.Vector.Unboxed.unsafeIndex xs) = xs

-- If you index into a vector generated by a function, that's equivalent to calling
-- that function
Data.Vector.Unboxed.unsafeIndex (Data.Vector.Unboxed.generate 539 f) i = f i

Equipped with those rules, we can then prove that mappend (Distribution xs) mempty = Distribution xs:

mappend (Distribution xs) mempty

-- mappend = combine
= combine (Distribution xs) mempty

-- Definition of `mempty`
= combine (Distribution xs) (Distribution ys)
  where
    ys = Data.Vector.Unboxed.generate 539 makeElement
      where
        makeElement 0 = 1
        makeElement _ = 0

-- Definition of `combine`
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j =
            Data.Vector.Unboxed.unsafeIndex xs j
          * Data.Vector.Unboxed.unsafeIndex ys (i - j)

    ys = Data.Vector.Unboxed.generate 539 makeElement
      where
        makeElement 0 = 1
        makeElement _ = 0

-- Data.Vector.Unboxed.unsafeIndex (Data.Vector.Unboxed.generate 539 f) i = f i
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j =
            Data.Vector.Unboxed.unsafeIndex xs j
          * makeElement (i - j)

        makeElement 0 = 1
        makeElement _ = 0

-- Case analysis on `j`
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j
            | j == i =
                    Data.Vector.Unboxed.unsafeIndex xs j
                  * 1  -- makeElement (i - j) = makeElement 0 = 1
            | otherwise =
                    Data.Vector.Unboxed.unsafeIndex xs j
                  * 0  -- makeElement (i - j) = 0

-- x * 1 = x
-- y * 0 = 0
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j
            | j == i    = Data.Vector.Unboxed.unsafeIndex xs j
            | otherwise = 0

-- Informally: "Sum of a vector with one non-zero element is just that element"
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i = Data.Vector.Unboxed.unsafeIndex xs i

-- Data.Vector.Unboxed.generate 539 (Data.Vector.Unboxed.unsafeIndex xs) = xs
= Distribution xs

Exercise: Prove the associativity law for combine

Conclusion

I hope people find this an interesting example of how you can apply mathematical design principles (in this case: Monoids) to simplify and speed up solutions to programming problems.

If you would like to test this program out yourself the complete program is provided below:

import Data.Vector.Unboxed (Vector)

import qualified Data.List
import qualified Data.Vector.Unboxed

probabilities :: [(String, Double, Int)]
probabilities =
    [ ("Alabama"             , 0.003,  9)
    , ("Alaska"              , 0.300,  3)
    , ("Arizona"             , 0.529, 11)
    , ("Arkansas"            , 0.012,  6)
    , ("California"          , 0.999, 55)
    , ("Colorado"            , 0.889,  9)
    , ("Connecticut"         , 0.977,  7)
    , ("Delaware"            , 0.948,  3)
    , ("District of Columbia", 0.999,  3)
    , ("Florida"             , 0.731, 29)
    , ("Georgia"             , 0.259, 16)
    , ("Hawaii"              , 0.996,  4)
    , ("Idaho"               , 0.026,  4)
    , ("Illinois"            , 0.994, 20)
    , ("Indiana"             , 0.121, 11)
    , ("Iowa"                , 0.491,  6)
    , ("Kansas"              , 0.089,  6)
    , ("Kentucky"            , 0.042,  8)
    , ("Louisiana"           , 0.009,  8)
    , ("Maine"               , 0.852,  2)
    , ("Maine - 1"           , 0.944,  1)
    , ("Maine - 2"           , 0.517,  1)
    , ("Maryland"            , 0.999, 10)
    , ("Massachussetts"      , 0.998, 11)
    , ("Michigan"            , 0.929, 16)
    , ("Minnesota"           , 0.886, 10)
    , ("Mississippi"         , 0.034,  6)
    , ("Missouri"            , 0.168, 10)
    , ("Montana"             , 0.119,  3)
    , ("Nebraska"            , 0.040,  2)
    , ("Nebraska - 1"        , 0.154,  1)
    , ("Nebraska - 2"        , 0.513,  1)
    , ("Nebraska - 3"        , 0.014,  1)
    , ("Nevada"              , 0.703,  6)
    , ("New Hampshire"       , 0.868,  4)
    , ("New Jersey"          , 0.981, 14)
    , ("New Mexico"          , 0.941,  5)
    , ("New York"            , 0.998, 29)
    , ("North Carolina"      , 0.689, 15)
    , ("North Dakota"        , 0.070,  3)
    , ("Oklahoma"            , 0.002,  7)
    , ("Ohio"                , 0.563, 18)
    , ("Oregon"              , 0.957,  7)
    , ("Pennsylvania"        , 0.880, 20)
    , ("Rhode Island"        , 0.974,  4)
    , ("South Carolina"      , 0.086,  9)
    , ("South Dakota"        , 0.117,  3)
    , ("Tennessee"           , 0.025, 11)
    , ("Texas"               , 0.166, 38)
    , ("Utah"                , 0.067,  6)
    , ("Vermont"             , 0.984,  3)
    , ("Virginia"            , 0.943, 13)
    , ("Washington"          , 0.975, 12)
    , ("West Virginia"       , 0.006,  5)
    , ("Wisconsin"           , 0.896, 10)
    , ("Wyoming"             , 0.021,  3)
    ]

newtype Distribution = Distribution { getDistribution :: Vector Double }
    deriving (Show)

combine :: Distribution -> Distribution -> Distribution
combine (Distribution xs) (Distribution ys) = Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j =
            Data.Vector.Unboxed.unsafeIndex xs j
          * Data.Vector.Unboxed.unsafeIndex ys (i - j)

empty :: Distribution
empty = Distribution (Data.Vector.Unboxed.generate 539 makeElement)
  where
    makeElement 0 = 1
    makeElement _ = 0

instance Monoid Distribution where
    mappend = combine

    mempty = empty

toDistribution :: (String, Double, Int) -> Distribution
toDistribution (_, probability, votes) =
    Distribution (Data.Vector.Unboxed.generate 539 makeElement)
  where
    makeElement 0              = 1 - probability
    makeElement i | i == votes = probability
    makeElement _              = 0

distributions :: [Distribution]
distributions = map toDistribution probabilities

distribution :: Distribution
distribution = mconcat distributions

chanceOfClintonVictory :: Double
chanceOfClintonVictory =
    Data.Vector.Unboxed.sum (Data.Vector.Unboxed.drop 270 xs)
  where
    Distribution xs = distribution

main :: IO ()
main = print chanceOfClintonVictory
