World Cup Prognostication

June 29, 2010

We represent a team as a list containing its three-letter abbreviation and current elo rating. The list of teams is given in the order of the round-of-sixteen bracket, so repeatedly running through the list two-at-a-time always matches teams against the proper opponents:

(define teams '(("URU" 1890) ("KOR" 1746) ("USA" 1785) ("GHA" 1711) ("NED" 2045) ("SVK" 1654) ("BRA" 2082) ("CHI" 1883) ("ARG" 1966) ("MEX" 1873) ("GER" 1930) ("ENG" 1945) ("PAR" 1771) ("JPN" 1744) ("ESP" 2061) ("POR" 1874)))

The two formulas are shown below. Win-expectation is the percentage of times that t1 defeats t2, and point-change is the number of points by which the winner’s elo rating increases:

(define (win-expectation t1 t2) (/ (+ 1 (expt 10 (/ (- (cadr t2) (cadr t1)) 400)))))

(define (point-change t1 t2) (* 60 (- 1 (win-expectation t1 t2))))

Given the Elo ratings shown above, the United States had a 60.5% expectation of defeating Ghana, just a tiny bit better than three-to-two odds. Ghana earned 36 Elo points for defeating the higher-ranked United States, increasing its Elo rating to 1747 and jumping it over Japan and Korea in the Elo rankings (Japan is yet to play, but Korea lost to Uruguay, so its Elo rating decreased even before Ghana defeated the United States).

Match takes two teams, plays a simulated game, and returns the winning team with its new elo rating:

(define (match t1 t2) (if (< (rand) (win-expectation t1 t2)) (list (car t1) (+ (cadr t1) (point-change t1 t2))) (list (car t2) (+ (cadr t2) (point-change t2 t1)))))

I should have named the arguments to point-change as winner and loser rather than t1 and t2. In my first version of match, I didn’t reverse the parameters in the else clause of the if, because the association of t1 to t1 and t2 to t2 was strong, and my simulation was wrong. Fortunately, it was wrong enough that my intuition of the problem suggested an error, and the bug was easy to find and fix.

Round loops over all the teams in a single round of the tournament and returns a list of the winners, which is half the length of the previous round:

(define (round teams) (let loop ((teams teams) (result '())) (if (null? teams) (reverse result) (loop (cddr teams) (cons (match (car teams) (cadr teams)) result)))))

Tournament simulates rounds until there is only one team left, the winner of the World Cup:

(define (tournament teams) (let loop ((teams teams)) (if (null? (cdr teams)) (caar teams) (loop (round teams)))))

Simulate runs the requested number of tournaments, counts the winners using the standard unix sort-uniq-sort idiom, and reports the results:

(define (simulate n) (let loop ((n n) (result '())) (if (zero? n) (sort (lambda (x y) (> (cdr x) (cdr y))) (uniq-c string=? (sort string<? result))) (loop (- n 1) (cons (tournament teams) result)))))

Here is one result of a million simulated tournaments:

> (simulate 1000000) (("BRA" . 223869) ("ESP" . 220378) ("NED" . 191683) ("ARG" . 79709) ("ENG" . 60550) ("URU" . 54380) ("GER" . 49965) ("POR" . 27458) ("MEX" . 24785) ("CHI" . 23526) ("USA" . 14928) ("PAR" . 9382) ("KOR" . 7416) ("JPN" . 6202) ("GHA" . 4901) ("SVK" . 868))

Though all sixteens teams have at least a chance to win (even lowly Slovakia), Brazil narrowly edges Spain for the most simulated World Cup victories. Each simulation is different; although Brazil wins more often than Spain in the simulation shown above, other simulations based on a different random sequence have Spain win more often than Brazil, as the two teams’ Elo ratings are close, and Spain has a somewhat easier draw (assuming Spain defeats Portugal, they will play the winner of the Paraguay/Japan match, which should be a cakewalk, while Brazil, assuming they defeat Chile, will presumably play third-ranked Netherlands, which is very close). There are also changes for the other teams; notice in the simulation above that Germany and Uruguay have swapped places, based on their Elo ratings, and that Chile has fallen two spots while Portugal and Mexico each moved up. And even if you consider Brazil and Spain the two best teams, as many soccer pundits do, notice that between them they win less than half the simulated tournaments.

We used sort, uniq-c and rand from the Standard Prelude. You can run the program at http://codepad.org/vmpwcj3B.

Posted by programmingpraxis

Filed in Exercises

14 Comments »

14 Responses to “World Cup Prognostication”

Programming Praxis – World Cup Prognostication « Bonsai Code said
June 29, 2010 at 12:29 PM
[…] Praxis – World Cup Prognostication By Remco Niemeijer In today’s Programming Praxis exercise, the goal is to write a simulator for the knockout stage of the World […]

Remco Niemeijer said

June 29, 2010 at 12:33 PM

My Haskell solution (see http://bonsaicode.wordpress.com/2010/06/29/programming-praxis-world-cup-prognostication/ for a version with comments):

{-# LANGUAGE BangPatterns #-} 

import Control.Monad
import qualified Data.List.Key as K
import qualified Data.Map as M
import System.Random

teams :: [(String, Float)]
teams = [ ("URU", 1890), ("KOR", 1746), ("USA", 1785), ("GHA", 1711)
        , ("NED", 2045), ("SVK", 1654), ("BRA", 2082), ("CHI", 1883)
        , ("ARG", 1966), ("MEX", 1873), ("GER", 1930), ("ENG", 1945)
        , ("PAR", 1771), ("JPN", 1744), ("ESP", 2061), ("POR", 1874)]

winChance :: Float -> Float -> Float
winChance eloA eloB = 1 / (1 + 10 ** ((eloB - eloA) / 400))

update :: Float -> Float -> Float
update winner loser = winner + 60 * (1 - winChance winner loser)

match :: Float -> (a, Float) -> (a, Float) -> (a, Float)
match r (a, ea) (b, eb) | r < winChance ea eb = (a, update ea eb)
                        | otherwise           = (b, update eb ea)

simround :: [(a, Float)] -> [Float] -> [(a, Float)]
simround (a:b:xs) (r:rs) = match r a b : simround xs rs
simround _        _      = []

tournament :: [(a, Float)] -> IO a
tournament [(w,_)] = return w
tournament xs      = tournament . simround xs . randoms =<< newStdGen

simulate :: Int -> IO ()
simulate n = print . K.sort (negate . snd) . M.assocs =<<
    foldM (\ !m _ -> fmap (\x -> M.adjust succ x m) $ tournament teams)
          (M.map (const 0) $ M.fromList teams) [1..n]

Remco Niemeijer said
June 29, 2010 at 12:37 PM
Hm, my code doesn’t seem to want to show up. Phil, could you have a look to see if it ended up in the spam queue? The only other explanation I can think of is that the source code highlighter doesn’t like the pragma in my code.
programmingpraxis said
June 29, 2010 at 12:55 PM
Fixed. It did go to the spam queue. I can’t imagine why.
Graham said
June 29, 2010 at 5:08 PM
You wouldn’t happen to have the standard prelude in a file or on the codepad, would you?
It’s an awful lot of stuff, and copy-pasting it all together is a rough task.

Thanks!
programmingpraxis said
June 29, 2010 at 9:09 PM
Graham: Fixed. See the Contents page or the introduction at the top of the Standard Prelude.
Graham said
June 30, 2010 at 2:18 AM
Wonderful. Thanks very much!
programmingpraxis said
July 5, 2010 at 12:15 AM
I re-ran my simulation after the quarter finals. The new Elo rankings put Netherlands and Spain in a tie for first place with 2085 points, Brazil dropped from first to third with 2072 points, Germany in fourth with 2044 points, Argentina in fifth with 1940 points, and Uruguay in sixth with 1895 points; the United States drops to twenty-fifth with 1749 points. Germany is the big mover with an increase from 1930 points to 2044 points following two wins with big goal differentials against higher-ranked teams. The new teams variable reflecting the semi-final bracket and the new Elo ratings is (("URU" 1895) ("NED" 2085) ("GER" 2044) ("ESP" 2085)), and the result of a million simulated tournaments is (("NED" . 378294) ("ESP" . 317881) ("GER" . 231524) ("URU" . 72301)). I have been quite impressed with Germany; they are a young team that is visibly improving with each half they play, and I wouldn’t be surprised to see them win the World Cup or, at least, defeat Spain then lose in the final. However, I am sticking to my prediction that the winner of the Brazil/Netherlands game, who we now know to be Netherlands, will win the tournament; Netherlands are a great side, and the numbers back me up.
Elvis Montero said
July 5, 2010 at 12:57 AM
Phil,

Something I don’t quite understand is how you can infer the winner based on a random number (I noticed Remco does something along the same lines too). Are you basically saying the winner of a match is, in spite of the ELO rankings, the toss of a coin? If that’s not the case, are you using actual world cup results to aid in your computations?

Great problem statement by the way (decidedly apropos). Keep them coming!

slabounty said

July 5, 2010 at 5:56 PM

A simple ruby version commented at http://steamcode.blogspot.com/2010/07/world-cup-simulation.html

class Team
    include Comparable

    attr_reader :symbol, :country, :initial_elo, :new_elo
    attr_accessor :world_cup_wins

    def initialize(symbol, country, initial_elo)
        @symbol = symbol
        @country = country
        @initial_elo = initial_elo
        @new_elo = initial_elo
        @world_cup_wins = 0
    end

    def <=>(team_other)
        @world_cup_wins <=> team_other.world_cup_wins
    end

    def to_s
        "#{@symbol} #{@country} #{@initial_elo} Wins = #{@world_cup_wins}"
    end

    def winning_expectation(team_other)
        1.0 / (1.0 + 10.0 ** ((team_other.new_elo - @new_elo) / 400.0))
    end

    def game(team_other)
        w_e = winning_expectation(team_other)
        if rand <= w_e
            calc_elo(w_e, 1.0)
            team_other.calc_elo(w_e, 0.0)

            return true
        else
            calc_elo(w_e, 0.0)
            team_other.calc_elo(w_e, 1.0)

            return false
        end
    end

    def reset_elo
        @new_elo = initial_elo
    end

    def calc_elo(w_e, win_loss)
        @new_elo = @new_elo + 60.0 * 1.0 * (win_loss - w_e)
    end
end

initial_rankings = [

    [ 7, "URU", "Uruguay",        1890],
    [25, "KOR", "Korea",          1746],
    [15, "USA", "United States",  1785],
    [32, "GHA", "Ghana",          1711],
    [ 3, "NED", "Netherlands",    2045],
    [45, "SVK", "Slovakia",       1654],
    [ 1, "BRA", "Brazil",         2082],
    [ 8, "CHI", "Chile",          1883],
    [ 4, "ARG", "Argentina",      1966],
    [10, "MEX", "Mexico",         1873],
    [ 6, "GER", "Germany",        1930],
    [ 5, "ENG", "England",        1945],
    [19, "PAR", "Paraguay",       1771],
    [26, "JPN", "Japan",          1744],
    [ 2, "ESP", "Spain",          2061],
    [ 9, "POR", "Portugal",       1874],
]

teams = Array.new()
initial_rankings.each do |r|
    teams << Team.new(r[1], r[2], r[3])
end

1.upto(1000000) do | sim |

    current_round = teams
    current_round.each { |t| t.reset_elo }

    while current_round.size > 1 do 
        next_round = Array.new

        i = 0
        while i < current_round.size
            next_round << (current_round[i].game(current_round[i+1]) ? current_round[i] : current_round[i+1])
            i += 2
        end
        
        current_round = next_round
    end

    current_round[0].world_cup_wins += 1
end

teams.sort!
teams.each do | t |
    puts "team: #{t} "
end

Khanh said
July 5, 2010 at 7:27 PM
Mine in F#

Khanh said

July 5, 2010 at 7:31 PM

Another try..


open System
open System.Collections.Generic

let make_team (name : string) (rating : float) =
    fun x -> if (x = 0) then name else string rating
let name_of (team : int -> string) =  team 0
let rating_of (team : int -> string) = float (team 1)

let win_prob (A : int -> string) (B : int -> string) =
    let elo_us   = rating_of A
    let elo_them = rating_of B
    
    1.0 / (1.0 + 10.0 ** ((elo_them - elo_us) / 400.0))


let new_rating (A : int -> string) (B : int -> string) = 
    let win_p = win_prob A B
    fun x -> //A wins 
             if (x = "W") then
                int ((rating_of A) + 60.0 * (1.0 - win_p))
             else
             //A loses
                int ((rating_of A) + 60.0 * (0.0 - win_p))

let rand = new Random()

let play_off (A: int -> string) (B : int -> string) = 
    let win_p = win_prob A B


    let p    = rand.NextDouble()

    //if A wins
    if (p <= win_p) then
        make_team (name_of A) (float (new_rating A B "W"))
    else //if B wins
        make_team (name_of B) (float (new_rating B A "W"))



let print_team (A: int->string) = 
    sprintf "%s - %f" (name_of A) (rating_of A)

let rec sim (teams: (int ->string) list) =     
    //printfn "%A" (List.map print_team teams)
    if (teams.Length = 2) then        
        play_off teams.[0] teams.[1]
    else
        let mid = teams.Length / 2  
        play_off (sim (teams |> Seq.take mid |> Seq.toList))
                 (sim (teams |> Seq.skip mid |> Seq.take mid |> Seq.toList))

let bra = make_team "BRA" 2082.0
let esp = make_team "ESP" 2061.0
let ned = make_team "NED" 2045.0
let arg = make_team "ARG" 1966.0
let eng = make_team "ENG" 1945.0
let ger = make_team "GER" 1930.0
let uru = make_team "URU" 1890.0
let chi = make_team "CHI" 1883.0
let por = make_team "POR" 1874.0
let mex = make_team "MEX" 1873.0
let usa = make_team "USA" 1785.0
let par = make_team "PRA" 1771.0
let kor = make_team "KOR" 1746.0
let jpn = make_team "JPN" 1744.0
let gha = make_team "GHA" 1711.0
let svk = make_team "SVK" 1654.0

let teams = [uru;kor; 
             usa;gha; 
             ned;svk; 
             bra;chi; 
             
             arg;mex; 
             ger;eng; 
             par;jpn; 
             esp;por]

let simulation = 
    let table = new Dictionary<string, int>()
    for i in [1 .. 1000000] do
        let winner = sim teams
        
        if table.ContainsKey(name_of winner) then
            table.[(name_of winner)] <- table.[(name_of winner)] + 1
        else 
            table.Add(name_of winner, 1)
    table

razvan said

July 5, 2010 at 7:53 PM

Well, this is a bit late, but here’s mine, in Python.
Should probably be updated to account for the rankins as of today and the fixtures in the semis. Spain is the predicted winner.

import copy
import random
import operator

def simulate_match(ranks, team1, team2):
    team1_odds = 1.0 / (1 + 10 ** ((ranks[team2] - ranks[team1]) / 400.0))
    winner = team1 if random.random() < team1_odds else team2
    winner_odds = team1_odds if winner == team1 else 1.0 - team1_odds
    loser = team1 if winner == team2 else team2
    ranks[winner] += 60 * (1 - winner_odds)
    ranks[loser] -= 60 * (1 - winner_odds)
    return winner

def simulate_round(ranks, matchlist):
    newlist = []
    i = 0
    while i < len(matchlist) - 1:
            newlist.append(simulate_match(ranks, matchlist[i], matchlist[i+1]))
            i += 2
    return newlist

def simulate_cup(ranks, matchlist):
    i = 0
    while len(matchlist) != 1:
        matchlist = simulate_round(ranks, matchlist)
    return matchlist[0]

def main():
    cup_winners = {}
    for i in range(1000000):
        init_ranks = {"brazil" : 2082, "spain" : 2061, "netherlands" : 2045, "argentina" : 1966, "england" : 1945,
         "germany" : 1930, "uruguay" : 1890, "chile" : 1883, "portugal" : 1874, "mexico" : 1873,
         "united states" : 1875, "paraguay" : 1771, "korea" : 1746, "japan" : 1744, "ghana" : 1711,
         "slovakia" : 1654}
        init_matches = ["uruguay", "korea", "united states", "ghana", "netherlands", "slovakia", "brazil", "chile",
           "argentina", "mexico", "germany", "england", "paraguay", "japan", "spain", "portugal"]
        cup_winner = simulate_cup(init_ranks, init_matches)
        if cup_winner in cup_winners:
            cup_winners[cup_winner] += 1
        else:
            cup_winners[cup_winner] = 1
    sorted_winners = sorted(cup_winners.iteritems(), key=operator.itemgetter(1), reverse=True)
    for (k, v) in sorted_winners:
        print k.title(), v

if __name__ == '__main__':
    main()

And the results:
Spain 219542
Brazil 216734
Netherlands 185961
Argentina 78825
England 59688
Germany 50021
Uruguay 48066
United States 41076
Portugal 27293
Mexico 24100
Chile 22260
Paraguay 9498
Korea 6417
Japan 6012
Ghana 3791
Slovakia 716

jeeve said
July 6, 2010 at 12:04 AM
with online elo ranking in drracket

#lang racket

(require net/url)

(define (elo-ranking country)
(let* ((in (get-pure-port (string->url “http://en.wikipedia.org/wiki/World_Football_Elo_Ratings”)))
(the-line(do ((the-line (read-line in) (read-line in)))
((regexp-match (string-append “>” country “number (car (regexp-match “[0-9]+” value)))))

(define sort #f)
(define merge #f)
(let ()
(define dosort
(lambda (pred? ls n)
(if (= n 1)
(list (car ls))
(let ((i (quotient n 2)))
(domerge pred?
(dosort pred? ls i)
(dosort pred? (list-tail ls i) (- n i)))))))
(define domerge
(lambda (pred? l1 l2)
(cond
((null? l1) l2)
((null? l2) l1)
((pred? (car l2) (car l1))
(cons (car l2) (domerge pred? l1 (cdr l2))))
(else (cons (car l1) (domerge pred? (cdr l1) l2))))))
(set! sort
(lambda (pred? l)
(if (null? l) l (dosort pred? l (length l)))))
(set! merge
(lambda (pred? l1 l2)
(domerge pred? l1 l2))))

(define (uniq-c eql? xs)
(if (null? xs) xs
(let loop ((xs (cdr xs)) (prev (car xs)) (k 1) (result ‘()))
(cond ((null? xs) (reverse (cons (cons prev k) result)))
((eql? (car xs) prev) (loop (cdr xs) prev (+ k 1) result))
(else (loop (cdr xs) (car xs) 1 (cons (cons prev k) result)))))))

(define rand #f)
(define randint #f)
(let ((two31 #x80000000) (a (make-vector 56 -1)) (fptr #f))
(define (mod-diff x y) (modulo (- x y) two31)) ; generic version
; (define (mod-diff x y) (logand (- x y) #x7FFFFFFF)) ; fast version
(define (flip-cycle)
(do ((ii 1 (+ ii 1)) (jj 32 (+ jj 1))) ((< 55 jj))
(vector-set! a ii (mod-diff (vector-ref a ii) (vector-ref a jj))))
(do ((ii 25 (+ ii 1)) (jj 1 (+ jj 1))) ((< 55 ii))
(vector-set! a ii (mod-diff (vector-ref a ii) (vector-ref a jj))))
(set! fptr 54) (vector-ref a 55))
(define (init-rand seed)
(let* ((seed (mod-diff seed 0)) (prev seed) (next 1))
(vector-set! a 55 prev)
(do ((i 21 (modulo (+ i 21) 55))) ((zero? i))
(vector-set! a i next) (set! next (mod-diff prev next))
(set! seed (+ (quotient seed 2) (if (odd? seed) #x40000000 0)))
(set! next (mod-diff next seed)) (set! prev (vector-ref a i)))
(flip-cycle) (flip-cycle) (flip-cycle) (flip-cycle) (flip-cycle)))
(define (next-rand)
(if (negative? (vector-ref a fptr)) (flip-cycle)
(let ((next (vector-ref a fptr))) (set! fptr (- fptr 1)) next)))
(define (unif-rand m)
(let ((t (- two31 (modulo two31 m))))
(let loop ((r (next-rand)))
(if (list a)))
((eq? (car seed) ‘set) (set! fptr (caadr seed))
(set! a (list->vector (cdadr seed))))
(else (/ (init-rand (modulo (numerator
(inexact->exact (car seed))) two31)) two31)))))
(set! randint (lambda args
(cond ((null? (cdr args))
(if (< (car args) two31) (unif-rand (car args))
(floor (* (next-rand) (car args)))))
((< (car args) (cadr args))
(let ((span (- (cadr args) (car args))))
(+ (car args)
(if (< span two31) (unif-rand span)
(floor (* (next-rand) span))))))
(else (let ((span (- (car args) (cadr args))))
(- (car args)
(if (< span two31) (unif-rand span)
(floor (* (next-rand) span))))))))))

(define countries
'("Uruguay"
"Korea Republic"
"United States"
"Ghana"
"Netherlands"
"Slovakia"
"Brazil"
"Chile"
"Argentina"
"Mexico"
"Germany"
"England"
"Paraguay"
"Japan"
"Spain"
"Portugal"))

(define teams
(map (lambda (x) (list x (elo-ranking x))) countries))

(define (win-expectation t1 t2)
(/ (+ 1 (expt 10 (/ (- (cadr t2) (cadr t1)) 400)))))

(define (point-change t1 t2)
(* 60 (- 1 (win-expectation t1 t2))))

(define (match t1 t2)
(if ( (cdr x) (cdr y)))
(uniq-c string=? (sort string<? result)))
(loop (- n 1) (cons (tournament teams) result)))))

(display (simulate 5000))

Programming Praxis