World Cup Prognostication
June 29, 2010
On Saturday morning, inspired by Andrew Moylan’s article on Wolfram’s Blog, I sat down to work out a simulation of the knockout stage of the World Cup competition. I used the bracket shown at right, and found elo ratings of the sixteen teams, as of that morning, at Wikipedia:
1 BRA Brazil 2082
2 ESP Spain 2061
3 NED Netherlands 2045
4 ARG Argentina 1966
5 ENG England 1945
6 GER Germany 1930
7 URU Uruguay 1890
8 CHI Chile 1883
9 POR Portugal 1874
10 MEX Mexico 1873
15 USA United States 1785
19 PAR Paraguay 1771
25 KOR Korea 1746
26 JPN Japan 1744
32 GHA Ghana 1711
45 SVK Slovakia 1654
The table shows that there are forty-four national teams with ratings higher than Slovakia’s rating of 1654; they are lucky to be in the tournament.
The likelihood that a team will win its match can be computed from the elo rankings of the team and its opponent according to the formula . Thus, the United States had a 60.5% expectation of winning its match against Ghana this afternoon, and Ghana had a 39.5% expectation of defeating the United States. Harrumph!
Every time a match is played, the elo rating of a team changes. The amount of the change is based on the actual result as compared to the expected result. If a team wins when they have a high expectation of winning, their elo rating goes up by a small amount, since they were expected to win. However, if a team wins when they have a low expectation of winning, their elo rating goes up by a large amount. The formula is , where K is a weighting for the importance of the game (K is 60 for the World Cup), G is a parameter based on the goal differential (we’ll assume that all games are won by a single goal, so G = 1), W is 1 for a win and 0 for a loss, and We is the winning expectation calculated by the formula given above.
Your task is to use the data and formulas described above to simulate the knockout stage of the World Cup a million times and report the number of times each nation wins. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
[…] Praxis – World Cup Prognostication By Remco Niemeijer In today’s Programming Praxis exercise, the goal is to write a simulator for the knockout stage of the World […]
My Haskell solution (see http://bonsaicode.wordpress.com/2010/06/29/programming-praxis-world-cup-prognostication/ for a version with comments):
{-# LANGUAGE BangPatterns #-} import Control.Monad import qualified Data.List.Key as K import qualified Data.Map as M import System.Random teams :: [(String, Float)] teams = [ ("URU", 1890), ("KOR", 1746), ("USA", 1785), ("GHA", 1711) , ("NED", 2045), ("SVK", 1654), ("BRA", 2082), ("CHI", 1883) , ("ARG", 1966), ("MEX", 1873), ("GER", 1930), ("ENG", 1945) , ("PAR", 1771), ("JPN", 1744), ("ESP", 2061), ("POR", 1874)] winChance :: Float -> Float -> Float winChance eloA eloB = 1 / (1 + 10 ** ((eloB - eloA) / 400)) update :: Float -> Float -> Float update winner loser = winner + 60 * (1 - winChance winner loser) match :: Float -> (a, Float) -> (a, Float) -> (a, Float) match r (a, ea) (b, eb) | r < winChance ea eb = (a, update ea eb) | otherwise = (b, update eb ea) simround :: [(a, Float)] -> [Float] -> [(a, Float)] simround (a:b:xs) (r:rs) = match r a b : simround xs rs simround _ _ = [] tournament :: [(a, Float)] -> IO a tournament [(w,_)] = return w tournament xs = tournament . simround xs . randoms =<< newStdGen simulate :: Int -> IO () simulate n = print . K.sort (negate . snd) . M.assocs =<< foldM (\ !m _ -> fmap (\x -> M.adjust succ x m) $ tournament teams) (M.map (const 0) $ M.fromList teams) [1..n]Hm, my code doesn’t seem to want to show up. Phil, could you have a look to see if it ended up in the spam queue? The only other explanation I can think of is that the source code highlighter doesn’t like the pragma in my code.
Fixed. It did go to the spam queue. I can’t imagine why.
You wouldn’t happen to have the standard prelude in a file or on the codepad, would you?
It’s an awful lot of stuff, and copy-pasting it all together is a rough task.
Thanks!
Graham: Fixed. See the Contents page or the introduction at the top of the Standard Prelude.
Wonderful. Thanks very much!
I re-ran my simulation after the quarter finals. The new Elo rankings put Netherlands and Spain in a tie for first place with 2085 points, Brazil dropped from first to third with 2072 points, Germany in fourth with 2044 points, Argentina in fifth with 1940 points, and Uruguay in sixth with 1895 points; the United States drops to twenty-fifth with 1749 points. Germany is the big mover with an increase from 1930 points to 2044 points following two wins with big goal differentials against higher-ranked teams. The new
teamsvariable reflecting the semi-final bracket and the new Elo ratings is(("URU" 1895) ("NED" 2085) ("GER" 2044) ("ESP" 2085)), and the result of a million simulated tournaments is(("NED" . 378294) ("ESP" . 317881) ("GER" . 231524) ("URU" . 72301)). I have been quite impressed with Germany; they are a young team that is visibly improving with each half they play, and I wouldn’t be surprised to see them win the World Cup or, at least, defeat Spain then lose in the final. However, I am sticking to my prediction that the winner of the Brazil/Netherlands game, who we now know to be Netherlands, will win the tournament; Netherlands are a great side, and the numbers back me up.Phil,
Something I don’t quite understand is how you can infer the winner based on a random number (I noticed Remco does something along the same lines too). Are you basically saying the winner of a match is, in spite of the ELO rankings, the toss of a coin? If that’s not the case, are you using actual world cup results to aid in your computations?
Great problem statement by the way (decidedly apropos). Keep them coming!
A simple ruby version commented at http://steamcode.blogspot.com/2010/07/world-cup-simulation.html
class Team include Comparable attr_reader :symbol, :country, :initial_elo, :new_elo attr_accessor :world_cup_wins def initialize(symbol, country, initial_elo) @symbol = symbol @country = country @initial_elo = initial_elo @new_elo = initial_elo @world_cup_wins = 0 end def <=>(team_other) @world_cup_wins <=> team_other.world_cup_wins end def to_s "#{@symbol} #{@country} #{@initial_elo} Wins = #{@world_cup_wins}" end def winning_expectation(team_other) 1.0 / (1.0 + 10.0 ** ((team_other.new_elo - @new_elo) / 400.0)) end def game(team_other) w_e = winning_expectation(team_other) if rand <= w_e calc_elo(w_e, 1.0) team_other.calc_elo(w_e, 0.0) return true else calc_elo(w_e, 0.0) team_other.calc_elo(w_e, 1.0) return false end end def reset_elo @new_elo = initial_elo end def calc_elo(w_e, win_loss) @new_elo = @new_elo + 60.0 * 1.0 * (win_loss - w_e) end end initial_rankings = [ [ 7, "URU", "Uruguay", 1890], [25, "KOR", "Korea", 1746], [15, "USA", "United States", 1785], [32, "GHA", "Ghana", 1711], [ 3, "NED", "Netherlands", 2045], [45, "SVK", "Slovakia", 1654], [ 1, "BRA", "Brazil", 2082], [ 8, "CHI", "Chile", 1883], [ 4, "ARG", "Argentina", 1966], [10, "MEX", "Mexico", 1873], [ 6, "GER", "Germany", 1930], [ 5, "ENG", "England", 1945], [19, "PAR", "Paraguay", 1771], [26, "JPN", "Japan", 1744], [ 2, "ESP", "Spain", 2061], [ 9, "POR", "Portugal", 1874], ] teams = Array.new() initial_rankings.each do |r| teams << Team.new(r[1], r[2], r[3]) end 1.upto(1000000) do | sim | current_round = teams current_round.each { |t| t.reset_elo } while current_round.size > 1 do next_round = Array.new i = 0 while i < current_round.size next_round << (current_round[i].game(current_round[i+1]) ? current_round[i] : current_round[i+1]) i += 2 end current_round = next_round end current_round[0].world_cup_wins += 1 end teams.sort! teams.each do | t | puts "team: #{t} " endMine in F#
Another try..
open System open System.Collections.Generic let make_team (name : string) (rating : float) = fun x -> if (x = 0) then name else string rating let name_of (team : int -> string) = team 0 let rating_of (team : int -> string) = float (team 1) let win_prob (A : int -> string) (B : int -> string) = let elo_us = rating_of A let elo_them = rating_of B 1.0 / (1.0 + 10.0 ** ((elo_them - elo_us) / 400.0)) let new_rating (A : int -> string) (B : int -> string) = let win_p = win_prob A B fun x -> //A wins if (x = "W") then int ((rating_of A) + 60.0 * (1.0 - win_p)) else //A loses int ((rating_of A) + 60.0 * (0.0 - win_p)) let rand = new Random() let play_off (A: int -> string) (B : int -> string) = let win_p = win_prob A B let p = rand.NextDouble() //if A wins if (p <= win_p) then make_team (name_of A) (float (new_rating A B "W")) else //if B wins make_team (name_of B) (float (new_rating B A "W")) let print_team (A: int->string) = sprintf "%s - %f" (name_of A) (rating_of A) let rec sim (teams: (int ->string) list) = //printfn "%A" (List.map print_team teams) if (teams.Length = 2) then play_off teams.[0] teams.[1] else let mid = teams.Length / 2 play_off (sim (teams |> Seq.take mid |> Seq.toList)) (sim (teams |> Seq.skip mid |> Seq.take mid |> Seq.toList)) let bra = make_team "BRA" 2082.0 let esp = make_team "ESP" 2061.0 let ned = make_team "NED" 2045.0 let arg = make_team "ARG" 1966.0 let eng = make_team "ENG" 1945.0 let ger = make_team "GER" 1930.0 let uru = make_team "URU" 1890.0 let chi = make_team "CHI" 1883.0 let por = make_team "POR" 1874.0 let mex = make_team "MEX" 1873.0 let usa = make_team "USA" 1785.0 let par = make_team "PRA" 1771.0 let kor = make_team "KOR" 1746.0 let jpn = make_team "JPN" 1744.0 let gha = make_team "GHA" 1711.0 let svk = make_team "SVK" 1654.0 let teams = [uru;kor; usa;gha; ned;svk; bra;chi; arg;mex; ger;eng; par;jpn; esp;por] let simulation = let table = new Dictionary<string, int>() for i in [1 .. 1000000] do let winner = sim teams if table.ContainsKey(name_of winner) then table.[(name_of winner)] <- table.[(name_of winner)] + 1 else table.Add(name_of winner, 1) tableWell, this is a bit late, but here’s mine, in Python.
Should probably be updated to account for the rankins as of today and the fixtures in the semis. Spain is the predicted winner.
import copy import random import operator def simulate_match(ranks, team1, team2): team1_odds = 1.0 / (1 + 10 ** ((ranks[team2] - ranks[team1]) / 400.0)) winner = team1 if random.random() < team1_odds else team2 winner_odds = team1_odds if winner == team1 else 1.0 - team1_odds loser = team1 if winner == team2 else team2 ranks[winner] += 60 * (1 - winner_odds) ranks[loser] -= 60 * (1 - winner_odds) return winner def simulate_round(ranks, matchlist): newlist = [] i = 0 while i < len(matchlist) - 1: newlist.append(simulate_match(ranks, matchlist[i], matchlist[i+1])) i += 2 return newlist def simulate_cup(ranks, matchlist): i = 0 while len(matchlist) != 1: matchlist = simulate_round(ranks, matchlist) return matchlist[0] def main(): cup_winners = {} for i in range(1000000): init_ranks = {"brazil" : 2082, "spain" : 2061, "netherlands" : 2045, "argentina" : 1966, "england" : 1945, "germany" : 1930, "uruguay" : 1890, "chile" : 1883, "portugal" : 1874, "mexico" : 1873, "united states" : 1875, "paraguay" : 1771, "korea" : 1746, "japan" : 1744, "ghana" : 1711, "slovakia" : 1654} init_matches = ["uruguay", "korea", "united states", "ghana", "netherlands", "slovakia", "brazil", "chile", "argentina", "mexico", "germany", "england", "paraguay", "japan", "spain", "portugal"] cup_winner = simulate_cup(init_ranks, init_matches) if cup_winner in cup_winners: cup_winners[cup_winner] += 1 else: cup_winners[cup_winner] = 1 sorted_winners = sorted(cup_winners.iteritems(), key=operator.itemgetter(1), reverse=True) for (k, v) in sorted_winners: print k.title(), v if __name__ == '__main__': main()And the results:
Spain 219542
Brazil 216734
Netherlands 185961
Argentina 78825
England 59688
Germany 50021
Uruguay 48066
United States 41076
Portugal 27293
Mexico 24100
Chile 22260
Paraguay 9498
Korea 6417
Japan 6012
Ghana 3791
Slovakia 716
with online elo ranking in drracket
#lang racket
(require net/url)
(define (elo-ranking country)
(let* ((in (get-pure-port (string->url “http://en.wikipedia.org/wiki/World_Football_Elo_Ratings”)))
(the-line(do ((the-line (read-line in) (read-line in)))
((regexp-match (string-append “>” country “number (car (regexp-match “[0-9]+” value)))))
(define sort #f)
(define merge #f)
(let ()
(define dosort
(lambda (pred? ls n)
(if (= n 1)
(list (car ls))
(let ((i (quotient n 2)))
(domerge pred?
(dosort pred? ls i)
(dosort pred? (list-tail ls i) (- n i)))))))
(define domerge
(lambda (pred? l1 l2)
(cond
((null? l1) l2)
((null? l2) l1)
((pred? (car l2) (car l1))
(cons (car l2) (domerge pred? l1 (cdr l2))))
(else (cons (car l1) (domerge pred? (cdr l1) l2))))))
(set! sort
(lambda (pred? l)
(if (null? l) l (dosort pred? l (length l)))))
(set! merge
(lambda (pred? l1 l2)
(domerge pred? l1 l2))))
(define (uniq-c eql? xs)
(if (null? xs) xs
(let loop ((xs (cdr xs)) (prev (car xs)) (k 1) (result ‘()))
(cond ((null? xs) (reverse (cons (cons prev k) result)))
((eql? (car xs) prev) (loop (cdr xs) prev (+ k 1) result))
(else (loop (cdr xs) (car xs) 1 (cons (cons prev k) result)))))))
(define rand #f)
(define randint #f)
(let ((two31 #x80000000) (a (make-vector 56 -1)) (fptr #f))
(define (mod-diff x y) (modulo (- x y) two31)) ; generic version
; (define (mod-diff x y) (logand (- x y) #x7FFFFFFF)) ; fast version
(define (flip-cycle)
(do ((ii 1 (+ ii 1)) (jj 32 (+ jj 1))) ((< 55 jj))
(vector-set! a ii (mod-diff (vector-ref a ii) (vector-ref a jj))))
(do ((ii 25 (+ ii 1)) (jj 1 (+ jj 1))) ((< 55 ii))
(vector-set! a ii (mod-diff (vector-ref a ii) (vector-ref a jj))))
(set! fptr 54) (vector-ref a 55))
(define (init-rand seed)
(let* ((seed (mod-diff seed 0)) (prev seed) (next 1))
(vector-set! a 55 prev)
(do ((i 21 (modulo (+ i 21) 55))) ((zero? i))
(vector-set! a i next) (set! next (mod-diff prev next))
(set! seed (+ (quotient seed 2) (if (odd? seed) #x40000000 0)))
(set! next (mod-diff next seed)) (set! prev (vector-ref a i)))
(flip-cycle) (flip-cycle) (flip-cycle) (flip-cycle) (flip-cycle)))
(define (next-rand)
(if (negative? (vector-ref a fptr)) (flip-cycle)
(let ((next (vector-ref a fptr))) (set! fptr (- fptr 1)) next)))
(define (unif-rand m)
(let ((t (- two31 (modulo two31 m))))
(let loop ((r (next-rand)))
(if (list a)))
((eq? (car seed) ‘set) (set! fptr (caadr seed))
(set! a (list->vector (cdadr seed))))
(else (/ (init-rand (modulo (numerator
(inexact->exact (car seed))) two31)) two31)))))
(set! randint (lambda args
(cond ((null? (cdr args))
(if (< (car args) two31) (unif-rand (car args))
(floor (* (next-rand) (car args)))))
((< (car args) (cadr args))
(let ((span (- (cadr args) (car args))))
(+ (car args)
(if (< span two31) (unif-rand span)
(floor (* (next-rand) span))))))
(else (let ((span (- (car args) (cadr args))))
(- (car args)
(if (< span two31) (unif-rand span)
(floor (* (next-rand) span))))))))))
(define countries
'("Uruguay"
"Korea Republic"
"United States"
"Ghana"
"Netherlands"
"Slovakia"
"Brazil"
"Chile"
"Argentina"
"Mexico"
"Germany"
"England"
"Paraguay"
"Japan"
"Spain"
"Portugal"))
(define teams
(map (lambda (x) (list x (elo-ranking x))) countries))
(define (win-expectation t1 t2)
(/ (+ 1 (expt 10 (/ (- (cadr t2) (cadr t1)) 400)))))
(define (point-change t1 t2)
(* 60 (- 1 (win-expectation t1 t2))))
(define (match t1 t2)
(if ( (cdr x) (cdr y)))
(uniq-c string=? (sort string<? result)))
(loop (- n 1) (cons (tournament teams) result)))))
(display (simulate 5000))