Top Five Test Scores
July 3, 2018
We begin by creating a data set:
(define scores '((Bob 78) (Sue 75) (Bill 83) (Jill 88) (Joe 58) (Bob 93) (Sue 84) (Bill 81) (Jill 91) (Joe 75) (Bob 62) (Sue 93) (Bill 91) (Jill 99) (Joe 62) (Bob 92) (Sue 88) (Bill 88) (Jill 76) (Joe 61) (Bob 87) (Sue 92) (Bill 87) (Jill 85) (Joe 69) (Bob 85) (Sue 87) (Bill 93) (Jill 96) (Joe 65) (Bob 72) (Sue 88) (Bill 85) (Jill 83) (Joe 71)))
Our algorithm is to collect all the scores in an R6RS hash table keyed by student name with a list of scores for each student, then for each student sort the scores, take the top five, and report the result:
(define (final-score scores) (let ((ht (make-eq-hashtable))) (do ((scores scores (cdr scores))) ((null? scores) (call-with-values (lambda () (hashtable-entries ht)) (lambda (names score-lists) (map (lambda (name score-list) (cons name (/ (sum (take 5 (sort > score-list))) 5.0))) (vector->list names) (vector->list score-lists))))) (hashtable-update! ht (caar scores) (lambda (score) (cons (cadar scores) score)) '()))))
There are other ways to do that. You could use an association list instead of a hash table, or some kind of ordered tree. Instead of collecting all the scores in a list, you could collect the scores in a priority queue of fixed length, thus storing only five scores at any time. Or, since the number of stores test scores is so small, you could keep them in a sorted array and shift the array items as new scores arrive. And there are probably dozens of other alternatives, which I’m sure all of you clever folks will discuss in the comments. Here’s the output:
> (final-score scores) ((Bill . 88.8) (Bob . 87.0) (Jill . 91.8) (Sue . 89.6) (Joe . 68.4))
You can run the program at https://ideone.com/3a9opt.
In Python, using a heapq for keeping the best 5 scores on the go. Should be more efficient than sorting the all results per student afterwards.
Also works for students with less than 5 scores (student3 example), although that might not be really fair :)
Here is solution that I believe is asymptotically optimal O(nlog(k)),
with the usual hashing caveat. It uses R7RS Scheme, hash tables from
SRFI 69, and priority queues from SLIB. We could also replace the
hash table with a balanced-tree data structure to get guaranteed
O(nlog(n)) and output sorted by names if needed.
@programmingpraxis: In Scheme, numbers should be compared with eqv?, not eq?.
Here’s a solution in C++11.
Output:
Probably my longest Perl ever here – we need to quickly implement an ordered queue to keep track of the top 5 scores for each candidate.
I think I prefer to sort by score descending – so we will have one sort here!
A Haskell version.