## Sum Distinct Items

### August 20, 2019

There are two obvious answers to this task, one using sorting and the other using searching, though to many people one of the answers will be obvious and the other will be obscure. Sorting first:

(define (sum-distinct xs) (let loop ((xs (sort < xs)) (sum 0)) (cond ((null? xs) sum) ((and (pair? (cdr xs)) (= (car xs) (cadr xs))) (loop (drop-while (lambda (x) (= x (car xs))) xs) sum)) (else (loop (cdr xs) (+ sum (car xs))))))) > (sum-distinct '(4 2 3 1 7 4 2 7 1 7 5)) 8

Searching seems more natural to me, though your taste may vary. We use lists to track *once* and *more*, but if the input is large it would be faster to use hash tables or some sort of balanced tree:

(define (sum-distinct xs) (let loop ((xs xs) (once (list)) (more (list)) (sum 0)) (cond ((null? xs) sum) ((member (car xs) more) (loop (cdr xs) once more sum)) ((member (car xs) once) (loop (cdr xs) once (cons (car xs) more) (- sum (car xs)))) (else (loop (cdr xs) (cons (car xs) once) more (+ sum (car xs))))))) > (sum-distinct '(4 2 3 1 7 4 2 7 1 7 5)) 8

Every integer is added to the sum the first time it is seen. If it is seen a second time, it is subtracted from the sum (since it was improperly added the first time). Additional appearances after the second are ignored.

Note the time-for-space trade-off here. The sorting solution is O(*n* log *n*). The searching solution is O(*n*), assuming we use hash tables with O(1) search-time to store the *once* and *more* sets, but requires space to store the sets.

This makes an interesting interview question. Watch the candidate subconsciously select either the sorting solution or the searching solution. Challenge the candidate to find the other solution. Then have the candidate talk about the trade-offs between the two solutions.

You can run the program at https://ideone.com/8AKBAY.

In Python it is easy using a Counter.

Here is a solution in standard R7RS Scheme with hash-tables of SRFI

69. It runs in linear time (modulo usual hashing caveats).

Output:

Klong version

Rust generic version, no hash table required:

Playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=ace1791e4d6532b0bb39c4d5f4b3eb99

Rust generic version, no hash table required:

Rust generic version, no hash table required:

Playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=ace1791e4d6532b0bb39c4d5f4b3eb99

Paul beat me to it:

Here’s a Haskell version. We’ll say the empty list sums to 0.

Here’s some C++, using a hash table (‘unordered_map’) to store element counts, and a sorting solution, both use the “increment on first occurrence, decrement on second, subsequently ignore” approach, though for the sorting solution it’s probably better to just count the elements, as in Bill’s solution above.

Interestingly, for a million items, the sorting method is significantly faster than the hash table method (about 160ms, vs. 500ms or so).

Here is my take on this, without using any auxiliary functions from packages, in Julia 1.1:

function main(x::Array{Int64, 1})

s = 0

n = length(x)

end

Cheers!

Here’s a solution in C.

Output:

My last solution should work, but the load factor of the hash table could reach 1. This was from mistakenly using the wrong variable on lines 10 and 13. Here’s the fixed solution, which limits the load factor to 0.7.

Java version: