January 24, 2017
Although the problem didn’t say so, we will assume that the input file consists of one word per line; if that’s not right, you can preprocess the input to make it right.
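One way to do that preprocessing in a Unix shell is to let tr squeeze runs of whitespace into newlines; here infile stands in for whatever your actual input file is called:

```shell
# put each whitespace-separated word on its own line;
# -s squeezes runs of whitespace into a single newline
tr -s '[:space:]' '\n' < infile
```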
Unless there is some reason to do something different, I would call on the standard Unix sort utility to solve this problem; the -u flag to sort specifies unique output:
$ sort -u infile
If that doesn’t work, the solution depends on the exact problem and on the capabilities of your computing environment. One answer is to write your own program that mimics the operation of sort -u. How you do that depends strongly on your computing environment (language, compiler, and operating system), so we won’t try to tackle it here.
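That said, in any Unix-like environment one classic sketch of sort -u is to pipe sort into uniq, since uniq removes duplicates only when they are adjacent and sorting brings them together:

```shell
# equivalent to `sort -u infile`: sort groups the
# duplicates, uniq drops the adjacent repeats
sort infile | uniq
```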
If the number of distinct words in the input will fit into memory, you could use a set data structure, perhaps implementing the set with an underlying balanced binary tree to provide the final output in sorted order. Here’s a program that uses the AVL trees of the Standard Prelude to solve the problem:
(define (sort-unique infile)
  (with-input-from-file infile
    (lambda ()
      (let ((words (make-dict string<?)))
        (do ((word (read-line) (read-line)))
            ((eof-object? word)
             (for-each
               (lambda (word) (display word) (newline))
               (map car (words 'to-list))))
          (set! words
            (words 'update (lambda (k v) (+ v 1)) word 1)))))))
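If Scheme and the Standard Prelude aren’t available, the same in-memory idea can be sketched with awk, substituting a hash table for the AVL tree and handing the final ordering off to sort (again, infile is a placeholder for your input file):

```shell
# seen[] is an in-memory hash of words already printed;
# !seen[$0]++ is true only the first time a word appears
awk '!seen[$0]++' infile | sort
```

Because the hash keeps only distinct words, this has the same memory requirement as the set-based program: the number of distinct words, not the total number of words.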
Given that there are fewer than a quarter of a million words in the English language, and most documents use only a fraction of them, this approach will likely work in the majority of cases. You can run the program at http://ideone.com/ogxnFq.