Double Double Words
October 13, 2015
The biggest problem with today’s task is working out the specification. In some cases, it would be very, very bad to notify the user of doubled words, as in this case, where the words are obviously intentionally doubled. One idea is to be strict about the doubling by including attached punctuation in the comparison, but that can backfire, so we decide to simply report all doubled words and let the user sort out any that are intentional.
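To make the trade-off concrete, here is a small Python sketch (not part of the Scheme solution; the function names and the sample sentence are made up for illustration). The strict comparison keeps attached punctuation, so it skips the intentional "very, very" but also misses an accidental double at the end of a sentence; the loose comparison strips punctuation and reports everything.

```python
import string

def strict_doubles(text):
    """Compare raw whitespace-separated tokens, punctuation included."""
    tokens = text.split()
    return [b for a, b in zip(tokens, tokens[1:]) if a == b]

def loose_doubles(text):
    """Strip leading and trailing punctuation before comparing tokens."""
    tokens = [t.strip(string.punctuation) for t in text.split()]
    return [b for a, b in zip(tokens, tokens[1:]) if a and a == b]

# Hypothetical sample: one intentional double and two accidental ones.
sample = "It is very, very bad to write the the same word word."

print(strict_doubles(sample))  # ['the']
print(loose_doubles(sample))   # ['very', 'the', 'word']
```

The strict version never sees "word word." because the trailing period makes the two tokens differ, which is the kind of backfire that argues for reporting everything.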
The other problem with today’s task is checking that a word that ends one line is not doubled at the beginning of the next line. The simplest solution here is to ignore line breaks and look only at words. But for reporting, it would be nice to report the line number where the doubling occurs. So we’ll keep track of line numbers as we read in the text.
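The idea of ignoring line breaks while still remembering line numbers can be sketched in Python as a generator that tags each word with the line it came from. This is only an illustration under our own naming (numbered_words is hypothetical), not the Scheme code that follows.

```python
import io

def numbered_words(stream):
    """Yield (line_number, word) pairs, treating the input as one word stream."""
    for number, line in enumerate(stream, start=1):
        for word in line.split():
            yield number, word

# Hypothetical input with a blank third line.
text = io.StringIO("one two\nthree\n\nfour five\n")
print(list(numbered_words(text)))
# [(1, 'one'), (1, 'two'), (2, 'three'), (4, 'four'), (4, 'five')]
```

A consumer of this stream only ever sees consecutive words, so a pair that straddles a line break needs no special handling.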
Our solution uses global variables to track the current line and line number and a function that gets words from the input in order, updating the current line number as it goes. Here’s the function that gets words:
(define (read-word)
  (if (pair? line)
      ; words remain on the current line: return the next one
      (let ((word (car line)))
        (set! line (cdr line))
        word)
      ; current line is exhausted: fetch the next line of input
      (let ((input (read-line)))
        (if (eof-object? input)
            input
            (begin
              (set! line (string-split #\space (cleanup input)))
              (set! number (+ number 1))
              (read-word))))))
It calls an auxiliary function to remove punctuation:
(define (cleanup str)
  (let loop ((cs (string->list str)) (zs (list)))
    (cond ((null? cs) (list->string (reverse zs)))
          ((char-alphabetic? (car cs))
            (loop (cdr cs) (cons (car cs) zs)))
          ((char-whitespace? (car cs))
            (loop (cdr cs) (cons #\space zs)))
          (else (loop (cdr cs) zs)))))
The global variables are initially #f and are initialized in the driver function:
(define line #f)
(define number #f)
And here’s the driver function that reads words, reports doubles, and controls processing; notice that we ignore case in the string comparison:
(define (double file-name)
  (with-input-from-file file-name
    (lambda ()
      (set! line "")
      (set! number 0)
      (let loop ((prev "") (word (read-word)))
        (when (not (eof-object? word))
          (when (string-ci=? prev word)
            (display number) (display " ")
            (display word) (newline))
          (loop word (read-word)))))))
Called on a file that contains the three lines
This is a very, very good good example of doubled doubled words.
the function returns three instances of doubled words:
> (double "sample")
1 very
2 good
3 doubled
Note that in the case where the word pair spans two lines, it is the second line number that is reported.
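For readers who do not use Scheme, the whole pass can be approximated in Python as below. This is a rough sketch rather than a translation: find_doubles and the three sample lines are our own inventions, the character filter imitates the cleanup step (keep letters, keep whitespace, drop everything else), and the comparison ignores case as string-ci=? does.

```python
def find_doubles(lines):
    """Return (line_number, word) for each word that repeats its predecessor."""
    found = []
    prev = None
    for number, line in enumerate(lines, start=1):
        # Keep letters and whitespace only, then split into words.
        cleaned = "".join(c for c in line if c.isalpha() or c.isspace())
        for word in cleaned.split():
            if prev is not None and word.lower() == prev.lower():
                # The pair is reported on the line of its second word.
                found.append((number, word))
            prev = word
    return found

# Hypothetical input: one double inside a line, two that span a line break.
sample = ["This is is fine,", "fine and The", "the end."]
print(find_doubles(sample))
# [(1, 'is'), (2, 'fine'), (3, 'the')]
```

Because prev survives from one line to the next, the pairs "fine, / fine" and "The / the" are caught and attributed to the second line of each pair.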
We used read-line and string-split from the Standard Prelude. You can run the program at http://ideone.com/idcTOo, where it has been modified slightly to read from standard input instead of a filename.
Perfect example where perl rocks!
Python
Alternate Perl solution:
Here’s my Python version:
It uses fileinput from the standard library to handle opening and closing the files given on the command line; fileinput also keeps track of the name of the current file and the line number. Regexes are used to find the words in a line.
If a repeated word is found, the program prints the word, the line number(s), and a portion of the line(s) surrounding the repeated word for context.
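The commenter's code did not survive in this copy, so the following is only a guess at what such a program might look like, built from the description above. The name report_doubles, the word pattern, the context width, and the report format are all assumptions; the fileinput calls (input, isfirstline, filelineno, filename) are from the standard library.

```python
import fileinput
import re

WORD = re.compile(r"[A-Za-z']+")  # assumed definition of a "word"

def report_doubles(paths, context=15):
    """Return one report string per repeated word found in the given files."""
    reports = []
    prev_word, prev_line = None, None
    with fileinput.input(files=paths) as stream:
        for line in stream:
            if fileinput.isfirstline():
                prev_word = None  # do not pair words across files
            here = fileinput.filelineno()
            for match in WORD.finditer(line):
                word = match.group()
                if prev_word is not None and word.lower() == prev_word.lower():
                    # Show a little of the line around the second word.
                    start = max(match.start() - context, 0)
                    snippet = line[start:match.end() + context].strip()
                    span = str(here) if prev_line == here else f"{prev_line}-{here}"
                    reports.append(
                        f"{fileinput.filename()}:{span}: {word!r} in ...{snippet}..."
                    )
                prev_word, prev_line = word, here
    return reports
```

From a script, one could print each string returned by report_doubles(sys.argv[1:]); note that fileinput falls back to standard input when the list of paths is empty.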
Example output:
// Requires: using System.IO; using System.Windows.Forms;
string content = File.ReadAllText("file.txt");
string[] words = content.Split(' ', '\t', '\n');
string output = "";
for (int i = 0; i < words.Length - 1; i++)
{
    // Compare each word with the one that follows it.
    if (words[i] == words[i + 1])
    {
        output += "Word Index: " + i.ToString() + ", " +
                  "Word: " + words[i] + "\n";
    }
}
MessageBox.Show(output);