Double Double Words

October 13, 2015

Today’s task is to write a program that reads a file and reports any doubled words, such as “the the”. It is a useful tool for anyone who does a lot of writing, as I do on this blog.

Your task is to write a program to find doubled words. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
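
One way to approach the task, offered as a minimal sketch rather than the suggested solution: remember the last word seen, carry it across line breaks, and report a hit whenever the next word matches it. The function name `find_doubled_words` and the `WORD` pattern (runs of letters and apostrophes, compared case-insensitively) are assumptions made for illustration.

```python
import re

# Assumption: a "word" is a run of letters or apostrophes.
WORD = re.compile(r"[A-Za-z']+")

def find_doubled_words(lines):
    """Yield (line_number, word) for each word that repeats the word before it."""
    previous = None  # carried across lines so doubles split by a line break are caught
    for line_number, line in enumerate(lines, start=1):
        for match in WORD.finditer(line):
            word = match.group().lower()
            if word == previous:
                yield line_number, word
            previous = word

sample = ["Once upon a", "a time there was was", "a test."]
for line_number, word in find_doubled_words(sample):
    print(f"{line_number:4d} {word}")
# prints:
#    2 a
#    2 was
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the sample list.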

Posted by programmingpraxis

Filed in Exercises


6 Responses to “Double Double Words”

James Curtis-Smith said

October 13, 2015 at 9:11 AM

Perfect example where perl rocks!

while(<>){$l++;foreach(split/\W+/,lc$_){printf "%4d %s\n",$l,$_ if$x eq$_;$x=$_;}}

Rutger said

October 13, 2015 at 2:16 PM

Python

from collections import Counter
import re

text = """   Assassin beef noodles savant human chrome order-flow 
lights neural physical render-farm post-stimulate fluidity skyscraper 
8-bit. Free-market physical vinyl towards nano-Tokyo sign render-farm. 
Decay digital katana disposable apophenia modem dissident narrative. 
Soul-delay euro-pop vinyl pre-ablative market bridge sunglasses dead 
youtube hotdog rebar claymore mine. """

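# Note: this counts every word that occurs more than once anywhere in the text,
# not only words repeated back to back.
c = Counter(word for line in text.splitlines() for word in re.sub(r"[^\w]", " ", line).split())
print([word for word in c if c[word] > 1])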
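
A Counter reports any word that occurs more than once anywhere in the text. To flag only back-to-back repeats, each word can be compared with its neighbor instead. The following is a rough sketch under that reading of the exercise; `adjacent_repeats` is a hypothetical helper name, and it ignores line numbers.

```python
import re

def adjacent_repeats(text):
    """Return each word that is immediately followed by the same word (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    # Pair every word with its successor and keep the pairs that match.
    return [a for a, b in zip(words, words[1:]) if a == b]

print(adjacent_repeats("Decay digital digital katana the The end"))
# ['digital', 'the']
```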

mcmillhj said

October 13, 2015 at 2:19 PM

Alternate Perl solution:

use strict; 
use warnings; 

my $text = do {
   local $/ = undef;
   <>;
};

my $line_no = 1;
while ( my ($w1,$sep,$w2) = $text =~ m/(\w+)(\W+)(\w+)/ ) {
   $text =~ s/$w1\W+//;
   $line_no++ if $sep eq "\n";
   printf "%04d %s\n", $line_no, $w1 if $w1 eq $w2;
}

Mike said

October 14, 2015 at 5:45 PM

Here’s my Python version:

It uses fileinput from the standard library to handle opening and closing the files given on the command line. fileinput also keeps track of the name of the current file and the line number. A regular expression finds the words in each line.

If a repeated word is found, the program prints the word, the line number(s), and a portion of the line(s) surrounding the repeated word for context.

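import fileinput
import re

# Assumed word pattern: the original comment did not include the definition of pat.
pat = re.compile(r"\w+")

with fileinput.input() as f: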
    for line in f:
        line = line.rstrip()

        if fileinput.isfirstline():
            print(fileinput.filename())
            prevline = ''
            prevword = None

        firstword = True
        for match in pat.finditer(line):
            word = match.group().lower()
            if word == prevword:
                b, e = match.span()
                lineno = fileinput.filelineno()
                fmt = "\t'{}' at {}: ...{}..."
                if firstword:
                    context = prevline[-15:] + ' ' + line[:e+10]
                    where = "lines {}-{}".format(lineno-1, lineno)
                else:
                    context = line[b-15:e+10]
                    where = "line {}".format(lineno)

                print(fmt.format(word, where, context))

            prevword = word
            firstword = False

        prevline = line

Example output:

C:/projects/testdata.txt
	'a' at lines 2-3: ...upon a a time. The...
	'of' at lines 4-5: ...was a test of of the emerg...
	'if' at lines 6-7: ...cast system. If if there had...
	'been' at line 7: ...there had been been...

maroonedsia said

November 13, 2015 at 8:15 PM

string content = File.ReadAllText("file.txt");
// Split on spaces, tabs, and newlines.
string[] words = content.Split(' ', '\t', '\n');
string output = "";

// Compare each word with the one that follows it.
for (int i = 0; i < words.Length - 1; i++)
{
    if (words[i] == words[i + 1])
    {
        output += "Word Index: " + i.ToString() + ", "
                + "Word: " + words[i] + "\n";
    }
}

MessageBox.Show(output);

maroonedsia said

November 13, 2015 at 8:17 PM

1. I don’t know how to format the text as code.
2. For some reason, the “+” was removed from some lines; for example, the correct code is: if (words[i] == words[i+1])
3. Why can’t I edit my comment to correct it?! :D

