Abbreviated Sentences
April 28, 2017
The task doesn’t specify what to do with words that are one or two letters long; we arbitrarily decide to pass them through unchanged. Our program considers the input a character at a time, writing output every time it sees a non-letter:
(define (abbrev sentence)
  (with-output-to-string
    (lambda ()
      (define (word head len prev)
        (display head)
        (when (positive? len)
          (display (number->string len)))
        (when prev (display prev)))
      (let loop ((cs (string->list sentence))
                 (head #f) (len -1) (prev #f))
        (cond ((null? cs) ; end of sentence
                (when head (word head len prev)))
              ((char-alphabetic? (car cs)) ; in a word
                (if head
                    (loop (cdr cs) head (+ len 1) (car cs))
                    (loop (cdr cs) (car cs) -1 #f)))
              (else ; not in a word
                (when head (word head len prev))
                (display (car cs))
                (loop (cdr cs) #f 0 #f)))))))
Here’s an example, with one-letter and two-letter words and a word with an embedded non-letter:
> (abbrev "A is one; Programming Praxis hello-goodbye.")
"A is o1e; P9g P4s h3o-g5e."
You can run the program at http://ideone.com/P6UgbQ.
This is the sort of thing Perl is great for… use a “regular expression” with a replace function…
Python has a more expressive third-party regex library that can be installed and used instead of the standard re module. One thing it offers is named character classes. (I couldn’t find them in the interactive help text, but they seem to be there.)
Can you give an example of when there are digits in the word? And one where there is a digit at the beginning and the end?
@bookofstevegraham: Words are maximal sequences of letters. Digits are not part of words. So a string like 1Texas2Step3 is abbreviated as 1, followed by T3s for Texas, followed by 2, followed by S2p for Step, followed by 3; the full abbreviation is 1T3s2S2p3.
If you have questions about how a program works in a particular situation, you can always go to ideone.com, fork the recommended solution, and plug in your own data, like this: http://ideone.com/QETUMI.
Just what I needed. Thanks.
Caché (a version of MUMPS)
abbrsent(str) ;New routine
 ; write str, abbreviating each run of three or more letters
 n buffer,char,i,status
 s (buffer,status)=""
 w !!
 f i=1:1:$l(str) d
 . s char=$e(str,i)
 . i char?1a d
 . . i status="char" d
 . . . s buffer=buffer_char
 . . e d
 . . . s status="char",buffer=char
 . e d
 . . i status="char" d
 . . . d abbr(buffer)
 . . . s (buffer,status)=""
 . . w char
 i buffer]"" d abbr(buffer)
 q
 ;
abbr(str) ; write one word: first letter, interior count, last letter
 i $l(str)<3 w str
 e w $e(str)_($l(str)-2)_$e($reverse(str))
 q
===
d ^abbrsent("hello")
h3o
—
d ^abbrsent("Programming Praxis")
P9g P4s
—
d ^abbrsent("12Progra3m4ming5 6Praxi7s89")
12P4a3m4m2g5 6P3i7s89
—
d ^abbrsent("12Progra3m4mi5 6Praxi7s89")
12P4a3m4mi5 6P3i7s89
We now have regex in the C++ standard library, but it still seems much harder than it should be to do some simple things.
Refactored it to create a reusable gsub function.
Using C11:
#include <ctype.h>
#include <iso646.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    size_t word_len = 0;    /* letters seen so far in the current word */
    int current = getchar();
    int next = getchar();   /* one character of lookahead */
    while (current != EOF) {
        if (isalpha(current)) {
            ++word_len;
            bool word_begin = word_len == 1;
            bool word_end = not isalpha(next);
            if (word_begin) {
                putchar(current);   /* always keep the first letter */
            }
            if (word_end) {
                if (word_len > 2) {
                    printf("%zu", word_len - 2);    /* interior letters */
                }
                if (word_len > 1) {
                    putchar(current);   /* last letter, unless also first */
                }
                word_len = 0;   /* reset even for one-letter words */
            }
        } else {
            putchar(current);
        }
        current = next;
        next = getchar();
    }
    if (ferror(stdin)) {
        fprintf(stderr, "fatal error while reading stdin.\n");
        exit(1);
    }
    exit(0);
}