Abbreviated Sentences

April 28, 2017

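The task doesn’t specify what to do with words that are one or two letters long; we arbitrarily decide to pass them through unchanged. Our program scans the input one character at a time. It remembers the first letter of the current word, a running count of its interior letters, and the most recent letter, and it writes the abbreviated word whenever it reaches a non-letter or the end of the input: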

```
(define (abbrev sentence)
  (with-output-to-string (lambda ()
    ; write one word: first letter, interior count (if positive), last letter
    (define (word head len prev)
      (display head)
      (when (positive? len) (display (number->string len)))
      (when prev (display prev)))
    ; head: first letter of the current word, or #f between words
    ; len: letters strictly between head and prev (-1 until a second letter)
    ; prev: most recent letter of the current word, or #f
    (let loop ((cs (string->list sentence))
               (head #f) (len -1) (prev #f))
      (cond ((null? cs) ; end of sentence
              (when head (word head len prev)))
            ((char-alphabetic? (car cs)) ; in a word
              (if head
                  (loop (cdr cs) head (+ len 1) (car cs))
                  (loop (cdr cs) (car cs) -1 #f)))
            (else ; not in a word
              (when head (word head len prev))
              (display (car cs))
              (loop (cdr cs) #f -1 #f)))))))
```

Here’s an example, with one-letter and two-letter words and a word with an embedded non-letter:

> (abbrev "A is one; Programming Praxis hello-goodbye.")
"A is o1e; P9g P4s h3o-g5e."

You can run the program at http://ideone.com/P6UgbQ.
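For readers more at home in Python, here is a rough translation of the same single-pass scan. It is a sketch, not the author's code: the name abbrev simply mirrors the Scheme function, and str.isalpha() stands in for char-alphabetic?, so a few non-ASCII characters may be classified differently.

```python
def abbrev(sentence: str) -> str:
    """Abbreviate each word of three or more letters as first letter,
    count of interior letters, last letter; shorter words pass through."""
    out = []
    head = None   # first letter of the current word, or None between words
    count = -1    # letters strictly between head and last (-1 until a 2nd letter)
    last = None   # most recent letter of the current word after head

    def flush():
        # Write the current word: head, interior count if positive, last letter.
        out.append(head)
        if count > 0:
            out.append(str(count))
        if last is not None:
            out.append(last)

    for ch in sentence:
        if ch.isalpha():
            if head is None:
                head, count, last = ch, -1, None   # start a new word
            else:
                count, last = count + 1, ch        # extend the current word
        else:
            if head is not None:
                flush()                            # a non-letter ends the word
            out.append(ch)
            head = None
    if head is not None:                           # word at end of input
        flush()
    return "".join(out)


print(abbrev("A is one; Programming Praxis hello-goodbye."))
# prints: A is o1e; P9g P4s h3o-g5e.
```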

Posted by programmingpraxis

Filed in Exercises


9 Responses to “Abbreviated Sentences”

James Curtis-Smith said
April 28, 2017 at 11:38 AM
This is the sort of thing Perl is great for: use a regular expression whose replacement part is evaluated as Perl code (the /e flag).
```
print s{([[:alpha:]])([[:alpha:]]+)([[:alpha:]])}{$1.length($2).$3}ger while <>
```
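A note on word-boundary anchors for this task: since a word is a maximal run of letters, leftmost greedy matching already spans each whole run, so \b adds nothing, and it can get in the way. In both Perl and Python, \b sits between a word character (letter, digit, or underscore) and a non-word character, so a run of letters touching a digit, as in the 1Texas2Step3 example discussed further down, has no \b on either side. Below is a small Python check using only the standard re module; the class [^\W\d_] ("word characters other than digits and underscore") is an assumed stand-in for [[:alpha:]], which re does not support, and the names are illustrative.

```python
import re

# Assumption: [^\W\d_] approximates "a letter" (a word character that is
# neither a digit nor an underscore); re has no [[:alpha:]] class.
LETTER = r"[^\W\d_]"
ANCHORED = re.compile(rf"\b({LETTER})({LETTER}+)({LETTER})\b")
UNANCHORED = re.compile(rf"({LETTER})({LETTER}+)({LETTER})")


def abbreviate(pattern: re.Pattern, text: str) -> str:
    # Replace each match with: first letter, length of the middle, last letter.
    return pattern.sub(lambda m: f"{m[1]}{len(m[2])}{m[3]}", text)


print(abbreviate(ANCHORED, "1Texas2Step3"))    # 1Texas2Step3 (no match)
print(abbreviate(UNANCHORED, "1Texas2Step3"))  # 1T3s2S2p3
```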

Jussi Piitulainen said

April 28, 2017 at 12:08 PM

Python has a more expressive third-party regex library, the regex module, which can be installed and used in place of Python’s standard re. One thing it adds is POSIX named character classes such as [[:alpha:]]. (I couldn’t find them in the interactive help text, but they seem to be there.)

```
import regex as re
import sys

l4s = re.compile('([[:alpha:]])([[:alpha:]]+)([[:alpha:]])')

def abbr(match):
    begin, middle, end = match.groups()
    return '{}{}{}'.format(begin, len(middle), end)

def program(sentence):
    return l4s.sub(abbr, sentence)

for sentence in sys.stdin:
    print(program(sentence), end='')
```

bookofstevegraham said
April 28, 2017 at 5:30 PM
Can you give an example of when there are digits in the word? And one where there is a digit at the beginning and the end?
programmingpraxis said
April 28, 2017 at 6:29 PM
@bookofstevegraham: Words are maximal sequences of letters. Digits are not part of words. So a word like 1Texas2Step3 is abbreviated as 1, followed by T3s for Texas, followed by 2, followed by S2p for Step, followed by 3, so the full abbreviation is 1T3s2S2p3.

If you have questions about how a program works in a particular situation, you can always go to ideone.com, fork the recommended solution, and plug in your own data, like this: http://ideone.com/QETUMI.
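The rule that words are maximal runs of letters maps naturally onto itertools.groupby, which splits a sequence into maximal runs of consecutive items sharing a key. A minimal Python sketch, using str.isalpha() as the letter test; the function names abbreviate and abbreviate_word are illustrative, not from the exercise:

```python
from itertools import groupby


def abbreviate_word(word: str) -> str:
    """First letter + number of interior letters + last letter,
    leaving words of one or two letters unchanged."""
    if len(word) <= 2:
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


def abbreviate(text: str) -> str:
    # groupby splits the text into maximal runs of letters and of non-letters;
    # only the letter runs (the words) are abbreviated.
    pieces = []
    for is_letter, run in groupby(text, key=str.isalpha):
        chunk = "".join(run)
        pieces.append(abbreviate_word(chunk) if is_letter else chunk)
    return "".join(pieces)


print(abbreviate("1Texas2Step3"))  # 1T3s2S2p3
```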
bookofstevegraham said
April 28, 2017 at 6:59 PM
Just what I needed. Thanks.
bookofstevegraham said
April 28, 2017 at 8:48 PM
Caché (a version of MUMPS):

```
abbrsent(str) ;New routine
 ;
 n buffer,char,i,status
 s (buffer,status)=""
 w !!
 f i=1:1:$l(str) d
 . s char=$e(str,i)
 . i char?1a d
 . . i status="char" d
 . . . s buffer=buffer_char
 . . e d
 . . . s status="char",buffer=char
 . e d
 . . i status="char" d
 . . . d abbr(buffer)
 . . . s (buffer,status)=""
 . . w char
 i buffer]"" d abbr(buffer)
 q
 ;
abbr(str) ;
 i $l(str)<3 w str
 e w $e(str)_($l(str)-2)_$e($reverse(str))
 q
```

===

d ^abbrsent("hello")

h3o

---

d ^abbrsent("Programming Praxis")

P9g P4s

---

d ^abbrsent("12Progra3m4ming5 6Praxi7s89")

12P4a3m4m2g5 6P3i7s89

---

d ^abbrsent("12Progra3m4mi5 6Praxi7s89")

12P4a3m4mi5 6P3i7s89

fisherro said

May 1, 2017 at 4:11 AM

We now have regex in the C++ standard library, but it still seems much harder than it should be to do some simple things.

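```
#include <iostream>
#include <locale>
#include <regex>
#include <string>

int main()
{
    // Install the environment's locale (LANG/LC_ALL) as the global C++ locale,
    // which std::regex picks up when it is constructed. std::locale("") throws
    // std::runtime_error if the environment names an unavailable locale.
    std::locale::global(std::locale(""));
    std::string input;
    while (std::getline(std::cin, input)) {
        //Using [[:alpha:]] in the hopes of UTF-8 support by the locale
        //and the implementation (std::regex over std::string still classifies
        //one byte at a time, so multibyte letters will likely not match)
        std::regex regex(R"(([[:alpha:]])([[:alpha:]]+)([[:alpha:]]))");
        std::smatch match;
        while (std::regex_search(input, match, regex)) {
            std::cout << match.prefix()
                << match[1]
                << match[2].length()
                << match[3];
            input = match.suffix();
        }
        std::cout << input << '\n';
    }
}
```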

fisherro said

May 1, 2017 at 4:28 AM

Refactored it to create a reusable gsub function.

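```
#include <iostream>
#include <locale>
#include <regex>
#include <string>

// Replace every match of rx in `in` with the string returned by f(match).
template<typename F>
std::string gsub(std::string in, const std::regex& rx, F f)
{
    std::string out;
    std::smatch m;
    while (std::regex_search(in, m, rx)) {
        out += m.prefix().str();  // text before the match, unchanged
        out += f(m);              // replacement computed from the match
        in = m.suffix().str();    // keep searching after the match
    }
    return out + in;
}

int main()
{
    // Install the environment's locale (LANG/LC_ALL) as the global C++ locale,
    // which std::regex picks up when it is constructed. std::locale("") throws
    // std::runtime_error if the environment names an unavailable locale.
    std::locale::global(std::locale(""));
    std::string line;
    while (std::getline(std::cin, line)) {
        //Using [[:alpha:]] in the hopes of UTF-8 support by the locale
        //and the implementation
        std::regex regex(R"(([[:alpha:]])([[:alpha:]]+)([[:alpha:]]))");
        std::cout << gsub(line, regex, [](const std::smatch& m) {
            return m[1].str() + std::to_string(m[2].length()) + m[3].str();
        });
        std::cout << '\n';
    }
}
```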

john said
May 2, 2017 at 2:29 PM
Using C11:

```
#include <ctype.h>
#include <iso646.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t word_len = 0;     /* letters seen so far in the current word */
    int current = getchar();
    int next = getchar();    /* one character of lookahead to detect word ends */
    while (current != EOF) {
        if (isalpha(current)) {
            ++word_len;
            bool word_begin = word_len == 1;
            bool word_end = not isalpha(next);
            if (word_begin) {
                putchar(current);                 /* first letter is always kept */
            } else if (word_end) {
                if (word_len > 2) {
                    printf("%zu", word_len - 2);  /* number of interior letters */
                }
                putchar(current);                 /* last letter */
            }
            if (word_end) {
                word_len = 0;                     /* also resets after one-letter words */
            }
        } else {
            putchar(current);
        }
        current = next;
        next = getchar();
    }
    if (ferror(stdin)) {
        fprintf(stderr, "fatal error while reading stdin.\n");
        exit(EXIT_FAILURE);
    }
    exit(EXIT_SUCCESS);
}
```


Programming Praxis