## Extract Number From String

### January 18, 2019

We have another string-handling exercise today:

Given a string containing a number, extract the number from the string. For instance, the strings “-123”, “-123junk”, “junk-123”, “junk-123junk” and “junk-123junk456” should all evaluate to the number -123.

Your task is to write a program to extract a number from a string. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
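As a point of reference for the discussion that follows, here is a minimal regex-based sketch in Python 3. It is not the suggested solution; the name `extract_number` and the choice to return `None` when no digits are present are assumptions made for illustration, and only the ASCII minus sign and the digits 0-9 are recognized.

```python
import re

# Optional ASCII minus immediately followed by one or more digits.
# re.ASCII restricts \d to the characters 0-9.
_NUMBER = re.compile(r"-?\d+", re.ASCII)

def extract_number(text):
    """Return the first integer found in text, or None if there is none."""
    match = _NUMBER.search(text)
    return int(match.group()) if match else None

# The five strings from the exercise statement.
for s in ["-123", "-123junk", "junk-123", "junk-123junk", "junk-123junk456"]:
    print(repr(s), "->", extract_number(s))
```

Because `search` stops at the first match, trailing digits such as the “456” in “junk-123junk456” are ignored.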

This is a piece of cake for Perl, as it is its raison d’être… It uses some later Perl 5 features, like the “r” flag on a regexp substitution, which returns the modified copy instead of changing the string in place, and `say`, which prints with a “\n” at the end (hence the -E rather than -e switch).

Well, we could just use a regex like James’ solution, but it’s more fun to roll our own recognizer. Also, we should take cognizance of internationalization and check that things work with non-Latin numerals. Here’s some Python 3:
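The commenter's listing is not shown, so what follows is only a sketch of what a hand-rolled recognizer along these lines might look like. It assumes that "digit" means any character for which `str.isdecimal()` is true, that only an ASCII `-` directly before the digits counts as a sign, and that `None` is returned when there are no digits. The name `extract_number` is illustrative.

```python
import unicodedata

def extract_number(text):
    """Return the first run of decimal digits (in any script) as an int,
    negated if an ASCII '-' sits directly before the run.
    Returns None when the string holds no decimal digits."""
    i, n = 0, len(text)
    # Skip ahead to the first decimal digit.
    while i < n and not text[i].isdecimal():
        i += 1
    if i == n:
        return None
    negative = i > 0 and text[i - 1] == "-"
    value = 0
    # Accumulate one digit at a time; unicodedata.decimal gives the
    # numeric value of a decimal digit character in any script.
    while i < n and text[i].isdecimal():
        value = value * 10 + unicodedata.decimal(text[i])
        i += 1
    return -value if negative else value
```

For instance, `extract_number("junk-\u0967\u0968\u0969")` (Devanagari digits one, two, three) gives -123, the same as the ASCII input. One consequence of this design is that a run mixing digits from different scripts is accepted as a single number.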

A Haskell version.

@matthew, your solution doesn’t handle negative numbers.

Here’s a solution in C, supporting ASCII digits 0-9.

Examples:

Here’s a less readable version of `extract` that uses compiler built-ins to handle overflow. (My first attempt at posting it had a typo, “soucecode”, that messed up the formatting; this is the properly formatted version.)

```c
/* Extract the first integer from input. Returns 1 and stores the value in
   *output on success; returns 0 if there are no digits or on overflow.
   Relies on the GCC/Clang __builtin_*_overflow built-ins. */
int extract(char* input, int* output) {
    int start_idx = -1;
    int end_idx = -1;
    /* Find the first ASCII digit. */
    for (int i = 0; ; ++i) {
        char c = input[i];
        if (c == '\0') return 0;
        if (c < '0' || c > '9') continue;
        start_idx = i;
        break;
    }
    /* Find the last digit of that run. */
    for (int i = start_idx + 1; ; ++i) {
        char c = input[i];
        if (c >= '0' && c <= '9') continue;
        end_idx = i - 1;
        break;
    }
    int result = 0;
    int multiplier = 1;
    /* A '-' directly before the digits makes the number negative. */
    if (start_idx > 0 && input[start_idx - 1] == '-')
        multiplier = -1;
    /* Accumulate from the least significant digit, checking each step. */
    for (int i = end_idx; i >= start_idx; --i) {
        if (i < end_idx && __builtin_mul_overflow(10, multiplier, &multiplier))
            return 0;
        int addend = input[i] - '0';
        if (__builtin_mul_overflow(addend, multiplier, &addend))
            return 0;
        if (__builtin_add_overflow(result, addend, &result))
            return 0;
    }
    *output = result;
    return 1;
}
```

@Daniel: Good point. I was reading your solution (incidentally, you need to initialize `result` on line 35 of the original) and wondering how you were going to handle minus signs, so I liked your trick when I came to it. It seems unreasonable not to allow the Unicode minus sign (U+2212), which is not supported by the Python `int` function.

@matthew, thanks. My intent was to initialize the result in `extract`. Here’s the updated code.
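On matthew's separate point about U+2212: since Python's `int` rejects that character as a sign, one way to accept it is to recognize the sign yourself and pass only the digits to `int`. The sketch below is an illustration under that assumption, not any commenter's posted code; `extract_number` is a hypothetical name, and returning `None` for digit-free input is a choice made here.

```python
import re

# Group 1: an optional ASCII hyphen-minus or U+2212 MINUS SIGN.
# Group 2: a run of decimal digits.
_NUMBER = re.compile("([-\u2212]?)(\\d+)")

def extract_number(text):
    """Return the first integer in text, treating either '-' or U+2212
    directly before the digits as a minus sign. None if no digits."""
    match = _NUMBER.search(text)
    if match is None:
        return None
    sign, digits = match.groups()
    value = int(digits)  # int() is given digits only, never the sign
    return -value if sign else value
```

With this, `extract_number("junk\u2212123junk")` and `extract_number("junk-123junk")` both give -123.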

Regular expressions are clearly the way to go, but I also implemented another solution based on `groupby`. Split the string into groups of (possible) signs, digits, and everything else, then return the (signed) digits.
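The grouping idea described above could be sketched roughly as follows with `itertools.groupby`. This is a reconstruction from the description rather than the commenter's code: the helper `_kind`, the three category labels, and the rule that any run of `-` characters directly before the digits means "negative" are all assumptions.

```python
from itertools import groupby

def _kind(ch):
    """Classify a character as 'sign', 'digit' or 'other'."""
    if ch == "-":
        return "sign"
    if ch.isdecimal():
        return "digit"
    return "other"

def extract_number(text):
    """Return the first (signed) run of digits in text, or None."""
    previous = None
    # groupby yields consecutive runs of characters sharing the same kind.
    for kind, group in groupby(text, key=_kind):
        if kind == "digit":
            value = int("".join(group))
            # Negative only if the run just before the digits was a sign run.
            return -value if previous == "sign" else value
        previous = kind
    return None
```

For “junk-123junk456” the runs are “junk”, “-”, “123”, “junk”, “456”, and the function returns at the first digit run with -123.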

Mumps version
