Distinct Characters

May 9, 2017

We’ve done similar exercises in the past:

Write a program to determine if all the characters in a string are distinct. For instance, the word “Programming” has two m and two g, so all the characters are not distinct, but the word “Praxis” has six distinct characters.

Your task is to write a program to determine if all the characters in a string are distinct; you should provide three solutions, one that takes time O(n²), one that takes time O(n log n), and one that takes time O(n). When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

8 Comments »

8 Responses to “Distinct Characters”

Paul said

May 9, 2017 at 11:09 AM

In Python.

def order_n(word):
    return len(set(word)) == len(word)

def order_nln(word):
    chars = sorted(word)
    return not any(a == b for a, b in zip(chars, chars[1:]))

def order_n2(word):
    return not any(a == b for i, a in enumerate(word) for b in word[i+1:])

Rutger said

May 9, 2017 at 11:53 AM

Added an additional method to Paul’s solution in Python. Uses pruning, combined with dictionary add/lookup is O(1) average. Has better performance, depending on distribution of frequently occuring characters amongst all different words (i.e. character ‘e’ reoccurs frequently in English).

import time

def order_n(word):
    return len(set(word)) == len(word)
 
def order_nln(word):
    chars = sorted(word)
    return not any(a == b for a, b in zip(chars, chars[1:]))
 
def order_n2(word):
    return not any(a == b for i, a in enumerate(word) for b in word[i+1:])

def order_n_prune(word):
	d = {}
	for c in word:
		if c in d:
			return False
		else:
			d[c] = 1
	return True



def test_performance(list_of_words, func):
	start_time = time.time()	
	distinct = 0
	for word in list_of_words:
		if func(word):
			distinct += 1

	print(func.__name__, distinct, len(list_of_words))
	print("Took", time.time() - start_time, "seconds.")


list_of_words = []
words_file = '/usr/share/dict/words'
for line in open(words_file):
	word = line.strip()
	list_of_words.append(word)

test_performance(list_of_words, order_n)
test_performance(list_of_words, order_nln)
test_performance(list_of_words, order_n2)
test_performance(list_of_words, order_n_prune)


# RESULTS
# python unique.py 
# ('order_n', 99419, 479828)
# Took 0.4857931137084961 seconds.
# ('order_nln', 99419, 479828)
# Took 1.2898380756378174 seconds.
# ('order_n2', 99419, 479828)
# Took 1.4195671081542969 seconds.
# ('order_n_prune', 99419, 479828)
# Took 0.416719913482666 seconds.

John Cowan said
May 9, 2017 at 12:14 PM
Rutger: I expect the difference between the set object and your emulation of it is just noise.
Rutger said
May 9, 2017 at 1:25 PM
John: order_n_prune is rougly 10-20% faster than order_n for the words in /usr/share/dict/words (linux). For other datasets this may not hold, i.e. a dataset with words that only have unique characters.
bookofstevegraham said
May 9, 2017 at 6:50 PM
Klong: Not sure how to provide the 3 different solutions, so here’s the quickest one

str::”Programming Praxis”
“Programming Praxis”
str=?str
0
str::”Praxis”
“Praxis”
str=?str
1
:” ?a returns a list containing unique elements from ‘a’ in order of appearance.”
bookofstevegraham said
May 9, 2017 at 7:40 PM
[sourcecode lang="css"]

Not sure how to do the 3 solutions, so here’s just one:

DISTCHAR (str) ; Check for distinct characters
n arr,flg,i
s flg=1
f i=1:1:$l(str) d q:’flg
. s char=$e(str,i)
. i $d(arr(char)) s flg=0
. e s arr(char)=””
q flg

MCL> W $$DISTCHAR^DISTCHAR(“Programming Praxis”)
0
MCL> W $$DISTCHAR^DISTCHAR(“Praxis”)
1

bookofstevegraham said

May 9, 2017 at 7:41 PM


Not sure how to do the 3 solutions, so here’s just one:

DISTCHAR (str) ; Check for distinct characters
n arr,flg,i
s flg=1
f i=1:1:$l(str) d q:’flg
. s char=$e(str,i)
. i $d(arr(char)) s flg=0
. e s arr(char)=””
q flg

MCL> W $$DISTCHAR^DISTCHAR(“Programming Praxis”)
0
MCL> W $$DISTCHAR^DISTCHAR(“Praxis”)
1

john said
May 10, 2017 at 5:09 PM
Just the fastest solution (O[n] time) in C11:

#include <stdbool.h> #include <stdio.h> #include <stdlib.h> #include <string.h>
bool distinct (char *str); int main (int argc, char **argv) { char *strings[] = {"Programming", "Praxis", "Cat", "Dog", "Dogg", NULL}; for (register size_t i = 0; strings[i] != NULL; ++i) { if (distinct(strings[i])) printf("'%s' is distinct\n", strings[i]); else printf("'%s' is not distinct\n", strings[i]); } exit(0); } // O(n) solution bool distinct (char *str) { size_t arr[127] = {0}; // ASCII const size_t n = strlen(str); for (register size_t i = 0; i < n; ++i) if (++arr[str[i]] > 1) return false;
return true; }

Programming Praxis