Steve Yegge’s Phone-Screen Coding Exercises
June 30, 2009
Steve Yegge is a programmer and popular blogger. One of his blog entries proposes these seven phone-screen coding exercises:
- Write a function to reverse a string.
- Write a function to compute the Nth fibonacci number.
- Print out the grade-school multiplication table up to 12 x 12.
- Write a function that sums up integers from a text file, one per line.
- Write a function to print the odd numbers from 1 to 99.
- Find the largest int value in an int array.
- Format an RGB value (three 1-byte numbers) as a 6-digit hexadecimal string.
Your task is to write solutions to stevey’s phone-screen exercises. When you are finished, you are welcome to read a suggested solution, or to post your solution or discuss the exercise in the comments below.
Treaps
June 26, 2009
Dictionaries are a common data type, which we have used in several exercises (Mark V. Shaney, Word Frequencies, Dodgson’s Doublets, Anagrams). Hash tables are often used as the underlying implementation for dictionaries, and in a previous exercise we developed ternary search tries as another implementation for dictionaries, which are useful when the keys are strings. Today’s exercise examines treaps, which are yet another method for implementing dictionaries.
A treap is essentially a binary search tree, represented as either an empty node, called nil, or a node with a key and a value and pointers to two child nodes, the left child and the right child. Binary search trees obey the rule that all nodes in the left child have keys that are less than the key of the current node, and all nodes in the right child have keys that are greater than the key of the current node. One node is distinguished as the root of the tree; it is the only node visible outside the tree. Obviously, a single set of keys could be represented by many different binary search trees, depending on which key is chosen as the root and the choices made at each level under the root.
The speed of a search depends on the depth at which the key is located in the tree. Trees that have more balance (fewer nodes with nil children) are faster to search than trees that have less balance (more nodes with nil children) because the overall depth of the tree is less. The two extreme cases occur when a tree is perfectly balanced (no internal nil nodes), when a search takes time logarithmic in the size of the tree, and when a tree is perfectly imbalanced (a long chain of nodes each with one nil child), when a search takes time linear in the size of the tree.
Treaps use an ingenious method to assure that the tree is approximately balanced at all times. Each node, in addition to the key, value, and two child nodes, contains a numeric priority, which is randomly assigned at the time the node is created and never changed. Of the many possible trees that could represent a particular set of keys, the one that is chosen obeys the heap-order property, with the priority of each node being greater than the priorities of all of its child nodes. Assuming all keys and priorities are distinct, there will be exactly one tree that satisfies both the tree-order and heap-order properties; it is the same tree that would be created if keys were dynmically inserted into a normal binary search tree in order of the priority field. The genius of this method is that the priorities are independent of the keys, making it extremely unlikely that highly-imbalanced trees could be created, thus keeping search times approximately logarithmic.
As nodes are dynamically inserted into and deleted from the tree, it is sometimes necessary to restructure the tree to maintain both the tree-order and the heap-order properties. The restructurings, known as rotations, come in two varieties, left and right:
[ This image, by Ramasamy, is used under the terms of the GNU Free Documentation License or the Creative Commons Attribution Share-Alike License. ]
In both trees shown above, all the nodes in sub-tree A have keys less than p, all the nodes in sub-tree B have keys between p and q, and all the nodes in sub-tree C have keys greater than q. Transforming the tree so that the root moves from p to q is called a left rotation, and transforming the tree so that the root moves from q to p is called a right rotation; in both cases the root node moves down the tree toward the leaves. Note that both rotations preserve the tree-order property. It is the goal of the insert and delete algorithms to use rotations to maintain the heap-order property.
Your task is to implement the lookup, insert, update, delete and enlist operations on treaps. When you are finished, you are welcome to read or run a suggested solution, or post your solution or discuss the exercise in the comments below.
The Mod Out System
June 23, 2009
Over at The Daily WTF, Alex Papadimoulis catalogs some insanely bad programming practices. The Mod Out System is one of the best … worst … absolutely sickening … no boss, I would never do such a thing … well, you get the point.
The hero of the story, Gary, must write a function to expand a list of ranges like 1-6,9,13-19 into a list without ranges like 1,2,3,4,5,6,9,13,14,15,16,17,18,19.
Your task is to write that function. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.
Monte Carlo Factorization
June 19, 2009
We have previously examined two methods of integer factorization, trial division using wheels and Fermat’s method of the differences of squares. In this exercise we examine a probabilistic method of integer factorization that takes only a small, fixed amount of memory and tends to be very fast. This method was developed by John Pollard in 1975 and is commonly called the Pollard rho algorithm.
The algorithm is based on the observation that, for any two random integers x and y that are congruent modulo p, the greatest common divisor of their difference and n, the integer being factored, will be 1 if p and n are coprime (have no factors in common) but will be between 1 and n if p is a factor of n. By the birthday paradox, p will be found with reasonably large probability after trying about random pairs.
Pollard’s algorithm uses a function modulo n to generate a pseudo-random sequence. Two copies of the sequence are run, one twice as fast as the other, and their values are saved as x and y. At each step, we calculate gcd(x–y,n). If the greatest common divisor is one, we loop, since the two values are coprime. If the greatest common divisor is n, then the values of the two sequences have become equal and Pollard’s algorithm fails, since the sequences have fallen into a cycle, which is detected by Floyd’s tortoise-and-hare cycle-finding algorithm; that’s why we have two copies of the sequence, one (the “hare”) running twice as fast as the other (the “tortoise”). But if the greatest common divisor is between 1 and n, we have found a factor of n.
Failure doesn’t mean failure. It just means that the particular pseudo-random sequence that we chose doesn’t lead to success. Our response to failure is to try another sequence. We use the function x² + c (mod n), where c is initially 1. If Pollard’s algorithm fails, we increase c to 2, then 3, and so on. If we keep increasing c, we will eventually find a factor, though it may take a long time if n is large.
Your task is to implement Pollard’s factorization algorithm. You can test it by calculating the factors of the 98th Mersenne number, 298 – 1. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
We’re moving and changing!
June 17, 2009
Programming Praxis has a new web address: programmingpraxis.com. For the moment, Programming Praxis is still hosted at wordpress.com, so the old address at programmingpraxis.wordpress.com will still work (it maps the old address to the new address), but that will change sometime in the future. Please update your bookmarks. The RSS feed moves to programmingpraxis.com/feed. All existing content, including your comments, remains.
Programming Praxis also has a new theme, and a logo, which was designed by Remco Niemeijer. Thanks, Remco! Some of the pages have moved, others will be moving shortly, some will change, and there will soon be some new additions.
Thanks to all my readers for their patience during this transition. Please let me know if you see anything that breaks, or if you have suggestions for other changes. You can contact me at programmingpraxis@gmail.com.
Who Owns The Zebra?
June 16, 2009
This famous puzzle was first published by Life magazine on December 17, 1962. It has been variously attributed to both Albert Einstein and Lewis Carroll, but the true author is not known. There are several versions; this is the original from Life:
1 There are five houses.
2 The Englishman lives in the red house.
3 The Spaniard owns the dog.
4 Coffee is drunk in the green house.
5 The Ukrainian drinks tea.
6 The green house is immediately to the right of the ivory house.
7 The Old Gold smoker owns snails.
8 Kools are smoked in the yellow house.
9 Milk is drunk in the middle house.
10 The Norwegian lives in the first house.
11 The man who smokes Chesterfields lives in the house next to the man with the fox.
12 Kools are smoked in the house next to the house where the horse is kept.
13 The Lucky Strike smoker drinks orange juice.
14 The Japanese smokes Parliaments.
15 The Norwegian lives next to the blue house.
In the interest of clarity, it must be added that each of the five houses is painted a different color, and their inhabitants are of different national extractions, own different pets, drink different beverages and smoke different brands of American cigarettes.
Your task is to write a program to solve the puzzle and determine: Who drinks water? Who owns the zebra? When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
Feynman’s Puzzle
June 12, 2009
Richard Feynman was an American physicist; he worked on the Manhattan Project that developed the atomic bomb, won the Nobel Prize in 1965 for his work in the development of quantum electrodynamics, and famously demonstrated the inelasticity of the rubber o-rings on the space shuttle Challenger using a clamp, a rubber band, and a glass of ice water. Feynman proposed this puzzle:
Your task is to solve Feynman’s Puzzle. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercises in the comments below.
Longest Common Subsequence
June 9, 2009
Finding the longest common subsequence of two sequences is a classic computer science problem with an equally classic solution that dates to the folklore of computing. The longest common subsequence is the longest set of elements that appear in order (not necessarily contiguous) in two sequences; for instance, the longest common subsequence of PROGRAMMING and PRAXIS is PRAI:
P R O G R A M M I N G
| | | |
P R A X I S
The classic solution uses dynamic programming. A matrix is prepared with one sequence in the rows and the other sequence in the columns, giving in each cell the minimum edit distance between the two, where the edit distance is the least number of adds, deletes and changes that can convert one sequence to the other. For instance:
P R O G R A M M I N G
0 0 0 0 0 0 0 0 0 0 0 0
P 0 1 1 1 1 1 1 1 1 1 1 1
R 0 1 2 2 2 2 2 2 2 2 2 2
A 0 1 2 2 2 2 3 3 3 3 3 3
X 0 1 2 2 2 2 3 3 3 3 3 3
I 0 1 2 2 2 2 3 3 3 4 4 4
S 0 1 2 2 2 2 3 3 3 4 4 4
If we represent the rows as Xi and the columns as Yj, the matrix can be filled top-to-bottom, left-to-right using the formula:
{ 0, if i=0 or j=0
{
LCS(i,j) = { LCS(i-1,j-1) + 1, if X(i) = Y(j)
{
{ max(LCS(i-1,j), LCS(i,j-1)), otherwise
Intuitively, the two sequences are scanned in parallel. If the current two cells are identical, the length of the longest common subsequence increases by one; otherwise, there are two possibilities to consider recursively, after deleting the current cell from either one input sequence or the other, and the length of the longest common subsequence is simply the greater of the two possibilities.
Once the matrix of longest common subsequence lengths has been calculated, the longest common subsequence itself can be recovered by noting each point where the length “bumps” to the next lower value along the diagonal, starting at the lower right-hand corner: from 4 to 3 when both sequences are at I, from 3 to 2 when both sequences are at A, from 2 to 1 when both sequences are at R, and from 1 to 0 when both sequences are at P.
Note that the longest common sequence is not necessarily unique. For instance, given the two sequences ABC and BAC, there are two longest common subsequences, AC and BC. Either answer is correct.
Your task is to write a function that takes two sequences and returns their longest common subsequence. When you are finished, you are welcome to read or run a suggested solution, or to post your solution or discuss the exercise in the comments below.
Ternary Search Tries
June 5, 2009
Several of our exercises (Mark V. Shaney, Word Frequencies, Dodgson’s Doublets, Anagrams) have used hash tables based on string keys. When the keys are strings, a useful alternative to hash tables is a data structure called a ternary search trie, which can be faster than a hash table because it takes advantage of the fact that keys are strings, and also provides in-order access to the keys. Ternary search tries were introduced by Jon Bentley and Robert Sedgewick at the 1997 SODA conference.
As the name implies, ternary search tries are a cross between binary search trees and radix tries. Ternary search tries consist of recursive nodes and leaves, just like binary search trees and radix tries. As the name implies, a node of a ternary search trie has three children, one for lesser children, one for greater children, and a third for equal children. Searches proceed character-by-character, as with a trie. At each level of the trie, the search compares the current character of the search string with the character stored in the node. If the search character is less, the search continues at the less-than child, and if the search character is greater, the search continues at the greater-than child. However, when the two characters are equal, the search continues recursively at the equal-to child, proceeding to the next character in the search string.
The picture below, taken from the Bentley/Sedgewick SODA paper, shows a ternary search trie containing twelve two-character words:
Your task is to implement the lookup, insert, update, delete and enlist operations on ternary search tries. When you are finished, you are welcome to read or run a suggested solution, or post your solution or discuss the exercise in the comments below.
Pig Latin
June 2, 2009
Pig Latin is an English-language word game, usually played by children, in which words are mutated systematically. A word that begins with a vowel (a, e, i, o, or u) has the syllable “way” added to the end; for instance, art becomes art-way and eagle becomes eagle-way. A word that begins with one or more consonants has the initial consonants stripped, moved to the end of the word, then the syllable “ay” is added; for instance, start becomes art-stay and door becomes oor-day. A hyphen is added as shown above as an aid to prevent ambiguity; otherwise, a word like aspray could be the translation of spray (ay-spray) or prays (ays-pray). Even so, some words remain ambiguous; art-way could be translated as either art or wart.
Your task is to write functions that translate English to Pig Latin and Pig Latin to English. When you are finished, you may read or run a suggested solution, or post your solution or discuss the exercise in the comments below.