Natural Join

June 15, 2010

In relational databases, the natural join of two tables is an operator that takes two tables as input and returns a new table that combines the two input tables on their common keys. If the input tables are sorted, the join simply scans the two tables, writing records for the cross-product of all records with equal keys. For instance, the join of the two tables

Key Field1 Field2 Key Field3 A w p A 1 B x q and A 2 B y r B 3 C z s

is the table

Key Field1 Field2 Field3 A w p 1 A w p 2 B x q 3 B y r 3

We represent a table as a file; each line is a record, and fields are separated by tabs. For simplicity, we’ll assume that the first field in each record is the key.

Your task is to write a program that takes two input files representing tables (you may assume they are sorted) and produces their natural join as output. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Posted by programmingpraxis

Filed in Exercises

5 Comments »

5 Responses to “Natural Join”

Programming Praxis – Natural Join « Bonsai Code said
June 15, 2010 at 12:30 PM
[…] Praxis – Natural Join By Remco Niemeijer In today’s Programming Praxis exercise we have to implement a program that joins two files with tables in […]

Remco Niemeijer said

June 15, 2010 at 12:31 PM

My Haskell solution (see http://bonsaicode.wordpress.com/2010/06/15/programming-praxis-natural-join/ for a version with comments):

import Data.List
import Data.List.Split
import System.Environment

loadFile :: FilePath -> IO [[String]]
loadFile = fmap (map (splitOn "\t") . lines) . readFile

join :: Eq a => [[a]] -> [[a]] -> [[a]]

join (ks1:f1) (ks2:f2) = union ks1 ks2 :
    [k1 : v1++v2 | (k1:v1) <- f1, (k2:v2) <- f2, k1 == k2]
join _        _        = []

main :: IO ()
main = mapM_ putStrLn . map (intercalate "\t") .
       foldl1 join =<< mapM loadFile =<< getArgs

kbob said

June 15, 2010 at 12:59 PM

Here’s a Python solution. It uses Python’s defaultdict(list) as a multimap from keys to records. make_table() and join_tables() do the work; the rest is just I/O.

from collections import defaultdict

def make_table(records):
    d = defaultdict(list)
    for rec in records:
        d[rec[0]].append(rec[1:])
    return d

def join_tables(t1, t2):
    return [[k] + r1 + r2
            for k in t1
            for r1 in t1[k]
            for r2 in t2[k]]
         
def read_file(fname):
    with open(fname) as f:
        return [line.strip('\n').split('\t') for line in f]

def read_table(fname):
    return make_table(read_file(fname))

def join_files(fn1, fn2):
    return join_tables(read_table(fn1), read_table(fn2))

def print_records(records):
    for rec in records:
        print '\t'.join(str(f) for f in rec)

print_records(join_tables(read_table('t1'), read_table('t2')))

Elvis Montero said
June 19, 2010 at 1:22 AM
I’ve posted my Java solution here. The bulk of the code is I/O and the definition of rows and columns as full-fledged classes.
Vikas Tandi said
April 25, 2011 at 8:32 AM
my solution in C
http://codepad.org/k8tw70k0

S	M	T	W	T	F	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Programming Praxis

Natural Join

June 15, 2010

5 Responses to “Natural Join”

Leave a comment

Categories

Archives

Archives

Programming Praxis

Natural Join

June 15, 2010

Share this:

Related

5 Responses to “Natural Join”

Leave a comment

Categories

Archives

Archives