Sum, Revisited

October 21, 2011

We looked at the Unix SysV sum command in a previous exercise. At the time, my intention was to do the BSD version of the sum command immediately, in the next exercise, but for some reason I failed to do so. That oversight is corrected in today’s exercise.

The original SysV sum command simply calculated the sum of all the bytes in the file, modulo 2^16 (65536). Thus, it failed to distinguish two files in which the order of the bytes was changed, null bytes were added, or the bytes were manipulated so that additions to one byte were subtracted from another byte. Although the BSD algorithm isn’t perfect, it catches those kinds of changes in most cases by rotating the 16-bit accumulating sum one bit to the right before adding each byte, so that each byte’s contribution depends on its position in the file.

Today’s exercise is to implement the standard Unix sum command that calculates either of these checksums:

NAME

sum -- checksum and count the blocks in a file

SYNOPSIS

sum [-r | -s] [file ...]

DESCRIPTION

Print checksum and block counts for each file. The BSD algorithm is used unless the System V algorithm is specified.

-r -- Use BSD sum algorithm with 1024-byte blocks.

-s -- Use System V sum algorithm with 512-byte blocks.

With no file, or when file is -, read standard input.

Your task is to write the Unix V7 sum command as specified above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


One Response to “Sum, Revisited”

  1. Jebb said
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    
    struct commandArgs {
    	void (*sumFn)(FILE *);	/* Checksum function to be called */
    	char **inputFiles;
    	int numInputFiles;
    };
    static const char *optString = "rsh?";
    
    void SysVsum(FILE *fp);
    void BSDsum(FILE *fp);
    void show_usage(void);
    
    int main(int argc, char *argv[])
    {
    	int option = 0;
    	/* BSD is the default; -s selects System V */
    	struct commandArgs myArgs = {BSDsum, NULL, 0};
    	FILE *fp;
    
    	option = getopt(argc, argv, optString);
    	while (option != -1) {
    		switch(option) {
    			case 'r':
    				myArgs.sumFn = BSDsum;
    				break;
    			case 's':
    				myArgs.sumFn = SysVsum;
    				break;
    			case 'h':
    			case '?':
    				show_usage();
    				break;
    			default:
    				break;
    		}
    		option = getopt(argc, argv, optString);
    	}
    	myArgs.inputFiles = argv + optind;
    	myArgs.numInputFiles = argc - optind;
    
    	if (myArgs.numInputFiles == 0)
    		myArgs.sumFn(stdin);
    	else
    		for (; myArgs.numInputFiles > 0;
    		       myArgs.numInputFiles--, myArgs.inputFiles++) {
    			const char *name = *(myArgs.inputFiles);
    			/* a file named "-" means standard input */
    			if (name[0] == '-' && name[1] == '\0') {
    				myArgs.sumFn(stdin);
    				continue;
    			}
    			if ((fp = fopen(name, "rb")) == NULL) {
    				fprintf(stderr, "couldn't open file %s\n", name);
    				exit(EXIT_FAILURE);
    			}
    			myArgs.sumFn(fp);
    			fclose(fp);
    		}
    	return 0;
    }
    
    void show_usage(void)
    {
    	fprintf(stderr, "jebb_sum [-r|-s] [file ...]\n");
    	exit(EXIT_FAILURE);
    }
    
    void SysVsum(FILE *fp)
    {
    	unsigned int bytes_sum = 0;
    	unsigned long bytes_count = 0;
    	int next_byte;
    	while ((next_byte = getc(fp)) != EOF) {
    		/* plain sum of the bytes, modulo 2^16 */
    		bytes_sum = (bytes_sum + next_byte) & 0xFFFF;
    		bytes_count++;
    	}
    	/* 512-byte blocks; a partial block counts as a whole one */
    	printf("%u %lu\n", bytes_sum, (bytes_count + 511) / 512);
    }
    
    void BSDsum(FILE *fp)
    {
    	unsigned int bytes_sum = 0;
    	unsigned long bytes_count = 0;
    	int next_byte;
    	while ((next_byte = getc(fp)) != EOF) {
    		/* From the man page on Snow Leopard:
    		 * 16-bit checksum with right rotation before each addition */
    		bytes_sum = ((bytes_sum >> 1) + 0x8000 * (bytes_sum & 1));
    		/* overflow is discarded */
    		bytes_sum = (bytes_sum + next_byte) & 0xFFFF;
    		bytes_count++;
    	}
    	/* 1024-byte blocks; a partial block counts as a whole one */
    	printf("%u %lu\n", bytes_sum, (bytes_count + 1023) / 1024);
    }
