Data Laundry, Revisited

March 22, 2019

Data laundry is the act of cleaning data, as when it arrives in one format and must be translated to another, or when external data must be checked for validity. Some weeks, as this week, data laundry occupies a significant portion of my time at work, so it’s an exercise worth examining. We looked at data laundry in two previous exercises. Today’s exercise in data laundry comes to us from Stack Overflow:

I have a file like this:

AAKRKA HIST1H1B AAGAGAAKRKATGPP
AAKRKA HIST1H1E RKSAGAAKRKASGPP
AAKRLN ACAT1 LMTADAAKRLNVTPL
AAKRLN SUCLG2 NEALEAAKRLNAKEI
AAKRLR GTF2F1 VSEMPAAKRLRLDTG
AAKRMA VCL NDIIAAAKRMALLMA
AAKRPL WIZ YLGSVAAKRPLQEDR
AAKRQK MTA2 SSSQPAAKRQKLNPA

I would like to kind of merge 2 lines if they are exactly the same in the 1st column. The desired output is:

AAKRKA HIST1H1B,HIST1H1E AAGAGAAKRKATGPP,RKSAGAAKRKASGPP
AAKRLN ACAT1,SUCLG2 LMTADAAKRLNVTPL,NEALEAAKRLNAKEI
AAKRLR GTF2F1 VSEMPAAKRLRLDTG
AAKRMA VCL NDIIAAAKRMALLMA
AAKRPL WIZ YLGSVAAKRPLQEDR
AAKRQK MTA2 SSSQPAAKRQKLNPA

Sometimes there could be more than two lines starting with the same word. How could I reach the desired output with bash/awk?

Your task is to write a program to solve this simple task of data laundry. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Advertisements

Pages: 1 2

Target Sum Subarray

March 19, 2019

Given an array of non-negative integers, write a program that finds the contiguous portion of the array that sums to a given target, or report that there is no such subarray. For instance, in the array 1, 4, 20, 3, 10, 5, target 33 exists in the subarray 20, 3, 10, in the array 1, 4, 0, 0, 3, 10, 5, target 7 exists in the subarray 4, 0, 0, 3, and in the array 1, 4, target 3 does not exist in any subarray.

Your task is to write a program that finds a target sum in an array. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Today’s exercise is simple, but leads me to ask a question. Here is the exercise, taken directly from /r/learnprogramming:

Draw the hierarchy chart and then plan the logic for a program needed by the sales manager of The Henry Used Car Dealership.

  1. The program will determine the profit on any car sold. Input includes the sale price and actual purchase price for a car.
  2. The output is the profit, which is the sale price minus the purchase price.
  3. Use three modules. The main program declares global variables and calls housekeeping, detail, and end-of-job modules.
  4. The housekeeping module prompts for and accepts a sale price. The detail module prompts for and accepts the purchase price, computes the profit, and displays the result. The end-of-job module displays the message Thanks for using this program.

I have written the actual program, but I am struggling with the hierarchy chart. Can someone give me any guidance on this one?

Is this for real? Separate modules for each statement? Why does the detail module get input, compute profit and display the result? If you have to have separate modules to ask for the purchase price and say thank you, why not split getting the sale price, computing the profit, and printing the result into separate modules? Is this how programming students are taught these days? To take a fundamentally simple program and make it ridiculously complex by splitting it into modules? What would David Parnas say about this modularization?

Your task is to write a program to compute profit for The Henry Used Car Dealership; you might also help me understand the way this program is intended to be modularized. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Recursion

March 12, 2019

Although we sometimes discuss rather more advanced topics, one focus of this blog will always be on student programmers, so I monitor several internet sites that cater to questions from learners. I have seen several posts in the last week or so about recursion, so I assume that we have reached the point in the semester where programming students are learning about recursion. Most programming professors seem to think that the factorial function (what is the value of n factorial) and the fibonacci function (what is the nth fibonacci number) are good demonstrations of recursion; personnally, I think it is better to discuss recursion in terms of binary trees.

Your task is to write versions of the factorial and fibonacci functions that use recursion to calculate their result. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

Frequency Counts

March 8, 2019

Today we have a tricky exercise from Geeks For Geeks:

Given an unsorted array of n integers from 1 to n, possibly with some items appearing multiple times and others missing from the array, count the frequency of all items that are present. Your solution must run in O(n) time and take O(1) extra space.

Your task is to write a program to count frequencies as described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2

We have something unusual today: a turing machine built from a business card that I learned about on Reddit. I immediately stopped and built my own, and had a lot of fun, and I suspect many of my readers will feel the same way.

Your task is to build your own turing machine. No coding, just playing. Feel free to discuss the exercise in the comments below.

Jumble

March 1, 2019

Last week I gave a rather silly solution to the Scrabble problem. Today’s exercise is my penance for that silliness.

As I’ve mentioned previously, my day job is on a team of programmers that supports our enterprise-wide computer system. I sit in a cube farm, where there is neither audible nor visual privacy. We recently hired a new programmer to replace a retiring team member, and he has a daily calendar on his desk that provides a jumbled series of letters that you have to rearrange to form a word. Yesterday’s puzzle was L T E A D E; most of the puzzles I solve in a few seconds, but that one took several minutes. The calendar appears to have a flaw: the solutions, one day after the next, are in alphabetical order, so I know before I start that the first letter will be E.

Your task is to write a program that solves jumbles. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2