Linear Regression

June 10, 2016

Linear regression is a widely-used statistical technique for relating two sets of variables, traditionally called x and y; the goal is to find the line-of-best-fit, y = m x + b, that most closely relates the two sets. The formulas for computing the line of best fit are:

m = (n × Σxy − Σx × Σy) ÷ (n × Σx2 − (Σx)2)

b = (Σym × Σx) ÷ n

You can find those formulas in any statistics textbook. As an example, given the sets of variables

x    y
60   3.1
61   3.6
62   3.8
63   4.0
65   4.1

the line of best fit is y = 0.1878 x − 7.9635, and the estimated value of the missing x = 64 is 4.06.

Your task is to write a program that calculates the slope m and intercept b for two sets of variables x and y. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

Pages: 1 2