## Linear Regression

### June 10, 2016

Linear regression is a widely used statistical technique for relating two sets of variables, traditionally called x and y; the goal is to find the line of best fit, y = m x + b, that minimizes the sum of the squared vertical distances between the line and the data points. With n the number of (x, y) pairs, the formulas for computing the line of best fit are:

m = (n × Σxy − Σx × Σy) ÷ (n × Σx² − (Σx)²)

b = (Σy − m × Σx) ÷ n

You can find those formulas in any statistics textbook. As an example, given the sets of variables

```
x    y
60   3.1
61   3.6
62   3.8
63   4.0
65   4.1
```

the line of best fit is y = 0.1878 x − 7.9635, and the estimated value of y at the missing x = 64 is 4.06.

Your task is to write a program that calculates the slope m and intercept b for two sets of variables x and y. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
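As one possible starting point, here is a minimal sketch in Python that applies the two formulas above directly. The function name `linear_regression` and the error checks are illustrative assumptions, not part of the suggested solution.

```python
def linear_regression(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Denominator of the slope formula; zero when all x values coincide
    # (or when there is no data), in which case no unique line exists.
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - m * sum_x) / n
    return m, b


# The sample data from the exercise.
xs = [60, 61, 62, 63, 65]
ys = [3.1, 3.6, 3.8, 4.0, 4.1]
m, b = linear_regression(xs, ys)
print(f"m = {m:.4f}, b = {b:.4f}")   # m = 0.1878, b = -7.9635
print(f"y(64) = {m * 64 + b:.2f}")   # y(64) = 4.06
```

For the sample data the sums are n = 5, Σx = 311, Σy = 18.6, Σxy = 1159.7 and Σx² = 19359, so m = 13.9 ÷ 74, which matches the slope quoted above.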


### 5 Responses to “Linear Regression”

1. Daniel said

Here’s a solution in matlab. The same approach can be used for multiple regression, but X would have additional columns for the additional variables.

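```
x = [60;61;62;63;65];
y = [3.1;3.6;3.8;4.0;4.1];

% Design matrix: a column of ones (intercept) followed by x
X = [ones(size(x,1),1), x];

% Least-squares solution; w(1) is the intercept b, w(2) is the slope m
w = X \ y

% Predicted y at x = 64
y_hat = [1,64] * w
```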

Output:

```
w =
-7.9635
0.1878

y_hat =
4.0581
```

2. Jeff said

Shouldn’t the calculation for b be:

b = (Σy / n – m × Σx / n)

3. programmingpraxis said

Fixed. Thank you.

4. Jeff said

Thanks for your website!

5. davor said

Doing it in APL felt like cheating.

∇X LINREG Y
M←(((⍴X)×+/X×Y)-((+/X)×+/Y))÷((⍴X)×+/X*2)-(+/X)*2
B←((+/Y)-M×+/X)÷⍴X