Linear Regression
June 10, 2016
Linear regression is a widely used statistical technique for relating two variables, traditionally called x and y; given paired observations of each, the goal is to find the line of best fit, y = m x + b, that most closely relates them in the least-squares sense. The formulas for computing the slope m and intercept b of that line, where n is the number of (x, y) pairs, are:
m = (n × Σxy − Σx × Σy) ÷ (n × Σx² − (Σx)²)
b = (Σy − m × Σx) ÷ n
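For readers who wonder where these formulas come from, here is a short sketch of the usual least-squares derivation. It assumes the standard setup of minimizing the sum of squared vertical distances between the points and the line; the name S for that sum is introduced here for illustration and does not appear in the exercise.

```latex
\begin{align*}
S(m, b) &= \sum_{i=1}^{n} (y_i - m x_i - b)^2 \\
\frac{\partial S}{\partial b} = 0 &\;\Rightarrow\; \sum y = m \sum x + n b \\
\frac{\partial S}{\partial m} = 0 &\;\Rightarrow\; \sum xy = m \sum x^2 + b \sum x \\
b &= \frac{\sum y - m \sum x}{n} \\
m &= \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}
\end{align*}
```

The fourth line is the first condition solved for b; substituting it into the second condition and multiplying through by n gives the last line. Dividing the expression for b term by term also shows that b equals the mean of y minus m times the mean of x, so the fitted line always passes through the point of means.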
You can find those formulas in any statistics textbook. As an example, given the sets of variables
 x    y
60  3.1
61  3.6
62  3.8
63  4.0
65  4.1
the line of best fit is y = 0.1878 x − 7.9635, and the estimated value of y at the missing x = 64 is 4.06.
Your task is to write a program that calculates the slope m and intercept b for two sets of variables x and y. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
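As one possible starting point, here is a minimal Python sketch that applies the two formulas directly. It is not the site's suggested solution; the function name linear_regression and the error handling are assumptions made for illustration.

```python
def linear_regression(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b.

    Hypothetical helper: applies the summation formulas for the
    slope and intercept to two equal-length sequences of numbers.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)

    # The denominator is zero when all x values are equal (or the
    # input is empty), in which case no unique line exists.
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x values")

    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - m * sum_x) / n
    return m, b


# Sample data from the exercise.
xs = [60, 61, 62, 63, 65]
ys = [3.1, 3.6, 3.8, 4.0, 4.1]
m, b = linear_regression(xs, ys)
print(f"slope m = {m:.4f}, intercept b = {b:.4f}")
print(f"estimate at x = 64: {m * 64 + b:.2f}")
```

On the sample data this should reproduce the slope, intercept, and estimate quoted above, up to rounding.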
Here’s a solution in MATLAB. The same approach can be used for multiple regression, but X would have additional columns for the additional variables.
Output:
Shouldn’t the calculation for b be:
b = (Σy / n − m × Σx / n)
Fixed. Thank you.
Thanks for your website!
Doing it in APL felt like cheating.
∇X LINREG Y
M←(((⍴X)×+/X×Y)-((+/X)×+/Y))÷((⍴X)×+/X*2)-(+/X)*2
B←((+/Y)-M×+/X)÷⍴X
∇