Linear Regression

June 10, 2016

Linear regression is a widely used statistical technique for relating two sets of variables, traditionally called x and y; the goal is to find the line of best fit, y = m x + b, that most closely relates the two sets. The formulas for computing the line of best fit are:

m = (n × Σxy − Σx × Σy) ÷ (n × Σx² − (Σx)²)

b = (Σy − m × Σx) ÷ n
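
For readers curious where these formulas come from, here is a sketch of the usual least-squares derivation. The symbol S (the sum of squared residuals) and the index i are notation introduced here for illustration; they do not appear in the text above.

```latex
\begin{aligned}
S(m, b) &= \sum_{i=1}^{n} (y_i - m x_i - b)^2 \\
\frac{\partial S}{\partial b} = 0 &\;\Rightarrow\; \sum y = m \sum x + n b \\
\frac{\partial S}{\partial m} = 0 &\;\Rightarrow\; \sum xy = m \sum x^2 + b \sum x \\
m &= \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2},
\qquad
b = \frac{\sum y - m \sum x}{n}
\end{aligned}
```

Solving the first condition for b and substituting it into the second gives the expression for m.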

You can find those formulas in any statistics textbook. As an example, given the sets of variables

x    y
60   3.1
61   3.6
62   3.8
63   4.0
65   4.1

the line of best fit is y = 0.1878 x − 7.9635, and the estimated value of y at the missing x = 64 is 4.06.
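
As a rough check on the example, the two formulas can be sketched in a few lines of Python. This is only one possible approach, not the suggested solution; the function name `linear_regression` and the error handling are assumptions made for illustration.

```python
def linear_regression(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")

    # The four sums that appear in the formulas.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)

    # The denominator is zero when there are no points
    # or when every x value is the same.
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x values")

    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - m * sum_x) / n
    return m, b


xs = [60, 61, 62, 63, 65]
ys = [3.1, 3.6, 3.8, 4.0, 4.1]
m, b = linear_regression(xs, ys)
print(f"slope m     = {m:.4f}")
print(f"intercept b = {b:.4f}")
print(f"y at x = 64 = {m * 64 + b:.2f}")
```

Run on the table above, this should report a slope of about 0.1878, an intercept of about −7.9635, and an estimate of about 4.06 at x = 64.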

Your task is to write a program that calculates the slope m and intercept b for two sets of variables x and y. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

5 Responses to “Linear Regression”

  1. Daniel said

    Here’s a solution in matlab. The same approach can be used for multiple regression, but X would have additional columns for the additional variables.

    x = [60;61;62;63;65];
    y = [3.1;3.6;3.8;4.0;4.1];
    
    X = [ones(size(x,1),1), x];
    
    w = X \ y
    
    y_hat = [1,64] * w
    

    Output:

    w =
       -7.9635
        0.1878
    
    y_hat =
        4.0581
    
  2. Jeff said

    Shouldn’t the calculation for b be:

    b = (Σy / n – m × Σx / n)

  3. programmingpraxis said

    Fixed. Thank you.

  4. Jeff said

    Thanks for your website!

  5. davor said

    Doing it in APL felt like cheating.

    ∇X LINREG Y
    M←(((⍴X)×+/X×Y)-((+/X)×+/Y))÷((⍴X)×+/X*2)-(+/X)*2
    B←((+/Y)-M×+/X)÷⍴X
    ∇
