September 27, 2011

In today’s exercise we calculate some of the basic measures in statistics: mean, standard deviation, linear regression, and correlation. The only hard part is that different sources use different names and symbols for the same statistics. The formulas are shown below; all summations run over $i$ from 1 to $n$, the number of items:

mean: $\mu = \bar{x} = \frac{1}{n} \sum x_i$

standard deviation: $\sigma = s = \sqrt{\frac{1}{n} \sum (x_i - \mu)^2}$
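As a minimal sketch of the two formulas above, the following Python computes the mean and the standard deviation of a list of numbers. The function names `mean` and `std_dev` are illustrative choices, not part of the exercise, and the divisor is $n$ exactly as written above (a sample standard deviation would divide by $n - 1$ instead).

```python
import math


def mean(xs):
    """Arithmetic mean: (1/n) * sum of x_i."""
    n = len(xs)
    if n == 0:
        raise ValueError("mean requires at least one value")
    return sum(xs) / n


def std_dev(xs):
    """Standard deviation using the 1/n divisor from the formula above."""
    mu = mean(xs)
    # Average of the squared deviations from the mean, then the square root.
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data))     # 5.0
print(std_dev(data))  # 2.0
```

For the sample data the squared deviations sum to 32, so the standard deviation is $\sqrt{32/8} = 2$.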

linear regression: $y = mx+b = \hat{\beta}x + \hat{\alpha}$

slope: $m = \hat{\beta} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$

intercept: $b = \hat{\alpha} = \frac{1}{n} \sum y_i - \hat{\beta} \frac{1}{n} \sum x_i$
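The slope and intercept formulas can be sketched in Python as follows. The name `linear_regression` and the choice to return a `(slope, intercept)` pair are assumptions made for illustration; the arithmetic follows the two formulas term by term.

```python
def linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        raise ValueError("at least two points are required")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)

    # Denominator of the slope formula; zero when every x is identical.
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept


# Points lying exactly on y = 2x + 1.
print(linear_regression([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
```

The guard on the denominator is a design choice: a vertical set of points has no finite slope, so the sketch raises an error rather than dividing by zero.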

correlation: $r = \frac{\sum (x_i - \bar{x}) (y_i - \bar{y})}{n s_x s_y}$
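A hedged Python sketch of the correlation follows. It uses the algebraically equivalent form $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$, which is what the formula reduces to when $s_x$ and $s_y$ are computed with the same divisor as the factor in front of them, since that divisor then cancels. The function name `correlation` is an illustrative choice.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        raise ValueError("at least two points are required")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of products and of squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")

    return sxy / math.sqrt(sxx * syy)


# Perfectly linear data gives r = 1; reversing y gives r = -1.
print(correlation([1, 2, 3, 4], [3, 5, 7, 9]))  # 1.0
print(correlation([1, 2, 3, 4], [9, 7, 5, 3]))  # -1.0
```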

Your task is to write functions to compute these basic statistics. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
