Statistics

September 27, 2011

In today’s exercise we calculate some of the basic measures in statistics: mean, standard deviation, linear regression, and correlation. The only hard part is that different sources use different names and symbols for the same statistics. The formulas are shown below; all summations run over $i$ from 1 to $n$, the number of items:

mean: $\mu = \bar{x} = \frac{1}{n} \sum x_i$

standard deviation: $\sigma = s = \sqrt{\frac{1}{n} \sum (x_i - \mu)^2}$
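As a minimal sketch of the two formulas so far, here is one way to write them in Python. The function names `mean` and `std_dev` are illustrative choices, not part of the exercise, and the standard deviation uses the $\frac{1}{n}$ divisor exactly as written above.

```python
import math


def mean(xs):
    """Arithmetic mean: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)


def std_dev(xs):
    """Standard deviation with the 1/n divisor, following the formula above."""
    mu = mean(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
```

For the data `[2, 4, 4, 4, 5, 5, 7, 9]` the squared deviations sum to 32, so `mean` gives 5.0 and `std_dev` gives 2.0. Both functions raise `ZeroDivisionError` on an empty list.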

linear regression: $y = mx+b = \hat{\beta}x + \hat{\alpha}$

slope: $m = \hat{\beta} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$

intercept: $b = \hat{\alpha} = \frac{1}{n} \sum y_i - \hat{\beta} \frac{1}{n} \sum x_i = \bar{y} - \hat{\beta} \bar{x}$
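The slope and intercept formulas translate almost directly into code. The sketch below is one possible version; the name `linear_regression` and the choice to return a `(slope, intercept)` tuple are assumptions made for illustration.

```python
def linear_regression(xs, ys):
    """Least-squares line y = m*x + b; returns (m, b)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Slope: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # Intercept: mean(y) - slope * mean(x)
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept
```

With `xs = [1, 2, 3, 4]` and `ys = [3, 5, 7, 9]`, which lie exactly on $y = 2x + 1$, the function returns `(2.0, 1.0)`. If every $x_i$ is the same, the denominator of the slope is zero and the call raises `ZeroDivisionError`, since no unique line fits such data.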

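correlation: $r = \frac{\sum (x_i - \bar{x}) (y_i - \bar{y})}{(n-1) s_x s_y}$, where $s_x$ and $s_y$ are the standard deviations of the $x_i$ and the $y_i$ computed with a divisor of $n-1$ rather than the $n$ used in the standard deviation formula above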
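A short sketch of the correlation formula follows. It assumes that $s_x$ and $s_y$ are computed with a divisor of $n-1$, which is what the $(n-1)$ factor in the denominator requires for $r$ to stay between $-1$ and $1$; the function name `correlation` is an illustrative choice.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient r of paired samples xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Assumption: s_x and s_y use the n-1 divisor to match the (n-1) below.
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)
```

For `xs = [1, 2, 3]` and `ys = [1, 3, 2]` the cross-product sum is 1 and both standard deviations are 1, so the result is 0.5. Data that lie exactly on a rising line give a value of 1, up to floating-point rounding.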

Your task is to write functions to compute these basic statistics. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

