The slope
Why not use Ms. Smith's way to calculate the slope for the regression model? Well, we use the same principle, the difference in Y divided by the difference in X, but we cannot use the same formula because a regression model has more than two points.
There are numerous ways to calculate the slope, \(m\), of a linear regression model. However, the simplest is:
\[\begin{equation} Slope = m = \frac{\sum(x-\bar{x})*(y-\bar{y})}{\sum(x-\bar{x})^2} \end{equation}\]
We have seen those terms before. The numerator appears in the covariance (i.e., how two variables trend together) and the denominator appears in the variance (i.e., how dispersed the data are in one variable).
If you think about it, that equation speaks for itself.
As we mentioned earlier, the slope of any line can be described as the change in Y divided by the change in X.
From Chapter 5, the Section on Dispersion, you may recall that the best indicator of the variability in a variable was the variance, which includes the term \(\sum(x-\bar{x})^2\). From Chapter 6, you may recall that the best indicator of the tendency between two variables was the covariance, which includes the term \(\sum(x-\bar{x})*(y-\bar{y})\).
As such, if we want the slope among a set of points that follows their central tendency, then the change in X will be \(\sum(x-\bar{x})^2\), and the change in Y will be how Y varies with X, which mathematically is \(\sum(x-\bar{x})*(y-\bar{y})\).
You can phrase the equation for the slope of the regression line in a different way: if X changes by the variance of X, then Y will change by the covariance of Y and X.
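To see why the slope can be read in terms of the variance and covariance, note that both are the sums above divided by the same quantity. Assuming the sample versions, which divide by \(n-1\) where \(n\) is the number of points, that divisor cancels out (the same happens if both are divided by \(n\)):

\[\begin{equation} m = \frac{\sum(x-\bar{x})*(y-\bar{y})}{\sum(x-\bar{x})^2} = \frac{\sum(x-\bar{x})*(y-\bar{y})/(n-1)}{\sum(x-\bar{x})^2/(n-1)} = \frac{cov(x,y)}{var(x)} \end{equation}\]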
Ok, now that the equation for the slope is clear, let's calculate it.
Let's use the data we have been using on the time studying and grades:
Name | Hours Studying (x) | Grade (y) | \((x-\bar{x})\) | \((y-\bar{y})\) |
---|---|---|---|---|
Peter | 0.5 | 55 | -2.1 | -19.2 |
Laura | 1.8 | 64 | -0.8 | -10.2 |
John | 2.4 | 75 | -0.2 | 0.8 |
Chip | 3.8 | 82 | 1.2 | 7.8 |
Tom | 4.5 | 95 | 1.9 | 20.8 |
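
If you want to check the last two columns of the table yourself, the deviations can be reproduced with a few lines of Python. This is only a sketch: the variable names (`hours`, `grades`, `dev_hours`, `dev_grades`) are made up for illustration, and the results are rounded to one decimal to match the table.

```python
# Hours studying (x) and grades (y), copied from the table
hours = [0.5, 1.8, 2.4, 3.8, 4.5]
grades = [55, 64, 75, 82, 95]

# Mean of each variable
mean_hours = sum(hours) / len(hours)
mean_grades = sum(grades) / len(grades)

# Deviation of each observation from its mean, rounded to one decimal
dev_hours = [round(x - mean_hours, 1) for x in hours]
dev_grades = [round(y - mean_grades, 1) for y in grades]

print(dev_hours)   # [-2.1, -0.8, -0.2, 1.2, 1.9]
print(dev_grades)  # [-19.2, -10.2, 0.8, 7.8, 20.8]
```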
So, all we have to do is substitute the differences in X and the differences in Y into the slope formula:
\[\begin{equation} slope = m = \frac{(-2.1*-19.2) + (-0.8*-10.2) + (-0.2*0.8) + (1.2*7.8) +(1.9*20.8)} { (-2.1)^2 + (-0.8)^2 + (-0.2)^2 + (1.2)^2 +(1.9)^2} \end{equation}\]
\[\begin{equation} slope = m = \frac{97.2}{10.14} \approx 9.59 \end{equation}\]
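The same calculation can be sketched in Python as a check on the arithmetic. The function name `slope` and the variable names are chosen here for illustration only. The function applies the formula for the slope given earlier: the sum of the products of the deviations, divided by the sum of the squared deviations in X.

```python
# Hours studying (x) and grades (y), copied from the table
hours = [0.5, 1.8, 2.4, 3.8, 4.5]
grades = [55, 64, 75, 82, 95]


def slope(x, y):
    """Slope of the least-squares line through the points (x, y)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: how X and Y vary together
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: how X varies on its own
    denominator = sum((xi - mean_x) ** 2 for xi in x)
    return numerator / denominator


m = slope(hours, grades)
print(round(m, 2))  # 9.59
```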
So the slope, m, is equal to 9.59. Its units are the units of Y (i.e., grade points) divided by the units of X (i.e., hours studying). So if X changes by one hour, then Y changes by 9.59 grade points.
Put another way, for each extra hour that you study a week, you can expect your grade to be 9.59 points higher. Neat or what?
I should mention that there are several other ways to calculate the slope of the linear regression model, but I find them a bit more complicated and difficult to understand. I prefer to use the simple formula above, but be aware there are a few other ways to get to the slope of the least-squares line.
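As one illustration of such an alternative (chosen here as an example, not necessarily one the text has in mind), a commonly used equivalent works directly from the raw sums, without first computing the deviations: \(m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}\). A minimal sketch, with variable names made up for illustration, shows that it gives the same slope for our data:

```python
# Hours studying (x) and grades (y), copied from the table
hours = [0.5, 1.8, 2.4, 3.8, 4.5]
grades = [55, 64, 75, 82, 95]

n = len(hours)
sum_x = sum(hours)
sum_y = sum(grades)
sum_xy = sum(x * y for x, y in zip(hours, grades))
sum_x2 = sum(x ** 2 for x in hours)

# Raw-sums version of the least-squares slope
m_alt = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
print(round(m_alt, 2))  # 9.59
```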