Outliers

One of the critical issues with regression models is that they can be strongly influenced by extreme points. Extreme points that clearly do not follow the main pattern of the data are called outliers.

Sometimes outliers are measurement errors, but they can also indicate the influence of variables that you did not measure. It is always good practice to visualize the data in a scatterplot to see whether any outliers exist in your data.

The influence of outliers can also be checked mathematically, by re-running the linear model without them and comparing the result (for example, the slope and intercept) with the model fitted to all the data.
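The comparison described above can be sketched in a few lines of Python. This is only a minimal illustration, not the method used for the figure in this section: the data are made up, the helper fit_line is a hypothetical name, and the suspected outlier is assumed to be already identified (here, the last point).

```python
import numpy as np


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


# Hypothetical data: y follows 2x + 1 exactly, except for the last point.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 2.0 * x + 1.0
y[-1] = 40.0  # assumed outlier; the main pattern would give 13 here

# Boolean mask that keeps every point except the suspected outlier.
keep = np.ones(len(x), dtype=bool)
keep[-1] = False

# Fit the model twice: once with all points, once without the outlier.
slope_all, intercept_all = fit_line(x, y)
slope_clean, intercept_clean = fit_line(x[keep], y[keep])

print(f"with outlier:    slope={slope_all:.2f}, intercept={intercept_all:.2f}")
print(f"without outlier: slope={slope_clean:.2f}, intercept={intercept_clean:.2f}")
```

If the two fits differ noticeably, as they do with these made-up numbers, the point is influential and deserves a closer look before you decide whether to keep it.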

There are several approaches for testing whether the effect of removing an outlier on the linear model is significant. We will not cover them here, but you should be aware that they exist.

Figure 7.9: The outlier