5 Descriptive statistics

OK, guys, by now you know how to:

  1. set a question,
  2. reformulate the question based on your literature review,
  3. set up an error-proof experiment,
  4. enter and manipulate data in R, and
  5. make some basic plots in R.

Nice, eh?

Now we move on to the analysis of any data you collect. That process is called inference; basically, drawing a conclusion based on the numbers and your interpretation of those numbers.

Inference is defined as the process of drawing conclusions based on evidence and reasoning.

Once you finish an experiment and collect your data, there are a few things to do:

First, you need to visualize the data, using the plots you learned to make in the previous chapter.

Second, you need to use a set of available metrics to describe your data in general. You need to get the big picture first. Those BIG-picture metrics are called DESCRIPTIVE STATISTICS.

Say you get hired to analyze the visiting times of customers in a store. The manager wants to ensure his customers are always being attended to, so he wants to hire more people at peak hours.

To tackle this problem, you probably want to record the time at which people walk into the store. Let’s say 100 people came in on a given day.

If you want to report your findings to the manager, you cannot just go and tell him: the first customer came in at 7:00 am, the second at 7:10 am, the third at 8:20 am... (one hour later)... and the 100th customer came in at 7:00 pm.

Very likely there is a pattern in those data, but the manager cannot really perceive it when the data are presented in raw form. So he cannot make any decision to solve his problem based on what you did, and he may not have a reason to keep you.

What you really want to do is describe those data for him, so as to facilitate his decision.
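To make that idea concrete, here is a minimal sketch in R of one way to turn 100 raw arrival times into something the manager can read at a glance: a count of arrivals per hour. The data are made up (simulated with `sample()`), and the names `arrival_hour`, `hour_weights`, and `arrivals_per_hour` are just illustrative choices, not anything collected from a real store.

```r
# HYPOTHETICAL data: the hour of the day (7 = 7 am, ..., 18 = 6 pm) in which
# each of 100 customers walked in. The weights below are invented so that
# some hours are busier than others; they are not real store records.
set.seed(1)
hour_weights <- c(1, 1, 1, 2, 3, 5, 4, 2, 2, 3, 4, 2)
arrival_hour <- sample(7:18, size = 100, replace = TRUE, prob = hour_weights)

# Count how many customers arrived in each one-hour block.
# Using a factor with fixed levels keeps hours with zero arrivals in the table.
arrivals_per_hour <- table(factor(arrival_hour, levels = 7:18))
arrivals_per_hour

# The hour block with the most arrivals (a candidate "peak hour")
busiest_hour <- names(arrivals_per_hour)[which.max(arrivals_per_hour)]
busiest_hour
```

A table of twelve counts is a lot easier to act on than a list of 100 clock times, which is the whole point of describing data before reporting it.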

The metrics used to describe data overall are called descriptive statistics. Such metrics are commonly divided into those dealing with what is in the middle of your data (the so-called metrics of central tendency) and those dealing with how variable your data are (the so-called metrics of dispersion).
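As a quick preview of those two families, the sketch below computes a few common metrics with base R functions. The vector `arrival_hour` is a hypothetical example (ten made-up arrival times, in hours since midnight), and the choice of metrics shown here is an assumption about what is most commonly used, not a complete list.

```r
# HYPOTHETICAL data: arrival times of 10 customers, in hours since midnight
arrival_hour <- c(7, 9, 11, 12, 12, 13, 13, 17, 18, 19)

# Metrics of central tendency: what is in the middle of the data
arrival_mean   <- mean(arrival_hour)    # arithmetic average
arrival_median <- median(arrival_hour)  # middle value of the sorted data

# Metrics of dispersion: how variable the data are
arrival_sd    <- sd(arrival_hour)       # standard deviation
arrival_range <- range(arrival_hour)    # smallest and largest values
arrival_iqr   <- IQR(arrival_hour)      # spread of the middle 50% of the data

arrival_mean
arrival_median
arrival_sd
arrival_range
arrival_iqr
```

Two numbers from the first group and one or two from the second are often enough to give someone the big picture of a data set.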