Part 7 Describing the data
7.1 The summary function
The summary() function is convenient. We can describe all of the variables in a data frame by calling the function on our data frame object, dat:
## PID Lik1 Lik2 Lik3 Lik4
## Min. : 1.00 Min. :1.00 Min. :1.00 Min. :2 Min. :1.00
## 1st Qu.: 3.25 1st Qu.:2.00 1st Qu.:2.00 1st Qu.:3 1st Qu.:1.25
## Median : 5.50 Median :3.00 Median :2.00 Median :3 Median :3.00
## Mean : 5.50 Mean :3.10 Mean :2.70 Mean :3 Mean :2.70
## 3rd Qu.: 7.75 3rd Qu.:4.75 3rd Qu.:3.75 3rd Qu.:3 3rd Qu.:3.75
## Max. :10.00 Max. :5.00 Max. :5.00 Max. :4 Max. :5.00
## Lik5 Teacher
## Min. :1.00 Length:10
## 1st Qu.:2.00 Class :character
## Median :2.50 Mode :character
## Mean :2.80
## 3rd Qu.:3.75
## Max. :5.00
We can also see that Teacher is not a numeric variable. For this variable, summary() does not return the mean, median, and so forth, because those would not make sense with non-numeric data. Instead, because Teacher is stored as a character variable, only its length, class, and mode are reported.
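To see how summary() treats numeric and non-numeric columns, here is a minimal sketch. The data frame below is a hypothetical stand-in for dat: the Teacher labels "A" and "B" and all values are invented for illustration and are not the tutorial's data.

```r
# Hypothetical stand-in for `dat`; all values are invented for illustration
dat <- data.frame(
  PID     = 1:10,
  Lik1    = c(1, 1, 2, 2, 2, 4, 4, 5, 5, 5),
  Teacher = rep(c("A", "B"), each = 5),
  stringsAsFactors = FALSE   # keep Teacher as a character column
)

# Numeric columns get min, quartiles, median, mean, and max;
# the character column gets only Length, Class, and Mode
summary(dat)

# Converting the column to a factor makes summary() count
# the observations in each category instead
teacher_counts <- summary(factor(dat$Teacher))
teacher_counts
```

If counts per category are what we want, converting a character variable to a factor (or using table()) is one way to get them.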
We can apply the summary() function to a single variable:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 3.00 3.10 4.75 5.00
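A call of roughly the following form would produce a single-variable summary like the one above. This is a sketch under assumptions: the values are illustrative ones chosen so that the statistics match the Lik1 column in the earlier output, and their order (and the tutorial's real data) is unknown.

```r
# Illustrative values chosen to match the Lik1 summary statistics shown
# in the text; this is an assumption, not the tutorial's actual data
dat <- data.frame(
  PID  = 1:10,
  Lik1 = c(1, 1, 2, 2, 2, 4, 4, 5, 5, 5)
)

# The `$` operator extracts one column from the data frame as a vector
lik1_summary <- summary(dat$Lik1)
lik1_summary
```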
We can use many functions on subsets of data, using indexing. Which rows are included in the following summary function?
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 3.000 4.000 3.667 4.500 5.000
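As a minimal sketch of how indexing restricts a summary to certain rows, consider the following. The data values and the row selections here are hypothetical; they are not necessarily the ones behind the output above.

```r
# Hypothetical data; values are invented for illustration
dat <- data.frame(
  PID  = 1:10,
  Lik1 = c(3, 1, 4, 2, 5, 2, 4, 1, 5, 4)
)

# Square brackets take [rows, columns]: rows 1 to 3 of the Lik1 column
first_three <- summary(dat[1:3, "Lik1"])
first_three

# A logical condition also selects rows: only participants with PID above 5
later_pids <- summary(dat$Lik1[dat$PID > 5])
later_pids
```

The part inside the brackets before the comma (or the index applied to the extracted column) tells us which rows are included.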
Let’s use summary() on the Likerts object we created above.[17]
## Lik1 Lik2 Lik3 Lik4 Lik5
## Min. :1.00 Min. :1.00 Min. :2 Min. :1.00 Min. :1.00
## 1st Qu.:2.00 1st Qu.:2.00 1st Qu.:3 1st Qu.:1.25 1st Qu.:2.00
## Median :3.00 Median :2.00 Median :3 Median :3.00 Median :2.50
## Mean :3.10 Mean :2.70 Mean :3 Mean :2.70 Mean :2.80
## 3rd Qu.:4.75 3rd Qu.:3.75 3rd Qu.:3 3rd Qu.:3.75 3rd Qu.:3.75
## Max. :5.00 Max. :5.00 Max. :4 Max. :5.00 Max. :5.00
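The sketch below shows one way an object like Likerts could be built and summarized. How Likerts was actually created is not shown in this part, so the column selection here is an assumption, and the data values are invented.

```r
# Hypothetical data; values are invented for illustration
dat <- data.frame(
  PID     = 1:4,
  Lik1    = c(1, 3, 5, 2),
  Lik2    = c(2, 2, 4, 1),
  Teacher = c("A", "A", "B", "B")
)

# Variable (column) names are quoted strings when we select them...
Likerts <- dat[, c("Lik1", "Lik2")]

# ...but the object name itself is written without quotes
summary(Likerts)
```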
7.2 Specific functions for summarizing data
We can also apply functions like min(), max(), and median() and assign their results to objects that we can use later:
The objects are named minX, maxX, and medX.
Let’s view the objects in the console by typing them and running them:
Here’s the output:
## [1] 1
## [1] 3
## [1] 5
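The assignments and the console calls might look like the sketch below. It assumes the variable being summarized is Lik1 (the output is consistent with that column's summary), and it uses illustrative values chosen to match those statistics rather than the tutorial's actual data.

```r
# Assumed stand-in for dat$Lik1; illustrative values only
x <- c(1, 1, 2, 2, 2, 4, 4, 5, 5, 5)

# Store each statistic in its own object
minX <- min(x)
maxX <- max(x)
medX <- median(x)

# Typing an object's name and running the line prints its value
minX
medX
maxX
```

Printing in this order gives 1, 3, and 5, matching the output above.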
Let’s subtract minX from maxX to have a look at the range:
## [1] 4
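A sketch of the subtraction, again using illustrative values assumed to stand in for Lik1. The object name rangeX is hypothetical.

```r
# Assumed stand-in for dat$Lik1; illustrative values only
x <- c(1, 1, 2, 2, 2, 4, 4, 5, 5, 5)
minX <- min(x)
maxX <- max(x)

# The range as a single number: largest value minus smallest value
rangeX <- maxX - minX
rangeX

# Base R's range() returns the minimum and maximum as a pair,
# so diff(range(x)) gives the same number in one step
diff(range(x))
```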
There are many functions we can use. For example, we can get the mean, variance, and standard deviation of a variable using these functions:
## [1] 3.1
## [1] 2.766667
## [1] 1.66333
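These three values can be obtained with mean(), var(), and sd(), as in the sketch below. The values are illustrative ones assumed to stand in for Lik1, and the object names meanX, varX, and sdX are hypothetical.

```r
# Assumed stand-in for dat$Lik1; illustrative values only
x <- c(1, 1, 2, 2, 2, 4, 4, 5, 5, 5)

meanX <- mean(x)  # arithmetic mean
varX  <- var(x)   # sample variance (divides by n - 1)
sdX   <- sd(x)    # sample standard deviation

meanX
varX
sdX

# The standard deviation is the square root of the variance
all.equal(sdX, sqrt(varX))
```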
A word of caution about Likert-type data such as these: we should probably not estimate the mean, variance, and standard deviation of individual Likert-type variables that have fewer than five categories. These statistics treat a variable as continuous; that is, they assume that the conceptual distance between any two neighboring points on the numbered scale is the same as the distance between any other two neighboring points.[18]
[17] Remember that we do not use quotes with objects (but that we did with names of variables).
[18] Mean and variance are more appropriate with continuous, interval-level variables than with ordinal-level variables. Next in this tutorial, we create composite scores, which we can probably more comfortably treat as continuous.