Visualizations and the Grammar of Graphics

MACS 33001 University of Chicago

ID \(N\) \(\bar{X}\) \(\bar{Y}\) \(\sigma_{X}\) \(\sigma_{Y}\) \(R\)
1 142 54.26327 47.83225 16.76514 26.93540 -0.0644719
2 142 54.26610 47.83472 16.76983 26.93974 -0.0641284
3 142 54.26144 47.83025 16.76590 26.93988 -0.0617148
4 142 54.26993 47.83699 16.76996 26.93768 -0.0694456
5 142 54.26015 47.83972 16.76996 26.93000 -0.0655833
6 142 54.26734 47.83955 16.76896 26.93027 -0.0629611
7 142 54.26881 47.83545 16.76670 26.94000 -0.0685042
8 142 54.26030 47.83983 16.76774 26.93019 -0.0603414
9 142 54.26732 47.83772 16.76001 26.93004 -0.0683434
10 142 54.26873 47.83082 16.76924 26.93573 -0.0685864
11 142 54.26588 47.83150 16.76885 26.93861 -0.0686092
12 142 54.26785 47.83590 16.76676 26.93610 -0.0689797
13 142 54.26692 47.83160 16.77000 26.93790 -0.0665752

Grammar

The whole system and structure of a language or of languages in general, usually taken as consisting of syntax and morphology (including inflections) and sometimes also phonology and semantics.

Grammar of graphics

  • “The fundamental principles or rules of an art or science”
  • A grammar used to describe and create a wide range of statistical graphics
  • Layered grammar of graphics
    • ggplot2

Layered grammar of graphics

  • Layer
    • Data
    • Mapping
    • Statistical transformation (stat)
    • Geometric object (geom)
    • Position adjustment (position)
  • Scale
  • Coordinate system (coord)
  • Faceting (facet)
  • Defaults
    • Data
    • Mapping

Layer

  • Responsible for creating the objects that we perceive on the plot
  • Defined by its subcomponents

Data and mapping

  • Data defines the source of the information to be visualized
  • Mapping defines how the variables are applied to the graphic

Data: mpg

## # A tibble: 234 x 11
##    manufacturer model displ  year   cyl trans drv     cty   hwy fl    cla…
##    <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <ch>
##  1 audi         a4      1.8  1999     4 auto… f        18    29 p     com…
##  2 audi         a4      1.8  1999     4 manu… f        21    29 p     com…
##  3 audi         a4      2    2008     4 manu… f        20    31 p     com…
##  4 audi         a4      2    2008     4 auto… f        21    30 p     com…
##  5 audi         a4      2.8  1999     6 auto… f        16    26 p     com…
##  6 audi         a4      2.8  1999     6 manu… f        18    26 p     com…
##  7 audi         a4      3.1  2008     6 auto… f        18    27 p     com…
##  8 audi         a4 q…   1.8  1999     4 manu… 4        18    26 p     com…
##  9 audi         a4 q…   1.8  1999     4 auto… 4        16    25 p     com…
## 10 audi         a4 q…   2    2008     4 manu… 4        20    28 p     com…
## # ... with 224 more rows

Data: mpg

## # A tibble: 234 x 2
##    displ   hwy
##    <dbl> <int>
##  1   1.8    29
##  2   1.8    29
##  3   2      31
##  4   2      30
##  5   2.8    26
##  6   2.8    26
##  7   3.1    27
##  8   1.8    26
##  9   1.8    25
## 10   2      28
## # ... with 224 more rows

Mapping: mpg

## # A tibble: 234 x 2
##        x     y
##    <dbl> <int>
##  1   1.8    29
##  2   1.8    29
##  3   2      31
##  4   2      30
##  5   2.8    26
##  6   2.8    26
##  7   3.1    27
##  8   1.8    26
##  9   1.8    25
## 10   2      28
## # ... with 224 more rows
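
The renaming shown above can be reproduced by hand. A minimal sketch, assuming the ggplot2 and dplyr packages are installed; `mapped` is a name introduced here for illustration and is not part of ggplot2:

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# Keep only the mapped variables and rename each one to the
# aesthetic it feeds: displ becomes x, hwy becomes y
mapped <- mpg %>%
  select(x = displ, y = hwy)

mapped
```

In practice `aes(x = displ, y = hwy)` records this pairing and ggplot2 applies it when the plot is built, so the renaming never has to be done manually.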

Statistical transformation (stat)

  • Transforms the data (typically by summarizing the information)

Raw data

## # A tibble: 234 x 1
##      cyl
##    <int>
##  1     4
##  2     4
##  3     4
##  4     4
##  5     6
##  6     6
##  7     6
##  8     4
##  9     4
## 10     4
## # ... with 224 more rows

Transformed data

## # A tibble: 4 x 2
##     cyl     n
##   <int> <int>
## 1     4    81
## 2     5     4
## 3     6    79
## 4     8    70
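
The step from raw to transformed data is a count per value of `cyl`. A minimal sketch, assuming ggplot2 and dplyr are installed; `cyl_counts` and `p` are illustrative names:

```r
library(ggplot2)
library(dplyr)

# Manual equivalent of the counting transformation
cyl_counts <- count(mpg, cyl)
cyl_counts

# geom_bar() uses stat = "count" by default, so it performs the same tally
p <- ggplot(mpg, aes(x = cyl)) +
  geom_bar()

# layer_data() returns the layer's data after the stat has been applied
layer_data(p)[, c("x", "count")]
```

Both routes should report the same four counts; the bar chart simply draws one bar per row of the transformed data.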

Geometric objects (geoms)

  • Control the type of plot you create
    • 0 dimensions - point, text
    • 1 dimension - path, line
    • 2 dimensions - polygon, interval
  • Geoms have specific aesthetics
    • Point geom - position, color, shape, and size
    • Bar geom - position, height, width, and fill
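
The aesthetics listed above can be set or mapped directly in code. A minimal sketch, assuming ggplot2 is installed; the specific shape, size, and width values are arbitrary choices for illustration:

```r
library(ggplot2)

# Point geom: position (x, y) plus color, shape, and size
p_point <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(shape = 17, size = 2)

# Bar geom: position (x), height (from the count stat), width, and fill
p_bar <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(width = 0.7)

p_point
p_bar
```

Aesthetics placed inside `aes()` vary with the data; those passed as plain arguments (`shape = 17`, `width = 0.7`) are fixed for the whole layer.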

Position adjustment
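
A position adjustment decides what happens when objects in one layer would overlap. A minimal sketch of four adjustments that ggplot2 provides, assuming ggplot2 is installed; the choice of variables is illustrative:

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = class, fill = drv))

# Bars for each drv value stacked on top of one another
p_stack <- base + geom_bar(position = "stack")

# Bars placed side by side
p_dodge <- base + geom_bar(position = "dodge")

# Bars stacked and rescaled so every stack has height 1 (proportions)
p_fill <- base + geom_bar(position = "fill")

# Small random noise added so overlapping points become visible
p_jitter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(position = "jitter")
```

The data, mapping, stat, and geom are identical across the three bar charts; only the position adjustment changes.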

Scale

  • Controls the mapping from data to aesthetic attributes

Scale: color
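
Changing the scale changes how data values are turned into colors without touching the data or the mapping. A minimal sketch, assuming ggplot2 is installed; the "Dark2" palette is an arbitrary choice for illustration:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()

# Evenly spaced hues around the color wheel
p_hue <- p + scale_color_hue()

# Same data and mapping, different scale: a ColorBrewer palette
p_brewer <- p + scale_color_brewer(palette = "Dark2")
```

Each of the seven values of `class` is assigned one color by the scale, and the legend is generated from that same scale.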

Coordinate system (coord)

  • Maps the position of objects onto the plane of the plot

Cartesian coordinate system

Semi-log

Polar
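
The three systems named above can be sketched as follows, assuming ggplot2 is installed. The semi-log version here uses a log10 scale rather than a coordinate transformation; that is a substitution made for this sketch, and it transforms the data before any statistics are computed:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Cartesian: the default, written out explicitly
p_cartesian <- p + coord_cartesian()

# Semi-log: y is shown on a log10 axis, x stays linear
p_semilog <- p + scale_y_log10()

# Polar: a single stacked bar bent around a circle becomes a pie chart
p_polar <- ggplot(mpg, aes(x = factor(1), fill = class)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")
```

The polar example reuses the bar geom unchanged; only the coordinate system differs from an ordinary stacked bar chart.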

Faceting
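
Faceting splits the data into subsets and draws the same plot for each one. A minimal sketch, assuming ggplot2 is installed; faceting on `class` is an illustrative choice:

```r
library(ggplot2)

# One panel per value of class, each showing the same displ/hwy scatterplot
p_facet <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class)

p_facet
```

By default the panels share the same x and y scales, which keeps them directly comparable.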

Defaults

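# Full specification: every component of the layer, scales, and coord spelled out
ggplot() +
  layer(
    data = mpg, mapping = aes(x = displ, y = hwy),
    geom = "point", stat = "identity", position = "identity"
  ) +
  scale_x_continuous() +
  scale_y_continuous() +
  coord_cartesian()

# Conceptually, the stat, position, scales, and coord can fall back to defaults.
# (ggplot2's layer() itself still expects stat and position to be supplied.)
ggplot() +
  layer(
    data = mpg, mapping = aes(x = displ, y = hwy),
    geom = "point"
  )

# geom_point() fills in those defaults; data and mapping move into ggplot()
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Argument names dropped: positional matching gives the same plot
ggplot(mpg, aes(displ, hwy)) +
  geom_point()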
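
One way to check that the shortest form really describes the same plot as the full specification is to compare the data each layer ends up drawing. A minimal sketch, assuming ggplot2 is installed; `p_verbose` and `p_terse` are illustrative names:

```r
library(ggplot2)

# Every component written out
p_verbose <- ggplot() +
  layer(
    data = mpg, mapping = aes(x = displ, y = hwy),
    geom = "point", stat = "identity", position = "identity"
  ) +
  scale_x_continuous() +
  scale_y_continuous() +
  coord_cartesian()

# Everything left to the defaults
p_terse <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# Compare the positions that each plot will draw
all.equal(
  layer_data(p_verbose)[, c("x", "y")],
  layer_data(p_terse)[, c("x", "y")]
)
```

The defaults only save typing; they do not change which components make up the plot.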

Defaults

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()

Defaults

ggplot(mpg) +
  geom_point(aes(displ, hwy)) +
  geom_smooth()
## Error: stat_smooth requires the following missing aesthetics: x, y
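
The error arises because the mapping was given to `geom_point()` only, so `geom_smooth()` has no x or y to work with. Two possible fixes, sketched under the assumption that ggplot2 is installed; `p_shared` and `p_local` are illustrative names:

```r
library(ggplot2)

# Option 1: put the mapping in ggplot() so every layer inherits it
p_shared <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()

# Option 2: keep layer-level mappings, but give each layer its own
p_local <- ggplot(mpg) +
  geom_point(aes(displ, hwy)) +
  geom_smooth(aes(displ, hwy))
```

Option 1 is shorter when all layers share the same variables; option 2 is useful when layers need different mappings.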

Carte figurative des pertes successives en hommes de l’Armée Française dans la campagne de Russie 1812–1813 by Charles Joseph Minard

Building Minard’s map in R

troops
## # A tibble: 51 x 5
##     long   lat survivors direction group
##    <dbl> <dbl>     <int> <chr>     <int>
##  1  24    54.9    340000 A             1
##  2  24.5  55      340000 A             1
##  3  25.5  54.5    340000 A             1
##  4  26    54.7    320000 A             1
##  5  27    54.8    300000 A             1
##  6  28    54.9    280000 A             1
##  7  28.5  55      240000 A             1
##  8  29    55.1    210000 A             1
##  9  30    55.2    180000 A             1
## 10  30.3  55.3    175000 A             1
## # ... with 41 more rows
cities
## # A tibble: 20 x 3
##     long   lat city          
##    <dbl> <dbl> <chr>         
##  1  24    55   Kowno         
##  2  25.3  54.7 Wilna         
##  3  26.4  54.4 Smorgoni      
##  4  26.8  54.3 Moiodexno     
##  5  27.7  55.2 Gloubokoe     
##  6  27.6  53.9 Minsk         
##  7  28.5  54.3 Studienska    
##  8  28.7  55.5 Polotzk       
##  9  29.2  54.4 Bobr          
## 10  30.2  55.3 Witebsk       
## 11  30.4  54.5 Orscha        
## 12  30.4  53.9 Mohilow       
## 13  32    54.8 Smolensk      
## 14  33.2  54.9 Dorogobouge   
## 15  34.3  55.2 Wixma         
## 16  34.4  55.5 Chjat         
## 17  36    55.5 Mojaisk       
## 18  37.6  55.8 Moscou        
## 19  36.6  55.3 Tarantino     
## 20  36.5  55   Malo-Jarosewii

Minard’s grammar

  • Troops
    • Latitude
    • Longitude
    • Survivors
    • Advance/retreat
  • Cities
    • Latitude
    • Longitude
    • City name

plot_troops <- ggplot(data = troops,
                      mapping = aes(x = long, y = lat)) +
  geom_path(aes(size = survivors,
                color = direction,
                group = group))
plot_troops

plot_both <- plot_troops + 
  geom_text(data = cities, mapping = aes(label = city), size = 4)
plot_both

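plot_polished <- plot_both +
  # Line width range and legend breaks for the number of survivors
  scale_size(range = c(0, 12),
             breaks = c(10000, 20000, 30000),
             labels = c("10,000", "20,000", "30,000")) +
  # Colors are assigned in order to the sorted values of direction
  scale_color_manual(values = c("tan", "grey50")) +
  # Map projection; coord_map() needs the mapproj package installed
  coord_map() +
  labs(title = "Map of Napoleon's Russian campaign of 1812",
       x = NULL,
       y = NULL)
plot_polished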

plot_polished +
  theme_void() +
  theme(legend.position = "none")