Due before class on October 16th.
hw02 repository
Go here to fork the repo for homework 02.
FiveThirtyEight, a data journalism site devoted to politics, sports, science, economics, and culture, recently published a series of articles on gun deaths in America. Gun violence in the United States is a significant political issue, and while reducing gun deaths is a noble goal, we must first understand the causes and patterns of gun violence in order to craft appropriate policies. As part of the project, FiveThirtyEight collected data from the Centers for Disease Control and Prevention, as well as other governmental agencies and non-profits, on all gun deaths in the United States from 2012 to 2014.
I have included this dataset in the rcfss library on GitHub. To install the package, use the command devtools::install_github("uc-cfss/rcfss") in R. If you don’t already have the devtools library installed, you will get an error. Go back and install it first using install.packages("devtools"), then install rcfss. The gun deaths dataset can be loaded using data("gun_deaths"). Use the help function in R (?gun_deaths) to get detailed information on the variables and coding information.
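The installation steps above can be collected into one short script. This is a minimal sketch: needs_install() is a hypothetical helper introduced here for illustration (it is not part of devtools or rcfss), and the install calls are wrapped in interactive() so they run only in a live R session, since both need an internet connection.

```r
# Hypothetical helper (not part of devtools or rcfss):
# returns TRUE when a package is not yet installed.
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# Run the installs only in an interactive session; both need internet access.
if (interactive()) {
  if (needs_install("devtools")) install.packages("devtools")
  if (needs_install("rcfss")) devtools::install_github("uc-cfss/rcfss")

  library(rcfss)
  data("gun_deaths")   # load the gun deaths dataset
  help("gun_deaths")   # same as ?gun_deaths: variables and coding information
}
```

Checking before installing avoids reinstalling both packages every time the script is run.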
Answer the following questions. Generate appropriate figures/tables to support your conclusions.
While you are practicing data analysis, your final graphs should be appropriate for sharing with outsiders. That means your graphs should have a title and labeled axes (see ?labs for details).
This is just a starting point. Consider adopting your own color scales, taking control of your legends (if any), playing around with themes, etc.
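As an illustration of these graph requirements, here is a sketch of a labeled bar chart built with ggplot2. So that it runs without rcfss installed, it assumes a small data frame typed in from the by-location counts printed in this assignment; in your own work the counts would come from count(gun_deaths, place). The name place_counts and the choice of theme_minimal() are assumptions made for this example.

```r
library(ggplot2)

# Counts by location, copied from the table in this assignment
# (in practice: count(gun_deaths, place)).
place_counts <- data.frame(
  place = c("Farm", "Home", "Industrial/construction", "Other specified",
            "Other unspecified", "Residential institution",
            "School/instiution", "Sports", "Street", "Trade/service area", NA),
  n = c(470, 60486, 248, 13751, 8867, 203, 671, 128, 11151, 3439, 1384)
)

# Horizontal bar chart; labs() supplies the title and both axis labels
p <- ggplot(place_counts, aes(x = n, y = place)) +
  geom_col() +
  labs(
    title = "Gun deaths in the United States (2012-2014), by location",
    x = "Number of deaths",
    y = "Location"
  ) +
  theme_minimal()
```

Printing p, or leaving it as the last line of an R Markdown chunk, draws the chart.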
When presenting tabular data (e.g., the output of dplyr::summarize()), make sure you format it correctly. Use the kable() function from the knitr package to format the table for the final document. For instance, this is a poorly presented table summarizing where gun deaths occurred:
library(tidyverse)
library(knitr)
library(rcfss)
# calculate total gun deaths by location
count(gun_deaths, place)
## # A tibble: 11 x 2
##    place                       n
##    <chr>                   <int>
##  1 Farm                      470
##  2 Home                    60486
##  3 Industrial/construction   248
##  4 Other specified         13751
##  5 Other unspecified        8867
##  6 Residential institution   203
##  7 School/instiution         671
##  8 Sports                    128
##  9 Street                  11151
## 10 Trade/service area       3439
## 11 <NA>                     1384
Instead, use kable() to format the table, add a caption, and label the columns:
count(gun_deaths, place) %>%
  kable(caption = "Gun deaths in the United States (2012-2014), by location",
        col.names = c("Location", "Number of deaths"))
Location | Number of deaths |
---|---|
Farm | 470 |
Home | 60486 |
Industrial/construction | 248 |
Other specified | 13751 |
Other unspecified | 8867 |
Residential institution | 203 |
School/instiution | 671 |
Sports | 128 |
Street | 11151 |
Trade/service area | 3439 |
NA | 1384 |
Run ?kable in the console to see additional options.
Note that when viewed on GitHub, table captions will not show up. Just a (missing) feature of Markdown on GitHub 😔
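To give a sense of those additional kable() options, the sketch below right-aligns the counts and adds thousands separators. It assumes a small data frame holding a few of the by-location counts shown above rather than the full gun_deaths data, so the names place_counts and tbl are illustrative only.

```r
library(knitr)

# A few rows of the by-location counts shown above
place_counts <- data.frame(
  place = c("Home", "Other specified", "Street", "Trade/service area"),
  n = c(60486L, 13751L, 11151L, 3439L)
)

tbl <- kable(
  place_counts,
  caption = "Gun deaths in the United States (2012-2014), selected locations",
  col.names = c("Location", "Number of deaths"),
  align = c("l", "r"),                 # left-align text, right-align counts
  format.args = list(big.mark = ",")   # print 60486 as 60,486
)

tbl
```

The format.args list is passed on to format(), so other formatting arguments accepted by format() can be supplied the same way.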
In the rcfss package, there is a data frame called dadmom.
## # A tibble: 3 x 5
##   famid named  incd namem  incm
##   <dbl> <chr> <dbl> <chr> <dbl>
## 1     1 Bill  30000 Bess  15000
## 2     2 Art   22000 Amy   18000
## 3     3 Paul  25000 Pat   50000
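Tidy this data frame so that it adheres to the tidy data principles: each variable has its own column, each observation has its own row, and each value has its own cell.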
NOTE: You can accomplish this task in a single piped operation using only tidyr functions. Code which does not use tidyr functions is acceptable, but will not merit a “check plus” on your evaluation.
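For a sense of what a single piped tidyr operation looks like, here is a sketch on an unrelated toy table. The data frame scores_wide and its values are hypothetical, invented for this example, and it is not a solution for dadmom, which needs its own reshaping steps.

```r
library(tidyr)

# Hypothetical wide data: one column per subject/term combination
scores_wide <- data.frame(
  student = c("s1", "s2"),
  math_fall = c(80, 90),
  math_spring = c(85, 95)
)

# Split each column name into two variables and stack the values,
# giving one row per student and term
scores_long <- scores_wide %>%
  pivot_longer(
    cols = -student,
    names_to = c("subject", "term"),
    names_sep = "_",
    values_to = "score"
  )

scores_long
```

The result has one row for each student and term, with subject, term, and score each in their own column.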
Recall the gapminder data frame we previously explored. That data frame contains just six columns from the larger data in Gapminder World. In this part, you will join the original gapminder data frame with a new data file containing the HIV prevalence rate in each country.[1]
The HIV prevalence rate is stored in the data folder as a CSV file. You need to import and merge the data with gapminder to answer these two questions:
For each question, you need to perform a specific type of join operation. Think about what type makes the most sense and explain why you chose it.
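To see how the choice of join changes the result, consider this sketch using dplyr. The two small tables, their column names other than country, year, and lifeExp, and all of their values are hypothetical and made up for illustration; they stand in for gapminder and the HIV file, whose actual file name and columns are not given here.

```r
library(dplyr)

# Hypothetical toy tables (values are made up for illustration)
life <- tibble(
  country = c("A", "B", "C"),
  year    = c(2007, 2007, 2007),
  lifeExp = c(70, 65, 80)
)
hiv <- tibble(
  country  = c("B", "C", "D"),
  year     = c(2007, 2007, 2007),
  hiv_rate = c(1.2, 0.1, 3.4)
)

keys <- c("country", "year")

inner     <- inner_join(life, hiv, by = keys)  # only rows present in both tables
left      <- left_join(life, hiv, by = keys)   # every row of life; NA where hiv has no match
full      <- full_join(life, hiv, by = keys)   # every row from either table
unmatched <- anti_join(life, hiv, by = keys)   # rows of life with no match in hiv
```

Comparing nrow() across the four results (2, 3, 4, and 1 here) is a quick way to check which observations each join keeps or drops.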
Your assignment should be submitted as a set of R Markdown documents. Don’t know what an R Markdown document is? Read this! Or this! I have included starter files for you to modify to complete the assignment, so you are not beginning completely from scratch.
Follow instructions on homework workflow. As part of the pull request, you’re encouraged to reflect on what was hard/easy, problems you solved, helpful tutorials you read, etc.
[1] More specifically, the estimated number of people living with HIV per 100 population of age group 15-49.
This work is licensed under the CC BY-NC 4.0 Creative Commons License.