Due before class on November 13th.
Fork the hw05 repository
Go here to fork the repo for homework 05.
Complete each of the following exercises. Some exercises require an analytical answer; others require you to write code. When writing your answer to analytical exercises, be sure to use appropriate \(\LaTeX\) mathematical notation.
\[\newcommand{\E}{\mathrm{E}} \newcommand{\Var}{\mathrm{Var}} \newcommand{\sd}{\mathrm{sd}} \newcommand{\Cov}{\mathrm{Cov}}\]
Let \(X\) be uniformly distributed in the unit interval \([0,1]\). Consider the random variable \(Y = g(X)\), where \[g(x) = \left\{ \begin{array}{ll} 1 & \quad \text{if } x \leq 1/3 \\ 2 & \quad \text{if } x > 1/3 \end{array} \right.\]
Find the expected value of \(Y\) by first calculating its probability mass function (PMF). Verify the result using the expected value rule.
Let \(X\) be a random variable with PDF \[ f_X(x) = \left\{ \begin{array}{ll} \dfrac{x}{4} & \quad \text{if } 1 < x \leq 3 \\ 0 & \quad \text{otherwise} \end{array} \right. \] and let \(A\) be the event \(\{ X \geq 2 \}\).
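Analytical work with a density and a conditioning event can be sanity-checked numerically. The sketch below is a check, not a substitute for the derivation. It uses base R's integrate() to confirm that the density integrates to 1, to approximate \(\Pr(A)\), and to confirm that the density rescaled by \(\Pr(A)\) is itself a valid PDF on \([2, 3]\). The names f_X, p_A, and f_X_given_A are illustrative and not part of the assignment.

```r
# Density from the exercise: x / 4 on (1, 3], zero elsewhere.
f_X <- function(x) ifelse(x > 1 & x <= 3, x / 4, 0)

# A valid PDF integrates to 1 over its support.
total_mass <- integrate(f_X, lower = 1, upper = 3)$value

# P(A) = P(X >= 2) is the area under the density from 2 to 3.
p_A <- integrate(f_X, lower = 2, upper = 3)$value

# Conditional density of X given A: rescale f_X by P(A) on [2, 3].
f_X_given_A <- function(x) ifelse(x >= 2 & x <= 3, f_X(x) / p_A, 0)
cond_mass <- integrate(f_X_given_A, lower = 2, upper = 3)$value

c(total_mass = total_mass, p_A = p_A, cond_mass = cond_mass)
```

The same pattern extends to conditional expectations: integrate x * f_X_given_A(x) over \([2, 3]\) and compare the result with the hand calculation.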
Write a function mean_lik() for the log-likelihood of the sample mean. This function should take a vector of guesses for the value of the sample mean and a vector of data for which we want to estimate the sample mean. The function should return the resulting log-likelihood for each guess of the sample mean.
Load the trade.csv data file from the data/ folder.
Use mean_lik() to calculate the log-likelihood over a range of guesses for the mean of the income variable in the trade data frame. Plot the results with the parameter guesses on the x-axis and the resulting log-likelihoods on the y-axis. Label the axes in the graph and provide a title.
Using optim() and your function mean_lik(), find the maximum value of the log-likelihood function. At what parameter value is the log-likelihood of the sample mean of income maximized?
Use the mean() function on income. Do you get the same estimate from optim() and mean()?
Now let’s use this same basic approach to examine the likelihood for logistic regression, a type of regression sometimes employed instead of ordinary least squares (OLS) when a dependent variable is assumed to be drawn from a Bernoulli probability distribution.
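The basic approach referred to here (write a log-likelihood function, evaluate it over a grid of guesses, then hand it to optim()) might look like the following minimal sketch. It runs on simulated data rather than trade.csv, and it assumes a normal model with the standard deviation fixed at its sample value. That modelling choice, and the helper name norm_mean_llik, are assumptions made for illustration, not the required solution.

```r
# Simulated data stand in for a real variable; nothing here reads trade.csv.
set.seed(1234)
x <- rnorm(500, mean = 10, sd = 2)

# Hypothetical helper: log-likelihood of each guessed mean under a normal
# model, with the standard deviation fixed at the sample value (an assumption).
norm_mean_llik <- function(guesses, data) {
  data <- data[!is.na(data)]
  vapply(
    guesses,
    function(mu) sum(dnorm(data, mean = mu, sd = sd(data), log = TRUE)),
    numeric(1)
  )
}

# Evaluate the log-likelihood over a grid of guesses.
guesses <- seq(from = 8, to = 12, by = 0.01)
llik <- norm_mean_llik(guesses, x)
grid_best <- guesses[which.max(llik)]

# optim() minimizes by default, so fnscale = -1 turns it into a maximizer.
fit <- optim(
  par = 9, fn = norm_mean_llik, data = x,
  method = "BFGS", control = list(fnscale = -1)
)

# Grid maximizer, optim() maximizer, and the closed-form answer side by side.
c(grid = grid_best, optim = fit$par, mean = mean(x))
```

The grid gives a picture of the curve's shape, while optim() refines the location of the peak beyond the grid's resolution. The same two-step pattern carries over to models with several parameters.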
Consider the following log-likelihood function for a logistic regression:
\[ \begin{align} L(\beta | y_i, x_{1i}, x_{2i}) &= \prod_{i=1}^n \left[ \left( \frac{1}{1 + \exp(-(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}))} \right)^{y_i} \left(1 - \frac{1}{1 + \exp(-(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}))} \right)^{1 - y_i} \right] \\ \log L(\beta | y_i, x_{1i}, x_{2i}) &= \sum_{i=1}^n \left[ -y_i \log(1 + \exp (-(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}))) + (1 - y_i) \log \left(1 - \frac{1}{1 + \exp(-(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}))} \right) \right] \end{align} \]
where \(y\) is a vector containing the dependent variable, \(x_1\) is a vector containing the first independent variable, and \(x_2\) is a vector containing the second independent variable.
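Writing \(\eta_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}\), each term of the log-likelihood reduces algebraically to \(y_i \eta_i - \log(1 + \exp(\eta_i))\), which is compact to code. The sketch below evaluates that form on simulated data, maximizes it with optim(), and compares the result with glm(). The data, the true coefficients, and the names logit_llik, x1, x2, and y are all hypothetical. The function takes plain vectors rather than a data frame, so it illustrates the technique without matching the interface asked for in the exercise.

```r
# Simulated data; variable names and true coefficients are illustrative only.
set.seed(1234)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
eta_true <- -0.5 + 1.0 * x1 - 0.75 * x2
y <- rbinom(n, size = 1, prob = plogis(eta_true))

# Hypothetical helper: Bernoulli log-likelihood with a logit link.
# Uses y * eta - log(1 + exp(eta)), which is algebraically equal to the
# two-term form given in the text.
logit_llik <- function(beta, y, x1, x2) {
  eta <- beta[1] + beta[2] * x1 + beta[3] * x2
  sum(y * eta - log1p(exp(eta)))
}

# Maximize with optim(); fnscale = -1 switches from minimizing to maximizing.
fit_optim <- optim(
  par = c(0, 0, 0), fn = logit_llik,
  y = y, x1 = x1, x2 = x2,
  method = "BFGS", control = list(fnscale = -1)
)

# glm() fits the same model by iteratively reweighted least squares.
fit_glm <- glm(y ~ x1 + x2, family = binomial)

# Estimates from both routes, one row each.
rbind(optim = fit_optim$par, glm = unname(coef(fit_glm)))
```

The two rows should agree to several decimal places. Larger differences usually point to poor starting values or to predictors on very different scales.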
Write a function llik_logistic() that accepts as arguments a vector of three parameter values, a data frame, and some method for identifying the dependent and independent variables in the model. The function should then return the log-likelihood for those parameter values. Be sure, either in the function or before passing in the data frame, to remove any rows with missing values.
Use a grid search to find the parameter values that maximize the log-likelihood of a model in which free_trade_support is the dependent variable and income and education are the independent variables. Iterate over all possible combinations of the intercept seq(from = -1, to = 1, by = 0.05), the income parameter seq(from = 0.000005, to = 0.00001, by = 0.000001), and the education parameter seq(from = 0, to = 1, by = 0.05).
Use the optim() function to calculate the parameters that maximize the log-likelihood for the same variables as before.
Use the glm() function with family = binomial and the same model you used above. Compare the resulting estimates to what you obtained using your own function with both the grid search and optim() approaches.
Your assignment should be submitted as an R Markdown document rendered as an HTML/PDF document. Don’t know what an R Markdown document is? Read this! Or this! I have included starter files for you to modify to complete the assignment, so you are not beginning completely from scratch.
Follow instructions on homework workflow. As part of the pull request, you’re encouraged to reflect on what was hard/easy, problems you solved, helpful tutorials you read, etc.
This work is licensed under the CC BY-NC 4.0 Creative Commons License.