\[ \begin{eqnarray} f'(x) & = & - 2 x \\ 0 & = & - 2 x^{*} \\ x^{*} & = & 0 \end{eqnarray} \]
\[ \begin{eqnarray} f^{'}(x) & = & - 2x \\ f^{''}(x) & = & - 2 \end{eqnarray} \]
\[ \begin{eqnarray} f'(x) & = & 3 x^2 \\ 0 & = & 3 (x^{*})^2 \\ x^{*} & = & 0 \end{eqnarray} \]
\[ \begin{eqnarray} f^{''}(x) & = & 6x \\ f^{''}(0) & = & 0 \end{eqnarray} \]
Let \(f:X \rightarrow \Re\) with \(X \subset \Re^{n}\). A vector \(\boldsymbol{x}^{*} \in X\) is a global maximum if , for all other \(\boldsymbol{x} \in X\)
\[ \begin{eqnarray} f(\boldsymbol{x}^{*}) & > & f(\boldsymbol{x} ) \nonumber \end{eqnarray} \]
A vector \(\boldsymbol{x}^{\text{local}}\) is a local maximum if there is a neighborhood around \(\boldsymbol{x}^{\text{local}}\), \(Q \subset X\) such that, for all \(x \in Q\),
\[ \begin{eqnarray} f(\boldsymbol{x}^{\text{local} }) & > & f(\boldsymbol{x} ) \end{eqnarray} \]
Suppose \(f:X \rightarrow \Re^{n}\) with \(X \subset \Re^{1}\) is a differentiable function. Define the gradient vector of \(f\) at \(\boldsymbol{x}_{0}\), \(\nabla f(\boldsymbol{x}_{0})\) as
\[ \begin{eqnarray} \nabla f (\boldsymbol{x}_{0}) & = & \left(\frac{\partial f (\boldsymbol{x}_{0}) }{\partial x_{1} }, \frac{\partial f (\boldsymbol{x}_{0}) }{\partial x_{2} }, \frac{\partial f (\boldsymbol{x}_{0}) }{\partial x_{3} }, \ldots, \frac{\partial f (\boldsymbol{x}_{0}) }{\partial x_{n} } \right) \end{eqnarray} \]
It is the first partial derivatives for each variable \(x_n\) stored in a vector. So if \(\boldsymbol{a} \in X\) is a local extremum, then,
\[ \begin{eqnarray} \nabla f(\boldsymbol{a}) & = & \boldsymbol{0} \\ & = & (0, 0, \ldots, 0) \end{eqnarray} \]
\[ \begin{eqnarray} f(x,y) &=& x^2+y^2 \\ \nabla f(x,y) &=& (2x, \, 2y) \end{eqnarray} \]
\[ \begin{eqnarray} f(x,y) &=& x^3 y^4 +e^x -\log y \\ \nabla f(x,y) &=& (3x^2 y^4 + e^x, \, 4x^3y^3 - \frac{1}{y}) \end{eqnarray} \]
Suppose \(f:X \rightarrow \Re^{1}\) , \(X \subset \Re^{n}\), with \(f\) a twice differentiable function. We will define the Hessian matrix as the matrix of second derivatives at \(\boldsymbol{x}^{*} \in X\),
\[ \begin{eqnarray} \boldsymbol{H}(f)(\boldsymbol{x}^{*} ) & = & \begin{pmatrix} \frac{\partial^{2} f }{\partial x_{1} \partial x_{1} } (\boldsymbol{x}^{*} ) & \frac{\partial^{2} f }{\partial x_{1} \partial x_{2} } (\boldsymbol{x}^{*} ) & \ldots & \frac{\partial^{2} f }{\partial x_{1} \partial x_{n} } (\boldsymbol{x}^{*} ) \\ \frac{\partial^{2} f }{\partial x_{2} \partial x_{1} } (\boldsymbol{x}^{*} ) & \frac{\partial^{2} f }{\partial x_{2} \partial x_{2} } (\boldsymbol{x}^{*} ) & \ldots & \frac{\partial^{2} f }{\partial x_{2} \partial x_{n} } (\boldsymbol{x}^{*} ) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^{2} f }{\partial x_{n} \partial x_{1} } (\boldsymbol{x}^{*} ) & \frac{\partial^{2} f }{\partial x_{n} \partial x_{2} } (\boldsymbol{x}^{*} ) & \ldots & \frac{\partial^{2} f }{\partial x_{n} \partial x_{n} } (\boldsymbol{x}^{*} ) \\ \end{pmatrix} \nonumber \end{eqnarray} \]
Requires differentiating on the entire gradient with respect to each \(x_n\)
\[ \begin{eqnarray} f(x,y) &=& x^2+y^2 \\ \nabla f(x,y) &=& (2x, \, 2y) \\ \boldsymbol{H}(f)(x,y) &=& \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \end{eqnarray} \]
\[ \begin{eqnarray} f(x,y) &=& x^3 y^4 +e^x -\log y \\ \nabla f(x,y) &=& (3x^2 y^4 + e^x, \, 4x^3y^3 - \frac{1}{y}) \\ \boldsymbol{H}(f)(x,y) &=& \begin{pmatrix} 6xy^4 + e^x & 12x^2y^3 \\ 12x^2y^3 & 12x^3y^2 + \frac{1}{y^2} \end{pmatrix} \end{eqnarray} \]
Consider \(n \times n\) matrix \(\boldsymbol{A}\). If, for all \(\boldsymbol{x} \in \Re^{n}\) where \(\boldsymbol{x} \neq 0\):
\[ \begin{eqnarray} \boldsymbol{x}^{'} \boldsymbol{A} \boldsymbol{x} & > & 0, \quad \text{ $\boldsymbol{A}$ is positive definite } \\ \boldsymbol{x}^{'} \boldsymbol{A} \boldsymbol{x} & < & 0, \quad \text{ $\boldsymbol{A}$ is negative definite } \end{eqnarray} \]
If \(\boldsymbol{x}^{'} \boldsymbol{A} \boldsymbol{x} >0\) for some \(\boldsymbol{x}\) and \(\boldsymbol{x}^{'} \boldsymbol{A} \boldsymbol{x}<0\) for other \(\boldsymbol{x}\), then we say \(\boldsymbol{A}\) is indefinite.
\[ \begin{eqnarray} \boldsymbol{H}(f)(\boldsymbol{a}) & = & \begin{pmatrix} A & B \\ B & C \\ \end{pmatrix} \end{eqnarray} \]
Suppose \(f:\Re^{2} \rightarrow \Re\) with
\[ \begin{eqnarray} f(x_{1}, x_{2}) & = & 3(x_1 + 2)^2 + 4(x_{2} + 4)^2 \nonumber \end{eqnarray} \]
Calculate gradient:
\[ \begin{eqnarray} \nabla f(\boldsymbol{x}) & = & (6 x_{1} + 12 , 8x_{2} + 32 ) \nonumber \\ \boldsymbol{0} & = & (6 x_{1}^{*} + 12 , 8x_{2}^{*} + 32 ) \nonumber \end{eqnarray} \]
We now solve the system of equations to yield
\[x_{1}^{*} = - 2, \quad x_{2}^{*} = -4\]
\[ \begin{eqnarray} \textbf{H}(f)(\boldsymbol{x}^{*}) & = & \begin{pmatrix} 6 & 0 \\ 0 & 8 \\ \end{pmatrix}\nonumber \end{eqnarray} \]
det\((\textbf{H}(f)(\boldsymbol{x}^{*}))\) = 48 and \(6>0\) so \(\textbf{H}(f)(\boldsymbol{x}^{*})\) is positive definite. \(\boldsymbol{x^{*}}\) is a local minimum.
\[ \begin{eqnarray} Y_{i} & \sim & \text{Normal}(\mu, \sigma^2) \\ \boldsymbol{Y} & = & (Y_{1}, Y_{2}, \ldots, Y_{n} ) \end{eqnarray} \]
Likelihood function
\[ \begin{eqnarray} L(\mu, \sigma^2 | \boldsymbol{Y} ) & \propto & \prod_{i=1}^{n} f(Y_{i}|\mu, \sigma^2) \\ &\propto & \prod_{i=1}^{N} \frac{\exp[ - \frac{ (Y_{i} - \mu)^2 }{2\sigma^2} ]}{\sqrt{2 \pi \sigma^2}} \\ & \propto & \frac{\exp[ -\sum_{i=1}^{n} \frac{(Y_{i} - \mu)^2}{2\sigma^2} ]}{ (2\pi)^{n/2} \sigma^{2n/2} } \end{eqnarray} \]
Log-likelihood
\[ \begin{eqnarray} \log L(\mu, \sigma^2|\boldsymbol{Y} ) & = & -\sum_{i=1}^{n} \frac{(Y_{i} - \mu)^2}{2\sigma^2} - \frac{n}{2} log(2 \pi) - \frac{n}{2} \log (\sigma^2) \end{eqnarray} \]
set.seed(1234)
y <- rnorm(n = 10000, mean = 0.25, sd = 10)
head(y)
## [1] -11.82 3.02 11.09 -23.21 4.54 5.31
# define the log-likelihood function for a normal distribution
log.like <- function(mu, sigma.2, y){
part1 <- - (1 / (2 * sigma.2)) * sum((y - mu)^2)
part2 <- - (length(y)/2) * log(sigma.2)
out <- part1 + part2
return(out)
}
# define search space
(search_space <- expand.grid(mu = seq(-2, 2, by = .01),
sigma2 = seq(8^2, 12^2, by = .1)) %>%
as_tibble)
## # A tibble: 321,201 x 2
## mu sigma2
## <dbl> <dbl>
## 1 -2 64
## 2 -1.99 64
## 3 -1.98 64
## 4 -1.97 64
## 5 -1.96 64
## 6 -1.95 64
## 7 -1.94 64
## 8 -1.93 64
## 9 -1.92 64
## 10 -1.91 64
## # ... with 321,191 more rows
# evaluate parameters
system.time((search_space <- search_space %>%
mutate(logLik = map2_dbl(mu, sigma2, log.like, y = y))))
## user system elapsed
## 11.87 3.59 15.65
# which parameter values maximize the likelihood function
filter(search_space, logLik == max(logLik))
## # A tibble: 1 x 3
## mu sigma2 logLik
## <dbl> <dbl> <dbl>
## 1 -1.55 142. Inf
# draw a heatmap of the log-likelihood values
ggplot(search_space, aes(mu, sigma2, fill = logLik)) +
geom_raster() +
geom_contour(aes(z = logLik)) +
scale_fill_continuous(type = "viridis") +
labs(x = expression(mu),
y = expression(sigma^2))
mean(y)
## [1] 0.311
var(y)
## [1] 97.5
\[ \begin{eqnarray} 0 & \cong & f(x_0) + \frac{f^{'}(x_0)}{1} (x_1 - x_0) \end{eqnarray} \]
\[ \begin{eqnarray} x_1 & \cong & x_0 - \frac{f(x_0)}{f'(x_0)} \end{eqnarray} \]
\[ x_{n+1} \cong x_n - \frac{f(x_n)}{f'(x_n)} \]
\[y = x^3 + 2x - 5\]
# define the function
func2 <- function(x) {
x^3 - 2 * x - 5
}
# draw a plot of the function
range <- data_frame(x = c(-5, 5))
f0 <- range %>%
ggplot(aes(x)) +
stat_function(fun = func2, size = .5, linetype = 2) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
labs(x = expression(x),
y = expression(f(x)))
f0
uniroot(func2, interval = c(2, 3))
## $root
## [1] 2.09
##
## $f.root
## [1] -0.000115
##
## $iter
## [1] 5
##
## $init.it
## [1] NA
##
## $estim.prec
## [1] 6.1e-05
# f - the function to optimize
# a - lower bound for the search
# b - upper bound for the search
# tol - tolerance (stopping criteria for the algorithm)
# n - maximum number of iterations to attempt. will not exceed even if
# tolerance is not achieved
newton_raphson <- function(f, a, b, tol = 1e-5, n_iter = 1000) {
require(numDeriv) # Package for computing f'(x)
x0 <- a # Set start value to supplied lower bound
k <- vector("numeric", length = n_iter) # Initialize for iteration results
# Check the upper and lower bounds to see if approximations result in 0
fa <- f(a)
if (fa == 0.0) {
return(a)
}
fb <- f(b)
if (fb == 0.0) {
return(b)
}
for (i in 1:n_iter) {
dx <- genD(func = f, x = x0)$D[1] # First-order derivative f'(x0)
x1 <- x0 - (f(x0) / dx) # Calculate next value x1
k[[i]] <- x1 # Store x1
# Once the difference between x0 and x1 becomes sufficiently small, output the results.
if (abs(x1 - x0) < tol) {
root.approx <- x1
res <- list('root approximation' = root.approx, 'iterations' = k[1:i])
return(res)
}
# If Newton-Raphson has not yet reached convergence set x1 as x0 and continue
x0 <- x1
}
print('Too many iterations in method')
}
newton_raphson(f = func2, a = 2, b = 3)
## $`root approximation`
## [1] 2.09
##
## $iterations
## [1] 2.10 2.09 2.09 2.09
Step size is proportional to the negative of the gradient of the function at the current point
\[\mathbf{a}_{n+1} = \mathbf{a}_n-\gamma\nabla F(\mathbf{a}_n)\]
\[\mathbf{x}_{n+1}=\mathbf{x}_n-\gamma_n \nabla F(\mathbf{x}_n),\ n \ge 0\]
\[f(\mathbf{x}_0)\ge f(\mathbf{x}_1)\ge f(\mathbf{x}_2)\ge \cdots\]
Good for convex functions with easily calculated gradients
# define the function
obj_func <- function(x) {
x^4 - 3 * x^3 + 2
}
# draw a plot of the function
range <- data_frame(x = c(-1, 3.25))
f0 <- range %>%
ggplot(aes(x)) +
stat_function(fun = obj_func, size = .5, linetype = 2) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
labs(x = expression(x),
y = expression(f(x)))
f0
\[\nabla f(x) = 4x^3 - 9x^2 = 0\]
# define the gradient
gradient <- function(x) {
(4 * x^3) - (9 * x^2)
}
f0 <- range %>%
ggplot(aes(x)) +
stat_function(fun = gradient, size = .5, linetype = 2) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
labs(x = expression(x),
y = expression(f(x)))
f0
# func - the function to optimize
# grad - gradient of the function to optimize
# stepsize - size of each step
# tol - tolerance (stopping criteria for the algorithm)
# iter - maximum number of iterations to attempt. will not exceed even if
# tolerance is not achieved
grad_desc <- function(func, grad, stepsize = 0.003, tol = 1e-5, iter = 500){
# randomly initialize a value to x
set.seed(100)
x <- floor(runif(1)*10)
# create a vector to contain all xs for all steps
x.All <- vector("numeric", iter)
# gradient descent method to find the minimum
for(i in 1:iter){
x1 <- x - stepsize * grad(x)
x.All[[i]] <- x1
# Once the difference between x0 and x1 becomes sufficiently small, output the results.
if (abs(x - x1) < tol) {
root.approx <- x
res <- list('root approximation' = root.approx, 'iterations' = x.All[1:i])
return(res)
}
# If gradient descent has not yet reached convergence set x as x0 and continue
x <- x1
}
print('Too many iterations in method')
}
grad_desc(func = obj_func, grad = gradient)
## $`root approximation`
## [1] 2.25
##
## $iterations
## [1] 2.92 2.85 2.79 2.74 2.70 2.66 2.62 2.59 2.56 2.54 2.52 2.50 2.48 2.46
## [15] 2.45 2.43 2.42 2.41 2.40 2.39 2.38 2.37 2.36 2.35 2.35 2.34 2.33 2.33
## [29] 2.32 2.32 2.31 2.31 2.31 2.30 2.30 2.30 2.29 2.29 2.29 2.29 2.28 2.28
## [43] 2.28 2.28 2.28 2.27 2.27 2.27 2.27 2.27 2.27 2.27 2.27 2.26 2.26 2.26
## [57] 2.26 2.26 2.26 2.26 2.26 2.26 2.26 2.26 2.26 2.26 2.26 2.26 2.26 2.26
## [71] 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25
## [85] 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25
## [99] 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25
## [113] 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25 2.25
## [127] 2.25
optimize()
and optim()
optim()
optimize()
politician.utility <- function(policy.content){
politician.support <- -(policy.content - 1)^2 + 8
return(politician.support)
}
range <- data_frame(x = c(-2, 4))
range %>%
ggplot(aes(x)) +
stat_function(fun = politician.utility, size = .5, linetype = 2) +
geom_hline(yintercept = 0) +
geom_vline(xintercept = 0) +
labs(title = "Legislator's Utility Function",
x = "Policy Ideology",
y = "Legislator Utility From Policy")
optimize(f = politician.utility, interval = c(-2, 4), maximum = TRUE)
## $maximum
## [1] 1
##
## $objective
## [1] 8
politician.utility.2d <- function(params){
# split parameters
economic.content <- params[1]
social.content <- params[2]
# calculate utility
politician.support <- (-(economic.content - 1)^2 + 8 ) + (-(social.content + 2)^2 + 8)
return(politician.support)
}
# generate data values
expand.grid(economic.substance = seq(from=-8,to=8, by=.2),
social.substance = seq(from=-8, to=8, by=.2)) %>%
as_tibble %>%
mutate(utility = map2_dbl(economic.substance, social.substance,
~ politician.utility.2d(c(.x, .y)))) %>%
ggplot(aes(economic.substance, social.substance, fill = utility)) +
geom_raster() +
geom_contour(aes(z = utility)) +
scale_fill_continuous(type = "viridis") +
labs(title = "Utility Based on Policy Content on Two Dimensions",
x = "Economic Policy Content",
y = "Social Policy Content")
optim(par = c(-1, 0), fn = politician.utility.2d, control = list(fnscale = -1))
## $par
## [1] 1 -2
##
## $value
## [1] 16
##
## $counts
## function gradient
## 61 NA
##
## $convergence
## [1] 0
##
## $message
## NULL