Engineering Math

The Calculus of Variations

Thus far, we have been concerned with the optimization of functions, whose arguments each correspond to a single value. The calculus of variations is a branch of mathematics that extends optimization to functionals, whose arguments are entire functions. It is closely related to functional analysis, which studies spaces of functions. The calculus of variations is used to find the function that minimizes or maximizes a given functional. In other words, the calculus of variations attempts to optimize an objective functional by finding the function that produces the optimal output value.

The most important objective functional in the calculus of variations is the definite integral of a function \(f(x, y(x), y'(x))\) over an interval \([a, b]\), where \(y(x)\) is the function to be optimized: \[ J[y] = \int_a^b f(x, y(x), y'(x)) \, dx. \qquad{(1)}\] We use the notation \(J[y]\) to denote the functional, which varies with function \(y(x)\). The goal of the calculus of variations is to find the function \(y(x)\) that minimizes or maximizes the integral \(J[y]\) subject to the boundary conditions \(y(a) = y_a\) and \(y(b) = y_b\).
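To make this concrete, a functional of the form (1) maps an entire function to a single number. The short sketch below (an illustration, not part of the text; the integrand \(f = y'^2 + y^2\) is chosen arbitrarily) evaluates such a functional numerically for two candidate functions.

```python
import numpy as np

def trapezoid(f, x):
    """Trapezoid-rule quadrature (avoids NumPy-version-specific names)."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def J(y, x):
    """Objective functional J[y] = integral of (y'^2 + y^2) dx over [0, 1].

    The integrand f(x, y, y') = y'^2 + y^2 is an arbitrary choice, used
    only to show that a functional maps a whole function to one number.
    """
    yprime = np.gradient(y, x)
    return trapezoid(yprime**2 + y**2, x)

x = np.linspace(0.0, 1.0, 10001)
print("J[y] for y(x) = x:  ", J(x, x))      # analytically 4/3
print("J[y] for y(x) = x^2:", J(x**2, x))   # analytically 4/3 + 1/5
```

Each call feeds in a whole function (sampled on a grid) and returns one number, which is what makes \(J\) a functional rather than an ordinary function.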

The calculus of variations is used in many areas of mathematics and physics, including the study of geodesics in Riemannian geometry, the path of least time in optics, and the principle of least action in classical mechanics. The last example is particularly important, as it is the basis for the Lagrangian formulation and the Hamiltonian formulation of classical mechanics.

The Euler-Lagrange Equation

The key result in the calculus of variations is the Euler-Lagrange equation, which provides a necessary condition for a function to be an extremum of a functional. We will derive the Euler-Lagrange equation for the functional \(J[y]\) defined above.

Let \(y = \hat{y}(x)\) be a function that minimizes or maximizes the objective functional \(J[y]\) given in (1). We will consider a perturbation of the optimal solution \(\hat{y}\) constructed by adding a function \(\eta(x)\) scaled by a small parameter \(\epsilon\), \[ y(x) = \hat{y}(x) + \epsilon \eta(x), \] where \(\eta(x)\) is a smooth function that satisfies the boundary conditions \(\eta(a) = \eta(b) = 0\). The function \(\epsilon \eta(x)\) is called a variation of the function \(y(x)\). There are important details regarding the norm used to quantify the size of the variation (Kot 2014, sec. 2.3), but we will not delve into them here. We assume here that \(\eta(x)\) is independent of \(\epsilon\).

Thus we can consider the functional \(J[y]\) to be equivalent to a function of the parameter \(\epsilon\): \[ J(\epsilon) \equiv J[\hat{y} + \epsilon \eta] = \int_a^b f(x, \hat{y} + \epsilon \eta, \hat{y}' + \epsilon \eta') \, dx. \] The function \(J(\epsilon)\) is minimized or maximized at \(\epsilon = 0\) (i.e., \(y = \hat{y}\)). The total variation is \[ \Delta J = J(\epsilon) - J(0) = \int_a^b f(x, \hat{y} + \epsilon \eta, \hat{y}' + \epsilon \eta') - f(x, \hat{y}, \hat{y}') \, dx. \] The Taylor expansion of the total variation is \[ \Delta J = \delta J + \frac{1}{2} \delta^2 J + \mathcal{O}(\epsilon^3), \] where1 \[ \delta J = \epsilon \left. \frac{d J(\epsilon)}{d\epsilon} \right|_{\epsilon = 0} = \epsilon \int_a^b f_y(x, \hat{y}, \hat{y}') \eta + f_{y'}(x, \hat{y}, \hat{y}') \eta' \, dx \qquad{(2)}\] is called the first variation and \[ \delta^2 J = \epsilon^2 \left. \frac{d^2 J(\epsilon)}{d\epsilon^2} \right|_{\epsilon = 0} = \epsilon^2 \int_a^b f_{yy}(x, \hat{y}, \hat{y}') \eta^2 + 2 f_{y y'}(x, \hat{y}, \hat{y}') \eta \eta' + f_{y' y'}(x, \hat{y}, \hat{y}') \eta'^2 \, dx \] is called the second variation.

If the functional \(J[y]\) is minimized at \(\epsilon = 0\), for sufficiently small \(\epsilon\), the total variation must satisfy \[\Delta J \ge 0.\] In other words, \(\Delta J\) cannot be negative for any \(\epsilon\). However, the first variation \(\delta J\) is linear in \(\epsilon\), so if it were nonzero it would change sign with \(\epsilon\) and be negative for either \(\epsilon < 0\) or \(\epsilon > 0\). Because, for sufficiently small \(\epsilon\), the linear term \(\delta J\) dominates \(\Delta J\), we conclude that \(\delta J = 0\). This condition is analogous to the first-derivative test for extrema in single-variable calculus. Moreover, for a minimum, the second variation \(\delta^2 J\) must be nonnegative for any \(\epsilon\), \[ \delta^2 J \ge 0. \] This condition is analogous to the second-derivative test for extrema in single-variable calculus.
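These conditions can be checked numerically for a concrete case. The sketch below (an illustration with an arbitrarily chosen functional, not from the text) uses \(J[y] = \int_0^1 y'^2 \, dx\) with \(y(0) = 0\) and \(y(1) = 1\), whose minimizer is \(\hat{y}(x) = x\), together with the admissible variation \(\eta(x) = \sin(\pi x)\); \(J(\epsilon)\) should be minimized at \(\epsilon = 0\) with a vanishing first derivative there.

```python
import numpy as np

def trapezoid(f, x):
    """Trapezoid-rule quadrature."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

x = np.linspace(0.0, 1.0, 10001)

def J(eps):
    """J(eps) = integral of (y-hat' + eps*eta')^2 dx, with y-hat(x) = x
    and eta(x) = sin(pi x), which vanishes at both endpoints."""
    yprime = 1.0 + eps * np.pi * np.cos(np.pi * x)  # (y-hat + eps*eta)'
    return trapezoid(yprime**2, x)

# J(eps) should have its minimum at eps = 0, where y = y-hat.
for eps in (-0.2, -0.1, 0.0, 0.1, 0.2):
    print(f"eps = {eps:+.1f}   J = {J(eps):.6f}")

# Central-difference estimate of dJ/d(eps) at eps = 0; it should vanish,
# which is the first-variation condition.
h = 1e-4
print("dJ/deps at eps = 0:", (J(h) - J(-h)) / (2 * h))
```

Analytically \(J(\epsilon) = 1 + \epsilon^2 \pi^2 / 2\), so the first derivative at \(\epsilon = 0\) is zero and the second is positive, matching the first- and second-variation conditions for a minimum.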

Conversely, if the functional \(J[y]\) is maximized at \(\epsilon = 0\), the total variation must satisfy \[\delta J = 0\quad\text{and}\quad\delta^2 J \le 0.\]

We can summarize the first-variation results in the following lemma.

First Variation Condition

If \(y = \hat{y}(x)\) is an extremum of the functional \(J[y]\) defined in (1), then the first variation \(\delta J\) of the functional must satisfy the condition \[ \delta J = 0 \] for \(y = \hat{y}(x)\) and for any admissible variation \(\epsilon \eta(x)\) of the function \(y\). Note that this is a necessary condition for an extremum of the functional.

The first variation \(\delta J\) of (2) is unwieldy. Integration by parts can simplify the expression to (Kot 2014, sec. 2.3.1) \[ \delta J = \epsilon \int_a^b M(x) \eta(x) \, dx, \] where \[ M(x) = f_y(x, \hat{y}, \hat{y}') - \frac{d}{dx} f_{y'}(x, \hat{y}, \hat{y}'). \qquad{(3)}\] The first variation condition requires that \(\delta J = 0\). Although it is not obvious, this implies that \(M(x)\) must be zero for all \(x \in [a, b]\), as is stated in the following lemma.

Fundamental Lemma of the Calculus of Variations

If all of the following conditions are met:

  • \(M(x)\) is a continuous real-valued function on \([a, b]\)
  • \(\eta(x)\) and \(\eta'(x)\) are continuous functions on \([a, b]\)
  • \(\eta(a) = \eta(b) = 0\)
  • \(\int_a^b M(x) \eta(x) \, dx = 0,\)

then \(M(x) = 0\) for all \(x \in [a, b]\).

The truth of this lemma is not immediately obvious, but it is well established (Kot 2014, sec. 2.3.1).

An important consequence of this lemma is the Euler-Lagrange equation, which is the main result of the calculus of variations. From the lemma, we have that \(M(x) = 0\) for all \(x \in [a, b]\), so (3) can be set to zero. This is most commonly written as \[ \frac{\partial f}{\partial y} - \frac{d}{dx} \frac{\partial f}{\partial y'} = 0. \qquad{(4)}\] This is the Euler-Lagrange equation for the functional \(J[y]\) defined in (1). We summarize this result in the following lemma.

Euler-Lagrange Equation

If \(y = \hat{y}(x)\) is an extremum of the functional \(J[y]\) defined in (1), then the function \(\hat{y}(x)\) must satisfy the Euler-Lagrange equation \[ \frac{\partial f}{\partial y} - \frac{d}{dx} \frac{\partial f}{\partial y'} = 0. \] This is a necessary condition for an extremum of the functional.

In this form, the Euler-Lagrange equation can be applied to functionals of the form of (1) to find a necessary condition for the function that minimizes or maximizes the functional. Let us consider a simple example to illustrate the application of the Euler-Lagrange equation.

Example 9.3

Consider the problem of finding the shortest path between two points \(p_1 = (x_1, y_1)\) and \(p_2 = (x_2, y_2)\) in the plane. We know intuitively that the shortest path is a straight line. However, we can use the calculus of variations to prove this fact.

For a sufficiently small segment of the path, the segment is approximately straight, and its length \(ds\) is given by the Pythagorean theorem: \[ ds = \sqrt{dx^2 + dy^2}, \] where \(dx\) and \(dy\) are the corresponding changes in \(x\) and \(y\) along the segment. The total length of the path is the integral of \(ds\) over the path: \[ J[y] = \int_{p_1}^{p_2} \, ds. \] We have called the functional \(J[y]\) because it is the functional we would like to minimize (i.e., it is the objective functional). Using the chain rule, \[ dy = \frac{dy}{dx} dx = y'(x) \, dx, \] so we can rewrite \(ds\) such that the integrand is a function of \(x\), \(y\), and \(y' = dy/dx\): \[ ds = \sqrt{1 + y'(x)^2} \, dx. \] So the functional can be written as \[ J[y] = \int_{x_1}^{x_2} \sqrt{1 + y'(x)^2} \, dx. \] We can apply the Euler-Lagrange equation to this functional to find the function \(y(x)\) that minimizes the path length. Identifying \(f(x, y, y') = \sqrt{1 + y'(x)^2}\), we have \[ \frac{\partial f}{\partial y} = 0 \quad\text{and}\quad \frac{\partial f}{\partial y'} = \frac{y'}{\sqrt{1 + y'(x)^2}}. \] So the Euler-Lagrange equation becomes \[ \frac{d}{dx} \frac{y'}{\sqrt{1 + y'(x)^2}} = 0. \] This implies that for some real constant \(C_1\), \[ \frac{y'}{\sqrt{1 + y'(x)^2}} = C_1, \] which can be solved to show that \(y'(x)\) is a real constant \(C_2\). Thus \(y(x) = C_2 x + C_3\) for real constants \(C_2\) and \(C_3\), which is the equation of a straight line, as we expected.

Note that this result is just a necessary condition for the shortest path. More is required to show that this is also a sufficient condition.
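The conclusion of Example 9.3 can also be checked numerically. The sketch below (an illustration, not from the text) compares the arc length of the straight line from \((0,0)\) to \((1,1)\) with the lengths of perturbed paths through the same endpoints; the straight line should be the shortest.

```python
import numpy as np

def trapezoid(f, x):
    """Trapezoid-rule quadrature."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def path_length(y, x):
    """Arc length J[y] = integral of sqrt(1 + y'^2) dx."""
    yprime = np.gradient(y, x)
    return trapezoid(np.sqrt(1.0 + yprime**2), x)

x = np.linspace(0.0, 1.0, 5001)
# Perturb the straight line y = x by amp*sin(pi*x), which vanishes at
# both endpoints, so every candidate path still joins (0,0) to (1,1).
for amp in (0.0, 0.05, 0.2):
    y = x + amp * np.sin(np.pi * x)
    print(f"amplitude {amp:4.2f}: length = {path_length(y, x):.6f}")
# The straight line (amplitude 0) gives sqrt(2) ~ 1.414214, the minimum.
```

Larger perturbation amplitudes give longer paths, consistent with the straight line being the minimizer of the arc-length functional.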

For more complex spaces, such as the surface of a torus, the shortest path is not a straight line but a geodesic. The same method can be used to find the geodesic on a surface.

Multivariate, Parametric Euler-Lagrange Equations

For many problems, there are multiple functions that must be optimized simultaneously in a functional. Consider the optimization of a functional \(J[q_1, q_2, \ldots, q_n]\) that depends on \(n\) functions \(q_1(t)\), \(q_2(t)\), \(\ldots\), \(q_n(t)\), where \(t\) is a parameter. The functional \(J\) is defined as \[ J[q_1, q_2, \ldots, q_n] = \int_{t_1}^{t_2} L(t, q_1, q_2, \ldots, q_n, \dot{q}_1, \dot{q}_2, \ldots, \dot{q}_n) \, dt, \] where \(\dot{q}_i = dq_i/dt\) and \(L\) is a function of \(t\), \(q_i\), and \(\dot{q}_i\). The goal is to find the functions \(q_i(t)\) that minimize or maximize the functional \(J\).

The Euler-Lagrange equation for this multivariate, parametric functional is derived in a similar manner to the single-variable case.

Euler-Lagrange Equations for Multivariate, Parametric Functionals

If \(q_i = \hat{q}_i(t)\) are extrema of the functional \(J[q_1, q_2, \ldots, q_n]\) defined above, then the functions \(\hat{q}_i(t)\) must satisfy the Euler-Lagrange equations \[ \frac{\partial L}{\partial q_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{q}_i} = 0 \] for \(i = 1, 2, \ldots, n\). These are necessary conditions for extrema of the functional.
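For concrete Lagrangians, these equations can be generated symbolically. The sketch below (assuming SymPy is available; its `euler_equations` helper performs exactly this computation) derives the Euler-Lagrange equations for an illustrative two-coordinate Lagrangian, \(L = \frac{1}{2}(\dot{q}_1^2 + \dot{q}_2^2) - \frac{1}{2}(q_1^2 + q_2^2)\), chosen here only as an example.

```python
from sympy import Function, Rational, symbols
from sympy.calculus.euler import euler_equations

t = symbols('t')
q1, q2 = Function('q1')(t), Function('q2')(t)

# A simple two-coordinate Lagrangian (two uncoupled harmonic oscillators,
# chosen only to illustrate the multivariate Euler-Lagrange equations)
L = Rational(1, 2) * (q1.diff(t)**2 + q2.diff(t)**2) \
    - Rational(1, 2) * (q1**2 + q2**2)

# One Euler-Lagrange equation per generalized coordinate:
#   dL/dq_i - d/dt (dL/dqdot_i) = 0
eqs = euler_equations(L, [q1, q2], t)
for eq in eqs:
    print(eq)
```

For this Lagrangian, the equations reduce to \(\ddot{q}_i + q_i = 0\), one per coordinate, as expected for uncoupled harmonic oscillators.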

Application to Lagrangian Mechanics

The Euler-Lagrange equation is a fundamental tool in the study of Lagrangian mechanics, which is a reformulation of Newtonian classical mechanics that uses the concept of action to describe the motion of particles. We begin with a few definitions.

Generalized Coordinates

The generalized coordinates \(q_i\) of a mechanical system are a set of coordinates that describe the position and orientation of the system. The number of generalized coordinates is equal to the number of degrees of freedom of the system. Generalized coordinates are not necessarily Cartesian, independent, or inertial.

Generalized coordinates are used to define the kinetic and potential energy of a mechanical system. They enter into the following definitions. We can think of the set of generalized coordinates \(q_i\) as comprising a space called the configuration space. The time evolution of the system is described by the path \(\bm{q}(t)\) in configuration space. The velocity of the system through configuration space is given by the vector \(\dot{\bm{q}}(t)\).

Lagrangian

The Lagrangian \(L\) of a mechanical system is a function of time \(t\), a position vector \(\bm{q}\), and the velocity vector \(\dot{\bm{q}}\) of the system, and equals the difference between the kinetic energy \(T\) and the potential energy \(U\) of the system: \[ L(t, \bm{q}, \dot{\bm{q}}) = T(\bm{q}, \dot{\bm{q}}) - U(\bm{q}). \]

Action

The action \(S\) of a system moving through its configuration space on a path \(\bm{q}(t)\) from time \(t_1\) to time \(t_2\) is defined as \[ S[\bm{q}] = \int_{t_1}^{t_2} L(t, \bm{q}, \dot{\bm{q}}) \, dt, \] for Lagrangian \(L\).

We can now state the principle of least action.

Principle of Least Action

For a mechanical system with only energy conserving forces, the path \(\bm{q}(t)\) the system takes through its configuration space is such that the action \(S[\bm{q}]\) is minimized.

The principle of least action is related to Hamilton’s principle.

Hamilton's Principle

For a mechanical system with only energy conserving forces, the path \(\bm{q}(t)\) the system takes through its configuration space is such that the first variation of the action \(S[\bm{q}]\) is zero: \[ \delta S = 0. \]

A consequence of Hamilton’s principle is that the Euler-Lagrange equations for the Lagrangian \(L\) are satisfied by the path \(\bm{q}(t)\) that the particle takes.

Euler-Lagrange Equations in Lagrangian Mechanics

For a mechanical system with only energy conserving forces, the path \(\bm{q}(t)\) the system takes through its configuration space must satisfy the Euler-Lagrange equations \[ \frac{\partial L}{\partial q_i} - \frac{d}{dt} \frac{\partial L}{\partial \dot{q}_i} = 0 \] for \(i = 1, 2, \ldots, n\).

We are now ready to apply the Euler-Lagrange equations to a mechanical system.

Example 9.4

Consider a flexible pendulum of unstretched length \(\ell\) with mass \(m\) affixed to its end, free to move in a vertical plane (i.e., the pendulum is subject to gravity). The pendulum is sufficiently stiff transverse to its length that it can be treated as straight. Its length is given by \(\ell + \lambda(t)\), where \(\lambda(t)\) is the displacement of the pendulum from its unstretched length. The pendulum is also described by the angle \(\theta(t)\) that the pendulum makes with the downward vertical.

Derive the equations of motion for the flexible pendulum. Numerically integrate the equations of motion to find the motion of the pendulum for slightly different initial conditions. Does the pendulum exhibit chaotic motion?
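One way this might be approached is sketched below (an illustration, not the text's solution). It assumes the pendulum's restoring force is linear with an assumed spring constant \(k\), so that \(T = \frac{1}{2}m\bigl(\dot\lambda^2 + (\ell+\lambda)^2\dot\theta^2\bigr)\) and \(U = \frac{1}{2}k\lambda^2 - mg(\ell+\lambda)\cos\theta\). The Euler-Lagrange equations then give \(\ddot\lambda = (\ell+\lambda)\dot\theta^2 + g\cos\theta - (k/m)\lambda\) and \(\ddot\theta = -\bigl(g\sin\theta + 2\dot\lambda\dot\theta\bigr)/(\ell+\lambda)\), which the code integrates with a classical Runge-Kutta scheme for two nearby initial conditions; all parameter values are illustrative assumptions.

```python
import numpy as np

# Illustrative parameters; the spring constant k is an assumption, since
# the text does not specify the pendulum's stiffness.
m, g, ell, k = 1.0, 9.81, 1.0, 40.0

def deriv(s):
    """Right-hand side for the state s = [lam, theta, dlam, dtheta]."""
    lam, th, dlam, dth = s
    r = ell + lam  # instantaneous pendulum length
    ddlam = r * dth**2 + g * np.cos(th) - (k / m) * lam
    ddth = -(g * np.sin(th) + 2.0 * dlam * dth) / r
    return np.array([dlam, dth, ddlam, ddth])

def rk4(s0, dt, n):
    """Integrate with the classical fourth-order Runge-Kutta method."""
    s = np.array(s0, dtype=float)
    out = [s.copy()]
    for _ in range(n):
        k1 = deriv(s)
        k2 = deriv(s + 0.5 * dt * k1)
        k3 = deriv(s + 0.5 * dt * k2)
        k4 = deriv(s + dt * k3)
        s = s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(s.copy())
    return np.array(out)

def energy(traj):
    """Total energy T + U along a trajectory (should be nearly constant)."""
    lam, th, dlam, dth = traj.T
    r = ell + lam
    T = 0.5 * m * (dlam**2 + r**2 * dth**2)
    U = 0.5 * k * lam**2 - m * g * r * np.cos(th)
    return T + U

dt, n = 1e-3, 20000  # 20 s of simulated motion
a = rk4([0.1, 1.0, 0.0, 0.0], dt, n)
b = rk4([0.1, 1.0 + 1e-6, 0.0, 0.0], dt, n)  # nearby initial angle

print("max energy drift:    ", np.abs(energy(a) - energy(a)[0]).max())
print("max angle separation:", np.abs(a[:, 1] - b[:, 1]).max())
# Rapid growth of the separation between the two nearby trajectories is
# the hallmark of sensitive dependence on initial conditions (chaos).
```

The near-constant total energy is a consistency check on the derived equations of motion; whether the separation between the trajectories grows rapidly depends on the chosen parameters and initial amplitudes.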


Kot, M. 2014. A First Course in the Calculus of Variations. Student Mathematical Library. American Mathematical Society.

  1. The notation \(f_y\) and \(f_{y'}\) denotes the partial derivatives of \(f\) with respect to \(y\) and \(y'\), respectively.↩︎

Online Resources for Section 9.4

No online resources.