An Introduction to Stochastic Processes (1)

Itô's lemma: definition and application

Jun 22, 2022

This post is about Itô's lemma, which plays an important role in financial mathematics and is a useful tool for working with stochastic processes. Many articles and documents cover this topic, but very few of them introduce the background, such as the Wiener process and filtrations; they usually assume that readers are already familiar with those concepts. It is easy to look up each concept separately, but it takes a while to put everything together. This post tries to help with exactly that: it connects Itô's lemma to the related concepts about stochastic processes.

100 simulated trajectories of the Wiener process (image by author)

Prerequisite

Wiener process

The first important concept we are introducing is the Wiener process, which is part of the Itô process (we will see that in the Itô process it serves as an integrator). It is defined as a stochastic process (or random process, a collection of random variables ordered by an index set [4]) with the following four properties:

The initial value is W(0) = 0.
The Wiener process is almost surely continuous (but almost surely nowhere differentiable): with probability 1, the function t → W(t) is continuous in t. (We say "almost surely" because there may be a set of outcomes ω with probability 0 on which the path t → W(t, ω) is not continuous.) Continuity of every path implies almost sure continuity, but not the other way around. For an example of a Wiener process that is almost surely continuous but not continuous, see this StackExchange answer.
The process {W(t)}ₜ≥₀ has stationary, independent increments.
The increments W(t+s) − W(s) have the normal distribution N(0, t) (we write N(μ, σ²) for the normal distribution with mean μ and variance σ², so here μ = 0 and σ² = t). It is important to note that increments are independent only when their time intervals do not overlap. Taking s = 0, it follows immediately that W(t) itself (starting from W(0) = 0) has the distribution N(0, t).
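The four properties translate almost directly into a simulation. The following is a minimal sketch in Python (standard library only), assuming a uniform time grid; the function name `simulate_wiener`, the grid size, the number of paths and the seed are illustrative choices and not part of the original post. It builds paths from independent N(0, dt) increments and compares the sample mean and variance of W(T) with the theoretical values 0 and T.

```python
import math
import random

def simulate_wiener(T=1.0, n_steps=100, rng=None):
    """Return one discretised Wiener path [W(0), W(dt), ..., W(T)]."""
    rng = rng or random.Random()
    dt = T / n_steps
    path = [0.0]                      # property 1: W(0) = 0
    for _ in range(n_steps):
        # independent N(0, dt) increment (properties 3 and 4)
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

rng = random.Random(0)                # fixed seed, only for reproducibility
T = 2.0
finals = [simulate_wiener(T=T, n_steps=50, rng=rng)[-1] for _ in range(4000)]
mean = sum(finals) / len(finals)
var = sum((w - mean) ** 2 for w in finals) / (len(finals) - 1)
print(f"sample mean of W(T): {mean:.3f}  (theory 0)")
print(f"sample var  of W(T): {var:.3f}  (theory {T})")
```

Because the increments are sampled exactly, the grid size does not affect the distribution of W(T); it only controls how finely each path is resolved.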

Filtration and martingale

The definition of a filtration is straightforward. Given a probability space (Ω, 𝓕, P) (see here for the basics of probability spaces and the meaning of the notation), a filtration is a sequence of σ-algebras 𝓕₀, 𝓕₁, …, 𝓕ₙ with the property 𝓕₀ ⊂ 𝓕₁ ⊂ … ⊂ 𝓕ₙ. However, this seems very abstract. Intuitively, a filtration contains all the information up to time t. To understand it better, we can use the following example (it is lengthy but worth reading, because the definitions built on filtrations cannot be understood without it):

We toss two dice, one after the other. Let S = {1, 2, 3, 4, 5, 6} and let the sample space be Ω = S×S, the set of all possible outcomes of the two tosses (every outcome is an ordered pair). At time zero, no die has been tossed and the σ-algebra is 𝓕₀ = {∅, Ω}, because in this state no additional information (the word "information" refers to "what can happen" at this step) has been added to the σ-algebra. We only know that (1) any outcome is possible: Ω ∈ 𝓕₀, where Ω is an event (recall that an event is an element of a σ-algebra) and obviously P(Ω) = 1; and (2) "no outcome" is also an event: ∅ ∈ 𝓕₀, which is the complement of Ω (recall that in a σ-algebra 𝓕, if A ∈ 𝓕, then its complement Ā ∈ 𝓕). Since throwing the dice always produces some outcome, the event ∅ has probability zero, P(∅) = 0.

At the time of the first toss, the information from the first toss is added to the σ-algebra: 𝓕₁ contains all the possible events after the first toss. For example, {2, 4, 6}×S ∈ 𝓕₁, because the outcome of the first toss might be even, which corresponds to the event {2, 4, 6}, while the second toss can be anything in S. Analogously, {1, 3, 5}×S ∈ 𝓕₁; in this case the outcome of the first toss is odd and corresponds to the event {1, 3, 5}. In the same way, we conclude that 𝓕₁ = {A×S : A ∈ 𝒫(S)} (2⁶ elements), where 𝒫(S) denotes the power set of S, that is, all the possible events of the first toss.

At the time of the second toss, there is information on both the first and the second toss, so 𝓕₂ contains all the possible events after the second toss. This means 𝓕₂ = 𝒫(S×S) (2³⁶ elements). [5]
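The dice example is small enough to enumerate on a computer. The sketch below is an illustration only (the helper `power_set` and the variable names are assumptions made here, not taken from [5]). It builds 𝓕₀ and 𝓕₁ as sets of events, counts 𝓕₂ without listing it, and checks that 𝓕₀ ⊂ 𝓕₁ and that 𝓕₁ is closed under complements.

```python
from itertools import chain, combinations, product

S = (1, 2, 3, 4, 5, 6)
omega = frozenset(product(S, S))          # sample space of the two tosses

def power_set(items):
    """All subsets of `items`, each returned as a tuple."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

# F0: nothing has been observed yet.
F0 = {frozenset(), omega}

# F1: events decided by the first toss only, i.e. A x S for every subset A of S.
F1 = {frozenset(product(A, S)) for A in power_set(S)}

# F2 is the power set of omega; we only count it (2**36 sets are too many to list).
size_F2 = 2 ** len(omega)

even_first = frozenset(product((2, 4, 6), S))   # "the first toss is even"
closed_under_complement = all(omega - event in F1 for event in F1)

print(len(F0), len(F1), size_F2)
print(F0 <= F1, even_first in F1, closed_under_complement)
```

An event such as "the second toss is even" is not in 𝓕₁, which is exactly the sense in which 𝓕₁ only carries the information of the first toss.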

The natural filtration or the generated filtration related to a random process X(t) is a filtration, such that at each time t, the random process X(t) is 𝓕ₜ-measurable. Formally,

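$$
\{\omega \in \Omega : X(t, \omega) \in B\} \in \mathcal{F}_t \quad \text{for every Borel set } B \in \mathcal{B}(\mathbb{R})
$$

Prop 1.0 𝓕ₜ-measurable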

Here the Borel algebra is used because the state space of the random process is usually the real line, whose σ-algebra is the Borel algebra. We say that the random process X(t) is adapted to the filtration 𝓕ˣ. An important fact is that

a stochastic process X is always adapted to its natural filtration.

Also, 𝓕ₜ is the smallest σ-algebra such that Property 1.0 is satisfied. Intuitively, 𝓕ₜ represents all the information available at time t, which means that the filtration 𝓕ˣ represents the evolution of the information (as explained in the example above) of the stochastic process over time, and the value of X(t) depends only on the evolution up to time t.

Now that we know what filtrations and adapted processes are, it is easy to understand what a martingale is. Consider an adapted stochastic process X(t), 0 ≤ t ≤ T. The process X(t) is a martingale if it has no tendency to rise or fall. Formally,

$$
E[X(t) \mid \mathcal{F}_s] = X(s), \qquad 0 \le s \le t \le T
$$

Eq 1.1 martingale

And we have the following observation:

The Wiener process is a martingale with respect to its natural filtration.

This can be shown as follows

$$
E[W(t) \mid \mathcal{F}_s] = E[W(t) - W(s) \mid \mathcal{F}_s] + E[W(s) \mid \mathcal{F}_s] = E[W(t) - W(s)] + W(s) = W(s), \qquad 0 \le s \le t
$$

Proof showing that the Wiener process is a martingale.

Another possible way to show that a process is a martingale is via Itô’s lemma, which we will see later when we get to the application of Itô’s lemma.

Quadratic variation

The quadratic variation is one kind of variation of a random process (an example of a different variation is the first-order, or total, variation). Let X(t) be any process. The quadratic variation of X(t) is again a random process and is defined as

$$
[X, X](T) = \lim_{\|p\| \to 0} \sum_{k=1}^{n} \bigl(X(t_k) - X(t_{k-1})\bigr)^2
$$

Eq 1.2 Quadratic variation.

where p = {0 = t₀ < t₁ < … < tₙ = T} ranges over the partitions of the time interval [0, T], and ||p|| denotes the mesh, which is the length of the longest subinterval in the partition:

$$
\|p\| = \max_{1 \le k \le n} (t_k - t_{k-1})
$$

Def 1.3 Mesh.

It can be shown that the quadratic variation of a Wiener process is [W, W](t) = t almost surely.
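This statement can be checked numerically. The sketch below is a rough illustration under the assumption of a uniform partition with 10,000 subintervals (the helper name `quadratic_variation` and the seed are choices made here). It sums the squared increments of one simulated Wiener path, which should be close to T, and also prints the sum of absolute increments for comparison.

```python
import math
import random

def quadratic_variation(path):
    """Sum of squared increments of a discretised path."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))

rng = random.Random(1)
T, n_steps = 1.0, 10_000
dt = T / n_steps
w = [0.0]
for _ in range(n_steps):
    w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))

qv = quadratic_variation(w)
# Sum of absolute increments, for comparison: it keeps growing as the mesh shrinks.
tv = sum(abs(b - a) for a, b in zip(w, w[1:]))
print(f"sum of squared increments on [0, {T}]: {qv:.4f}  (theory {T})")
print(f"sum of absolute increments: {tv:.1f}")
```

The sum of squared increments settles near T regardless of the path, while the sum of absolute increments grows roughly like the square root of the number of subintervals. This is the numerical face of the rule dW · dW = dt used later in Itô's lemma.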

Itô integral

First, we introduce what the Itô integral is and how it is constructed. It is like the ordinary integral, but the integrator is stochastic. It generalizes the Riemann–Stieltjes integral, which is itself a generalization of the Riemann integral.

To be more precise: the Riemann integral is generalized by replacing the "integrator", which was originally the integration variable itself (the dx in ∫ f(x) dx), with a function g (the dg(x) in ∫ f(x) dg(x)). (I haven't seen the term "integrator" used in the context of the Riemann integral, but Fig 2.1 will make things clear.) The Itô integral is in turn a stochastic generalization of the Riemann–Stieltjes integral, in which the integrator is a Wiener process.

Fig 2.1 Integrand and integrator.

The definition of Itô integral

What the Itô integral actually is, is shown in the following definition:

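Let f ∈ 𝒱(S, T). Then the Itô integral of f (from S to T) is defined by

$$
\int_S^T f(t, \omega)\, dW_t(\omega) = \lim_{n \to \infty} \int_S^T \phi_n(t, \omega)\, dW_t(\omega) \quad \text{(limit in } L^2(P)\text{)}
$$

Eq. 2.2 The definition of Itô integral.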

where {ϕₙ} is a sequence of simple functions such that

$$
E\left[\int_S^T \bigl(f(t, \omega) - \phi_n(t, \omega)\bigr)^2\, dt\right] \to 0 \quad \text{as } n \to \infty
$$

Exp. 2.3 Property of {ϕₙ} which should be satisfied in Eq. 2.2.

Note that Wₜ, denoting the Wiener process, is sometimes replaced by Bₜ, denoting Brownian motion [1]. This is nothing to worry about, since they are the same thing. In the rest of the article we take S = 0, which is the usual case: the process starts at t = 0. The simple function mentioned above is sometimes also called an "elementary function". The definition of a simple function is given in [2]:

A function ϕ ∈ 𝒱 is called simple if it has the form

$$
\phi(t, \omega) = \sum_j e_j(\omega)\, \chi_{[t_j, t_{j+1})}(t)
$$

Def 2.4 Elementary function.

χ denotes the characteristic (indicator) function. It is not completely clear whether "simple function" and "elementary function" mean the same thing. According to [3], they do. According to other sources, the conditions on the sets on which χ is defined are more relaxed for elementary functions: there may be countably infinitely many of them, whereas for simple functions there can only be finitely many. This doesn't cause any trouble for us: Def 2.4 allows countably many intervals [tⱼ, tⱼ₊₁), so both terms work, and we will stick with "simple function".

The 𝒱 in the definition stands for the class of functions

$$
f(t, \omega) : [0, \infty) \times \Omega \to \mathbb{R}
$$

satisfying the following conditions:

(t, ω) → f(t, ω) is 𝓑×𝓕-measurable, where 𝓑 denotes the Borel σ-algebra on [0, ∞).
f(t, ω) is 𝓕ₜ-adapted.
f(t, ω) is an L²-function: it is square-integrable, meaning that the expected integral of its square is finite.

$$
E\left[\int_S^T f(t, \omega)^2\, dt\right] < \infty
$$

f(t, ω) should be an L² -function

Expression 2.3 might seem strange, but intuitively it means that {ϕₙ} is a good approximation of f(t, ω): imagine ϕₙ to be a step function; as the steps become infinitesimally small, ϕₙ becomes the same as f(t, ω). Mathematically, Equation 2.2 together with Expression 2.3 means that the Itô integral is defined as the L²-limit of the sequence of Itô integrals of the simple functions.
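To make the construction concrete, here is a small sketch under the assumption that the integrand is f(t, ω) = W(t) and that the approximating simple function freezes it at the left end of each interval [tⱼ, tⱼ₊₁). The helper name `ito_integral_left`, the grid size and the seed are illustrative. The Itô integral of such a simple function is a finite sum, and for this integrand a standard textbook result gives the closed form ∫₀ᵀ W dW = ½W(T)² − ½T to compare against.

```python
import math
import random

def ito_integral_left(f_values, w):
    """Sum_j f(t_j) * (W(t_{j+1}) - W(t_j)): the Ito integral of the simple
    function that equals f(t_j) on [t_j, t_{j+1})."""
    return sum(f * (w1 - w0) for f, w0, w1 in zip(f_values, w, w[1:]))

rng = random.Random(2)
T, n_steps = 1.0, 10_000
dt = T / n_steps
w = [0.0]
for _ in range(n_steps):
    w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))

# Integrand f(t, omega) = W(t), evaluated at the left end of each interval.
approx = ito_integral_left(w[:-1], w)
closed_form = 0.5 * w[-1] ** 2 - 0.5 * T
print(f"left-point sum : {approx:.4f}")
print(f"0.5*W(T)^2 - 0.5*T: {closed_form:.4f}")
```

Evaluating the integrand at the left endpoint is what keeps the simple function 𝓕ₜ-adapted; the extra −½T compared with the ordinary calculus answer ½W(T)² is the effect of the non-zero quadratic variation.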

Itô process and diffusion process

An Itô process or stochastic integral is a stochastic process on (Ω, 𝓕, P) adapted to 𝓕ₜ, which can be written in the form

$$
X_t = X_0 + \int_0^t U(s, \omega)\, ds + \int_0^t V(s, \omega)\, dW_s
$$

Eq. 3.1 Itô process.

where the functions U, V ∈ 𝓛₂. The first part, the integral of U, is an ordinary (Riemann) integral with respect to time and carries no noise of its own. The second part, the integral of V, is an Itô integral; it is stochastic and is the only source of noise in the process Xₜ.

Equation 3.1 can also be written in a shorter form, in terms of stochastic differentials:

$$
dX_t = U\, dt + V\, dW_t
$$

Eq. 3.2 Itô process in terms of stochastic differential.

And the stochastic differential equation (SDE) for a stochastic process X(t) is given by

$$
dX(t) = \mu(t, X(t))\, dt + \sigma(t, X(t))\, dW(t)
$$

Eq 3.3 Formulation of the stochastic differential equation.

An SDE given in the form of Equation 3.3 is known as an Itô-type SDE. (There are also other types of SDE, such as the Stratonovich SDE.) A solution to such an SDE is called an Itô diffusion, which is a special case of a diffusion process. A process is a diffusion process if

The time variable t is continuous. Usually, we assume that t is defined for all the non-negative real numbers.
It is a Markov process with a transition cumulative distribution function

$$
F(t, y \mid s, x) = P\bigl(X(t) \le y \mid X(s) = x\bigr), \qquad s < t
$$

Eq 3.3.1 Transition cumulative distribution function.

and a transition probability density, which is defined as the partial derivative (if it exists) of Equation 3.3.1 with respect to y

$$
p(t, y \mid s, x) = \frac{\partial}{\partial y} F(t, y \mid s, x)
$$

Eq 3.3.2 Transition probability density function.

Xₜ is a continuous function of t.

The properties of the diffusion processes are completely determined by two characteristics: the infinitesimal mean and infinitesimal variance, together with the possible boundary conditions. The infinitesimal mean is defined by

$$
\mu(t, x) = \lim_{h \to 0^+} \frac{1}{h}\, E\bigl[X(t+h) - X(t) \mid X(t) = x\bigr]
$$

Eq 3.3.3 Infinitesimal mean (the first infinitesimal moment)

and the infinitesimal variance is defined by

$$
\sigma^2(t, x) = \lim_{h \to 0^+} \frac{1}{h}\, E\bigl[(X(t+h) - X(t))^2 \mid X(t) = x\bigr]
$$

Eq 3.3.4 Infinitesimal variance (the second infinitesimal moment).

Both of those characteristics can be directly read off from Equation 3.3.

Examples of SDE

One very important example of SDE is as follows

$$
dX(t) = \mu X(t)\, dt + \sigma X(t)\, dW(t)
$$

Eq 3.4 The formula for geometric Brownian motion.

The solution X(t) of Equation 3.4 is geometric Brownian motion. It is obtained with Itô's lemma, as shown in the second example of the application of Itô's lemma. The solution to Equation 3.4 with the initial condition X(0) = x₀ is given by

$$
X(t) = x_0 \exp\left(\left(\mu - \frac{\sigma^2}{2}\right) t + \sigma W(t)\right)
$$

Eq 3.5 Explicit expression of geometric Brownian motion.
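A quick way to build trust in the explicit expression is to compare it with a direct discretisation of the SDE. The sketch below assumes the SDE dX = μX dt + σX dW and the closed form x₀·exp((μ − σ²/2)t + σW(t)); it uses the Euler-Maruyama scheme, which is a swapped-in numerical method and not something the derivation relies on. The function name `gbm_euler_vs_exact` and all parameter values are illustrative.

```python
import math
import random

def gbm_euler_vs_exact(x0, mu, sigma, T, n_steps, rng):
    """Drive an Euler-Maruyama scheme and the closed-form solution
    with the same Wiener increments; return both values at time T."""
    dt = T / n_steps
    x_euler, w = x0, 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x_euler += mu * x_euler * dt + sigma * x_euler * dw   # dX = mu X dt + sigma X dW
        w += dw                                               # accumulate W(T)
    x_exact = x0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * w)
    return x_euler, x_exact

rng = random.Random(4)
x_euler, x_exact = gbm_euler_vs_exact(x0=1.0, mu=0.1, sigma=0.2, T=1.0,
                                      n_steps=10_000, rng=rng)
print(f"Euler-Maruyama: {x_euler:.4f}")
print(f"closed form   : {x_exact:.4f}")
```

Both numbers are driven by the same noise, so they should agree path by path up to a small discretisation error. Dropping the −σ²/2 term from the closed form makes the two values drift apart, which is a useful sanity check on the Itô correction.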

Another important example of SDE is the Ornstein-Uhlenbeck process, which is given by

$$
dX_t = \alpha (m - X_t)\, dt + \sigma\, dW_t
$$

Eq 3.5.1 SDE of the Ornstein-Uhlenbeck process.

where m, α, σ ∈ ℝ are constants with α > 0. The intuitive physical meaning of Equation 3.5.1 is the motion of a particle that drifts towards the level m under noise of intensity σ. The drift is negative when Xₜ > m and positive when Xₜ < m, so the process is always pulled back towards m. Due to this feature, the Ornstein-Uhlenbeck process is also called a mean-reverting process. In finance, the Ornstein-Uhlenbeck process is used in the Vasicek model for the instantaneous interest rate, where σ is the volatility, α is the speed of reversion and m is the long-term mean level. The solution is

$$
X_t = X_0 e^{-\alpha t} + m \left(1 - e^{-\alpha t}\right) + \sigma \int_0^t e^{-\alpha (t - s)}\, dW_s
$$

Eq 3.5.2 The solution to the O-U process.

assuming that the process starts from t=0.
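Mean reversion is easy to see in a simulation. The sketch below assumes the Ornstein-Uhlenbeck SDE in the form dX = α(m − X) dt + σ dW with α > 0, and discretises it with the Euler-Maruyama scheme (an approximation chosen here for simplicity). The function name `simulate_ou` and all parameter values are illustrative. It compares the sample mean of X(T) with the mean of the exact solution, m + (x₀ − m)e^(−αT).

```python
import math
import random

def simulate_ou(x0, m, alpha, sigma, T, n_steps, rng):
    """Euler-Maruyama for dX = alpha*(m - X) dt + sigma dW; returns X(T)."""
    dt = T / n_steps
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x += alpha * (m - x) * dt + sigma * dw   # pulled towards m, plus noise
    return x

rng = random.Random(3)
x0, m, alpha, sigma, T = 3.0, 1.0, 2.0, 0.5, 1.0
samples = [simulate_ou(x0, m, alpha, sigma, T, 200, rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
theory_mean = m + (x0 - m) * math.exp(-alpha * T)
print(f"sample mean of X(T): {mean:.3f}")
print(f"m + (x0 - m) * exp(-alpha*T): {theory_mean:.3f}")
```

Starting above m, the paths are pulled down towards m on average; increasing `alpha` makes the pull faster, while `sigma` only controls how widely the paths scatter around that trend.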

The Itô isometry

The idea behind the definition of the Itô integral is to define the integral for a simple class of functions first and then extend it to the whole class 𝒱. For this purpose, we can use the Itô isometry.

The lemma of the Itô isometry is formulated as follows: if ϕ(t, ω) is a simple function and bounded then

$$
E\left[\left(\int_S^T \phi(t, \omega)\, dW_t\right)^2\right] = E\left[\int_S^T \phi(t, \omega)^2\, dt\right]
$$

Eq 3.6 The Itô isometry.

The steps of the construction are talked about in detail in [2], but here we will only show the ideas of the proof: we want to prove that the sequence of the stochastic integrals of simple functions

$$
\left\{ \int_S^T \phi_n(t, \omega)\, dW_t \right\}_{n \ge 1}
$$

Eq 3.7 a sequence of stochastic integrals of simple functions

forms a Cauchy sequence (the elements become arbitrarily close to each other as the sequence goes on) in the L²-space of random variables with finite second moment. Since the L²-space is complete (we mention this because in a complete metric space every Cauchy sequence converges to an element of that space), the sequence given by Equation 3.7 converges to a random variable in the L²-space. Now we show that the sequence is Cauchy:

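$$
E\left[\left(\int_S^T \phi_n\, dW_t - \int_S^T \phi_m\, dW_t\right)^2\right] = E\left[\int_S^T (\phi_n - \phi_m)^2\, dt\right]
$$

Eq 3.8

From here, using the property

$$
(a + b)^2 \le 2a^2 + 2b^2
$$

we can immediately see that

$$
E\left[\int_S^T (\phi_n - \phi_m)^2\, dt\right] \le 2 E\left[\int_S^T (\phi_n - f)^2\, dt\right] + 2 E\left[\int_S^T (f - \phi_m)^2\, dt\right] \to 0 \quad \text{as } n, m \to \infty
$$

Eq 3.9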

Combining Equation 3.8 and Equation 3.9 proves that the sequence given in Equation 3.7 is a Cauchy sequence. [7]

Another important application of the Itô isometry is calculating the variance of a random variable that is given as an Itô integral. We will show one example here: find the variance of the following Itô process:

Eq 3.10 what’s the variance of this?

Using the properties of the Wiener process, it is easy to see that the expected value of X(t) given by Equation 3.10 is zero. And note that since we are calculating the variance for each time t, t is considered to be a constant here. The variance is calculated as follows using Itô isometry:

The result.
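The same kind of calculation can be checked by Monte Carlo. The sketch below uses a hypothetical integrand chosen here for illustration, f(s) = s, and is not necessarily the process of Equation 3.10. For I(T) = ∫₀ᵀ s dW(s), the Itô isometry gives Var(I(T)) = ∫₀ᵀ s² ds = T³/3. The function name `ito_integral_of_t`, the grid size, the number of paths and the seed are assumptions.

```python
import math
import random

def ito_integral_of_t(T, n_steps, rng):
    """Left-point sum for the Ito integral of f(s) = s over [0, T]."""
    dt = T / n_steps
    total = 0.0
    for k in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        total += (k * dt) * dw          # integrand evaluated at the left end t_k
    return total

rng = random.Random(5)
T = 1.0
samples = [ito_integral_of_t(T, 100, rng) for _ in range(4000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)
print(f"sample mean: {mean:.4f}  (theory 0)")
print(f"sample var : {var:.4f}  (Ito isometry: T^3/3 = {T**3 / 3:.4f})")
```

The sample variance lands slightly below T³/3 on average, because the left-point sum replaces ∫ s² ds by a lower Riemann sum; refining the grid closes that gap.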

Itô’s lemma

Itô's lemma gives us a way to find the differential of a function of a stochastic process (note that we use "differential" instead of "derivative", since the paths of a stochastic process such as the Wiener process are not differentiable). In some literature it is described as "a stochastic counterpart of the chain rule in calculus". The motivation is that for a non-stochastic composite function of time u = f ∘ g = f(g(t)), t ≥ 0, where f and g are differentiable, calculating the derivative of u is easy: we just need to apply the chain rule:

$$
\frac{du}{dt} = f'(g(t))\, g'(t)
$$

Eq. 4.1 Chain rule.

However, this will not work for stochastic equations. In this case, what we need is Itô’s lemma, which is given as follows:

Let {X(t)} be an Itô process given by stochastic differential equation (SDE) Equation 3.2. Let g(t, x):[0, ∞)× ℝ → ℝ be a twice continuously differentiable function. Then the random process

$$
Y(t) = g(t, X(t))
$$

is an Itô process with stochastic differential

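$$
dY(t) = \frac{\partial g}{\partial t}(t, X(t))\, dt + \frac{\partial g}{\partial x}(t, X(t))\, dX(t) + \frac{1}{2} \frac{\partial^2 g}{\partial x^2}(t, X(t))\, (dX(t))^2
$$

Eq. 4.2 SDE representation of Y(t) by Itô's lemma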

and the following rules apply:

$$
dt \cdot dt = dt \cdot dW(t) = dW(t) \cdot dt = 0, \qquad dW(t) \cdot dW(t) = dt
$$

In Equation 4.2, the first two terms come from the ordinary chain rule, which would be all we need if X(t) were a differentiable function of t. The last term is a correction term that is new to the stochastic setting, where (dX(t))² is the differential of the quadratic variation of the process X(t).

Examples of applying Itô’s lemma

To understand how Itô’s lemma works, we can look at the following examples.

Finding the SDE of a stochastic process. Considering the stochastic process

👁 Image

The method for differentiating a non-stochastic function will not work here; we need to apply Itô's lemma. How do we do it? The first step is to construct the helper function (this is a very informal term) g(X(t), t). Here we use

Eq. 4.2 The "helper function".

We wrote g(x, t) instead of g(X(t), t), just to make it look nicer because when we calculate the partial derivative of g(x, t), we consider it as a function with two variables x and t and forget about the fact that x is a function of t for a while. The next step is as follows

The derivatives of Equation 4.2.

where gₓ denotes the first-order derivative of g(x, t) with regard to the parameter x and gₓₓ is the second-order derivative of g(x, t) with regard to x, and so on. And in this case, the role x plays is actually W(t). Plugging the derivatives into Equation 4.2, we easily get

The Itô’s formula.

This means

👁 Image

Transforming one Itô process into another. This example is a little bit more complicated than the previous one because we need to do some substitutions. Suppose the stock price S(t) follows a geometric Brownian motion:

$$
dS(t) = \mu S(t)\, dt + \sigma S(t)\, dW(t)
$$

Eq 4.3 Model of stock price

Find the SDE for ln S(t).

The choice of helper function is obvious here: we let g(x, t) = ln x and calculate the necessary derivatives

$$
g_t = 0, \qquad g_x = \frac{1}{x}, \qquad g_{xx} = -\frac{1}{x^2}
$$

Applying Itô's lemma, we get

$$
d \ln S(t) = \frac{1}{S(t)}\, dS(t) - \frac{1}{2 S(t)^2}\, (dS(t))^2
$$

Eq 4.4 After applying Itô's lemma to Equation 4.3

and we can substitute dS(t) to deal with the term (dS(t))². From Equation 4.3 we can easily get

$$
(dS(t))^2 = \bigl(\mu S(t)\, dt + \sigma S(t)\, dW(t)\bigr)^2
$$

Eq 4.5

Substituting Equation 4.5 into Equation 4.4, we focus on the term (dS(t))²: we expand it, and then the rules stated in Itô's lemma can be used:

$$
(dS(t))^2 = \mu^2 S(t)^2 (dt)^2 + 2 \mu \sigma S(t)^2\, dt\, dW(t) + \sigma^2 S(t)^2 (dW(t))^2 = \sigma^2 S(t)^2\, dt
$$

Plugging this expression back into Equation 4.4 gets us

$$
d \ln S(t) = \frac{dS(t)}{S(t)} - \frac{1}{2} \sigma^2\, dt = \left(\mu - \frac{\sigma^2}{2}\right) dt + \sigma\, dW(t)
$$

Eq 4.6 The result.

where dS(t)/S(t) = μ dt + σ dW(t) follows directly from Equation 4.3. (Some of the steps of this substitution are simplified.) After integrating both sides of Equation 4.6, we get the solution of the geometric Brownian motion. Additionally, from Equation 4.6, suppose the initial stock price is S₀, then we have

$$
\ln S(t) - \ln S_0 = \left(\mu - \frac{\sigma^2}{2}\right) t + \sigma W(t)
$$

Eq 4.7

where d ln S(t) is replaced by the difference between the log price at time t and its initial value, which is the log return ln(S(t)/S₀). (Recall that integrating the differential "d" from 0 to t gives this difference.) We can see that the logarithm of a geometric Brownian motion is a Wiener process with drift. Equation 4.7 tells us that the conditional distribution (given the initial price) of the log return is normal. Using the properties of the Wiener process, we can easily read off the conditional mean and variance of the distribution

$$
\ln \frac{S(t)}{S_0} \sim N\left(\left(\mu - \frac{\sigma^2}{2}\right) t,\; \sigma^2 t\right)
$$

Eq 4.8 The conditional distribution of the log return.

which means that the log returns are normally distributed, so the price ratio S(t)/S₀ has a lognormal distribution.
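The mean and variance of the log return can be checked without using the closed-form solution at all. The sketch below assumes the stock price model dS = μS dt + σS dW and simulates it with the Euler-Maruyama scheme (a numerical approximation chosen here, not part of the derivation). The function name `log_return_euler` and the parameter values are illustrative. If the Itô correction is right, the sample mean of ln(S(T)/S₀) should be close to (μ − σ²/2)T rather than μT, and the sample variance close to σ²T.

```python
import math
import random

def log_return_euler(mu, sigma, T, n_steps, rng):
    """Simulate dS = mu S dt + sigma S dW with Euler-Maruyama, return ln(S(T)/S(0))."""
    dt = T / n_steps
    s = 1.0                                  # S(0) = 1, so ln(S(T)/S(0)) = ln S(T)
    for _ in range(n_steps):
        s += mu * s * dt + sigma * s * rng.gauss(0.0, math.sqrt(dt))
    return math.log(s)

rng = random.Random(6)
mu, sigma, T = 0.1, 0.3, 1.0
logs = [log_return_euler(mu, sigma, T, 100, rng) for _ in range(3000)]
mean = sum(logs) / len(logs)
var = sum((x - mean) ** 2 for x in logs) / (len(logs) - 1)
print(f"sample mean: {mean:.4f}  (theory {(mu - 0.5 * sigma ** 2) * T:.4f}, naive mu*T = {mu * T:.4f})")
print(f"sample var : {var:.4f}  (theory {sigma ** 2 * T:.4f})")
```

With these parameters the corrected mean is 0.055 while the naive chain-rule guess would be 0.1, so the difference is visible even with a few thousand paths.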

Using Itô’s lemma to identify martingale. Consider the process

Eq 4.9

Is S(t) a martingale? The first step is to find its SDE and the corresponding integral form. The helper function here is

👁 Image

which has the same form as S(t) and is a straightforward choice. Then we calculate the necessary derivatives and do the routine of plugging into the Itô formula:

Eq 4.10 The SDE of Equation 4.9

And the integral form is

Eq 4.11 The integral form of Equation 4.10.

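In Equation 4.11, the integral on the right is a martingale, because it is an Itô integral with respect to the Wiener process, but the one on the left is an ordinary integral with respect to time and is not a martingale (for the reason, see here). Therefore the stochastic process S(t) cannot be a martingale.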
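The same reasoning can be tried on two simpler processes, chosen here purely for illustration (they are hypothetical examples, not the process of Equation 4.9). Applying Itô's lemma to Y(t) = W(t)² gives dY = dt + 2W dW, which has a non-zero dt term, while Z(t) = W(t)² − t gives dZ = 2W dW, which has none. A martingale must have a constant expectation, so a Monte Carlo estimate of the mean at several times gives a quick necessary (not sufficient) check. The helper name `sample_means` and the sample size are assumptions.

```python
import random

rng = random.Random(7)
n_paths = 20_000

def sample_means(t):
    """Monte Carlo estimates of E[W(t)^2] and E[W(t)^2 - t], using W(t) ~ N(0, t)."""
    squares = [rng.gauss(0.0, t ** 0.5) ** 2 for _ in range(n_paths)]
    mean_sq = sum(squares) / n_paths
    return mean_sq, mean_sq - t

results = {t: sample_means(t) for t in (0.5, 1.0, 2.0)}
for t, (m_y, m_z) in results.items():
    print(f"t={t}: E[W^2] ~ {m_y:.3f}, E[W^2 - t] ~ {m_z:.3f}")
```

The estimated mean of W(t)² grows like t, so it cannot be a martingale, whereas the mean of W(t)² − t stays near zero, consistent with its SDE having no dt term.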

Summary

In this post, we presented Itô's lemma along with its context: Itô integrals, Itô processes, stochastic differential equations, and some prerequisites, namely filtrations, adapted processes, martingales, and quadratic variation. Three applications of Itô's lemma are shown at the end: 1. finding the SDE of a stochastic process; 2. transforming one stochastic process into another; 3. checking whether a random process is a martingale.

References

[1] Etheridge, A., & Baxter, M. (2002). A course in financial calculus. Cambridge University Press.

[2] Øksendal, B. (2003). Stochastic differential equations. In Stochastic differential equations (pp. 65–84). Springer, Berlin, Heidelberg.

[3] Florescu, I. (2014). Probability and stochastic processes. John Wiley & Sons.

[4] Stochastic processes and their classification. Accessed on 20 May 2022.

[5] Example of filtration in probability theory. Accessed on 12 June 2022.

[6] Whitt, W. (2007). A Quick Introduction to Stochastic Calculus. Accessed on 13 June 2022.

[7] Construction of the Itô-integral in Øksendals book. Accessed on 16 June 2022.

[8] Diffusion theory. Accessed on 18 June.

[9] Stochastic integration. Accessed on 19 June.

Written By

Xichu Zhang

URL: https://towardsdatascience.com/an-introduction-to-stochatic-processes-8c0b51ca73a9/