An Introduction to Stochastic Processes (1)
Itô's lemma: definition and application
In this post, the main topic is Itô’s lemma, which plays an important role in financial mathematics and is a useful tool for dealing with stochastic processes. A lot of articles and documents can be found about this topic, but very few of them include the introduction of the background, such as the Wiener process, filtration, and so on. They usually assume that the readers are already familiar with those concepts. Of course, it is easy to just search for the related context, but it takes quite a while to put all the things together. This is what this post tries to help with – connect Itô’s lemma to the related concepts about stochastic processes.
Prerequisite
Wiener process
The first important concept we are introducing is the Wiener process, which is part of the Itô process (we will see that in the Itô process it serves as an integrator). It is defined as a stochastic process (or random process, a collection of random variables ordered by an index set [4]) with the following four properties:
- The initial value W(0) = 0.
- The Wiener process is almost surely continuous (but not differentiable): with probability 1, the function t → W(t) is continuous in t. (We say "almost surely" because there can be a set of outcomes ω of probability 0 on which the path t → W(t, ω) is not continuous.) Continuity of every path implies almost sure continuity, but not the other way around. For an example of a Wiener process that is almost surely continuous but not continuous, see this StackExchange answer.
- The process {W(t)}ₜ≥₀ has stationary, independent increments.
- The increments W(t+s) − W(s) have the normal distribution N(0, t) (we use this to denote the normal distribution with mean μ=0 and variance σ²=t). It is important to note that increments are independent only when the corresponding time intervals do not overlap. Taking s = 0 and using W(0) = 0, it follows that W(t) itself has the distribution N(0, t).
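The four properties can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `simulate_wiener`, the grid size, and the number of paths are illustrative choices and not part of the original text. It builds paths from independent N(0, dt) increments and compares sample moments with the values the definition predicts.

```python
import numpy as np

def simulate_wiener(n_paths, n_steps, T, seed=0):
    """Return (times, paths); paths[i, k] approximates W(t_k) on path i."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Independent N(0, dt) increments over non-overlapping intervals
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    # Prepend a column of zeros so that W(0) = 0 on every path
    paths = np.concatenate(
        [np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)], axis=1
    )
    times = np.linspace(0.0, T, n_steps + 1)
    return times, paths

times, paths = simulate_wiener(n_paths=20_000, n_steps=200, T=2.0)

mean_WT = paths[:, -1].mean()   # should be close to 0
var_WT = paths[:, -1].var()     # should be close to T = 2.0
# Increment W(1.5) - W(0.5): variance should be close to 1.5 - 0.5 = 1.0
var_inc = (paths[:, 150] - paths[:, 50]).var()

print(mean_WT, var_WT, var_inc)
```

The sample values fluctuate with the seed, so they should only be expected to match the theoretical values up to Monte Carlo error.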
Filtration and martingale
The definition of filtration is straightforward. Considering a probability space (Ω, 𝓕, P) (see here for the basics about probability spaces and the meaning of the notation), a filtration is a sequence of σ-algebras 𝓕₀, 𝓕₁, …, 𝓕ₙ with the property 𝓕₀ ⊂ 𝓕₁ ⊂ … ⊂ 𝓕ₙ. However, this is quite abstract. Intuitively, the σ-algebra 𝓕ₜ contains all the information up to time t. To understand it better, we can use the following example (it is lengthy but worth reading, because without understanding what a filtration is, it is not possible to understand the other definitions built upon it):
We toss two dice, one after the other. Let S = {1, 2, 3, 4, 5, 6}, and let the sample space be Ω = S×S, which is the set of all possible outcomes of the two tosses (every outcome is an ordered pair). At time zero, no die has been tossed yet, and the σ-algebra is 𝓕₀ = {∅, Ω}, because in this state no additional information (the word "information" refers to "what can happen" at this step) has been added to the σ-algebra. We only know that (1) any outcome is possible: Ω ∈ 𝓕₀, where Ω is an event (recall that an event is an element of a σ-algebra), and obviously P(Ω) = 1; (2) no outcome occurs: ∅ ∈ 𝓕₀, which is the complement of Ω (recall that in a σ-algebra 𝓕, if A ∈ 𝓕, then Ā ∈ 𝓕). Since tossing the dice will certainly produce some outcome, the event ∅ has probability zero, P(∅) = 0.
At the time of the first toss, the information about the first toss is added to the σ-algebra: 𝓕₁ contains all the possible events after the first toss. For example, {2, 4, 6}×S ∈ 𝓕₁, because the outcome of the first toss might be even, which corresponds to the event {2, 4, 6}, while the second toss can be anything in S. Analogously, {1, 3, 5}×S ∈ 𝓕₁; in this case the outcome of the first toss is odd and corresponds to the event {1, 3, 5}. In the same way, we conclude that 𝓕₁ = 𝒫(S)×S = {A×S : A ∈ 𝒫(S)} (2⁶ elements), where 𝒫(S) denotes the power set of S, that is, all the possible events of the first toss.
At the time of the second toss, there is information on both the first and the second toss. 𝓕₂ contains all the possible events after the second toss. This means 𝓕₂ = 𝒫(S×S) (2³⁶ elements). [5]
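The dice example is small enough to enumerate. The sketch below uses only the Python standard library and builds 𝓕₀ and 𝓕₁ explicitly as sets of events; the helper name `power_set` is an illustrative choice. 𝓕₂ is not enumerated because it has 2³⁶ elements, so only its size is computed.

```python
from itertools import chain, combinations, product

S = [1, 2, 3, 4, 5, 6]

def power_set(items):
    """All subsets of items, returned as frozensets."""
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(c) for c in subsets]

omega = frozenset(product(S, S))  # sample space: 36 ordered pairs

# F0: before any toss, only the trivial events are known
F0 = {frozenset(), omega}
# F1: events that constrain only the first toss, A x S for A in P(S)
F1 = {frozenset(product(A, S)) for A in power_set(S)}

even_first = frozenset(product({2, 4, 6}, S))  # "first toss is even"

print(len(F0), len(F1), 2 ** 36)  # 2 64 68719476736
print(F0 <= F1, even_first in F1)  # True True
```

The check `F0 <= F1` is the filtration property 𝓕₀ ⊂ 𝓕₁ from the definition, and 𝓕₁ is closed under complements because the complement of A×S is (S∖A)×S.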
The natural filtration (or generated filtration) of a random process X(t) is the filtration 𝓕ˣ generated by the process itself, so that at each time t the random variable X(t) is 𝓕ₜ-measurable. Formally,
$$\mathcal{F}_t = \sigma\left\{ X(s)^{-1}(B) \;:\; 0 \le s \le t,\ B \in \mathcal{B}(\mathbb{R}) \right\}, \qquad \mathcal{F}^X = (\mathcal{F}_t)_{t \ge 0}.$$
Here the Borel algebra 𝓑(ℝ) is used since the state space of the random process is usually the real line, whose σ-algebra is the Borel algebra. We say that the random process X(t) is adapted to the filtration 𝓕ˣ. An important fact is that
a stochastic process X is always adapted to its natural filtration.
Also, 𝓕ₜ is the smallest σ-algebra such that Property 1.0 is satisfied. Intuitively, 𝓕ₜ represents all the information available at time t, which means the filtration 𝓕ˣ represents the evolution of the information (this is explained via the example above) of the stochastic process with time, and the value of X(t) depends only on the evolution up to time t.
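Now that we know what filtrations and adapted processes are, it is easy to understand what a martingale is. Consider an adapted stochastic process X(t), 0 ≤ t ≤ T, with E|X(t)| < ∞. The process X(t) is a martingale if it has no tendency to rise or fall. Formally,
$$\mathbb{E}\left[X(t) \mid \mathcal{F}_s\right] = X(s) \quad \text{for all } 0 \le s \le t \le T.$$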
And we have the following observation:
The Wiener process is a martingale with respect to its natural filtration.
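This can be shown as follows. For 0 ≤ s ≤ t, the increment W(t) − W(s) is independent of 𝓕ₛ and has mean zero, while W(s) is 𝓕ₛ-measurable, so
$$\mathbb{E}\left[W(t) \mid \mathcal{F}_s\right] = \mathbb{E}\left[W(t) - W(s) \mid \mathcal{F}_s\right] + \mathbb{E}\left[W(s) \mid \mathcal{F}_s\right] = \mathbb{E}\left[W(t) - W(s)\right] + W(s) = W(s).$$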
Another possible way to show that a process is a martingale is via Itô’s lemma, which we will see later when we get to the application of Itô’s lemma.
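A rough numerical sanity check of the martingale property of the Wiener process is sketched below, assuming NumPy. Full conditioning on 𝓕ₛ cannot be simulated directly, so the sketch conditions on one coarse piece of time-s information, the event {W(s) > 0}; the times s = 1 and t = 2 and the sample size are arbitrary choices. If W is a martingale, the average of W(t) over that event must agree with the average of W(s) over the same event.

```python
import numpy as np

rng = np.random.default_rng(6)
n_paths, s, t = 200_000, 1.0, 2.0

W_s = rng.normal(0.0, np.sqrt(s), size=n_paths)
# W(t) = W(s) + an independent N(0, t - s) increment
W_t = W_s + rng.normal(0.0, np.sqrt(t - s), size=n_paths)

up = W_s > 0                       # an event known at time s
cond_mean_s_up = W_s[up].mean()    # estimate of E[W(s) | W(s) > 0]
cond_mean_t_up = W_t[up].mean()    # estimate of E[W(t) | W(s) > 0]

print(cond_mean_s_up, cond_mean_t_up)
```

Both estimates should be close to each other (and to √(2/π), the mean of a half-normal variable with unit scale), up to Monte Carlo error. This is only a necessary condition checked on a single event, not a proof.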
Quadratic variation
The quadratic variation is a kind of variation of a random process (an example of a different variation is the linear variation). Let X(t) be any process. The quadratic variation of X(t) is again a random process and is defined as
$$[X, X]_T = \lim_{\|p\| \to 0} \sum_{k=1}^{n} \left(X(t_k) - X(t_{k-1})\right)^2,$$
where p = {0 = t₀ < t₁ < … < tₙ = T} ranges over the partitions of the time interval [0, T], the limit is taken in probability, and ||p|| denotes the mesh, which is the length of the largest interval in the partition:
$$\|p\| = \max_{1 \le k \le n} \left(t_k - t_{k-1}\right).$$
It can be shown that the quadratic variation of a Wiener process is [W, W]ₜ = t almost surely.
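The statement about the Wiener process can be illustrated numerically. The sketch below, assuming NumPy, sums squared increments of a simulated path on finer and finer uniform grids over [0, T] with T = 1; the function name `quadratic_variation` and the grid sizes are illustrative. A smooth path (here sin t) is included for contrast, since its quadratic variation vanishes as the mesh shrinks.

```python
import numpy as np

def quadratic_variation(path):
    """Sum of squared increments of a sampled path along its grid."""
    return float(np.sum(np.diff(path) ** 2))

rng = np.random.default_rng(1)
T = 1.0
qv_by_steps = {}
for n_steps in (100, 10_000, 1_000_000):
    dW = rng.normal(0.0, np.sqrt(T / n_steps), size=n_steps)
    W = np.concatenate([[0.0], np.cumsum(dW)])   # W(0) = 0
    qv_by_steps[n_steps] = quadratic_variation(W)

# For contrast: a differentiable path on the finest grid
t = np.linspace(0.0, T, 1_000_001)
qv_smooth = quadratic_variation(np.sin(t))

print(qv_by_steps, qv_smooth)
```

Each grid uses a freshly simulated path rather than refinements of one path, which is enough to see the sums concentrate around T as the mesh decreases.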
Itô integral
First, we introduce what the Itô integral is and how it is constructed. It is just like the ordinary integral, except that the integrator is stochastic. It generalizes the Riemann–Stieltjes integral, which is itself a generalization of the Riemann integral.
To be more precise: the Riemann–Stieltjes integral generalizes the Riemann integral by replacing the "integrator", originally the infinitesimal increment dx, with the increment of a function (I actually haven’t seen the term "integrator" being used in the context of the Riemann integral, but Fig 2.1 will make things clear). The Itô integral is in turn a stochastic generalization of the Riemann–Stieltjes integral, in which the integrator is a Wiener process.
The definition of Itô integral
What the Itô integral actually is, is shown in the following definition:
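Let f ∈ 𝒱(S, T). Then the Itô integral of f (from S to T) is defined by
$$\int_S^T f(t, \omega)\, dW_t(\omega) = \lim_{n \to \infty} \int_S^T \phi_n(t, \omega)\, dW_t(\omega) \quad \text{(limit in } L^2(P)\text{)},$$
where {ϕₙ} is a sequence of simple functions such that
$$\mathbb{E}\left[\int_S^T \left(f(t, \omega) - \phi_n(t, \omega)\right)^2 dt\right] \to 0 \quad \text{as } n \to \infty. \tag{2.3}$$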
Note that Wₜ, denoting the Wiener process, is sometimes replaced by Bₜ, denoting the Brownian motion. [1] We don’t need to worry about this since they are the same thing. In the rest of the article, we will take S=0, which is usually the case – the process starts from t=0. The simple function is mentioned in this article, which is sometimes also called the "elementary function". The definition of a simple function is given in [2]:
A function ϕ ∈ 𝒱 is called simple if it has the form
$$\phi(t, \omega) = \sum_{j} e_j(\omega)\, \chi_{[t_j, t_{j+1})}(t),$$
where each eⱼ is 𝓕ₜⱼ-measurable and χ denotes the characteristic (indicator) function. It is not completely clear whether "simple function" and "elementary function" mean the same thing. According to [3], they do. According to this, the conditions on the sets on which χ is defined are more relaxed for elementary functions: there can be countably infinitely many of them, whereas for simple functions there can be only finitely many. But this doesn’t cause any trouble for us: in Def 2.4 there are countably infinitely many sets [tⱼ, tⱼ₊₁), so both terms will work, and we will stick with "simple function".
The 𝒱 = 𝒱(S, T) in the definition stands for the class of functions
$$f(t, \omega) : [0, \infty) \times \Omega \to \mathbb{R}$$
satisfying the following conditions:
- (t, ω) → f(t, ω) is 𝓑×𝓕-measurable, where 𝓑 denotes the Borel σ-algebra on [0, ∞).
- f(t, ω) is 𝓕ₜ-adapted.
- f(t, ω) is an L²-function: it is square-integrable, that is, the integral E[∫ f(t, ω)² dt] over [S, T] is finite.
Expression 2.3 might seem strange, but it intuitively means that {ϕₙ} is a good approximation of f(t, ω): imagine ϕₙ to be a step function; as the steps become infinitesimally small, ϕₙ approaches f(t, ω). Mathematically, the Itô integral of f is then defined as the L²-limit of the sequence of Itô integrals of the simple functions ϕₙ.
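The construction by simple functions can be sketched numerically, assuming NumPy. The integrand f(t, ω) = W(t) is chosen here for illustration and is not taken from the text above; for it the Itô integral over [0, T] is known in closed form, (W(T)² − T)/2. The approximating simple function takes the value W(tⱼ) on each interval [tⱼ, tⱼ₊₁), so its integral is a left-endpoint sum. The function name `ito_integral_left` is hypothetical.

```python
import numpy as np

def ito_integral_left(f_values, dW):
    """Left-endpoint sum: sum_j f(t_j) * (W(t_{j+1}) - W(t_j))."""
    return float(np.sum(f_values[:-1] * dW))

rng = np.random.default_rng(2)
T, n_steps = 1.0, 200_000
dW = rng.normal(0.0, np.sqrt(T / n_steps), size=n_steps)
W = np.concatenate([[0.0], np.cumsum(dW)])   # W(t_0), ..., W(t_n)

approx = ito_integral_left(W, dW)        # integral of the simple function
closed_form = 0.5 * (W[-1] ** 2 - T)     # known value of the Itô integral

print(approx, closed_form)
```

Evaluating the integrand at the left endpoint is what keeps the approximating function adapted; using the right endpoint or the midpoint instead would converge to a different value.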
Itô process and diffusion process
An Itô process or stochastic integral is a stochastic process on (Ω, 𝓕, P) adapted to 𝓕ₜ, which can be written in the form
$$X_t = X_0 + \int_0^t U(s, \omega)\, ds + \int_0^t V(s, \omega)\, dW_s, \tag{3.1}$$
where functions U, V ∈ 𝓛₂. The first integral, of the function U, is an ordinary Riemann integral with respect to time (the drift part). The second integral, of the function V, is an Itô integral and is the only source of noise in the process Xₜ.
Equation 3.1 can also be written in a shorter form, in terms of the stochastic differential:
$$dX_t = U\, dt + V\, dW_t. \tag{3.2}$$
And the stochastic differential equation (SDE) for a stochastic process X(t) is given by
$$dX(t) = \mu(X(t), t)\, dt + \sigma(X(t), t)\, dW(t). \tag{3.3}$$
An SDE given in the form of Equation 3.3 is known as an Itô type SDE. (Surely there are also other different types of SDE, such as the Stratonovich SDE.) And a solution to such an SDE is called Itô diffusion, which is a special case of diffusion processes. A process is a diffusion process if
- The time variable t is continuous. Usually, we assume that t is defined for all the non-negative real numbers.
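- It is a Markov process with a transition cumulative distribution function
$$P(y, t \mid x, s) = \mathbb{P}\left(X(t) \le y \mid X(s) = x\right), \quad s < t, \tag{3.3.1}$$
and a transition probability density, which is defined as the partial derivative (if it exists) of Equation 3.3.1 with respect to y.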
- Xₜ is a continuous function of t.
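The properties of a diffusion process are completely determined by two characteristics, the infinitesimal mean and the infinitesimal variance, together with the possible boundary conditions. The infinitesimal mean is defined by
$$\mu(x, t) = \lim_{h \to 0^+} \frac{1}{h}\, \mathbb{E}\left[X(t+h) - X(t) \mid X(t) = x\right],$$
and the infinitesimal variance is defined by
$$\sigma^2(x, t) = \lim_{h \to 0^+} \frac{1}{h}\, \mathbb{E}\left[\left(X(t+h) - X(t)\right)^2 \mid X(t) = x\right].$$
Both of those characteristics can be directly read off from Equation 3.3: the infinitesimal mean is the coefficient of dt, and the infinitesimal variance is the square of the coefficient of dW(t).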
Examples of SDE
One very important example of an SDE is as follows:
$$dX(t) = \mu X(t)\, dt + \sigma X(t)\, dW(t). \tag{3.4}$$
The solution X(t) of Equation 3.4 is geometric Brownian motion. It is solved by Itô’s lemma, which will be shown in the second example of the application of Itô’s lemma. The solution to Equation 3.4 with the initial condition X(0) = x₀ is given by
$$X(t) = x_0 \exp\left(\left(\mu - \frac{\sigma^2}{2}\right) t + \sigma W(t)\right). \tag{3.5}$$
Another important example of an SDE is the Ornstein-Uhlenbeck process, which is given by
$$dX_t = \alpha\,(m - X_t)\, dt + \sigma\, dW_t, \tag{3.6}$$
where m, α, σ ∈ ℝ are some constants (with α > 0 for mean reversion). The intuitive physical meaning of Equation 3.6 is the motion of a particle that drifts towards the level m with noise of intensity σ. The drift is positive when Xₜ < m, and negative when Xₜ > m. Due to this feature, the Ornstein-Uhlenbeck process is also a mean-reverting process. In finance, the Ornstein-Uhlenbeck process is used in the Vasiček model for the instantaneous interest rate, where σ is the volatility, α is the speed of reversion and m is the long-term mean level. The solution is
$$X_t = m + (X_0 - m)\, e^{-\alpha t} + \sigma \int_0^t e^{-\alpha (t - s)}\, dW_s,$$
assuming that the process starts from t=0.
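Mean reversion can be seen in a simulation. The sketch below assumes the standard mean-reverting form dX = α(m − X)dt + σ dW for the Ornstein-Uhlenbeck SDE and discretizes it with the Euler-Maruyama scheme, a numerical method that is not discussed in this post; the function name `euler_maruyama_ou` and all parameter values are illustrative. The sample mean and variance at time T are compared with the known values m + (x₀ − m)e^(−αT) and σ²(1 − e^(−2αT))/(2α).

```python
import numpy as np

def euler_maruyama_ou(x0, m, alpha, sigma, T, n_steps, n_paths, seed=3):
    """Simulate dX = alpha*(m - X) dt + sigma dW; return X(T) per path."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        # Drift pulls x towards m; the dW term adds the noise
        x = x + alpha * (m - x) * dt + sigma * dW
    return x

x0, m, alpha, sigma, T = 2.0, 0.5, 1.5, 0.3, 2.0
x_T = euler_maruyama_ou(x0, m, alpha, sigma, T, n_steps=1000, n_paths=20_000)

mean_exact = m + (x0 - m) * np.exp(-alpha * T)
var_exact = sigma**2 / (2 * alpha) * (1 - np.exp(-2 * alpha * T))

print(x_T.mean(), mean_exact)
print(x_T.var(), var_exact)
```

Starting above the long-term level (x₀ > m), the simulated mean decays towards m, which is the behaviour expected from a negative drift when the process is above m.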
The Itô isometry
The idea behind the definition of the Itô integral is to define the integral for a simple class of functions first and then extend it to the whole class 𝒱. For this purpose, we can use the Itô isometry.
The lemma of the Itô isometry is formulated as follows: if ϕ(t, ω) is a simple function and bounded, then
$$\mathbb{E}\left[\left(\int_0^T \phi(t, \omega)\, dW_t\right)^2\right] = \mathbb{E}\left[\int_0^T \phi(t, \omega)^2\, dt\right].$$
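The steps of the construction are discussed in detail in [2]; here we only show the idea of the proof. We want to prove that the sequence of the stochastic integrals of simple functions
$$\left\{ \int_0^T \phi_n(t, \omega)\, dW_t \right\}_{n \ge 1} \tag{3.7}$$
forms a Cauchy sequence (the elements become arbitrarily close to each other as the sequence goes on) in the L²-space of random variables with finite second moment. Since the L²-space is complete (we mention this since, in a complete metric space, every Cauchy sequence converges to an element in that metric space), the sequence given by Equation 3.7 then converges to a random variable in the L²-space. Now we prove that the sequence is Cauchy. By the linearity of the integral for simple functions and the Itô isometry,
$$\mathbb{E}\left[\left(\int_0^T \phi_n\, dW_t - \int_0^T \phi_m\, dW_t\right)^2\right] = \mathbb{E}\left[\int_0^T \left(\phi_n - \phi_m\right)^2 dt\right]. \tag{3.8}$$
From the property in Expression 2.3,
$$\mathbb{E}\left[\int_0^T \left(f - \phi_n\right)^2 dt\right] \to 0 \quad \text{as } n \to \infty,$$
and the inequality (a + b)² ≤ 2a² + 2b², we can immediately see that
$$\mathbb{E}\left[\int_0^T \left(\phi_n - \phi_m\right)^2 dt\right] \le 2\,\mathbb{E}\left[\int_0^T \left(\phi_n - f\right)^2 dt\right] + 2\,\mathbb{E}\left[\int_0^T \left(f - \phi_m\right)^2 dt\right] \to 0 \quad \text{as } n, m \to \infty. \tag{3.9}$$
Combining Equation 3.8 and Equation 3.9 proves that the sequence given in Equation 3.7 is a Cauchy sequence. [7]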
Another important application of Itô isometry is the calculation of the variance of a random variable, which is given as Itô integrals. We will show one example here: find the variance of the following Itô process:
Using the properties of the Wiener process, it is easy to see that the expected value of X(t) given by Equation 3.10 is zero. And note that since we are calculating the variance for each time t, t is considered to be a constant here. The variance is calculated as follows using Itô isometry:
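As a separate numerical illustration of the same technique, the sketch below uses a hypothetical integrand f(s) = s, which is an assumption made here and not necessarily the process of Equation 3.10. For X(t) = ∫₀ᵗ s dWₛ the Itô isometry gives Var X(t) = ∫₀ᵗ s² ds = t³/3, and the code compares this with a Monte Carlo estimate based on left-endpoint sums. NumPy is assumed, and the grid and sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
t_end, n_steps, n_paths = 2.0, 400, 10_000
dt = t_end / n_steps

s = np.arange(n_steps) * dt   # left endpoints s_j of the time grid
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))

# One left-endpoint sum  sum_j s_j * dW_j  per simulated path
X = dW @ s

mean_mc = X.mean()               # should be close to 0
var_mc = X.var()                 # Monte Carlo estimate of Var X(t)
var_isometry = t_end**3 / 3      # value predicted by the Itô isometry

print(mean_mc, var_mc, var_isometry)
```

The two variances agree only up to discretization and Monte Carlo error; a finer grid and more paths tighten the match.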
Itô’s lemma
Itô’s lemma gives us a way to find the differential of a stochastic equation (note that we use "differential" instead of "derivative" since a stochastic process is not differentiable). In some literature, it is described as "a stochastic counterpart of the chain rule in calculus". The motivation is that for a non-stochastic composite function of time u = f ∘ g = f(g(t)), t ≥ 0, where f and g are differentiable, calculating the derivative of u is easy: we just need to apply the chain rule:
$$\frac{du}{dt} = f'\left(g(t)\right) g'(t).$$
However, this will not work for stochastic equations. In this case, what we need is Itô’s lemma, which is given as follows:
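Let {X(t)} be an Itô process given by the stochastic differential equation (SDE) Equation 3.2. Let g(t, x): [0, ∞)×ℝ → ℝ be a twice continuously differentiable function. Then the random process
$$Y(t) = g(t, X(t)) \tag{4.1}$$
is an Itô process with stochastic differential
$$dY(t) = \frac{\partial g}{\partial t}(t, X(t))\, dt + \frac{\partial g}{\partial x}(t, X(t))\, dX(t) + \frac{1}{2}\, \frac{\partial^2 g}{\partial x^2}(t, X(t))\, \left(dX(t)\right)^2, \tag{4.2}$$
and the following rules apply when computing (dX(t))²:
$$dt \cdot dt = dt \cdot dW(t) = dW(t) \cdot dt = 0, \qquad dW(t) \cdot dW(t) = dt.$$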
In Equation 4.2, the first two terms are what the ordinary chain rule would give if X(t) were a differentiable function of t. The last term is a correction term that is new to the stochastic setting, and (dX(t))² is the differential of the quadratic variation of the process X(t).
Examples of applying Itô’s lemma
To understand how Itô’s lemma works, we can look at the following examples.
- Finding the SDE of a stochastic process. Considering the stochastic process
The usual method of differentiating a non-stochastic function will not work, so we need to apply Itô’s lemma here. How do we do it? The first step is to construct the helper function (this is a very informal term) g(X(t), t). Here we use
We wrote g(x, t) instead of g(X(t), t), just to make it look nicer because when we calculate the partial derivative of g(x, t), we consider it as a function with two variables x and t and forget about the fact that x is a function of t for a while. The next step is as follows
where gₓ denotes the first-order derivative of g(x, t) with regard to the parameter x and gₓₓ is the second-order derivative of g(x, t) with regard to x, and so on. And in this case, the role x plays is actually W(t). Plugging the derivatives into Equation 4.2, we easily get
This means
- Transforming one Itô process into another. This example is a little bit more complicated than the previous one because we need to do some substitutions. Suppose the stock price S(t) follows a geometric Brownian motion:
$$dS(t) = \mu S(t)\, dt + \sigma S(t)\, dW(t), \tag{4.3}$$
and we want to find the SDE for ln S(t).
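The choice of helper function is obvious here: we let g(x, t) = ln x and calculate the necessary derivatives
$$g_t = 0, \qquad g_x = \frac{1}{x}, \qquad g_{xx} = -\frac{1}{x^2}.$$
Applying Itô’s lemma, we get
$$d \ln S(t) = \frac{1}{S(t)}\, dS(t) - \frac{1}{2}\, \frac{1}{S(t)^2}\, \left(dS(t)\right)^2, \tag{4.4}$$
and we can substitute dS(t) to deal with the (dS(t))² term. From Equation 4.3 we can easily get
$$\left(dS(t)\right)^2 = \left(\mu S(t)\, dt + \sigma S(t)\, dW(t)\right)^2. \tag{4.5}$$
Substituting Equation 4.5 into Equation 4.4, we focus on the (dS(t))² term: we expand it, and then the rules stated in Itô’s lemma can be used:
$$\left(dS(t)\right)^2 = \mu^2 S(t)^2\, (dt)^2 + 2 \mu \sigma S(t)^2\, dt\, dW(t) + \sigma^2 S(t)^2\, \left(dW(t)\right)^2 = \sigma^2 S(t)^2\, dt.$$
Plugging this expression back into Equation 4.4 gets us
$$d \ln S(t) = \left(\mu - \frac{\sigma^2}{2}\right) dt + \sigma\, dW(t), \tag{4.6}$$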
where the dS(t) term is substituted directly from Equation 4.3. (Some of the steps of this substitution are simplified.) After integrating both sides of Equation 4.6, we get the solution of the geometric Brownian motion. Supposing the initial stock price is S₀, we have
$$\ln S(t) - \ln S_0 = \left(\mu - \frac{\sigma^2}{2}\right) t + \sigma W(t), \tag{4.7}$$
where d ln S(t) is replaced by the difference between the log price at time t and its initial value. (Recall that "d" means difference.) We can see that the logarithm of a geometric Brownian motion is a Wiener process with drift. Equation 4.7 tells us that the conditional distribution (under the condition of the initial price) of log returns is normal. Using the properties of the Wiener process, we can easily read off the conditional mean and variance of the distribution
$$\mathbb{E}\left[\ln \frac{S(t)}{S_0}\right] = \left(\mu - \frac{\sigma^2}{2}\right) t, \qquad \mathrm{Var}\left[\ln \frac{S(t)}{S_0}\right] = \sigma^2 t,$$
which means the gross returns S(t)/S₀, and hence the price S(t), have a lognormal distribution.
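The effect of the Itô correction term on the log returns can be seen in a simulation. The sketch below assumes the usual geometric Brownian motion SDE dS = μS dt + σS dW and discretizes it directly with the Euler-Maruyama scheme (a numerical method not covered in this post), without using the closed-form solution; the parameter values are arbitrary. If the correction were absent, the mean log return over [0, T] would be μT; with it, the mean should be close to (μ − σ²/2)T.

```python
import numpy as np

rng = np.random.default_rng(4)
S0, mu, sigma, T = 100.0, 0.08, 0.2, 1.0
n_steps, n_paths = 500, 50_000
dt = T / n_steps

S = np.full(n_paths, S0)
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    # Euler-Maruyama step for dS = mu*S dt + sigma*S dW
    S = S + mu * S * dt + sigma * S * dW

log_ret = np.log(S / S0)
mean_theory = (mu - 0.5 * sigma**2) * T   # drift of ln S including the Itô correction
var_theory = sigma**2 * T

print(log_ret.mean(), mean_theory, mu * T)
print(log_ret.var(), var_theory)
```

With these parameters the naive value μT = 0.08 and the corrected value 0.06 are far enough apart, relative to the Monte Carlo error, for the difference to be visible in the sample mean.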
- Using Itô’s lemma to identify martingale. Consider the process
Is S(t) a martingale? The first step is to find the SDE and its integral form. The helper function here is
which has the same form as S(t) and is a straightforward choice. Then we calculate the necessary derivatives and do the routine of plugging into the Itô formula:
And the integral form is
In Equation 4.11, the integral on the right is a martingale, but the one on the left is not (for the reason, see here). Therefore the stochastic process S(t) cannot be a martingale.
Summary
In this post, we show the definition of Itô’s lemma along with the context: Itô integrals, Itô processes, stochastic differential equations, and some prerequisites, which are filtrations, adapted processes, martingales, and quadratic variation. Three applications of Itô’s lemma are shown at the end: 1. finding the SDE of a stochastic process; 2. transforming one stochastic process into another; 3. checking whether a random process is a martingale.
References
[1] Etheridge, A., & Baxter, M. (2002). A course in financial calculus. Cambridge University Press.
[2] Øksendal, B. (2003). Stochastic differential equations. In Stochastic differential equations (pp. 65–84). Springer, Berlin, Heidelberg.
[3] Florescu, I. (2014). Probability and stochastic processes. John Wiley & Sons.
[4] Stochastic processes and their classification. Accessed on 20 May 2022.
[5] Example of filtration in probability theory. Accessed on 12 June 2022.
[6] Whitt, W. (2007). A Quick Introduction to Stochastic Calculus. Accessed on 13 June 2022.
[7] Construction of the Itô-integral in Øksendals book. Accessed on 16 June 2022.
[8] Diffusion theory. Accessed on 18 June.
[9] Stochastic integration. Accessed on 19 June.