The Econometrics of High Frequency Data*

Per A. Mykland and Lan Zhang

This version: February 22, 2009

*Financial support from the National Science Foundation under grants DMS 06-04758 and SES 06-31605 is gratefully acknowledged. We would also like to thank Hong Kong University of Science and Technology, where part of the manuscript was written.
The Econometrics of High Frequency Data 1
1 Introduction
1.1 Overview
This is a course on estimation in high frequency data. It is intended for an audience that includes interested people in finance, econometrics, statistics, probability, and financial engineering.
There has in recent years been a vast increase in the amount of high frequency data available.
There has also been an explosion in the literature on the subject. In this course, we start from
scratch, introducing the probabilistic model for such data, and then turn to the estimation question
in this model. We focus on the problem, emblematic for this area, of estimating volatility. Techniques similar to those we present can be applied to estimating leverage effects, realized regressions, and semivariances, to carrying out analyses of variance, to detecting jumps, and to measuring liquidity through the size of the microstructure noise, among many other objects of interest.
The applications are mainly in finance, ranging from risk management to options hedging (see
Section 2.6 below), execution of transactions, portfolio optimization (Fleming, Kirby, and Ostdiek
(2001, 2003)), and forecasting. The latter literature has been particularly active, with contributions
including Andersen and Bollerslev (1998), Andersen, Bollerslev, Diebold, and Labys (2001, 2003),
Andersen, Bollerslev, and Meddahi (2005), Dacorogna, Gençay, Müller, Olsen, and Pictet (2001), and Meddahi (2001). Methodologies based on high frequency data can also be found in neuroscience.
The purpose of this article, however, is not so much to focus on the applications as on the
probabilistic setting and the estimation methods. The theory was started, on the probabilistic side,
by Jacod (1994) and Jacod and Protter (1998), and on the econometric side by Foster and Nelson
(1996) and Comte and Renault (1998). The econometrics of integrated volatility was pioneered
in Andersen, Bollerslev, Diebold, and Labys (2001, 2003), Barndorff-Nielsen and Shephard (2002, 2004b) and Dacorogna, Gençay, Müller, Olsen, and Pictet (2001). The authors of this article started to work in the area through Zhang (2001), Zhang, Mykland, and Aït-Sahalia (2005), and
Mykland and Zhang (2006). For further references, see Section 5.5.
This article is meant to be a moderately self-contained course on the basics of this material. The introduction assumes some degree of statistics/econometrics literacy, but at a lower level than the standard probability text. Some of the material is at the research frontier and not published elsewhere.
This is not meant as a full review of the area. Readers with a good probabilistic background can
skip most of Section 2, and occasional other sections.
The text also mostly overlooks the questions that arise in connection with multidimensional
processes. For further literature in this area, one should consult Barndorff-Nielsen and Shephard
(2004a), Hayashi and Yoshida (2005) and Zhang (2005).
1.2 High Frequency Data
Recent years have seen an explosion in the amount of financial high frequency data. These are the records of transactions and quotes for stocks, bonds, currencies, options, and other financial
instruments.
A main source of such data is the Trades and Quotes (TAQ) database, which covers the stocks
traded on the New York Stock Exchange (NYSE). For example, here is an excerpt of the transactions
for Tuesday, April 5, 2005, for the pharmaceutical company Merck (MRK):
MRK 20050405 9:41:37 32.69 100
MRK 20050405 9:41:42 32.68 100
MRK 20050405 9:41:43 32.69 300
MRK 20050405 9:41:44 32.68 1000
MRK 20050405 9:41:48 32.69 2900
MRK 20050405 9:41:48 32.68 200
MRK 20050405 9:41:48 32.68 200
MRK 20050405 9:41:51 32.68 4200
MRK 20050405 9:41:52 32.69 1000
MRK 20050405 9:41:53 32.68 300
MRK 20050405 9:41:57 32.69 200
MRK 20050405 9:42:03 32.67 2500
MRK 20050405 9:42:04 32.69 100
MRK 20050405 9:42:05 32.69 300
MRK 20050405 9:42:15 32.68 3500
MRK 20050405 9:42:17 32.69 800
MRK 20050405 9:42:17 32.68 500
MRK 20050405 9:42:17 32.68 300
MRK 20050405 9:42:17 32.68 100
MRK 20050405 9:42:20 32.69 6400
MRK 20050405 9:42:21 32.69 200
MRK 20050405 9:42:23 32.69 3000
MRK 20050405 9:42:27 32.70 8300
MRK 20050405 9:42:29 32.70 5000
MRK 20050405 9:42:29 32.70 1000
MRK 20050405 9:42:30 32.70 1100
“Size” here refers to the number of shares that changed hands in the given transaction. This is often also called “volume”.
There are 6302 transactions recorded for Merck on this day. On the same day, Microsoft (MSFT) had 80982 transactions. These are massive amounts of data. What can we do with such data? This course is about how to approach this question.
1.3 A First Model for Financial Data: The GBM
Finance theory suggests the following description of prices: they must be so-called semimartingales. We defer a discussion of the general concept until later (see also Delbaen and Schachermayer (1995)), and go instead to the most commonly used such semimartingale: the Geometric Brownian Motion (GBM).
Set
$$X_t = \log S_t = \text{the logarithm of the stock price } S_t \text{ at time } t. \quad (1)$$
The GBM model is now that
$$X_t = X_0 + \mu t + \sigma W_t, \quad (2)$$
where $\mu$ and $\sigma$ are constants, and $W_t$ is a Brownian Motion (BM), a concept we now define. The "time zero" is an arbitrary reference time.
Definition 1. The process $(W_t)_{0 \le t \le T}$ is a Brownian motion provided

(1) $W_0 = 0$;

(2) $t \mapsto W_t$ is a continuous function of t;

(3) W has independent increments: if $t > s > u > v$, then $W_t - W_s$ is independent of $W_u - W_v$;

(4) for $t > s$, $W_t - W_s$ is normal with mean zero and variance $t - s$ ($N(0, t-s)$).
1.4 Estimation in the GBM model
It is instructive to consider estimation in this model. We take time t = 0 to be the beginning of
the trading day, and time t = T to be the end of the day.
Let's assume that there are n observations of the process (transactions). We suppose for now that the transactions are equally spaced in time, so that an observation occurs every $\Delta t_n = T/n$ units of time. This assumption is quite unrealistic, but it permits a straightforward development which can then be modified later.
The observations (log transaction prices) are therefore $X_{t_{n,i}}$, where $t_{n,i} = i\,\Delta t_n$. If we take differences, we get observations
$$\Delta X_{t_{n,i+1}} = X_{t_{n,i+1}} - X_{t_{n,i}}, \quad i = 0, \ldots, n-1. \quad (3)$$
The $\Delta X_{t_{n,i+1}}$ are independent and identically distributed (iid) with law $N(\mu\,\Delta t_n,\, \sigma^2 \Delta t_n)$. The natural estimators are:
$$\hat\mu_n = \frac{1}{n\Delta t_n} \sum_{i=0}^{n-1} \Delta X_{t_{n,i+1}} = (X_T - X_0)/T \qquad \text{both MLE and UMVU; and}$$
$$\hat\sigma^2_{n,\mathrm{MLE}} = \frac{1}{n\Delta t_n} \sum_{i=0}^{n-1} \big(\Delta X_{t_{n,i+1}} - \overline{\Delta X}_{t_n}\big)^2 \qquad \text{MLE; or} \qquad (4)$$
$$\hat\sigma^2_{n,\mathrm{UMVU}} = \frac{1}{(n-1)\Delta t_n} \sum_{i=0}^{n-1} \big(\Delta X_{t_{n,i+1}} - \overline{\Delta X}_{t_n}\big)^2 \qquad \text{UMVU.}$$
Here, MLE is the maximum likelihood estimator, and UMVU is the uniformly minimum variance unbiased estimator (see Lehmann (1983) or Rice (2006)). Also, $\overline{\Delta X}_{t_n} = \frac{1}{n}\sum_{i=0}^{n-1} \Delta X_{t_{n,i+1}} = \hat\mu_n\,\Delta t_n$.
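As a sanity check, the estimators in (4) can be computed directly from simulated iid increments; the parameter values below are illustrative, not from the text:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, T, n = 0.05, 0.3, 1.0, 50_000
dt = T / n

# iid increments Delta X ~ N(mu dt, sigma^2 dt), as in (3)
dX = rng.normal(mu * dt, sigma * np.sqrt(dt), size=n)
X_T = dX.sum()                   # X_T - X_0

mu_hat = X_T / T                 # both MLE and UMVU
dX_bar = dX.mean()               # equals mu_hat * dt
sigma2_mle = ((dX - dX_bar) ** 2).sum() / (n * dt)
sigma2_umvu = ((dX - dX_bar) ** 2).sum() / ((n - 1) * dt)
print(round(sigma2_umvu, 3))
```

Note that `mu_hat` uses only the endpoint value `X_T`, consistent with the remark below that $\hat\mu_n$ does not depend on n.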
The estimators (4) clarify some basics. First of all, $\mu$ cannot be consistently estimated for a fixed length T of the time interval. In fact, $\hat\mu_n$ does not depend on n, but only on T and the value of the process at the beginning and end of the time period. This is reassuring from a common sense perspective: if we could consistently estimate $\mu$ for actual stock prices, we would be very rich! Of course, if $T \to \infty$, then $\mu$ can be estimated consistently.
It is perhaps more surprising that $\sigma^2$ can be estimated consistently for fixed T, as $n \to \infty$. In other words, $\hat\sigma^2_n \xrightarrow{p} \sigma^2$ as $n \to \infty$. Set $U_{n,i} = \Delta X_{t_{n,i}}/(\sigma \Delta t_n^{1/2})$. Then the $U_{n,i}$ are iid with distribution $N((\mu/\sigma)\Delta t_n^{1/2},\, 1)$. It follows from standard considerations for normal random variables that $\sum_{i=0}^{n-1} (U_{n,i} - \bar U_{n,\cdot})^2$ is $\chi^2$-distributed with $n-1$ degrees of freedom. Hence, for the UMVU estimator,
$$\hat\sigma^2_n = \sigma^2 \Delta t_n\, \frac{1}{(n-1)\Delta t_n} \sum_{i=0}^{n-1} (U_{n,i} - \bar U_{n,\cdot})^2 = \sigma^2\, \frac{\chi^2_{n-1}}{n-1}.$$
It follows that
$$E(\hat\sigma^2_n) = \sigma^2 \quad \text{and} \quad \mathrm{Var}(\hat\sigma^2_n) = \frac{2\sigma^4}{n-1}, \quad (5)$$
since $E\chi^2_m = m$ and $\mathrm{Var}(\chi^2_m) = 2m$. Hence $\hat\sigma^2_n$ is consistent for $\sigma^2$: $\hat\sigma^2_n \to \sigma^2$ in probability as $n \to \infty$.
Similarly, since $\chi^2_{n-1}$ is the sum of $n-1$ iid $\chi^2_1$ random variables, by the central limit theorem we have the following convergence in law:
$$\frac{\chi^2_{n-1} - E\chi^2_{n-1}}{\sqrt{\mathrm{Var}(\chi^2_{n-1})}} = \frac{\chi^2_{n-1} - (n-1)}{\sqrt{2(n-1)}} \xrightarrow{\mathcal{L}} N(0, 1), \quad (6)$$
and so
$$n^{1/2}(\hat\sigma^2_n - \sigma^2) \approx (n-1)^{1/2}(\hat\sigma^2_n - \sigma^2) = \sqrt{2}\,\sigma^2\, \frac{\chi^2_{n-1} - (n-1)}{\sqrt{2(n-1)}} \xrightarrow{\mathcal{L}} \sigma^2\, N(0, 2) = N(0, 2\sigma^4). \quad (7)$$
This provides an asymptotic distribution which permits the setting of confidence intervals. For example, $\sigma^2 = \hat\sigma^2_n \pm 1.96\,\sqrt{2/n}\;\hat\sigma^2_n$ would be an asymptotic 95% confidence interval for $\sigma^2$.
Since $\hat\sigma^2_{n,\mathrm{MLE}} = \frac{n-1}{n}\,\hat\sigma^2_{n,\mathrm{UMVU}}$, the same asymptotics apply to the MLE.
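The limit $N(0, 2\sigma^4)$ in (7) can be checked by Monte Carlo; a sketch, with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, T, n, n_sim = 0.2, 1.0, 500, 4000
dt = T / n

# Many simulated "days"; estimate sigma^2 on each and compare the spread
# of n^{1/2}(sigma2_hat - sigma^2) with the predicted N(0, 2 sigma^4).
dX = rng.normal(0.0, sigma * np.sqrt(dt), size=(n_sim, n))
dX_bar = dX.mean(axis=1, keepdims=True)
sigma2_hat = ((dX - dX_bar) ** 2).sum(axis=1) / ((n - 1) * dt)

z = np.sqrt(n) * (sigma2_hat - sigma ** 2)
print(round(z.var(), 4), round(2 * sigma ** 4, 4))
```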
1.5 Behavior of Non-Centered Estimators
The above discussion of $\hat\sigma^2_{n,\mathrm{UMVU}}$ and $\hat\sigma^2_{n,\mathrm{MLE}}$ is exactly the same as in the classical case of estimating a variance on the basis of iid observations. More unusually, for high frequency data the mean is often not removed in estimation. The reason is as follows. Set
$$\hat\sigma^2_{n,\mathrm{nocenter}} = \frac{1}{n\Delta t_n} \sum_{i=0}^{n-1} \big(\Delta X_{t_{n,i+1}}\big)^2. \quad (8)$$
Now note that for the MLE version of $\hat\sigma_n$,
$$\hat\sigma^2_{n,\mathrm{MLE}} = \frac{1}{n\Delta t_n}\sum_{i=0}^{n-1}\big(\Delta X_{t_{n,i+1}} - \overline{\Delta X}_{t_n}\big)^2 = \frac{1}{n\Delta t_n}\Big[\sum_{i=0}^{n-1}\big(\Delta X_{t_{n,i+1}}\big)^2 - n\big(\overline{\Delta X}_{t_n}\big)^2\Big] = \hat\sigma^2_{n,\mathrm{nocenter}} - \Delta t_n\,\hat\mu^2_n = \hat\sigma^2_{n,\mathrm{nocenter}} - \frac{T}{n}\,\hat\mu^2_n.$$
Since $\hat\mu^2_n$ does not depend on n, it follows that
$$n^{1/2}\big(\hat\sigma^2_{n,\mathrm{MLE}} - \hat\sigma^2_{n,\mathrm{nocenter}}\big) \xrightarrow{p} 0.$$
Hence, $\hat\sigma^2_{n,\mathrm{nocenter}}$ is consistent and has the same asymptotic distribution as $\hat\sigma^2_{n,\mathrm{UMVU}}$ and $\hat\sigma^2_{n,\mathrm{MLE}}$. It can therefore also be used to estimate variance. This is quite common for high frequency data.
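The identity $\hat\sigma^2_{n,\mathrm{MLE}} = \hat\sigma^2_{n,\mathrm{nocenter}} - (T/n)\hat\mu^2_n$ derived above is exact and easy to verify numerically; a sketch (parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, T, n = 0.1, 0.25, 1.0, 10_000
dt = T / n
dX = rng.normal(mu * dt, sigma * np.sqrt(dt), size=n)

mu_hat = dX.sum() / T
nocenter = (dX ** 2).sum() / (n * dt)             # equation (8)
mle = ((dX - dX.mean()) ** 2).sum() / (n * dt)    # centered MLE

# Exact identity from the text: the two estimators differ by (T/n) mu_hat^2,
# which vanishes at rate 1/n.
gap = nocenter - mle
print(abs(gap - (T / n) * mu_hat ** 2) < 1e-12)  # prints True
```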
1.6 GBM and the Black-Scholes-Merton formula
The GBM model is closely tied to other parts of finance. In particular, following the work of Black and Scholes (1973), Merton (1973), Harrison and Kreps (1979), and Harrison and Pliska (1981), precise option prices can be calculated in this model. See also Duffie (1996), Neftci (2000), Øksendal (2003), or Shreve (2004) for book-length introductions to the theory.
In the case of the call option, the price is as follows. A European call option on a stock $S_t$ with maturity (expiration) time T and strike price K is the option to buy one unit of stock at price K at time T. It is easy to see that the value of this option at time T is $(S_T - K)^+$, where $x^+ = x$ if $x \ge 0$, and $x^+ = 0$ otherwise.
If we make the assumption that $S_t$ is a GBM, which is to say that it follows (1)-(2), and also the assumption that the short term interest rate r is constant (in time), then the price at time t, $0 \le t \le T$, of this option must be
$$\text{price} = C(S_t,\, \sigma^2(T-t),\, r(T-t)), \quad (9)$$
where
$$C(S, \Xi, R) = S\,\Phi(d_1) - K \exp(-R)\,\Phi(d_2), \quad \text{where}$$
$$d_{1,2} = \big(\log(S/K) + R \pm \Xi/2\big)/\sqrt{\Xi}, \quad \text{and} \quad (10)$$
$$\Phi(x) = P(N(0,1) \le x), \text{ the standard normal cdf.}$$
This is the Black-Scholes-Merton formula.
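A direct implementation of (10) is straightforward; the only numerical ingredient is the normal cdf, available via the error function (the function names here are ours):

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    # Phi(x) = P(N(0,1) <= x), written via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bsm_call(S, K, Xi, R):
    """Black-Scholes-Merton call price C(S, Xi, R), with
    Xi = sigma^2 (T - t) the cumulative variance and
    R = r (T - t) the cumulative interest, as in (9)-(10)."""
    d1 = (log(S / K) + R + Xi / 2.0) / sqrt(Xi)
    d2 = (log(S / K) + R - Xi / 2.0) / sqrt(Xi)
    return S * norm_cdf(d1) - K * exp(-R) * norm_cdf(d2)

# At the money with zero rate: C = S * (Phi(sqrt(Xi)/2) - Phi(-sqrt(Xi)/2))
print(round(bsm_call(100.0, 100.0, 0.04, 0.0), 4))  # prints 7.9656
```

The price is increasing in the cumulative variance $\Xi = \sigma^2(T-t)$, which is why a good volatility estimate is the key statistical input.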
We shall see later on how high frequency estimates can be used in this formula. For the moment, note that the price only depends on quantities that are either observed (the interest rate r) or nearly so (the volatility $\sigma^2$). It does not depend on $\mu$. Unfortunately, the assumption of constant r and $\sigma^2$ is unrealistic, as we shall discuss in the following.
The GBM model is also heavily used in portfolio optimization.
1.7 Our Problem to be Solved: Inadequacies in the GBM Model
We here give a laundry list of questions that arise and have to be dealt with.
1.7.1 The Volatility Depends on t
It is empirically the case that $\sigma^2$ depends on t. We shall talk about the instantaneous volatility $\sigma^2_t$. This concept will be defined carefully in Section 2.
1.7.2 The Volatility is Random; Leverage Effect

Returns are usually found to be non-normal. Such behavior can for the most part be modeled by letting $\sigma^2_t$ have a random evolution. It is also usually assumed that $\sigma^2_t$ can be correlated with the (log) stock price. This is often referred to as the leverage effect. More about this in Section 2.
1.7.3 Jumps
The GBM model assumes that the log stock price $X_t$ is continuous as a function of t. The evolution of the stock price, however, is often thought to have a jump component. The treatment of jumps is largely not covered in this article, though there is some discussion in Section 6.4.1, which also gives some references. Note that jumps and random volatility are often confounded, since any continuous martingale can be embedded in a Brownian motion (Dambis (1965), Dubins and Schwarz (1965); see also Mykland (1995) for a review and further discussion).
1.7.4 Non-Normal Returns
Most non-normal behavior can be explained through random volatility and/or jumps. It would be unusual to need more extensive modeling.
1.7.5 Microstructure Noise
An important feature of actual transaction prices is the existence of microstructure noise. Transaction prices, as actually observed, are typically best modeled on the form $Y_t = \log S_t$, the logarithm of the stock price $S_t$ at time t, where for a transaction at time $t_i$,
$$Y_{t_i} = X_{t_i} + \text{noise}, \quad (11)$$
and $X_t$ is a semimartingale. This is often called the hidden semimartingale model. This issue is an important part of our narrative, and is further discussed in Section 5; see also Section 6.4.2.
1.7.6 Unequally Spaced Observations
In the above, we assumed that the transaction times $t_i$ are equally spaced. A quick glance at the data snippet in Section 1.2 reveals that this is typically not the case. This leads to questions that will be addressed as we go along.
1.8 A Note on Probability Theory
We will use probability theory extensively in these notes. To avoid a long introduction to stochastic processes, we will define concepts as we need them, but not always in the greatest depth. We will also omit other concepts and many basic proofs. As a compromise between the rigorous and the intuitive, we follow the convention that the notes will (except when the opposite is clearly stated) use mathematical terms as they are defined in Jacod and Shiryaev (2003). Thus, in case of doubt, that work can be consulted.
Other recommended reference books on stochastic process theory are Karatzas and Shreve (1991), Øksendal (2003), Protter (2004), and Shreve (2004). For an introduction to measure-theoretic probability, one can consult Billingsley (1995).
2 A More General Model: Time-Varying Drift and Volatility
2.1 Stochastic Integrals, Itô Processes

We here make some basic definitions. We consider a process $X_t$, where the time variable $t \in [0, T]$. We mainly develop the univariate case here.
2.1.1 Information Sets, σ-Fields, Filtrations

Information is usually described with so-called σ-fields. The setup is as follows. Our basic space is $(\Omega, \mathcal{F})$, where $\Omega$ is the set of all possible outcomes $\omega$, and $\mathcal{F}$ is the collection of subsets $A \subseteq \Omega$ that will eventually be decidable (it will be observed whether they occurred or not). All random variables are thought of as functions of the basic outcome $\omega \in \Omega$.
We assume that $\mathcal{F}$ is a so-called σ-field. In general,

Definition 2. A collection $\mathcal{A}$ of subsets of $\Omega$ is a σ-field if

(i) $\emptyset, \Omega \in \mathcal{A}$;

(ii) if $A \in \mathcal{A}$, then $A^c = \Omega - A \in \mathcal{A}$; and

(iii) if $A_n$, $n = 1, 2, \ldots$ are all in $\mathcal{A}$, then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$.
If one thinks of $\mathcal{A}$ as a collection of decidable sets, then the interpretation of this definition is as follows:

(i) $\emptyset$, $\Omega$ are decidable ($\emptyset$ didn't occur, $\Omega$ did);

(ii) if A is decidable, so is the complement $A^c$ (if A occurs, then $A^c$ does not occur, and vice versa);

(iii) if all the $A_n$ are decidable, then so is the event $\bigcup_{n=1}^{\infty} A_n$ (the union occurs if and only if at least one of the $A_n$ occurs).
A random variable X is called $\mathcal{A}$-measurable if the value of X can be decided on the basis of the information in $\mathcal{A}$. Formally, the requirement is that for all x, the set $\{X \le x\} = \{\omega \in \Omega : X(\omega) \le x\}$ be decidable ($\in \mathcal{A}$).
The evolution of knowledge in our system is described by the filtration (or sequence of σ-fields) $\mathcal{F}_t$, $0 \le t \le T$. Here $\mathcal{F}_t$ is the knowledge available at time t. Since increasing time makes more sets decidable, the family $(\mathcal{F}_t)$ is taken to satisfy: if $s \le t$, then $\mathcal{F}_s \subseteq \mathcal{F}_t$.
Most processes will be taken to be adapted to $(\mathcal{F}_t)$: $(X_t)$ is adapted to $(\mathcal{F}_t)$ if for all $t \in [0, T]$, $X_t$ is $\mathcal{F}_t$-measurable. A vector process is adapted if each component is adapted.
We define the filtration $(\mathcal{F}^X_t)$ generated by the process $(X_t)$ as the smallest filtration to which $(X_t)$ is adapted. By this we mean that for any filtration $(\mathcal{F}_t)$ to which $(X_t)$ is adapted, $\mathcal{F}^X_t \subseteq \mathcal{F}_t$ for all t. (Proving the existence of such a filtration is left as an exercise for the reader.)
2.1.2 Wiener Processes
A Wiener process is a Brownian motion relative to a filtration. Specifically,

Definition 3. The process $(W_t)_{0 \le t \le T}$ is an $(\mathcal{F}_t)$-Wiener process if it is adapted to $(\mathcal{F}_t)$ and

(1) $W_0 = 0$;

(2) $t \mapsto W_t$ is a continuous function of t;

(3) W has independent increments relative to the filtration $(\mathcal{F}_t)$: if $t > s$, then $W_t - W_s$ is independent of $\mathcal{F}_s$;

(4) for $t > s$, $W_t - W_s$ is normal with mean zero and variance $t - s$ ($N(0, t-s)$).

Note that a Brownian motion $(W_t)$ is an $(\mathcal{F}^W_t)$-Wiener process.
2.1.3 Predictable Processes
For defining stochastic integrals, we need the concept of a predictable process. "Predictable" here means that one can forecast the value over infinitesimal time intervals. The most basic example would be a "simple process". This is given by considering break points $0 = s_0 \le s_1 < t_1 \le s_2 < t_2 < \ldots \le s_n < t_n \le T$, and random variables $H^{(i)}$, observable (measurable) with respect to $\mathcal{F}_{s_i}$:
$$H_t = \begin{cases} H^{(0)} & \text{if } t = 0 \\ H^{(i)} & \text{if } s_i < t \le t_i \end{cases} \quad (12)$$
In this case, at any time t (the beginning time t = 0 is treated separately), the value of $H_t$ is known before time t.
Definition 4. More generally, a process $H_t$ is predictable if it can be written as a limit of simple processes $H^{(n)}_t$. This means that $H^{(n)}_t(\omega) \to H_t(\omega)$ as $n \to \infty$, for all $(t, \omega) \in [0, T] \times \Omega$.

All adapted continuous processes are predictable. More generally, this is also true for adapted processes that are left continuous (càg, for continu à gauche). (Proposition I.2.6 (p. 17) in Jacod and Shiryaev (2003).)
2.1.4 Stochastic Integrals
We here consider the meaning of the expression
$$\int_0^T H_t \, dX_t. \quad (13)$$
The ingredients are the integrand $H_t$, which is assumed to be predictable, and the integrator $X_t$, which will generally be a semimartingale (to be defined below in Section 2.3.5).

The expression (13) is defined for simple process integrands as
$$\sum_i H^{(i)} \big(X_{t_i} - X_{s_i}\big). \quad (14)$$
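For a simple process, (14) is just a finite sum, which is easy to compute directly; a sketch with a hypothetical two-interval holding pattern and an arbitrary smooth integrator:

```python
def simple_integral(H, s, t, X):
    """Compute sum_i H[i] * (X(t[i]) - X(s[i])), the integral (14)
    of a simple process holding H[i] units over (s[i], t[i]]."""
    return sum(h * (X(ti) - X(si)) for h, si, ti in zip(H, s, t))

def X(u):
    # Hypothetical integrator observed as a function of time; with a
    # smooth X the sum is an ordinary Riemann-Stieltjes sum.
    return u * u

H = [2.0, -1.0]      # hold 2 units on (0, 0.5], short 1 unit on (0.5, 1]
s = [0.0, 0.5]
t = [0.5, 1.0]
print(simple_integral(H, s, t, X))  # prints -0.25
```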
For predictable integrands $H_t$ that are bounded and limits of simple processes $H^{(n)}_t$, the integral (13) is the limit in probability of $\int_0^T H^{(n)}_t \, dX_t$. This limit is well defined, i.e., independent of the sequence $H^{(n)}_t$.
If $X_t$ is a Wiener process, the integral can be defined for any predictable process $H_t$ satisfying
$$\int_0^T H_t^2 \, dt < \infty. \quad (15)$$
It will always be the case that the integrator $X_t$ is right continuous with left limits (càdlàg, for continu à droite, limites à gauche).
The integral process
$$\int_0^t H_s \, dX_s = \int_0^T H_s\, I\{s \le t\} \, dX_s \quad (16)$$
can also be taken to be càdlàg. If $(X_t)$ is continuous, the integral is then automatically continuous.
2.1.5 Itô Processes

We now come to our main model, the Itô process. $X_t$ is an Itô process relative to the filtration $(\mathcal{F}_t)$ provided $(X_t)$ is $(\mathcal{F}_t)$-adapted; and if there is an $(\mathcal{F}_t)$-Wiener process $(W_t)$, and $(\mathcal{F}_t)$-adapted processes $(\mu_t)$ and $(\sigma_t)$, with
$$\int_0^T |\mu_t| \, dt < \infty, \text{ and} \quad (17)$$
$$\int_0^T \sigma^2_t \, dt < \infty, \quad (18)$$
so that
$$X_t = X_0 + \int_0^t \mu_s \, ds + \int_0^t \sigma_s \, dW_s. \quad (19)$$
The process is often written in differential form:
$$dX_t = \mu_t \, dt + \sigma_t \, dW_t. \quad (20)$$
We note that the Itô process property is preserved under stochastic integration. If $H_t$ is bounded and predictable, then
$$\int_0^t H_s \, dX_s = \int_0^t H_s \mu_s \, ds + \int_0^t H_s \sigma_s \, dW_s. \quad (21)$$
It is clear from this formula that predictable processes $H_t$ can be used for integration w.r.t. $X_t$ provided
$$\int_0^T |H_t \mu_t| \, dt < \infty \text{ and} \quad (22)$$
$$\int_0^T (H_t \sigma_t)^2 \, dt < \infty. \quad (23)$$
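Although (19) is defined through stochastic integrals, sample paths can be approximated on a grid with the Euler scheme. A sketch with illustrative deterministic coefficients (the function names are ours):

```python
import numpy as np

def euler_ito(mu, sigma, x0, T, n, rng):
    """Euler discretization of dX_t = mu(t) dt + sigma(t) dW_t:
    each step adds mu(t) dt plus sigma(t) times an N(0, dt) draw."""
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    X = np.empty(n + 1)
    X[0] = x0
    for i in range(n):
        X[i + 1] = X[i] + mu(t[i]) * dt + sigma(t[i]) * dW[i]
    return t, X

rng = np.random.default_rng(3)
t, X = euler_ito(lambda u: 0.05, lambda u: 0.2 + 0.1 * u, 0.0, 1.0, 1000, rng)
print(t.shape, X.shape)
```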
2.2 Two Interpretations of the Stochastic Integral
One can use the stochastic integral in two different ways: as a model, or as a description of trading profit and loss (P/L).
2.2.1 Stochastic Integral as Trading Profit and Loss

Suppose that $X_t$ is the value of a security. Let $H_t$ be the number of units of this stock held at time t. In the case of a simple process (12), this means that we hold $H^{(i)}$ units of X from time $s_i$ to time $t_i$. The trading P/L is then given by the stochastic integral (14). In this description, it is quite clear that $H^{(i)}$ must be known at time $s_i$; otherwise we would base the portfolio on future information. More generally, for predictable $H_t$, we similarly avoid using future information.
2.2.2 Stochastic Integral as Model
This is a different genesis of the stochastic integral model. One simply uses (19) as a model, in the hope that this is a sufficiently general framework to capture most relevant processes. The advantage of using predictable integrands comes from the simplicity of connecting the model with trading gains.
For simple $\mu_t$ and $\sigma^2_t$, the integral
$$\sum_i \mu^{(i)}(t_i - s_i) + \sum_i \sigma^{(i)}\big(W_{t_i} - W_{s_i}\big) \quad (24)$$
is simply a sum of conditionally normal random variables, with means $\mu^{(i)}(t_i - s_i)$ and variances $(\sigma^{(i)})^2(t_i - s_i)$. The sum need not be normal, since $\mu$ and $\sigma^2$ can be random.
It is worth noting that in this model, $\int_0^T \mu_t \, dt$ is the sum of instantaneous means (drift), and $\int_0^T \sigma^2_t \, dt$ is the sum of instantaneous variances. In fact, in the model (19), one can show the following. Let $\mathrm{Var}(\cdot \,|\, \mathcal{F}_t)$ be the conditional variance given the information at time t. If $X_t$ is an Itô process, and if $0 = t_{n,0} < t_{n,1} < \ldots < t_{n,n} = T$, then
$$\sum_i \mathrm{Var}\big(X_{t_{n,i+1}} - X_{t_{n,i}} \,\big|\, \mathcal{F}_{t_{n,i}}\big) \xrightarrow{p} \int_0^T \sigma^2_t \, dt \quad (25)$$
when
$$\max_i |t_{n,i+1} - t_{n,i}| \to 0. \quad (26)$$
If the $\mu_t$ and $\sigma^2_t$ processes are nonrandom, then $X_t$ is a Gaussian process, and $X_T$ is normal with mean $X_0 + \int_0^T \mu_t \, dt$ and variance $\int_0^T \sigma^2_t \, dt$.
2.2.3 The Heston model

A popular model for volatility is due to Heston (1993). In this model, the process $X_t$ is given by
$$dX_t = \mu\,dt + \sigma_t\,dW_t$$
$$d\sigma^2_t = \kappa(\alpha - \sigma^2_t)\,dt + \gamma\sigma_t\,dZ_t, \quad \text{with} \quad (27)$$
$$Z_t = \rho W_t + (1 - \rho^2)^{1/2} B_t, \quad (28)$$
where $(W_t)$ and $(B_t)$ are two independent Wiener processes, $\gamma > 0$, and $|\rho| \le 1$.
2.3 Semimartingales
2.3.1 Conditional Expectations
Denote by $E(\cdot \,|\, \mathcal{F}_t)$ the conditional expectation given the information available at time t. Formally, this concept is defined as follows:

Theorem 1. Let $\mathcal{A}$ be a σ-field, and let X be a random variable so that $E|X| < \infty$. There is an $\mathcal{A}$-measurable random variable Z so that for all $A \in \mathcal{A}$,
$$E Z I_A = E X I_A, \quad (29)$$
where $I_A$ is the indicator function of A. Z is unique "almost surely", in the sense that if $Z_1$ and $Z_2$ satisfy the two criteria above, then $P(Z_1 = Z_2) = 1$.
We thus define
$$E(X \,|\, \mathcal{A}) = Z, \quad (30)$$
where Z is given in the theorem. The conditional expectation is well defined "almost surely". For further details and a proof of the theorem, see Section 34 (p. 445-455) of Billingsley (1995).

This way of defining conditional expectation is a little counterintuitive if unfamiliar. In particular, the conditional expectation is a random variable. The heuristic is as follows. Suppose that Y is a random variable, and that $\mathcal{A}$ carries the information in Y. Introductory textbooks often introduce conditional expectation as a non-random quantity $E(X \,|\, Y = y)$. To make the connection, set
$$f(y) = E(X \,|\, Y = y). \quad (31)$$
The conditional expectation we have just defined then satisfies
$$E(X \,|\, \mathcal{A}) = f(Y). \quad (32)$$
2.3.2 Properties of Conditional Expectations
• Linearity: for constants $c_1, c_2$: $E(c_1 X_1 + c_2 X_2 \,|\, \mathcal{A}) = c_1 E(X_1 \,|\, \mathcal{A}) + c_2 E(X_2 \,|\, \mathcal{A})$.

• Conditional constants: if Z is $\mathcal{A}$-measurable, then $E(ZX \,|\, \mathcal{A}) = Z\,E(X \,|\, \mathcal{A})$.

• Law of iterated expectations (iterated conditioning, tower property): if $\mathcal{A}' \subseteq \mathcal{A}$, then $E[E(X \,|\, \mathcal{A}) \,|\, \mathcal{A}'] = E(X \,|\, \mathcal{A}')$.

• Independence: if X is independent of $\mathcal{A}$: $E(X \,|\, \mathcal{A}) = E(X)$.

• Jensen's inequality: if $g : x \mapsto g(x)$ is convex: $E(g(X) \,|\, \mathcal{A}) \ge g(E(X \,|\, \mathcal{A}))$.

Note: g is convex if $g(ax + (1-a)y) \le a\,g(x) + (1-a)\,g(y)$ for $0 \le a \le 1$. For example: $g(x) = e^x$, $g(x) = (x - K)^+$. Alternatively, it suffices that $g''$ exists, is continuous, and $g''(x) \ge 0$.
2.3.3 Martingales
An $(\mathcal{F}_t)$-adapted process $M_t$ is called a martingale if $E|M_t| < \infty$, and if, for all $s < t$,
$$E(M_t \,|\, \mathcal{F}_s) = M_s. \quad (33)$$
This is a central concept in our narrative. A martingale is also known as a fair game, for the following reason: in a gambling situation, if $M_s$ is the amount of money the gambler has at time s, then the gambler's expected wealth at time $t > s$ is also $M_s$. (The concept of martingale applies equally to discrete and continuous time axes.)
Example 1. A Wiener process is a martingale. To wit, for $t > s$, since $W_t - W_s$ is $N(0, t-s)$ given $\mathcal{F}_s$, we get that
$$E(W_t \,|\, \mathcal{F}_s) = E(W_t - W_s \,|\, \mathcal{F}_s) + W_s = E(W_t - W_s) + W_s = W_s, \quad (34)$$
where the middle equality holds by independence.
A useful fact about martingales is the representation by final value: $M_t$ is a martingale for $0 \le t \le T$ if and only if one can write
$$M_t = E(X \,|\, \mathcal{F}_t) \text{ for all } t \in [0, T] \quad (35)$$
("only if" by definition, with $X = M_T$; "if" by the tower property). Note that for $T = \infty$ (which we do not consider here), this property may not hold. (For a full discussion, see Chapter 1.3.B (p. 17-19) of Karatzas and Shreve (1991).)
Example 2. If $H_t$ is a bounded predictable process, and $X_t$ is any martingale, then
$$M_t = \int_0^t H_s \, dX_s \quad (36)$$
is a martingale. To see this, consider first a simple process (12), for which $H_t = H^{(i)}$ when $s_i < t \le t_i$. For given t, if $s_i > t$, by the properties of conditional expectations,
$$E\big(H^{(i)}(X_{t_i} - X_{s_i}) \,\big|\, \mathcal{F}_t\big) = E\big(E(H^{(i)}(X_{t_i} - X_{s_i}) \,|\, \mathcal{F}_{s_i}) \,\big|\, \mathcal{F}_t\big) = E\big(H^{(i)}\, E(X_{t_i} - X_{s_i} \,|\, \mathcal{F}_{s_i}) \,\big|\, \mathcal{F}_t\big) = 0, \quad (37)$$
and similarly, if $t_i \ge t \ge s_i$, then
$$E\big(H^{(i)}(X_{t_i} - X_{s_i}) \,\big|\, \mathcal{F}_t\big) = H^{(i)}(X_t - X_{s_i}), \quad (38)$$
so that
$$E(M_T \,|\, \mathcal{F}_t) = E\Big(\sum_i H^{(i)}(X_{t_i} - X_{s_i}) \,\Big|\, \mathcal{F}_t\Big) = \sum_{i : t_i < t} H^{(i)}(X_{t_i} - X_{s_i}) + H^{(i)}(X_t - X_{s_i})\, I\{s_i \le t \le t_i\} = M_t, \quad (39)$$
which is the martingale property in the form (35). The result extends to general bounded predictable $H_t$ by taking limits.

2.3.4 Stopping Times and Local Martingales

For general (unbounded) integrands, however, the stochastic integral with respect to a martingale need not itself be a martingale. Consider
$$X_t = \int_0^t \frac{1}{\sqrt{T-s}} \, dW_s, \quad (40)$$
for which
$$\mathrm{Var}(X_t) = \int_0^t \frac{ds}{T-s} = \log\frac{T}{T-t} \to \infty \text{ as } t \to T. \quad (41)$$
Since the variance blows up, the process will eventually hit any level: for $A > 0$, set
$$\tau = \inf\{t \ge 0 : X_t = A\}. \quad (42)$$
One can show that $P(\tau < T) = 1$. Define the modified integral by
$$Y_t = \int_0^t \frac{1}{\sqrt{T-s}}\, I\{s \le \tau\}\, dW_s = X_{\tau \wedge t}, \quad (43)$$
where
$$s \wedge t = \min(s, t). \quad (44)$$
The process (43) has the following trading interpretation. Suppose that $W_t$ is the value of a security at time t (the value can be negative, but that is possible for many securities, such as futures contracts). We also take the short term interest rate to be zero. The process $X_t$ comes about as the value of a portfolio which holds $1/\sqrt{T-t}$ units of this security at time t. The process $Y_t$ is obtained by holding this portfolio until the time when $X_t = A$, and then liquidating the portfolio.
In other words, we have displayed a trading strategy which starts with wealth $Y_0 = 0$ at time t = 0, and ends with wealth $Y_T = A > 0$ at time t = T. In trading terms, this is an arbitrage. In mathematical terms, this is a stochastic integral w.r.t. a martingale which is no longer a martingale. We note that from (41), the condition (15) for the existence of the integral is satisfied.
For trading, the lesson we can learn from this is that some condition has to be imposed to make sure that a trading strategy in a martingale cannot result in arbitrage profit. The most popular approach is to require that the trader's wealth at any time cannot go below some fixed amount $-K$. This is the so-called credit constraint. (So strategies are required to satisfy that the integral never goes below $-K$.) This does not quite guarantee that the integral w.r.t. a martingale is a martingale, but it does prevent arbitrage profit. The technical result is that the integral is a supermartingale (see the next section).
For the purpose of characterizing the stochastic integral, we need the concept of a local martingale. For this, we first need to define:

Definition 5. A stopping time is a random variable $\tau$ satisfying $\{\tau \le t\} \in \mathcal{F}_t$, for all t.

The requirement in this definition is that we must be able to know at time t whether $\tau$ has occurred or not. The time (42) given above is a stopping time. On the other hand, the variable $\theta = \inf\{t : W_t = \max_{0 \le s \le T} W_s\}$ is not a stopping time. Otherwise, we would have a nice investment strategy.
Definition 6. A process $M_t$ is a local martingale for $0 \le t \le T$ provided there is a sequence of stopping times $\tau_n$ so that

(i) $M_{\tau_n \wedge t}$ is a martingale for each n; and

(ii) $P(\tau_n = T) \to 1$ as $n \to \infty$.

The basic result for stochastic integrals is now that the integral with respect to a local martingale is a local martingale, cf. result I.4.34(b) (p. 47) in Jacod and Shiryaev (2003).
2.3.5 Semimartingales

$X_t$ is a semimartingale if it can be written
$$X_t = X_0 + M_t + A_t, \quad 0 \le t \le T, \quad (45)$$
where $X_0$ is $\mathcal{F}_0$-measurable, $M_t$ is a local martingale, and $A_t$ is a process of finite variation, i.e.,
$$\sup \sum_i |A_{t_{i+1}} - A_{t_i}| < \infty, \quad (46)$$
where the supremum is over all grids $0 = t_0 < t_1 < \ldots < t_n = T$, and all n.
In particular, an Itô process is a semimartingale, with
$$M_t = \int_0^t \sigma_s \, dW_s \quad \text{and} \quad A_t = \int_0^t \mu_s \, ds. \quad (47)$$
A supermartingale is a semimartingale for which $A_t$ is nonincreasing. A submartingale is a semimartingale for which $A_t$ is nondecreasing.
2.4 Quadratic Variation of a Semimartingale
2.4.1 Definitions

We start with some notation. A grid of observation times is given by
$$\mathcal{G} = \{t_0, t_1, \ldots, t_n\}, \quad (48)$$
where we suppose that
$$0 = t_0 < t_1 < \ldots < t_n = T. \quad (49)$$
Set
$$\Delta(\mathcal{G}) = \max_{1 \le i \le n} (t_i - t_{i-1}). \quad (50)$$
For any process X, we define its quadratic variation relative to the grid $\mathcal{G}$ by
$$[X, X]^{\mathcal{G}}_t = \sum_{t_{i+1} \le t} \big(X_{t_{i+1}} - X_{t_i}\big)^2. \quad (51)$$
One can more generally define the quadratic covariation
$$[X, Y]^{\mathcal{G}}_t = \sum_{t_{i+1} \le t} \big(X_{t_{i+1}} - X_{t_i}\big)\big(Y_{t_{i+1}} - Y_{t_i}\big). \quad (52)$$
An important theorem of stochastic calculus now says:

Theorem 2. For any semimartingale, there is a process $[X, Y]_t$ so that
$$[X, Y]^{\mathcal{G}}_t \xrightarrow{p} [X, Y]_t \text{ for all } t \in [0, T], \text{ as } \Delta(\mathcal{G}) \to 0. \quad (53)$$
The limit is independent of the sequence of grids $\mathcal{G}$.
The result follows from Theorem I.4.47 (p. 52) in Jacod and Shiryaev (2003). In fact, the $t_i$ can even be stopping times. (In our further development, the $t_i$ will typically be irregular but nonrandom.)
For an Itô process,
$$[X, X]_t = \int_0^t \sigma^2_s \, ds. \quad (54)$$
(Cf. Thm I.4.52 (p. 55) and I.4.40(d) (p. 48) of Jacod and Shiryaev (2003).)

The process $[X, X]_t$ is usually referred to as the quadratic variation of the semimartingale $(X_t)$. This is an important concept, as seen in Section 2.2.2. The theorem asserts that this quantity can be estimated consistently from data.
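As an illustration of (51) and (54), the grid quadratic variation of a simulated path approaches $\int_0^T \sigma^2_s \, ds$ on fine grids; a sketch with a deterministic, time-varying $\sigma_s$ (choices illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
T, n = 1.0, 200_000
dt = T / n
t = np.linspace(0.0, T, n + 1)

sigma = 0.2 + 0.1 * t[:-1]      # deterministic instantaneous volatility sigma_s
dX = sigma * rng.normal(0.0, np.sqrt(dt), size=n)   # increments of dX = sigma_t dW_t

qv_grid = (dX ** 2).sum()          # [X, X]^G_T as in (51)
qv_true = (sigma ** 2 * dt).sum()  # Riemann sum for int_0^T sigma_s^2 ds
print(round(qv_grid, 3), round(qv_true, 3))
```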
2.4.2 Properties
Important properties are as follows:

(1) Bilinearity: $[X, Y]_t$ is linear in each of X and Y.

(2) If $(W_t)$ and $(B_t)$ are two independent Wiener processes, then
$$[W, B]_t = 0. \quad (55)$$
Example 3. For the Heston model in Section 2.2.3, one gets from first principles that
$$[W, Z]_t = \rho\,[W, W]_t + (1 - \rho^2)^{1/2}\,[W, B]_t = \rho t, \quad (56)$$
since $[W, W]_t = t$ and $[W, B]_t = 0$.
(3) For stochastic integrals over Itô processes $X_t$ and $Y_t$,
$$U_t = \int_0^t H_s \, dX_s \quad \text{and} \quad V_t = \int_0^t K_s \, dY_s, \quad (57)$$
then
$$[U, V]_t = \int_0^t H_s K_s \, d[X, Y]_s, \quad (58)$$
by invoking the same results that led to (54). This is often written in "differential form" as
$$d[U, V]_t = H_t K_t \, d[X, Y]_t. \quad (59)$$
(4) For any Itô process X, $[X, t] = 0$.
Example 4. (Leverage effect in the Heston model.)
$$d[X, \sigma^2]_t = \gamma\sigma^2_t \, d[W, Z]_t = \gamma\sigma^2_t\,\rho \, dt. \quad (60)$$
(5) Invariance under discounting by the short term interest rate. Discounting is important in finance theory. The typical discount rate is the risk free short term interest rate $r_t$. Recall that $S_t = \exp\{X_t\}$. The discounted stock price is then given by
$$S^*_t = \exp\Big\{-\int_0^t r_s\,ds\Big\}\,S_t. \quad (61)$$
The corresponding process on the log scale is X
?
t
= X
t
?
_
t
0
r
s
ds, so that if X
t
is given by (20), then
dX
?
t
= (µ
t
?r
t
)dt +?
t
dW
t
. (62)
The quadratic variation of X
?
t
is therefore the same as for X
t
.
It should be emphasized that while this result remains true for certain other types of discounting (such as those incorporating cost-of-carry), it is not true for many other relevant types of discounting. For example, if one discounts by the zero coupon bond Λ_t maturing at time T, the discounted log price becomes X*_t = X_t − log Λ_t. Since the zero coupon bond will itself have volatility, we get

[X*, X*]_t = [X, X]_t + [log Λ, log Λ]_t − 2[X, log Λ]_t.   (63)
2.4.3 Variance and Quadratic Variation

Quadratic variation has a representation in terms of variance. The main result concerns martingales. For E(X^2) < ∞, define the conditional variance by

Var(X | A) = E((X − E(X | A))^2 | A) = E(X^2 | A) − E(X | A)^2,   (64)

and similarly Cov(X, Y | A) = E((X − E(X | A))(Y − E(Y | A)) | A).
Theorem 3. Let M_t be a martingale, and assume that E[M, M]_T < ∞. Then, for all s < t,

Var(M_t | F_s) = E((M_t − M_s)^2 | F_s) = E([M, M]_t − [M, M]_s | F_s).   (65)
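A quick Monte Carlo check of Theorem 3 (in its unconditional form, with s = 0) can be sketched as follows: for M_t = ∫_0^t σ_s dW_s with a deterministic σ, the variance of M_T should match the expected quadratic variation. The volatility function and sample sizes below are illustrative, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, n_paths = 1000, 1.0, 10000
t = np.linspace(0.0, T, n + 1)
sigma = 1.0 + 0.5 * np.sin(2 * np.pi * t[:-1])   # illustrative deterministic volatility

dM = sigma * rng.normal(0.0, np.sqrt(T / n), size=(n_paths, n))
M_T = dM.sum(axis=1)            # martingale at time T, with M_0 = 0
qv_T = (dM ** 2).sum(axis=1)    # realized [M, M]_T along each path

print(M_T.var(), qv_T.mean())   # both approximate int_0^T sigma_s^2 ds = 1.125
```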
A quick argument for this is as follows. Let G = {t_0, t_1, ..., t_n}, and suppose for simplicity that s, t ∈ G. Then, for s ≤ t_i < t_j,

E((M_{t_{i+1}} − M_{t_i})(M_{t_{j+1}} − M_{t_j}) | F_{t_j}) = (M_{t_{i+1}} − M_{t_i}) E(M_{t_{j+1}} − M_{t_j} | F_{t_j}) = 0,   (66)

so that by the Tower rule (since F_s ⊆ F_{t_j})

Cov(M_{t_{i+1}} − M_{t_i}, M_{t_{j+1}} − M_{t_j} | F_s) = E((M_{t_{i+1}} − M_{t_i})(M_{t_{j+1}} − M_{t_j}) | F_s) = 0.   (67)
It follows that

Var(M_t − M_s | F_s) = Σ_{s ≤ t_i < t} Var(M_{t_{i+1}} − M_{t_i} | F_s)
Fix ε > 0, and set

τ_n = inf{ t ∈ [0, T] : n^2 Σ_i (t_{i+1} ∧ t − t_i ∧ t)^4 > ε }.

Then

E[M^{(2)}, M^{(2)}]_{τ_n} ≤ n^{−2} (1/28) C_8^8 σ_+^8 ε.   (122)

By assumption, n^2 Σ_i (t_{i+1} ∧ t − t_i ∧ t)^4 ≤ Δ(G) n^2 Σ_i (t_{i+1} − t_i)^3 →p 0, and hence

P(τ_n ≠ T) → 0 as n → ∞.   (123)
Hence, for any δ > 0,

P(n sup_{0≤t≤T} |M^{(2)}_t| > δ)
  ≤ P(n sup_{0≤t≤τ_n} |M^{(2)}_t| > δ) + P(τ_n ≠ T)
  ≤ (1/δ^2) E( n sup_{0≤t≤τ_n} |M^{(2)}_t| )^2 + P(τ_n ≠ T)   (Chebychev)
  ≤ (1/δ^2) C_2^2 n^2 E[M^{(2)}, M^{(2)}]_{τ_n} + P(τ_n ≠ T)   (Burkholder–Davis–Gundy)
  ≤ (1/δ^2) C_2^2 (1/28) C_8^8 σ_+^8 ε + P(τ_n ≠ T)   (from (122))
  → (1/δ^2) C_2^2 (1/28) C_8^8 σ_+^8 ε as n → ∞ (from (123)).   (124)
Hence Proposition 1 has been shown.
3.7 Quadratic Variation of the Error Process: When Observation Times are Independent of the Process

3.7.1 Main Approximation

We here assume that the observation times are independent of the process X. The basic insight for the following computation is that over small intervals, (X_t − X_{t_*})^2 ≈ [X, X]_t − [X, X]_{t_*}. To the extent that this approximation is valid, it follows from (104) that

[M, M]_t = 4 Σ_{t_{i+1} ≤ t} ∫_{t_i}^{t_{i+1}} ([X, X]_s − [X, X]_{t_i}) d[X, X]_s + 4 ∫_{t_*}^{t} ([X, X]_s − [X, X]_{t_*}) d[X, X]_s
         = 2 Σ_{t_{i+1} ≤ t} ([X, X]_{t_{i+1}} − [X, X]_{t_i})^2 + 2([X, X]_t − [X, X]_{t_*})^2.   (125)
We shall use this device several times in the following, and will this first time do it rigorously.

Proposition 2. Assume (98), and that σ^2_t is continuous in mean square:

sup_{0 ≤ t−s ≤ δ} E(σ^2_t − σ^2_s)^2 → 0 as δ → 0.   (126)

Also suppose that the grids G_n are nonrandom, or independent of the process X_t. Also suppose that, as n → ∞, Δ(G) = o_p(n^{−1/2}), and assume (109). Then

[M, M]_t = 2 Σ_{t_{i+1} ≤ t} ([X, X]_{t_{i+1}} − [X, X]_{t_i})^2 + 2([X, X]_t − [X, X]_{t_*})^2 + o_p(n^{−1}).   (127)

If σ_t is continuous, it is continuous in mean square (because of (98)). More generally, σ_t can, for example, also have Poisson jumps.
In the rest of this Section, we shall write all expectations implicitly as conditional on the times. To show Proposition 2, we need some notation and a lemma, as follows:

Lemma 1. Let N_t be an Itô process martingale, for which (for a, b > 0), for all t,

(d/dt) E[N, N]_t ≤ a(t − t_*)^b.   (128)

Let H_t be a predictable process, satisfying |H_t| ≤ H_+ for some constant H_+. Set

R_v(G) = Σ_{i=0}^{n−1} (t_{i+1} − t_i)^v.   (129)

Then

|| Σ_{t_{i+1} ≤ t} ∫_{t_i}^{t_{i+1}} (N_s − N_{t_i}) H_s ds + ∫_{t_*}^{t} (N_s − N_{t_*}) H_s ds ||_1
  ≤ ( H_+^2 (a/(b+3)) R_{b+3}(G) )^{1/2} + R_{(b+3)/2}(G) (2/(b+3)) (a/(b+1))^{1/2} sup_{0 ≤ t−s ≤ Δ(G)} ||H_s − H_t||_2.   (130)
Proof of Proposition 2. Set N_t = M_t and H_t = σ^2_t. Then

d[M, M]_t = 4(X_t − X_{t_i})^2 d[X, X]_t
          = 4([X, X]_t − [X, X]_{t_i}) d[X, X]_t + 4((X_t − X_{t_i})^2 − ([X, X]_t − [X, X]_{t_i})) d[X, X]_t
          = 4([X, X]_t − [X, X]_{t_i}) d[X, X]_t + 2(N_t − N_{t_i}) σ^2_t dt.   (131)

Thus, the approximation error in (127) is exactly of the form of the left hand side in (130). We note that

E d[N, N]_t = 4E(X_t − X_{t_i})^2 d[X, X]_t ≤ 4E(X_t − X_{t_i})^2 σ_+^2 dt ≤ 4(t − t_i) σ_+^4 dt,   (132)

hence the conditions of Lemma 1 are satisfied with a = 4σ_+^4 and b = 1. The result follows from (117).
3.7.2 Proof of Lemma 1 (Technical Material, can be omitted)
Decompose the original problem as follows:
∫_{t_i}^{t_{i+1}} (N_s − N_{t_i}) H_s ds = ∫_{t_i}^{t_{i+1}} (N_s − N_{t_i}) H_{t_i} ds + ∫_{t_i}^{t_{i+1}} (N_s − N_{t_i})(H_s − H_{t_i}) ds.   (133)
For the first term, from Itô's formula, d(t_{i+1} − s)(N_s − N_{t_i}) = −(N_s − N_{t_i}) ds + (t_{i+1} − s) dN_s, so that

∫_{t_i}^{t_{i+1}} (N_s − N_{t_i}) H_{t_i} ds = H_{t_i} ∫_{t_i}^{t_{i+1}} (t_{i+1} − s) dN_s,   (134)

hence

Σ_{t_{i+1} ≤ t} ∫_{t_i}^{t_{i+1}} (N_s − N_{t_i}) H_s ds = Σ_{t_{i+1} ≤ t} H_{t_i} ∫_{t_i}^{t_{i+1}} (t_{i+1} − s) dN_s + Σ_{t_{i+1} ≤ t} ∫_{t_i}^{t_{i+1}} (N_s − N_{t_i})(H_s − H_{t_i}) ds.   (135)
The first term is the end point of a martingale. For each increment,

E( ∫_{t_i}^{t_{i+1}} (N_s − N_{t_i}) H_{t_i} ds )^2 = E( H_{t_i} ∫_{t_i}^{t_{i+1}} (t_{i+1} − s) dN_s )^2
  ≤ H_+^2 E( ∫_{t_i}^{t_{i+1}} (t_{i+1} − s) dN_s )^2
  = H_+^2 E ∫_{t_i}^{t_{i+1}} (t_{i+1} − s)^2 d[N, N]_s
  = H_+^2 ∫_{t_i}^{t_{i+1}} (t_{i+1} − s)^2 dE[N, N]_s
  = H_+^2 ∫_{t_i}^{t_{i+1}} (t_{i+1} − s)^2 (d/ds)E[N, N]_s ds
  ≤ H_+^2 ∫_{t_i}^{t_{i+1}} (t_{i+1} − s)^2 a(s − t_i)^b ds
  ≤ H_+^2 (a/(b+3)) (t_{i+1} − t_i)^{b+3},   (136)
and so, by the uncorrelatedness of martingale increments,

E( Σ_{t_{i+1} ≤ t} H_{t_i} ∫_{t_i}^{t_{i+1}} (t_{i+1} − s) dN_s )^2 ≤ H_+^2 (a/(b+3)) Σ_{t_{i+1} ≤ t} (t_{i+1} − t_i)^{b+3} ≤ H_+^2 (a/(b+3)) R_{b+3}(G).   (137)
On the other hand, for the second term in (135),

||(N_s − N_{t_i})(H_s − H_{t_i})||_1 ≤ ||N_s − N_{t_i}||_2 ||H_s − H_{t_i}||_2
  = ( E(N_s − N_{t_i})^2 )^{1/2} ||H_s − H_{t_i}||_2
  = ( E([N, N]_s − [N, N]_{t_i}) )^{1/2} ||H_s − H_{t_i}||_2
  = ( ∫_{t_i}^{s} (d/du)E[N, N]_u du )^{1/2} ||H_s − H_{t_i}||_2
  ≤ ( ∫_{t_i}^{s} a(u − t_i)^b du )^{1/2} ||H_s − H_{t_i}||_2
  = ( (a/(b+1)) (s − t_i)^{b+1} )^{1/2} ||H_s − H_{t_i}||_2
  = (s − t_i)^{(b+1)/2} (a/(b+1))^{1/2} ||H_s − H_{t_i}||_2,   (138)
and from this

|| ∫_{t_i}^{t_{i+1}} (N_s − N_{t_i})(H_s − H_{t_i}) ds ||_1 ≤ ∫_{t_i}^{t_{i+1}} ||(N_s − N_{t_i})(H_s − H_{t_i})||_1 ds
  ≤ ∫_{t_i}^{t_{i+1}} (s − t_i)^{(b+1)/2} ds (a/(b+1))^{1/2} sup_{t_i ≤ s ≤ t_{i+1}} ||H_s − H_{t_i}||_2
  = (t_{i+1} − t_i)^{(b+3)/2} (2/(b+3)) (a/(b+1))^{1/2} sup_{t_i ≤ s ≤ t_{i+1}} ||H_s − H_{t_i}||_2.   (139)

Hence, finally, for the second term in (135),

|| Σ_{t_{i+1} ≤ t} ∫_{t_i}^{t_{i+1}} (N_s − N_{t_i})(H_s − H_{t_i}) ds ||_1
  ≤ ( Σ_{t_{i+1} ≤ t} (t_{i+1} − t_i)^{(b+3)/2} ) (2/(b+3)) (a/(b+1))^{1/2} sup_{0 ≤ t−s ≤ Δ(G)} ||H_s − H_t||_2
  = R_{(b+3)/2}(G) (2/(b+3)) (a/(b+1))^{1/2} sup_{0 ≤ t−s ≤ Δ(G)} ||H_s − H_t||_2.   (140)
Hence, for the overall sum (135), from (137) and (140),

|| Σ_{t_{i+1} ≤ t} ∫_{t_i}^{t_{i+1}} (N_s − N_{t_i}) H_s ds ||_1
  ≤ || Σ_{t_{i+1} ≤ t} H_{t_i} ∫_{t_i}^{t_{i+1}} (t_{i+1} − s) dN_s ||_1 + || Σ_{t_{i+1} ≤ t} ∫_{t_i}^{t_{i+1}} (N_s − N_{t_i})(H_s − H_{t_i}) ds ||_1
  ≤ || Σ_{t_{i+1} ≤ t} H_{t_i} ∫_{t_i}^{t_{i+1}} (t_{i+1} − s) dN_s ||_2 + || Σ_{t_{i+1} ≤ t} ∫_{t_i}^{t_{i+1}} (N_s − N_{t_i})(H_s − H_{t_i}) ds ||_1
  ≤ ( H_+^2 (a/(b+3)) R_{b+3}(G) )^{1/2} + R_{(b+3)/2}(G) (2/(b+3)) (a/(b+1))^{1/2} sup_{0 ≤ t−s ≤ Δ(G)} ||H_s − H_t||_2.   (141)

The part from t_* to t can be included similarly, showing the result.
3.7.3 Quadratic Variation of the Error Process, and Quadratic Variation of Time

To give the final form to this quadratic variation, define the "Asymptotic Quadratic Variation of Time" (AQVT), given by

H(t) = lim_{n→∞} (n/T) Σ_{t_{n,j+1} ≤ t} (t_{n,j+1} − t_{n,j})^2,   (142)

provided that the limit exists. From Example 6, we know that dividing by n is the right order. We now get

Proposition 3. Assume the conditions of Proposition 2, and that the AQVT exists. Then

n[M, M]_t →p 2T ∫_0^t σ_s^4 dH_s.   (143)
The proof is a straight exercise in analysis. The heuristic for the result is as follows. From (127),

[M, M]_t = 2 Σ_{t_{i+1} ≤ t} ([X, X]_{t_{i+1}} − [X, X]_{t_i})^2 + 2([X, X]_t − [X, X]_{t_*})^2 + o_p(n^{−1})
         = 2 Σ_{t_{i+1} ≤ t} ( ∫_{t_i}^{t_{i+1}} σ_s^2 ds )^2 + 2( ∫_{t_*}^{t} σ_s^2 ds )^2 + o_p(n^{−1})
         ≈ 2 Σ_{t_{i+1} ≤ t} ((t_{i+1} − t_i) σ^2_{t_i})^2 + 2((t − t_*) σ^2_{t_*})^2 + o_p(n^{−1})
         ≈ 2 (T/n) ∫_0^t σ_s^4 dH_s + o_p(n^{−1}).   (144)
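The order of magnitude in Proposition 3 can be checked by simulation. For equidistant times H(t) = t, so the variance of √n([X, X]^G_T − [X, X]_T) should be close to 2T ∫_0^T σ_s^4 ds. The sketch below uses an illustrative deterministic volatility (not from the text), for which 2T ∫_0^1 σ_s^4 ds = 3.546875:

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, n_paths = 500, 1.0, 20000
t = np.linspace(0.0, T, n + 1)
sigma = 1.0 + 0.5 * np.sin(2 * np.pi * t[:-1])   # illustrative deterministic volatility

iv = 1.125                         # int_0^1 sigma_s^2 ds, analytically
limit_var = 2.0 * T * 1.7734375    # 2 T int_0^1 sigma_s^4 ds; equidistant => H(t) = t

dX = sigma * rng.normal(0.0, np.sqrt(T / n), size=(n_paths, n))
rv = (dX ** 2).sum(axis=1)         # [X, X]^G_T on each path
err = np.sqrt(n) * (rv - iv)
print(err.var(), limit_var)        # the two should be close
```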
Example 8. We here give a couple of examples of the AQVT:

(i) When the times are equidistant: t_{i+1} − t_i = T/n, then

H(t) ≈ (n/T) Σ_{t_{n,j+1} ≤ t} (T/n)^2 = (T/n) #{t_{i+1} ≤ t} = T × (fraction of the t_{i+1} in [0, t]) ≈ T (t/T) = t.   (145)
(ii) When the times follow a Poisson process with parameter λ, we proceed as in case (ii) in Example 6. We condition on the number of sampling points n, and get t_i = TU_{(i)} (for 0 < i < n), where U_{(i)} is the i'th order statistic of U_1, ..., U_n, which are iid U[0,1]. Hence (again taking U_{(0)} = 0 and U_{(n+1)} = 1)

H(t) ≈ (n/T) Σ_{t_{n,j+1} ≤ t} (t_{i+1} − t_i)^2
     = T^2 (n/T) Σ_{t_{n,j+1} ≤ t} (U_{(i)} − U_{(i−1)})^2
     = T^2 (n/T) Σ_{t_{n,j+1} ≤ t} E U^2_{(1)} (1 + o_p(1))
     = T^2 (n/T) #{t_{i+1} ≤ t} E U^2_{(1)} (1 + o_p(1))
     = T n^2 (t/T) E U^2_{(1)} (1 + o_p(1))
     = 2t(1 + o_p(1))   (146)

by the law of large numbers, since the spacings have identical distribution [again, verify this], and since E U^2_{(1)} = 2/((n + 1)(n + 2)). Hence H(t) = 2t.
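The two cases of Example 8 are easy to check numerically. The sketch below (sample sizes illustrative) computes the finite-n analogue of (142) for an equidistant grid and for conditionally Poisson times:

```python
import numpy as np

rng = np.random.default_rng(3)
T, t0 = 1.0, 0.5

def aqvt(times, t):
    # finite-n version of H(t) in (142): (n/T) * sum_{t_{j+1} <= t} (t_{j+1} - t_j)^2
    n = len(times) - 1
    d = np.diff(times)
    return (n / T) * np.sum(d[times[1:] <= t] ** 2)

n = 100000
equi = np.linspace(0.0, T, n + 1)
u = np.sort(rng.uniform(0.0, T, size=n - 1))   # order statistics of U[0, T]
pois = np.concatenate([[0.0], u, [T]])

H_equi = aqvt(equi, t0)   # theory: H(t) = t,  so about 0.5
H_pois = aqvt(pois, t0)   # theory: H(t) = 2t, so about 1.0
print(H_equi, H_pois)
```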
3.7.4 The Quadratic Variation of Time in the General Case
We now go back to considering the times as possibly dependent on the process X. Note that by using the Burkholder–Davis–Gundy inequality conditionally, we obtain that

c_4^4 E((X_{t_{i+1}} − X_{t_i})^4 | F_{t_i}) ≤ E(([X, X]_{t_{i+1}} − [X, X]_{t_i})^2 | F_{t_i}) ≤ C_4^4 E((X_{t_{i+1}} − X_{t_i})^4 | F_{t_i}),   (147)

where c_4 and C_4 are as in Section 3.6.1. In the typical law of large numbers setting, [X, X, X, X]_t − Σ_i E((X_{t_{i+1}} − X_{t_i})^4 | F_{t_i}) is a martingale which is of lower order than [X, X, X, X]_t itself, and the same goes for Σ_i { ([X, X]_{t_{i+1}} − [X, X]_{t_i})^2 − E(([X, X]_{t_{i+1}} − [X, X]_{t_i})^2 | F_{t_i}) }. In view of Proposition 3, therefore, it follows that under suitable regularity conditions, if n[X, X, X, X]_t →p U_t as n → ∞, and if the AQVT H_t is absolutely continuous in t, then U_t is also absolutely continuous, and

c_4^4 2T σ_t^4 H′_t ≤ U′_t ≤ C_4^4 2T σ_t^4 H′_t.   (148)

This is of some theoretic interest in that it establishes the magnitude of the limit of n[X, X, X, X]_t. However, it should be noted that C_4^4 = 2^{18}/3^6 ≈ 359.6, so the bounds are of little practical interest.
3.8 Quadratic Variation, Variance, and Asymptotic Normality
We shall later see that n^{1/2}([X, X]^G_t − [X, X]_t) is approximately normal. In the simplest case, where the times are independent of the process, the normal distribution has mean zero and variance n[M, M]_t ≈ 2T ∫_0^t σ_s^4 dH_s. From standard central limit considerations, this is unsurprising when the σ_t process is nonrandom, or more generally independent of the W_t process. (In the latter case, one simply conditions on the σ_t process.)

What is surprising, and requires more concepts, is that the normality result also holds when the σ_t process has dependence with the W_t process. For this we shall need new concepts, to be introduced in Section 4.
4 Asymptotic Normality
4.1 Stable Convergence
In order to define convergence in law, we need to deal with the following issue. Suppose θ̂_n is an estimator of θ, say, θ̂_n = [X, X]^{G_n}_T and θ = [X, X]_T = ∫_0^T σ_t^2 dt. As suggested in Section 3.7.3, the variance of Z_n = n^{1/2}(θ̂_n − θ) converges to 2T ∫_0^T σ_s^4 dH_s. We shall now go on to show the following convergence in law:

n^{1/2}(θ̂_n − θ) →L U ( 2T ∫_0^T σ_s^4 dH_s )^{1/2},   (149)

where U is a standard normal random variable, independent of the σ^2_t process. In order to show this, we need to be able to bring along prelimiting information into the limit: U only exists in the limit, while as argued in Section 3.5.1, the asymptotic variance 2T ∫_0^T σ_s^4 dH_s can be estimated consistently, and so is a limit in probability of a prelimiting quantity.
To operationalize the concept in our setting, we need the filtration (F_t) to which all relevant processes (X_t, σ_t, etc.) are adapted. We shall assume that Z_n (the quantity that is converging in law) is measurable with respect to a σ-field 𝒳, F_T ⊆ 𝒳. The reason for this is that it is often convenient to exclude microstructure noise from the filtration F_t. Hence, for example, the TSRV (in Section 5 below) is not F_T-measurable.

Definition 8. Let Z_n be a sequence of 𝒳-measurable random variables, F_T ⊆ 𝒳. We say that Z_n converges F_T-stably in law to Z as n → ∞ if Z is measurable with respect to an extension of 𝒳 so that for all A ∈ F_T and for all bounded continuous g, E I_A g(Z_n) → E I_A g(Z) as n → ∞.

The definition means, up to regularity conditions, that Z_n converges jointly in law with all F_T-measurable random variables. This intuition will be important in the following. For further discussion of stable convergence, see Rényi (1963), Aldous and Eagleson (1978), Chapter 3 (p. 56) of Hall and Heyde (1980), Rootzén (1980) and Section 2 (p. 169–170) of Jacod and Protter (1998).
We now move to the main result.
4.2 Asymptotic Normality
We shall be concerned with a sequence of martingales M^n_t, 0 ≤ t ≤ T, n = 1, 2, ..., and how it converges to a limit M_t. We consider here only continuous martingales, which are thought of as random variables taking values in the set C of continuous functions [0, T] → R.

To define weak, and stable, convergence, we need a concept of continuity. We say that g is a continuous function C → R if

sup_{0≤t≤T} |x_n(t) − x(t)| → 0 implies g(x_n) → g(x).   (150)

We note that if (M^n_t) →L (M_t) in this process sense, then, for example, M^n_T →L M_T as a random variable. This is because the function x ↦ g(x) = x(T) is continuous. The reason for going via process convergence is (1) sometimes this is really the result one needs, and (2) since our theory is about continuous processes converging to a continuous process, one does not need asymptotic negligibility conditions à la Lindeberg (these kinds of conditions are in place in the usual CLT precisely to avoid jumps in the asymptotic process).

In order to show results about continuous martingales, we shall use the following assumption.

Assumption 1. There are Brownian motions W^{(1)}_t, ..., W^{(p)}_t (for some p) that generate (F_t).

It is also possible to proceed with assumptions under which there are jumps in some processes, but for simplicity, we omit any discussion of this here.

Under Assumption 1, it follows from Lemma 2.1 (p. 270) in Jacod and Protter (1998) that stable convergence in law of a local martingale M^n to a process M is equivalent to (straight) convergence in law of the process (W^{(1)}, ..., W^{(p)}, M^n) to the process (W^{(1)}, ..., W^{(p)}, M). This result does not extend to all processes and spaces, cf. the discussion in the cited paper.

Another main fact about stable convergence is that limits and quadratic variation can be interchanged:
Proposition 4. (Interchangeability of limits and quadratic variation.) Assume that M^n is a sequence of continuous local martingales which converges stably to a process M. Then (M^n, [M^n, M^n]) converges stably to (M, [M, M]).

For proof, we refer to Corollary VI.6.30 (p. 385) in Jacod and Shiryaev (2003), which also covers the case of bounded jumps. More generally, consult ibid., Chapter VI.6.

We now state the main central limit theorem (CLT).

Theorem 6. Assume Assumption 1. Let (M^n_t) be a sequence of continuous local martingales on [0, T], each adapted to (F_t), with M^n_0 = 0. Suppose that there is an (F_t)-adapted process f_t so that

[M^n, M^n]_t →p ∫_0^t f_s^2 ds for each t ∈ [0, T].   (151)

Also suppose that, for each i = 1, ..., p,

[M^n, W^{(i)}]_t →p 0 for each t ∈ [0, T].   (152)

There is then an extension (F̃_t) of (F_t), and an (F̃_t)-martingale M_t, so that (M^n_t) converges stably to (M_t). Furthermore, there is a Brownian motion (W′_t) so that (W^{(1)}_t, ..., W^{(p)}_t, W′_t) is an (F̃_t)-Wiener process, and so that

M_t = ∫_0^t f_s dW′_s.   (153)
It is worthwhile to understand the proof of this result, and hence we give it here. The proof follows more or less verbatim that of Theorem B.4 in Zhang (2001) (p. 65–67), which is slightly more general. (It has also been updated to reflect the new edition of the work by Jacod and Shiryaev.) A similar result, involving predictable quadratic variations, is given in Theorem IX.7.28 (p. 590–591) of Jacod and Shiryaev (2003).
Proof of Theorem 6. Since [M^n, M^n]_t is a non-decreasing process and has a non-decreasing continuous limit, the convergence (151) is also in law in D(R) by Theorem VI.3.37 (p. 354) in Jacod and Shiryaev (2003). Thus, in their terminology (ibid., Definition VI.3.25, p. 351), [M^n, M^n]_t is C-tight. From this fact, ibid., Theorem VI.4.13 (p. 358) yields that the sequence M^n is tight.

From this tightness, it follows that for any subsequence M^{n_k}, we can find a further subsequence M^{n_{k_l}} which converges in law (as a process) to a limit M, jointly with W^{(1)}, ..., W^{(p)} (in other words, (W^{(1)}, ..., W^{(p)}, M^{n_{k_l}}) converges in law to (W^{(1)}, ..., W^{(p)}, M)). This M is a local martingale by ibid., Proposition IX.1.17 (p. 526), using the continuity of M^n_t. Using Proposition 4 above, (M^{n_{k_l}}, [M^{n_{k_l}}, M^{n_{k_l}}]) converge jointly in law (and jointly with the W^{(i)}'s) to (M, [M, M]). From (151) this means that [M, M]_t = ∫_0^t f_s^2 ds. The continuity of [M, M]_t assures that M_t is continuous. Also, from (152), [M, W^{(i)}] ≡ 0 for each i = 1, ..., p. Now let W′_t = ∫_0^t f_s^{−1} dM_s (if f_t is zero on a set of Lebesgue measure greater than zero, follow the alternative construction in Volume III of Gikhman and Skorohod (1969)). By Property (3) in Section 2.4.2, [W′, W′]_t = t, while [W′, W^{(i)}] ≡ 0. By the multivariate version of Lévy's Theorem (Section 2.4.4), it therefore follows that (W^{(1)}_t, ..., W^{(p)}_t, W′_t) is a Wiener process. The equality (153) follows by construction. Hence the Theorem is shown for the subsequence M^{n_{k_l}}. Since the subsequence M^{n_k} was arbitrary, Theorem 6 follows.
4.3 Application to Realized Volatility
4.3.1 Independent Times
We now turn our attention to the simplest application: the estimator from Section 3. Consider the normalized (by √n) error process

M^n_t = 2n^{1/2} Σ_{t_{i+1} ≤ t} ∫_{t_i}^{t_{i+1}} (X_s − X_{t_i}) dX_s + 2n^{1/2} ∫_{t_*}^{t} (X_s − X_{t_*}) dX_s.   (154)

From Section 3.7.3, we have that Condition (151) of Theorem 6 is satisfied, with

f_t^2 = 2T σ_t^4 H′_t.   (155)
It now remains to check Condition (152). Note that

d[M^n, W^{(i)}]_t = 2n^{1/2}(X_t − X_{t_*}) d[X, W^{(i)}]_t.   (156)

We can now apply Lemma 1 with N_t = X_t and H_t = (d/dt)[X, W^{(i)}]_t. From the Cauchy–Schwarz inequality (in this case known as the Kunita–Watanabe inequality)

|[X, W^{(i)}]_{t+h} − [X, W^{(i)}]_t| ≤ ( [X, X]_{t+h} − [X, X]_t )^{1/2} ( [W^{(i)}, W^{(i)}]_{t+h} − [W^{(i)}, W^{(i)}]_t )^{1/2} ≤ ( σ_+^2 h )^{1/2} h^{1/2} = σ_+ h   (157)

(recall that the quadratic variation is a limit of sums of squares), so we can take H_+ = σ_+. On the other hand, (d/dt)E[N, N]_t ≤ σ_+^2 = a(t − t_*)^b with a = σ_+^2 and b = 0.
Thus, from Lemma 1,

|| [M^n, W^{(i)}]_t ||_1 = 2n^{1/2} || Σ_{t_{i+1} ≤ t} ∫_{t_i}^{t_{i+1}} (N_s − N_{t_i}) H_s ds + ∫_{t_*}^{t} (N_s − N_{t_*}) H_s ds ||_1
  ≤ 2n^{1/2} [ ( H_+^2 (a/(b+3)) R_{b+3}(G) )^{1/2} + R_{(b+3)/2}(G) (2/(b+3)) (a/(b+1))^{1/2} sup_{0 ≤ t−s ≤ Δ(G)} ||H_s − H_t||_2 ]
  = O_p( n^{1/2} R_3(G)^{1/2} ) + O_p( n^{1/2} R_{3/2}(G) sup_{0 ≤ t−s ≤ Δ(G)} ||H_s − H_t|| )
  = o_p(1)   (158)

under the conditions of Proposition 2, since R_v(G) = O_p(n^{1−v}) from (117), and since sup_{0 ≤ t−s ≤ Δ(G)} ||H_s − H_t|| = o_p(1). (The latter fact is somewhat complex. One shows that one can take W^{(1)} = W by a use of Lévy's theorem, and the result follows.)
We have therefore shown:

Theorem 7. Assume Assumption 1, as well as the conditions of Proposition 2, and also that the AQVT H(t) exists and is absolutely continuous. Let M^n_t be given by (154). Then (M^n_t) converges stably in law to M_t, given by

M_t = (2T)^{1/2} ∫_0^t σ_s^2 (H′_s)^{1/2} dW′_s.   (159)

As a special case:

Corollary 1. Under the conditions of the above theorem, for fixed t,

√n ( [X, X]^{G_n}_t − [X, X]_t ) →L U ( 2T ∫_0^t σ_s^4 dH_s )^{1/2},   (160)

where U is a standard normal random variable independent of F_T.

Similar techniques can now be used on other common estimators, such as the TSRV. We refer to Section 5.
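Corollary 1 suggests a standardization that can be verified by simulation. For a standard Brownian motion observed at n equidistant times (so H(t) = t and the asymptotic variance is 2T), the standardized error should be close to N(0, 1). A sketch, with all sizes illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n, T, n_paths = 1000, 1.0, 10000

dX = rng.normal(0.0, np.sqrt(T / n), size=(n_paths, n))   # sigma = 1
rv = (dX ** 2).sum(axis=1)                                # [X, X]^G_T
z = np.sqrt(n) * (rv - T) / np.sqrt(2.0 * T)              # standardized as in (160)

cover = np.mean(np.abs(z) < 1.96)
print(z.mean(), z.std(), cover)   # roughly 0, 1, and 0.95
```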
In the context of equidistant times, this result goes back to Jacod (1994), Jacod and Protter (1998), and Barndorff-Nielsen and Shephard (2002). We emphasize that the method of proof in Jacod and Protter (1998) is quite different from the one used here, and gives rise to weaker conditions. The reason for our different treatment is that we have found the current framework more conducive to generalization to other observation time structures and other estimators. In the long run, it is an open question which general framework is the most useful.
4.3.2 Endogenous Times
The assumption of independent sampling times is not necessary for a limit result, though a weakening of conditions will change the result. To see what happens, we follow the development in Li, Mykland, Renault, Zhang, and Zheng (2009), and define the tricity by [X, X, X]^G_t = Σ_{t_{i+1} ≤ t} (X_{t_{i+1}} − X_{t_i})^3 + (X_t − X_{t_*})^3, and assume that

n[X, X, X, X]^G_t →p U_t and n^{1/2}[X, X, X]^G_t →p V_t.   (161)

By the reasoning in Section 3.7.4, n and n^{1/2} are the right rates for [X, X, X, X]^G and [X, X, X]^G, respectively. Hence U_t and V_t will exist under reasonable regularity conditions. Also, from Section 3.7.4, if the AQVT exists and is absolutely continuous, then so are U_t and V_t. We shall use

U_t = ∫_0^t u_s ds and V_t = ∫_0^t v_s ds.   (162)
Tricity is handled in much the same way as quarticity. In analogy to the development in Section 3.5.1, observe that

d(X_t − X_{t_i})^3 = 3(X_t − X_{t_i})^2 dX_t + 3(X_t − X_{t_i}) d[X, X]_t
                  = 3(X_t − X_{t_i})^2 dX_t + (3/2) d[M, X]_t,

since d[M, X]_t = 2(X_t − X_{t_i}) d[X, X]_t. It follows that if we set

M^{(3/2)}_t = Σ_{t_{i+1} ≤ t} ∫_{t_i}^{t_{i+1}} (X_s − X_{t_i})^2 dX_s + ∫_{t_*}^{t} (X_s − X_{t_*})^2 dX_s

we get

[X, X, X]^G_t = (3/2)[M, X]_t + 3M^{(3/2)}_t.
In analogy with Proposition 1, we hence obtain:

Proposition 5. Assume the conditions of Proposition 1. Then

sup_{0≤t≤T} | [M, X]_t − (2/3)[X, X, X]^G_t | = o_p(n^{−1/2}) as n → ∞.   (163)

It follows that unless V_t ≡ 0, the condition (152) in Theorem 6 will not hold. To solve this problem, define an auxiliary martingale

M̃^n_t = M^n_t − ∫_0^t g_s dX_s,   (164)
where g is to be determined. We now see that

[M̃^n, X]_t = [M^n, X]_t − ∫_0^t g_s d[X, X]_s →p ∫_0^t ( (2/3) v_s − g_s σ_s^2 ) ds and

[M̃^n, M̃^n]_t = [M^n, M^n]_t + ∫_0^t g_s^2 d[X, X]_s − 2 ∫_0^t g_s d[M^n, X]_s →p ∫_0^t ( (2/3) u_s + g_s^2 σ_s^2 − 2 (2/3) g_s v_s ) ds.

Hence, if we choose g_t = 2v_t/(3σ_t^2), we obtain that [M̃^n, X]_t →p 0 and [M̃^n, M̃^n]_t →p ∫_0^t ( (2/3) u_s − (4/9) v_s^2 σ_s^{−2} ) ds.
By going through the same type of arguments as above, we obtain:

Theorem 8. Assume Assumption 1, as well as the conditions of Proposition 2. Also assume that (161) holds for each t ∈ [0, T], and that the absolute continuity (162) holds. Then (M^n_t) converges stably in law to M_t, given by

M_t = (2/3) ∫_0^t v_s σ_s^{−2} dX_s + ∫_0^t ( (2/3) u_s − (4/9) v_s^2 σ_s^{−2} )^{1/2} dW′_s,

where W′ is independent of W^{(1)}, ..., W^{(p)}.
It is clear from this that the assumption of independent sampling times implies that v_t ≡ 0. A similar result was shown in Li, Mykland, Renault, Zhang, and Zheng (2009), where implications of this result are discussed further.
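The role of V_t can be seen in a small simulation, not taken from the text: sampling a Brownian path at asymmetric exit times (an arbitrary endogenous scheme, with hypothetical barrier sizes) produces a tricity n^{1/2}[X, X, X]^G_T that stays away from zero, while equidistant sampling of the same path does not:

```python
import numpy as np

rng = np.random.default_rng(5)
T, m = 1.0, 400000
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(T / m), size=m))])

# Endogenous scheme: record an observation each time the path has moved up by
# du or down by dd since the last observation (du != dd skews the increments).
du, dd = 0.02, 0.01
obs, last = [0.0], 0.0
for x in W[1:]:
    if x - last >= du or x - last <= -dd:
        obs.append(x)
        last = x
inc = np.diff(np.asarray(obs))
n_endo = len(inc)
tri_endo = np.sqrt(n_endo) * np.sum(inc ** 3)

# Equidistant sampling of the same path, with a comparable number of points
inc_eq = np.diff(W[:: m // n_endo])
tri_eq = np.sqrt(len(inc_eq)) * np.sum(inc_eq ** 3)

print(n_endo, tri_endo, tri_eq)   # tri_endo well away from 0, tri_eq near 0
```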
4.4 Statistical Risk Neutral Measures
We have so far ignored the drift µ_t. We shall here provide a trick to reinstate the drift in any analysis, without too much additional work. It will turn out that stable convergence is a key element in the discussion. Before we go there, we need to introduce the concept of absolute continuity.

We refer to a probability where there is no drift as a "statistical" risk neutral measure. This is in analogy to the use of equivalent measures in asset pricing. See, in particular, Ross (1976), Harrison and Kreps (1979), Harrison and Pliska (1981), Delbaen and Schachermayer (1995), and Duffie (1996).
4.4.1 Absolute Continuity
We shall in the following think about having two different probabilities on the same observables. For example, P can correspond to the system

dX_t = σ_t dW_t, X_0 = x_0,   (165)

while Q can correspond to the system

dX_t = µ_t dt + σ_t dW^Q_t, X_0 = x_0.   (166)

In this case, W_t is a Wiener process under P, and W^Q_t is a Wiener process under Q. Note that since we are modeling the process X_t, this process is the observable quantity whose distribution we seek. Hence, the process X_t does not change from P to Q, but its distribution changes. If we equate (165) and (166), we get

µ_t dt + σ_t dW^Q_t = σ_t dW_t,   (167)

or

(µ_t/σ_t) dt + dW^Q_t = dW_t.   (168)

As we discussed in the constant µ and σ case, when carrying out inference for observations in a fixed time interval [0, T], the process µ_t cannot be consistently estimated. A precise statement to this effect (Girsanov's Theorem) is given below.

The fact that µ cannot be observed means that one cannot fully distinguish between P and Q, even with infinite data. This concept is captured in the following definition:
Definition 9. For a given σ-field 𝒜, two probabilities P and Q are mutually absolutely continuous (or equivalent) if, for all A ∈ 𝒜, P(A) = 0 ⟺ Q(A) = 0. More generally, Q is absolutely continuous with respect to P if, for all A ∈ 𝒜, P(A) = 0 ⟹ Q(A) = 0.

We shall see that P and Q from (165) and (166) are, indeed, mutually absolutely continuous.
4.4.2 The Radon-Nikodym Theorem, and the Likelihood Ratio
Theorem 9. (Radon–Nikodym) Suppose that Q is absolutely continuous with respect to P on the σ-field 𝒜. Then there is a random variable (𝒜-measurable) dQ/dP so that for all A ∈ 𝒜,

Q(A) = E_P( (dQ/dP) I_A ).   (169)

For proof and a more general theorem, see Theorem 32.2 (p. 422) in Billingsley (1995).

The quantity dQ/dP is usually called either the Radon–Nikodym derivative or the likelihood ratio. It is easy to see that dQ/dP is unique "almost surely" (in the same way as the conditional expectation).
Example 9. The simplest case of a Radon–Nikodym derivative is where X_1, X_2, ..., X_n are iid, with two possible distributions P and Q. Suppose that X_i has density f_P and f_Q under P and Q, respectively. Then

dQ/dP = [ f_Q(X_1) f_Q(X_2) ... f_Q(X_n) ] / [ f_P(X_1) f_P(X_2) ... f_P(X_n) ].   (170)

Likelihood ratios are of great importance in statistical inference generally.
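Example 9 can be made concrete. The sketch below (with µ chosen arbitrarily) computes dQ/dP for an iid normal sample with P: N(0,1) and Q: N(µ,1), compares it to the closed form exp(µ Σx_i − nµ²/2), and checks the property E_P(dQ/dP) = 1 (stated in the next subsection) by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(6)
mu, n = 0.3, 50                       # illustrative: P is N(0,1), Q is N(mu,1)
x = rng.normal(0.0, 1.0, size=n)      # data generated under P

def phi(y, mean):
    return np.exp(-0.5 * (y - mean) ** 2) / np.sqrt(2.0 * np.pi)

lr = np.prod(phi(x, mu) / phi(x, 0.0))              # equation (170)
lr_closed = np.exp(mu * x.sum() - n * mu ** 2 / 2)  # same quantity in closed form
print(lr, lr_closed)

# E_P(dQ/dP) = 1, checked by Monte Carlo in the n = 1 case
y = rng.normal(0.0, 1.0, size=200000)
mc = np.exp(mu * y - mu ** 2 / 2).mean()
print(mc)
```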
4.4.3 Properties of Likelihood Ratios
• P(dQ/dP ≥ 0) = 1.

• If Q is equivalent to P: P(dQ/dP > 0) = 1.

• E_P(dQ/dP) = 1.

• For all 𝒜-measurable Y: E_Q(Y) = E_P( Y (dQ/dP) ).

• If Q is equivalent to P: dP/dQ = (dQ/dP)^{−1}.
4.4.4 Girsanov's Theorem

We now get to the relationship between P and Q in systems (165) and (166). To give the generality, we consider the vector process case (where µ is a vector, and σ is a matrix). The superscript "T" here stands for "transpose".

Theorem 10. (Girsanov). Subject to regularity conditions, P and Q are mutually absolutely continuous, and

dP/dQ = exp( −∫_0^T (σ_t^{−1} µ_t)^T dW^Q_t − (1/2) ∫_0^T µ_t^T (σ_t σ_t^T)^{−1} µ_t dt ).   (171)

The regularity conditions are satisfied if σ_− ≤ σ_t ≤ σ_+ and |µ_t| ≤ µ_+, but they also cover much more general situations. For a more general statement, see, for example, Chapter 5.5 of Karatzas and Shreve (1991).
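In the scalar case with constant µ and σ, (171) reduces to dP/dQ = exp(−(µ/σ)W^Q_T − (µ/σ)²T/2). The sketch below (illustrative constants) simulates under Q and checks that E_Q(dP/dQ) = 1 and that reweighting by dP/dQ removes the drift from X_T:

```python
import numpy as np

rng = np.random.default_rng(7)
T, n_paths = 1.0, 100000
mu, sigma = 0.2, 1.0                 # illustrative constant drift and volatility

WQ_T = rng.normal(0.0, np.sqrt(T), size=n_paths)   # W^Q_T, simulated under Q
X_T = mu * T + sigma * WQ_T                        # X_T under Q, with X_0 = 0

# scalar case of (171)
lam = mu / sigma
dP_dQ = np.exp(-lam * WQ_T - 0.5 * lam ** 2 * T)

print(dP_dQ.mean())            # E_Q(dP/dQ) = 1
print((X_T * dP_dQ).mean())    # E_P(X_T) = 0: under P the drift is gone
```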
4.4.5 How to get rid of µ: Interface with Stable Convergence
The idea is borrowed from asset pricing theory. We think that the true distribution is Q, but we prefer to work with P since then calculations are much simpler.

Our plan is the following: carry out the analysis under P, and adjust results back to Q using the likelihood ratio (Radon–Nikodym derivative) dP/dQ. Specifically, suppose that θ is a quantity to be estimated (such as ∫_0^T σ_t^2 dt, ∫_0^T σ_t^4 dt, or the leverage effect). An estimator θ̂_n is then found with the help of P, and an asymptotic result is established whereby, say,

n^{1/2}(θ̂_n − θ) →L N(b, a^2) stably   (172)

under P. It then follows directly from the measure theoretic equivalence that n^{1/2}(θ̂_n − θ) also converges in law under Q. In particular, consistency and rate of convergence are unaffected by the change of measure. We emphasize that this is due to the finite (fixed) time horizon T.

The asymptotic law may be different under P and Q. While the normal distribution remains, the distributions of b and a^2 (if random) may change.
The technical result is as follows.
Proposition 6. Suppose that Z_n is a sequence of random variables which converges stably to N(b, a^2) under P. By this we mean that N(b, a^2) = b + aN(0, 1), where N(0, 1) is a standard normal variable independent of F_T, and where a and b are F_T-measurable. Then Z_n converges stably in law to b + aN(0, 1) under Q, where N(0, 1) remains independent of F_T under Q.

Proof of Proposition. E_Q I_A g(Z_n) = E_P (dQ/dP) I_A g(Z_n) → E_P (dQ/dP) I_A g(Z) = E_Q I_A g(Z), by uniform integrability of (dQ/dP) I_A g(Z_n).
Proposition 6 substantially simplifies calculations and results. In fact, the same strategy will be helpful for the localization results that come next in the paper. It will turn out that the relationship between the localized and continuous process can also be characterized by absolute continuity and likelihood ratios.

Remark 1. It should be noted that after adjusting back from P to Q, the process µ_t may show up in expressions for asymptotic distributions. For instances of this, see Sections 2.5 and 4.3 of Mykland and Zhang (2007). One should always keep in mind that drift most likely is present, and may affect inference. □
Remark 2. As noted, our device is comparable to the use of equivalent martingale measures in options pricing theory (Ross (1976), Harrison and Kreps (1979), Harrison and Pliska (1981), see also Duffie (1996)) in that it affords a convenient probability distribution with which to make computations. In our econometric case, one can always take the drift to be zero, while in the options pricing case, this can only be done for discounted securities prices. In both cases, however, the computational purpose is to get rid of a nuisance "dt term".

The idea of combining stable convergence with measure change appears to go back to Rootzén (1980). □
4.5 Unbounded σ_t

We have so far assumed that σ^2_t ≤ σ^2_+. With the help of stable convergence, it is also easy to weaken this assumption. One can similarly handle restrictions on µ_t, and on σ^2_t being bounded away from zero.

The much weaker requirement is that σ_t be locally bounded. This is to say that there is a sequence of stopping times τ_m and of constants σ_{m,+} so that

P(τ_m < T) → 0 as m → ∞ and σ^2_t ≤ σ^2_{m,+} for 0 ≤ t ≤ τ_m.   (173)

For example, this is automatically satisfied if σ_t is a continuous process.

As an illustration of how to incorporate such local boundedness in existing results, take Corollary 1. If we replace the condition σ^2_t ≤ σ^2_+ by local boundedness, the corollary continues to hold (for fixed m) with σ_{τ_m ∧ t} replacing σ_t. On the other hand we note that [X, X]^{G_n} is the same for σ_{τ_m ∧ t} and σ_t on the set {τ_m = T}. Thus, the corollary tells us that for any set A ∈ F_T, and for any bounded continuous function g,

E I_{A ∩ {τ_m = T}} g( √n( [X, X]^{G_n}_t − [X, X]_t ) ) → E I_{A ∩ {τ_m = T}} g( U ( 2T ∫_0^t σ_s^4 dH_s )^{1/2} )   (174)
as n → ∞ (and for fixed m), where U has the same meaning as in the corollary. Hence,

| E I_A g( √n( [X, X]^{G_n}_t − [X, X]_t ) ) − E I_A g( U ( 2T ∫_0^t σ_s^4 dH_s )^{1/2} ) |
  ≤ | E I_{A ∩ {τ_m = T}} g( √n( [X, X]^{G_n}_t − [X, X]_t ) ) − E I_{A ∩ {τ_m = T}} g( U ( 2T ∫_0^t σ_s^4 dH_s )^{1/2} ) | + 2 max |g(x)| P(τ_m ≠ T)
  → 2 max |g(x)| P(τ_m ≠ T)   (175)

as n → ∞. By choosing m large, the right hand side of this expression can be made as small as we wish. Hence, the left hand side actually converges to zero. We have shown:
Corollary 2. Theorem 7, Corollary 1, and Theorem 8 all remain true if the condition $\sigma^2_t \le \sigma^2_+$ is
replaced by a requirement that $\sigma^2_t$ be locally bounded.
5 Microstructure
5.1 The Problem
The basic problem is that the semimartingale $X_t$ is actually contaminated by noise. One observes

$$Y_{t_i} = X_{t_i} + \epsilon_i. \tag{176}$$

We do not right now take a position on the structure of the $\epsilon_i$'s.

The reason for going to this structure is that the convergence (consistency) predicted by Theorem
2 manifestly does not hold. To see this, in addition to $\mathcal{G}$, we also use subgrids of the form
$\mathcal{H}_k = \{0, t_k, t_{K+k}, t_{2K+k}, ...\}$. This gives rise to the Average Realized Volatility (ARV)

$$ARV(Y, \mathcal{G}, K) = \frac{1}{K} \sum_{k=1}^K [Y,Y]^{\mathcal{H}_k}. \tag{177}$$

Note that $ARV(Y, \mathcal{G}, 1) = [Y,Y]^{\mathcal{G}}$. If one believes Theorem 2, then the $ARV(Y, \mathcal{G}, K)$ should be
close for small $K$. In fact, the convergence in the theorem should be visible as $K$ decreases to 1.
Figure 1 looks at the $ARV(Y, \mathcal{G}, K)$ for Alcoa Aluminum (AA) for January 4, 2001. As can be seen
in the figure, the actual data behave quite differently from what the theory predicts. It follows
that the semimartingale assumption does not hold, and we have to move to a model like (176).
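To make (177) concrete, the following is a minimal Python sketch of the ARV statistic (this is an illustrative implementation of ours, not the code used for the figures; the simulated pure-noise series in the test is a hypothetical stand-in for the TAQ data):

```python
def realized_variance(y):
    """[Y, Y] on a grid: the sum of squared increments of the log-price series y."""
    return sum((y[i + 1] - y[i]) ** 2 for i in range(len(y) - 1))

def arv(y, K):
    """Average Realized Volatility ARV(Y, G, K) as in (177): the average of
    [Y, Y] over the K staggered subgrids H_k = {0, t_k, t_{K+k}, t_{2K+k}, ...}."""
    rvs = []
    for k in range(1, K + 1):
        # subgrid H_k keeps the starting point, then every K-th observation
        subgrid = [y[0]] + [y[j] for j in range(k, len(y), K)]
        rvs.append(realized_variance(subgrid))
    return sum(rvs) / K
```

For data following (176) with iid noise, $ARV(Y, \mathcal{G}, K)$ decreases as $K$ grows, which is the pattern visible in Figure 1; under the pure semimartingale model it would be roughly flat in $K$.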
[Figure 1: scatter plot titled "dependence of estimated volatility on number of subgrids"; x-axis: K = # of subgrids (1 to 20); y-axis: volatility of AA, Jan 4, 2001, annualized, square root scale (ca. 0.6 to 1.2).]

Figure 1. RV as One Samples More Frequently. The plot gives $ARV(Y, \mathcal{G}, K)$ for
$K = 1, ..., 20$ for Alcoa Aluminum for the transactions on January 4, 2001. It is clear
that consistency does not hold for the quadratic variation. The semimartingale
model, therefore, does not hold.
[Figure 2: scatter plot titled "dependence of estimated volatility on sampling frequency"; x-axis: sampling frequency in seconds (ca. 50 to 200); y-axis: volatility of AA, Jan 4, 2001, annualized, square root scale.]

Figure 2. RV as One Samples More Frequently. This is the same figure as Figure 1,
but the x-axis gives the average time (in seconds) between observations for
each $ARV(Y, \mathcal{G}, K)$. There is one transaction about every 50 seconds in these particular
data.
5.2 An Initial Approach: Sparse Sampling

Plots of the type given in Figures 1 and 2 were first considered by Andersen, Bollerslev, Diebold, and
Labys (2000) and called signature plots. The authors concluded that the most correct values for
the volatility were the lower ones on the left hand side of the plot, based mainly on the stabilization
of the curve in this region. On the basis of this, the authors recommended to estimate volatility
using $[Y,Y]^{\mathcal{H}}$, where $\mathcal{H}$ is a sparsely sampled subgrid of $\mathcal{G}$. In this early literature, the standard
approach was to subsample about every five minutes.

The philosophy behind this approach is that the size of the noise is very small, and if there
are not too many sampling points, the effect of noise will be limited. While true, this uses the data
inefficiently, and we shall see that better methods can be found. The basic subsampling scheme
does, however, provide some guidance on how to proceed to more complex schemes. For this reason,
we shall analyze its properties.
The model used for most analysis is that the $\epsilon_i$ are independent of $X$, and iid. One can still, however,
proceed under weaker conditions. For example, if the $\epsilon_i$ have serial dependence, a similar analysis
will go through.

The basic decomposition is

$$[Y,Y]^{\mathcal{H}} = [X,X]^{\mathcal{H}} + [\epsilon,\epsilon]^{\mathcal{H}} + 2[X,\epsilon]^{\mathcal{H}}, \tag{178}$$

where the cross term is usually (but not always) ignorable. Thus, if the $\epsilon$'s are independent of $X$,
and $E(\epsilon) = 0$, we get

$$E\left([Y,Y]^{\mathcal{H}} \mid X \text{ process}\right) = [X,X]^{\mathcal{H}} + E[\epsilon,\epsilon]^{\mathcal{H}}. \tag{179}$$

If the $\epsilon$ are identically distributed, then

$$E[\epsilon,\epsilon]^{\mathcal{H}} = n_{\rm sparse}\, E(\epsilon_K - \epsilon_0)^2, \tag{180}$$

where $n_{\rm sparse}$ = (number of points in $\mathcal{H}$) $- 1$. Smaller $n_{\rm sparse}$ gives smaller bias, but bigger variance.

At this point, if you would like to follow this line of development, please consult the discussion
in Section 2 of Zhang, Mykland, and Aït-Sahalia (2005). This shows that there is an optimal
subsampling frequency, given by equation (31) (p. 1399) in the paper. A similar analysis for
$ARV(Y, \mathcal{G}, K)$ is carried out in Sections 3.1-3.3 of the paper.
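The bias described by (179)-(180) is easy to see in simulation. The following sketch uses hypothetical sizes (a "day" of 23400 observations, $\sigma = 0.2$, noise standard deviation $0.001$, $T = 1$) chosen by us for illustration:

```python
import random

def sparse_rv(y, spacing):
    """[Y, Y]^H on the sparse subgrid H = {t_0, t_spacing, t_{2*spacing}, ...}."""
    sub = y[::spacing]
    return sum((sub[i + 1] - sub[i]) ** 2 for i in range(len(sub) - 1))

random.seed(1)
n, sigma, noise_sd, T = 23400, 0.2, 0.001, 1.0   # hypothetical sizes
dt = T / n
x = [0.0]
for _ in range(n):
    x.append(x[-1] + sigma * random.gauss(0.0, dt ** 0.5))  # latent X, no drift
y = [xi + random.gauss(0.0, noise_sd) for xi in x]          # Y = X + eps, as in (176)

dense = sparse_rv(y, 1)     # biased upward by about 2 * n * E(eps^2), here ca. 0.047
sparse = sparse_rv(y, 300)  # "five minute" sampling: bias only about 2 * 78 * E(eps^2)
```

With these sizes, the full-grid estimate `dense` roughly doubles the true value $\sigma^2 T = 0.04$, while the sparsely sampled `sparse` is nearly unbiased but noisier, exactly the trade-off governed by $n_{\rm sparse}$ in (180).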
5.3 Two Scales Realized Volatility (TSRV)

To get a consistent estimator, we go to the two scales realized volatility (TSRV). The TSRV is
defined as follows:

$$\widehat{[X,X]}^{({\rm tsrv})}_T = a_n\, ARV(Y, \mathcal{G}, K) - b_n\, ARV(Y, \mathcal{G}, J), \tag{181}$$

where we shall shortly fix $a_n$ and $b_n$. It will turn out to be meaningful to use

$$b_n = a_n \frac{\bar n_K}{\bar n_J}, \tag{182}$$

where $\bar n_K = (n - K + 1)/K$. For asymptotic purposes, we can take $a_n = 1$, but more generally we will
assume that $a_n \to 1$ as $n \to \infty$.
This estimator is discussed in Section 4 of Zhang, Mykland, and Aït-Sahalia (2005), though
only in the case where $J = 1$. In the more general case, $J$ is not necessarily 1, but $J \ll K$. Note that in the representation
(193), (192) becomes

$$\Gamma_0 = \frac{3}{2} \int_0^T (f_t \sigma_t)^2\, dt \quad\text{and}\quad \Gamma_1 = 2 \int_0^T \left( f_t^2 + g_t^2 \sigma_t^2 \right) dt. \tag{194}$$
Example 10. In the case of a Heston model (Section 2.2.3), we obtain that

$$\Gamma_0 = \frac{3}{8} (\rho\gamma)^2 \int_0^T \sigma_t^{-2}\, dt \quad\text{and}\quad \Gamma_1 = \frac{1}{4} \gamma^2 (M - 1) \int_0^T \sigma_t^{-2}\, dt. \tag{195}$$
Remark 4. (One step discretization). Let $P^*_n$ be the measure $Q_n$ which arises when the block
length is $M = 1$. Observe that even with this one-step discretization, $dP/dP^*_n$ does not necessarily
converge to unity. In this case, $\Gamma_1 = 0$, but $\Gamma_0$ does not vanish when there is leverage effect. □
6.2 Moving windows

The paper has so far considered chopping the $n$ data points up into non-overlapping windows of size $M$ each.
We here show by example that the methodology can be adapted to the moving window case. We
consider the estimation of $\Theta = \int_0^T |\sigma_t|^r\, dt$, as in Section 4.1 of Mykland and Zhang (2007). It should
be noted that the moving window is close to the concept of a moving kernel, and this may be a
promising avenue for further investigation. See, in particular, Linton (2007).
We use block length $M$, and we use for simplicity

$$\tilde\sigma^2_{\tau_{n,i}} = \frac{1}{\Delta t_n M_n} \sum_{t_{n,j} \in (\tau_{n,i}, \tau_{n,i+1}]} \left( \Delta X_{t_{n,j+1}} \right)^2 \tag{196}$$

as estimator of $\sigma^2_{\tau_{n,i}}$. The moving window estimate of $\Theta$ is now

$$\tilde\Theta^{MW}_n = \Delta t \sum_{i=0}^{n-M} |\tilde\sigma_{t_{n,i}}|^r.$$
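The moving window estimator can be sketched as follows. We take $r = 2$ (so that no small-block correction factor of the $c_{M,r}$ type is needed) and equidistant observations on $[0,1]$; these choices, and the constant-volatility test data, are our simplifying assumptions:

```python
def theta_mw(x, M, r=2.0):
    """Moving-window estimate of Theta = int_0^1 |sigma_t|^r dt from log-prices x:
    dt * sum over i of |sigma-tilde_{t_i}|^r, with the local variance
    sigma-tilde^2 over M increments as in (196)."""
    n = len(x) - 1
    dt = 1.0 / n
    dx = [x[i + 1] - x[i] for i in range(n)]
    total = 0.0
    for i in range(n - M + 1):
        local_var = sum(d * d for d in dx[i:i + M]) / (M * dt)  # sigma-tilde^2, eq. (196)
        total += local_var ** (r / 2.0)
    return dt * total
```

For $r \ne 2$ a multiplicative correction (the $c_{M,r}$ constant of Mykland and Zhang (2007)) would be applied to each block; the sketch omits it.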
It is easy to see that

$$\tilde\Theta^{MW}_n = \frac{1}{M} \sum_{m=1}^M \tilde\Theta_{n,m} + O_p(n^{-1}),$$

where $\tilde\Theta_{n,m}$ is the non-overlapping block estimator, with block number one starting at $t_{n,m}$. In
view of this representation, it is once again clear from sufficiency considerations that the moving
window estimator will have an asymptotic variance which is smaller (or, at least, no larger) than
that of the estimator based on non-overlapping blocks. We now carry out the precise asymptotic analysis.
To analyze this estimator, let $M' > M$, and let $A_n = \{ i = 0, ..., n - M : [t_{n,i}, t_{n,i+M}] \subseteq
[kM'/n, (k+1)M'/n] \text{ for some } k \}$, with $B_n = \{0, ..., n - M\} \setminus A_n$. Write

$$n^{1/2} \left( \tilde\Theta^{MW}_n - \Theta \right) = n^{1/2} \Delta t \sum_k \sum_{i:\, [t_{n,i}, t_{n,i+M}] \subseteq [kM'/n, (k+1)M'/n]} \left( |\tilde\sigma_{t_{n,i}}|^r - |\sigma_{t_{kM'/n}}|^r \right) + n^{1/2} \Delta t \sum_{i \in B_n} \left( |\tilde\sigma_{t_{n,i}}|^r - |\sigma_{t_{n,i}}|^r \right) + O_p(n^{-1/2}). \tag{197}$$
Now apply our methodology from Section 6.1, with block size $M'$, to the first term in (197). Under
this block approximation, the inner sum in the first term is based on conditionally i.i.d. observations;
in fact, for $[t_{n,i}, t_{n,i+M}] \subseteq [kM'/n, (k+1)M'/n]$, $\tilde\sigma^2_{t_{n,i}} = \sigma^2_{kM'/n} S_i$, in law, where

$$S_i = M^{-1} \sum_{j=i}^{i+M-1} U_j^2, \qquad U_0, U_1, U_2, ... \text{ iid standard normal.} \tag{198}$$

As in Section 4.1 of Mykland and Zhang (2007), there is no adjustment (à la Remark 3) due
to covariation with the asymptotic likelihood ratios, and so the first term in (197) converges stably
to a mixed normal with random variance given by the limit of

$$n \Delta t_n^2 \sum_k |\sigma|^{2r}_{kM'/n}\, \mathrm{Var}\!\left( c^{-1}_{M,r} \sum_{i=0}^{M'-M} S_i^{r/2} \right),$$

which is

$$T c^{-2}_{M,r}\, \frac{1}{M'}\, \mathrm{Var}\!\left( \sum_{i=0}^{M'-M} S_i^{r/2} \right) \int_0^T |\sigma|^{2r}_t\, dt. \tag{199}$$
Similarly, one can apply the same technique to the second term in (197), but now with the $k$'th
block ($k \ge 2$) starting at $kM' - M$. This analysis yields that the second term is also asymptotically
mixed normal, but with a variance which is of order $o_p(1)$ as $M' \to \infty$. (In other words, once again,
first send $n$ to infinity, and then, afterwards, do the same to $M'$.) This yields that, overall, and in
the sense of stable convergence,

$$n^{1/2} \left( \tilde\Theta^{MW}_n - \Theta \right) \xrightarrow{\mathcal{L}} N(0,1) \left( c^{-2}_{M,r}\, \gamma_{M,r}\, T \int_0^T |\sigma|^{2r}_t\, dt \right)^{1/2}, \tag{200}$$

where, from (199), $\gamma_{M,r} = \lim_{M' \to \infty} \mathrm{Var}\!\left( \sum_{i=0}^{M'-M} S_i^{r/2} \right) / M'$, i.e.,

$$\gamma_{M,r} = \mathrm{Var}(S_0^{r/2}) + 2 \sum_{i=1}^{M-1} \mathrm{Cov}(S_0^{r/2}, S_i^{r/2}),$$

where the $S_i$ are given in (198).
6.3 Multivariate and Asynchronous data

The results discussed in Section 6.1 also apply to vector processes (see Mykland and Zhang (2007)
for details). Also, for purposes of analysis, asynchronous data do not pose any conceptual difficulty
when applying the results. One includes all observation times when computing the likelihood
ratios in the contiguity theorems. It does not matter that some components of the vector are not
observed at all these times. In a sense, they are just treated as missing data. Just as in the case
of irregular times for scalar processes, this does not necessarily mean that it is straightforward to
write down sensible estimators.

For example, consider a bivariate process $(X^{(1)}_t, X^{(2)}_t)$. If process $(X^{(r)}_t)$ is observed at times

$$\mathcal{G}^{(r)}_n = \{ 0 \le t^{(r)}_{n,0} < t^{(r)}_{n,1} < ... < t^{(r)}_{n,n_r} \le T \}, \tag{201}$$

one would normally use the grid $\mathcal{G}_n = \mathcal{G}^{(1)}_n \cup \mathcal{G}^{(2)}_n \cup \{0, T\}$ to compute the likelihood ratio $dP/dQ_n$.
To focus the mind with an example, consider the estimation of covariation under asynchronous
data. It is shown in Mykland (2006) that the Hayashi-Yoshida estimator (Hayashi and Yoshida
(2005)) can be seen as a nonparametric maximum likelihood estimator (MLE). We shall here see
that blocking induces an additional class of local likelihood based MLEs. The difference between
the former and the latter depends on the continuity assumptions made on the volatility process,
and is a little like the difference between the Kaplan-Meier (Kaplan and Meier (1958)) and Nelson-
Aalen (Nelson (1969), Aalen (1976, 1978)) estimators in survival analysis. (Note that the variance
estimate for the Hayashi-Yoshida estimator from Section 5.3 of Mykland (2006) obviously also
remains valid in the setting of this paper.)

For simplicity, work with a bivariate process, and let the grid $\mathcal{G}_n$ be given by (201). For now,
let the block dividers $\tau$ be any subset of $\mathcal{G}_n$. Under the approximate measure $Q_n$, note that for

$$\tau_{n,i-1} \le t^{(1)}_{n,j-1} < t^{(1)}_{n,j} \le \tau_{n,i} \quad\text{and}\quad \tau_{n,i-1} \le t^{(2)}_{n,k-1} < t^{(2)}_{n,k} \le \tau_{n,i}, \tag{202}$$
the set of returns $X^{(1)}_{t^{(1)}_{n,j}} - X^{(1)}_{t^{(1)}_{n,j-1}}$ and $X^{(2)}_{t^{(2)}_{n,k}} - X^{(2)}_{t^{(2)}_{n,k-1}}$ are conditionally jointly normal with mean
zero and covariances

$$\mathrm{Cov}_{Q_n}\!\left( X^{(r)}_{t^{(r)}_{n,j}} - X^{(r)}_{t^{(r)}_{n,j-1}},\; X^{(s)}_{t^{(s)}_{n,k}} - X^{(s)}_{t^{(s)}_{n,k-1}} \,\Big|\, \mathcal{F}_{\tau_{n,i-1}} \right) = (\zeta_{\tau_{n,i-1}})_{r,s}\, d\{ (t^{(r)}_{n,j-1}, t^{(r)}_{n,j}) \cap (t^{(s)}_{n,k-1}, t^{(s)}_{n,k}) \}, \tag{203}$$

where $d$ is length (Lebesgue measure). Set $\Sigma_{r,s;j,k} = (\zeta_{\tau_{n,i-1}})_{r,s}\, d\{ (t^{(r)}_{n,j-1}, t^{(r)}_{n,j}) \cap (t^{(s)}_{n,k-1}, t^{(s)}_{n,k}) \}$. The $Q_n$ log
likelihood ratio based on observations fully in block $(\tau_{n,i-1}, \tau_{n,i}]$ is therefore given as

$$\ell(\Sigma) = -\frac{1}{2} \ln \det(\Sigma) - \frac{1}{2} \sum_{r,s,j,k} \Sigma^{r,s;j,k} \left( X^{(r)}_{t^{(r)}_{n,j}} - X^{(r)}_{t^{(r)}_{n,j-1}} \right)\left( X^{(s)}_{t^{(s)}_{n,k}} - X^{(s)}_{t^{(s)}_{n,k-1}} \right) - \frac{N_i}{2} \ln(2\pi), \tag{204}$$
where $\Sigma^{r,s;j,k}$ are the elements of the matrix inverse of $(\Sigma_{r,s;j,k})$, and $N_i$ is a measure of block
sample size. The sum in $(j,k)$ is over all intersections $(t^{(r)}_{n,j-1}, t^{(r)}_{n,j}) \cap (t^{(s)}_{n,k-1}, t^{(s)}_{n,k})$ with positive
length satisfying (202). Call the number of such terms

$$m^{(r,s)}_{n,i} = \# \left\{ \text{nonempty intersections } (t^{(r)}_{n,j-1}, t^{(r)}_{n,j}) \cap (t^{(s)}_{n,k-1}, t^{(s)}_{n,k}) \text{ satisfying (202)} \right\}. \tag{205}$$

The "parameter" $\Sigma$ corresponds to $\zeta_{\tau_{n,i-1}}$. The block MLE is thus given as

$$\hat\zeta^{(r,s)}_{\tau_{n,i-1}} = \frac{1}{m^{(r,s)}_{n,i}} \sum_{j,k} \frac{ \left( X^{(r)}_{t^{(r)}_{n,j}} - X^{(r)}_{t^{(r)}_{n,j-1}} \right)\left( X^{(s)}_{t^{(s)}_{n,k}} - X^{(s)}_{t^{(s)}_{n,k-1}} \right) }{ d\{ (t^{(r)}_{n,j-1}, t^{(r)}_{n,j}) \cap (t^{(s)}_{n,k-1}, t^{(s)}_{n,k}) \} }, \tag{206}$$

where the sum is over $j, k$ satisfying (202) for which the denominator in the summand is nonzero.
The overall estimate of covariation is thus

$$\widehat{\langle X^{(r)}, X^{(s)} \rangle}_T = \sum_i \hat\zeta^{(r,s)}_{\tau_{n,i-1}} \left( \tau_{n,i} - \tau_{n,i-1} \right). \tag{207}$$
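The single-block MLE (206) can be sketched directly from the overlap structure; the function names are ours, and the estimator assumes at least one pair of increments with overlapping time intervals in the block:

```python
def _overlap(a0, a1, b0, b1):
    # Lebesgue measure of the intersection (a0, a1) ∩ (b0, b1)
    return max(0.0, min(a1, b1) - max(a0, b0))

def block_cov_mle(t1, x1, t2, x2):
    """Block MLE (206) of the spot covariance from one block: the average,
    over pairs of increments with positive interval overlap, of the
    cross-product of increments divided by the overlap length."""
    terms = []
    for j in range(1, len(t1)):
        for k in range(1, len(t2)):
            d = _overlap(t1[j - 1], t1[j], t2[k - 1], t2[k])
            if d > 0.0:
                terms.append((x1[j] - x1[j - 1]) * (x2[k] - x2[k - 1]) / d)
    # len(terms) is m^{(r,s)}_{n,i}; assumed > 0, as in the text
    return sum(terms) / len(terms)
```

The overall covariation estimate (207) is then the sum over blocks of `block_cov_mle` times the block length $\tau_{n,i} - \tau_{n,i-1}$. For synchronous grids the overlap of the $(j,k)$ pair is just the common spacing, and the estimator reduces to a normalized realized covariance.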
We suppose, of course, that each block is large enough for $m^{(r,s)}_{n,i}$ to always be greater than zero.
Under $Q_n$, $E_{Q_n}(\hat\zeta_{\tau_{n,i-1}} \mid \mathcal{F}_{\tau_{n,i-1}}) = \zeta_{\tau_{n,i-1}}$, and

$$\begin{aligned}
\mathrm{Var}_{Q_n}\!\left( \hat\zeta^{(r,s)}_{\tau_{n,i-1}} \,\Big|\, \mathcal{F}_{\tau_{n,i-1}} \right) = \left( \frac{1}{m^{(r,s)}_{n,i}} \right)^2 \Bigg[\;
& \zeta^{(r,r)}_{\tau_{n,i-1}} \zeta^{(s,s)}_{\tau_{n,i-1}} \sum_{j,k} \frac{ (t^{(r)}_{n,j} - t^{(r)}_{n,j-1})(t^{(s)}_{n,k} - t^{(s)}_{n,k-1}) }{ d\{ (t^{(r)}_{n,j-1}, t^{(r)}_{n,j}) \cap (t^{(s)}_{n,k-1}, t^{(s)}_{n,k}) \}^2 } \\
& + \left( \zeta^{(r,s)}_{\tau_{n,i-1}} \right)^2 \sum_{j_1, j_2, k_1, k_2} \frac{ d\{ (t^{(r)}_{n,j_1-1}, t^{(r)}_{n,j_1}) \cap (t^{(s)}_{n,k_2-1}, t^{(s)}_{n,k_2}) \}\; d\{ (t^{(r)}_{n,j_2-1}, t^{(r)}_{n,j_2}) \cap (t^{(s)}_{n,k_1-1}, t^{(s)}_{n,k_1}) \} }{ d\{ (t^{(r)}_{n,j_1-1}, t^{(r)}_{n,j_1}) \cap (t^{(s)}_{n,k_1-1}, t^{(s)}_{n,k_1}) \}\; d\{ (t^{(r)}_{n,j_2-1}, t^{(r)}_{n,j_2}) \cap (t^{(s)}_{n,k_2-1}, t^{(s)}_{n,k_2}) \} } \Bigg]. \tag{208}
\end{aligned}$$
The first sum is over the same $(j,k)$ as in (206), and the second sum is over all $j_1, j_2, k_1, k_2$ satisfying
(202), again for which the denominator in the summand is nonzero.

It is therefore easy to see that, subject to conditions on the observation times $t^{(r)}_{n,i}$ and $t^{(s)}_{n,i}$,
$n^{1/2} \big( \widehat{\langle X^{(r)}, X^{(s)} \rangle}_T - \langle X^{(r)}, X^{(s)} \rangle_T \big)$ converges stably (under $Q_n$) to a mixed normal distribution
with variance given by the limit of

$$n \sum_i \mathrm{Var}_{Q_n}\!\left( \hat\zeta^{(r,s)}_{\tau_{n,i-1}} \,\Big|\, \mathcal{F}_{\tau_{n,i-1}} \right) \left( \tau_{n,i} - \tau_{n,i-1} \right)^2. \tag{209}$$

It is straightforward to see that there is no adjustment from $Q_n$ to $P$. A formal asymptotic analysis
would be tedious, and has therefore been omitted. In any case, to estimate the asymptotic variance,
one would use (208)-(209), with $\hat\zeta_{\tau_{n,i-1}}$ replacing $\zeta_{\tau_{n,i-1}}$ in (208).
Remark 5. An important difference from the Hayashi-Yoshida estimator is that (206) depends
on the observation times. This is in many instances undesirable, and the choice of estimator will
depend on the degree to which these times are trusted. The Hayashi-Yoshida estimator is also
aesthetically more pleasing. We note, however, that from likelihood considerations, the estimator
(206) will have an asymptotic variance which, as the block size tends to infinity, converges to a
limit which corresponds to the efficient minimum for a constant volatility matrix.

This phenomenon can best be illustrated for a scalar process (so there is no asynchronicity). In
this case, our estimator (206) of $\langle X, X \rangle_T$ becomes (for block size $M$ fixed)

$$\widehat{\langle X, X \rangle}_T = \sum_i \left( \tau_{n,i} - \tau_{n,i-1} \right) \frac{1}{M} \sum_{j:\, \tau_{n,i-1} < t_{n,j} \le \tau_{n,i}} \frac{ (\Delta X_{t_{n,j}})^2 }{ \Delta t_{n,j} }.$$
The contiguity question is then addressed as follows. Let $P^*_n$ be the measure from Remark 4
(corresponding to block length $M = 1$). Recall that

$$\log \frac{dR_n}{dP} = \log \frac{dR_n}{dQ_n} + \log \frac{dQ_n}{dP^*_n} + \log \frac{dP^*_n}{dP}. \tag{214}$$

Define

$$B_{n,j} = \Delta t_{n,j+1} \left( \frac{\Delta \tau_{n,i}}{M_i} \right)^{-1} - 1. \tag{215}$$
Theorem 11. (Asymptotic relationship between $P^*_n$, $Q_n$ and $R_n$.) Assume the conditions of Theorem
4 in Mykland and Zhang (2007), and let $Z^{(1)}_n$ and $M^{(1)}_n$ be as in that theorem (see (231) and
(234) in Section 7.3). Assume that the following limits exist:

$$\Gamma_2 = \frac{p}{2} \lim_{n \to \infty} \sum_j B^2_{n,j} \quad\text{and}\quad \Gamma_3 = \frac{p}{2} \lim_{n \to \infty} \sum_j \log(1 + B_{n,j}). \tag{216}$$

Set

$$Z^{(2)}_n = \frac{1}{2} \sum_i \sum_{t_{n,j} \in (\tau_{n,i-1}, \tau_{n,i}]} \Delta X^T_{t_{n,j}} \left( (\zeta\zeta^T)^{-1}_{\tau_{n,i-1}} \right) \Delta X_{t_{n,j}} \left( \Delta t^{-1}_{n,j+1} - \left( \frac{\Delta \tau_{n,i}}{M_i} \right)^{-1} \right), \tag{217}$$

and let $M^{(2)}_n$ be the end point of the martingale part of $Z^{(2)}_n$ (see (232) and (234) in Section 7.3 for
the explicit formula). Then, as $n \to \infty$, $(M^{(1)}_n, M^{(2)}_n)$ converges stably in law under $P^*$ to a normal
distribution with mean zero and diagonal variance matrix with diagonal elements $\Gamma_1$ and $\Gamma_2$. Also,
under $P^*$,

$$\log \frac{dR_n}{dQ_n} = M^{(2)}_n + \Gamma_3 + o_p(1). \tag{218}$$
The theorem can be viewed from the angle of contiguity:

Corollary 3. Under regularity conditions, the following statements are equivalent as $n \to \infty$:
(i) $R_n$ is contiguous to $P$.
(ii) $R_n$ is contiguous to $Q_n$.
(iii) The following relationship holds:

$$\Gamma_3 = -\frac{1}{2} \Gamma_2. \tag{219}$$

As we shall see, the requirement (219) is a substantial restriction. Corollary 3 says that, unlike
the case of $Q_n$, inference under $R_n$ may not give rise to the desired results. Part of the probability
mass under $Q_n$ (and hence $P^*$) is not preserved under $R_n$.
To understand the requirement (219), note that

$$\frac{p}{2} \sum_j \log(1 + B_{n,j}) = -\frac{p}{4} \sum_j B^2_{n,j} + \frac{p}{6} \sum_j B^3_{n,j} - \cdots, \tag{220}$$

since $\sum_j B_{n,j} = 0$. Hence, (219) will, for example, be satisfied if $\max_j |B_{n,j}| \to 0$ as $n \to \infty$. One
such example is

$$t_{n,j} = f(j/n), \text{ where } f \text{ is continuously differentiable.} \tag{221}$$

However, (221) will not hold in more general settings, as we shall see from the following examples.
Example 11. (Poisson sampling.) Suppose that the sampling time points follow a Poisson
process with parameter $\lambda$. If one conditions on the number of sampling points $n$, these points behave
like the order statistics of $n$ uniformly distributed random variables (see, for example, Chapter 2.3
of Ross (1996)). Consider the case where $M_i = M$ for all but (possibly) the last interval in
$\mathcal{H}_n$. In this case, $K_n$ is the smallest integer larger than or equal to $n/M$. Let $Y_i$ be the $M$-tuple
$(B_j,\, \tau_{i-1} \le t_j < \tau_i)$.

We now obtain, by passing between the conditional and the unconditional, that $Y_1, ..., Y_{K_n - 1}$ are iid,
and the distribution can be described by

$$Y_1 = M \left( U_{(1)}, U_{(2)} - U_{(1)}, ..., U_{(M-1)} - U_{(M-2)}, 1 - U_{(M-1)} \right) - 1, \tag{222}$$

where $U_{(1)}, ..., U_{(M-1)}$ is the order statistic of $M - 1$ independent uniform random variables on
$(0,1)$. It follows that

$$\sum_j B^2_{n,j} = \frac{n}{M} \left( M^2 E U^2_{(1)} - 1 \right) + o_p \quad\text{and}\quad \sum_j \log(1 + B_{n,j}) = \frac{n}{M} E \log(M U_{(1)}) + o_p, \tag{223}$$

since $E U^2_{(1)} = 2/M(M+1)$. Hence, both $\Gamma_2$ and $\Gamma_3$ are infinite. The contiguity between $R_n$ and
the other probabilities fails. On the other hand, all our assumptions up to Section 6 are satisfied,
and so $P$, $P^*_n$ and $Q_n$ are all contiguous. The AQVT (equation (142)) is given by $H(t) = 2t$. Also,
if the block size is constant (size $M$), the ADD is $K(t) = (M-1)t$.
Example 12. (Systematic irregularity.) Let $\epsilon$ be a small positive number, and let $\Delta t_{n,j} =
(1 + \epsilon)T/n$ for odd $j$ and $\Delta t_{n,j} = (1 - \epsilon)T/n$ for even $j$ (with $\Delta t_{n,n} = T/n$ for odd $n$). Again,
all our assumptions up to Section 6 are satisfied. The AQVT is given by $H(t) = t(1 + \epsilon^2)$. If we
suppose that all $M_i = 2$, the ADD becomes $K(t) = t$. On the other hand, $B_{n,j} = \pm\epsilon$, so that, again,
both $\Gamma_2$ and $\Gamma_3$ are infinite. The contiguity between $R_n$ and the other probabilities thus fails in the
same radical fashion as in the case of Poisson sampling.
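Example 12 is simple enough to check numerically from the definition (215); the following sketch (with an illustrative $\epsilon = 0.1$) computes the $B_{n,j}$ over blocks of $M$ consecutive spacings and confirms that $\sum_j B^2_{n,j} = n\epsilon^2$ grows linearly in $n$, which is what makes $\Gamma_2$ infinite:

```python
def B_values(times, M):
    """The B_{n,j} of (215): spacing divided by the block-average spacing, minus 1,
    over consecutive blocks of M spacings."""
    dts = [times[i + 1] - times[i] for i in range(len(times) - 1)]
    out = []
    for start in range(0, len(dts) - M + 1, M):
        block = dts[start:start + M]
        avg = sum(block) / M  # this is (tau_{n,i} - tau_{n,i-1}) / M_i
        out.extend(d / avg - 1.0 for d in block)
    return out

# the systematic irregularity of Example 12: spacings alternate (1 +/- eps) * T / n
eps, n, T = 0.1, 1000, 1.0
t = [0.0]
for j in range(n):
    t.append(t[-1] + (1 + eps if j % 2 == 0 else 1 - eps) * T / n)
B = B_values(t, 2)  # each B_{n,j} equals +eps or -eps exactly
```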
7.2 Irregular Spacing and Subsampling

We here return to a more direct study of the effect of irregular spacings. We put ourselves in the
situation from Section 4.3.1, where observation times are independent of the process. As stated
in equation (160), the limit law for the realized volatility (for $\sqrt{n} \left( [X,X]^{\mathcal{G}_n}_t - [X,X]_t \right)$) is mixed
normal with (random) variance

$$2T \int_0^t \sigma^4_s\, dH_s, \tag{224}$$

where $H$ is the asymptotic quadratic variation of time (AQVT). When observations are equidistant,
$H'(t) \equiv 1$. From the preceding section, we also know that if times are of the form (221), the
asymptotic variance is unaffected. It is worth elaborating on this with a direct computation. Set
$$F(t) = \lim_{n \to \infty} \frac{1}{n} \#\{ t_{n,i+1} \le t \}; \tag{225}$$

this quantity exists, if necessary by going through subsequences (Helly's Theorem; see, for example,
p. 336 of Billingsley (1995)). Set

$$u_{n,i} = F(t_{n,i}). \tag{226}$$

Asymptotically, the $u_{n,i}$ are equispaced:

$$\frac{1}{n} \#\{ u_{n,i+1} \le t \} = \frac{1}{n} \#\{ t_{n,i+1} \le F^{(-1)}(t) \} \to F(F^{(-1)}(t)) = t. \tag{227}$$
Inference is invariant to this transformation: observing the process $X_t$ at times $t_{n,i}$ is the same
as observing the process $Y_t = X_{F^{(-1)}(t)}$ at times $u_{n,i}$. If we set $\mathcal{U} = \{ u_{n,j}, j = 0, ..., n \}$, then
$[X,X]^{\mathcal{G}}_T = [Y,Y]^{\mathcal{U}}_T$. Also, in the limit, $[X,X]_T = [Y,Y]_T$. Finally, the asymptotic distribution is the
same in these two cases.

If the $u_{n,i}$ have AQVT $U(t)$, the mixed normal variance transforms as

$$2T \int_0^T H'(t) \left( \langle X, X \rangle'_t \right)^2 dt = 2 \int_0^1 U'(t) \left( \langle Y, Y \rangle'_t \right)^2 dt. \tag{228}$$

The transformation (226) regularizes the spacing. It means that, without loss of generality, one can
take $T = 1$, $F' \equiv 1$, and $U = H$. Also, the transformation (226) regularizes spacings defined by
(221), and in this case $U'(t) \equiv 1$.
Example 13. On the other hand, it is clear from Example 11 that it is possible for $U'(t)$ to take
values other than 1. The example shows that for Poisson distributed observation times, $H' = U' \equiv 2$,
while, indeed, $F'(t) \equiv 1/T$.
The general situation can be expressed as follows:

Proposition 7. Assume that $F$ exists and is monotonically increasing. Also assume that $H$ exists.
Then $U$ exists. For all $s \le t$, $U(t) - U(s) \ge t - s$. In particular, if $U'(t)$ exists, then $U'(t) \ge 1$.
The following statements are equivalent:
(i) $U(1) = 1$;
(ii) $U' \equiv 1$;
(iii) $\sum_{j=0}^{n} \left( u_{n,j+1} - u_{n,j} - \frac{1}{n} \right)^2 = o_p(n^{-1})$.
Proof of Proposition 7. The first statement uses a standard property of the variance: if $\Delta t_{n,j+1} =
t_{n,j+1} - t_{n,j}$, and $\Delta_n = T/n$, then

$$\frac{n}{T} \sum_{t_{n,j+1} \le t} (\Delta t_{n,j+1})^2 = \frac{n}{T} \sum_{t_{n,j+1} \le t} (\Delta t_{n,j+1} - \Delta_n)^2 + \frac{n}{T} \#\{ t_{n,i+1} \le t \} (\Delta_n)^2 \ge \frac{n}{T} \#\{ t_{n,i+1} \le t \} (\Delta_n)^2.$$

By taking limits as $n \to \infty$ under $F'(t) \equiv 1/T$, we get that $H(t) - H(s) \ge t - s$. In particular, the
same will be true for $U$.

The equivalence between (i) and (iii) follows from the proof of Lemma 2 (p. 1029) of Zhang
(2006). (The original lemma uses slightly different assumptions.)
The implication of the proposition is that, under the scenario $U(1) = 1$, observation times are
"almost" equidistant. In particular, subsampling does not change the structure of the spacings.
On the other hand, when $U(1) > 1$, there is scope for subsampling to regularize the times.

Example 14. Suppose that the times are Poisson distributed. Instead of picking every observation,
we now pick every $M$'th observation. By the same methods as in Example 11, we obtain that

$$U(t) = \frac{M+1}{M}\, t. \tag{229}$$

Hence, the sparser the subsampling, the more regular the times will be. This is an additional feature
of subsampling that remains to be exploited.
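Example 14 can be checked by simulation. The sketch below uses the fact that for Poisson sampling $F(t) = t/T$, so the transformed times are $u = t/T$, and estimates $U(1)$ by the empirical normalized sum of squared spacings (the rate $\lambda$ and subsampling factor $M$ are illustrative choices):

```python
import random

def empirical_aqvt(times, T):
    """Empirical proxy for U(1): n times the sum of squared spacings of
    u = t / T, where F(t) = t / T is the limiting F for Poisson sampling."""
    u = [t / T for t in times]
    n = len(u) - 1
    return n * sum((u[i + 1] - u[i]) ** 2 for i in range(n))

random.seed(7)
lam, T, M = 20000.0, 1.0, 5
t = [0.0]
while t[-1] < T:
    t.append(t[-1] + random.expovariate(lam))
t = t[:-1]  # drop the point that overshot T

full = empirical_aqvt(t, T)      # near 2, as in Examples 11 and 13
sub = empirical_aqvt(t[::M], T)  # every M-th point: near (M + 1) / M, eq. (229)
```

With $M = 5$ the subsampled value is close to $6/5 = 1.2$: subsampling moves the AQVT toward the equidistant value 1.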
7.3 Proof of Theorem 11

We begin by describing the relationship between $R_n$ and $P^*_n$. In analogy with Proposition 2 of
Mykland and Zhang (2007), we obtain:

Lemma 2.

$$\log \frac{dR_n}{dP^*_n} (U_{t_0}, ..., U_{t_{n,j}}, ..., U_{t_{n,n}}) = \sum_i \sum_{\tau_{i-1} \le t_j \cdots}$$
The purpose of this article, however, is not so much to focus on the applications as on the
probabilistic setting and the estimation methods. The theory was started, on the probabilistic side,
by Jacod (1994) and Jacod and Protter (1998), and on the econometric side by Foster and Nelson
(1996) and Comte and Renault (1998). The econometrics of integrated volatility was pioneered
in Andersen, Bollerslev, Diebold, and Labys (2001, 2003), Barndorff-Nielsen and Shephard (2002,
2004b) and Dacorogna, Gençay, Müller, Olsen, and Pictet (2001). The authors of this article
started to work in the area through Zhang (2001), Zhang, Mykland, and Aït-Sahalia (2005), and
Mykland and Zhang (2006). For further references, see Section 5.5.

This article is meant to be a moderately self-contained course in the basics of this material.
The introduction assumes some degree of statistics/econometrics literacy, but at a lower level than
the standard probability text. Some of the material is at the research front and has not been published elsewhere.
This is not meant as a full review of the area. Readers with a good probabilistic background can
skip most of Section 2, and occasional other sections.

The text also mostly overlooks the questions that arise in connection with multidimensional
processes. For further literature in this area, one should consult Barndorff-Nielsen and Shephard
(2004a), Hayashi and Yoshida (2005) and Zhang (2005).
1.2 High Frequency Data

Recent years have seen an explosion in the amount of financial high frequency data. These are
the records of transactions and quotes for stocks, bonds, currencies, options, and other financial
instruments.

A main source of such data is the Trades and Quotes (TAQ) database, which covers the stocks
traded on the New York Stock Exchange (NYSE). For example, here is an excerpt of the transactions
for Monday, April 4, 2005, for the pharmaceutical company Merck (MRK):
MRK 20050405 9:41:37 32.69 100
MRK 20050405 9:41:42 32.68 100
MRK 20050405 9:41:43 32.69 300
MRK 20050405 9:41:44 32.68 1000
MRK 20050405 9:41:48 32.69 2900
MRK 20050405 9:41:48 32.68 200
MRK 20050405 9:41:48 32.68 200
MRK 20050405 9:41:51 32.68 4200
MRK 20050405 9:41:52 32.69 1000
MRK 20050405 9:41:53 32.68 300
MRK 20050405 9:41:57 32.69 200
MRK 20050405 9:42:03 32.67 2500
MRK 20050405 9:42:04 32.69 100
MRK 20050405 9:42:05 32.69 300
MRK 20050405 9:42:15 32.68 3500
MRK 20050405 9:42:17 32.69 800
MRK 20050405 9:42:17 32.68 500
MRK 20050405 9:42:17 32.68 300
MRK 20050405 9:42:17 32.68 100
MRK 20050405 9:42:20 32.69 6400
MRK 20050405 9:42:21 32.69 200
MRK 20050405 9:42:23 32.69 3000
MRK 20050405 9:42:27 32.70 8300
MRK 20050405 9:42:29 32.70 5000
MRK 20050405 9:42:29 32.70 1000
MRK 20050405 9:42:30 32.70 1100
“Size” here refers to the number of stocks that changed hands in the given transaction. This is
often also called “volume”.
There are 6302 transactions recorded for Merck for this day. On the same day, Microsoft
(MSFT) had 80982 transactions. These are massive amounts of data. What can we do with such
data? This course is about how to approach this question.
1.3 A First Model for Financial Data: The GBM

Finance theory suggests the following description of prices: they must be so-called semimartingales.
We defer a discussion of the general concept until later (see also Delbaen and Schachermayer
(1995)), and go instead to the most commonly used such semimartingale: the Geometric Brownian
Motion (GBM).

Set

$$X_t = \log S_t = \text{the logarithm of the stock price } S_t \text{ at time } t. \tag{1}$$

The GBM model is now that

$$X_t = X_0 + \mu t + \sigma W_t, \tag{2}$$

where $\mu$ and $\sigma$ are constants, and $W_t$ is a Brownian Motion (BM), a concept we now define. The
"time zero" is an arbitrary reference time.

Definition 1. The process $(W_t)_{0 \le t \le T}$ is a Brownian motion provided:
(1) $W_0 = 0$;
(2) $t \mapsto W_t$ is a continuous function of $t$;
(3) $W$ has independent increments: if $t > s > u > v$, then $W_t - W_s$ is independent of $W_u - W_v$;
(4) for $t > s$, $W_t - W_s$ is normal with mean zero and variance $t - s$ ($N(0, t-s)$).
1.4 Estimation in the GBM model

It is instructive to consider estimation in this model. We take time $t = 0$ to be the beginning of
the trading day, and time $t = T$ to be the end of the day.

Let us assume that there are $n$ observations of the process (transactions). We suppose for right
now that the transactions are equally spaced in time, so that an observation is had every $\Delta t_n = T/n$
units of time. This assumption is quite unrealistic, but it permits a straightforward development which
can then be modified later.

The observations (log transaction prices) are therefore $X_{t_{n,i}}$, where $t_{n,i} = i \Delta t_n$. If we take
differences, we get observations

$$\Delta X_{t_{n,i+1}} = X_{t_{n,i+1}} - X_{t_{n,i}}, \quad i = 0, ..., n - 1. \tag{3}$$

The $\Delta X_{t_{n,i+1}}$ are independent and identically distributed (iid) with law $N(\mu \Delta t_n, \sigma^2 \Delta t_n)$. The
natural estimators are:

$$\hat\mu_n = \frac{1}{n \Delta t_n} \sum_{i=0}^{n-1} \Delta X_{t_{n,i+1}} = (X_T - X_0)/T, \quad \text{both MLE and UMVU; and}$$

$$\hat\sigma^2_{n,MLE} = \frac{1}{n \Delta t_n} \sum_{i=0}^{n-1} \left( \Delta X_{t_{n,i+1}} - \overline{\Delta X}_{t_n} \right)^2, \quad \text{MLE; or} \tag{4}$$

$$\hat\sigma^2_{n,UMVU} = \frac{1}{(n-1) \Delta t_n} \sum_{i=0}^{n-1} \left( \Delta X_{t_{n,i+1}} - \overline{\Delta X}_{t_n} \right)^2, \quad \text{UMVU.}$$

Here, MLE is the maximum likelihood estimator, and UMVU is the uniformly minimum variance
unbiased estimator (see Lehmann (1983) or Rice (2006)). Also, $\overline{\Delta X}_{t_n} = \frac{1}{n} \sum_{i=0}^{n-1} \Delta X_{t_{n,i+1}} = \hat\mu_n \Delta t_n$.
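The estimators in (4) can be sketched directly; the parameter values in the test are hypothetical, chosen only to illustrate the behavior discussed below:

```python
import random

def gbm_estimates(x, T):
    """The estimators of (4), from equispaced log-prices x on [0, T]:
    returns (mu-hat, sigma^2-hat MLE, sigma^2-hat UMVU)."""
    n = len(x) - 1
    dt = T / n
    dx = [x[i + 1] - x[i] for i in range(n)]
    mu_hat = (x[-1] - x[0]) / T       # depends only on the endpoints
    dbar = sum(dx) / n                # = mu_hat * dt
    ss = sum((d - dbar) ** 2 for d in dx)
    return mu_hat, ss / (n * dt), ss / ((n - 1) * dt)
```

Simulating a GBM path and applying `gbm_estimates` illustrates the asymmetry discussed next: $\hat\mu_n$ is determined by $X_T - X_0$ alone, while $\hat\sigma^2_n$ concentrates around $\sigma^2$ as $n$ grows.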
The estimators (4) clarify some basics. First of all, $\mu$ cannot be consistently estimated for a fixed
length $T$ of the time interval. In fact, $\hat\mu_n$ does not depend on $n$, but only on $T$ and the values of
the process at the beginning and end of the time period. This is reassuring from a common sense
perspective. If we could estimate $\mu$ for actual stock prices, we would be very rich! Of course, if
$T \to \infty$, then $\mu$ can be estimated consistently.

It is perhaps more surprising that $\sigma^2$ can be estimated consistently for fixed $T$, as $n \to \infty$. In
other words, $\hat\sigma^2_n \xrightarrow{p} \sigma^2$ as $n \to \infty$. Set $U_{n,i} = \Delta X_{t_{n,i}} / (\sigma \Delta t_n^{1/2})$. Then the $U_{n,i}$ are iid with distribution
$N((\mu/\sigma) \Delta t_n^{1/2}, 1)$. It follows from standard considerations for normal random variables that
$$\sum_{i=0}^{n-1} \left( U_{n,i} - \bar U_{n,\cdot} \right)^2$$

is $\chi^2$ distributed with $n - 1$ degrees of freedom. Hence, for the UMVU estimator,

$$\hat\sigma^2_n = \sigma^2 \Delta t_n\, \frac{1}{(n-1) \Delta t_n} \sum_{i=0}^{n-1} \left( U_{n,i} - \bar U_{n,\cdot} \right)^2 = \sigma^2\, \frac{\chi^2_{n-1}}{n-1}.$$

It follows that

$$E(\hat\sigma^2_n) = \sigma^2 \quad\text{and}\quad \mathrm{Var}(\hat\sigma^2_n) = \frac{2\sigma^4}{n-1}, \tag{5}$$

since $E\chi^2_m = m$ and $\mathrm{Var}(\chi^2_m) = 2m$. Hence $\hat\sigma^2_n$ is consistent for $\sigma^2$: $\hat\sigma^2_n \to \sigma^2$ in probability as
$n \to \infty$.
Similarly, since $\chi^2_{n-1}$ is the sum of $n - 1$ iid $\chi^2_1$ random variables, by the central limit theorem
we have the following convergence in law:

$$\frac{\chi^2_{n-1} - E\chi^2_{n-1}}{\sqrt{\mathrm{Var}(\chi^2_{n-1})}} = \frac{\chi^2_{n-1} - (n-1)}{\sqrt{2(n-1)}} \xrightarrow{\mathcal{L}} N(0,1), \tag{6}$$

and so

$$n^{1/2} \left( \hat\sigma^2_n - \sigma^2 \right) \approx (n-1)^{1/2} \left( \hat\sigma^2_n - \sigma^2 \right) = \sqrt{2}\, \sigma^2\, \frac{\chi^2_{n-1} - (n-1)}{\sqrt{2(n-1)}} \xrightarrow{\mathcal{L}} \sigma^2 N(0,2) = N(0, 2\sigma^4). \tag{7}$$

This provides an asymptotic distribution which permits the setting of intervals. For example,
$\sigma^2 = \hat\sigma^2_n \pm 1.96 \sqrt{2/n}\, \hat\sigma^2_n$ would be an asymptotic 95% confidence interval for $\sigma^2$.

Since $\hat\sigma^2_{n,MLE} = \frac{n-1}{n}\, \hat\sigma^2_{n,UMVU}$, the same asymptotics apply to the MLE.
1.5 Behavior of Non-Centered Estimators
The above discussion of $\hat\sigma^2_{n,UMVU}$ and $\hat\sigma^2_{n,MLE}$ is exactly the same as in the classical case of estimating variance on the basis of iid observations. More unusually, for high frequency data, the mean is often not removed in estimation. The reason is as follows. Set
mean is often not removed in estimation. The reason is as follows. Set
ˆ ?
2
n,nocenter
=
1
n?t
n
n?1
i=0
(?X
t
n,i+1
)
2
. (8)
Now note that for the MLE version of $\hat\sigma^2_n$,
$\hat\sigma^2_{n,MLE} = \frac{1}{n\Delta t_n}\sum_{i=0}^{n-1}\big(\Delta X_{t_{n,i+1}} - \overline{\Delta X}_{t_n}\big)^2 = \frac{1}{n\Delta t_n}\Big[\sum_{i=0}^{n-1}(\Delta X_{t_{n,i+1}})^2 - n\big(\overline{\Delta X}_{t_n}\big)^2\Big] = \hat\sigma^2_{n,\mathrm{nocenter}} - \Delta t_n \hat\mu^2_n = \hat\sigma^2_{n,\mathrm{nocenter}} - \frac{T}{n}\hat\mu^2_n.$
Since $\hat\mu^2_n$ does not depend on $n$, it follows that
$n^{1/2}\big(\hat\sigma^2_{n,MLE} - \hat\sigma^2_{n,\mathrm{nocenter}}\big) \xrightarrow{p} 0.$
Hence, $\hat\sigma^2_{n,\mathrm{nocenter}}$ is consistent and has the same asymptotic distribution as $\hat\sigma^2_{n,UMVU}$ and $\hat\sigma^2_{n,MLE}$. It can therefore also be used to estimate variance. This is quite common for high frequency data.
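The exact identity relating the two estimators can be confirmed numerically; in the sketch below (parameter values are made up), the difference matches $(T/n)\hat\mu_n^2$ to machine precision:

```python
import numpy as np

# Numerical check of Section 1.5: the non-centered estimator (8) and
# the MLE differ by exactly (T/n) * mu_hat^2, which is of smaller
# order than the n^{-1/2} estimation error.  Values are illustrative.
rng = np.random.default_rng(1)
mu, sigma, T, n = 0.08, 0.25, 1.0, 10_000
dt = T / n
dX = mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)

mu_hat = dX.sum() / T
s2_nocenter = np.sum(dX ** 2) / (n * dt)
s2_mle = np.sum((dX - dX.mean()) ** 2) / (n * dt)

print(s2_nocenter - s2_mle, (T / n) * mu_hat ** 2)   # the two agree
```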
1.6 GBM and the Black-Scholes-Merton formula
The GBM model is closely tied to other parts of finance. In particular, following the work of Black and Scholes (1973), Merton (1973), Harrison and Kreps (1979), and Harrison and Pliska (1981), precise option prices can be calculated in this model. See also Duffie (1996), Neftci (2000), Øksendal (2003), or Shreve (2004) for book-length introductions to the theory.
In the case of the call option, the price is as follows. A European call option on stock $S_t$ with maturity (expiration) time $T$ and strike price $K$ is the option to buy one unit of stock at price $K$ at time $T$. It is easy to see that the value of this option at time $T$ is $(S_T - K)^+$, where $x^+ = x$ if $x \ge 0$, and $x^+ = 0$ otherwise.
If we make the assumption that $S_t$ is a GBM, which is to say that it follows (1)-(2), and also the assumption that the short term interest rate $r$ is constant (in time), then the price at time $t$, $0 \le t \le T$, of this option must be
$\text{price} = C(S_t, \sigma^2(T-t), r(T-t)),$   (9)
where
$C(S, \Xi, R) = S\Phi(d_1) - K\exp(-R)\Phi(d_2), \quad\text{where}\quad d_{1,2} = \big(\log(S/K) + R \pm \Xi/2\big)/\sqrt{\Xi},$   (10)
and $\Phi(x) = P(N(0,1) \le x)$ is the standard normal cdf. This is the Black-Scholes-Merton formula.
We shall see later on how high frequency estimates can be used in this formula. For the moment, note that the price only depends on quantities that are either observed (the interest rate $r$) or nearly so (the volatility $\sigma^2$). It does not depend on $\mu$. Unfortunately, the assumption of constant $r$ and $\sigma^2$ is unrealistic, as we shall discuss in the following.
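For concreteness, (9)-(10) can be transcribed directly into code; the function and variable names below are ours, and the numerical inputs are hypothetical:

```python
from math import erf, exp, log, sqrt

def norm_cdf(x: float) -> float:
    """Phi(x) = P(N(0,1) <= x), via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def call_price(S: float, Xi: float, R: float, K: float) -> float:
    """C(S, Xi, R) of eq. (10), with Xi = sigma^2 (T-t) and R = r (T-t)."""
    d1 = (log(S / K) + R + Xi / 2.0) / sqrt(Xi)
    d2 = (log(S / K) + R - Xi / 2.0) / sqrt(Xi)
    return S * norm_cdf(d1) - K * exp(-R) * norm_cdf(d2)

# hypothetical at-the-money example: sigma^2 (T-t) = 0.04, r (T-t) = 0.01
print(call_price(S=100.0, Xi=0.04, R=0.01, K=100.0))
```

Note that volatility enters only through the cumulated quantity $\Xi = \sigma^2(T-t)$, which is exactly what high frequency estimates of integrated volatility deliver.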
The GBM model is also heavily used in portfolio optimization.
1.7 Our Problem to be Solved: Inadequacies in the GBM Model
We here give a laundry list of questions that arise and have to be dealt with.
1.7.1 The Volatility Depends on t
It is empirically the case that $\sigma^2$ depends on $t$. We shall talk about the instantaneous volatility $\sigma^2_t$. This concept will be defined carefully in Section 2.
1.7.2 The Volatility is Random; Leverage Effect
Returns are usually observed to be non-normal. Such behavior can for the most part be modeled as $\sigma^2_t$ having a random evolution. It is also usually assumed that $\sigma^2_t$ can be correlated with the (log) stock price. This is often referred to as the Leverage Effect. More about this in Section 2.
1.7.3 Jumps
The GBM model assumes that the log stock price $X_t$ is continuous as a function of $t$. The evolution of the stock price, however, is often thought to have a jump component. The treatment of jumps is largely not covered in this article, though there is some discussion in Section 6.4.1, which also gives some references. Note that jumps and random volatility are often confounded, since any martingale can be embedded in a Brownian motion (Dambis (1965), Dubins and Schwarz (1965); see also Mykland (1995) for a review and further discussion).
1.7.4 Non-Normal Returns
Most non-normal behavior can be explained through random volatility and/or jumps. It would be unusual to need more extensive modeling.
1.7.5 Microstructure Noise
An important feature of actual transaction prices is the existence of microstructure noise. Transaction prices, as actually observed, are typically best modeled on the form $Y_t = \log S_t$ = the logarithm of the stock price $S_t$ at time $t$, where, for a transaction at time $t_i$,
$Y_{t_i} = X_{t_i} + \text{noise},$   (11)
and $X_t$ is a semimartingale. This is often called the hidden semimartingale model. This issue is an important part of our narrative, and is further discussed in Section 5; see also Section 6.4.2.
1.7.6 Unequally Spaced Observations
In the above, we assumed that the transaction times $t_i$ are equally spaced. A quick glance at the data snippet in Section 1.2 reveals that this is typically not the case. This leads to questions that will be addressed as we go along.
1.8 A Note on Probability Theory
We will extensively use probability theory in these notes. To avoid making a long introduction on stochastic processes, we will define concepts as we need them, but not always in the greatest depth. We will also omit other concepts and many basic proofs. As a compromise between the rigorous and the intuitive, we follow the convention that the notes will (except when the opposite is clearly stated) use mathematical terms as they are defined in Jacod and Shiryaev (2003). Thus, in case of doubt, this work can be consulted.
Other recommended reference books on stochastic process theory are Karatzas and Shreve (1991), Øksendal (2003), Protter (2004), and Shreve (2004). For an introduction to measure theoretic probability, one can consult Billingsley (1995).
2 A More General Model: Time Varying Drift and Volatility
2.1 Stochastic Integrals, Itô Processes
We here make some basic definitions. We consider a process $X_t$, where the time variable $t \in [0, T]$. We mainly develop the univariate case here.
2.1.1 Information Sets, $\sigma$-fields, Filtrations
Information is usually described with so-called $\sigma$-fields. The setup is as follows. Our basic space is $(\Omega, \mathcal F)$, where $\Omega$ is the set of all possible outcomes $\omega$, and $\mathcal F$ is the collection of subsets $A \subseteq \Omega$ that will eventually be decidable (it will be observed whether they occurred or not). All random variables are thought of as functions of the basic outcome $\omega \in \Omega$.
We assume that $\mathcal F$ is a so-called $\sigma$-field. In general,
Definition 2. A collection $\mathcal A$ of subsets of $\Omega$ is a $\sigma$-field if
(i) $\emptyset, \Omega \in \mathcal A$;
(ii) if $A \in \mathcal A$, then $A^c = \Omega - A \in \mathcal A$; and
(iii) if $A_n$, $n = 1, 2, \ldots$ are all in $\mathcal A$, then $\cup_{n=1}^\infty A_n \in \mathcal A$.
If one thinks of $\mathcal A$ as a collection of decidable sets, then the interpretation of this definition is as follows:
(i) $\emptyset, \Omega$ are decidable ($\emptyset$ didn't occur, $\Omega$ did);
(ii) if $A$ is decidable, so is the complement $A^c$ (if $A$ occurs, then $A^c$ does not occur, and vice versa);
(iii) if all the $A_n$ are decidable, then so is the event $\cup_{n=1}^\infty A_n$ (the union occurs if and only if at least one of the $A_n$ occurs).
A random variable $X$ is called $\mathcal A$-measurable if the value of $X$ can be decided on the basis of the information in $\mathcal A$. Formally, the requirement is that, for all $x$, the set $\{X \le x\} = \{\omega \in \Omega : X(\omega) \le x\}$ be decidable ($\in \mathcal A$).
The evolution of knowledge in our system is described by the filtration (or sequence of $\sigma$-fields) $(\mathcal F_t)$, $0 \le t \le T$. Here $\mathcal F_t$ is the knowledge available at time $t$. Since increasing time makes more sets decidable, the family $(\mathcal F_t)$ is taken to satisfy that if $s \le t$, then $\mathcal F_s \subseteq \mathcal F_t$.
Most processes will be taken to be adapted to $(\mathcal F_t)$: $(X_t)$ is adapted to $(\mathcal F_t)$ if, for all $t \in [0, T]$, $X_t$ is $\mathcal F_t$-measurable. A vector process is adapted if each component is adapted.
We define the filtration $(\mathcal F^X_t)$ generated by the process $(X_t)$ as the smallest filtration to which $(X_t)$ is adapted. By this we mean that for any filtration $(\mathcal F_t)$ to which $(X_t)$ is adapted, $\mathcal F^X_t \subseteq \mathcal F_t$ for all $t$. (Proving the existence of such a filtration is left as an exercise for the reader.)
2.1.2 Wiener Processes
A Wiener process is Brownian motion relative to a filtration. Specifically,
Definition 3. The process $(W_t)_{0\le t\le T}$ is an $(\mathcal F_t)$-Wiener process if it is adapted to $(\mathcal F_t)$ and
(1) $W_0 = 0$;
(2) $t \mapsto W_t$ is a continuous function of $t$;
(3) $W$ has independent increments relative to the filtration $(\mathcal F_t)$: if $t > s$, then $W_t - W_s$ is independent of $\mathcal F_s$;
(4) for $t > s$, $W_t - W_s$ is normal with mean zero and variance $t - s$ ($N(0, t-s)$).
Note that a Brownian motion $(W_t)$ is an $(\mathcal F^W_t)$-Wiener process.
2.1.3 Predictable Processes
For defining stochastic integrals, we need the concept of a predictable process. "Predictable" here means that one can forecast the value over infinitesimal time intervals. The most basic example would be a "simple process". This is given by considering break points $0 = s_0 \le s_1 < t_1 \le s_2 < t_2 < \ldots \le s_n < t_n \le T$, and random variables $H^{(i)}$, observable (measurable) with respect to $\mathcal F_{s_i}$:
$H_t = \begin{cases} H^{(0)} & \text{if } t = 0 \\ H^{(i)} & \text{if } s_i < t \le t_i \end{cases}$   (12)
In this case, at any time $t$ (the beginning time $t = 0$ is treated separately), the value of $H_t$ is known before time $t$.
Definition 4. More generally, a process $H_t$ is predictable if it can be written as a limit of simple processes $H^{(n)}_t$. This means that $H^{(n)}_t(\omega) \to H_t(\omega)$ as $n \to \infty$, for all $(t, \omega) \in [0, T] \times \Omega$.
All adapted continuous processes are predictable. More generally, this is also true for adapted processes that are left continuous (càg, for continue à gauche). (Proposition I.2.6 (p. 17) in Jacod and Shiryaev (2003).)
2.1.4 Stochastic Integrals
We here consider the meaning of the expression
$\int_0^T H_t \, dX_t.$   (13)
The ingredients are the integrand $H_t$, which is assumed to be predictable, and the integrator $X_t$, which will generally be a semimartingale (to be defined below in Section 2.3.5).
The expression (13) is defined for simple process integrands as
$\sum_i H^{(i)}(X_{t_i} - X_{s_i}).$   (14)
For predictable integrands $H_t$ that are bounded and limits of simple processes $H^{(n)}_t$, the integral (13) is the limit in probability of $\int_0^T H^{(n)}_t \, dX_t$. This limit is well defined, i.e., independent of the sequence $H^{(n)}_t$.
If $X_t$ is a Wiener process, the integral can be defined for any predictable process $H_t$ satisfying
$\int_0^T H_t^2 \, dt < \infty.$   (15)
It will always be the case that the integrator $X_t$ is right continuous with left limits (càdlàg, for continue à droite, limites à gauche). The integral process
$\int_0^t H_s \, dX_s = \int_0^T H_s I\{s \le t\} \, dX_s$   (16)
can also be taken to be càdlàg. If $(X_t)$ is continuous, the integral is then automatically continuous.
2.1.5 Itô Processes
We now come to our main model, the Itô process. $X_t$ is an Itô process relative to the filtration $(\mathcal F_t)$ provided $(X_t)$ is $(\mathcal F_t)$-adapted, and if there is an $(\mathcal F_t)$-Wiener process $(W_t)$, and $(\mathcal F_t)$-adapted processes $(\mu_t)$ and $(\sigma_t)$, with
$\int_0^T |\mu_t| \, dt < \infty, \quad\text{and}$   (17)
$\int_0^T \sigma^2_t \, dt < \infty,$   (18)
so that
$X_t = X_0 + \int_0^t \mu_s \, ds + \int_0^t \sigma_s \, dW_s.$   (19)
The process is often written in differential form:
$dX_t = \mu_t \, dt + \sigma_t \, dW_t.$   (20)
We note that the Itô process property is preserved under stochastic integration. If $H_t$ is bounded and predictable, then
$\int_0^t H_s \, dX_s = \int_0^t H_s \mu_s \, ds + \int_0^t H_s \sigma_s \, dW_s.$   (21)
It is clear from this formula that predictable processes $H_t$ can be used for integration w.r.t. $X_t$ provided
$\int_0^T |H_t \mu_t| \, dt < \infty \quad\text{and}$   (22)
$\int_0^T (H_t \sigma_t)^2 \, dt < \infty.$   (23)
2.2 Two Interpretations of the Stochastic Integral
One can use the stochastic integral in two different ways: as a model, or as a description of trading profit and loss (P/L).
2.2.1 Stochastic Integral as Trading Profit and Loss
Suppose that $X_t$ is the value of a security. Let $H_t$ be the number of units of this stock that is held at time $t$. In the case of a simple process (12), this means that we hold $H^{(i)}$ units of $X$ from time $s_i$ to time $t_i$. The trading P/L is then given by the stochastic integral (14). In this description, it is quite clear that $H^{(i)}$ must be known at time $s_i$; otherwise we would base the portfolio on future information. More generally, for predictable $H_t$, we similarly avoid using future information.
2.2.2 Stochastic Integral as Model
This is a different genesis of the stochastic integral model. One simply uses (19) as a model, in the hope that this is a sufficiently general framework to capture most relevant processes. The advantage of using predictable integrands comes from the simplicity of connecting the model with trading gains.
For simple $\mu_t$ and $\sigma^2_t$, the integral
$\sum_i \mu^{(i)}(t_i - s_i) + \sum_i \sigma^{(i)}(W_{t_i} - W_{s_i})$   (24)
is simply a sum of conditionally normal random variables, with mean $\mu^{(i)}(t_i - s_i)$ and variance $(\sigma^{(i)})^2(t_i - s_i)$. The sum need not be normal, since $\mu$ and $\sigma^2$ can be random.
It is worth noting that in this model, $\int_0^T \mu_t \, dt$ is the sum of instantaneous means (drift), and $\int_0^T \sigma^2_t \, dt$ is the sum of instantaneous variances. In fact, in the model (19), one can show the following. Let $\mathrm{Var}(\cdot \mid \mathcal F_t)$ be the conditional variance given the information at time $t$. If $X_t$ is an Itô process, and if $0 = t_{n,0} < t_{n,1} < \ldots < t_{n,n} = T$, then
$\sum_i \mathrm{Var}\big(X_{t_{n,i+1}} - X_{t_{n,i}} \mid \mathcal F_{t_{n,i}}\big) \xrightarrow{p} \int_0^T \sigma^2_t \, dt$   (25)
when
$\max_i |t_{n,i+1} - t_{n,i}| \to 0.$   (26)
If the $\mu_t$ and $\sigma^2_t$ processes are nonrandom, then $X_t$ is a Gaussian process, and $X_T$ is normal with mean $X_0 + \int_0^T \mu_t \, dt$ and variance $\int_0^T \sigma^2_t \, dt$.
2.2.3 The Heston model
A popular model for volatility is due to Heston (1993). In this model, the process $X_t$ is given by
$dX_t = \mu \, dt + \sigma_t \, dW_t$
$d\sigma^2_t = \kappa(\alpha - \sigma^2_t) \, dt + \gamma \sigma_t \, dZ_t, \quad\text{with}$   (27)
$Z_t = \rho W_t + (1 - \rho^2)^{1/2} B_t,$   (28)
where $(W_t)$ and $(B_t)$ are two independent Wiener processes, $\kappa > 0$, and $|\rho| \le 1$.
2.3 Semimartingales
2.3.1 Conditional Expectations
Denote by $E(\cdot \mid \mathcal F_t)$ the conditional expectation given the information available at time $t$. Formally, this concept is defined as follows:
Theorem 1. Let $\mathcal A$ be a $\sigma$-field, and let $X$ be a random variable so that $E|X| < \infty$. There is an $\mathcal A$-measurable random variable $Z$ so that, for all $A \in \mathcal A$,
$E Z I_A = E X I_A,$   (29)
where $I_A$ is the indicator function of $A$. $Z$ is unique "almost surely", which is to say that if $Z_1$ and $Z_2$ satisfy the two criteria above, then $P(Z_1 = Z_2) = 1$.
We thus define
$E(X \mid \mathcal A) = Z,$   (30)
where $Z$ is given in the theorem. The conditional expectation is well defined "almost surely". For further details and a proof of the theorem, see Section 34 (p. 445-455) of Billingsley (1995).
This way of defining conditional expectation is a little counterintuitive if unfamiliar. In particular, the conditional expectation is a random variable. The heuristic is as follows. Suppose that $Y$ is a random variable, and that $\mathcal A$ carries the information in $Y$. Introductory textbooks often introduce conditional expectation as a non-random quantity $E(X \mid Y = y)$. To make the connection, set
$f(y) = E(X \mid Y = y).$   (31)
The conditional expectation we have just defined then satisfies
$E(X \mid \mathcal A) = f(Y).$   (32)
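The connection (31)-(32) can be made concrete in a discrete setting. The following sketch, with a made-up joint distribution, computes $E(X \mid \mathcal A)$ as $f(Y)$ and checks the defining property (29) on the generating sets:

```python
import numpy as np

# Discrete illustration of Theorem 1 and (29)-(32): with A the
# sigma-field generated by a three-valued Y, Z = E(X|A) equals f(Y),
# where f(y) averages X over {Y = y}; Z then matches X in mean over
# each generating set {Y = y}.  The joint distribution is made up.
rng = np.random.default_rng(5)
Y = rng.integers(0, 3, size=60_000)           # Y takes the values 0, 1, 2
X = Y + rng.standard_normal(Y.size)           # X depends on Y, plus noise

f = {y: X[Y == y].mean() for y in (0, 1, 2)}  # f(y) = E(X | Y = y)
Z = np.array([f[y] for y in Y])               # Z = f(Y) = E(X | A)

# E[Z I_A] = E[X I_A] for each generating set A = {Y = y}, cf. (29)
for y in (0, 1, 2):
    assert abs(Z[Y == y].mean() - X[Y == y].mean()) < 1e-9

print({y: round(v, 3) for y, v in f.items()})
```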
2.3.2 Properties of Conditional Expectations
• Linearity: for constants $c_1, c_2$: $E(c_1X_1 + c_2X_2 \mid \mathcal A) = c_1E(X_1 \mid \mathcal A) + c_2E(X_2 \mid \mathcal A)$
• Conditional constants: if $Z$ is $\mathcal A$-measurable, then $E(ZX \mid \mathcal A) = Z\,E(X \mid \mathcal A)$
• Law of iterated expectations (iterated conditioning, tower property): if $\mathcal A' \subseteq \mathcal A$, then $E[E(X \mid \mathcal A) \mid \mathcal A'] = E(X \mid \mathcal A')$
• Independence: if $X$ is independent of $\mathcal A$: $E(X \mid \mathcal A) = E(X)$
• Jensen's inequality: if $g : x \mapsto g(x)$ is convex: $E(g(X) \mid \mathcal A) \ge g(E(X \mid \mathcal A))$
Note: $g$ is convex if $g(ax + (1-a)y) \le a g(x) + (1-a)g(y)$ for $0 \le a \le 1$; an example is $g(x) = (x - K)^+$. Alternatively, $g''$ exists and is continuous, and $g''(x) \ge 0$.
2.3.3 Martingales
An $(\mathcal F_t)$-adapted process $M_t$ is called a martingale if $E|M_t| < \infty$, and if, for all $s < t$,
$E(M_t \mid \mathcal F_s) = M_s.$   (33)
This is a central concept in our narrative. A martingale is also known as a fair game, for the following reason. In a gambling situation, if $M_s$ is the amount of money the gambler has at time $s$, then the gambler's expected wealth at time $t > s$ is also $M_s$. (The concept of martingale applies equally to discrete and continuous time axes.)
Example 1. A Wiener process is a martingale. To wit, for $t > s$, since $W_t - W_s$ is $N(0, t-s)$ given $\mathcal F_s$, we get that
$E(W_t \mid \mathcal F_s) = E(W_t - W_s \mid \mathcal F_s) + W_s = E(W_t - W_s) + W_s \text{ (by independence)} = W_s.$   (34)
A useful fact about martingales is the representation by final value: $M_t$ is a martingale for $0 \le t \le T$ if and only if one can write
$M_t = E(X \mid \mathcal F_t) \text{ for all } t \in [0, T]$   (35)
("only if" by definition ($X = M_T$), "if" by the tower property). Note that for $T = \infty$ (which we do not consider here), this property may not hold. (For a full discussion, see Chapter 1.3.B (p. 17-19) of Karatzas and Shreve (1991).)
Example 2. If $H_t$ is a bounded predictable process and $X_t$ is a martingale, then
$M_t = \int_0^t H_s \, dX_s$   (36)
is a martingale. To see this, consider first a simple process (12), for which $H_t = H^{(i)}$ when $s_i < t \le t_i$. For given $t$, if $s_i \ge t$, by the properties of conditional expectations,
$E\big(H^{(i)}(X_{t_i} - X_{s_i}) \mid \mathcal F_t\big) = E\big(E(H^{(i)}(X_{t_i} - X_{s_i}) \mid \mathcal F_{s_i}) \mid \mathcal F_t\big) = E\big(H^{(i)} E(X_{t_i} - X_{s_i} \mid \mathcal F_{s_i}) \mid \mathcal F_t\big) = 0,$   (37)
and similarly, if $t_i \ge t > s_i$, then
$E\big(H^{(i)}(X_{t_i} - X_{s_i}) \mid \mathcal F_t\big) = H^{(i)}(X_t - X_{s_i}),$   (38)
so that
$E(M_T \mid \mathcal F_t) = E\Big(\sum_i H^{(i)}(X_{t_i} - X_{s_i}) \,\Big|\, \mathcal F_t\Big) = \sum_{i: t_i \le t} H^{(i)}(X_{t_i} - X_{s_i}) + H^{(i)}(X_t - X_{s_i}) = M_t$
(the final term referring to the interval, if any, with $s_i < t \le t_i$), which is the martingale property. The general case follows by passing to the limit.
The integral with respect to a martingale need not, however, itself be a martingale. Consider the process $X_t = \int_0^t (T-s)^{-1/2} \, dW_s$, whose quadratic variation is
$[X, X]_t = \int_0^t \frac{ds}{T-s} = \log\frac{T}{T-t}.$   (41)
For $A > 0$, set
$\tau = \inf\{t \ge 0 : X_t = A\}.$   (42)
One can show that $P(\tau < T) = 1$. Define the modified integral by
$Y_t = \int_0^t \frac{1}{\sqrt{T-s}}\, I\{s \le \tau\} \, dW_s = X_{\tau \wedge t},$   (43)
where
$s \wedge t = \min(s, t).$   (44)
The process (43) has the following trading interpretation. Suppose that $W_t$ is the value of a security at time $t$ (the value can be negative, but that is possible for many securities, such as futures contracts). We also take the short term interest rate to be zero. The process $X_t$ comes about as the value of a portfolio which holds $1/\sqrt{T-t}$ units of this security at time $t$. The process $Y_t$ is obtained by holding this portfolio until such time that $X_t = A$, and then liquidating the portfolio.
In other words, we have displayed a trading strategy which starts with wealth $Y_0 = 0$ at time $t = 0$, and ends with wealth $Y_T = A > 0$ at time $t = T$. In trading terms, this is an arbitrage. In mathematical terms, this is a stochastic integral w.r.t. a martingale which is no longer a martingale. We note that from (41), the condition (15) for the existence of the integral is satisfied.
For trading, the lesson we can learn from this is that some condition has to be imposed to make sure that a trading strategy in a martingale cannot result in an arbitrage profit. The most popular approach to this is to require that the trader's wealth at any time cannot go below some fixed amount $-K$. This is the so-called credit constraint. (So strategies are required to satisfy that the integral never goes below $-K$.) This does not quite guarantee that the integral w.r.t. a martingale is a martingale, but it does prevent arbitrage profit. The technical result is that the integral is a supermartingale (see the next section).
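A discretized simulation of this strategy illustrates the failure of the martingale property. The sizes, the level $A$, and the seed below are our own choices; note that in continuous time the level is hit with probability one, whereas on a finite grid it is hit on most, rather than all, paths:

```python
import numpy as np

# Discretized version of the strategy behind (43): hold 1/sqrt(T - t)
# units of W and stop when the portfolio value X first reaches A.  In
# continuous time P(tau < T) = 1; on a finite grid the level is only
# hit on most sample paths.  Grid size and seed are illustrative.
rng = np.random.default_rng(3)
T, n, A, n_paths = 1.0, 20_000, 1.0, 200
t = np.linspace(0.0, T, n + 1)[:-1]        # left endpoints; T - t stays > 0
dt = T / n

hits = 0
for _ in range(n_paths):
    dW = np.sqrt(dt) * rng.standard_normal(n)
    X = np.cumsum(dW / np.sqrt(T - t))     # X_t = int_0^t (T - s)^{-1/2} dW_s
    hits += bool((X >= A).any())

print(hits / n_paths)
```

Refining the grid pushes the hit frequency toward one, but only slowly, at the logarithmic rate at which $[X,X]_t = \log(T/(T-t))$ grows.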
For the purpose of characterizing the stochastic integral, we need the concept of a local martingale. For this, we first need to define:
Definition 5. A stopping time is a random variable $\tau$ satisfying $\{\tau \le t\} \in \mathcal F_t$ for all $t$.
The requirement in this definition is that we must be able to know at time $t$ whether $\tau$ has occurred or not. The time (42) given above is a stopping time. On the other hand, the variable $\tau = \inf\{t : W_t = \max_{0\le s\le T} W_s\}$ is not a stopping time. Otherwise, we would have a nice investment strategy.
Definition 6. A process $M_t$ is a local martingale for $0 \le t \le T$ provided there is a sequence of stopping times $\tau_n$ so that
(i) $M_{\tau_n \wedge t}$ is a martingale for each $n$;
(ii) $P(\tau_n = T) \to 1$ as $n \to \infty$.
The basic result for stochastic integrals is now that the integral with respect to a local martingale is a local martingale; cf. result I.4.34(b) (p. 47) in Jacod and Shiryaev (2003).
2.3.5 Semimartingales
$X_t$ is a semimartingale if it can be written
$X_t = X_0 + M_t + A_t, \quad 0 \le t \le T,$   (45)
where $X_0$ is $\mathcal F_0$-measurable, $M_t$ is a local martingale, and $A_t$ is a process of finite variation, i.e.,
$\sup \sum_i |A_{t_{i+1}} - A_{t_i}| < \infty,$   (46)
where the supremum is over all grids $0 = t_0 < t_1 < \ldots < t_n = T$, and all $n$.
In particular, an Itô process is a semimartingale, with
$M_t = \int_0^t \sigma_s \, dW_s \quad\text{and}\quad A_t = \int_0^t \mu_s \, ds.$   (47)
A supermartingale is a semimartingale for which $A_t$ is nonincreasing. A submartingale is a semimartingale for which $A_t$ is nondecreasing.
2.4 Quadratic Variation of a Semimartingale
2.4.1 Definitions
We start with some notation. A grid of observation times is given by
$\mathcal G = \{t_0, t_1, \ldots, t_n\},$   (48)
where we suppose that
$0 = t_0 < t_1 < \ldots < t_n = T.$   (49)
Set
$\Delta(\mathcal G) = \max_{1\le i\le n}(t_i - t_{i-1}).$   (50)
For any process $X$, we define its quadratic variation relative to the grid $\mathcal G$ by
$[X, X]^{\mathcal G}_t = \sum_{t_{i+1}\le t}(X_{t_{i+1}} - X_{t_i})^2.$   (51)
One can more generally define the quadratic covariation
$[X, Y]^{\mathcal G}_t = \sum_{t_{i+1}\le t}(X_{t_{i+1}} - X_{t_i})(Y_{t_{i+1}} - Y_{t_i}).$   (52)
An important theorem of stochastic calculus now says that
Theorem 2. For any semimartingales $X$ and $Y$, there is a process $[X, Y]_t$ so that
$[X, Y]^{\mathcal G}_t \xrightarrow{p} [X, Y]_t \text{ for all } t \in [0, T], \text{ as } \Delta(\mathcal G) \to 0.$   (53)
The limit is independent of the sequence of grids $\mathcal G$.
The result follows from Theorem I.4.47 (p. 52) in Jacod and Shiryaev (2003). In fact, the $t_i$ can even be stopping times. (In our further development, the $t_i$ will typically be irregular but nonrandom.)
For an Itô process,
$[X, X]_t = \int_0^t \sigma^2_s \, ds.$   (54)
(Cf. Thm I.4.52 (p. 55) and I.4.40(d) (p. 48) of Jacod and Shiryaev (2003).)
The process $[X, X]_t$ is usually referred to as the quadratic variation of the semimartingale $(X_t)$. This is an important concept, as seen in Section 2.2.2. The theorem asserts that this quantity can be estimated consistently from data.
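Theorem 2 and (54) can be illustrated by simulating an Itô process on refining grids; the deterministic volatility path below is an illustration of our choosing:

```python
import numpy as np

# Realized quadratic variation (51) on a refining grid, converging to
# [X,X]_T = int_0^T sigma_s^2 ds as in (53)-(54).  The deterministic
# volatility path sigma_t = 0.2 + 0.1 sin(2 pi t) is an illustration.
rng = np.random.default_rng(4)
T = 1.0

def realized_qv(n: int) -> float:
    t = np.linspace(0.0, T, n + 1)[:-1]
    dt = T / n
    sigma = 0.2 + 0.1 * np.sin(2.0 * np.pi * t)
    dX = sigma * np.sqrt(dt) * rng.standard_normal(n)   # dX_t = sigma_t dW_t
    return float(np.sum(dX ** 2))                       # [X,X]^G_T

target = 0.2 ** 2 + 0.1 ** 2 / 2.0     # int_0^1 sigma_t^2 dt = 0.045
print(realized_qv(100), realized_qv(100_000), target)
```

The coarse grid already gives a usable estimate; the fine grid is within a small fraction of a percent of the integrated volatility.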
2.4.2 Properties
Important properties are as follows:
(1) Bilinearity: $[X, Y]_t$ is linear in each of $X$ and $Y$.
(2) If $(W_t)$ and $(B_t)$ are two independent Wiener processes, then
$[W, B]_t = 0.$   (55)
Example 3. For the Heston model in Section 2.2.3, one gets from first principles that
$[W, Z]_t = \rho[W, W]_t + (1 - \rho^2)^{1/2}[W, B]_t = \rho t,$   (56)
since $[W, W]_t = t$ and $[W, B]_t = 0$.
(3) For stochastic integrals over Itô processes $X_t$ and $Y_t$,
$U_t = \int_0^t H_s \, dX_s \quad\text{and}\quad V_t = \int_0^t K_s \, dY_s,$   (57)
one has
$[U, V]_t = \int_0^t H_s K_s \, d[X, Y]_s.$   (58)
This is often written in "differential form" as
$d[U, V]_t = H_t K_t \, d[X, Y]_t,$   (59)
by invoking the same results that led to (54).
(4) For any Itô process $X$, $[X, t] = 0$.
Example 4. (Leverage Effect in the Heston model).
$d[X, \sigma^2]_t = \gamma\sigma^2_t \, d[W, Z]_t = \gamma\sigma^2_t \rho \, dt.$   (60)
(5) Invariance under discounting by the short term interest rate. Discounting is important in finance theory. The typical discount rate is the risk free short term interest rate $r_t$. Recall that $S_t = \exp\{X_t\}$. The discounted stock price is then given by
$S^*_t = \exp\Big\{-\int_0^t r_s \, ds\Big\} S_t.$   (61)
The corresponding process on the log scale is $X^*_t = X_t - \int_0^t r_s \, ds$, so that if $X_t$ is given by (20), then
$dX^*_t = (\mu_t - r_t) \, dt + \sigma_t \, dW_t.$   (62)
The quadratic variation of $X^*_t$ is therefore the same as for $X_t$.
It should be emphasized that while this result remains true for certain other types of discounting (such as those incorporating cost-of-carry), it is not true for many other relevant types of discounting. For example, if one discounts by the zero coupon bond $\Lambda_t$ maturing at time $T$, the discounted log price becomes $X^*_t = X_t - \log \Lambda_t$. Since the zero coupon bond will itself have volatility, we get
$[X^*, X^*]_t = [X, X]_t + [\log\Lambda, \log\Lambda]_t - 2[X, \log\Lambda]_t.$   (63)
2.4.3 Variance and Quadratic Variation
Quadratic variation has a representation in terms of variance. The main result concerns martingales. For $E(X^2) < \infty$, define the conditional variance by
$\mathrm{Var}(X \mid \mathcal A) = E\big((X - E(X \mid \mathcal A))^2 \mid \mathcal A\big) = E(X^2 \mid \mathcal A) - E(X \mid \mathcal A)^2,$   (64)
and similarly $\mathrm{Cov}(X, Y \mid \mathcal A) = E\big((X - E(X \mid \mathcal A))(Y - E(Y \mid \mathcal A)) \mid \mathcal A\big)$.
Theorem 3. Let $M_t$ be a martingale, and assume that $E[M, M]_T < \infty$. Then, for all $s < t$,
$\mathrm{Var}(M_t \mid \mathcal F_s) = E\big((M_t - M_s)^2 \mid \mathcal F_s\big) = E\big([M, M]_t - [M, M]_s \mid \mathcal F_s\big).$   (65)
A quick argument for this is as follows. Let $\mathcal G = \{t_0, t_1, \ldots, t_n\}$, and suppose for simplicity that $s, t \in \mathcal G$. Then, for $s \le t_i < t_j$,
$E\big((M_{t_{i+1}} - M_{t_i})(M_{t_{j+1}} - M_{t_j}) \mid \mathcal F_{t_j}\big) = (M_{t_{i+1}} - M_{t_i}) \, E\big(M_{t_{j+1}} - M_{t_j} \mid \mathcal F_{t_j}\big) = 0,$   (66)
so that by the tower rule (since $\mathcal F_s \subseteq \mathcal F_{t_j}$)
$\mathrm{Cov}\big(M_{t_{i+1}} - M_{t_i}, \, M_{t_{j+1}} - M_{t_j} \mid \mathcal F_s\big) = E\big((M_{t_{i+1}} - M_{t_i})(M_{t_{j+1}} - M_{t_j}) \mid \mathcal F_s\big) = 0.$   (67)
It follows that
$\mathrm{Var}(M_t - M_s \mid \mathcal F_s) = \sum_{s\le t_i < t} \mathrm{Var}\big(M_{t_{i+1}} - M_{t_i} \mid \mathcal F_s\big) = \sum_{s\le t_i < t} E\big((M_{t_{i+1}} - M_{t_i})^2 \mid \mathcal F_s\big),$
and the theorem follows by refining the grid and letting $\Delta(\mathcal G) \to 0$.
Fix $\epsilon > 0$, and set
$\tau_n = \inf\Big\{t \in [0, T] : n^2 \sum_i (t_{i+1}\wedge t - t_i\wedge t)^4 > \epsilon\Big\}.$
Then
$E[M^{(2)}, M^{(2)}]_{\tau_n} \le n^{-2}\Big(\frac{1}{28}C_8^8\sigma_+^8 + \epsilon\Big).$   (122)
By assumption, $n^2 \sum_i (t_{i+1}\wedge t - t_i\wedge t)^4 \le \Delta(\mathcal G)\, n^2 \sum_i (t_{i+1} - t_i)^3 \xrightarrow{p} 0$, and hence
$P(\tau_n \neq T) \to 0 \text{ as } n \to \infty.$   (123)
Hence, for any $\delta > 0$,
$P\Big(n \sup_{0\le t\le T} |M^{(2)}_t| > \delta\Big) \le P\Big(n \sup_{0\le t\le \tau_n} |M^{(2)}_t| > \delta\Big) + P(\tau_n \neq T)$
$\le \frac{1}{\delta^2}\, E\Big(n \sup_{0\le t\le \tau_n} |M^{(2)}_t|\Big)^2 + P(\tau_n \neq T)$ (Chebyshev)
$\le \frac{1}{\delta^2}\, C_2^2\, n^2\, E[M^{(2)}, M^{(2)}]_{\tau_n} + P(\tau_n \neq T)$ (Burkholder-Davis-Gundy)
$\le \frac{1}{\delta^2}\, C_2^2 \Big(\frac{1}{28}C_8^8\sigma_+^8 + \epsilon\Big) + P(\tau_n \neq T)$ (from (122))
$\to \frac{1}{\delta^2}\, C_2^2 \Big(\frac{1}{28}C_8^8\sigma_+^8 + \epsilon\Big) \text{ as } n \to \infty$ (from (123)).   (124)
Hence Proposition 1 has been shown.
3.7 Quadratic Variation of the Error Process: When Observation Times are Independent of the Process
3.7.1 Main Approximation
We here assume that the observation times are independent of the process $X$. The basic insight for the following computation is that over small intervals, $(X_t - X_{t_*})^2 \approx [X, X]_t - [X, X]_{t_*}$. To the extent that this approximation is valid, it follows from (104) that
$[M, M]_t = 4\sum_{t_{i+1}\le t} \int_{t_i}^{t_{i+1}} \big([X, X]_s - [X, X]_{t_i}\big) \, d[X, X]_s + 4\int_{t_*}^t \big([X, X]_s - [X, X]_{t_*}\big) \, d[X, X]_s$
$= 2\sum_{t_{i+1}\le t} \big([X, X]_{t_{i+1}} - [X, X]_{t_i}\big)^2 + 2\big([X, X]_t - [X, X]_{t_*}\big)^2.$   (125)
We shall use this device several times in the following, and will this first time do it rigorously.
Proposition 2. Assume (98), and that $\sigma^2_t$ is continuous in mean square:
$\sup_{0\le s-t\le \delta} E(\sigma^2_t - \sigma^2_s)^2 \to 0 \text{ as } \delta \to 0.$   (126)
Also suppose that the grids $\mathcal G_n$ are nonrandom, or independent of the process $X_t$. Also suppose that, as $n \to \infty$, $\Delta(\mathcal G) = o_p(n^{-1/2})$, and assume (109). Then
$[M, M]_t = 2\sum_{t_{i+1}\le t} \big([X, X]_{t_{i+1}} - [X, X]_{t_i}\big)^2 + 2\big([X, X]_t - [X, X]_{t_*}\big)^2 + o_p(n^{-1}).$   (127)
If $\sigma_t$ is continuous, it is continuous in mean square (because of (98)). More generally, $\sigma_t$ can, for example, also have Poisson jumps.
In the rest of this Section, we shall write all expectations implicitly as conditional on the times. To show Proposition 2, we need some notation and a lemma, as follows:
Lemma 1. Let $N_t$ be an Itô process martingale for which (for $a, b > 0$), for all $t$,
$\frac{d}{dt} E[N, N]_t \le a(t - t_*)^b.$   (128)
Let $H_t$ be a predictable process satisfying $|H_t| \le H_+$ for some constant $H_+$. Set
$R(\mathcal G)_v = \sum_{i=0}^{n-1} (t_{i+1} - t_i)^v.$   (129)
Then
$\Big\|\sum_{t_{i+1}\le t} \int_{t_i}^{t_{i+1}} (N_s - N_{t_i})H_s \, ds + \int_{t_*}^t (N_s - N_{t_*})H_s \, ds\Big\|_1$
$\le \Big(H_+^2\, \frac{a}{b+3}\, R(\mathcal G)_{b+3}\Big)^{1/2} + R(\mathcal G)_{(b+3)/2}\, \frac{2}{b+3} \Big(\frac{a}{b+1}\Big)^{1/2} \sup_{0\le s-t\le \Delta(\mathcal G)} \|H_s - H_t\|_2.$   (130)
Proof of Proposition 2. Set $N_t = M_t$ and $H_t = \sigma^2_t$. Then
$d[M, M]_t = 4(X_t - X_{t_i})^2 \, d[X, X]_t$
$= 4\big([X, X]_t - [X, X]_{t_i}\big) \, d[X, X]_t + 4\big((X_t - X_{t_i})^2 - ([X, X]_t - [X, X]_{t_i})\big) \, d[X, X]_t$
$= 4\big([X, X]_t - [X, X]_{t_i}\big) \, d[X, X]_t + 2(N_t - N_{t_i})\sigma^2_t \, dt.$   (131)
Thus, the approximation error in (127) is exactly of the form of the left hand side in (130). We note that
$E\, d[N, N]_t = E(X_t - X_{t_i})^2 \, d[X, X]_t \le E(X_t - X_{t_i})^2 \, \sigma_+^2 \, dt \le (t - t_i)\sigma_+^4 \, dt,$   (132)
hence the conditions of Lemma 1 are satisfied with $a = \sigma_+^4$ and $b = 1$. The result follows from (117).
3.7.2 Proof of Lemma 1 (Technical Material, can be omitted)
Decompose the original problem as follows:
$\int_{t_i}^{t_{i+1}} (N_s - N_{t_i})H_s \, ds = \int_{t_i}^{t_{i+1}} (N_s - N_{t_i})H_{t_i} \, ds + \int_{t_i}^{t_{i+1}} (N_s - N_{t_i})(H_s - H_{t_i}) \, ds.$   (133)
For the first term, from Itô's formula, $d\big[(t_{i+1} - s)(N_s - N_{t_i})\big] = -(N_s - N_{t_i}) \, ds + (t_{i+1} - s) \, dN_s$, so that
$\int_{t_i}^{t_{i+1}} (N_s - N_{t_i})H_{t_i} \, ds = H_{t_i} \int_{t_i}^{t_{i+1}} (t_{i+1} - s) \, dN_s,$   (134)
hence
$\sum_{t_{i+1}\le t} \int_{t_i}^{t_{i+1}} (N_s - N_{t_i})H_s \, ds = \sum_{t_{i+1}\le t} H_{t_i} \int_{t_i}^{t_{i+1}} (t_{i+1} - s) \, dN_s + \sum_{t_{i+1}\le t} \int_{t_i}^{t_{i+1}} (N_s - N_{t_i})(H_s - H_{t_i}) \, ds.$   (135)
The first term is the end point of a martingale. For each increment,
$E\Big(\int_{t_i}^{t_{i+1}} (N_s - N_{t_i})H_{t_i} \, ds\Big)^2 = E\Big(H_{t_i} \int_{t_i}^{t_{i+1}} (t_{i+1} - s) \, dN_s\Big)^2$
$\le H_+^2\, E\Big(\int_{t_i}^{t_{i+1}} (t_{i+1} - s) \, dN_s\Big)^2$
$= H_+^2\, E\Big(\int_{t_i}^{t_{i+1}} (t_{i+1} - s)^2 \, d[N, N]_s\Big)$
$= H_+^2 \int_{t_i}^{t_{i+1}} (t_{i+1} - s)^2 \, dE[N, N]_s$
$= H_+^2 \int_{t_i}^{t_{i+1}} (t_{i+1} - s)^2 \, \frac{d}{ds}E[N, N]_s \, ds$
$\le H_+^2 \int_{t_i}^{t_{i+1}} (t_{i+1} - s)^2 \, a(s - t_i)^b \, ds$
$\le H_+^2\, \frac{a}{b+3}\, (t_{i+1} - t_i)^{b+3},$   (136)
and so, by the uncorrelatedness of martingale increments,
$E\Big(\sum_{t_{i+1}\le t} H_{t_i} \int_{t_i}^{t_{i+1}} (t_{i+1} - s) \, dN_s\Big)^2 \le H_+^2\, \frac{a}{b+3} \sum_{t_{i+1}\le t} (t_{i+1} - t_i)^{b+3} \le H_+^2\, \frac{a}{b+3}\, R(\mathcal G)_{b+3}.$   (137)
On the other hand, for the second term in (135),
$\|(N_s - N_{t_i})(H_s - H_{t_i})\|_1 \le \|N_s - N_{t_i}\|_2 \, \|H_s - H_{t_i}\|_2$
$\le \big(E(N_s - N_{t_i})^2\big)^{1/2} \|H_s - H_{t_i}\|_2$
$= \big(E([N, N]_s - [N, N]_{t_i})\big)^{1/2} \|H_s - H_{t_i}\|_2$
$= \Big(\int_{t_i}^s \frac{d}{du}E[N, N]_u \, du\Big)^{1/2} \|H_s - H_{t_i}\|_2$
$\le \Big(\int_{t_i}^s a(u - t_i)^b \, du\Big)^{1/2} \|H_s - H_{t_i}\|_2$
$= \Big(\frac{a}{b+1}(s - t_i)^{b+1}\Big)^{1/2} \|H_s - H_{t_i}\|_2$
$= (s - t_i)^{(b+1)/2} \Big(\frac{a}{b+1}\Big)^{1/2} \|H_s - H_{t_i}\|_2,$   (138)
and from this
$\Big\|\int_{t_i}^{t_{i+1}} (N_s - N_{t_i})(H_s - H_{t_i}) \, ds\Big\|_1 \le \int_{t_i}^{t_{i+1}} \|(N_s - N_{t_i})(H_s - H_{t_i})\|_1 \, ds$
$\le \int_{t_i}^{t_{i+1}} (s - t_i)^{(b+1)/2} \, ds \, \Big(\frac{a}{b+1}\Big)^{1/2} \sup_{t_i\le s\le t_{i+1}} \|H_s - H_{t_i}\|_2$
$= (t_{i+1} - t_i)^{(b+3)/2} \, \frac{2}{b+3} \Big(\frac{a}{b+1}\Big)^{1/2} \sup_{t_i\le s\le t_{i+1}} \|H_s - H_{t_i}\|_2.$   (139)
Hence, finally, for the second term in (135),
$\Big\|\sum_{t_{i+1}\le t} \int_{t_i}^{t_{i+1}} (N_s - N_{t_i})(H_s - H_{t_i}) \, ds\Big\|_1 \le \Big(\sum_{t_{i+1}\le t} (t_{i+1} - t_i)^{(b+3)/2}\Big) \frac{2}{b+3} \Big(\frac{a}{b+1}\Big)^{1/2} \sup_{0\le s-t\le \Delta(\mathcal G)} \|H_s - H_t\|_2$
$= R(\mathcal G)_{(b+3)/2} \, \frac{2}{b+3} \Big(\frac{a}{b+1}\Big)^{1/2} \sup_{0\le s-t\le \Delta(\mathcal G)} \|H_s - H_t\|_2.$   (140)
Hence, for the overall sum (135), from (137) and (140),
$\Big\|\sum_{t_{i+1}\le t} \int_{t_i}^{t_{i+1}} (N_s - N_{t_i})H_s \, ds\Big\|_1 \le \Big\|\sum_{t_{i+1}\le t} H_{t_i} \int_{t_i}^{t_{i+1}} (t_{i+1} - s) \, dN_s\Big\|_1 + \Big\|\sum_{t_{i+1}\le t} \int_{t_i}^{t_{i+1}} (N_s - N_{t_i})(H_s - H_{t_i}) \, ds\Big\|_1$
$\le \Big\|\sum_{t_{i+1}\le t} H_{t_i} \int_{t_i}^{t_{i+1}} (t_{i+1} - s) \, dN_s\Big\|_2 + \Big\|\sum_{t_{i+1}\le t} \int_{t_i}^{t_{i+1}} (N_s - N_{t_i})(H_s - H_{t_i}) \, ds\Big\|_1$
$\le \Big(H_+^2\, \frac{a}{b+3}\, R(\mathcal G)_{b+3}\Big)^{1/2} + R(\mathcal G)_{(b+3)/2} \, \frac{2}{b+3} \Big(\frac{a}{b+1}\Big)^{1/2} \sup_{0\le s-t\le \Delta(\mathcal G)} \|H_s - H_t\|_2.$   (141)
The part from $t_*$ to $t$ can be included similarly, showing the result.
3.7.3 Quadratic Variation of the Error Process, and Quadratic Variation of Time
To give the final form to this quadratic variation, define the "Asymptotic Quadratic Variation of Time" (AQVT), given by
$H(t) = \lim_{n\to\infty} \frac{n}{T} \sum_{t_{n,j+1}\le t} (t_{n,j+1} - t_{n,j})^2,$   (142)
provided that the limit exists. From Example 6, we know that dividing by $n$ is the right order. We now get
Proposition 3. Assume the conditions of Proposition 2, and that the AQVT exists. Then
$n[M, M]_t \xrightarrow{p} 2T \int_0^t \sigma^4_s \, dH(s).$   (143)
The proof is a straightforward exercise in analysis. The heuristic for the result is as follows. From (127),
$[M, M]_t = 2\sum_{t_{i+1}\le t} \big([X, X]_{t_{i+1}} - [X, X]_{t_i}\big)^2 + 2\big([X, X]_t - [X, X]_{t_*}\big)^2 + o_p(n^{-1})$
$= 2\sum_{t_{i+1}\le t} \Big(\int_{t_i}^{t_{i+1}} \sigma^2_s \, ds\Big)^2 + 2\Big(\int_{t_*}^t \sigma^2_s \, ds\Big)^2 + o_p(n^{-1})$
$= 2\sum_{t_{i+1}\le t} \big((t_{i+1} - t_i)\sigma^2_{t_i}\big)^2 + 2\big((t - t_*)\sigma^2_{t_*}\big)^2 + o_p(n^{-1})$
$= 2\frac{T}{n} \int_0^t \sigma^4_s \, dH(s) + o_p(n^{-1}).$   (144)
Example 8. We here give a couple of examples of the AQVT:
(i) When the times are equidistant, $t_{i+1} - t_i = T/n$, then
$H(t) \approx \frac{n}{T} \sum_{t_{n,j+1}\le t} \Big(\frac{T}{n}\Big)^2 = \frac{T}{n}\, \#\{t_{i+1} \le t\} = T \times (\text{fraction of } t_{i+1} \text{ in } [0, t]) \approx T\,\frac{t}{T} = t.$   (145)
(ii) When the times follow a Poisson process with parameter $\lambda$, we proceed as in case (ii) in Example 6. We condition on the number of sampling points $n$, and get $t_i = T U_{(i)}$ (for $0 < i < n$), where $U_{(i)}$ is the $i$'th order statistic of $U_1, \ldots, U_n$, which are iid U[0,1]. Hence (again taking $U_{(0)} = 0$ and $U_{(n+1)} = 1$)
$$\begin{aligned}
H(t) &\approx \frac{n}{T} \sum_{t_{n,j+1} \le t} (t_{i+1} - t_i)^2 = T^2 \, \frac{n}{T} \sum_{t_{n,j+1} \le t} (U_{(i)} - U_{(i-1)})^2 \\
&= T^2 \, \frac{n}{T} \sum_{t_{n,j+1} \le t} E U_{(1)}^2 \, (1 + o_p(1)) = T^2 \, \frac{n}{T} \, \# \{ t_{i+1} \le t \} \, E U_{(1)}^2 \, (1 + o_p(1)) \\
&= T n^2 \, \frac{t}{T} \, E U_{(1)}^2 \, (1 + o_p(1)) = 2t \, (1 + o_p(1)) \qquad (146)
\end{aligned}$$
by the law of large numbers, since the spacings have identical distribution [again, verify this], and since $E U_{(1)}^2 = 2/((n+1)(n+2))$. Hence $H(t) = 2t$.
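Both limits in Example 8 are easy to check numerically. The following sketch (not from the text; the helper `aqvt` and all parameter values are illustrative) evaluates the finite-$n$ sum in (142) for equidistant times, where (145) gives $H(t) = t$, and for iid uniform order-statistic times, where (146) gives $H(t) = 2t$.

```python
import numpy as np

# Illustrative sketch: evaluate the finite-n AQVT sum from (142),
# H(t) ~ (n/T) * sum over t_{j+1} <= t of (t_{j+1} - t_j)^2.
def aqvt(times, t, T, n):
    d = np.diff(times)
    return (n / T) * np.sum(d[times[1:] <= t] ** 2)

T, n, t = 1.0, 20_000, 0.4
rng = np.random.default_rng(0)

# (i) equidistant times: H(t) should be close to t
equi = np.linspace(0.0, T, n + 1)
H_equi = aqvt(equi, t, T, n)

# (ii) times t_i = T*U_(i) from iid U[0,1]: H(t) should be close to 2t
U = np.sort(rng.uniform(0.0, 1.0, n))
order_stat_times = np.concatenate(([0.0], T * U, [T]))
H_unif = aqvt(order_stat_times, t, T, n)
```

The irregularity of the uniform spacings exactly doubles the AQVT, which is what drives the sampling-scheme dependence of the asymptotic variance below.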
3.7.4 The Quadratic Variation of Time in the General Case
We now go back to considering the times as possibly dependent with the process X. Note that by using the Burkholder-Davis-Gundy inequality conditionally, we obtain that
$$c_4^4 \, E\big( (X_{t_{i+1}} - X_{t_i})^4 \mid \mathcal{F}_{t_i} \big) \le E\big( ([X,X]_{t_{i+1}} - [X,X]_{t_i})^2 \mid \mathcal{F}_{t_i} \big) \le C_4^4 \, E\big( (X_{t_{i+1}} - X_{t_i})^4 \mid \mathcal{F}_{t_i} \big) , \qquad (147)$$
where $c_4$ and $C_4$ are as in Section 3.6.1. In the typical law of large numbers setting, $[X,X,X,X]_t - \sum_i E\big( (X_{t_{i+1}} - X_{t_i})^4 \mid \mathcal{F}_{t_i} \big)$ is a martingale which is of lower order than $[X,X,X,X]_t$ itself, and the same goes for $\sum_i \big\{ ([X,X]_{t_{i+1}} - [X,X]_{t_i})^2 - E\big( ([X,X]_{t_{i+1}} - [X,X]_{t_i})^2 \mid \mathcal{F}_{t_i} \big) \big\}$. In view of Proposition 3, therefore, it follows that under suitable regularity conditions, if $n [X,X,X,X]_t \stackrel{p}{\to} U_t$ as $n \to \infty$, and if the AQVT $H_t$ is absolutely continuous in $t$, then $U_t$ is also absolutely continuous, and
$$c_4^4 \, 2T \sigma_t^4 H_t' \le U_t' \le C_4^4 \, 2T \sigma_t^4 H_t' . \qquad (148)$$
This is of some theoretic interest in that it establishes the magnitude of the limit of $n [X,X,X,X]_t$. However, it should be noted that $C_4^4 = 2^{18}/3^6 \approx 359.6$, so the bounds are of little practical interest.
3.8 Quadratic Variation, Variance, and Asymptotic Normality
We shall later see that $n^{1/2} \big( [X,X]^{\mathcal{G}}_t - [X,X]_t \big)$ is approximately normal. In the simplest case, where the times are independent of the process, the normal distribution has mean zero and variance $n [M,M]_t \approx 2T \int_0^t \sigma_s^4 \, dH_s$. From standard central limit considerations, this is unsurprising when the $\sigma_t$ process is nonrandom, or more generally independent of the $W_t$ process. (In the latter case, one simply conditions on the $\sigma_t$ process.)
What is surprising, and requires more concepts, is that the normality result also holds when the $\sigma_t$ process has dependence with the $W_t$ process. For this we shall need new concepts, to be introduced in Section 4.
4 Asymptotic Normality
4.1 Stable Convergence
In order to define convergence in law, we need to deal with the following issue. Suppose $\hat\theta_n$ is an estimator of $\theta$, say, $\hat\theta_n = [X,X]^{\mathcal{G}_n}_T$ and $\theta = [X,X]_T = \int_0^T \sigma_t^2 \, dt$. As suggested in Section 3.7.3, the variance of $Z_n = n^{1/2}(\hat\theta_n - \theta)$ converges to $2T \int_0^t \sigma_s^4 \, dH_s$. What we shall now go on to show is the following convergence in law:
$$n^{1/2} (\hat\theta_n - \theta) \stackrel{\mathcal{L}}{\to} U \Big( 2T \int_0^t \sigma_s^4 \, dH_s \Big)^{1/2} , \qquad (149)$$
where $U$ is a standard normal random variable, independent of the $\sigma_t^2$ process. In order to show this, we need to be able to bring along prelimiting information into the limit: $U$ only exists in the limit, while as argued in Section 3.5.1, the asymptotic variance $2T \int_0^t \sigma_s^4 \, dH_s$ can be estimated consistently, and so is a limit in probability of a prelimiting quantity.
To operationalize the concept in our setting, we need the filtration $(\mathcal{F}_t)$ to which all relevant processes ($X_t$, $\sigma_t$, etc.) are adapted. We shall assume $Z_n$ (the quantity that is converging in law) to be measurable with respect to a $\sigma$-field $\mathcal{X}$, $\mathcal{F}_T \subseteq \mathcal{X}$. The reason for this is that it is often convenient to exclude microstructure noise from the filtration $\mathcal{F}_t$. Hence, for example, the TSRV (in Section 5 below) is not $\mathcal{F}_T$-measurable.
Definition 8. Let $Z_n$ be a sequence of $\mathcal{X}$-measurable random variables, $\mathcal{F}_T \subseteq \mathcal{X}$. We say that $Z_n$ converges $\mathcal{F}_T$-stably in law to $Z$ as $n \to \infty$ if $Z$ is measurable with respect to an extension of $\mathcal{X}$ so that for all $A \in \mathcal{F}_T$ and for all bounded continuous $g$, $E I_A g(Z_n) \to E I_A g(Z)$ as $n \to \infty$.
The definition means, up to regularity conditions, that $Z_n$ converges jointly in law with all $\mathcal{F}_T$-measurable random variables. This intuition will be important in the following. For further discussion of stable convergence, see Rényi (1963), Aldous and Eagleson (1978), Chapter 3 (p. 56) of Hall and Heyde (1980), Rootzén (1980), and Section 2 (pp. 169-170) of Jacod and Protter (1998).
We now move to the main result.
4.2 Asymptotic Normality
We shall be concerned with a sequence of martingales $M^n_t$, $0 \le t \le T$, $n = 1, 2, \ldots$, and how it converges to a limit $M_t$. We consider here only continuous martingales, which are thought of as random variables taking values in the set $C$ of continuous functions $[0,T] \to \mathbb{R}$.
To define weak, and stable, convergence, we need a concept of continuity. We say that $g$ is a continuous function $C \to \mathbb{R}$ if:
$$\sup_{0 \le t \le T} | x_n(t) - x(t) | \to 0 \text{ implies } g(x_n) \to g(x) . \qquad (150)$$
We note that if $(M^n_t) \stackrel{\mathcal{L}}{\to} (M_t)$ in this process sense, then, for example, $M^n_T \stackrel{\mathcal{L}}{\to} M_T$ as a random variable. This is because the function $x \mapsto g(x) = x(T)$ is continuous. The reason for going via process convergence is that (1) sometimes this is really the result one needs, and (2) since our theory is about continuous processes converging to a continuous process, one does not need asymptotic negligibility conditions à la Lindeberg (these kinds of conditions are in place in the usual CLT precisely to avoid jumps in the asymptotic process).
In order to show results about continuous martingales, we shall use the following assumption.
Assumption 1. There are Brownian motions $W^{(1)}_t, \ldots, W^{(p)}_t$ (for some $p$) that generate $(\mathcal{F}_t)$.
It is also possible to proceed with assumptions under which there are jumps in some processes, but for simplicity, we omit any discussion of this here.
Under Assumption 1, it follows from Lemma 2.1 (p. 270) in Jacod and Protter (1998) that stable convergence in law of a local martingale $M^n$ to a process $M$ is equivalent to (straight) convergence in law of the process $(W^{(1)}, \ldots, W^{(p)}, M^n)$ to the process $(W^{(1)}, \ldots, W^{(p)}, M)$. This result does not extend to all processes and spaces, cf. the discussion in the cited paper.
Another main fact about stable convergence is that limits and quadratic variation can be interchanged:
Proposition 4. (Interchangeability of limits and quadratic variation) Assume that $M^n$ is a sequence of continuous local martingales which converges stably to a process $M$. Then $(M^n, [M^n, M^n])$ converges stably to $(M, [M,M])$.
For proof, we refer to Corollary VI.6.30 (p. 385) in Jacod and Shiryaev (2003), which also covers the case of bounded jumps. More generally, consult ibid., Chapter VI.6.
We now state the main central limit theorem (CLT).
Theorem 6. Assume Assumption 1. Let $(M^n_t)$ be a sequence of continuous local martingales on $[0,T]$, each adapted to $(\mathcal{F}_t)$, with $M^n_0 = 0$. Suppose that there is an $(\mathcal{F}_t)$-adapted process $f_t$ so that
$$[M^n, M^n]_t \stackrel{p}{\to} \int_0^t f_s^2 \, ds \quad \text{for each } t \in [0,T] . \qquad (151)$$
Also suppose that, for each $i = 1, \ldots, p$,
$$[M^n, W^{(i)}]_t \stackrel{p}{\to} 0 \quad \text{for each } t \in [0,T] . \qquad (152)$$
There is then an extension $(\tilde{\mathcal{F}}_t)$ of $(\mathcal{F}_t)$, and an $(\tilde{\mathcal{F}}_t)$-martingale $M_t$, so that $(M^n_t)$ converges stably to $(M_t)$. Furthermore, there is a Brownian motion $(W'_t)$ so that $(W^{(1)}_t, \ldots, W^{(p)}_t, W'_t)$ is an $(\tilde{\mathcal{F}}_t)$-Wiener process, and so that
$$M_t = \int_0^t f_s \, dW'_s . \qquad (153)$$
It is worthwhile to understand the proof of this result, and hence we give it here. The proof follows more or less verbatim that of Theorem B.4 in Zhang (2001) (pp. 65-67), which is slightly more general. (It has also been updated to reflect the new edition of the work by Jacod and Shiryaev.) A similar result, involving predictable quadratic variations, is given in Theorem IX.7.28 (pp. 590-591) of Jacod and Shiryaev (2003).
Proof of Theorem 6. Since $[M^n, M^n]_t$ is a non-decreasing process and has non-decreasing continuous limit, the convergence (151) is also in law in $D(\mathbb{R})$ by Theorem VI.3.37 (p. 354) in Jacod and Shiryaev (2003). Thus, in their terminology (ibid., Definition VI.3.25, p. 351), $[M^n, M^n]_t$ is C-tight. From this fact, ibid., Theorem VI.4.13 (p. 358) yields that the sequence $M^n$ is tight.
From this tightness, it follows that for any subsequence $M^{n_k}$, we can find a further subsequence $M^{n_{k_l}}$ which converges in law (as a process) to a limit $M$, jointly with $W^{(1)}, \ldots, W^{(p)}$ (in other words, $(W^{(1)}, \ldots, W^{(p)}, M^{n_{k_l}})$ converges in law to $(W^{(1)}, \ldots, W^{(p)}, M)$). This $M$ is a local martingale by ibid., Proposition IX.1.17 (p. 526), using the continuity of $M^n_t$. Using Proposition 4 above, $(M^{n_{k_l}}, [M^{n_{k_l}}, M^{n_{k_l}}])$ converges jointly in law (and jointly with the $W^{(i)}$'s) to $(M, [M,M])$. From (151) this means that $[M,M]_t = \int_0^t f_s^2 \, ds$. The continuity of $[M,M]_t$ assures that $M_t$ is continuous. Also, from (152), $[M, W^{(i)}] \equiv 0$ for each $i = 1, \ldots, p$. Now let $W'_t = \int_0^t f_s^{-1} \, dM_s$ (if $f_t$ is zero on a set of Lebesgue measure greater than zero, follow the alternative construction in Volume III of Gikhman and Skorohod (1969)). By Property (3) in Section 2.4.2, $[W', W']_t = t$, while $[W', W^{(i)}] \equiv 0$. By the multivariate version of Lévy's Theorem (Section 2.4.4), it therefore follows that $(W^{(1)}_t, \ldots, W^{(p)}_t, W'_t)$ is a Wiener process. The equality (153) follows by construction. Hence the Theorem is shown for subsequence $M^{n_{k_l}}$. Since the subsequence $M^{n_k}$ was arbitrary, Theorem 6 follows.
4.3 Application to Realized Volatility
4.3.1 Independent Times
We now turn our attention to the simplest application: the estimator from Section 3. Consider the normalized (by $\sqrt{n}$) error process
$$M^n_t = 2 n^{1/2} \sum_{t_{i+1} \le t} \int_{t_i}^{t_{i+1}} (X_s - X_{t_i}) \, dX_s + 2 n^{1/2} \int_{t_*}^{t} (X_s - X_{t_*}) \, dX_s . \qquad (154)$$
From Section 3.7.3, we have that Condition (151) of Theorem 6 is satisfied, with
$$f_t^2 = 2T \sigma_t^4 H_t' . \qquad (155)$$
It now remains to check Condition (152). Note that
$$d[M^n, W^{(i)}]_t = 2 n^{1/2} (X_t - X_{t_*}) \, d[X, W^{(i)}]_t . \qquad (156)$$
We can now apply Lemma 1 with $N_t = X_t$ and $H_t = (d/dt) [X, W^{(i)}]_t$. From the Cauchy-Schwarz inequality (in this case known as the Kunita-Watanabe inequality)
$$\big| [X, W^{(i)}]_{t+h} - [X, W^{(i)}]_t \big| \le \sqrt{ [X,X]_{t+h} - [X,X]_t } \; \sqrt{ [W^{(i)}, W^{(i)}]_{t+h} - [W^{(i)}, W^{(i)}]_t } \le \sqrt{ \sigma_+^2 h } \, \sqrt{h} = \sigma_+ h \qquad (157)$$
(recall that the quadratic variation is a limit of sums of squares), so we can take $H_+ = \sigma_+$. On the other hand, $(d/dt) E[N,N]_t \le \sigma_+^2 = a (t - t_*)^b$ with $a = \sigma_+^2$ and $b = 0$.
Thus, from Lemma 1,
$$\begin{aligned}
\big\| [M^n, W^{(i)}]_t \big\|_1 &= 2 n^{1/2} \Big\| \sum_{t_{i+1} \le t} \int_{t_i}^{t_{i+1}} (N_s - N_{t_i}) H_s \, ds + \int_{t_*}^{t} (N_s - N_{t_*}) H_s \, ds \Big\|_1 \\
&\le 2 n^{1/2} \bigg[ \Big( H_+^2 \, \frac{a}{b+3} \, R_{b+3}(\mathcal{G}) \Big)^{1/2} + R_{(b+3)/2}(\mathcal{G}) \, \frac{2}{b+3} \Big( \frac{a}{b+1} \Big)^{1/2} \sup_{0 \le t-s \le \delta(\mathcal{G})} \| H_s - H_t \|_2 \bigg] \\
&= O_p\big( n^{1/2} R_3(\mathcal{G})^{1/2} \big) + O_p\Big( n^{1/2} R_{3/2}(\mathcal{G}) \sup_{0 \le t-s \le \delta(\mathcal{G})} \| H_s - H_t \| \Big) \\
&= o_p(1) \qquad (158)
\end{aligned}$$
under the conditions of Proposition 2, since $R_v(\mathcal{G}) = O_p(n^{1-v})$ from (117), and since $\sup_{0 \le t-s \le \delta(\mathcal{G})} \| H_s - H_t \| = o_p(1)$. (The latter fact is somewhat complex. One shows that one can take $W^{(1)} = W$ by a use of Lévy's theorem, and the result follows.)
We have therefore shown:
Theorem 7. Assume Assumption 1, as well as the conditions of Proposition 2, and also that the AQVT $H(t)$ exists and is absolutely continuous. Let $M^n_t$ be given by (154). Then $(M^n_t)$ converges stably in law to $M_t$, given by
$$M_t = \sqrt{2T} \int_0^t \sigma_s^2 \sqrt{H_s'} \, dW'_s . \qquad (159)$$
As a special case:
Corollary 1. Under the conditions of the above theorem, for fixed $t$,
$$\sqrt{n} \big( [X,X]^{\mathcal{G}_n}_t - [X,X]_t \big) \stackrel{\mathcal{L}}{\to} U \Big( 2T \int_0^t \sigma_s^4 \, dH_s \Big)^{1/2} , \qquad (160)$$
where $U$ is a standard normal random variable independent of $\mathcal{F}_T$.
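As a quick numerical illustration of Corollary 1 (a simulation sketch, not from the text: constant volatility, zero drift, equidistant times, so $H(t) = t$ and the asymptotic variance at $t = T$ is $2\sigma^4 T^2$; all parameter values are illustrative):

```python
import numpy as np

# Simulation sketch for Corollary 1 (assumptions: constant sigma, zero drift,
# equidistant times, t = T): sqrt(n)*([X,X]^G_T - sigma^2*T) should be close
# to N(0, 2*T*sigma^4*H(T)) = N(0, 2*sigma^4*T^2).
rng = np.random.default_rng(1)
sigma, T, n, reps = 0.3, 1.0, 500, 4000

dX = sigma * rng.normal(0.0, np.sqrt(T / n), size=(reps, n))
rv = np.sum(dX ** 2, axis=1)                 # realized volatility [X,X]^G_T
errs = np.sqrt(n) * (rv - sigma**2 * T)

emp_var = errs.var()
theo_var = 2 * sigma**4 * T**2
```

With random volatility the limit is mixed normal, so the empirical variance would instead match the average of the random asymptotic variance.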
Similar techniques can now be used on other common estimators, such as the TSRV. We refer to Section 5.
In the context of equidistant times, this result goes back to Jacod (1994), Jacod and Protter (1998), and Barndorff-Nielsen and Shephard (2002). We emphasize that the method of proof in Jacod and Protter (1998) is quite different from the one used here, and gives rise to weaker conditions. The reason for our different treatment is that we have found the current framework more conducive to generalization to other observation time structures and other estimators. In the long run, it is an open question which general framework is the most useful.
4.3.2 Endogenous Times
The assumption of independent sampling times is not necessary for a limit result, though a weakening of conditions will change the result. To see what happens, we follow the development in Li, Mykland, Renault, Zhang, and Zheng (2009), and define the tricity by $[X,X,X]^{\mathcal{G}}_t = \sum_{t_{i+1} \le t} (X_{t_{i+1}} - X_{t_i})^3 + (X_t - X_{t_*})^3$, and assume that
$$n [X,X,X,X]^{\mathcal{G}}_t \stackrel{p}{\to} U_t \quad \text{and} \quad n^{1/2} [X,X,X]^{\mathcal{G}}_t \stackrel{p}{\to} V_t . \qquad (161)$$
By the reasoning in Section 3.7.4, $n$ and $n^{1/2}$ are the right rates for $[X,X,X,X]^{\mathcal{G}}$ and $[X,X,X]^{\mathcal{G}}$, respectively. Hence $U_t$ and $V_t$ will exist under reasonable regularity conditions. Also, from Section 3.7.4, if the AQVT exists and is absolutely continuous, then so are $U_t$ and $V_t$. We shall use
$$U_t = \int_0^t u_s \, ds \quad \text{and} \quad V_t = \int_0^t v_s \, ds . \qquad (162)$$
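The two rates in (161) can be seen in a simulation (a sketch under assumptions not in the text: constant $\sigma$, zero drift, equidistant and hence exogenous times, in which case $V_t \equiv 0$, while Gaussian fourth moments give $n[X,X,X,X]^{\mathcal G}_T \to 3\sigma^4 T^2$):

```python
import numpy as np

# Sketch (assumptions: constant sigma, zero drift, equidistant times):
# the scaled tricity n^(1/2)*[X,X,X]^G should be near 0 (so V_t = 0),
# while the scaled quarticity n*[X,X,X,X]^G is near 3*sigma^4*T^2.
rng = np.random.default_rng(2)
sigma, T, n = 0.3, 1.0, 100_000

dX = sigma * rng.normal(0.0, np.sqrt(T / n), n)
tricity = np.sqrt(n) * np.sum(dX ** 3)
quarticity = n * np.sum(dX ** 4)
```

With endogenous times the cubed increments need not average out, and a nonzero $V_t$ is exactly what forces the bias-correction martingale constructed below.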
Tricity is handled in much the same way as quarticity. In analogy to the development in Section 3.5.1, observe that
$$d(X_t - X_{t_i})^3 = 3 (X_t - X_{t_i})^2 \, dX_t + 3 (X_t - X_{t_i}) \, d[X,X]_t = 3 (X_t - X_{t_i})^2 \, dX_t + \frac{3}{2} \, d[M,X]_t ,$$
since $d[M,X]_t = 2 (X_t - X_{t_i}) \, d[X,X]_t$. It follows that if we set
$$M^{(3/2)}_t = \sum_{t_{i+1} \le t} \int_{t_i}^{t_{i+1}} (X_s - X_{t_i})^2 \, dX_s + \int_{t_*}^{t} (X_s - X_{t_*})^2 \, dX_s$$
we get
$$[X,X,X]^{\mathcal{G}}_t = \frac{3}{2} [M,X]_t + 3 M^{(3/2)}_t .$$
In analogy with Proposition 1, we hence obtain:
Proposition 5. Assume the conditions of Proposition 1. Then
$$\sup_{0 \le t \le T} \Big| [M,X]_t - \frac{2}{3} [X,X,X]^{\mathcal{G}}_t \Big| = o_p(n^{-1/2}) \quad \text{as } n \to \infty . \qquad (163)$$
It follows that unless $V_t \equiv 0$, the condition (152) in Theorem 6 will not hold. To solve this problem, define an auxiliary martingale
$$\tilde{M}^n_t = M^n_t - \int_0^t g_s \, dX_s , \qquad (164)$$
where $g$ is to be determined. We now see that
$$[\tilde{M}^n, X]_t = [M^n, X]_t - \int_0^t g_s \, d[X,X]_s \stackrel{p}{\to} \int_0^t \Big( \frac{2}{3} v_s - g_s \sigma_s^2 \Big) \, ds \quad \text{and}$$
$$[\tilde{M}^n, \tilde{M}^n]_t = [M^n, M^n]_t + \int_0^t g_s^2 \, d[X,X]_s - 2 \int_0^t g_s \, d[M^n, X]_s \stackrel{p}{\to} \int_0^t \Big( \frac{2}{3} u_s + g_s^2 \sigma_s^2 - 2 \cdot \frac{2}{3} g_s v_s \Big) \, ds .$$
Hence, if we choose $g_t = 2 v_t / (3 \sigma_t^2)$, we obtain that $[\tilde{M}^n, X]_t \stackrel{p}{\to} 0$ and $[\tilde{M}^n, \tilde{M}^n]_t \stackrel{p}{\to} \int_0^t \big( \frac{2}{3} u_s - \frac{4}{9} v_s^2 \sigma_s^{-2} \big) \, ds$.
By going through the same type of arguments as above, we obtain:
Theorem 8. Assume Assumption 1, as well as the conditions of Proposition 2. Also assume that (161) holds for each $t \in [0,T]$, and that the absolute continuity (162) holds. Then $(M^n_t)$ converges stably in law to $M_t$, given by
$$M_t = \frac{2}{3} \int_0^t v_s \sigma_s^{-2} \, dX_s + \int_0^t \Big( \frac{2}{3} u_s - \frac{4}{9} v_s^2 \sigma_s^{-2} \Big)^{1/2} dW'_s ,$$
where $W'$ is independent of $W^{(1)}, \ldots, W^{(p)}$.
It is clear from this that the assumption of independent sampling times implies that $v_t \equiv 0$.
A similar result was shown in Li, Mykland, Renault, Zhang, and Zheng (2009), where implications of this result are discussed further.
4.4 Statistical Risk Neutral Measures
We have so far ignored the drift $\mu_t$. We shall here provide a trick to reinstate the drift in any analysis, without too much additional work. It will turn out that stable convergence is a key element in the discussion. Before we go there, we need to introduce the concept of absolute continuity.
We refer to a probability where there is no drift as a "statistical" risk neutral measure. This is in analogy to the use of equivalent measures in asset pricing. See, in particular, Ross (1976), Harrison and Kreps (1979), Harrison and Pliska (1981), Delbaen and Schachermayer (1995), and Duffie (1996).
4.4.1 Absolute Continuity
We shall in the following think about having two different probabilities on the same observables. For example, $P$ can correspond to the system
$$dX_t = \sigma_t \, dW_t , \quad X_0 = x_0 , \qquad (165)$$
while $Q$ can correspond to the system
$$dX_t = \mu_t \, dt + \sigma_t \, dW^Q_t , \quad X_0 = x_0 . \qquad (166)$$
In this case, $W_t$ is a Wiener process under $P$, and $W^Q_t$ is a Wiener process under $Q$. Note that since we are modeling the process $X_t$, this process is the observable quantity whose distribution we seek. Hence, the process $X_t$ does not change from $P$ to $Q$, but its distribution changes. If we equate (165) and (166), we get
$$\mu_t \, dt + \sigma_t \, dW^Q_t = \sigma_t \, dW_t , \qquad (167)$$
or
$$\frac{\mu_t}{\sigma_t} \, dt + dW^Q_t = dW_t . \qquad (168)$$
As we discussed in the constant $\mu$ and $\sigma$ case, when carrying out inference for observations in a fixed time interval $[0,T]$, the process $\mu_t$ cannot be consistently estimated. A precise statement to this effect (Girsanov's Theorem) is given below.
The fact that $\mu$ cannot be observed means that one cannot fully distinguish between $P$ and $Q$, even with infinite data. This concept is captured in the following definition:
Definition 9. For a given $\sigma$-field $\mathcal{A}$, two probabilities $P$ and $Q$ are mutually absolutely continuous (or equivalent) if, for all $A \in \mathcal{A}$, $P(A) = 0 \Leftrightarrow Q(A) = 0$. More generally, $Q$ is absolutely continuous with respect to $P$ if, for all $A \in \mathcal{A}$, $P(A) = 0 \Rightarrow Q(A) = 0$.
We shall see that P and Q from (165) and (166) are, indeed, mutually absolutely continuous.
4.4.2 The Radon-Nikodym Theorem, and the Likelihood Ratio
Theorem 9. (Radon-Nikodym) Suppose that $Q$ is absolutely continuous with respect to $P$ on the $\sigma$-field $\mathcal{A}$. Then there is an ($\mathcal{A}$-measurable) random variable $dQ/dP$ so that for all $A \in \mathcal{A}$,
$$Q(A) = E_P \Big( \frac{dQ}{dP} \, I_A \Big) . \qquad (169)$$
For proof and a more general theorem, see Theorem 32.2 (p. 422) in Billingsley (1995).
The quantity $dQ/dP$ is usually called either the Radon-Nikodym derivative or the likelihood ratio. It is easy to see that $dQ/dP$ is unique "almost surely" (in the same way as the conditional expectation).
Example 9. The simplest case of a Radon-Nikodym derivative is where $X_1, X_2, \ldots, X_n$ are iid, with two possible distributions $P$ and $Q$. Suppose that $X_i$ has density $f_P$ and $f_Q$ under $P$ and $Q$, respectively. Then
$$\frac{dQ}{dP} = \frac{ f_Q(X_1) f_Q(X_2) \cdots f_Q(X_n) }{ f_P(X_1) f_P(X_2) \cdots f_P(X_n) } . \qquad (170)$$
Likelihood ratios are of great importance in statistical inference generally.
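A minimal sketch of (170) (the specific distributions are an illustrative assumption, not from the text): with $P = N(0,1)$ and $Q = N(1,1)$, the product of density ratios collapses to a closed form that the code can check against the direct product.

```python
import numpy as np

# Sketch of Example 9 (illustrative assumption: P = N(0,1), Q = N(1,1)).
def normal_pdf(x, mu):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

def likelihood_ratio(x):
    """dQ/dP at the sample x = (X_1, ..., X_n), the product in (170)."""
    return np.prod(normal_pdf(x, 1.0) / normal_pdf(x, 0.0))

x = np.array([0.2, -0.5, 1.3])
lr = likelihood_ratio(x)
closed_form = np.exp(x.sum() - len(x) / 2.0)   # exp(sum x_i - n/2) for these normals
```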
4.4.3 Properties of Likelihood Ratios
• $P\big( \frac{dQ}{dP} \ge 0 \big) = 1$
• If $Q$ is equivalent to $P$: $P\big( \frac{dQ}{dP} > 0 \big) = 1$
• $E_P\big( \frac{dQ}{dP} \big) = 1$
• For all $\mathcal{A}$-measurable $Y$: $E_Q(Y) = E_P\big( Y \frac{dQ}{dP} \big)$
• If $Q$ is equivalent to $P$: $\frac{dP}{dQ} = \big( \frac{dQ}{dP} \big)^{-1}$
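On a finite sample space, these properties can be verified exactly (a sketch; the three-point space and all numbers are illustrative):

```python
import numpy as np

# Sketch: check the likelihood-ratio properties on a three-point sample space,
# where dQ/dP(omega) = Q({omega}) / P({omega}).  All values are illustrative.
P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.25, 0.25, 0.5])
lr = Q / P                               # dQ/dP, pointwise; positive since Q ~ P

E_P_lr = np.sum(P * lr)                  # property: E_P(dQ/dP) = 1
Y = np.array([2.0, -1.0, 7.0])           # an arbitrary A-measurable Y
E_Q_Y = np.sum(Q * Y)
E_P_Y_lr = np.sum(P * Y * lr)            # property: E_Q(Y) = E_P(Y dQ/dP)
inverse_ok = np.allclose((P / Q) * lr, 1.0)   # property: dP/dQ = (dQ/dP)^(-1)
```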
4.4.4 Girsanov's Theorem
We now get to the relationship between $P$ and $Q$ in systems (165) and (166). To give the generality, we consider the vector process case (where $\mu$ is a vector, and $\sigma$ is a matrix). The superscript "T" here stands for "transpose".
Theorem 10. (Girsanov) Subject to regularity conditions, $P$ and $Q$ are mutually absolutely continuous, and
$$\frac{dP}{dQ} = \exp \Big\{ - \int_0^T (\sigma_t^{-1} \mu_t)^{\mathrm{T}} \, dW^Q_t - \frac{1}{2} \int_0^T \mu_t^{\mathrm{T}} (\sigma_t \sigma_t^{\mathrm{T}})^{-1} \mu_t \, dt \Big\} . \qquad (171)$$
The regularity conditions are satisfied if $\sigma_- \le \sigma_t \le \sigma_+$ and $|\mu_t| \le \mu_+$, but they also cover much more general situations. For a more general statement, see, for example, Chapter 5.5 of Karatzas and Shreve (1991).
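In the scalar case with constant coefficients, (171) reduces to $dP/dQ = \exp\{-(\mu/\sigma) W^Q_T - \tfrac{1}{2}(\mu/\sigma)^2 T\}$, and the property $E_Q(dP/dQ) = 1$ from Section 4.4.3 can be checked by simulation (a sketch; all parameter values are illustrative):

```python
import numpy as np

# Sketch (assumptions: scalar X, constant mu and sigma): draw the terminal
# value of W^Q under Q and form dP/dQ = exp(-theta*W^Q_T - 0.5*theta^2*T)
# with theta = mu/sigma.  Its Q-expectation should be close to 1.
rng = np.random.default_rng(3)
mu, sigma, T, reps = 0.2, 0.5, 1.0, 20_000

theta = mu / sigma
WQ_T = rng.normal(0.0, np.sqrt(T), reps)          # W^Q_T ~ N(0, T) under Q
lr = np.exp(-theta * WQ_T - 0.5 * theta**2 * T)   # dP/dQ, one value per path

mean_lr = lr.mean()
```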
4.4.5 How to get rid of µ: Interface with Stable Convergence
The idea is borrowed from asset pricing theory. We think that the true distribution is $Q$, but we prefer to work with $P$ since then calculations are much simpler.
Our plan is the following: carry out the analysis under $P$, and adjust results back to $Q$ using the likelihood ratio (Radon-Nikodym derivative) $dP/dQ$. Specifically, suppose that $\theta$ is a quantity to be estimated (such as $\int_0^T \sigma_t^2 \, dt$, $\int_0^T \sigma_t^4 \, dt$, or the leverage effect). An estimator $\hat\theta_n$ is then found with the help of $P$, and an asymptotic result is established whereby, say,
$$n^{1/2} (\hat\theta_n - \theta) \stackrel{\mathcal{L}}{\to} N(b, a^2) \quad \text{stably} \qquad (172)$$
under $P$. It then follows directly from the measure theoretic equivalence that $n^{1/2}(\hat\theta_n - \theta)$ also converges in law under $Q$. In particular, consistency and rate of convergence are unaffected by the change of measure. We emphasize that this is due to the finite (fixed) time horizon $T$.
The asymptotic law may be different under $P$ and $Q$. While the normal distribution remains, the distributions of $b$ and $a^2$ (if random) may change.
The technical result is as follows.
The technical result is as follows.
Proposition 6. Suppose that $Z_n$ is a sequence of random variables which converges stably to $N(b, a^2)$ under $P$. By this we mean that $N(b, a^2) = b + a N(0,1)$, where $N(0,1)$ is a standard normal variable independent of $\mathcal{F}_T$, and $a$ and $b$ are $\mathcal{F}_T$-measurable. Then $Z_n$ converges stably in law to $b + a N(0,1)$ under $Q$, where $N(0,1)$ remains independent of $\mathcal{F}_T$ under $Q$.
Proof of Proposition. $E_Q I_A g(Z_n) = E_P \frac{dQ}{dP} I_A g(Z_n) \to E_P \frac{dQ}{dP} I_A g(Z) = E_Q I_A g(Z)$, by uniform integrability of $\frac{dQ}{dP} I_A g(Z_n)$.
Proposition 6 substantially simplifies calculations and results. In fact, the same strategy will be helpful for the localization results that come next in the paper. It will turn out that the relationship between the localized and continuous process can also be characterized by absolute continuity and likelihood ratios.
Remark 1. It should be noted that after adjusting back from $P$ to $Q$, the process $\mu_t$ may show up in expressions for asymptotic distributions. For instances of this, see Sections 2.5 and 4.3 of Mykland and Zhang (2007). One should always keep in mind that drift most likely is present, and may affect inference. □
Remark 2. As noted, our device is comparable to the use of equivalent martingale measures in options pricing theory (Ross (1976), Harrison and Kreps (1979), Harrison and Pliska (1981), see also Duffie (1996)) in that it affords a convenient probability distribution with which to make computations. In our econometric case, one can always take the drift to be zero, while in the options pricing case, this can only be done for discounted securities prices. In both cases, however, the computational purpose is to get rid of a nuisance "dt term".
The idea of combining stable convergence with measure change appears to go back to Rootzén (1980). □
4.5 Unbounded $\sigma_t$
We have so far assumed that $\sigma_t^2 \le \sigma_+^2$. With the help of stable convergence, it is also easy to weaken this assumption. One can similarly handle restrictions on $\mu_t$, and on $\sigma_t^2$ being bounded away from zero.
The much weaker requirement is that $\sigma_t$ be locally bounded. This is to say that there is a sequence of stopping times $\tau_m$ and of constants $\sigma_{m,+}$ so that
$$P(\tau_m < T) \to 0 \text{ as } m \to \infty \quad \text{and} \quad \sigma_t^2 \le \sigma_{m,+}^2 \text{ for } 0 \le t \le \tau_m . \qquad (173)$$
For example, this is automatically satisfied if $\sigma_t$ is a continuous process.
As an illustration of how to incorporate such local boundedness in existing results, take Corollary 1. If we replace the condition $\sigma_t^2 \le \sigma_+^2$ by local boundedness, the corollary continues to hold (for fixed $m$) with $\sigma_{\tau_m \wedge t}$ replacing $\sigma_t$. On the other hand, we note that $[X,X]^{\mathcal{G}_n}$ is the same for $\sigma_{\tau_m \wedge t}$ and $\sigma_t$ on the set $\{ \tau_m = T \}$. Thus, the corollary tells us that for any set $A \in \mathcal{F}_T$, and for any bounded continuous function $g$,
$$E I_{A \cap \{\tau_m = T\}} \, g\Big( \sqrt{n} \big( [X,X]^{\mathcal{G}_n}_t - [X,X]_t \big) \Big) \to E I_{A \cap \{\tau_m = T\}} \, g\Big( U \Big( 2T \int_0^t \sigma_s^4 \, dH_s \Big)^{1/2} \Big) \qquad (174)$$
as $n \to \infty$ (and for fixed $m$), where $U$ has the same meaning as in the corollary. Hence,
$$\begin{aligned}
\Big| E I_A \, g\Big( \sqrt{n} \big( [X,X]^{\mathcal{G}_n}_t - [X,X]_t \big) \Big) - E I_A \, g\Big( U \Big( 2T \int_0^t \sigma_s^4 \, dH_s \Big)^{1/2} \Big) \Big| \hspace{2cm} & \\
\le \Big| E I_{A \cap \{\tau_m = T\}} \, g\Big( \sqrt{n} \big( [X,X]^{\mathcal{G}_n}_t - [X,X]_t \big) \Big) - E I_{A \cap \{\tau_m = T\}} \, g\Big( U \Big( 2T \int_0^t \sigma_s^4 \, dH_s \Big)^{1/2} \Big) \Big| + 2 \max |g(x)| \, P(\tau_m \ne T) & \\
\to 2 \max |g(x)| \, P(\tau_m \ne T) & \qquad (175)
\end{aligned}$$
as $n \to \infty$. By choosing $m$ large, the right hand side of this expression can be made as small as we wish. Hence, the left hand side actually converges to zero. We have shown:
Corollary 2. Theorem 7, Corollary 1, and Theorem 8 all remain true if the condition $\sigma_t^2 \le \sigma_+^2$ is replaced by a requirement that $\sigma_t^2$ be locally bounded.
5 Microstructure
5.1 The Problem
The basic problem is that the semimartingale $X_t$ is actually contaminated by noise. One observes
$$Y_{t_i} = X_{t_i} + \epsilon_i . \qquad (176)$$
We do not right now take a position on the structure of the $\epsilon_i$'s.
The reason for going to this structure is that the convergence (consistency) predicted by Theorem 2 manifestly does not hold. To see this, in addition to $\mathcal{G}$, we also use subgrids of the form $\mathcal{H}_k = \{ 0, t_k, t_{K+k}, t_{2K+k}, \ldots \}$. This gives rise to the Average Realized Volatility (ARV)
$$ARV(Y, \mathcal{G}, K) = \frac{1}{K} \sum_{k=1}^{K} [Y,Y]^{\mathcal{H}_k} . \qquad (177)$$
Note that $ARV(Y, \mathcal{G}, 1) = [Y,Y]^{\mathcal{G}}$. If one believes Theorem 2, then the $ARV(Y, \mathcal{G}, K)$ should be close for small $K$. In fact, the convergence in the theorem should be visible as $K$ decreases to 1. Figure 1 looks at the $ARV(Y, \mathcal{G}, K)$ for Alcoa Aluminum (AA) for January 4, 2001. As can be seen in the figure, the actual data behaves quite differently from what the theory predicts. It follows that the semimartingale assumption does not hold, and we have to move to a model like (176).
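The qualitative behavior in Figure 1 is easy to reproduce on simulated data (a sketch, not the AA data: constant $\sigma$, equidistant times, iid normal noise; all values illustrative). The noise inflates $ARV(Y, \mathcal{G}, K)$ more and more as $K$ shrinks:

```python
import numpy as np

# Sketch (assumptions: constant sigma, equidistant times, iid normal noise):
# ARV(Y, G, K) from (177) on simulated noisy prices.  The noise contributes
# roughly 2*(n/K)*E(eps^2), which is largest at K = 1.
rng = np.random.default_rng(4)
sigma, T, n, noise_sd = 0.2, 1.0, 23_400, 0.005

X = np.concatenate(([0.0], np.cumsum(sigma * rng.normal(0.0, np.sqrt(T / n), n))))
Y = X + rng.normal(0.0, noise_sd, n + 1)

def arv(Y, K):
    """Average of [Y,Y] over the K subgrids H_1, ..., H_K."""
    return np.mean([np.sum(np.diff(Y[k::K]) ** 2) for k in range(K)])

rv_dense, rv_sparse = arv(Y, 1), arv(Y, 20)   # K = 1 is badly biased upward
true_qv = sigma**2 * T
```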
[Figure 1: scatter plot omitted. Axes: K = number of subgrids (1 to 20); volatility of AA, Jan 4, 2001, annualized, square root scale.]
Figure 1. RV as One Samples More Frequently. The plot gives $ARV(Y, \mathcal{G}, K)$ for $K = 1, \ldots, 20$ for Alcoa Aluminum for the transactions on January 4, 2001. It is clear that consistency does not hold for the quadratic variation. The semimartingale model, therefore, does not hold.
[Figure 2: scatter plot omitted. Axes: sampling frequency in seconds (50 to 200); volatility of AA, Jan 4, 2001, annualized, square root scale.]
Figure 2. RV as One Samples More Frequently. This is the same figure as Figure 1, but the x axis is the average time (in seconds) between observations for each $ARV(Y, \mathcal{G}, K)$. There is one transaction about every 50 seconds in this particular data.
5.2 An Initial Approach: Sparse Sampling
Plots of the type given in Figures 1 and 2 were first considered by Andersen, Bollerslev, Diebold, and Labys (2000) and called signature plots. The authors concluded that the most correct values for the volatility were the lower ones on the left hand side of the plot, based mainly on the stabilization of the curve in this region. On the basis of this, the authors recommended estimating volatility using $[Y,Y]^{\mathcal{H}}$, where $\mathcal{H}$ is a sparsely sampled subgrid of $\mathcal{G}$. In this early literature, the standard approach was to subsample about every five minutes.
The philosophy behind this approach is that the size of the noise is very small, and if there are not too many sampling points, the effect of the noise will be limited. While true, this uses the data inefficiently, and we shall see that better methods can be found. The basic subsampling scheme does, however, provide some guidance on how to proceed to more complex schemes. For this reason, we shall analyze its properties.
The model used for most analysis is that $\epsilon_i$ is independent of $X$, and iid. One can still, however, proceed under weaker conditions. For example, if the $\epsilon_i$ have serial dependence, a similar analysis will go through.
The basic decomposition is
$$[Y,Y]^{\mathcal{H}} = [X,X]^{\mathcal{H}} + [\epsilon, \epsilon]^{\mathcal{H}} + 2 [X, \epsilon]^{\mathcal{H}} , \qquad (178)$$
where the cross term is usually (but not always) ignorable. Thus, if the $\epsilon$'s are independent of $X$, and $E(\epsilon) = 0$, we get
$$E\big( [Y,Y]^{\mathcal{H}} \mid X \text{ process} \big) = [X,X]^{\mathcal{H}} + E [\epsilon, \epsilon]^{\mathcal{H}} . \qquad (179)$$
If the $\epsilon$ are identically distributed, then
$$E [\epsilon, \epsilon]^{\mathcal{H}} = n_{\mathrm{sparse}} \, E(\epsilon_K - \epsilon_0)^2 , \qquad (180)$$
where $n_{\mathrm{sparse}}$ = (number of points in $\mathcal{H}$) $- 1$. Smaller $n_{\mathrm{sparse}}$ gives smaller bias, but bigger variance.
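A direct Monte Carlo check of the bias formula (180) (a sketch; iid mean-zero normal noise and all sizes are assumptions of the sketch):

```python
import numpy as np

# Sketch: for iid mean-zero noise, (180) gives
# E[eps,eps]^H = n_sparse * E(eps_K - eps_0)^2 = 2 * n_sparse * Var(eps).
rng = np.random.default_rng(5)
noise_sd, n, K, reps = 0.01, 4_680, 60, 2_000

n_sparse = n // K                        # number of points in H, minus 1
eps = rng.normal(0.0, noise_sd, size=(reps, n + 1))
qv_eps = np.sum(np.diff(eps[:, ::K], axis=1) ** 2, axis=1)   # [eps,eps]^H

emp = qv_eps.mean()
theo = 2 * n_sparse * noise_sd**2
```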
At this point, if you would like to follow this line of development, please consult the discussion in Section 2 of Zhang, Mykland, and Aït-Sahalia (2005). This shows that there is an optimal subsampling frequency, given by equation (31) (p. 1399) in the paper. A similar analysis for $ARV(Y, \mathcal{G}, K)$ is carried out in Sections 3.1-3.3 of the paper.
5.3 Two Scales Realized Volatility (TSRV)
To get a consistent estimator, we go to the two scales realized volatility (TSRV). The TSRV is defined as follows:
$$[X,X]^{(\mathrm{tsrv})}_T = a_n \, ARV(Y, \mathcal{G}, K) - b_n \, ARV(Y, \mathcal{G}, J) , \qquad (181)$$
where we shall shortly fix $a_n$ and $b_n$. It will turn out to be meaningful to use
$$b_n = a_n \, \frac{\bar{n}_K}{\bar{n}_J} , \qquad (182)$$
where $\bar{n}_K = (n - K + 1)/K$. For asymptotic purposes, we can take $a_n = 1$, but more generally we will assume that $a_n \to 1$ as $n \to \infty$.
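A sketch of the estimator (181)-(182) on simulated data (assumptions of the sketch: $a_n = 1$, $J = 1$, equidistant times, constant $\sigma$, iid noise; all values illustrative), reusing the ARV construction of (177):

```python
import numpy as np

# Sketch of TSRV (181)-(182) with a_n = 1 and J = 1 (assumptions of this
# sketch): the slow scale K is debiased by the scaled fast scale J.
rng = np.random.default_rng(6)
sigma, T, n, noise_sd, K, J = 0.2, 1.0, 23_400, 0.005, 300, 1

X = np.concatenate(([0.0], np.cumsum(sigma * rng.normal(0.0, np.sqrt(T / n), n))))
Y = X + rng.normal(0.0, noise_sd, n + 1)

def arv(Y, K):
    return np.mean([np.sum(np.diff(Y[k::K]) ** 2) for k in range(K)])

nbar = lambda m: (n - m + 1) / m           # bar-n_K = (n - K + 1)/K
tsrv = arv(Y, K) - (nbar(K) / nbar(J)) * arv(Y, J)   # (181) with a_n = 1

true_qv = sigma**2 * T                     # integrated volatility
```

The raw realized volatility $ARV(Y, \mathcal{G}, 1)$ on the same path is an order of magnitude too large, while the two-scales combination is close to the target.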
This estimator is discussed in Section 4 of Zhang, Mykland, and Aït-Sahalia (2005), though only in the case where $J = 1$. In the more general case, $J$ is not necessarily 1, but $J < K$. Note that in the representation
(193), (192) becomes
$$\Gamma_0 = \frac{3}{2} \int_0^T \Big( \frac{f_t}{\sigma_t} \Big)^2 dt \quad \text{and} \quad \Gamma_1 = 2 \int_0^T \big( f_t^2 + g_t^2 \sigma_t^2 \big) \, dt . \qquad (194)$$
Example 10. In the case of a Heston model (Section 2.2.3), we obtain that
$$\Gamma_0 = \frac{3}{8} (\gamma \rho)^2 \int_0^T \sigma_t^{-2} \, dt \quad \text{and} \quad \Gamma_1 = \frac{1}{4} \gamma^2 (M - 1) \int_0^T \sigma_t^{-2} \, dt . \qquad (195)$$
Remark 4. (One step discretization) Let $P^*_n$ be the measure $Q_n$ which arises when the block length is $M = 1$. Observe that even with this one-step discretization, $dP/dP^*_n$ does not necessarily converge to unity. In this case, $\Gamma_1 = 0$, but $\Gamma_0$ does not vanish when there is leverage effect. □
6.2 Moving windows
The paper so far has considered chopping $n$ data up into non-overlapping windows of size $M$ each. We here show by example that the methodology can be adapted to the moving window case. We consider the estimation of $\theta = \int_0^T | \sigma_t |^r \, dt$, as in Section 4.1 of Mykland and Zhang (2007). It should be noted that the moving window is close to the concept of a moving kernel, and this may be a promising avenue of further investigation. See, in particular, Linton (2007).
We use block length $M$, and we use for simplicity
$$\tilde\sigma^2_{\tau_{n,i}} = \frac{1}{\Delta t_n M_n} \sum_{t_{n,j} \in (\tau_{n,i}, \tau_{n,i+1}]} (\Delta X_{t_{n,j+1}})^2 \qquad (196)$$
as estimator of $\sigma^2_{\tau_{n,i}}$. The moving window estimate of $\theta$ is now
$$\tilde\Theta^{MW}_n = (\Delta t) \sum_{i=0}^{n-M} | \tilde\sigma_{t_{n,i}} |^r .$$
It is easy to see that
$$\tilde\Theta^{MW}_n = \frac{1}{M} \sum_{m=1}^{M} \tilde\Theta_{n,m} + O_p(n^{-1}) ,$$
where $\tilde\Theta_{n,m}$ is the non-overlapping block estimator, with block number one starting at $t_{n,m}$. In view of this representation, it is once again clear from sufficiency considerations that the moving window estimator will have an asymptotic variance which is smaller (or, at least, no larger) than the estimator based on non-overlapping blocks. We now carry out the precise asymptotic analysis.
To analyze this estimator, let $\mathcal{M} > M$, and let $A_n = \{ i = 0, \ldots, n - M : [t_{n,i}, t_{n,i+M}] \subseteq [k\mathcal{M}/n, (k+1)\mathcal{M}/n] \text{ for some } k \}$, with $B_n = \{ 0, \ldots, n - M \} - A_n$. Write
$$\begin{aligned}
n^{1/2} \big( \tilde\Theta^{MW}_n - \theta \big) = \; & n^{1/2} \, \Delta t \sum_k \sum_{i : [t_{n,i}, t_{n,i+M}] \subseteq [k\mathcal{M}/n, (k+1)\mathcal{M}/n]} \big( | \tilde\sigma_{t_{n,i}} |^r - | \sigma_{t_{k\mathcal{M}}} |^r \big) \\
& + n^{1/2} \, \Delta t \sum_{i \in B_n} \big( | \tilde\sigma_{t_{n,i}} |^r - | \sigma_{t_{n,i}} |^r \big) + O_p(n^{-1/2}) . \qquad (197)
\end{aligned}$$
Now apply our methodology from Section 6.1, with block size $\mathcal{M}$, to the first term in (197). Under this block approximation, the inner sum in the first term is based on conditionally i.i.d. observations; in fact, for $[t_{n,i}, t_{n,i+M}] \subseteq [k\mathcal{M}/n, (k+1)\mathcal{M}/n]$, $\tilde\sigma^2_{t_{n,i}} = \sigma^2_{k\mathcal{M}/n} S_i$, in law, where
$$S_i = M^{-1} \sum_{j=i}^{i+M-1} U_j^2 , \qquad U_0, U_1, U_2, \ldots \text{ iid standard normal.} \qquad (198)$$
As in Section 4.1 of Mykland and Zhang (2007), there is no adjustment (à la Remark 3) due to covariation with the asymptotic likelihood ratios, and so the first term in (197) converges stably to a mixed normal with random variance as the limit of
$$n \, \Delta t_n^2 \sum_k | \sigma |^{2r}_{k\mathcal{M}/n} \, \mathrm{Var} \Big( c_{M,r}^{-1} \sum_{i=0}^{\mathcal{M}-M} S_i^{r/2} \Big) ,$$
which is
$$T c_{M,r}^{-2} \, \frac{1}{\mathcal{M}} \, \mathrm{Var} \Big( \sum_{i=0}^{\mathcal{M}-M} S_i^{r/2} \Big) \int_0^T | \sigma_t |^{2r} \, dt . \qquad (199)$$
Similarly, one can apply the same technique to the second term in (197), but now with the $k$'th block ($k \ge 2$) starting at $k\mathcal{M} - M$. This analysis yields that the second term is also asymptotically mixed normal, but with a variance that is of order $o_p(1)$ as $\mathcal{M} \to \infty$. (In other words, once again, first send $n$ to infinity, and then, afterwards, do the same to $\mathcal{M}$.) This yields that, overall, and in the sense of stable convergence,
$$n^{1/2} \big( \tilde\Theta^{MW}_n - \theta \big) \stackrel{\mathcal{L}}{\to} N(0,1) \Big( c_{M,r}^{-2} \, \kappa_{M,r} \, T \int_0^T | \sigma_t |^{2r} \, dt \Big)^{1/2} , \qquad (200)$$
where, from (199), $\kappa_{M,r} = \lim_{\mathcal{M} \to \infty} \mathrm{Var} \big( \sum_{i=0}^{\mathcal{M}-M} S_i^{r/2} \big) / \mathcal{M}$, i.e.,
$$\kappa_{M,r} = \mathrm{Var}\big( S_0^{r/2} \big) + 2 \sum_{i=1}^{M-1} \mathrm{Cov}\big( S_0^{r/2}, S_i^{r/2} \big) ,$$
where the $S_i$ are given in (198).
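For $r = 2$ the constant can be evaluated in closed form (a sketch; the covariance formulas below are derived here from (198) using standard chi-square moments, not taken from the text): $\mathrm{Var}(S_0) = 2/M$ and $\mathrm{Cov}(S_0, S_i) = 2(M-i)/M^2$, since the two averages share $M - i$ of the $U_j^2$ terms, each with variance 2. The sum then collapses to exactly 2 for every $M$.

```python
import numpy as np

# Sketch for r = 2 (so S_i^(r/2) = S_i): with S_i = M^(-1) * sum of M iid U_j^2,
# Var(S_0) = 2/M and Cov(S_0, S_i) = 2*(M - i)/M^2 (overlap of M - i terms).
# Then kappa = 2/M + 2*(M-1)/M = 2, independently of M.
M = 5
var_S0 = 2.0 / M
kappa = var_S0 + 2.0 * sum(2.0 * (M - i) / M**2 for i in range(1, M))

# Quick Monte Carlo cross-check of the lag-1 covariance formula.
rng = np.random.default_rng(7)
U2 = rng.normal(size=(200_000, 2 * M)) ** 2
S0 = U2[:, :M].mean(axis=1)
S1 = U2[:, 1:M + 1].mean(axis=1)
mc_cov = np.cov(S0, S1)[0, 1]            # should be near 2*(M-1)/M^2
```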
6.3 Multivariate and Asynchronous data

The results discussed in Section 6.1 also apply to vector processes (see Mykland and Zhang (2007) for details). Also, for purposes of analysis, asynchronous data do not pose any conceptual difficulty when applying the results. One includes all observation times when computing the likelihood ratios in the contiguity theorems. It does not matter that some components of the vector are not observed at all these times; in a sense, they are just treated as missing data. Just as in the case of irregular times for scalar processes, however, this does not necessarily mean that it is straightforward to write down sensible estimators.
For example, consider a bivariate process $(X^{(1)}_t, X^{(2)}_t)$. If process $(X^{(r)}_t)$ is observed at times

$$\mathcal{G}^{(r)}_n = \big\{ 0 \le t^{(r)}_{n,0} < t^{(r)}_{n,1} < \cdots < t^{(r)}_{n,n_r} \le T \big\}, \qquad (201)$$

one would normally use the grid $\mathcal{G}_n = \mathcal{G}^{(1)}_n \cup \mathcal{G}^{(2)}_n \cup \{0, T\}$ to compute the likelihood ratio $dP/dQ_n$.
To focus the mind with an example, consider the estimation of covariation under asynchronous data. It is shown in Mykland (2006) that the Hayashi-Yoshida estimator (Hayashi and Yoshida (2005)) can be seen as a nonparametric maximum likelihood estimator (MLE). We shall here see that blocking induces an additional class of local-likelihood-based MLEs. The difference between the former and the latter depends on the continuity assumptions made on the volatility process, and is a little like the difference between the Kaplan-Meier (Kaplan and Meier (1958)) and Nelson-Aalen (Nelson (1969), Aalen (1976, 1978)) estimators in survival analysis. (Note that the variance estimate for the Hayashi-Yoshida estimator from Section 5.3 of Mykland (2006) obviously also remains valid in the setting of this paper.)
For simplicity, work with a bivariate process, and let the grid $\mathcal{G}_n$ be given by (201). For now, let the block dividers $\tau$ be any subset of $\mathcal{G}_n$. Under the approximate measure $Q_n$, note that for

$$\tau_{n,i-1} \le t^{(1)}_{n,j-1} < t^{(1)}_{n,j} \le \tau_{n,i} \quad\text{and}\quad \tau_{n,i-1} \le t^{(2)}_{n,k-1} < t^{(2)}_{n,k} \le \tau_{n,i}, \qquad (202)$$

the returns $X^{(1)}_{t^{(1)}_{n,j}} - X^{(1)}_{t^{(1)}_{n,j-1}}$ and $X^{(2)}_{t^{(2)}_{n,k}} - X^{(2)}_{t^{(2)}_{n,k-1}}$ are conditionally jointly normal with mean zero and covariances

$$\mathrm{Cov}_{Q_n}\Big( X^{(r)}_{t^{(r)}_{n,j}} - X^{(r)}_{t^{(r)}_{n,j-1}},\ X^{(s)}_{t^{(s)}_{n,k}} - X^{(s)}_{t^{(s)}_{n,k-1}} \ \Big|\ \mathcal{F}_{\tau_{n,i-1}} \Big) = \big(\Sigma_{\tau_{n,i-1}}\big)_{r,s}\; d\big\{ (t^{(r)}_{n,j-1}, t^{(r)}_{n,j}) \cap (t^{(s)}_{n,k-1}, t^{(s)}_{n,k}) \big\}, \qquad (203)$$
where $d$ denotes length (Lebesgue measure). Set $\Sigma_{r,s;j,k} = \Sigma_{r,s}\, d\{ (t^{(r)}_{n,j-1}, t^{(r)}_{n,j}) \cap (t^{(s)}_{n,k-1}, t^{(s)}_{n,k}) \}$. The $Q_n$ log likelihood ratio based on observations fully in block $(\tau_{n,i-1}, \tau_{n,i}]$ is therefore given as

$$\ell(\Sigma) = -\frac{1}{2}\ln\det\big(\Sigma_{r,s;j,k}\big) - \frac{1}{2}\sum_{r,s,j,k} \Sigma^{r,s;j,k}\, \big( X^{(r)}_{t^{(r)}_{n,j}} - X^{(r)}_{t^{(r)}_{n,j-1}} \big)\big( X^{(s)}_{t^{(s)}_{n,k}} - X^{(s)}_{t^{(s)}_{n,k-1}} \big) - \frac{N_i}{2}\ln(2\pi), \qquad (204)$$
where $\Sigma^{r,s;j,k}$ are the elements of the matrix inverse of $(\Sigma_{r,s;j,k})$, and $N_i$ is a measure of block sample size. The sum in $(j, k)$ is over all intersections $(t^{(r)}_{n,j-1}, t^{(r)}_{n,j}) \cap (t^{(s)}_{n,k-1}, t^{(s)}_{n,k})$ with positive length satisfying (202). Call the number of such terms

$$m^{(r,s)}_{n,i} = \#\big\{ \text{nonempty intersections } (t^{(r)}_{n,j-1}, t^{(r)}_{n,j}) \cap (t^{(s)}_{n,k-1}, t^{(s)}_{n,k}) \text{ satisfying (202)} \big\}. \qquad (205)$$
The “parameter” $\Sigma$ corresponds to $\Sigma_{\tau_{n,i-1}}$. The block MLE is thus given as

$$\hat\Sigma^{(r,s)}_{\tau_{n,i-1}} = \frac{1}{m^{(r,s)}_{n,i}} \sum_{j,k} \frac{ \big( X^{(r)}_{t^{(r)}_{n,j}} - X^{(r)}_{t^{(r)}_{n,j-1}} \big)\big( X^{(s)}_{t^{(s)}_{n,k}} - X^{(s)}_{t^{(s)}_{n,k-1}} \big) }{ d\big\{ (t^{(r)}_{n,j-1}, t^{(r)}_{n,j}) \cap (t^{(s)}_{n,k-1}, t^{(s)}_{n,k}) \big\} }, \qquad (206)$$
where the sum is over $j, k$ satisfying (202) for which the denominator in the summand is nonzero. The overall estimate of covariation is thus

$$\widehat{\langle X^{(r)}, X^{(s)} \rangle}_T = \sum_i \hat\Sigma^{(r,s)}_{\tau_{n,i-1}}\, (\tau_{n,i} - \tau_{n,i-1}). \qquad (207)$$

We suppose, of course, that each block is large enough for $m^{(r,s)}_{n,i}$ to always be greater than zero.
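To make (206)-(207) concrete: the block MLE only requires the Lebesgue measure of each pairwise intersection of return intervals. The following Python sketch is our own illustration (function names are ours, and for simplicity it treats the whole sample as a single block).

```python
import numpy as np

def overlap(a0, a1, b0, b1):
    """d{(a0, a1) ∩ (b0, b1)}: length of the intersection of two intervals."""
    return max(0.0, min(a1, b1) - max(a0, b0))

def block_mle(t1, x1, t2, x2):
    """Block MLE (206) for the covariance parameter, treating all
    observations as lying in one block (illustration only).
    t1, x1: times and values of X^(1); t2, x2: same for X^(2)."""
    terms = []
    for j in range(1, len(t1)):
        for k in range(1, len(t2)):
            d = overlap(t1[j - 1], t1[j], t2[k - 1], t2[k])
            if d > 0.0:  # only intersections of positive length enter
                terms.append((x1[j] - x1[j - 1]) * (x2[k] - x2[k - 1]) / d)
    return np.mean(terms)  # divides by m^{(r,s)}, the number of terms
```

With block dividers $\tau_{n,i}$, one would apply `block_mle` block by block and form the sum (207), weighting each block estimate by $\tau_{n,i} - \tau_{n,i-1}$.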
Under $Q_n$, $E_{Q_n}\big( \hat\Sigma_{\tau_{n,i-1}} \,\big|\, \mathcal{F}_{\tau_{n,i-1}} \big) = \Sigma_{\tau_{n,i-1}}$, and

$$\mathrm{Var}_{Q_n}\big( \hat\Sigma^{(r,s)}_{\tau_{n,i-1}} \,\big|\, \mathcal{F}_{\tau_{n,i-1}} \big) = \Big( \frac{1}{m^{(r,s)}_{n,i}} \Big)^2 \Bigg[\ \Sigma^{(r,r)}_{\tau_{n,i-1}} \Sigma^{(s,s)}_{\tau_{n,i-1}} \sum_{j,k} \frac{ (t^{(r)}_{n,j} - t^{(r)}_{n,j-1})(t^{(s)}_{n,k} - t^{(s)}_{n,k-1}) }{ d\big\{ (t^{(r)}_{n,j-1}, t^{(r)}_{n,j}) \cap (t^{(s)}_{n,k-1}, t^{(s)}_{n,k}) \big\}^2 }$$
$$\qquad +\ \big( \Sigma^{(r,s)}_{\tau_{n,i-1}} \big)^2 \sum_{j_1, j_2, k_1, k_2} \frac{ d\big\{ (t^{(r)}_{n,j_1-1}, t^{(r)}_{n,j_1}) \cap (t^{(s)}_{n,k_2-1}, t^{(s)}_{n,k_2}) \big\}\ d\big\{ (t^{(r)}_{n,j_2-1}, t^{(r)}_{n,j_2}) \cap (t^{(s)}_{n,k_1-1}, t^{(s)}_{n,k_1}) \big\} }{ d\big\{ (t^{(r)}_{n,j_1-1}, t^{(r)}_{n,j_1}) \cap (t^{(s)}_{n,k_1-1}, t^{(s)}_{n,k_1}) \big\}\ d\big\{ (t^{(r)}_{n,j_2-1}, t^{(r)}_{n,j_2}) \cap (t^{(s)}_{n,k_2-1}, t^{(s)}_{n,k_2}) \big\} } \ \Bigg]. \qquad (208)$$
The first sum is over the same $(j, k)$ as in (206), and the second sum is over all $j_1, j_2, k_1, k_2$ satisfying (202), again for which the denominators in the summand are nonzero.

It is therefore easy to see that, subject to conditions on the observation times $t^{(r)}_{n,i}$ and $t^{(s)}_{n,i}$, $n^{1/2}\big( \widehat{\langle X^{(r)}, X^{(s)} \rangle}_T - \langle X^{(r)}, X^{(s)} \rangle_T \big)$ converges stably (under $Q_n$) to a mixed normal distribution with variance given by the limit of

$$n \sum_i \mathrm{Var}_{Q_n}\big( \hat\Sigma^{(r,s)}_{\tau_{n,i-1}} \,\big|\, \mathcal{F}_{\tau_{n,i-1}} \big)\, (\tau_{n,i} - \tau_{n,i-1})^2. \qquad (209)$$

It is straightforward to see that there is no adjustment from $Q_n$ to $P$. A formal asymptotic analysis would be tedious, and has therefore been omitted. In any case, to estimate the asymptotic variance, one would use (208)-(209), with $\hat\Sigma_{\tau_{n,i-1}}$ replacing $\Sigma_{\tau_{n,i-1}}$ in (208).
Remark 5. An important difference from the Hayashi-Yoshida estimator is that (206) depends on the observation times. This is in many instances undesirable, and the choice of estimator will depend on the degree to which these times are trusted. The Hayashi-Yoshida estimator is also aesthetically more pleasing. We note, however, that from likelihood considerations, the estimator (206) will have an asymptotic variance which, as the block size tends to infinity, converges to a limit which corresponds to the efficient minimum for a constant volatility matrix.
This phenomenon can best be illustrated for a scalar process (so there is no asynchronicity). In this case, our estimator (206) of $\langle X, X \rangle_T$ becomes (for block size $M$ fixed)

$$\widehat{\langle X, X \rangle}_T = \sum_i (\tau_{n,i} - \tau_{n,i-1})\, \frac{1}{M} \sum_{j:\ \tau_{n,i-1} < t_{n,j} \le \tau_{n,i}} \frac{(X_{t_{n,j}} - X_{t_{n,j-1}})^2}{t_{n,j} - t_{n,j-1}}$$
0 a.s.
The contiguity question is then addressed as follows. Let $P^*_n$ be the measure from Remark 4 (corresponding to block length $M = 1$). Recall that

$$\log\frac{dR_n}{dP} = \log\frac{dR_n}{dQ_n} + \log\frac{dQ_n}{dP^*_n} + \log\frac{dP^*_n}{dP}. \qquad (214)$$

Define

$$B_{n,j} = \Delta t_{n,j+1}\, \Big( \frac{\Delta\tau_{n,i}}{M_i} \Big)^{-1} - 1. \qquad (215)$$
Theorem 11. (Asymptotic relationship between $P^*_n$, $Q_n$ and $R_n$.) Assume the conditions of Theorem 4 in Mykland and Zhang (2007), and let $Z^{(1)}_n$ and $M^{(1)}_n$ be as in that theorem (see (231) and (234) in Section 7.3). Assume that the following limits exist:

$$\Gamma_2 = \frac{p}{2}\, \lim_{n\to\infty} \sum_j B^2_{n,j} \qquad\text{and}\qquad \Gamma_3 = \frac{p}{2}\, \lim_{n\to\infty} \sum_j \log(1 + B_{n,j}). \qquad (216)$$
Set

$$Z^{(2)}_n = \frac{1}{2} \sum_i \sum_{t_{n,j}\in(\tau_{n,i-1},\,\tau_{n,i}]} \Delta X^T_{t_{n,j}}\, \big( (\sigma\sigma^T)^{-1}_{\tau_{n,i-1}} \big)\, \Delta X_{t_{n,j}} \Big( \Delta t^{-1}_{n,j+1} - \Big( \frac{\Delta\tau_{n,i}}{M_i} \Big)^{-1} \Big), \qquad (217)$$

and let $M^{(2)}_n$ be the end point of the martingale part of $Z^{(2)}_n$ (see (232) and (234) in Section 7.3 for the explicit formula). Then, as $n \to \infty$, $(M^{(1)}_n, M^{(2)}_n)$ converges stably in law under $P^*$ to a normal distribution with mean zero and diagonal variance matrix with diagonal elements $\Gamma_1$ and $\Gamma_2$. Also, under $P^*$,

$$\log\frac{dR_n}{dQ_n} = M^{(2)}_n + \Gamma_3 + o_p(1). \qquad (218)$$
The theorem can be viewed from the angle of contiguity:

Corollary 3. Under regularity conditions, the following statements are equivalent as $n \to \infty$:

(i) $R_n$ is contiguous to $P$.

(ii) $R_n$ is contiguous to $Q_n$.

(iii) The following relationship holds:

$$\Gamma_3 = -\frac{1}{2}\,\Gamma_2. \qquad (219)$$

As we shall see, the requirement (219) is a substantial restriction. Corollary 3 says that, unlike the case of $Q_n$, inference under $R_n$ may not give rise to desired results. Part of the probability mass under $Q_n$ (and hence $P^*$) is not preserved under $R_n$.
To understand the requirement (219), note that

$$\frac{p}{2} \sum_j \log(1 + B_{n,j}) = -\frac{p}{4} \sum_j B^2_{n,j} + \frac{p}{6} \sum_j B^3_{n,j} - \cdots \qquad (220)$$

since $\sum_j B_{n,j} = 0$. Hence, (219) will, for example, be satisfied if $\max_j |B_{n,j}| \to 0$ as $n \to \infty$. One such example is

$$t_{n,j} = f(j/n), \ \text{where } f \text{ is continuously differentiable.} \qquad (221)$$
However, (221) will not hold in more general settings, as we shall see from the following examples.
Example 11. (Poisson sampling.) Suppose that the sampling time points follow a Poisson process with parameter $\lambda$. If one conditions on the number of sampling points $n$, these points behave like the order statistics of $n$ uniformly distributed random variables (see, for example, Chapter 2.3 of Ross (1996)). Consider the case where $M_i = M$ for all but (possibly) the last interval in $\mathcal{H}_n$. In this case, $K_n$ is the smallest integer larger than or equal to $n/M$. Let $Y_i$ be the $M$-tuple $(B_j,\ \tau_{i-1} \le t_j < \tau_i)$. We now obtain, by passing between the conditional and the unconditional, that $Y_1, \ldots, Y_{K_n-1}$ are i.i.d., and the distribution can be described by

$$Y_1 = M\big( U_{(1)},\, U_{(2)} - U_{(1)},\, \ldots,\, U_{(M-1)} - U_{(M-2)},\, 1 - U_{(M-1)} \big) - 1, \qquad (222)$$

where $U_{(1)}, \ldots, U_{(M-1)}$ are the order statistics of $M - 1$ independent uniform random variables on $(0,1)$. It follows that

$$\sum_j B^2_{n,j} = \frac{n}{M}\big( M^2 E U^2_{(1)} - 1 \big) + o_p(n) \qquad\text{and}\qquad \sum_j \log(1 + B_{n,j}) = \frac{n}{M}\, E \log\big( M U_{(1)} \big) + o_p(n),$$

since $E U^2_{(1)} = 2/\big(M(M+1)\big)$. Hence, both $\Gamma_2$ and $\Gamma_3$ are infinite. The contiguity between $R_n$ and the other probabilities fails. On the other hand, all our assumptions up to Section 6 are satisfied, and so $P$, $P^*_n$ and $Q_n$ are all contiguous. The AQVT (equation (142)) is given by $H(t) = 2t$. Also, if the block size is constant (size $M$), the ADD is $K(t) = (M-1)t$.
Example 12. (Systematic irregularity.) Let $\epsilon$ be a small positive number, and let $\Delta t_{n,j} = (1+\epsilon)T/n$ for odd $j$ and $\Delta t_{n,j} = (1-\epsilon)T/n$ for even $j$ (with the last $\Delta t_{n,n} = T/n$ when $n$ is odd). Again, all our assumptions up to Section 6 are satisfied. The AQVT is given by $H(t) = t(1 + \epsilon^2)$. If we suppose that all $M_i = 2$, the ADD becomes $K(t) = t$. On the other hand, $B_{n,j} = \pm\epsilon$, so that, again, both $\Gamma_2$ and $\Gamma_3$ are infinite. The contiguity between $R_n$ and the other probabilities thus fails in the same radical fashion as in the case of Poisson sampling.
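The quantities $B_{n,j}$ of (215) are easy to compute for any concrete sampling scheme, which makes the contrast between (221) and Examples 11-12 tangible. The following Python sketch is our own construction (constant block size $M$, $T = 1$, function names ours): for smooth times $t_{n,j} = f(j/n)$ the sum $\sum_j B^2_{n,j}$ is negligible, while for alternating spacings it equals $n\epsilon^2$ exactly, and for Poisson sampling it grows linearly in $n$.

```python
import numpy as np

def B_values(t, M):
    """B_{n,j} from (215) with constant block size M:
    Delta t_{n,j+1} / (Delta tau_{n,i} / M_i) - 1, block by block."""
    dt = np.diff(t)
    B = np.empty(len(dt))
    for s in range(0, len(dt), M):
        blk = dt[s:s + M]
        B[s:s + M] = blk / blk.mean() - 1.0
    return B

n, M, eps = 1000, 2, 0.1
# (221): smooth times t_{n,j} = f(j/n), f continuously differentiable
f = lambda x: (x + x ** 2) / 2.0
t_smooth = f(np.arange(n + 1) / n)
# Example 12: alternating spacings (1 +/- eps)/n
dt_alt = np.tile([(1 + eps) / n, (1 - eps) / n], n // 2)
t_alt = np.concatenate(([0.0], np.cumsum(dt_alt)))
# Example 11: Poisson sampling, conditioned on n points in [0, 1]
rng = np.random.default_rng(3)
t_pois = np.concatenate(([0.0], np.sort(rng.uniform(0, 1, n))))

for name, t in [("smooth", t_smooth), ("alternating", t_alt), ("Poisson", t_pois)]:
    print(name, np.sum(B_values(t, M) ** 2))
```

Note also that $\sum_j B_{n,j} = 0$ holds by construction within every block, as used below (220).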
7.2 Irregular Spacing and Subsampling

We here return to a more direct study of the effect of irregular spacings. We put ourselves in the situation of Section 4.3.1, where observation times are independent of the process. As stated in equation (160), the limit law for the realized volatility (for $\sqrt{n}\,\big( [X,X]^{\mathcal{G}_n}_t - [X,X]_t \big)$) is mixed normal with (random) variance

$$2T \int_0^t \sigma^4_s\, dH_s, \qquad (224)$$

where $H$ is the asymptotic quadratic variation of time (AQVT). When observations are equidistant, $H'(t) \equiv 1$. From the preceding section, we also know that if times are of the form (221), the asymptotic variance is unaffected. It is worth elaborating on this by direct computation. Set

$$F(t) = \lim_{n\to\infty} \frac{1}{n}\, \#\{ t_{n,i+1} \le t \}; \qquad (225)$$

this quantity exists, if necessary by going through subsequences (Helly's Theorem; see, for example, p. 336 of Billingsley (1995)). Set

$$u_{n,i} = F(t_{n,i}). \qquad (226)$$

Asymptotically, the $u_{n,i}$ are equispaced:

$$\frac{1}{n}\, \#\{ u_{n,i+1} \le t \} = \frac{1}{n}\, \#\{ t_{n,i+1} \le F^{(-1)}(t) \} \to F\big( F^{(-1)}(t) \big) = t. \qquad (227)$$
Inference is invariant under this transformation: observing the process $X_t$ at times $t_{n,i}$ is the same as observing the process $Y_t = X_{F^{(-1)}(t)}$ at times $u_{n,i}$. If we set $\mathcal{U} = \{ u_{n,j},\ j = 0, \ldots, n \}$, then $[X,X]^{\mathcal{G}}_T = [Y,Y]^{\mathcal{U}}_T$. Also, in the limit, $[X,X]_T = [Y,Y]_T$. Finally, the asymptotic distribution is the same in these two cases.

If the $u_{n,i}$ have AQVT $U(t)$, the mixed normal variance transforms as

$$2T \int_0^T H'(t)\, \big( \langle X, X \rangle'_t \big)^2\, dt = 2 \int_0^1 U'(t)\, \big( \langle Y, Y \rangle'_t \big)^2\, dt. \qquad (228)$$

The transformation (226) regularizes the spacings. It means that, without loss of generality, one can take $T = 1$, $F'(t) \equiv 1$, and $U = H$. Also, the transformation (226) regularizes spacings defined by (221), and in this case $U'(t) \equiv 1$.
Example 13. On the other hand, it is clear from Example 11 that it is possible for $U'(t)$ to take values other than 1. The example shows that for Poisson distributed observation times, $H' = U' \equiv 2$, while, indeed, $F'(t) \equiv 1/T$.
The general situation can be expressed as follows:

Proposition 7. Assume that $F$ exists and is monotonically increasing. Also assume that $H$ exists. Then $U$ exists. For all $s \le t$, $U(t) - U(s) \ge t - s$. In particular, if $U'(t)$ exists, then $U'(t) \ge 1$. The following statements are equivalent:

(i) $U(1) = 1$;

(ii) $U' \equiv 1$;

(iii) $\sum_j \big( u_{n,j+1} - u_{n,j} - \frac{1}{n} \big)^2 = o_p(n^{-1})$.
Proof of Proposition 7. The first statement uses a standard property of the variance: if $\Delta t_{n,j+1} = t_{n,j+1} - t_{n,j}$ and $\delta_n = T/n$, then

$$\frac{n}{T} \sum_{t_{n,j+1} \le t} (\Delta t_{n,j+1})^2 = \frac{n}{T} \sum_{t_{n,j+1} \le t} (\Delta t_{n,j+1} - \delta_n)^2 + \frac{n}{T}\, \#\{ t_{n,i+1} \le t \}\, \delta_n^2 \ \ge\ \frac{n}{T}\, \#\{ t_{n,i+1} \le t \}\, \delta_n^2.$$

By taking limits as $n \to \infty$ under $F'(t) \equiv 1/T$, we get that $H(t) - H(s) \ge t - s$. In particular, the same will be true for $U$.

The equivalence between (i) and (iii) follows from the proof of Lemma 2 (p. 1029) of Zhang (2006). (The original lemma uses slightly different assumptions.)
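Criterion (iii) is directly checkable on simulated times. The sketch below (our own illustration; `criterion_iii` is not a function from the text) evaluates $n \sum_j (u_{n,j+1} - u_{n,j} - 1/n)^2$, which must tend to zero for (iii) to hold. It vanishes for exactly equispaced $u_{n,j} = j/n$, but for Poisson sampling it tends to $U(1) - 1 = 1$, so (iii) fails.

```python
import numpy as np

def criterion_iii(u):
    """n * sum_j (u_{n,j+1} - u_{n,j} - 1/n)^2; condition (iii) of
    Proposition 7 requires this quantity to tend to zero."""
    du = np.diff(u)
    n = len(du)
    return n * np.sum((du - 1.0 / n) ** 2)

n = 100_000
u_smooth = np.arange(n + 1) / n   # times of the form (221) after the map (226)
rng = np.random.default_rng(5)
u_pois = np.concatenate(([0.0], np.sort(rng.uniform(0, 1, n))))  # Poisson case

print(criterion_iii(u_smooth))    # essentially 0: "almost" equidistant
print(criterion_iii(u_pois))      # close to U(1) - 1 = 1: (iii) fails
```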
The implication of the proposition is that under the scenario U(1) = 1, observation times are
“almost” equidistant. In particular, subsampling does not change the structure of the spacings.
On the other hand, when U(1) > 1, there is scope for subsampling to regularize the times.
Example 14. Suppose that the times are Poisson distributed. Instead of picking every observation, we now pick every $M$'th observation. By the same methods as in Example 11, we obtain that

$$U(t) = \frac{M+1}{M}\, t. \qquad (229)$$
Hence the sparser the subsampling, the more regular the times will be. This is an additional feature
of subsampling that remains to be exploited.
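Equation (229) can be checked by simulation. The sketch below is our own construction (`aqvt_end` is a simple empirical proxy for the AQVT at the terminal time; for Poisson sampling on $[0,1]$ the map (226) is essentially the identity, so the raw times can be used directly): the full Poisson sample gives a value near 2, as in Example 11, while keeping every $M$'th observation gives a value near $(M+1)/M$.

```python
import numpy as np

def aqvt_end(times, T=1.0):
    """Empirical AQVT at T: (n/T) * sum of squared spacings,
    n = number of spacings.  Illustration only."""
    dt = np.diff(times)
    return len(dt) / T * np.sum(dt ** 2)

rng = np.random.default_rng(11)
n, M = 200_000, 5
# Poisson sampling, conditioned on n points in [0, 1]
t = np.concatenate(([0.0], np.sort(rng.uniform(0.0, 1.0, n))))

print(aqvt_end(t))        # near 2: H(t) = 2t for Poisson sampling
print(aqvt_end(t[::M]))   # near (M+1)/M = 1.2: equation (229)
```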
7.3 Proof of Theorem 11

We begin by describing the relationship between $R_n$ and $P^*_n$. In analogy with Proposition 2 of Mykland and Zhang (2007), we obtain:

Lemma 2.

$$\log\frac{dR_n}{dP^*_n}\big( U_{t_0}, \ldots, U_{t_{n,j}}, \ldots, U_{t_{n,n}} \big) = \sum_i \sum_{\tau_{i-1} \le t_j} \cdots$$