Fundamental and technical analysis of financial markets
PhD Thesis
Balázs Torma
Supervisor: László Gerencsér, DSc
Faculty of Informatics,
Eötvös Loránd University
(ELTE IK)
Doctoral School in Informatics,
Foundations and Methods in Informatics PhD Program,
Chairman: Prof. János Demetrovics, Member of MTA
Computer and Automation Research Institute,
Hungarian Academy of Sciences
(MTA SZTAKI)
Budapest, Hungary, 2010
Declaration
Herewith I confirm that all of the research described in this dissertation is my own original
work, expressed in my own words. Any use made within it of the works of other authors in
any form, e.g., ideas, figures, text, tables, is properly indicated through the application of
citations and references. I also declare that no part of the dissertation has been submitted
for any other degree, either from the Eötvös Loránd University or another institution.
Balázs Torma
Budapest, May 2010
Contents
1 Introduction 1
2 Change detection in fundamental models using technical models 4
2.1 Context and motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 The multi-agent stock market model . . . . . . . . . . . . . . . . . . . . . 8
2.3 Model validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Fitting a GARCH model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.5 The GARCH-based change detection algorithm . . . . . . . . . . . . . . . 23
2.6 Numerical performance evaluation . . . . . . . . . . . . . . . . . . . . . . . 27
2.7 Interpretation of an alarm . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.8 Empirical findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3 Efficient off-line model identification 38
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 The Descent Direction Method with Cutting Planes (DDMCP) . . . . . . . 43
3.3 Theoretical aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3.1 Convergence of descent methods . . . . . . . . . . . . . . . . . . . . 47
3.3.2 Remark on the convergence of DDMCP . . . . . . . . . . . . . . . . 49
3.4 Numerical performance evaluation . . . . . . . . . . . . . . . . . . . . . . . 50
3.4.1 The convex case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.4.2 A nonconvex case: nonlinear least squares . . . . . . . . . . . . . . 53
3.4.3 Maximum likelihood estimation of GARCH parameters . . . . . . . 55
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4 Modelling information arrival processes 66
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.1.1 Context and motivation . . . . . . . . . . . . . . . . . . . . . . . . 66
4.1.2 Point processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.1.3 Hawkes processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.2 Simulation of Hawkes processes . . . . . . . . . . . . . . . . . . . . . . . . 71
4.3 Identification of Hawkes models . . . . . . . . . . . . . . . . . . . . . . . 73
4.4 The Fisher information matrix . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5 Regression analysis based on high-frequency data 92
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.2 The EM-method for estimating θ∗ . . . . . . . . . . . . . . . . . . . . . . 94
5.3 A randomized EM-method . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.4 A real-time recursive randomized EM-method . . . . . . . . . . . . . . . . 101
5.5 Estimating the variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.6 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
A Transaction requests 109
B The artificial stock market simulator 111
Short summary 113
Rövid összefoglaló (in Hungarian) 114
Bibliography 115
Acknowledgment
First of all, I would like to express my thanks to my supervisor, Prof. László Gerencsér,
for introducing me to the culture of mathematics and for his constant trust and guidance
during my studies.
I am very thankful to Prof. Ferenc Friedler for generously helping me to switch back
from industry into academia. In general, I feel deeply indebted to the whole Hungarian
academic community for the inspiring, open and supportive educational and research environment.
In particular, I am grateful to my host institute, the Computer and Automation
Research Institute (MTA SZTAKI), and I thank my colleagues in the Stochastic Systems
Research Group, namely, Dr. Vilmos Prokaj and Dr. Miklós Rásonyi, for several fruitful and
motivating discussions and numerous helpful comments on my work. I am very grateful to
Dr. Boglárka G.-Tóth and Dr. Vilmos Prokaj for their collaboration in research.
I am also grateful for the PhD scholarship that I received for one year from the Faculty
of Informatics (IK) of the Eötvös Loránd University (ELTE) and for the financial support
of the Di?lton Arc Kft., led by Sándor Manno. I greatly appreciate the computational
resources provided by the Data Mining and Web Search Group at MTA SZTAKI, headed
by Dr. András Benczúr.
Last but not least, I am very thankful for the continuous and vital support and care of
my wife, Emese Ágnes Szűcs.
Chapter 1
Introduction
"Everything should be made as simple as possible, but not simpler."
Albert Einstein
Understanding the working mechanisms of financial markets is crucial to ensure the
stability of domestic and global economies. Market participants vary highly with respect
to their financial motivation and capital endowment, and their trading activity is often subject
to different regulation. Agent-based computational finance (ACF) models provide a means
to account for this heterogeneity, inasmuch as they incorporate detailed descriptions of agent
behaviors and economic processes.
An important application of ACF models is to investigate unobservable economic
processes based on observed data. The difficulties in applying ACF models for inference about
economic variables are due to their complexity. In order to make inference feasible,
analytically tractable agent-based models have been developed recently. The price to pay for
tractability is that these models are very abstract, include highly aggregated variables or
impose unrealistic conditions; in short, they neglect important characteristics of markets.
Consequently, the economic interpretability of empirical results that we get by fitting these
simple models to data is limited.
In this thesis we propose a detailed ACF model and develop statistical algorithms
based on analytically tractable technical (econometric) models. Furthermore, we apply
technical models for analysing some of the key economic factors entering the ACF model.
The main mathematical tool of the statistical analysis is model identification by Maximum
Likelihood Estimation. A novel local optimization algorithm is developed for efficient
off-line identification. We also develop real-time recursive identification algorithms and
change-point detection algorithms for some important technical models.
The thesis is organized as follows. The objective of Chapter 2 is to demonstrate the
application of technical models of stock prices to detecting abrupt changes in the dynamics
of real price processes. Moreover, we examine what latent economic factors entering
financial markets can trigger the change. To elaborate this approach we first propose a
fundamental model that extends well-established concepts of agent-based computational
finance with a novel element for information arrival processes. Announcements and news
about a company trigger analysts to pay more attention to that company and to change its
valuation. This, in turn, results in more analysts' announcements about the given company.
We model these feedback effects of market news by a discrete time Hawkes process.
We show by numerical experiments that, in contrast to classical ACF models, stylized facts
emerge even with constant market fractions of chartists and fundamentalists, as a consequence
of chartist trading. We further validate the ACF model using technical models.
In particular, a qualitative relationship between the market structure and the best-fitting
GARCH(1,1) (Generalized Autoregressive Conditional Heteroscedasticity) model is established, which
motivates us to apply GARCH(1,1) for indirect statistical inference on the market structure.
The use of GARCH models for detecting changes in the market structure is justified by a
well-known principle stating that change detection is feasible using misspecified models,
see the discussion in Section 2.1. A real-time change detection method for GARCH processes
is presented based on the MDL (Minimum Description Length) approach to modelling.
Given the above mentioned relationship between the GARCH(1,1) and ACF models,
we provide an economic interpretation of the GARCH-based change alarms. The performance
of the proposed algorithm is evaluated on simulated data. Change-alarms based on
real data are also reported.
In order to examine the relationship between the market structure and GARCH(1,1)
models, we fit a GARCH model to the time series generated by the ACF model by quasi
Maximum Likelihood Estimation, first in an off-line manner. We get a non-linear optimization
problem, which is well known to be ill-conditioned in the GARCH case. Motivated
by this problem, in Chapter 3 we develop a new hybrid optimization method which combines
the advantages of descent methods and cutting plane approaches. The new method
reaches a near-optimal region quickly by using cutting planes and preserves the good convergence
properties of descent methods near the optimum. The method is tested on convex
functions, least squares problems and on GARCH parameter estimation by comparing
its performance to well-known methods. Numerical experiments show that the proposed
method is very efficient on all the examined problem types and performs on average much
better than the benchmark methods. We also present a technique with which the dimension
of the GARCH fitting problem can be reduced. Overall, we get a GARCH estimation
algorithm that is an order of magnitude faster than the related MATLAB procedure.
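As a concrete illustration of the off-line quasi maximum likelihood fit discussed above, the following sketch estimates GARCH(1,1) parameters with a generic quasi-Newton optimizer. It is not the DDMCP method of Chapter 3; the starting values, bounds and the use of SciPy's L-BFGS-B routine are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def garch11_neg_loglik(params, r):
    """Gaussian quasi log-likelihood for sigma2_t = omega + alpha*r_{t-1}^2 + beta*sigma2_{t-1}."""
    omega, alpha, beta = params
    sigma2 = np.empty_like(r)
    sigma2[0] = np.var(r)                      # initialize with the sample variance
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return 0.5 * np.sum(np.log(sigma2) + r ** 2 / sigma2)

def fit_garch11(r):
    """Quasi-ML estimate under positivity bounds on (omega, alpha, beta)."""
    res = minimize(garch11_neg_loglik, x0=[0.1 * np.var(r), 0.1, 0.8], args=(r,),
                   method="L-BFGS-B",
                   bounds=[(1e-8, None), (1e-8, 0.999), (1e-8, 0.999)])
    return res.x

# simulate a GARCH(1,1) path and refit its parameters
rng = np.random.default_rng(0)
omega, alpha, beta = 0.05, 0.1, 0.85
n = 5000
s2 = omega / (1 - alpha - beta)                # stationary variance as starting state
r = np.empty(n)
for t in range(n):
    r[t] = np.sqrt(s2) * rng.standard_normal()
    s2 = omega + alpha * r[t] ** 2 + beta * s2
est = fit_garch11(r)
```

The recursion inside the likelihood is what makes the problem non-linear in the parameters and, as noted above, ill-conditioned.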
We can model self-exciting effects arising on financial markets with Hawkes processes.
A Hawkes process is a point process whose intensity is defined via a feedback mechanism
whose input is the past of the point process itself. In the standard Hawkes process,
when an event occurs, e.g. some news appears on the market, the intensity increases
by a constant amount. After the event the intensity reverts to a minimum value
exponentially fast. Modelling news arrival processes can enhance the prediction of price
volatility, since, as shown by several empirical studies, volatility is positively correlated
with the intensity of news arrival. Besides modelling information arrival processes, Hawkes
processes can be applied to capture the self-exciting property of credit default processes:
the insolvency of a given company can increase the insolvency probability of another company.
In Chapter 4 we propose Hawkes processes in which the feedback path is defined by a finite
dimensional linear system. This model class allows for representing multiple event sources,
e.g. several analysts. We propose algorithms for the simulation and real-time estimation
of this type of Hawkes process. Moreover, we investigate the Fisher information matrix
of the estimation problem numerically and analytically. In particular, we show that in the
case of one-dimensional feedback some parts of the diagonal of the asymptotic Fisher
information matrix go to infinity, while other parts of the diagonal go to zero, as the parameters
approach the boundary of the stability domain. As a first step we calculate the limit
distribution of the appropriately rescaled intensity process.
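The standard exponential-kernel Hawkes process described above can be simulated by Ogata's thinning algorithm, sketched below. This covers only the standard one-dimensional case, not the finite dimensional linear feedback proposed in Chapter 4, and the parameter names mu (baseline intensity), a (jump size) and b (decay rate) are illustrative.

```python
import numpy as np

def simulate_hawkes(mu, a, b, T, rng):
    """Ogata thinning for intensity lambda(t) = mu + a * sum_{t_i < t} exp(-b (t - t_i))."""
    events = []
    t, excess = 0.0, 0.0           # excess = lambda(t) - mu just after time t
    while True:
        lam_bar = mu + excess      # valid upper bound: the intensity decays between events
        w = rng.exponential(1.0 / lam_bar)
        t += w
        if t >= T:
            break
        excess *= np.exp(-b * w)   # decay the excitation over the waiting time
        if rng.uniform() * lam_bar <= mu + excess:
            events.append(t)       # candidate accepted: an event occurs
            excess += a            # the intensity jumps by a constant amount
    return np.array(events)

# a stable parameterization (a/b < 1) on the horizon [0, 200]
events = simulate_hawkes(0.5, 0.8, 1.2, 200.0, np.random.default_rng(2))
```

Between events the candidate rate lam_bar dominates the true intensity, which is exactly what makes the rejection step valid.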
Since financial prices are usually quantized data, we may get spurious results from
estimation methods neglecting quantization effects. In Chapter 5 we demonstrate this
phenomenon on a toy example, namely, on linear regression. We develop a real-time
method for estimating the parameters of a linear regression from quantized observations,
when the regressor is finite-valued. The algorithm applies Expectation Maximization (EM)
and Markov Chain Monte Carlo techniques. An example application is to regress high-frequency
price data on order-book-level supply and demand quantities.
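A minimal sketch of the phenomenon: ordinary least squares applied to coarsely quantized observations of a linear regression with a finite-valued regressor can be badly attenuated. The setup below (binary regressor, unit observation grid) is an illustrative assumption, not the EM/MCMC estimator developed in Chapter 5.

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta = 20000, 0.3
x = rng.integers(0, 2, size=n).astype(float)    # finite-valued regressor
y = theta * x + 0.1 * rng.standard_normal(n)    # latent (unquantized) response
y_q = np.round(y)                               # observations snapped to a unit grid

def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

slope_raw = ols_slope(x, y)    # close to theta = 0.3
slope_q = ols_slope(x, y_q)    # severely biased toward zero: quantization is ignored
```

Because the regression signal is small relative to the grid step, almost every quantized observation collapses to 0 and the naive slope estimate nearly vanishes.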
Chapter 2
Change detection in fundamental models using technical models
2.1 Context and motivation
There are basically two competing approaches in modelling stock prices: technical modelling
and fundamental modelling. Technical or econometric models capture statistical phenomena
observed directly on stock prices, while fundamental models consist of mathematical
descriptions of the behaviors of economic agents affecting the stock price. Fundamental models
provide a more detailed description of the price generating mechanism than technical models.
On the other hand, technical models are mathematically more tractable. An excellent
survey of technical models is given in Taylor (2007), while a similarly outstanding survey of
fundamental models is provided in LeBaron (2000); Chen et al. (2009). These works coin
the corresponding terminologies as financial econometrics and agent-based computational
finance (ACF), respectively.
A drawback of complex fundamental models is that they are not tractable by standard
mathematical techniques developed in the literature for system identification, see Ljung
and Söderström (1983); Benveniste et al. (1990); Spall (2003). The reason for this is that
the exogenous noise driving the price process cannot be recovered even if the system
dynamics were known. This situation is analogous to linear stochastic systems with a
large number of random sources. In the latter case Kalman filtering leads to an innovation
form that is already invertible. In the case of an ACF model we would require non-linear
filtering, leading to an infinite dimensional problem, which is beyond the scope of Ljung and
Söderström (1983); Benveniste et al. (1990); Spall (2003). In short, the identification of
individual key economic factors entering the model is not possible by these standard tools,
and we are unaware of any alternative approach of similar sophistication and generality.
See also the discussion in Winker et al. (2007) on this.
In spite of analytical intractability, statistical inference on complex fundamental models
is possible within a limited scope. A basic problem that is tractable in spite of gross
uncertainties is the detection of changes in the market dynamics. In fact, change detection
methods are known to be robust against modelling errors; in other words, detecting
changes is possible using mis-specified models, see e.g. Basseville and Nikiforov (1993);
Campillo et al. (2000). Obviously, the quality of the detection method depends on the
quality of the model class used in modelling the data. A natural model class to be used
for detecting changes in an ACF model, or in a real price process, is a class of econometric
models.
In this chapter we demonstrate the feasibility of the above program. The fundamental
model to be used is a significant modification of the fundamental models proposed in
the pioneering works of Kirman (1995), Lux (1998) and Brock and Hommes (1998), see
Hommes (2006) for a survey including these models. In these ACF models the market is
composed of two types of agents: chartists and fundamentalists. Chartists predict future
stock prices by extrapolating the current trend, while the belief of fundamentalists about future
stock prices is affected by the information they receive on the company at hand. Assuming
a fixed behavior pattern for individual agents of a given type, modulo random choices,
the market structure is then defined as the distribution of relative wealth among the two
groups of agents. In Kirman (1995) and Lux (1998) the market fractions of chartists and
fundamentalists evolve endogenously, driven by the past profitability of these two strategies.
Thus, these models suggest that future expectations depend only on profits realized in
the past. However, a reduction of trading on fundamentals is possible due to decreased
predictability, triggered by the burst of an economic bubble, say. Or, increased trading
on fundamentals can be triggered by an announcement about a company acquisition, say.
Hence in our model we allow future expectations to depend on exogenous effects as well
and make the market fractions of chartists and fundamentalists a model parameter.
In general, exogenous uncertainty of fundamentals is caused by market news arriving at
random times. Since the reaction to news generates further news, we model the information
arrival process as a self-exciting point process, more specifically, a discrete time version of
the well-known Hawkes process, see Hawkes (1971a).
We have verified by extensive numerical experiments that a number of stylized facts
known of real price processes, such as volatility clustering and fat tails of return distributions,
are reproduced by our model. In the models of Kirman (1995) and Lux (1998),
volatility clustering and fat tails of return distributions are attributed to market fractions
varying in time, as in periods dominated by chartists the volatility of returns is higher
than in periods when the ratio of fundamentalists is high. In contrast, in our model volatility
clustering and fat tails emerge even with constant market fractions, as a consequence of
chartist trading and the dynamical properties of the news arrival process.
For the statistical analysis of our ACF model we use the most widely known technical
model, the class of Generalized Autoregressive Conditional Heteroscedasticity, or GARCH,
models developed in Engle (1982); Bollerslev (1986). Popular GARCH variants are the
Markov switching GARCH (MS-GARCH), see Bauwens et al. (2007), and the threshold GARCH
(TGARCH), see Zakoian (1994), among others. In our analysis we fit a GARCH(1,1)
model off-line using quasi maximum likelihood to a data set of 10000 observations generated by the
ACF model. We establish a monotonic relationship between the market fraction and the
GARCH coefficients.
An earlier similar attempt at explaining GARCH parameters is due to Diks and van
der Weide. They develop an agent-based model in which an asynchronous update of
agents' beliefs yields a GARCH dynamics of returns, see Diks and van der Weide (2005).
This so-called Continuous Belief System (CBS) is analytically tractable but still very
abstract, see the discussion in Section 4 of Diks and van der Weide (2005). An example of the
major simplifications, one not discussed in the paper, is that there is no distinction
between chartists and fundamentalists. We follow a different route, inasmuch as we include
much more of the details known about the dynamics of stock markets in our model, and use
a technical model for its analysis.
Due to the established relationship between the market fraction and the GARCH coefficients,
GARCH models become a natural candidate for change detection in the market
structure. In view of real-time needs we present a real-time change detection method for
GARCH processes to detect abrupt changes of their dynamics, along the lines of Gerencsér
and Baikovicius (1991, 1992), using a Minimum Description Length, or MDL, approach,
see Rissanen (1989). This requires a reliable implementation and testing of a recently
developed recursive quasi-maximum-likelihood method, see Gerencsér et al. (2010). See
also Aknouche and Guerbyenne (2006) for a recursive least squares based identification
method for GARCH models. See Berkes et al. (2004) for a change detection algorithm
using quasi-likelihood scores.
In order to outline the feasibility of direct statistical inference using analytically tractable
agent-based models, we review some of the attempts in recent works addressing the identification
of some models of this class, along with describing the major model assumptions. Common
to these models is that they implement a low degree of agent heterogeneity (as opposed
to our ACF model): heterogeneity among agents is reflected only by two trader types;
the concept of agents with different parameterizations within a given type is not included. An
additional simplification is that the various agent types are characterized by a common
risk aversion factor. All these models rely on the concepts presented in the seminal paper
of Brock and Hommes, see Brock and Hommes (1998), except for the model in Alfarano
et al. (2005). In Alfarano et al. (2005), a simplified version of the Kirman model is developed
and a closed form solution for the distribution of returns is derived. The estimated
parameters in the model capture the tendency of traders to switch between the two groups,
fundamentalists and noise traders (noise traders are agents whose strategies do not have
any systematic component). In Amilon (2008) the model of De Grauwe and Grimaldi
(2004) is fitted to data and, among others, the dynamics of the market fractions of chartists
and fundamentalists is estimated. Amilon reports that the fit is generally quite poor. In
addition, the estimated market fraction in Figure 3(b) in Amilon (2008) indicates that for
long periods of time, for approximately a year, only chartists are active on the market, a
finding which is at least questionable. The models presented in the following two papers,
see Boswijk et al. (2007), Kozhan and Salmon (2009), impose the additional condition that
agents have homogeneous expectations about the conditional variance of future returns.
Both models also assume that the fundamental value is known to the traders; thus, it is
approximated via some observable economic variables (proxies) in the estimation procedure.
Boswijk et al. (2007) estimate the mean reversion factor of fundamentalists, the
trend extrapolation factor of chartists and the market fractions of the two agent types. They
found a significant presence of both agent types on the S&P500 stock market. The central
question in Kozhan and Salmon (2009) is whether or not agents in the foreign exchange
market are uncertainty averse. Chartists are found to be uncertainty averse but fundamentalists
are found to be uncertainty neutral, a result which contradicts several empirical
studies on uncertainty aversion, see e.g. Mangelsdorff and Weber (1994); Wakker (2001).
An interesting result in Kozhan and Salmon (2009) is the statistical evidence showing that
the model is capable of predicting the sign of one-step-ahead returns.
The chapter is organized as follows. First the key concepts used in our ACF model are
described. The model is validated by showing that it reproduces a number of important
stylized facts exhibited by real price processes. In Section 2.4 we present experimental
results establishing a qualitative relationship between the market structure and the parameters
of the GARCH model that best fits the data. In Section 2.5 we describe a real-time
change-point detection algorithm for GARCH models; its performance is evaluated in Section 2.6.
In Section 2.7 we discuss a possible economic interpretation of the change-alarms,
and comment on the results of the change detection algorithm when applied to real market
data in Section 2.8. We conclude with a brief summary of results.
2.2 The multi-agent stock market model
In our ACF model agents are assumed to trade the stocks of only one company. In each
trading period t, say a day or an hour, each agent applies his belief p̂_t to determine his transaction
request (d_t, b_t), the requested quantity and the corresponding limit price, where d_t < 0
if the agent wants to sell. Then, the stock market exchange matches the transaction
requests and determines the new stock price p_t. We apply a pricing algorithm that is
commonly used in the closing trading session at real exchanges, such as the NASDAQ, see
www.nasdaqtrader.com: for a given price p calculate the total number of transactions that
would take place at this price, and choose the price at which trading volume is maximized,
see Gerencsér and Mátyás (2007b) for a formal description of the algorithm.
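The volume-maximizing pricing rule described above can be sketched as follows; the order representation is an illustrative assumption, and ties between candidate prices are broken by taking the lowest one.

```python
def clearing_price(orders):
    """orders: list of (quantity d, limit price b); d > 0 is a buy, d < 0 a sell.
    A buy with limit b executes at price p if b >= p; a sell executes if b <= p.
    Return the candidate price at which the executable volume is maximal."""
    prices = sorted({b for _, b in orders})
    best_p, best_vol = None, -1
    for p in prices:
        buy = sum(d for d, b in orders if d > 0 and b >= p)    # demand executable at p
        sell = sum(-d for d, b in orders if d < 0 and b <= p)  # supply executable at p
        vol = min(buy, sell)                                   # volume actually matched
        if vol > best_vol:
            best_p, best_vol = p, vol
    return best_p, best_vol

# two buyers and two sellers; the crossing price 10.0 matches three shares
orders = [(2, 10.1), (1, 10.0), (-1, 9.9), (-2, 10.0)]
```

Only the agents' limit prices need to be scanned, since the executable volume is piecewise constant between them.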
In any period t, agents develop their belief about next period's market price. The
belief p̂^i_t of agent i can be interpreted as a one step ahead predictor. E.g., p̂_t could be
obtained by using a regression line fitted to past price data. In the following we outline
the applied order placing rules, see Appendix A for details. Having p̂_t, the agent proceeds
with calculating the actual orders (d_t, b_t) in a way that his wealth increases under the
assumption of exact prediction (p_{t+1} = p̂_t). It is easy to see that this can be achieved by
placing a limit buy order one cent below p̂ and placing simultaneously a limit sell order one
cent above p̂. Then, if p̂ > p, only the buy order executes, and if p̂ < p, only the sell order
executes. Thus, the transaction request of agent i trading one share consists of the orders

(d, b)^{i,1}_t = ( 1, p̂^i_t − 0.01 ),
(d, b)^{i,2}_t = ( −1, p̂^i_t + 0.01 ).

Numerical simulations have shown that the applied ordering rules eventually result in a
market price which can be well approximated by the average of the predictions:

p_t ≈ (1/N) Σ_{i=1}^N p̂^i_t .
This finding is in line with considerations in Diks and van der Weide (2005), see eq. (5) in
that paper.
A schematic view of the agent model is depicted in Figure 2.1. The superscript C stands
for chartists, F for fundamentalists and E for the environment, see further notations below.
Next we describe the agent types and their predictor models.

[Figure: block diagram of the market model — the exchange receives the transaction requests (d, b)^C of the chartists and (d, b)^F of the fundamentalists and publishes the price p; the environment supplies the news process n and feeds fitness and opinion signals back to the agents.]
Figure 2.1: The market model
Chartists. Chartists or technical traders determine their belief based on past market
prices, without incorporating any economic information about the company. According
to the survey of practitioners' trading strategies in Kaufman (1998), we can identify two
behavior groups. Trend followers (or momentum traders) believe that market prices will
continue to move in the direction of the current price dynamics. Traders typically detect
the trend by comparing long-term and short-term average market prices, see Chiarella
et al. (2006). As opposed to trend following, the mean reverting (or contrarian) behavior
is based on the opinion that prices oscillate around some long-term average price.
Let us now formalize the behavior model of chartists, which leans on the concepts
summarized in Chen et al. (2009). The number of chartists is denoted by N_C. Let [E_q(x)]_t
denote the exponential averaging operator defined as

[E_q(x)]_0 = x_0 ,                                        (2.1)
[E_q(x)]_t = (1 − q) [E_q(x)]_{t−1} + q x_t ,  t > 0      (2.2)

for the process x_t ∈ R with forgetting factor 0 < q < 1.
Define the short-term and long-term average prices calculated by chartist i with forgetting
factors 0 < λ_i < 0.5 and 2λ_i as

p̄^i_t = [E_{λ_i}(p)]_t ,    p̿^i_t = [E_{2λ_i}(p)]_t .

Each chartist uses a different λ to filter p_t; thus, we say that λ is an agent specific parameter.
Chartist i defines the trend of the price process as

r̄^i_t = log( p̄^i_t / p̿^i_t ) .    (2.3)

The processes p̄_t and p̿_t track p_t with a delay; the tracking lag is controlled by the forgetting
factor. If λ is small, p̄_t and p̿_t track p_t slowly, in which case the speed of change of the
dynamics of r̄_t is small.
Based on the heuristics above, trend followers, denoted by superscript T, and mean
reverters, denoted by superscript M, determine their beliefs by extrapolating the past trend as

p̂^{T,i}_t = p_{t−1} exp( γ_i r̄^i_{t−1} ) ,     (2.4)
p̂^{M,i}_t = p_{t−1} exp( −γ_i r̄^i_{t−1} ) ,    (2.5)

with the agent specific parameter γ_i > 0 expressing the degree of trend extrapolation.
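A minimal sketch of the chartist predictor defined by (2.1)-(2.5): exponential averages with forgetting factors lam and 2*lam, the log-ratio trend, and the trend follower and mean reverter beliefs. The names lam and gamma follow the symbols used in this section's reconstruction and are assumptions.

```python
import math

def exp_avg(xs, q):
    """[E_q(x)]_t from (2.1)-(2.2): e_0 = x_0, e_t = (1-q) e_{t-1} + q x_t."""
    e = xs[0]
    out = [e]
    for x in xs[1:]:
        e = (1 - q) * e + q * x
        out.append(e)
    return out

def chartist_beliefs(prices, lam, gamma):
    """Return the trend follower belief (2.4) and mean reverter belief (2.5)
    built from the trend r = log(avg with factor lam / avg with factor 2*lam)."""
    bar_p = exp_avg(prices, lam)          # average with forgetting factor lam
    barbar_p = exp_avg(prices, 2 * lam)   # average with twice the forgetting factor
    r = math.log(bar_p[-1] / barbar_p[-1])
    p_prev = prices[-1]
    return p_prev * math.exp(gamma * r), p_prev * math.exp(-gamma * r)
```

By construction the two beliefs are symmetric around the last price: their product equals p_{t-1} squared, since exp(γr) and exp(−γr) cancel.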
Adaptive beliefs. According to fundamental findings of behavioral finance, real traders
often do not feel confident with their currently executed trading strategy; they usually
switch strategies from time to time, driven by behavioral uncertainty. A strategy switch can
be triggered by some information they receive from the media or other market participants.
There are in fact ranking web sites, e.g. www.collective2.com, where traders publish some
basic information about their strategies. They see, for example, the profitability of each
other's strategy and also some general information about its working mechanism. Thus,
we assume that chartists can collect information about the average profitability of trend
following and mean reverting, denoted by v̄^T_t and v̄^M_t, respectively. The aggregated fitness
measures are determined in the environment based on the individual profits v^T_t and v^M_t
chartists achieve with either behavior type:

v̄^T_t = (1/N_C) Σ_{i=1}^{N_C} v^{T,i}_t ,    v̄^M_t = (1/N_C) Σ_{i=1}^{N_C} v^{M,i}_t .    (2.6)

See Appendix A for a detailed description of the calculation of the individual profits.
Let s^i_t indicate whether a given chartist i acts as a trend follower (s^i_t = 1) or as a mean
reverter (s^i_t = −1). A chartist chooses between the two behaviors following the concepts
of the well-known binary choice model introduced by Brock and Hommes (1998), using v̄^T
and v̄^M as fitness measures and β_i > 0 as the intensity of choice parameter, which reflects how
influential the overall profitability is for the behavior of the given agent:

s^i_t = 1    with probability  e^{β_i v̄^T_t} / ( e^{β_i v̄^T_t} + e^{β_i v̄^M_t} ) ,
s^i_t = −1   with probability  e^{β_i v̄^M_t} / ( e^{β_i v̄^T_t} + e^{β_i v̄^M_t} ) .    (2.7)

(Note that in contrast to our model, in Brock and Hommes (1998) the choice is made
between fundamentalists and chartists.) Thus, a feedback system arises in which a
chartist agent applies a given behavior type with higher probability if it has been more
profitable in the recent past for the whole chartist community on average. See Cesa-Bianchi
and Lugosi (2006) for a comprehensive treatise of learning using exponentially weighted
averages.
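The binary choice rule (2.7) can be sketched as a numerically stable logit sampler; the function and parameter names below are illustrative.

```python
import math
import random

def choose_behavior(v_T, v_M, beta, rng=random):
    """Sample s from (2.7): +1 (trend follower) with probability proportional
    to exp(beta * v_T), -1 (mean reverter) proportional to exp(beta * v_M)."""
    # subtract the maximum before exponentiating to avoid overflow for large beta
    m = max(beta * v_T, beta * v_M)
    w_T = math.exp(beta * v_T - m)
    w_M = math.exp(beta * v_M - m)
    p_T = w_T / (w_T + w_M)
    return 1 if rng.random() < p_T else -1
```

With a large intensity of choice the more profitable behavior is chosen almost surely, while beta near zero makes the choice close to a fair coin flip.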
Using s^i_t we can define the belief of a given chartist from (2.4)-(2.5) as

p̂^{C,i}_t = p_{t−1} exp( s^i_t γ_i r̄^i_{t−1} ) .    (2.8)
Fundamentalists. Fundamentalists develop their belief based on company information;
each day they acquire n_t ∈ N pieces of public news. The same news is available to all
fundamentalist agents. Fundamentalists reinterpret and reevaluate the news received so
far each day, in a way that less weight is put on news received earlier in the past. We
denote the filtered, normalized n_t by

n̄^λ_t = [E_λ(n)]_t / E(n_t) ,    (2.9)

where 0 < λ < 1 is the factor of the exponential forgetting and E(n_t) denotes the expected
value of n_t. Note that n_t can be observed on real markets, see e.g. finance.yahoo.com for
the financial headlines related to a given firm.
The news evaluation results in the opinion o_t, expressing whether the company is in
a better (o_t > 0) or worse (o_t < 0) shape than the market price p_{t−1} indicates, from the
agent's point of view. The opinion of a given fundamentalist agent has a common and
an individual component. The common component reflects some aggregated view of the
market environment. The private component is agent specific and independent from the
12 CHAPTER 2. CHANGE DETECTION IN FUNDAMENTAL MODELS
common opinion and the private opinions of other fundamentalists. Our opinion model
captures the economic assumption that the variability of investor opinion is positively
correlated to the number of news appearing on the market, see Mitchell and Mulherin
(1994), Kalev et al. (2004) for empirical evidence. Thus, the common opinion can be
formulated as
$$
o^E_t = \mu_t + \sigma \, \bar n^{(\delta^E)}_t \varepsilon^E_t, \qquad \text{with } \varepsilon^E_t \sim N(0,1) \text{ i.i.d.,} \tag{2.10}
$$
$$
\mu_t = -0.0001 \log\left( \frac{p_{t-1}}{p_0} \right), \tag{2.11}
$$
where µ
t
expectation of news takes care of mean reversion to the constant fundamental
value p
0
. The private opinion o
F,i
for fundamentalist i is calculated similarly with an agent
speci?c forgetting factor ?
F,i
and i. i. d. noise ?
F,i
t
, which is independent from ?
E
as well.
For simplicity, we de?ne the unconditional variance of the common and private opinion
denoted by ?
2
to be equal. We can interpret ?
2
as a measure of the uncertainty about
fundamentals, expressing the variability of future economic outcomes.
We can finally specify the fundamentalist belief model as

$$
\hat p^{F,i}_t = p_{t-1} \exp\left( \frac{o^{F,i}_t + o^E_t}{2} \right). \tag{2.12}
$$

Here the private and public opinions are weighted equally for simplicity.
The information arrival process $n_t$. The news process $n_t$ evolves in the environment driven by analyst activity. Our novel model of $n_t$ captures herding among analysts, resulting in a self-exciting phenomenon, see e.g. Welch (2000): announcements and news about a company trigger analysts to follow that company, and thus increase its analyst coverage. This, in turn, results in further analyst announcements about the given company. Let us now refine and formalize this heuristic using the idea of Hawkes's continuous-time point processes, see Hawkes (1971a). Let $x > 0$ denote analyst coverage, a quantity determining the intensity of news generation. We assume that analyst coverage would erode from period to period by the factor $(1-a)$ with $0 < a < 1$ if no news appeared on the market. On the other hand, the random number of news $n_t$ increases next period's analyst coverage by the factor $b > 0$. Thus, our number-of-news model can be formulated as

$$
x_t = a x_{t-1} + b n_{t-1}, \tag{2.13}
$$
$$
n_t \sim \text{Poisson}(\mu_t), \tag{2.14}
$$
$$
\mu_t = m + x_t, \tag{2.15}
$$
where $m > 0$ is the minimum intensity. Figure 2.2 depicts a typical trajectory of $n_t$ and $x_t$; we can see some clustering of volatility on the graph of $n_t$.

Figure 2.2: A typical trajectory of the news process $n_t$ and analyst coverage $x_t$.
We can easily examine the stability properties of the discrete Hawkes process. Taking expectations in (2.13)-(2.15) yields

$$
E(x_{t+1}) = (a + b) E(x_t) + bm, \tag{2.16}
$$

from which we see that the process is stable if $a + b < 1$; the expected value explodes linearly if $a + b = 1$ and exponentially if $a + b > 1$. Under stationarity assumptions we can calculate $E(n_t)$ from (2.14)-(2.15) and (2.16):

$$
E(n_t) = m + \frac{mb}{1 - a - b}.
$$

In our simulations we initialize the state process as $x_0 = mb/(1 - a - b)$.
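The discrete Hawkes news process (2.13)-(2.15) can be simulated in a few lines. The sketch below (hypothetical function name, pure-Python Poisson sampling) initializes $x_0$ at its stationary mean as in the text:

```python
import math
import random

def simulate_news(a, b, m, T, seed=0):
    """Simulate the news process (2.13)-(2.15): coverage
    x_t = a*x_{t-1} + b*n_{t-1}, news n_t ~ Poisson(m + x_t).
    Requires a + b < 1; x_0 is set to the stationary mean mb/(1-a-b)."""
    rng = random.Random(seed)

    def poisson(lam):
        # Knuth's multiplication method; fine for the small intensities here
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    x = m * b / (1.0 - a - b)
    xs, ns = [x], [poisson(m + x)]
    for _ in range(T - 1):
        x = a * xs[-1] + b * ns[-1]
        xs.append(x)
        ns.append(poisson(m + x))
    return xs, ns
```

With the baseline values $a = 0.7$, $b = 0.2$, $m = 0.2$ of Table 2.1, the long-run average of $n_t$ should be close to $m + mb/(1-a-b) = 0.6$.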
Summary of design differences. The essential differences between our model and classical ACF models are that in our model (1) the behaviors exhibit a higher degree of heterogeneity; (2) the market fractions of chartists and fundamentalists are (exogenous) parameters; and (3) the news arrival process exhibits serial correlations. Let us stress at this point that, in accordance with the research goals of this chapter, we have not optimized the design of the ACF model for simplicity; rather, we included details that in our view represent essential characteristics of real markets. In the following, we examine the model by statistical means.
2.3 Model validation

We validate our ACF model by showing experimentally that it reproduces some important stylized facts one can observe in real prices. In addition, we analyse the impact of the variation of market fractions on simulated returns with respect to stylized facts. We define returns as

$$
r_t = \log\left( \frac{p_t}{p_{t-1}} \right).
$$
Chartist and fundamentalist market fractions are defined as

$$
w^C = \frac{N^C}{N} \quad \text{and} \quad w^F = \frac{N^F}{N}, \tag{2.17}
$$

respectively. All agents are assumed to have the same initial endowment, thus the relative weights can be seen as the initial distribution of wealth among chartists and fundamentalists. Put

$$
w = \frac{w^C}{w^F}.
$$
The main source of agent heterogeneity is the variety of parameters specifying agents' behavior. For simulation purposes, for each agent-specific parameter we specify an interval from which we draw random samples uniformly for agents of the given type. Table 2.1 shows the parameter values and intervals applied in the simulations.
                  Notation    Meaning                           Value
Chartists         $\gamma$    Price filtering                   [0.02, 0.03]
                  $\varphi$   Trend extrapolation               [0, 5]
                  $\kappa$    Intensity of choice               [1, 5]
Fundamentalists   $\delta^F$  Private news forgetting factor    [0.01, 0.03]
                  $\delta^E$  Common news forgetting factor     0.01
News              $a$         Coverage erosion                  0.7
                  $b$         Amplifier factor                  0.2
                  $m$         Min. intensity                    0.2
Economic factors  $w$         Agent type fractions              1
                  $p_0$       Fundamental value                 100
                  $\sigma^2$  Uncertainty about fundamentals    0.01

Table 2.1: Baseline parameter values.
The statistics related to the stylized facts are calculated on several simulated price trajectories of length $10^5$. In Table 2.2 we report results of hypothesis testing; each cell of the table shows the proportion of rejected null hypotheses. The names of the tests are indicated in the column headers. Tables 2.3-2.4 report the averages of some statistics together with their standard errors (in italics). The first four rows correspond to increasing market fraction settings, with the other parameters kept fixed at the baseline values; the reported values here are calculated from 20 price series. We apply the following perturbation scheme to perform a sensitivity check on various parameter settings. The fifth row, for $w \in (0.5, 1.5)$, contains results based on 100 return series, each generated with parameters perturbed uniformly randomly around the baseline values:
$$
\tilde q = q \left( 1 + \rho u \right), \qquad u \sim U[-1, 1], \tag{2.18}
$$

where $q$ is the baseline parameter and $\tilde q$ is the applied parameter, $\rho > 0$. We applied $\rho = 0.2$ for the trend extrapolation parameter $\varphi$, $\rho = 0.05$ for the parameters $a$ and $b$, and $\rho = 0.5$ for the rest of the parameters. In the case of intervals, this perturbation rule is applied to the endpoints of the interval.
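The perturbation rule (2.18) might be sketched as follows (hypothetical helper names):

```python
import random

def perturb(q, rho, rng=random):
    """Perturbed parameter as in (2.18): q~ = q*(1 + rho*u), u ~ U[-1, 1]."""
    return q * (1.0 + rho * rng.uniform(-1.0, 1.0))

def perturb_interval(lo, hi, rho, rng=random):
    """For interval-valued parameters the rule is applied to both endpoints."""
    return perturb(lo, rho, rng), perturb(hi, rho, rng)
```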
For comparison purposes, in the last row (NQ100), separated by a double line, we report statistics based on 1-minute intraday returns of the NASDAQ 100 index, preprocessed analogously as described in Section 2.8. The statistics are calculated on 10 different return series, each spanning a week, from the period 6th December 2009 until 25th February 2010.

Note that all ACF model parameters are kept fixed over time in the simulations. This is an admissible simplification considering that agent behaviors and the economic environment are stable in the short term, a week or a month, say, as major regular economic and cash flow related reports are announced quarterly. Prices spanning even such short time frames exhibit the essential stylized facts discussed below; see the statistical analysis of intraday data in Cont (2001); Taylor (2007). Hence it is meaningful to compare simulated prices to real prices with respect to stylized facts, even without more sophisticated models for the dynamics of $p_0$, $\sigma^2$, $w$. See Alfarano (2006) for an excellent summary of the statistics we apply below for the comparison.
Unit root. The column ADF in Table 2.2 shows that the augmented Dickey-Fuller test typically cannot reject the null hypothesis of a unit root in $\log(p_t)$ at the 5% significance level. In the column AR(1) of Table 2.3 we see that the root of a fitted zero-mean AR(1) model is very close to unity, a property that real data exhibit as well.
Autocorrelations in returns. Column Q(1) in Table 2.2 reports results of the Box-Ljung test for checking the null hypothesis of absence of autocorrelation at lag 1, which typically cannot be rejected at the 5% significance level. Note that a well-known property of the Box-Ljung test is that it rejects the null hypothesis for heavy-tailed distributions more often than the significance level reflects; the type I error accumulates with higher lags. In Table 2.3, $\hat R^{(r)}(l)$ denotes the sample autocorrelation of returns at lag $l$; we calculated very small values. Empirical studies report both statistically insignificant and significant lag-1 sample autocorrelation in returns; the magnitude in the significant cases is approximately 0.1 for daily data and -0.1 for intraday data.

w          ADF    Q(1)   ARCH(2)  ARCH(20)  GPH(r)  GPH(r²)
0          0.05   0.00   0.85     1.00      0.05    1.00
0.5        0.05   0.05   1.00     1.00      0.05    1.00
1          0.05   0.20   1.00     1.00      0.05    1.00
2          0.00   0.40   1.00     1.00      0.20    1.00
(0.5,1.5)  0.07   0.13   0.98     1.00      0.07    0.97
NQ100      0.10   0.20   1.00     1.00      0.00    1.00

Table 2.2: Proportion of rejected null hypotheses.
The unit root tests and the measured autocorrelations in returns give statistical evidence for the near random walk characteristics of simulated prices. This happens despite model components which introduce serial dependences in returns, such as mean reversion to constant fundamentals and trend extrapolation. This effect can be explained by the random model elements concealing the dependencies; e.g. the effect of trend extrapolation is mitigated by $s^C_t$ in (2.8), because it has an expectation close to zero and it is almost independent from past returns. The value of the intensity-of-choice parameter $\kappa$ has a major influence on the autocorrelations in returns, as it controls the weights of trend-follower and mean-reverter chartists. A high $\kappa$ can result in a shift of the weight towards trend followers in a long, highly trending period, who then increase the autocorrelation of returns as $E(s^C_t)$ increases.
Volatility clustering. We tested prices for volatility clustering using Engle's ARCH test with lags 2 and 20 and report the results in Table 2.2. The absence of ARCH effects has typically been rejected even at the 1% significance level, confirming volatility clustering. The results are in accordance with results of ARCH tests performed on real prices. We also report sample autocorrelations of squared returns for lags 1, 20 and 50, denoted by $\hat R^{(r^2)}$ in Table 2.3. We can observe that increasing the weight of chartists causes stronger volatility clustering effects. We give a brief heuristic analysis of this effect in the following. The effect of the trading of a single chartist $i$ is that he pushes $p_t$ towards his belief $\hat p^{C,i}_t$. The magnitude of the price shift caused by the chartist is positively correlated with

$$
\log\left( \frac{\hat p^{C,i}_t}{p_{t-1}} \right) = \varphi^i \bar r_{t-1},
$$

which we easily get from (2.8). Hence, the magnitude of the aggregate impact of the whole chartist community is positively correlated with $\bar r_{t-1}$. Thus, the effect of chartist trading can be interpreted as adjusting the variance of the market price process by an amount that is proportional to the absolute trend and controlled by the parameter $\varphi$. As the trend varies slowly in time due to the smoothing controlled by $\gamma$, the variance induced by chartists possesses a high inertia. This explains the slow decay of the sample autocorrelation function. Higher autocorrelations under higher chartist presence are a consequence of the stronger aggregate impact of chartists on volatility. The low values in the columns $\hat R^{(r^2)}$, row $w = 0$, of Table 2.3 reveal that autocorrelations in the filtered news process $\bar n_t$ do not increase the autocorrelation of squared returns significantly.
Long memory. We examined whether long-term dependence is exhibited by the returns and squared returns series using the method of Geweke and Porter-Hudak (GPH). We applied a frequency range between $10000^{0.1}$ and $10000^{0.8}$. We report results in the columns GPH$^{(r)}$ and GPH$^{(r^2)}$ for returns and squared returns, respectively. In Table 2.4, the estimated fractional differencing parameters are shown. Table 2.2 shows that the null hypothesis of no long memory typically cannot be rejected for raw returns at the 5% significance level, while it is typically rejected for squared returns at the 1% significance level. The estimated fractional differencing parameter for squared returns falls within the range (0.05, 0.5) reported by empirical studies, such as e.g. Alfarano (2006). According to the tests, long memory is exhibited by squared returns but not by raw returns. A possible source of long memory in squared returns is the heterogeneity in agents' memory defined by $\gamma$, $\delta$; see Diks and van der Weide (2005); Matteo et al. (2004).
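As an illustration, a minimal GPH log-periodogram estimator might look as follows. This is the textbook variant (regress the log periodogram on $\log 4\sin^2(\omega/2)$ at the first $m$ Fourier frequencies), not necessarily the exact configuration used in the experiments above:

```python
import cmath
import math
import random

def gph_estimate(x, m):
    """Geweke-Porter-Hudak estimate of the fractional differencing
    parameter d: OLS regression of the log periodogram on
    log(4*sin^2(w/2)) over the first m Fourier frequencies; the slope
    of that regression is -d."""
    T = len(x)
    mu = sum(x) / T
    logI, logF = [], []
    for j in range(1, m + 1):
        w = 2.0 * math.pi * j / T
        s = sum((x[t] - mu) * cmath.exp(-1j * w * t) for t in range(T))
        logI.append(math.log(abs(s) ** 2 / (2.0 * math.pi * T)))
        logF.append(math.log(4.0 * math.sin(w / 2.0) ** 2))
    fbar, ibar = sum(logF) / m, sum(logI) / m
    slope = (sum((f - fbar) * (i - ibar) for f, i in zip(logF, logI))
             / sum((f - fbar) ** 2 for f in logF))
    return -slope

# For i.i.d. noise the estimate should be close to d = 0.
rng = random.Random(0)
d_noise = gph_estimate([rng.gauss(0.0, 1.0) for _ in range(1024)], 64)
```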
Fat tails. In order to check for fat tails in the returns distribution, we calculated the popular Hill tail index from the upper 5% and 10% of the tail. In Table 2.4 we report results matching usual empirical findings, according to which the index lies between 2 and 5. Note that intraday returns typically exhibit fatter tails than daily data. Interestingly, for some return series generated with $w = 2$, the estimated 10% index fell below 2, indicating an infinite variance of returns. The variance is known to be finite though, as we apply a cap of 2% on returns in the pricing algorithm. Tail indices less than 2 are also reported in some empirical statistical investigations, see Alfarano (2006).

We can consider heavy tails as a result of mixing the different returns distributions arising in two regimes of the price dynamics: one representing calm market periods, when the variance of $r_t$ is small since $|\bar r_t|$ is small, and one representing trending periods, i.e. when the variance of $r_t$ is big since $|\bar r_t|$ is big. The difference between the variances in the two regimes controls the tail fatness. A high $\sigma^2$ reduces the difference between the variances of the two regimes inasmuch as it prevents the variance of $r_t$ from getting low in calm market periods. The tail gets heavier when the market fraction of chartists is higher, as the difference between the variances in the two regimes increases as chartist trading increases.
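A minimal sketch of the Hill estimator used above, applied to absolute values (function name is hypothetical):

```python
import math
import random

def hill_index(data, frac):
    """Hill estimator of the tail index from the upper `frac` fraction
    of the sample (e.g. frac=0.05 for the upper 5% of the tail)."""
    x = sorted((abs(v) for v in data), reverse=True)
    k = max(1, int(frac * len(x)))
    h = sum(math.log(x[i] / x[k]) for i in range(k)) / k
    return 1.0 / h

# Check on a Pareto-tailed sample with true tail index 3: P(X > x) = x^{-3}.
rng = random.Random(0)
pareto = [rng.random() ** (-1.0 / 3.0) for _ in range(20000)]
alpha_hat = hill_index(pareto, 0.05)
```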
w          AR(1)    R^(r)(1)   R^(r)(20)  R^(r²)(1)  R^(r²)(20)  R^(r²)(50)
0          0.9997   0.0052     0.0022     0.0288     0.0271      0.0203
           0.0002   0.0079     0.0094     0.0102     0.0087      0.0113
0.5        0.9997   0.0007     0.0011     0.0518     0.0396      0.0280
           0.0003   0.0097     0.0079     0.0142     0.0176      0.0112
1          0.9997   -0.0005    0.0026     0.1270     0.1101      0.0689
           0.0003   0.0136     0.0101     0.0234     0.0134      0.0160
2          0.9997   -0.0027    -0.0035    0.2899     0.2676      0.2115
           0.0003   0.0198     0.0237     0.0437     0.0474      0.0375
(0.5,1.5)  0.9996   -0.0017    -0.0018    0.1033     0.0887      0.0668
           0.0004   0.0132     0.0114     0.0855     0.0761      0.0670
NQ100      0.9984   0.0089     0.0105     0.1849     0.0662      0.0399
           0.0012   0.0347     0.0376     0.0655     0.0328      0.0291

Table 2.3: Estimated AR(1) coefficient and sample autocorrelations of returns and squared returns at various lags (standard errors in the second line of each row).
Summary. Although the results above indicate that our ACF model is capable of reproducing important stylized facts exhibited by real prices, this by far does not prove that the model is a true specification of the real price generating mechanism. In fact, it is impossible to formally validate an ACF model based on financial data. However, our model is a formal description of heuristics checked in the cited empirical literature, and some of its components are well established in ACF. Thus, considering the statistical comparison above, we are confident that our model can be applied to provide an economic interpretation of statistical phenomena observed in real prices.
w          Hill 5%  Hill 10%  GPH(r)    GPH(r²)
0          4.4559   3.3065    -0.0021   0.0834
           0.1662   0.0827    0.0149    0.0174
0.5        4.2554   3.1322    -0.0020   0.1116
           0.1371   0.1089    0.0161    0.0152
1          3.7679   2.7640    -0.0002   0.1900
           0.2561   0.1653    0.0174    0.0192
2          1.9251   1.2955    0.0112    0.2410
           0.4648   0.3498    0.0259    0.0352
(0.5,1.5)  3.8173   2.7894    -0.0015   0.1465
           0.8124   0.6445    0.0207    0.0632
NQ100      3.0011   2.3178    -0.0105   0.1866
           0.2476   0.1654    0.0275    0.0379

Table 2.4: The Hill tail index and estimated fractional differencing parameter of returns and squared returns (standard errors in the second line of each row).
2.4 Fitting a GARCH model

Our goal in this section is to find a relationship between the ACF model and a technical model, since such a relationship could allow us to infer (inversely) the ACF model parameters from the fitted technical model parameters. The GARCH model, being a simple volatility model, seems suitable for this purpose, since according to the numerical experiments performed for validation, the parameters of the ACF model have a big influence on the volatility structure of the generated time series. Thus we examine the effect of the fundamental value $p_0$, the uncertainty about fundamentals $\sigma^2$ introduced in (2.10)-(2.11), and the market fractions $w$ on the parameters of a technical model fitted to the return process.
We consider a GARCH(1,1) model given by

$$
r_t = \sqrt{h_t}\, \varepsilon_t, \tag{2.19}
$$
$$
h_t = \alpha^*_0 + \alpha^* r^2_{t-1} + \beta^* h_{t-1}, \tag{2.20}
$$

with $\varepsilon_t \sim N(0,1)$ an i.i.d. sequence, and a real parameter vector

$$
\theta^* = \begin{pmatrix} \alpha^*_0 \\ \alpha^* \\ \beta^* \end{pmatrix} \tag{2.21}
$$

such that $\alpha^*_0 > 0$ and $\alpha^*, \beta^* \geq 0$. We impose the condition $\alpha^* + \beta^* < 1$, which ensures the existence of a unique stationary solution. (Alternative noise models having fat tails are the subject of future analysis.)
According to the GARCH model, the volatility of the return process $r_t$ is determined by the conditional variance process $h_t$ as in (2.19). The conditional variance process $h_t$ follows a linear dynamics; it is driven by past squared returns.
The conditional quasi log-likelihood function of the observations $(r_1, \dots, r_T)$ is the following:

$$
L_T(\theta \,|\, r_1, \dots, r_T) = \sum_{t=1}^{T} l_t(\theta \,|\, r_1, \dots, r_t) = \tag{2.22}
$$
$$
= \sum_{t=1}^{T} -\frac{1}{2} \left[ \log \hat h_t(\theta) + \frac{r^2_t}{\hat h_t(\theta)} \right], \tag{2.23}
$$

where $\hat h_t$ is calculated by inverting (2.20) for an assumed true parameter $\theta$, under the condition

$$
\hat h_0 = \frac{\alpha_0}{1 - \alpha - \beta}, \qquad r_t = 0 \text{ for } t < 0. \tag{2.24}
$$

(Thus $h$ is initialized with the unconditional variance.) Note that in the case of Gaussian disturbances $\varepsilon_t$, (2.23) is the exact conditional likelihood.
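The quasi log-likelihood (2.22)-(2.24) is cheap to evaluate by running the variance recursion (2.20) forward. A minimal sketch, written as a negative log-likelihood with constants dropped and one common initialization convention (the first observation uses the unconditional variance); the function name is hypothetical:

```python
import math

def garch_neg_loglik(returns, alpha0, alpha, beta):
    """Conditional negative quasi log-likelihood of GARCH(1,1), i.e.
    -L_T of (2.22)-(2.23) up to an additive constant, with h started
    at the unconditional variance as in (2.24)."""
    assert alpha0 > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1
    h = alpha0 / (1.0 - alpha - beta)            # (2.24)
    nll = 0.0
    for r in returns:
        nll += 0.5 * (math.log(h) + r * r / h)
        h = alpha0 + alpha * r * r + beta * h    # (2.20)
    return nll
```

Minimizing this function over the stability domain yields the (off-line) quasi maximum likelihood estimate discussed below.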
Our goal is to examine the relationship between the parameters of the GARCH(1,1) model that best matches the stock returns generated by our ACF model, and the economic factors $p_0$, $\sigma^2$, $w$. First, we examine the effect of the market fraction $w$, while keeping $p_0$ and $\sigma^2$ fixed. Let

$$
r(w) = \left( r_1(w), \dots, r_T(w) \right)^T
$$

denote the corresponding return series. We fit a GARCH(1,1) model using the standard quasi maximum likelihood method to obtain the off-line maximum likelihood estimate $\hat\theta_T$. Applying the methods of Gerencsér et al. (2010) it can be shown that, under appropriate technical conditions, $\hat\theta_T$ converges, as $T$ tends to infinity, to a limit $\hat\theta$. The latter corresponds to the GARCH model that best fits our ACF model in the metric defined by the quasi-likelihood.
In our simulation experiment with $N = 200$ agents, we generated return series $r(w)$ of length $10^5$ with varying $w$, $0 \leq w \leq 2$, and fitted a GARCH(1,1) model to each time series using an efficient numerical optimization method developed recently by Torma and G.-Tóth (2010), see Chapter 3. We extracted $\hat\alpha$ and $\hat\beta$ from the resulting $\hat\theta$ and plotted them against $w$, as shown in Figure 2.3.
The curve on the left shows that $\hat\alpha$ tends to get higher as the weight of chartists increases, indicating a higher impact of returns on volatility. The curve on the right shows that $\hat\beta$ tends to get higher as the weight of fundamentalists increases, indicating an increasing memory of $\hat h_t$. We can explain this finding heuristically as follows. In the case $w = 0$, i.e. without any chartists on the market, the estimated latent GARCH volatility $\hat h_t$ approximates the dynamics of the filtered news process $n_t$, which determines the volatility via the trading of fundamentalists, see (2.10). With a fairly high memory of analyst coverage ($a = 0.7$) and a low dependence on past news ($b = 0.2$), we get a high $\hat\beta$ and a low $\hat\alpha$ on the fundamentalist market. As chartists adjust the volatility based on past returns, increasing their weight increases the dependence of $\hat h_t$ on past returns and simultaneously decreases the dependence of $\hat h_t$ on the filtered news process.
Figure 2.3: Relationship between the estimated GARCH coefficients $\hat\alpha$ (left panel) and $\hat\beta$ (right panel) and the market fraction $w^C/w^F \in [0, 2]$, with model parameters as in Table 2.1.
Note that we have set the baseline parameters and the upper limit for $w$ so that the fitted GARCH parameters span a range that seems realistic considering documented results of empirical GARCH fitting, see e.g. Cont (2001). We can also see in Figure 2.3 that $\hat\alpha + \hat\beta$ is close to 1, which is a typical observation when fitting GARCH to real data.

In the following we examine the dependence of the fitted GARCH parameters on $p_0$, $\sigma^2$, $w$ statistically, by linear regression. In a regression experiment we first choose a setting for all ACF model parameters randomly according to (2.18), then we generate 20 return series of length $3 \cdot 10^4$ by varying a selected regressor, again according to (2.18). We fit a GARCH(1,1) model and perform Student's t-test with the null hypothesis being that the regression coefficient is zero. We ran this regression experiment 10 times for each regressor in order to check the sensitivity of the results to the parameter settings.
In Table 2.5 the extrema of the p-values of the hypothesis tests are reported. None of the null hypotheses for the parameters $p_0$ and $\sigma^2$ can be rejected at a 5% significance level, indicating that $\hat\alpha$ and $\hat\beta$ are correlated neither with $p_0$ nor with $\sigma^2$. On the other hand, the null hypothesis can be rejected in all cases for the parameter $w$ at a very small significance level, confirming the dependence we found by eye inspection of Figure 2.3. Numerical investigations indicate the following reasons for this finding. The lack of influence of $p_0$ on $\hat\alpha$ and $\hat\beta$ follows from the property that $p_0$ has no effect on the variance of the returns process at all. As for $\sigma^2$, this parameter is a constant scaler of the variance of the returns $r_t$ and also of the trend $\bar r_t$; hence it only controls $\hat\alpha_0$ but not $\hat\alpha$ and $\hat\beta$.
              p₀ → α̂    p₀ → β̂    σ² → α̂    σ² → β̂    w → α̂     w → β̂
min. p-value  0.1433    0.1429    0.0910    0.0908    1·10⁻⁹    3·10⁻⁹
max. p-value  0.9236    0.9266    0.8356    0.8420    5·10⁻⁵    2·10⁻⁴

Table 2.5: Minimum and maximum p-values from the regression experiments.
Thus we come to the following conclusion.

Property 1. Considering $\hat\alpha$, $\hat\beta$ as functions of $p_0$, $\sigma^2$, $w$, our numerical experiments indicate that

$$
\frac{\partial \hat\alpha}{\partial w^C} > 0, \qquad \frac{\partial \hat\beta}{\partial w^F} > 0.
$$

Moreover,

$$
\frac{\partial \hat\alpha}{\partial p_0} = \frac{\partial \hat\beta}{\partial p_0} = 0
\quad \text{and} \quad
\frac{\partial \hat\alpha}{\partial \sigma^2} = \frac{\partial \hat\beta}{\partial \sigma^2} = 0,
$$

for $p_0 \in (50, 150)$, $\sigma^2 \in (0.005, 0.015)$.
Discussion. At the end of this section, we compare our ACF model and the GARCH(1,1) model from a system architecture point of view. In particular, we examine the main sources of the volatility of the return process. In both models we can identify two heuristic sources of volatility: market returns and the economic environment. In the GARCH case, the hidden volatility process $h_t$ can be considered a volatility source in the environment. In the ACF model, the number of news items, denoted by $n_t$, is generated by the environment, as shown in Figure 2.1. Note that the filtering of $n_t$ in (2.9) increases the memory of the opinion variance process $\bar n_t$, which we need in order to get a high GARCH coefficient $\hat\beta$.
The main difference between the two models lies in the relationship between the environmental volatility source and market prices. In the GARCH case the conditional variance $h_t = E(r^2_t \,|\, \mathcal{F}_{t-1})$ depends only on past returns $(r_{t-1}, r_{t-2}, \dots)$, implying the GARCH assumption that the effects of all economic factors having an impact on the conditional variance are already reflected in past market prices. In contrast to this, the environmental volatility process $n_t$ in the ACF model is independent from market returns. In the light of this difference, it is very interesting that our ACF model reproduces certain properties of a GARCH process so well.
2.5 The GARCH-based change detection algorithm

The established relationship between our ACF model and the best fitting GARCH(1,1) model motivates the use of GARCH(1,1) models for statistical inference on the economic factors entering the ACF model. In particular, GARCH(1,1) models are prime candidates for detecting structural changes in our ACF model.

The setup for GARCH(1,1) is the following: assume that market returns $(r_1, r_2, \dots)$ are generated by a GARCH(1,1) model with true parameter

$$
\theta^*_t =
\begin{cases}
\theta_1 & \text{for } t < \tau, \\
\theta_2 & \text{for } t \geq \tau,
\end{cases}
$$

where $\tau$ is the change-point. The problem is to estimate $\tau$ on-line. Consider first an on-line
recursive quasi maximum likelihood estimation algorithm with fixed gain:

$$
\hat\theta_t = \hat\theta_{t-1} - \lambda \hat H^{-1}_t g_t, \tag{2.25}
$$
$$
\hat H_t = (1 - \lambda) \hat H_{t-1} + \lambda H_t \tag{2.26}
$$

for $t > 0$, where $g_t$, $H_t$ are on-line approximations of the gradient and the Hessian of the negative log-likelihood function, respectively, as described in Gerencsér et al. (2010), and $0 < \lambda \ll 1$ is a small fixed gain. The initial value $\hat\theta_0$ is given by the user, and we may choose $\hat H_0 = I$.
Based on the qualitative characterization of a fairly general class of fixed gain recursive estimators given in Gerencsér (1996), we expect that the tracking error $\hat\theta_t - \theta^*$ is of the order of magnitude $\lambda^{1/2}$ for time-invariant systems, i.e. with $\tau = \infty$. On the other hand, increasing $\lambda$ improves the adaptivity of the recursive estimation method at and after $\tau$. For time-invariant GARCH models with bounded noise $\varepsilon_t$ we get

$$
\sup_{t > 0} E^{1/q} \left| \hat\theta_t - \theta^* \right|^q = O\left( \lambda^{1/2} \right), \qquad 1 \leq q < \infty,
$$

except for an exponentially decaying term due to initialization effects, see Gerencsér (2006).
Let us now specify the derivatives needed in the recursion. The conditional negative quasi log-likelihood of the observation $r_t$ can be written as

$$
l_t(r_t \,|\, \mathcal{F}_{t-1}; \theta) = \log \hat h_t(\theta) + \frac{r^2_t}{\hat h_t(\theta)}. \tag{2.27}
$$

Differentiating yields

$$
\partial_\theta l_t(\theta) = \frac{\partial_\theta \hat h_t(\theta)}{\hat h_t(\theta)} \left( 1 - \frac{r^2_t}{\hat h_t(\theta)} \right). \tag{2.28}
$$
We differentiate (2.28) to get the Hessian, which we can split into a term containing only first-order derivatives of $\hat h$ and a term containing only second-order derivatives of $\hat h$:

$$
\partial^2_\theta l_t(\theta)
= \frac{\partial^2_\theta \hat h_t(\theta)\, \hat h_t(\theta) - \partial_\theta \hat h_t(\theta) \left( \partial_\theta \hat h_t(\theta) \right)^T}{\hat h^2_t(\theta)} \left( 1 - \frac{r^2_t}{\hat h_t(\theta)} \right)
+ \partial_\theta \hat h_t(\theta) \left( \partial_\theta \hat h_t(\theta) \right)^T \frac{r^2_t}{\hat h^3_t(\theta)} =
$$
$$
= D^{(1)}_t(\theta) + D^{(2)}_t(\theta),
$$
$$
D^{(1)}_t(\theta) = c_t \cdot \partial_\theta \hat h_t(\theta) \left( \partial_\theta \hat h_t(\theta) \right)^T, \qquad
c_t = \hat h^{-2}_t(\theta) \left( \frac{2 r^2_t}{\hat h_t(\theta)} - 1 \right),
$$
$$
D^{(2)}_t(\theta) = \frac{\partial^2_\theta \hat h_t(\theta)}{\hat h_t(\theta)} \left( 1 - \frac{r^2_t}{\hat h_t(\theta)} \right).
$$
Now assume for the moment that $\theta = \theta^*$. Then, running the recursion for $\hat h$ with the right initial condition, we have $\hat h = h$ and

$$
\frac{r^2_t}{\hat h_t(\theta)} = \varepsilon^2_t \quad \text{for } \theta = \theta^*
$$
from (2.19). Thus we have

$$
E\left[ D^{(2)}_t(\theta) \right] = E\left[ E\left( D^{(2)}_t(\theta) \,\big|\, \mathcal{F}_{t-1} \right) \right] = \tag{2.29}
$$
$$
= E\left[ \frac{\partial^2_\theta \hat h_t(\theta)}{\hat h_t(\theta)}\, E\left( 1 - \varepsilon^2_t \right) \right] = \tag{2.30}
$$
$$
= 0 \quad \text{for } \theta = \theta^*. \tag{2.31}
$$

In (2.30) we used that $\hat h_t$ and $\partial^2_\theta \hat h_t(\theta)$ are $\mathcal{F}_{t-1}$-measurable and that $\varepsilon^2_t$ is independent from $\mathcal{F}_{t-1}$.
Based on (2.29)-(2.31) we neglect $D^{(2)}_t$ in the Hessian approximation. Numerical simulations have shown that a quasi-Newton direction with the approximate Hessian $D^{(1)}$ typically points closer to $\theta^*$ from a distant $\theta$ than one using the Hessian $D^{(1)} + D^{(2)}$.
Thus, in (2.25)-(2.26) we apply

$$
g_t = \partial_\theta l_t\left( \hat\theta_{t-1} \right), \tag{2.32}
$$
$$
H_t = D^{(1)}_t\left( \hat\theta_{t-1} \right). \tag{2.33}
$$

The algorithm is extended by a basic resetting mechanism to keep $\hat\theta$ in the stability domain, see Gerencsér and Mátyás (2007a).
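The fixed-gain recursion (2.25)-(2.26) with the rank-one Hessian approximation $D^{(1)}$ of (2.32)-(2.33) might be sketched as follows. This is only an illustrative sketch: all function names are hypothetical, crude clipping stands in for the resetting mechanism, and a direct 3x3 solve is used instead of maintaining the inverse $P_t$.

```python
import math

def solve3(A, b):
    """Solve a 3x3 system by Gaussian elimination with partial pivoting;
    falls back to the gradient direction if the matrix is near singular."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            return b[:]
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))) / M[i][i]
    return x

def estimate_garch_fixed_gain(returns, theta0, lam):
    """Fixed-gain recursive quasi-ML sketch of (2.25)-(2.26) for GARCH(1,1)
    with the rank-one Hessian approximation D^(1) from (2.32)-(2.33)."""
    a0, a, b = theta0
    H = [[float(i == j) for j in range(3)] for i in range(3)]   # H_0 = I
    h, dh, r_prev = a0 / (1.0 - a - b), [0.0, 0.0, 0.0], 0.0
    for r in returns:
        # conditional variance h_t and its gradient w.r.t. (alpha0, alpha, beta)
        h_new = a0 + a * r_prev ** 2 + b * h
        dh = [1.0 + b * dh[0], r_prev ** 2 + b * dh[1], h + b * dh[2]]
        h = h_new
        g = [d / h * (1.0 - r * r / h) for d in dh]              # (2.28)
        c = (2.0 * r * r / h - 1.0) / (h * h)                    # c_t
        H = [[(1.0 - lam) * H[i][j] + lam * c * dh[i] * dh[j]
              for j in range(3)] for i in range(3)]              # (2.26)
        step = solve3(H, g)                                      # H_t^{-1} g_t
        a0 = min(max(a0 - lam * step[0], 1e-6), 10.0)            # (2.25) + clip
        a = min(max(a - lam * step[1], 0.0), 0.5)
        b = min(max(b - lam * step[2], 0.0), 0.98)
        if a + b > 0.99:
            b = 0.99 - a
        r_prev = r
    return a0, a, b
```

The clipping step is a crude placeholder for the resetting mechanism of Gerencsér and Mátyás (2007a); it merely keeps the iterate in the stability domain.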
In order to save the computational effort needed for the inversion of $\hat H_t$ in (2.25), we can rewrite the recursion (2.26) directly for the inverse $P_t = \hat H^{-1}_t$ using the well-known matrix inversion lemma

$$
(A + UCV)^{-1} = A^{-1} - A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1},
$$

where $A$, $C$, $U$, $V$ are matrices of appropriate size. Thus we get from (2.26)
$$
P_t = \frac{1}{1 - \lambda} \left[ P_{t-1} - q_t P_{t-1} \, \partial_\theta \hat h_t(\theta) \left( \partial_\theta \hat h_t(\theta) \right)^T P_{t-1} \right], \tag{2.34}
$$
$$
q_t = \left[ \frac{1 - \lambda}{\lambda c_t} + \left( \partial_\theta \hat h_t(\theta) \right)^T P_{t-1} \left( \partial_\theta \hat h_t(\theta) \right) \right]^{-1}, \tag{2.35}
$$

where $\hat h_t$ is the recursive estimate of $h_t$.
The left panel of Figure 2.4 presents the trajectory of the on-line estimates of the GARCH(1,1) parameters using $\lambda_1 = 0.0005$.
The change detection algorithm itself is based on the Minimum Description Length (MDL) principle, and is an adaptation of the method for ARMA processes given in Gerencsér and Baikovicius (1992, 1991).

A central idea of MDL modelling is that the estimated conditional negative quasi log-likelihood can be interpreted as the code length needed for encoding the next incoming observation, say $r_t$, see Rissanen (1989). Here conditional means conditional on past data. Thus we get a so-called predictive encoding procedure. The choice of the encoding procedure may depend on the assumed position of the change point. The hypothesis giving the shortest code length will then be accepted.
To carry out this program, let us consider the estimated conditional negative quasi log-likelihood of the observation $r_t$:

$$
C_t(\lambda) := l_t\left( \hat\theta_{t-1}(\lambda) \,\big|\, r_t \right) = \log \hat h_t + \frac{r^2_t}{\hat h_t},
$$
where $\hat\theta_{t-1}(\lambda)$ is the estimated parameter and $\hat h_t$ is the estimate of $h_t$ obtained using the recursive estimation algorithm (2.25)-(2.26) with step size $\lambda$. Thus $C_t$ is interpreted as a code length for encoding $r_t$. An alternative definition of $C_t$ may be obtained by using an assumed fat-tailed distribution for the noise, such as Student's t-distribution.
The main idea of the proposed change-point detection algorithm is to run two instances of the recursive estimation algorithm with different step sizes $0 < \lambda_1 < \lambda_2 < 1$. (We apply $\lambda_2 = 2\lambda_1$ throughout the chapter.) Taking into account the above mentioned properties of fixed gain recursive estimators, we note that $\hat\theta_t(\lambda_2)$ has higher variance than $\hat\theta_t(\lambda_1)$ in the initial phase prior to $\tau$, while $\hat\theta_t(\lambda_2)$ has better tracking abilities at and right after $\tau$. Thus in the initial phase $C_t(\lambda_2)$ is expected to exceed $C_t(\lambda_1)$ on average, while at and right after $\tau$ the opposite is true:

$$
E\left( C_t(\lambda_1) - C_t(\lambda_2) \right) < 0 \quad \text{for } t < \tau, \tag{2.36}
$$
$$
E\left( C_t(\lambda_1) - C_t(\lambda_2) \right) > 0 \quad \text{for } t \text{ immediately after } \tau. \tag{2.37}
$$
Using these properties, we can use the so-called Hinkley detector, see Hinkley (1971), to detect the change-point. Let us define the so-called CUSUM statistics by

$$
S_t = \sum_{k=1}^{t} \left( C_k(\lambda_1) - C_k(\lambda_2) \right),
$$

and set

$$
S^*_t = \min_{k < t} S_k.
$$

Then an alarm is generated as soon as

$$
S_t - S^*_t > \eta,
$$

where $\eta > 0$ is a prescribed sensitivity threshold. The right panel of Figure 2.4 shows the trajectory of the CUSUM statistics, with $\tau$ in the middle of the horizon.
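The Hinkley detector itself is only a few lines. The sketch below (hypothetical function name) takes the two per-observation code length sequences $C_t(\lambda_1)$, $C_t(\lambda_2)$ as inputs:

```python
def hinkley_alarm(C_slow, C_fast, eta):
    """Hinkley (CUSUM) detector: S_t = sum_{k<=t} (C_k(lam1) - C_k(lam2));
    an alarm is raised at the first t with S_t - min_{k<t} S_k > eta.
    Returns the 1-based alarm time, or None if no alarm occurs."""
    S, S_min = 0.0, 0.0
    for t, (c1, c2) in enumerate(zip(C_slow, C_fast), start=1):
        S += c1 - c2
        if S - S_min > eta:
            return t
        S_min = min(S_min, S)
    return None
```

Before the change the summands are negative on average, so $S_t$ drifts down and tracks its own minimum; after the change the drift flips sign and $S_t - S^*_t$ grows until it crosses the threshold $\eta$.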
Note that the above method can also be applied if $\theta^*_t$ exhibits a considerable drift starting at $\tau$, rather than an abrupt change.
2.6 Numerical performance evaluation

We carried out extensive numerical experiments to test the performance of the change detection algorithm. Table 2.6 shows test problems I.-IV., defined by the GARCH parameters before and after the change point. The test problem coefficients are selected to be close to the coefficients one may get when fitting a GARCH(1,1) model to real data. The change in the parameters is quite small in all test problems. In the well-conditioned settings I.-II. the unconditional variance of returns also changes at the change point, while in the ill-conditioned settings III.-IV. it is constant. Thus, we can consider testing on problem types III.-IV. as a stress test of the algorithm.
        I.             II.            III.           IV.
        θ₁*    θ₂*     θ₁*    θ₂*     θ₁*    θ₂*     θ₁*    θ₂*
α₀*     1      1       1      1       1      1       1      1
α*      0.05   0.1     0.1    0.05    0.05   0.1     0.1    0.05
β*      0.9    0.8     0.8    0.9     0.9    0.85    0.85   0.9

Table 2.6: GARCH parameters of the test problems.
Figure 2.4 shows the trajectories of the estimated parameters and of the CUSUM test statistics for test problem I, using $\lambda_1 = 0.0005$. The change-point is in the middle of the time horizon, at $\tau = 10^4$. We can see in the figure that the estimated coefficients track the change nicely.
Note that in a typical real application the values of θ*_1 and θ*_2 are not constant but vary slightly. In this case the downward trend of S_t is less steep, since E(C_t(λ_1) − C_t(λ_2)) is higher.
Figure 2.4: GARCH(1,1)-based recursive estimation (left) and CUSUM test statistics (right) on test problem I.
We also calculated the empirical Fisher information matrix

    Î = (1/T) Σ_{t=1}^{T} (∂l_t(θ*)/∂θ) (∂l_t(θ*)/∂θ)^T,

its eigenvalues λ and its condition number κ, based on T = 20000 observations generated with the parameters defined by test problem I.:

    Î_1 = ( 0.1444   2.4654   2.7596 )
          ( 2.4654  49.4789  49.6788 )    λ_1 = (0.0027, 2.0971, 101.693)^T,    κ_1 = 3.7664 · 10^4,
          ( 2.7596  49.6788  54.1695 )

    Î_2 = ( 0.1532   1.1356   1.3967 )
          ( 1.1356  11.3480  11.3907 )    λ_2 = (0.0052, 0.9289, 23.9222)^T,    κ_2 = 4.6004 · 10^3.
          ( 1.3967  11.3907  13.3550 )

The relatively small upper-left entry of the Fisher information matrices, compared to the other two diagonal entries, indicates that the algorithm can track the parameters α* and β* much faster than the parameter α*_0. The condition number is quite high.
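A sketch of this computation, assuming the per-observation score vectors ∂l_t(θ*)/∂θ are available as the rows of an array (the function name and interface are ours):

```python
import numpy as np

def empirical_fisher(scores):
    """Empirical Fisher information from per-observation score vectors.

    scores: (T, p) array whose t-th row is the gradient of the
    per-observation log-likelihood l_t at theta*.
    Returns (I_hat, eigenvalues in ascending order, condition number).
    """
    T = scores.shape[0]
    I_hat = scores.T @ scores / T      # (1/T) * sum of outer products
    eig = np.linalg.eigvalsh(I_hat)    # symmetric eigensolver, ascending
    return I_hat, eig, eig[-1] / eig[0]
```

The ratio of the largest to the smallest eigenvalue is the condition number κ reported above.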
In order to quantify the detection capability of the change-point detection algorithm we estimated three quality measures: the probability that an alarm is generated too early, which is called the false alarm probability and defined as

    Pr(FA) = Pr(τ̂ < τ),

the probability that no alarm is detected until τ + Δt, which is called the missed alarm probability and defined as

    Pr(MA) = Pr(τ̂ > τ + Δt),

and the expected alarm delay

    E(τ̂ − τ | τ < τ̂ ≤ τ + Δt),

where τ̂ is the change-point estimator. The parameter Δt expresses the time delay after which a detected change is too late for the user. In experiments calculating the expected delay, we set the timeout large to get only a few missed alarms. The effect of the timeout setting is then examined separately.
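Given the alarm times τ̂ from a batch of Monte Carlo runs, the three quality measures can be estimated as follows (a sketch; the interface, with None marking a run that produced no alarm, is our choice):

```python
import numpy as np

def detection_quality(alarm_times, tau, dt):
    """Estimate Pr(FA), Pr(MA) and the expected alarm delay from
    Monte Carlo alarm times (None = no alarm within the run)."""
    alarms = [np.inf if a is None else a for a in alarm_times]
    n = len(alarms)
    p_fa = sum(a < tau for a in alarms) / n         # alarm too early
    p_ma = sum(a > tau + dt for a in alarms) / n    # alarm too late, or never
    delays = [a - tau for a in alarms if tau < a <= tau + dt]
    mean_delay = float(np.mean(delays)) if delays else float("nan")
    return p_fa, p_ma, mean_delay
```

Note that the mean delay is conditional: it averages only over runs whose alarm fell inside the window (τ, τ + Δt], matching the definition above.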
We tested the algorithm with two different step-size settings, λ_1 = 0.0005 and λ_1 = 0.001. In all experiments the length of the return series is n = 10000 and the change-point is at τ = 5000. For each test problem, we generated the return series 50 times, calculated the corresponding cumulative sums S_t, and then estimated the three quality measures from the trajectories S_t.
There is a trade-off in choosing the sensitivity threshold δ: a high δ results in a low false alarm probability, but at the same time yields a higher expected alarm delay and a higher rate of missed alarms. Thus, in Figures 2.5-2.6 we plot Pr(FA) (solid) and the average delay (dashed) against a varying δ.
Comparing the two figure columns, we can conclude that the effect of the step size on the average delay, at similar false alarm probability levels, differs across test problem types: for the well-conditioned problems faster forgetting yields slightly better results, while for the ill-conditioned problems slower forgetting yields better results.

Figure 2.7 shows Pr(MA) vs. Δt, applying the smallest δ that resulted in Pr(MA) = 0.1 in the tests shown in Figures 2.5-2.6; only results with the better forgetting factor are presented.
For comparison purposes we also report numerical results for the Threshold GARCH (TGARCH) model, in Figures 2.8-2.9, in a similar fashion as for GARCH, except that here only results with one forgetting factor setting are included. The TGARCH model captures asymmetries in the volatility of returns: (2.20) modifies to

    h_t = α*_0 + (α*)^− I{r_{t−1} < 0} r²_{t−1} + α* r²_{t−1} + β* h_{t−1},

where I{x < 0} = 1 if x < 0 and 0 otherwise. In the TGARCH related experiments we
Figure 2.5: GARCH: Results for test problems I.-II. from top to bottom, with λ_1 = 0.0005 (left) and λ_1 = 0.001 (right). Crosses indicate the smallest δ s.t. Pr(MA) = 0.1.
used the parameter settings from Table 2.6 for α*_0 and β*, and we set

    (α*)^− = α* = (2/3) α*_G,

with α*_G being the second coefficient of the parameter vector from Table 2.6.
Since the TGARCH(1,1) model is in general more difficult to estimate than the GARCH(1,1) model, we used smaller forgetting factors in the TGARCH case. We obtained a slightly worse overall change detection performance for the TGARCH model in comparison to the GARCH model.
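The modified variance recursion can be sketched as follows (an illustrative implementation; the explicit initial-variance argument h0 is our choice):

```python
import numpy as np

def tgarch_variance(returns, h0, alpha0, alpha_neg, alpha, beta):
    """Conditional variance of a TGARCH(1,1) model:
    h_t = alpha0 + alpha_neg * I{r_{t-1} < 0} * r_{t-1}^2
                 + alpha * r_{t-1}^2 + beta * h_{t-1},
    so negative returns receive the extra weight alpha_neg (asymmetry)."""
    h = np.empty(len(returns))
    h[0] = h0                      # initial conditional variance, user supplied
    for t in range(1, len(returns)):
        r = returns[t - 1]
        asym = alpha_neg * r**2 if r < 0 else 0.0
        h[t] = alpha0 + asym + alpha * r**2 + beta * h[t - 1]
    return h
```

With alpha_neg = 0 this reduces to the plain GARCH(1,1) recursion (2.20).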
Change detection in market fractions. According to Property 1, we would expect a change in the parameters of a GARCH(1,1) model fitted to a return series generated by our ACF model with an abrupt change in the market structure parameter w. We verified this intuition by running our recursive estimation algorithm (2.25)-(2.26) on returns simulated by our ACF model with w satisfying

    w*_t = w_1 = 1/5  for t < τ,
    w*_t = w_2 = 1    for t ≥ τ.
In Figure 2.10 on the left, Pr(FA) is shown against δ; × and + indicate the smallest δ
Figure 2.6: GARCH: Results for test problems III.-IV. from top to bottom, with λ_1 = 0.0005 (left) and λ_1 = 0.001 (right). Crosses indicate the smallest δ s.t. Pr(MA) = 0.1.
at which Pr(FA) = 0.1 and Pr(FA) = 0.2, respectively. On the right we show Pr(MA) corresponding to the smaller (dashed) and bigger (solid) sensitivity thresholds.
We stress that the GARCH(1,1) model is an incorrect representation of the return series generated by the ACF model. In particular, we found that the distribution of the estimated residuals has a fatter tail than the normally distributed GARCH residuals. The misspecification of the driving noise increases with the weight of chartists. Because of these model misspecifications we applied a quite low λ_1 = 5 · 10^{−5}. Examining the algorithm based on a GARCH model with ε_t exhibiting heavy tails, such as the Student's t-distribution, is a subject of future research.
Regarding the application of the algorithm to real data, we conclude that the delay tolerance of the user is a major factor determining the sampling rate of the price data under analysis. For example, a tolerance level of approximately two weeks, which may be characteristic for medium-term investors or regulators, requires 15-minute intraday data. See related empirical tests in Section 2.8.
Figure 2.7: GARCH: Missed alarm probability vs. timeout for problem I. (solid), II. (dashed) on the left and III. (solid), IV. (dashed) on the right.
2.7 Interpretation of an alarm
Although the applicability of misspecified models for change detection is a widely accepted principle, a closer look is needed in any specific application. In this section we provide a possible interpretation of an alarm generated by our change detection algorithm using GARCH models.
Property 2. Let the price process be generated by our ACF model so that the behavior patterns of the agents and the information generating system do not change over time. On the other hand, let us assume that the parameter vector (p_0, σ², w) does change at τ, but is constant before and after time τ, and that the value of w does change. Then the parameters α̂, β̂ of the GARCH(1,1) model that best matches the data before and after τ cannot be identical. Thus a change detection algorithm based on a GARCH(1,1) model will detect the change in the market fraction parameter w.
Let us now discuss the assumptions formulated in the property.
Fixed behaviors. Agent parameters and the parameters of the information arrival process represent variables controlling the behavior of portfolio managers and analysts, respectively. We may naturally assume that the underlying behavior of market participants is stable over time, although it may change slowly via a learning process. In addition, learning has a stabilizing effect, as demonstrated in Gerencsér and Mátyás (2007b).
Factors that may trigger a change in market fractions. Recall that (w_C, w_F) denotes the distribution of wealth among chartists and fundamentalists. According to standard utility theory, risk-averse investors redistribute their wealth if their expectations about profit and risk, regarding the two strategies, change. We can conclude that a change in profitability expectations may imply a change in the agent weights.
Figure 2.8: TGARCH: Results for problems I.-II. with λ_1 = 0.0005 on the left and III.-IV. with λ_1 = 0.00025 on the right. Crosses indicate the smallest δ s.t. Pr(MA) = 0.1.
Chartists measure their profitability using past observations. Since the statistical characteristics of the price process are typically stable over time, significant abrupt changes in the expectations of chartists are unlikely.

Profitability expectations of fundamentalists may change if the difference between the current market price and the belief about future fundamental values changes significantly. The main factors determining their future profit expectations can be captured by the fundamental value p_0 and the uncertainty about fundamentals σ. In fact, it seems plausible that public information attracting considerable attention may cause portfolio managers to change their beliefs about the uncertainty of future fundamental values. This may in turn trigger them to redistribute funds between chartist and fundamentalist strategies. Thus, we conclude that p_0 and σ do have an influence on α̂ and β̂, but this influence is realized only via w.
2.8 Empirical findings
We tested our change-point detection algorithm on real market data. We downloaded 15-minute prices of the NASDAQ100 index from the data provider FINAM. As discussed in Section 2.6, the performance of the change detection algorithm suggests a delay of
Figure 2.9: TGARCH: Missed alarm probability vs. timeout for problem I. (solid), II. (dashed) on the left and III. (solid), IV. (dashed) on the right.
Figure 2.10: Change detection based on returns generated by the ACF model.
approximately two weeks for change alarms based on this time scale. The dataset spans the time interval between 17th December 2002 and 30th October 2009 and contains 44483 data points. In order to exclude the sometimes very large overnight price gaps, we used the daily opening price instead of the previous trading day's closing price when calculating the return of the first 15-minute interval of a day.
Figure 2.11 depicts the GARCH parameters fitted recursively with step size λ_1 = 0.0004. In order to avoid spurious estimates due to an inaccurate Hessian in the burn-in phase, we applied a very low λ_1 = 0.00002 for strong averaging during the first 10^5 iterations, until approximately August 2004. We set θ̂_0 to the result of fitting a GARCH(1,1) model in an off-line manner on the first 25000 sample points.
Figure 2.12 depicts the results of change-point detection, in which we applied a sensitivity threshold δ = 10. The CUSUM statistics S_t does not show the clear downward trends before the change-point seen in the case of simulated data, due to model misspecification effects and also because the best fitting GARCH parameters may vary a little even in calm market periods. In order to allow for subsequent alarms, we restarted the Hinkley
Figure 2.11: GARCH coefficients estimated from NASDAQ100 15-minute returns.
detector after every detected alarm; the alarm locations are indicated by vertical arrows.
In view of Property 2, the change alarms indicate a change in market fractions. Figure 2.11 suggests an immense increase in the share of chartists near the beginning of 2007. The change-point arrows are included in the top diagram as well, so that we can examine the effect of the assumed sudden big changes in the market fractions on the market price dynamics. In the price chart on the top, we can observe huge moves of market prices after almost every change-point signal. At some points, the forecast of the beginning of a trend is astonishingly accurate, for example in October 2007, September 2008 and March 2009. The figures suggest that a sudden change in market fractions causes a consistent upward or downward trend in market prices.
2.9 Conclusion
In this chapter we presented a simulation study in which we gave a possible economic explanation of the coefficients of the GARCH(1,1) model. We generated price returns using a novel agent-based stock market model, varying the market fractions of chartists and fundamentalists, and fitted a GARCH(1,1) model. We found a monotonic relationship between the market fractions and the GARCH(1,1) parameters.
Motivated by the ability of the technical model to reproduce data generated by the fundamental model, we use GARCH models to detect changes in the market structure. An alarm indicating a significant increase of the chartists' market share may trigger regulators to
Figure 2.12: NASDAQ100 prices, returns and CUSUM test statistics. Vertical arrows indicate the change-point alarms.
prohibit shorting, in order to damp this form of market-destabilizing chartist trading. We presented a real-time change detection algorithm applying the Minimum Description Length principle-based framework developed by Gerencsér and Baikovicius (1992, 1991). The algorithm contains a novel recursive method for GARCH parameter estimation, which is an efficient extension of the method analysed in Gerencsér et al. (2010).
We have tested the change-point detection algorithm extensively on simulated data. Based on its detection performance, a promising application area is the detection of structural changes from intraday or high-frequency financial data. Thus we tried our algorithm on 15-minute data of the NASDAQ100 index and found that alarms on the assumed abrupt changes in the market structure occur just before the price trends become consistent, up or down, indicating that a real change in the market dynamics has indeed occurred.
An in-depth analysis of the robustness of the algorithm against model misspecifications is an interesting subject for future research.
Bibliographical remarks
The main ideas of this chapter were presented at the 3rd International Conference on Computational and Financial Econometrics (CFE'09), held in Limassol, Cyprus, 29-31 October 2009, and at the Erich Schneider Seminar at the Christian-Albrechts-Universität zu Kiel, Germany, on 14th January 2010, on the kind invitation of Prof. Dr. Thomas Lux. A corresponding manuscript is due to be submitted for journal publication. The recursive GARCH estimation component of the change detection algorithm has been accepted for publication in the proceedings of the 19th International Symposium on Mathematical Theory of Networks and Systems (MTNS 2010), Budapest, Hungary, see Gerencsér et al. (2010).
The GARCH model related theoretical parts of the chapter (Sections 2.4, 2.5, 2.7) are the joint work of László Gerencsér and Balázs Torma; the recursive GARCH estimation algorithm was analysed in cooperation with Zsanett Orlovits. The ACF model related parts, and all statistical and numerical results (Sections 2.2, 2.3, 2.6, 2.8), are by Balázs Torma.
Chapter 3
Efficient off-line model identification
3.1 Introduction
In this chapter we present a general local optimization method, which we used extensively for the identification of Generalized Autoregressive Conditional Heteroscedastic (GARCH) models based on the output of the ACF model presented in Chapter 2. The algorithm is very efficient and easy to implement, which made it well suited for inclusion in the Java environment of the ACF model.
The optimization problem we would like to solve is

    min_{x ∈ S} f(x),                                    (3.1)

where f : S → R is a twice continuously differentiable function with an optimal point x* inside S, and S ⊂ R^n is given by bound constraints. For simplicity, we exclude the case when x* resides on the boundary of S. Let f* = f(x*). We would like to solve the optimization problem iteratively, i.e. to find a minimizing sequence x_0, x_1, x_2, ... ∈ S such that f(x_k) → f* as k → ∞. The calculation of f(x) and its derivatives is assumed to be computationally expensive.
The main motivation of this research is parametric model identification, see Soderstrom and Stoica (1989), with a given system model M(θ) characterized by a parameter vector θ ∈ S. Given a series of observations from the system, denoted by y_0, y_1, y_2, ..., we would like to find the parameter vector θ̂ such that M(θ̂) best describes the system. Two well-known examples of tools solving the identification problem are least squares estimation and maximum likelihood estimation. In both cases, θ̂ is determined by minimizing a loss
Input: tolerance level ε, starting point x_0
Initialization: k = 0.
repeat
  1 Search Direction. Determine a descent direction Δx_k.
  2 Line Search. Choose step size t > 0, s.t. f(x_k) − f(x_k + tΔx_k) is sufficiently large and x_k + tΔx_k ∈ S.
  3 Update. x_{k+1} = x_k + tΔx_k, k = k + 1.
until ‖tΔx_k‖ < ε.

Algorithm 1: General descent method.
function

    V(θ, y_0, y_1, y_2, ...)

in θ, which, by choosing f = V and x = θ, directly leads to the optimization problem (3.1). Evaluating V and its derivatives for larger systems usually requires processing a huge amount of data, which makes the calculation expensive. An important problem of this type is the identification of GARCH models, see Bollerslev (1986); Engle (1982), which we will address in Section 3.4.3 in more detail.
A widely used iterative scheme to solve optimization problem (3.1) is described by Algorithm 1, see Boyd and Vandenberghe (2004); Fletcher (1980). We call this algorithm framework a descent method, because in each iteration Δx_k satisfies

    ∇f(x_k)^T Δx_k < 0,                                  (3.2)

where ∇f(x) is the gradient vector of f at x. Condition (3.2) ensures that Δx_k points in a direction in which f decreases at x_k. It is a well-known property of descent methods like the Newton method that they have a rapid convergence rate near x*. Thus, we can split the iterations of descent methods into two stages: a fast convergence phase, when x_k is already near x*, and a damped phase before that, when t < 1 damps Δx_k in order to obtain a sufficiently small f(x_k + tΔx_k).
In Step 2, the line search minimizes

    f(x_k + tΔx_k)

in t > 0. Let t* denote the optimal t. An algorithm choosing a t very close to t* is called an exact line search. In practice, to save computational time, t* is only approximated with an inexact line search algorithm, in which a step size t is chosen such that f(x_k + tΔx_k) decreases sufficiently. Conditions for sufficient decrease are developed for example in Armijo (1966); for a survey of these methods see Shi (2004). The line search approximation is again carried out iteratively. In the damped phase, f may be evaluated at several points on the search line, which is a costly operation in several applications in model identification. In particular, the backtracking line search, starting with t = 1, halves t in a cycle until the objective has decreased sufficiently. Another popular class of inexact line search algorithms is polynomial line search, as discussed in Fletcher (1980); Murray and Overton (1978), in which a polynomial is fitted to the objective in the search direction and the minimizer of the polynomial provides the step size. In our numerical experiments, we chose backtracking line search for its simplicity and effectiveness in practice.
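The backtracking scheme described above can be sketched as follows (an illustrative, list-based implementation; the function name and the default constants, which follow the standard Armijo condition, are our choices):

```python
def backtracking(f, x, fx, grad, dx, alpha=0.25, beta=0.5):
    """Backtracking (Armijo) line search: start at t = 1 and shrink by the
    factor beta until f(x + t*dx) <= f(x) + alpha * t * grad^T dx.
    With beta = 0.5 this is exactly the halving cycle described above."""
    slope = sum(g * d for g, d in zip(grad, dx))  # grad^T dx, negative for a descent direction
    t = 1.0
    while f([xi + t * di for xi, di in zip(x, dx)]) > fx + alpha * t * slope:
        t *= beta
    return t
```

Each loop iteration costs one evaluation of f, which is why, for expensive identification objectives, a long damped phase dominates the total running time.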
In the optimization literature, for example in Boyd and Vandenberghe (2004); Fletcher (1980), even for inexact line search algorithms it is considered important to approximate t* well, i.e. to decrease the objective as much as possible. In model identification, as function evaluations are expensive, the main task of a line search algorithm should be to help the descent method arrive close to x* fast, i.e. to have a short damped phase. This objective does not necessarily coincide with the one mentioned above, i.e. to determine a nearly optimal step size t*. It is easy to see that it is actually not even necessary to decrease f in order to get closer to x* in an iteration. See Auslender et al. (2007); Zhang and Hager (2004) for recent nonmonotone line search techniques that allow some increase of the objective and are, at the same time, effective in saving function evaluations.
For a convex function f, localization methods can also be used for minimization. Localization methods have a completely different mechanism compared to descent methods. The general problem they solve is to find a point in a given set X ⊆ S, implicitly defined by an oracle. The oracle can decide whether an arbitrary point y ∈ S is contained in X or not. If the current point x_k ∉ X, it provides a separating hyperplane through x_k. A localization polyhedron containing X is shrunk by the hyperplanes iteratively until the oracle says x_k ∈ X for some k. The query point x_k is determined as some center of the localization polyhedron: it can be, for example, the Chebyshev center, the center of the maximum volume ellipsoid or the analytic center; see Chapter 8.5 of Boyd and Vandenberghe (2004), and Boyd (2008) for an introductory survey and freely available program code for centering methods. The method applying analytic centers has been chosen for further analysis because of its superior performance. Minimization can be achieved by a localization method by choosing, as a simple example, an oracle answering 'yes' if ‖∇f(x_k)‖ is small enough.
The Analytic Center Cutting Plane Method (ACCPM), see Sonnevend (1988), is mainly
used for minimization of non-smooth functions, because in the smooth case, when also
Input: X ⊆ S
Initialization: H̄_0 := H_0
while true do
  1 Calculate AC. Set x_k = argmax_y Σ_{i=1}^{m} log p_i^T (o_i − y), with o_i, p_i defining the m halfspaces in H̄_k.
  2 Query oracle. Is x_k ∈ X? If yes, then break, else determine the halfspace H_{∇f(x_k)} defined by the normal vector ∇f(x_k) and the point x_k.
  3 Shrink polyhedron. Set H̄_{k+1} = H̄_k ∩ H_{∇f(x_k)}, k = k + 1.
end

Algorithm 2: Analytic center cutting plane method (ACCPM)
second order derivatives are available, other methods, such as the damped Newton method, are known to be much more efficient. In the non-smooth case, ACCPM applies subgradients where f is not differentiable. The ACCPM algorithm is proved to terminate in a finite number of iterations, see the corresponding result in Ye (1997). Note that Ye (1997) also specifies a Newton type minimization procedure for calculating the analytic center (see Step 1 of Algorithm 2), for which a complexity estimate is also given. ACCPM converges only in the convex case; we are not aware of any extensions to a non-convex objective.
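To illustrate Step 1 of Algorithm 2: the analytic center of a polyhedron {y : Ay ≤ b} maximizes Σ_i log(b_i − a_i^T y). A minimal damped-Newton sketch of this subproblem (the matrix interface and fixed iteration count are our choices; this is not the specialized procedure of Ye (1997)):

```python
import numpy as np

def analytic_center(A, b, y0, iters=50):
    """Analytic center of {y : A y <= b} via damped Newton on the
    log-barrier phi(y) = -sum_i log(b_i - a_i^T y); y0 must be strictly
    feasible, i.e. A y0 < b componentwise."""
    y = np.asarray(y0, dtype=float)
    for _ in range(iters):
        s = b - A @ y                        # slacks, must stay positive
        g = A.T @ (1.0 / s)                  # gradient of phi
        H = A.T @ np.diag(1.0 / s**2) @ A    # Hessian of phi
        step = np.linalg.solve(H, g)
        t = 1.0
        while np.any(b - A @ (y - t * step) <= 0):  # backtrack to stay feasible
            t *= 0.5
        y = y - t * step
    return y
```

For the unit box [−1, 1]² with one extra cut x ≤ 0.5, the center is pushed left of the origin, to x = (1 − √13)/6 ≈ −0.434, illustrating how each added cutting plane displaces the analytic center.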
The ACCPM, in the form presented in this chapter by Algorithm 2, approaches x* without evaluating the objective function, as opposed to damped descent methods. Furthermore, intuitively, the center of a shrinking localization polyhedron may approach x* faster in early iteration steps than the search point of a descent method. Figure 3.1 confirms this intuition by comparing ACCPM and the Newton method damped by backtracking line search for the function (3.7) defined later. We measured the computational effort needed by the algorithms for the search point x_k to satisfy ‖x_k − x*‖ < ε, for different ε > 0. The evaluation index is based on the number of function evaluations and the number of gradient and Hessian evaluations of the methods. The minimizer was x* = (−3, 2)^T, while the initial polyhedron for ACCPM was defined by the bound constraints [−10, 3] × [−6, 8]. For the damped Newton method, different starting points x_0 were taken equidistantly from a circle around x* with radius r = 5. We calculated the average of the corresponding efficiency indexes. It can be seen from the figure that for ca. ε > 10^{−2.5}, ACCPM reaches the target neighborhood faster than the damped Newton method.

Descent methods, in contrast to ACCPM, work for nonconvex objective functions as well and have, in general, a faster asymptotic convergence rate than ACCPM. Figure 3.1 also shows the asymptotic performance superiority of the Newton method over ACCPM.
Figure 3.1: Computational effort to reach the ε-neighbourhood of x* (damped Newton vs. ACCPM).
A further drawback of ACCPM is that redundant hyperplanes may be involved in the
calculation of the analytic center, as seen in Step 1 of the ACCPM algorithm. The linear
inequalities of redundant cutting planes may unnecessarily displace the analytic center.
This, in an extreme case, can result in a center that lies close to the boundary of the
localization polyhedron, which is quite counter-intuitive. It would be desirable to eliminate
redundant cutting planes, but identifying them is a costly operation.
This chapter is organized as follows. In the next section we present a novel bound constrained optimization algorithm merging the advantages of descent methods and localization methods. In Section 3.3 the convergence of descent methods and of our method is discussed. In the subsequent sections we give experimental evidence of the superior performance of our method in comparison with well-established descent methods on selected optimization problem types. Namely, in Section 3.4.1 we test our method on convex nonlinear optimization problems and show that its performance is robust with respect to the size of the initial polyhedron defined by the bound constraints and to the value of the initial search point x_0. In Section 3.4.2 we examine our method on several nonlinear least squares problems. In Section 3.4.3 we identify GARCH models with our method and point out that it is robust against noisy objective functions with many local minima. We conclude the chapter with Section 3.5, in which we summarize our results.
3.2 The Descent Direction Method with Cutting Planes (DDMCP)
In this section we describe a novel hybrid algorithm, which takes steps in descent directions and damps them by cutting planes if necessary. In each iteration, it calculates a descent direction, as a descent method does, together with a cutting plane, like an oracle in a localization method. The algorithm determines the step length to the midpoint of the line segment, in the previously calculated descent direction, between the current search point and the boundary, and takes the smaller of this and t = 1.
For the detailed description some definitions and notations are needed.

Definition 1. Let H denote a set of half-spaces, and let Π(H) denote the encapsulating polyhedron, which is the intersection of all half-spaces in H:

    Π(H) = ∩_{H ∈ H} H.

Let Δx denote a descent direction at x. We denote by d_{P,Δx} the relative length of the line segment starting at x in the direction Δx and ending at the boundary of the encapsulating polyhedron P:

    d_{P,Δx} = sup{ν : x + νΔx ∈ P}.
Now we have all the notions necessary to describe our Descent Direction Method With
Cutting Planes (DDMCP) in Algorithm 3.
Input: ε, ϱ, H_0, x_0
Initialization: H̄_0 := H_0, k = 0
repeat
  1 Search Direction. Determine a descent direction Δx_k.
  2 Cleaning. Remove half-spaces that are too close to x_k in the direction Δx_k:
      H̄_k = {H ∈ H̄_k \ H_0 | d_{H,Δx_k} > ϱ ‖Δx_k‖} ∪ H_0.
  3 Step size. Choose step size t = min{1, (1/2) d_{P_k,Δx_k}}, where P_k = Π(H̄_k) is the encapsulating polyhedron.
  4 Extend polyhedron. H̄_{k+1} = H̄_k ∩ H_{∇f(x_k)}, where H_{∇f(x_k)} is the halfspace defined by the hyperplane through x_k with normal vector ∇f(x_k).
  5 Update. x_{k+1} = x_k + tΔx_k, k = k + 1.
until ‖Δx_k‖ < ε.

Algorithm 3: Descent direction method with cutting planes (DDMCP)
Formally, DDMCP can be regarded as a descent method as in Algorithm 1, with a specific technique for determining the step size t, defined by Steps 2-4 of DDMCP. In fact, the main novelty of DDMCP lies in the determination of t. We avoid calling Steps 2-4 of DDMCP a 'line search' though, because they do not necessarily decrease the objective, as the term would suggest in most of the optimization literature. In DDMCP, as in most descent methods, the method of determining the search direction is independent of the method of determining the step size.

In Step 1 of DDMCP, the search direction is calculated in the same way as in an appropriate descent method that the user would choose to solve the given minimization problem. In this chapter we have chosen minimization problems, such as convex nonlinear minimization, nonlinear least squares and maximum likelihood estimation, for which there exist well-known descent methods where the search direction can be calculated from local derivatives only, that is, without using any derivative values from previous iterations. We have compared the appropriate descent method to DDMCP.
The essential idea of DDMCP is formulated in Step 3. In this step the method calculates the midpoint $\bar x_k$ of the line segment in the search direction $\Delta x_k$ from the current search point $x_k$ to the boundary of the current encapsulating polyhedron. Since the midpoint equalizes the distances along the line segment to the borders, $\bar x_k$ can be seen as a center of the polyhedron with respect to the current search point and search direction. Thus, the centering idea of localization methods, with its intuitive advantages as described in the introduction, is included in DDMCP. The Euclidean distance of $x_k$ from $\bar x_k$ is
$$\|x_k - \bar x_k\| = \frac{1}{2}\, d_{P_k, \Delta x_k}\, \|\Delta x_k\|.$$
Thus, the formula
$$t = \min\left(1, \tfrac{1}{2}\, d_{P_k, \Delta x_k}\right)$$
means that $t$ is chosen to be $1$ if
$$\|\bar x_k - x_k\| > \|\Delta x_k\|.$$
In this case we reject $\bar x_k$ as the next search point and take a shorter step of length $\|\Delta x_k\|$ to allow the rapid convergence inherited from the corresponding descent method. On the other hand, we use the damping effect by accepting $\bar x_k$ if
$$\|\bar x_k - x_k\| \le \|\Delta x_k\|.$$
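The step-size rule above can be sketched in a few lines. This is a sketch, not the thesis code: it assumes the current polyhedron is stored as a halfspace system $Ay \le b$, and the helper name is ours.

```python
import numpy as np

def ddmcp_step_size(x, dx, A, b):
    """Step 3 of DDMCP, sketched: the polyhedron is {y : A @ y <= b}.
    The ray x + lam*dx leaves the polyhedron at lam = d, so the midpoint
    of the boundary segment is x + (d/2)*dx and t = min(1, d/2)."""
    Ad = A @ dx
    slack = b - A @ x
    with np.errstate(divide="ignore", invalid="ignore"):
        # only constraints the ray moves towards can be hit
        ratios = np.where(Ad > 1e-12, slack / Ad, np.inf)
    d = float(ratios.min())        # boundary hit along the search direction
    return min(1.0, 0.5 * d)
```

If the midpoint lies farther than one full descent step ($d/2 > 1$), the pure descent step $t = 1$ is taken; otherwise the damped midpoint step is used.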
Figure 3.2: A blocking cutting plane
Step 2 in DDMCP ensures that no cutting plane in $\bar H_k \setminus H_0$ can block the search point from progressing further along the search direction. To see how blocking can occur, consider first Step 4, where a cutting plane is added to the localization polyhedron $\bar H_k$. Remember, the cutting plane with normal vector $\nabla f(x_k)$ through the search point $x_k$ has a local separating property in $x_k$: in a sufficiently small neighborhood of $x_k$, the objective function is greater behind the cutting plane than in front of it. For points on the hyperplane far away from $x_k$, this local separating property may no longer hold. Consider the case when the algorithm approaches a point $y$ on the hyperplane, where $y$ does not have this separating property. In this situation, no matter how close $x_k$ is to $y$, the search direction points towards the blocking hyperplane.
Without deleting the blocking hyperplanes, as described in Step 2, the algorithm would get stuck at the congestion point $y$. This situation is exhibited in Figure 3.2 on the gradient field of a two-dimensional objective function. Here, the blocking hyperplane is the bold line in the quasi-diagonal direction in the picture. The search points are indicated by bullets. In this example, the search direction is the negative gradient; thus, the cutting planes $H_{\nabla f(x_k)}$ are orthogonal to the search direction. In our numerical experiments, the threshold for the distance from a hyperplane was $\delta = 10^{-6}$.
Note that for ease of presentation we have defined DDMCP for bound-constrained optimization, but it works for convex-constrained problems as well, as we will show via the GARCH example in Section 3.4.3. When convex constraints are applied, $S$ is defined by a polyhedron, i.e. by finitely many hyperplanes. The DDMCP algorithm itself does not need to be changed for convex-constrained problems. An extension for unconstrained problems can also be considered. In this case, DDMCP needs to be changed so that a hyperplane from the initial polyhedron $H_0$ acting as a blocking hyperplane is replaced by a new
An important advantage of the proposed method DDMCP over descent methods is that DDMCP does not include any line search steps, and, as a direct consequence, it does not evaluate the objective. The price to pay for this improvement is that a growing set of cutting planes needs to be stored and processed. This cost should be insignificant for low- and medium-scale problems, though. As the numerical experiments will show, the performance improvement is high, especially when the cost of a function evaluation is high. Another consequence of the missing function evaluations is that DDMCP is more robust against ill-behaved objective functions, such as noisy objectives with many local minima. This issue will be elaborated in Section 3.4.3 on the problem of GARCH identification.
As DDMCP can also be viewed as a centering or localization method, it is reasonable to compare it with a well-established member of this family, ACCPM. One important difference between ACCPM and DDMCP lies in the implementation of the centering idea, which is to find some center of the encapsulating polyhedron $P(\bar H)$ defined by the set of $m$ half-spaces with linear inequalities
$$p_i^T (y - o_i) \le 0, \qquad i = 1, \ldots, m.$$
As described in the introduction, in ACCPM, redundant hyperplanes may arise. In contrast, DDMCP automatically neglects redundant cutting planes: it considers only the boundary of $P(\bar H)$ when determining the new center on the line $x_k + \lambda\,\Delta x_k$, $\lambda > 0$, and DDMCP does not require any nontrivial optimization procedure to calculate it. Numerical experiments have shown that DDMCP has a faster convergence rate than ACCPM, and DDMCP actually inherits the fast convergence rate from the corresponding descent method. An advantage of ACCPM is that it does not require any initial point.
Figure 3.3 shows the search paths for test function (3.7) generated by the three different algorithms: the Newton method damped with backtracking line search, ACCPM, and DDMCP with search directions calculated exactly as in the Newton method. The initial point is $x_0 = (-2, 3)^T$, the minimizer is $x^* = (1, 1)^T$. The normalized vector field defined by the search direction is drawn in the background. The initial polyhedron is $[-3, 3] \times [-3, 3]$.
One can see in this example that, in the first few iterations, the descent method approaches $x^*$ slowly, meaning that $\|x_k - x^*\|$ decreases slowly as $k$ increases. However, once it gets close enough to $x^*$, it converges very fast: it actually jumps very close to the minimizer. In contrast, ACCPM approaches $x^*$ faster in the early iterations, but slows down in the neighborhood of the minimizer. DDMCP is fast in both convergence phases.
Figure 3.3: Search paths of the different algorithms (panels, left to right: Damped Newton, ACCPM, DDMCP$_{\text{Newton}}$; each panel shows the region $[-2, 2]^2$)
3.3 Theoretical aspects
3.3.1 Convergence of descent methods
In this chapter we compare the proposed method with popular descent algorithms often applied in practice for the particular minimization problem type. In all of these methods the descent direction $\Delta x_k$ is the solution of
$$B_k\, \Delta x_k = -\nabla f(x_k), \qquad (3.3)$$
where $B_k$ is an $n \times n$ matrix, the concrete form of which is part of the specification of the particular algorithm. The role of $B_k$ is to allow determining a search direction that takes a local second-order approximation of $f$ at $x_k$ into account. It is either the Hessian matrix
$$B_k = \nabla^2 f(x_k), \qquad (3.4)$$
or some approximation of it, which we will see in more detail in the next sections.
The descent algorithms are equipped with a simple and efficient line search algorithm, called Armijo backtracking line search; see Armijo (1966). It is based on the Armijo rule ensuring a sufficient decrease of the objective:
$$f(x_k + t\,\Delta x_k) - f(x_k) < \alpha t\, \nabla f(x_k)^T \Delta x_k, \qquad (3.5)$$
where $\alpha$ is a parameter, typically $\alpha = 10^{-4}$.
The backtracking line search algorithm is given by Algorithm 4, in which $t$ is halved until the candidate search point satisfies the bound constraints and (3.5). Theorem 1 gives conditions for the convergence of the overall descent method. The theorem is a straightforward reformulation of Theorem 3.2.4 in Kelley (1999). See Shi (2004) for the convergence analysis of different line search algorithms.

Initialization: $t = 1$
repeat
    $t = t/2$
until $x_k + t\,\Delta x_k \in S$ and $f(x_k + t\,\Delta x_k) - f(x_k) < \alpha t\, \nabla f(x_k)^T \Delta x_k$
Algorithm 4: Armijo line search
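Algorithm 4 can be transcribed almost literally; a sketch, in which the iteration cap is our addition to guard against the endless halving on noisy objectives discussed in Section 3.4.3:

```python
import numpy as np

def armijo_backtracking(f, grad, x, dx, in_S=lambda y: True,
                        alpha=1e-4, max_halvings=30):
    """Algorithm 4: halve t until x + t*dx is feasible and the Armijo
    sufficient-decrease condition (3.5) holds."""
    fx = f(x)
    slope = grad(x) @ dx          # negative for a descent direction
    t = 1.0
    for _ in range(max_halvings):
        t *= 0.5
        y = x + t * dx
        if in_S(y) and f(y) - fx < alpha * t * slope:
            return t, y
    raise RuntimeError("Armijo line search failed")
```

The returned pair $(t, y)$ gives the accepted step size and the new candidate point.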
Theorem 1. Let $f$ be bounded from below on $S$ and let $\nabla f$ be Lipschitz continuous, that is, there exists a positive constant $L \in \mathbb{R}$ such that
$$\|\nabla f(x) - \nabla f(y)\| \le L\, \|x - y\|, \qquad \forall x, y \in S.$$
Assume that the matrices $B_k$ are symmetric and positive definite (spd) and that there exist constants $\bar\kappa$ and $\Lambda$ such that the condition number satisfies $\kappa(B_k) \le \bar\kappa$ and
$$\|B_k\| = \max_{x \ne 0} \frac{\|B_k x\|}{\|x\|} \le \Lambda$$
for all $k$. Then
$$\lim_{k \to \infty} \nabla f(x_k) = 0.$$
The positive definiteness of $B_k$ is a crucial condition. It ensures that the search direction $\Delta x_k$ is a descent direction, that is, $\Delta x_k$ points in a direction in which $f$ is locally decreasing.
We have explicitly chosen problem types where $B_k$ is spd by construction. For other problem types a regularization technique is usually applied: after determining $B_k$, a $\mu I$ term is added to it, where $I$ denotes the identity matrix and $\mu$ is some appropriately chosen positive real number. This kind of regularization can also ensure that the other conditions in Theorem 1 (uniform boundedness, being well-conditioned) are satisfied. Regularization is outside the scope of this work; see for example Kelley (1999), Boyd and Vandenberghe (2004), Fletcher (1980) for details, or Polyak (2007) for a recent result.
3.3.2 Remark on the convergence of DDMCP
The global convergence of descent methods relies on the sufficient decrease of the objective, which is ensured by choosing an appropriate step size in the line search procedure. The convergence of ACCPM relies on the advantageous theoretical properties of the analytic center, Ye (1997). DDMCP neither ensures sufficient decrease nor uses analytic centers; thus, we cannot apply either of the two established concepts to prove convergence. In fact, the proof of convergence remains a subject of future work.
In order to keep the fast numerical performance of DDMCP while enforcing convergence, we can consider combining DDMCP with a convergent descent method via spacer steps. According to the Spacer Step Theorem in Luenberger (1973), Chapter 7.9, the following holds. Given a convergent algorithm A and an algorithm B which does not increase the objective, we can compose a new algorithm C in which A is applied in some iterations and B in the remaining ones. Then, if A is applied infinitely often, C is also convergent.
Notice that DDMCP, while aspiring to approach $x^*$, may at the same time also increase the objective, although it usually decreases it. Having a line search algorithm that decreases the objective function sufficiently, such as the Armijo backtracking line search, we can apply the following simple alternating technique to ensure convergence by the Spacer Step Theorem (although we did not apply it).
Let $N$ be a positive integer. If DDMCP actually decreased the objective (not necessarily sufficiently), then inserting a sufficiently decreasing line search step at every $lN$-th ($l \in \mathbb{N}$) iteration would ensure convergence of the whole sequence $x_k$. To get rid of the problem of non-decreasing DDMCP steps, we just have to keep track of the value of the objective function at steps $(l-1)N$ and $lN - 1$, and, before making a sufficiently decreasing line search step, ensure non-increase by setting $x_{lN-1} = x_{(l-1)N}$ if necessary. In this bad case, the last DDMCP steps are rolled back and we lose the progress of the last $N - 1$ iterations. We may then set $N = \max(N/2, 1)$, expressing that we are losing faith in DDMCP. Then, in the worst case, if $N = 2^u$ ($u \in \mathbb{N}$), we may have
$$\sum_{i=1}^{u} \left(2^i - 1\right) = 2^{u+1} - u - 2$$
iterations that are rolled back.
3.4 Numerical performance evaluation
3.4.1 The convex case
In this section we deal with the convex case, that is, when the function $f : S \to \mathbb{R}$ is convex.
For this type of function, the Newton method damped with Armijo backtracking line search is usually used; we denote it by Newton-bLS. Therefore, we have compared our DDMCP method with Newton-bLS, such that the Newton search direction is used in DDMCP. Our algorithm will be denoted by DDMCP$_{\text{Newton}}$ in this case.
In the Newton method the matrix $B_k$ in (3.3) is the Hessian matrix, which is always spd in the convex case. Convexity has another useful technical implication: a cutting hyperplane with the gradient as its normal vector does not cut out the minimum point, i.e.
$$\nabla f(x)^T (x^* - x) \le 0, \qquad \forall x \in S.$$
Note that, despite this property, we still need to drop hyperplanes too close to the search point $x_k$, as described in Step 2 of the DDMCP algorithm, because blocking hyperplanes as shown in Figure 3.2 can occur even in the convex case.
For completeness, we present numerical results for ACCPM too. For simplicity, we installed a trivial oracle answering 'yes' if
$$\|x_k - x^*\| < \varepsilon, \qquad (3.6)$$
where $\varepsilon$ is the tolerance level. Note that, for our performance comparison purposes, it is sufficient to choose this theoretical oracle, which already knows the minimizer. If (3.6) is not satisfied, the oracle returns a cutting plane with $\nabla f(x_k)$ as its normal vector.
Numerical results. For the numerical experiments we used the following two convex test functions:
$$f_1(x, y) = \log\left(1 + e^{x}\right) - \frac{1}{2}\, x + \frac{1}{2} \log\left(1 + e^{2y}\right) - \frac{1}{2}\, y, \qquad (3.7)$$
$$f_2(x, y) = (x - \arctan x)\, \mathrm{sign}\, x + \left(y - \frac{1}{4} \arctan 4y\right) \mathrm{sign}\, y. \qquad (3.8)$$
The form of the functions along the two dimensions is the same; the $y$ coordinate is transformed affinely by factors 2 and 4, respectively. Apart from the affine constants, the partial derivatives of $f_1$ and $f_2$ have the form
$$\frac{e^z}{1 + e^z}, \qquad \frac{z^2}{1 + z^2}\, \mathrm{sign}\, z,$$
respectively, where $z$ stands for $x$ or $y$.
The rationale behind these functions is that the (undamped) Newton method diverges if the starting point is far away from the minimum point $(0, 0)^T$; thus, damping line search steps are needed for convergence.
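The two test functions and their gradients can be written down directly; a sketch (NumPy's convention $\mathrm{sign}\,0 = 0$ is assumed):

```python
import numpy as np

def f1(x, y):
    # (3.7): convex; gradient components have the form e^z/(1+e^z) - 1/2
    return (np.log1p(np.exp(x)) - 0.5 * x
            + 0.5 * np.log1p(np.exp(2.0 * y)) - 0.5 * y)

def grad_f1(x, y):
    ex, e2y = np.exp(x), np.exp(2.0 * y)
    return np.array([ex / (1.0 + ex) - 0.5, e2y / (1.0 + e2y) - 0.5])

def f2(x, y):
    # (3.8): convex, with gradient components z^2/(1+z^2) * sign(z)
    return ((x - np.arctan(x)) * np.sign(x)
            + (y - 0.25 * np.arctan(4.0 * y)) * np.sign(y))
```

The gradient of $f_1$ vanishes exactly at the unshifted minimum point $(0,0)^T$.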
Using both of these test functions we generated eight test problems, each corresponding to a different row in Table 3.1: the first four to function type (3.7), the second four to function type (3.8). For each problem we defined a different minimum point $c = (c_1, c_2)^T$, as seen in the first two columns, and implemented it simply by shifting the appropriate test function as
$$f_i^c(x, y) = f_i(x - c_1, y - c_2), \qquad i \in \{1, 2\}.$$
Thus, a test problem is defined by the function type and the minimum point $c$. A minimization run was started from 100 different initial points by Newton-bLS and DDMCP$_{\text{Newton}}$ for each test problem, where the initial points were placed equidistantly on a circle around the origin. In each run the number of function evaluations and the numbers of gradient and Hessian evaluations were noted, together with an evaluation index built from them as
$$e = \frac{n(n+1)}{2}\, e_H + n\, e_g + e_f, \qquad (3.9)$$
where $e_g$, $e_H$, and $e_f$ stand for the number of gradient, Hessian and function evaluations, respectively, and $n$ is the problem dimension ($n = 2$ in these cases). We calculated the average of the 100 measurements of $e_H$, $e_f$, $e_g$, and $e$. The performance of ACCPM and DDMCP$_{\text{Newton}}$ is given as the ratio of the corresponding evaluation index to the evaluation index of Newton-bLS. We omitted the metric $e_f$ for DDMCP$_{\text{Newton}}$ and ACCPM, and $e_H$ for ACCPM, because they were always zero for these algorithms. We ran ACCPM only once for each test problem, as it does not need any initial point. The radius $r$ of the circle and the size parameter $d$ of the initial polyhedron $d[-1, 1]^2$ defined by $H_0$ were also chosen differently depending on $c$. The termination tolerance was $\varepsilon = 10^{-6}$ for all three methods.
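The evaluation index (3.9) is a direct computation; as a check, the first Newton-bLS row of Table 3.1 gives $e \approx 36.4$:

```python
def evaluation_index(n, e_H, e_g, e_f):
    """(3.9): a Hessian costs n(n+1)/2 scalar-derivative evaluations,
    a gradient costs n, a function value costs 1."""
    return n * (n + 1) / 2 * e_H + n * e_g + e_f
```

For $n = 2$ this weights Hessians by 3 and gradients by 2, so Newton-type iterations with $e_g = e_H$ cost 5 gradient-equivalents each plus the function evaluations.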
We present the results of the numerical performance comparison in Table 3.1. According to the evaluation index $e$, DDMCP$_{\text{Newton}}$ is the most efficient and ACCPM is the least efficient algorithm in most runs. DDMCP$_{\text{Newton}}$ needed, on average, even fewer iterations (gradient and Hessian evaluations) than Newton-bLS, confirming the viability of the centering approach. We list time measurement results as well. The measurements were taken on a 2.8 GHz machine with an Intel Pentium 4 CPU and 2 GB RAM, using Matlab for all methods. For algorithm Newton-bLS, column $\bar t_0$ shows average execution times in milliseconds. For the other two algorithms we present execution times relative to $\bar t_0$. We can conclude that also in terms of execution time DDMCP$_{\text{Newton}}$ is the fastest algorithm of those tested. ACCPM, on average, needed approximately two orders of magnitude more time than Newton-bLS, while in terms of the evaluation index it was only slightly worse. This discrepancy can be explained by the fact that the evaluation index does not take into account the effort needed to find the analytic center, as described in Step 1 of the ACCPM algorithm.
Table 3.1: Convex minimization: computational effort and execution time comparison (bars denote averages over the 100 runs; first block: $f_1$, second block: $f_2$)

                           Newton-bLS                              ACCPM                      DDMCP_Newton
c        r    d    e_f     e_g = e_H   e_0     t_0 (ms)   e_g    e/e_0   t/t_0     e_g = e_H   e/e_0   t/t_0
(1,1)    2    4    8.02    5.68        36.4    1.56       61     3.35    300.00    4.89        0.67    0.40
(2,2)    4    8    15.60   6.61        48.6    2.49       56     2.30    144.18    5.87        0.60    0.57
(3,3)    6    12   24.30   7.26        60.6    3.59       87     2.87    287.19    6.64        0.55    0.40
(4,4)    8    16   33.31   7.41        70.4    4.84       78     2.22    158.06    7.13        0.51    0.23
Average            20.31   6.74        54.0    3.12       70.5   2.69    222.36    6.13        0.58    0.40
(3,3)    5    10   38.27   22.28       149.7   7.81       86     1.15    135.98    21.13       0.71    0.48
(6,6)    10   20   50.12   23.75       168.9   9.65       69     0.82    59.90     22.74       0.67    0.39
(9,9)    15   30   53.24   24.05       173.5   9.08       95     1.10    142.84    23.80       0.69    0.55
(12,12)  20   40   61.82   24.59       184.8   9.87       103    1.11    147.21    24.50       0.66    0.54
Average            50.86   23.67       169.2   9.10       88.3   1.04    121.48    23.04       0.68    0.49
We also examined the sensitivity of the performance of DDMCP$_{\text{Newton}}$ to the size of the initial polyhedron. We fixed $c = (0.5, 0.5)^T$ and $r = 2$, and by increasing $d$ we measured the average number of iterations for both test function types. Results for test functions $f_1$ and $f_2$ are depicted in Figure 3.4 on the left and on the right, respectively. We can conclude that even for large initial hypercube sizes, DDMCP$_{\text{Newton}}$ needs only a few more iterations than in the case of a relatively small initial hypercube.
Figure 3.4: Average iteration number necessary vs. size $d$ of the initial hypercube (left: $f_1$, roughly 5 to 6 iterations for $d$ up to 40; right: $f_2$, roughly 21 to 23 iterations for $d$ up to 200)
3.4.2 A nonconvex case: nonlinear least squares
We consider the nonlinear least-squares problem
$$\min_{x \in S} f(x), \qquad f(x) = \frac{1}{2} \sum_{i=1}^{m} r_i^2(x), \quad m > n, \qquad (3.10)$$
where each component $r_i : \mathbb{R}^n \to \mathbb{R}$ of the residual vector $r(x)$ is a twice continuously differentiable function. Let $J(x)$ be the Jacobian matrix of $r(x)$. A widely used numerical algorithm to solve (3.10) is the Gauss-Newton algorithm with backtracking Armijo line search, Kelley (1999), which is considered to be very effective in practice. We denote this algorithm by GN-bLS. In this method $B_k$ in the equation system (3.3) is
$$B_k = J(x_k)^T J(x_k), \qquad (3.11)$$
and the gradient can be computed as
$$\nabla f(x) = J(x)^T r(x).$$
We denote our method using the same descent direction by DDMCP$_{\text{GN}}$.
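The Gauss-Newton direction shared by GN-bLS and DDMCP$_{\text{GN}}$ is the solution of (3.3) with (3.11); a minimal sketch, assuming $J^T J$ is nonsingular (i.e. no regularization is needed):

```python
import numpy as np

def gauss_newton_direction(J, r):
    """Solve (J^T J) dx = -J^T r, i.e. (3.3) with B_k from (3.11)
    and gradient J(x)^T r(x)."""
    return np.linalg.solve(J.T @ J, -(J.T @ r))
```

For a linear residual $r(x) = x$ the direction points straight at the least-squares solution, so a unit step solves the problem in one iteration.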
Numerical results. Table 3.2 lists the nonlinear least squares test problems together with the performance results. The problems are suggested in Moré et al. (1981), though we restrict ourselves to test functions with finitely many global minimizers. The initial parameter points are also defined in Moré et al. (1981). The termination tolerance on the length of the search direction was $\varepsilon = 10^{-6}$. The number in the first column of the table is the problem identifier from Moré et al. (1981), $n$ is the dimension of the problem, and $m$ denotes the number of squared residuals to sum. In the fifth column we list the bound constraints for $f$, which form a hypercube defining the initial set $H_0$. For some of the test functions, $B_k$ was ill-conditioned on the negative cone $[-\infty, 0]^n$; thus, without regularization, no search direction was available. As the minimum point $x^*$ resides in the positive cone for these functions and regularization is outside the scope of this work, we avoided this issue by simply restricting the domain to the positive cone. Note that one popular regularization method is the Levenberg-Marquardt algorithm, see Kelley (1999).
Table 3.2: Least squares: computational effort comparison

                                                  GN-bLS                DDMCP_GN
Id   name                n    m    H_0            e_f    e_g    e_0     e_g    e/e_0
32   Linear              10   20   5[-1,1]^n      4      3      34      2      0.59
1    Rosenbrock          2    2    3[-1,1]^n      42     12     66      4      0.12
7    Helical valley      3    3    2[-1,1]^n      18     10     48      26     1.63
13   Powell singular     4    4    5[-1,1]^n      44     23     136     22     0.65
8    Bard                3    15   3[-1,1]^n      12     7      33      6      0.55
15   Kowalik & Osborne   4    11   [0,1]^n        60     31     184     20     0.43
10   Meyer               3    16   7000[-1,1]^n   23     11     56      8      0.43
20   Watson              9    31   8[-1,1]^n      10     6      64      5      0.70
12   Box                 3    10   15[-1,1]^n     12     7      33      6      0.55
35   Chebyquad           5    5    [0,1]^n        11     6      41      5      0.61
17   Osborne 1           5    33   6[-1,1]^n      18     9      63      7      0.56
19   Osborne 2           11   65   8[0,1]^n       27     13     170     13     0.84
     Average                                      23.4   11.5   77.3    10.3   0.64
We used the same performance metrics as given in (3.9). Since no Hessian was evaluated, $e_H$ was always zero. The results are summarized in Table 3.2. Both GN-bLS and DDMCP$_{\text{GN}}$ found the same minima for all problems (up to four decimal digits). On problems 'Freudenstein & Roth', 'Jennrich & Sampson', and 'Brown & Dennis' neither of the algorithms converged, because of ill-conditioned Gauss-Newton matrices $B_k$, so we have excluded them from the table. As the results show, our algorithm performs better than GN-bLS on most of the problems, except problem 'Helical valley', where GN-bLS was better. Excluding this problem, even the number of iterations (gradient evaluations) was less than or equal to that of GN-bLS, confirming that the centering approach is quite efficient.
Table 3.3: Least squares: computational effort comparison with increased size of $H_0$ ($d \in \{10, 10^2, 10^3\}$)

                                            GN-bLS                DDMCP_GN
Id   name                H_0               e_f    e_g    e_0     e_g    e/e_0
32   Linear              5d[-1,1]^n        4      3      34      2      0.59
1    Rosenbrock          3d[-1,1]^n        44     12     68      3      0.09
7    Helical valley      2d[-1,1]^n        21     11     54      30     1.67
13   Powell singular     5d[-1,1]^n        44     23     136     22     0.65
8    Bard                3d[-1,1]^n        12     7      33      6      0.55
15   Kowalik & Osborne   d[0,1]^n          62     32     190     20     0.42
20   Watson              8d[-1,1]^n        10     6      64      5      0.70
12   Box                 15d[-1,1]^n       12     7      33      6      0.55
35   Chebyquad           d[0,1]^n          11     6      41      5      0.61
17   Osborne 1           6d[-1,1]^n        18     9      63      7      0.56
19   Osborne 2           8d[0,1]^n         27     13     170     13     0.84
     Average                               24.1   11.7   80.5    10.8   0.66
Table 3.3 summarizes the results of comparison tests in which we multiplied the side of $H_0$ by $10^1$, $10^2$, and $10^3$. All three sizes gave the same results in terms of the evaluation index; thus, $d > 10$ did not require more computational effort. With extended bounds, we got similar results as in Table 3.2, with the exception of test problem 'Meyer'. For this test function, matrix $B_k$ was often very close to singular, resulting in unusable search directions, so neither of the methods converged. The problem could again be solved by regularization, which we do not consider in this work; thus it has been excluded from this study. Table 3.4 contains time measurement results for different sizes of the bounding boxes. The execution times in the cases $d \in \{10^1, 10^2, 10^3\}$ were almost the same for all three initial hypercube sizes, so we list time results only once.
Let us remark that we have also tested Gauss-Newton with a polynomial line search algorithm (see Kelley (1999), Chapter 3.2.1), but in general, the performance results with polynomial line search were even worse than with backtracking line search.
3.4.3 Maximum likelihood estimation of GARCH parameters
In this section we demonstrate the viability of our method on the important problem of estimating the parameters of GARCH models. GARCH models are usually identified by maximum likelihood estimation; see Berkes et al. (2003); Bollerslev and Wooldridge (1992) for results regarding the consistency and asymptotic normality of the estimator. We consider in this chapter the off-line GARCH parameter estimation problem, where the estimation is carried out after all the observation data is available.

Table 3.4: Least squares: execution time comparison. The index E indicates the cases of extended $H_0$'s as given in Table 3.3.

                         GN-bLS                    DDMCP_GN
Id   name                t_0 (ms)   t_0^E (ms)     t/t_0   t^E/t_0^E
32   Linear              3.1        2.8            0.2     0.3
1    Rosenbrock          9.8        9.4            0.1     0.2
7    Helical valley      5.5        6.3            2.0     2.0
13   Powell singular     12.2       12.6           0.7     0.7
8    Bard                5.5        5.8            0.4     0.4
15   Kowalik & Osborne   19.4       21.0           0.5     0.4
10   Meyer               8.1        -              0.5     -
20   Watson              10.0       8.1            0.5     0.8
12   Box                 4.6        5.0            0.6     0.4
35   Chebyquad           3.3        4.4            1.2     0.6
17   Osborne 1           11.9       11.4           0.6     0.5
19   Osborne 2           35.6       34.2           0.6     0.6
     Average             10.7       11.0           0.7     0.6
Next, we explain the connection between parameter estimation and nonlinear optimization, and we present a popular optimization method for this problem type. Then we briefly describe the GARCH model, along with a simple technique introduced here to reduce the dimension of the optimization problem.
Consider the estimation problem for the univariate stochastic model of the form
$$F_t(y_t, \theta^*) = \varepsilon_t, \qquad t \in \mathbb{Z},$$
where $y_t \in \mathbb{R}$ are observations, $F_t$ is a twice-differentiable scalar-valued function, the $\varepsilon_t$ are independent identically distributed (i.i.d.) normal disturbances, and $\theta^* \in S \subset \mathbb{R}^d$ is the parameter vector to be estimated, or identified, in the system-theoretic terminology. A popular way of identifying such models is maximum likelihood estimation, which means the minimization of the negative log-likelihood function $L_T(\theta)$, i.e.
$$\hat\theta = \arg\min_{\theta} L_T(\theta), \qquad (3.12)$$
given the observations $y_1, y_2, \ldots, y_T$, where $\hat\theta$ denotes the estimator, that is, the minimizer point we are searching for. Thus, with the notation used throughout this chapter, (3.12) can be formulated as an optimization problem with objective
$$f(x) = L_T(\theta)\big|_{\theta = x}, \qquad x \in S.$$
In what follows, we consider only log-likelihood functions which can be written as
$$L_T(\theta) = \sum_{t=1}^{T} l_t(y_t, \theta), \qquad (3.13)$$
where $l_t$ can be calculated from the joint density function of $(y_1, y_2, \ldots, y_t)$. A widely used algorithm for minimizing a likelihood function with property (3.13) is the BHHH method, Berndt et al. (1974), with backtracking Armijo line search, denoted by BHHH-bLS. The BHHH algorithm specifies the matrix $B_k$ in Algorithm 1 as the outer-product matrix built from partial derivatives
$$B_k = \sum_{t=1}^{T} \frac{\partial l_t(y_t, \theta_k)}{\partial \theta} \left(\frac{\partial l_t(y_t, \theta_k)}{\partial \theta}\right)^T, \qquad (3.14)$$
where $\theta_k = x_k$. Notice that $B_k$ is asymptotically spd and only the gradients are needed to calculate it, which makes it very useful in practice. We denote our algorithm applying matrices as defined in (3.14) by DDMCP$_{\text{BHHH}}$.
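Once the per-observation scores are stacked row-wise, the outer-product matrix (3.14) is a single matrix product; a sketch (the array layout is our assumption, not the thesis code):

```python
import numpy as np

def bhhh_matrix(scores):
    """(3.14): `scores` is a (T, d) array whose t-th row is
    d l_t / d theta evaluated at theta_k; B_k = sum_t s_t s_t^T."""
    return scores.T @ scores
```

The result is symmetric positive semidefinite by construction, and positive definite as soon as the score rows span the parameter space.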
We consider GARCH($p, q$) models given as
$$y_t = \sqrt{h_t}\, \varepsilon_t, \qquad \varepsilon_t \sim N(0, 1) \ \text{i.i.d.}, \qquad (3.15)$$
$$h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i\, y_{t-i}^2 + \sum_{j=1}^{p} \beta_j\, h_{t-j}, \qquad (3.16)$$
where the real parameter vector to be estimated is
$$(\alpha_0, \alpha_1, \ldots, \alpha_q, \beta_1, \ldots, \beta_p)^T, \qquad (3.17)$$
with positive components. We also apply the stationarity condition given in Berkes et al. (2003); Bollerslev (1986) and restrict the parameter domain by
$$\sum_{i=1}^{q} \alpha_i + \sum_{j=1}^{p} \beta_j < 1. \qquad (3.18)$$
In a GARCH($p, q$) model there are $(p+q+1)$ parameters to estimate. Now we introduce a technique with which we can reduce the dimension of the optimization problem to $(p+q)$. Let us first denote by $\sigma^2$ the unconditional variance, which is the expected value of the conditional variance $h_t$:
$$\sigma^2 = E(h_t), \qquad \forall t. \qquad (3.19)$$
Under stationarity we can calculate $\sigma^2$ from the model parameters as follows. Taking expectations on both sides of (3.16) we can write
$$E(h_t) = \alpha_0 + E(h_t) \sum_{i=1}^{q} \alpha_i + E(h_t) \sum_{j=1}^{p} \beta_j,$$
where we used $E(y_t^2 \mid h_t) = h_t$ and stationarity. Using (3.19), it immediately follows that
$$\sigma^2 = \frac{\alpha_0}{1 - \sum_{i=1}^{q} \alpha_i - \sum_{j=1}^{p} \beta_j}. \qquad (3.20)$$
Assume that we can estimate $\sigma^2$, and let us denote the estimator by $\hat\sigma^2$. Then we can use (3.20) to express $\alpha_0$ and eliminate it from the parameter vector. Doing so, we arrive at the following modified equation for the latent variance process, which contains only $(p+q)$ unknown variables:
$$\bar h_t = \hat\sigma^2 \left(1 - \sum_{i=1}^{q} \alpha_i - \sum_{j=1}^{p} \beta_j\right) + \sum_{i=1}^{q} \alpha_i\, y_{t-i}^2 + \sum_{j=1}^{p} \beta_j\, \bar h_{t-j}. \qquad (3.21)$$
For the numerical experiments we need to define the initial values of $\bar h_t$ and $y_t$. We assume that the initial values of $h_t$ and $y_t$ are equal to the unconditional variance $\sigma^2$, i.e.
$$\bar h_t = y_t^2 = \sigma^2, \qquad \forall t \le 0. \qquad (3.22)$$
In our experiments, in order to save iterations in the numerical optimization procedure, we applied this dimension reduction technique; that is, we first estimated $\sigma^2$ using
$$\hat\sigma^2 = \frac{1}{T} \sum_{t=1}^{T} y_t^2, \qquad (3.23)$$
then we took the model in the form of (3.21) to estimate the rest of the parameter vector
$$\theta = (\alpha_1, \ldots, \alpha_q, \beta_1, \ldots, \beta_p)^T.$$
Note that the estimator resulting from this dimension-reducing scheme can deviate from the estimator we would get by maximizing the original likelihood explicitly including $\alpha_0$.
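For $p = q = 1$ the reduced recursion (3.21), with the initializations (3.22) and the variance estimate (3.23), can be sketched as follows (function and variable names are ours, not the thesis code):

```python
import numpy as np

def hbar_garch11(y, alpha1, beta1):
    """Dimension-reduced GARCH(1,1) variance recursion (3.21):
    sigma^2 is estimated by the sample second moment (3.23), alpha_0
    is eliminated via (3.20), pre-sample values follow (3.22)."""
    s2 = float(np.mean(y ** 2))                 # (3.23)
    a0 = s2 * (1.0 - alpha1 - beta1)            # implied alpha_0 from (3.20)
    h = np.empty(len(y))
    h_prev, y2_prev = s2, s2                    # (3.22): pre-sample values
    for t in range(len(y)):
        h[t] = a0 + alpha1 * y2_prev + beta1 * h_prev
        h_prev, y2_prev = h[t], y[t] ** 2
    return h
```

For data whose squared values equal the sample variance throughout, the recursion stays at the unconditional variance, as expected from (3.19).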
For completeness, we next specify $l_t$ and its derivatives, which are used to calculate $B_k$. The negative log-likelihood of the form (3.13) is calculated from
$$l_t(y_t, \theta) = \frac{1}{2} \left(\log \bar h_t + \frac{y_t^2}{\bar h_t}\right), \qquad (3.24)$$
where some constants are neglected because they do not play any role in the optimization. Let us now calculate the score function. Differentiating (3.24) yields
$$\frac{\partial l_t}{\partial \theta} = \frac{1}{2}\, \frac{1}{\bar h_t}\, \frac{\partial \bar h_t}{\partial \theta} \left(1 - \frac{y_t^2}{\bar h_t}\right),$$
where the partial derivatives of $\bar h_t$ can be calculated recursively from (3.21) and (3.22) as
$$\frac{\partial \bar h_t}{\partial \theta} = \begin{cases} \left(y_{t-1}^2, \ldots, y_{t-q}^2, \bar h_{t-1}, \ldots, \bar h_{t-p}\right)^T - \hat\sigma^2 \mathbf{1} + \displaystyle\sum_{i=1}^{p} \beta_i\, \frac{\partial \bar h_{t-i}}{\partial \theta}, & t > 0, \\ 0, & t \le 0. \end{cases} \qquad (3.25)$$
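For $p = q = 1$, the recursion (3.25) and the score of (3.24) can be propagated together; a sketch under the same conventions as the variance recursion above (names are ours):

```python
import numpy as np

def score_recursion_garch11(y, alpha1, beta1, s2_hat):
    """Sketch of (3.25) for p = q = 1, theta = (alpha1, beta1):
    propagate dhbar_t/dtheta with dhbar_t/dtheta = 0 for t <= 0,
    and return the per-observation scores of l_t in (3.24)."""
    a0 = s2_hat * (1.0 - alpha1 - beta1)
    h_prev, y2_prev = s2_hat, s2_hat           # (3.22)
    dh_prev = np.zeros(2)
    scores = []
    for yt in y:
        h = a0 + alpha1 * y2_prev + beta1 * h_prev
        # (3.25): (y^2_{t-1}, hbar_{t-1})^T - sigma^2*1 + beta1 * dh_{t-1}
        dh = np.array([y2_prev, h_prev]) - s2_hat + beta1 * dh_prev
        # score of (3.24): 0.5 * (dh/h) * (1 - y_t^2/h)
        scores.append(0.5 * dh / h * (1.0 - yt ** 2 / h))
        h_prev, y2_prev, dh_prev = h, yt ** 2, dh
    return np.array(scores)
```

Stacking these rows and feeding them to the outer-product formula (3.14) gives the BHHH matrix for the reduced GARCH(1,1) problem.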
Note that a Hessian approximation based on (2.33) in Chapter 2, Section 2.5, results in a matrix formally very similar to what we get by (3.14); the difference is only a scalar factor. Numerical tests have shown that both formulas yield essentially the same search directions, except for some small intervals of $\theta$ when $\theta^*$ is close to the boundary of the search domain, where the adjusted version yields better directions. For this reason we adjust (3.14) as follows:
$$B_k = \sum_{t=1}^{T} c_t \cdot \partial_\theta \bar h_t(\theta) \left(\partial_\theta \bar h_t(\theta)\right)^T, \qquad c_t = \bar h_t^{-2}(\theta) \left(\frac{2 y_t^2}{\bar h_t(\theta)} - 1\right).$$
Because of the stochastic nature of the problem, the minimum point $x^*$ of the negative likelihood can fall on the boundary of the parameter space defined by (3.17) and (3.18). For simplicity, we handled this problem by modifying the termination criteria for both DDMCP$_{\text{BHHH}}$ and BHHH-bLS, so that the given algorithm also stops when the search point $x$ gets closer to the boundary of the parameter space than a predefined tolerance.
Numerical results. For our numerical experiments regarding GARCH identification we have taken the simplest model class, with $p = q = 1$. The low dimension allowed us to cover the parameter space sufficiently when choosing the true parameter for the model to be identified. The parameter space $S$ can be defined by the following linear inequalities:
$$H_0 = \left\{\, y \in \mathbb{R}^2 : \; (-1, 0)\, y < 0, \;\; (0, -1)\, y < 0, \;\; (1, 1)\, y < 1 \,\right\}.$$
For each parameter pair $(\alpha_1^*, \beta_1^*)^T$ we generated $T = 10^5$ observations by simulation and estimated the parameters, with an initial value $x_0 = (0.2, 0.2)^T$. For simplicity, the data was generated with a fixed $\alpha_0^* = 1$. In the estimation procedure we first applied the dimension reduction technique and used the optimization methods DDMCP$_{\text{BHHH}}$ and BHHH-bLS to find the best estimate for the remaining parameter vector $(\alpha_1^*, \beta_1^*)^T$. We use the term 'estimation run' to indicate the two-stage procedure consisting of generating observations and then estimating the parameter vector. For each parametrization we made 5 estimation runs. In all of the methods, the termination tolerance was $\varepsilon = 10^{-6}$, both on the length of the search direction $\Delta x$ and on the distance from the boundary. We set the parameter $\alpha = 10^{-8}$ in the Armijo condition (3.5).
For the method BHHH-bLS we modified the line search slightly: we allow at most 6 search steps. This cap was necessary because otherwise, in some cases, we experienced very long line-search cycles. These were the consequence of frequent consecutive local minima and maxima of the noisy log-likelihood function along the search direction, causing condition (3.5) to be satisfied only for very small step sizes. A very small step size $t$ left the algorithm effectively stuck far from the optimum. The capped algorithm avoids such tiny steps and, despite violating the theoretical conditions of convergence, caused no convergence problems in the numerical experiments. Figure 3.5 exhibits a line-search failure at $x_k = \theta_k = (0.0525, 0.0822)^T$ on a model with $\theta^* = (0.05, 0.05)^T$. The Armijo reference line shows
$$f(x_k) + \sigma t \nabla f(x_k)^T \Delta x_k, \qquad t > 0,$$
with tick marks on it indicating the places where the Armijo condition was tested by the algorithm. The Armijo condition is satisfied for a step size $t$ where the objective
Figure 3.5: Backtracking line search fails on the noisy likelihood. The plot shows $f(x_k + t \Delta x_k)$ against the step size $t$, together with the Armijo reference line.
$f(x_k + t \Delta x_k)$ is below this line, which is not the case for any of the step sizes marked on the line.
For an additional comparison, we also tested the constrained nonlinear minimization method from the Optimization Toolbox in Matlab 7.0.0, see Crummey et al. (1991). This method is used by Matlab's state-of-the-art GARCH Toolbox for model identification. We did not test the GARCH Toolbox directly, in order to be able to apply the dimension reduction technique and thus have the same problem dimensions for all algorithms under comparison. The Matlab method is quite a different algorithm from our DDMCP_BHHH and BHHH-bLS. It is a sequential quadratic programming method using the BFGS formula for updating the Hessian approximation. An essential difference is that the BFGS formula updates $B_k$ using $B_{k-1}$, while the two BHHH-based methods do not use any derivative values from previous iterations when calculating $B_k$. Please consult Crummey et al. (1991) for details of the Matlab method.
The rows of Table 3.5 contain the performance results of the estimation runs. We list the number of function evaluations, the number of gradient evaluations, and the evaluation index ratios as introduced in (3.9). In terms of the average evaluation index, the proposed method is clearly better than the other two algorithms. In comparison to BHHH-bLS,
Table 3.5: GARCH parameter estimation: performance comparison. Each block of five rows corresponds to one true parameter pair $(\alpha_1^*, \beta_1^*)$; the two lines beneath the pair give the entries of the approximate asymptotic covariance matrix $\hat\Sigma^*$. For BHHH-bLS we list the function evaluations $e_f$, the gradient evaluations $e_g$ and the evaluation index $e_0$; for Matlab, $e_f$, $e_g$ and the ratio $e/e_0$; for DDMCP_BHHH, $e_g$ and $e/e_0$. The last two columns give the estimation errors $\Delta\alpha_1 = \hat\alpha_1 - \alpha_1^*$ and $\Delta\beta_1 = \hat\beta_1 - \beta_1^*$; an asterisk marks the runs whose deviations appear in Table 3.7.

 α1*   β1*     |  BHHH-bLS     |  Matlab         | DDMCP_BHHH |  Δα1      Δβ1
 (Σ̂* beneath)  | e_f  e_g  e_0 | e_f  e_g  e/e_0 | e_g  e/e_0 |
 -------------------------------------------------------------------------------
 0.05  0.05    | 12   12   36  | 40    7   1.50  | 21   1.17  | -0.0058  -0.05*
  1.4  -2.4    |  8    8   24  | 54   12   3.25  |  8   0.67  | -0.0127   0.0589
 -2.4  412.9   | 13   13   39  | 31    7   1.15  | 21   1.08  |  0.0085  -0.05*
               | 18   17   52  | 55   12   1.52  | 17   0.65  | -0.0114   0.0679
               | 24   12   48  | 50   11   1.50  | 13   0.54  | -0.0210   0.1669
 0.90  0.05    |  8    8   24  | 50    9   2.83  |  7   0.58  | -0.1276   0.0066
  0.2  -0.2    |  8    8   24  | 55   10   3.13  |  8   0.67  | -0.0662  -0.0086
 -0.2   0.3    |  8    8   24  | 55   10   3.13  |  7   0.58  | -0.0921   0.0037
               |  7    7   21  | 52   10   3.43  |  8   0.76  | -0.0520   0.0026
               |  8    8   24  | 55   10   3.13  |  7   0.58  | -0.0762  -0.0018
 0.05  0.90    |  8    8   24  | 11    2   0.63  |  8   0.67  | -0.0006   0.0010*
  0.3  -0.6    |  9    9   27  | 61   12   3.15  |  9   0.67  |  0.0068  -0.0215
 -0.6   1.9    | 11    9   29  | 56   10   2.62  | 10   0.69  |  0.0043   0.0002
               |  7    7   21  | 59   11   3.86  |  8   0.76  |  0.0142  -0.0151
               |  9    9   27  | 11    2   0.56  |  9   0.67  | -0.0103   0.0176*
 0.15  0.15    |  9    9   27  | 47    9   2.41  |  8   0.59  | -0.0273   0.1236
  1.8  -2.8    |  9    8   25  | 35    6   1.88  |  7   0.56  |  0.0096  -0.0301
 -2.8  45.5    | 13   12   37  | 44    9   1.68  | 11   0.59  | -0.0131  -0.0307
               |  9    9   27  | 39    7   1.96  |  8   0.59  |  0.0060   0.0253
               |  6    6   18  | 37    7   2.83  |  5   0.56  |  0.0086  -0.0535
 0.15  0.75    |  6    6   18  | 50    9   3.78  |  7   0.78  |  0.0106  -0.0097
  0.8  -1.3    |  6    6   18  | 48    9   3.67  |  6   0.67  |  0.0040  -0.0312
 -1.3   2.9    |  9    7   23  | 52    9   3.04  |  7   0.61  | -0.0236   0.0384
               |  7    7   21  | 56   11   3.71  |  8   0.76  |  0.0088  -0.0025
               |  7    7   21  | 52   10   3.43  |  9   0.86  |  0.0042  -0.0102
 0.75  0.15    |  6    6   18  | 46    8   3.44  |  7   0.78  | -0.0384  -0.0080
  2.7  -3.1    | 13    7   27  | 47    9   2.41  |  6   0.44  | -0.0900   0.0079
 -3.1   3.7    |  7    7   21  | 47    9   3.10  |  7   0.67  | -0.0656  -0.0099
               |  8    7   22  | 45    8   2.77  |  6   0.55  | -0.0386  -0.0236
               |  8    8   24  | 62   12   3.58  |  6   0.50  |  0.0246   0.0010
 0.30  0.30    |  8    8   24  | 50    9   2.83  |  7   0.58  | -0.0180   0.0127
  1.8  -2.7    |  8    8   24  | 37    7   2.13  |  7   0.58  | -0.0206  -0.0128
 -2.7  10.0    |  6    6   18  | 42    8   3.22  |  5   0.56  |  0.0395  -0.0275
               |  7    6   19  | 42    8   3.05  |  5   0.53  |  0.0253  -0.0467
               |  7    7   21  | 50   10   3.33  |  6   0.57  |  0.0176  -0.0062
 0.30  0.60    |  9    9   27  | 53   11   2.78  |  8   0.59  |  0.0165  -0.0147
  1.2  -1.7    |  8    8   24  | 48   10   2.83  |  8   0.67  |  0.0212   0.0031
 -1.7   2.6    | 15    9   33  | 53   11   2.27  |  6   0.36  | -0.0101   0.0045
               |  7    7   21  | 50   10   3.33  |  7   0.67  |  0.0173  -0.0043
               |  9    8   25  | 57   12   3.24  |  7   0.56  | -0.0215   0.0217
 0.60  0.30    |  7    7   21  | 53   10   3.48  |  8   0.76  | -0.0069   0.0126
  1.3  -1.6    |  9    9   27  | 47    9   2.41  |  7   0.52  |  0.0423  -0.0207
 -1.6   2.1    |  8    8   24  | 47    9   2.71  |  8   0.67  |  0.0393  -0.0115
               |  7    7   21  | 50   11   3.43  |  8   0.76  | -0.0165   0.0234
               |  9    8   25  | 44    9   2.48  |  7   0.56  | -0.0150  -0.0171
 Average       | 8.27 7.55 23.4| 43.4 8.39 2.50  | 7.61 0.60  | -0.01     0.00
40% of the computational effort is saved by our algorithm on average. In almost all of the cases BHHH-bLS achieved better performance than the general Matlab optimization routine. Table 3.6 shows the average duration of a BHHH-bLS estimation for each parametrization; for Matlab and DDMCP_BHHH we list duration ratios relative to BHHH-bLS. On average, DDMCP_BHHH is approximately two times faster than BHHH-bLS and 4.5 times faster than Matlab. Running the Matlab routine without dimension reduction, our algorithm is approximately an order of magnitude faster. Note that in the case of the GARCH model, function evaluations are especially costly, because evaluating the log-likelihood involves an enormous number of logarithm calculations.
Table 3.5 also shows the estimated parameters $\hat\alpha_1$, $\hat\beta_1$ of our algorithm, which matched the results of the other two algorithms with a precision of $10^{-4}$ in most of the runs. Deviations are listed in Table 3.7; the corresponding rows in Table 3.5 are marked with an asterisk in the column $\hat\beta_1 - \beta_1^*$. Deviations arose near the boundary of the parameter space. In the case of $\theta^* = (0.05, 0.05)^T$, in two out of five estimation runs the algorithms stopped because the iterate got closer to the boundary than the specified tolerance. In the case of $\theta^* = (0.05, 0.90)^T$, in two estimation runs the Matlab routine stopped prematurely, after only two iterations, at $\hat\theta = (0, 1)^T$. In contrast, BHHH-bLS and DDMCP_BHHH found the optimum near the true parameters.
As for the accuracy of the estimation, we observe that $\hat\beta_1$ varies considerably when the true model parameters $\alpha_1^*$, $\beta_1^*$ are both small, especially in the case $\theta^* = (0.05, 0.05)^T$. In order to examine this property we calculated an approximation of the asymptotic covariance matrix as
$$\hat\Sigma^*(\theta^*) = \hat I(\theta^*)^{-1} = \left( \frac{1}{T} \sum_{t=1}^{T} \frac{\partial l_t(\theta^*)}{\partial\theta} \left( \frac{\partial l_t(\theta^*)}{\partial\theta} \right)^T \right)^{-1},$$
where $T = 10^6$ and $\hat I$ denotes the approximated Fisher information matrix. We can see that the asymptotic variance of $\hat\beta_1$, corresponding to the lower-right entry of $\hat\Sigma^*$, is high in the high-variation cases, indicating that this is a property inherent to the model. Therefore, we can conclude that GARCH models with small coefficients are more difficult to identify.
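The covariance approximation above can be sketched generically. In this minimal illustration the per-observation score vectors $\partial l_t / \partial\theta$ are replaced by synthetic Gaussian vectors with a known second-moment matrix (a hypothetical stand-in for the actual GARCH scores):

```python
import numpy as np

def opg_covariance(scores):
    """Approximate asymptotic covariance as the inverse of the empirical
    Fisher information I_hat = (1/T) * sum_t s_t s_t^T (outer product of gradients).
    `scores` is a (T, p) array whose rows are per-observation score vectors."""
    T = scores.shape[0]
    info = scores.T @ scores / T
    return np.linalg.inv(info)

# illustration: synthetic scores with known second-moment matrix A,
# so the returned covariance should be close to inv(A)
rng = np.random.default_rng(1)
A = np.array([[2.0, 0.4], [0.4, 1.0]])
scores = rng.multivariate_normal(np.zeros(2), A, size=10**5)
cov = opg_covariance(scores)
```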
3.5 Conclusion
In this chapter we have presented DDMCP, a hybrid algorithm for bound-constrained optimization. The main novelty over descent methods lies in the way damping is carried out. Instead
Table 3.6: GARCH: average estimation time comparison

 α1*   β1*   t_BHHH-bLS (sec)   t_Matlab / t_BHHH-bLS   t_DDMCP_BHHH / t_BHHH-bLS
 0.05  0.05  0.4376             1.39                    0.59
 0.90  0.05  0.2596             2.56                    0.48
 0.05  0.90  0.2840             1.79                    0.51
 0.15  0.15  0.2940             1.79                    0.44
 0.15  0.75  0.2218             2.90                    0.56
 0.75  0.15  0.2408             2.57                    0.48
 0.30  0.30  0.2342             2.39                    0.47
 0.30  0.60  0.2838             2.38                    0.43
 0.60  0.30  0.2564             2.41                    0.49
 Average     0.2791             2.24                    0.49
Table 3.7: Deviations in $\hat\alpha_1$, $\hat\beta_1$. For each method the two columns give the estimates $(\hat\alpha_1, \hat\beta_1)$.

 α1*   β1*  | BHHH-bLS         | Matlab           | DDMCP_BHHH
 0.05  0.05 | 0.0396  0.0000   | 0.0406  0.0000   | 0.0442  0.0000
            | 0.0580  0.0000   | 0.0597  0.0000   | 0.0585  0.0000
 0.05  0.90 | 0.0494  0.9010   | 0.0000  1.0000   | 0.0494  0.9010
            | 0.0397  0.9176   | 0.0000  1.0000   | 0.0397  0.9176
of line search, which ensures sufficient decrease of the objective, DDMCP uses separating hyperplanes to bound the step size in the search direction. The performance gain is achieved by inheriting the centering idea of localization methods in the damped phase, while preserving the rapid convergence rate of descent methods when the search point is sufficiently close to the minimizer. Furthermore, DDMCP does not evaluate the objective function.
We have demonstrated the efficiency of the method empirically. We tested DDMCP on three problem types: convex nonlinear minimization, nonlinear least-squares minimization and GARCH model identification. For all three problem types, and for the majority of the test problems, the performance of DDMCP was superior to the corresponding well-established descent methods with backtracking line search. For each problem type, we examined the robustness of DDMCP from a different perspective.
In the case of convex nonlinear minimization, we compared DDMCP to the Newton method, starting it from different initial values, and examined how sensitive the performance is to the size of the initial polyhedron.
For the least-squares problem type, we compared it with the Gauss-Newton method on a large set of different problems developed specifically for optimization testing purposes.
In the GARCH case, the objective function exhibits numerous local minima and maxima around the global minimizer. We can conclude that DDMCP is not too sensitive to the initial value $x_0$ or to the size of the initial polyhedron, and it works well across the different test problems. The numerical results have also shown that DDMCP is quite robust against local minima and maxima in the noisy objective function, unlike the BHHH method with backtracking line search.
Bibliographic remarks
The paper based on the work presented in this chapter has been accepted for publication in the Central European Journal of Operations Research, see Torma and G.-Tóth (2010). The algorithm and the test concept are based on the ideas of Balázs Torma, and the numerical experiments were conducted by him as well. The work has been carried out in cooperation with Boglárka G.-Tóth.
Chapter 4
Modelling information arrival
processes
4.1 Introduction
4.1.1 Context and motivation
Stochastic systems driven by point processes arise in many applications, see Daley and Vere-Jones (2003a) for a modern and comprehensive treatise. The present investigations were motivated by application areas in finance. First, as demonstrated in the description of the ACF model in Section 2.2, point processes are intuitive models for news arrival processes. Since the observable intensity of news arrival can be used as a proxy for price volatility, see Kalev et al. (2004), modelling the news arrival dynamics is an important subject of investigation. In addition, news arrival models are a widely unexplored research area. Second, stochastic systems driven partially by point processes are widely used in financial mathematics, in particular to study credit risk processes on bond markets. In this case credit events (defaults) result in jumps in the state of the system. Letting $p(t) = p(t, T)$ be the price of a zero-coupon $T$-bond at time $t \le T$, the price dynamics, written in multiplicative form in terms of returns, is in its simplest form
$$dp(t) = p(t-)\left( \mu\,dt + \sigma\,dW(t) + \delta\,dN(t) \right),$$
where $N(t)$ is a counting process, counting the number of credit events up to time $t$. By letting $T$ vary, and using a suitable re-parametrization in terms of so-called forward rates, we obtain the extension of the celebrated HJM (Heath-Jarrow-Morton) model. Good references for bond markets with jumps are Giesecke et al. (2004); Runggaldier (2003).
In consequence of analysts' herding behavior, see Welch (2000), news appearing on the market about a company can call analysts' attention to that company and thus make analysts generate more news. This intuition can be captured by a homogeneous self-exciting point process $N(t)$, also called a Hawkes process or Poisson cluster process, see Hawkes (1971b,a), where $N(t)$ counts the news events up to time $t$. This is defined, in its simplest form, by the feedback system
$$dN(t) = \lambda(t)\,dt + dm(t), \qquad (4.1)$$
$$d\lambda(t) = -a(\lambda(t) - m)\,dt + \sigma\,dN(t), \qquad (4.2)$$
where $\lambda(t)$ is the intensity process, $dm(t)$ is a kind of white noise (a martingale differential), $a > 0$ takes care of mean reversion, and $m > 0$ is a steady-state value of the intensity. Self-excitation is due to the fact that there is a positive correlation between the intensity and the event process: an increase in the number of events temporarily increases the intensity of the event process. This is counter-balanced by a mean-reversion effect: the intensity would tend to return to its stationary value $m$ if no events took place.
The above jump-diffusion model extends naturally to multi-variable models. The identification (calibration) of these models, in particular the estimation of cross-effects, is a hot area of research. This leads to the problem of estimating the internal dynamics of the Hawkes process.
A recursive maximum-likelihood estimator will be developed and tested. The simulation of Hawkes processes is itself a non-trivial problem, see Moller and Rasmussen (2005), Moller and Rasmussen (2006). The weak point of these simulation methods is that they are inherently non-dynamic. An alternative simulation technique will also be presented in this chapter. Finally, the Fisher information matrix of the estimation problem will be investigated.
4.1.2 Point processes
This introduction is based on Runggaldier (2003) and Bremaud (1981). A point process is an increasing sequence of random times $T_n$ defined over some probability space $(\Omega, \mathcal{F}, P)$ with $T_0 = 0$ and $T_n < T_{n+1}$ if $T_n < \infty$. A common assumption is that the process is non-explosive, meaning that $\lim T_n = \infty$. Alternatively, the associated counting process $N(t) := \#\{i : T_i \le t\}$, counting the number of events up to time $t$, is also called a point process. The internal history of a point process is defined by $\mathcal{F}^N(t) = \sigma\{N(s) : s \le t\}$. In general, a filtration $\mathcal{F}(t)$ is called a history if it is a refinement of $\mathcal{F}^N(t)$. An additional technical assumption is that $N(t)$ is integrable, i.e., $E(N(t)) < +\infty$ for all $t$.
Homogeneous Poisson process. A prime example of a point process is the homogeneous Poisson process with constant, finite intensity $\lambda$: $N(t)$ adapted to $\mathcal{F}(t)$ is called a homogeneous Poisson process if $N_0 = 0$, $N(t)$ has independent stationary increments and $N(t+h) - N(t) \sim \mathrm{Poisson}(\lambda h)$ for $h > 0$.
A well-known property of Poisson processes is that
$$M(t) = N(t) - \lambda t$$
is a martingale, also called the compensated Poisson process. Indeed, using the properties from the definition, for any $0 \le s < t$ we have
$$E(M(t) - M(s) \mid \mathcal{F}(s)) = E(N(t) - N(s) \mid \mathcal{F}(s)) - E(\lambda \cdot (t - s) \mid \mathcal{F}(s)) = 0.$$
Another well-known property of Poisson processes is that the waiting time until the next occurrence follows the exponential distribution with parameter $\lambda$.
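The exponential waiting-time property suggests the standard way to simulate a homogeneous Poisson process; a minimal sketch:

```python
import numpy as np

def poisson_event_times(lam, horizon, seed=0):
    """Simulate a homogeneous Poisson process on [0, horizon] by summing
    i.i.d. Exponential(lam) waiting times between consecutive events."""
    rng = np.random.default_rng(seed)
    times = []
    t = rng.exponential(1.0 / lam)
    while t <= horizon:
        times.append(t)
        t += rng.exponential(1.0 / lam)
    return np.array(times)

events = poisson_event_times(lam=2.0, horizon=10**4)
# N(horizon) / horizon estimates the intensity lam; the compensated process
# N(t) - lam * t fluctuates around zero, in line with the martingale property.
```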
General point processes. For general point processes the (stochastic and/or time-varying) intensity is defined as follows: let $N(t)$ be a point process, adapted to $\mathcal{F}(t)$, and let $\lambda(t)$ be a non-negative, locally integrable and $\mathcal{F}(t)$-progressively measurable process. If for all non-negative, adapted and left-continuous processes $C(t)$
$$E\left( \int_0^\infty C(s)\,dN(s) \right) = E\left( \int_0^\infty C(s)\lambda(s)\,ds \right) \qquad (4.3)$$
holds, then we say that $N(t)$ admits the $(P, \mathcal{F}(t))$-intensity $\lambda(t)$, see Bremaud (1981) for details. (Progressive measurability is a technical condition, meaning basically that the whole trajectory is measurable: for all $T > 0$, $\lambda(t)\chi_{[0,T]}(t)$ is $\mathcal{B}(\mathbb{R}) \times \mathcal{F}(T)$-measurable, where $\chi$ is the indicator function.) The integral on the left-hand side is to be interpreted as follows:
$$\int_0^t C(s)\,dN(s) = \sum_{n=1}^{N(t)} C(T_n).$$
An intuitive way to construct a point process with intensity $\lambda(t)$ is as follows. Consider a Poisson process $N(x, y)$ on the plane with unit intensity. In this case the numbers of points in any two disjoint areas are independent and Poisson distributed with means equal to the Lebesgue measures of the areas. Then $N(t)$ can be taken to be the number of points under $\lambda(t)$:
$$N(t) = \int_{[0,t]} \int_{[0,\lambda(s)]} dN(s, \cdot)\,ds.$$
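For a deterministic, bounded intensity, this planar-Poisson construction can be implemented by thinning: candidates arrive at a dominating rate, and a candidate at time $t$ is kept exactly when the corresponding planar point falls under the graph of $\lambda$. A minimal sketch (the sinusoidal intensity is an arbitrary illustration, not from the thesis):

```python
import numpy as np

def thinning_times(intensity, lam_max, horizon, seed=0):
    """Simulate an inhomogeneous Poisson process with deterministic intensity(t)
    <= lam_max by thinning: candidate events arrive at rate lam_max, and a
    candidate at time t is kept with probability intensity(t) / lam_max, i.e.
    we keep exactly the unit-rate planar points lying under the intensity graph."""
    rng = np.random.default_rng(seed)
    t, kept = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)
        if t > horizon:
            return np.array(kept)
        if rng.uniform(0.0, lam_max) <= intensity(t):
            kept.append(t)

# example: sinusoidal intensity oscillating between 0 and 2
events = thinning_times(lambda t: 1.0 + np.sin(t), lam_max=2.0, horizon=5000.0)
```

Over a long horizon the empirical event rate should be close to the time average of the intensity (here approximately 1).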
4.1.3 Hawkes processes
The Hawkes process $(N(t))_{t \in \mathbb{R}}$ is a self-exciting point process. Its intensity $\lambda$ depends on the past of the process through the formula
$$\lambda(t) = m + \int_{(-\infty, t)} g(t - u)\,dN(u), \qquad (4.4)$$
where $m \ge 0$ and $g : [0, \infty) \to [0, \infty)$.
A necessary condition for (4.4) to have a stationary solution with $\lambda(t) \in L^1$ is that
$$\int_0^\infty g(u)\,du < 1. \qquad (4.5)$$
This condition is sufficient for the existence of a stationary process with the structure given above, see Hawkes and Oakes (1974).
In this chapter we consider a class of Hawkes processes proposed in Gerencsér et al. (2008b). In this model the intensity $\lambda$ satisfies the linear state equations
$$dx(t) = -Ax(t)\,dt + b\,dN(t), \qquad (4.6)$$
$$\lambda(t) = c^T x(t-) + m, \qquad (4.7)$$
with a matrix $A$ and vectors $b$, $c$ such that the system's impulse response is non-negative, i.e.,
$$g(u) = c^T e^{-Au} b \ge 0 \quad \text{for all } u \ge 0. \qquad (4.8)$$
We consider only matrices $A$ for which $-A$ is stable. The sample path of the state process $x$ is right continuous with left limits; the left limit at a given time $t$ is denoted by $x(t-)$.
In this model the coordinates of $x$ represent the activities of different analysts (or different analyst groups) with respect to a given company (or industry sector). The parameter $b$ controls how much analyst coverage increases on news events, and the parameter $a$ expresses the degree of coverage deterioration.
The stability condition (4.5) for the system (4.6)-(4.7) reads as
$$\int_0^\infty c^T e^{-At} b\,dt = c^T A^{-1} b < 1. \qquad (4.9)$$
The expected value of the intensity can be calculated from (4.4) using (4.3) as follows:
$$E(\lambda) = m + E(\lambda) \int_0^\infty c^T e^{-At} b\,dt, \qquad (4.10)$$
thus
$$E(\lambda) = \frac{m}{1 - c^T A^{-1} b}. \qquad (4.11)$$
From (4.11) we see that the intensity process is transient if (4.9) does not hold.
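As a quick numerical illustration of (4.9) and (4.11), consider a diagonal two-dimensional example (the parameter values below are illustrative):

```python
import numpy as np

# diagonal two-state Hawkes example: check stability c^T A^{-1} b < 1
# and evaluate the stationary mean intensity E(lambda) = m / (1 - c^T A^{-1} b)
A = np.diag([0.08, 0.02])
b = np.array([1.0, 1.0])
c = np.array([0.02, 0.01])
m = 0.02

spectral = c @ np.linalg.inv(A) @ b       # 0.02/0.08 + 0.01/0.02 = 0.75 < 1
mean_intensity = m / (1.0 - spectral)     # 0.02 / 0.25 = 0.08
```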
The log-likelihood of the observation $(N(t))_{0 \le t \le T}$ can be written as
$$L_T(\theta) = \int_0^T -\hat\lambda(t)\,dt + \int_0^T \ln \hat\lambda(t)\,dN(t),$$
where $\hat\lambda(t) = \hat\lambda(t, \theta)$ is the solution of (4.6)-(4.7) with the observed point process $N(t)$ and parameter vector $\theta$, see e.g. Daley and Vere-Jones (2003b). The Fisher information contained in the observation of $(N(t))_{0 \le t \le T}$ is
$$I_T(\theta) = E\left( -\partial_\theta^2 L_T(\theta) \right) = E\left( \int_0^T \partial_\theta^2 \hat\lambda(t)\,dt - \int_0^T \partial_\theta^2 \ln \hat\lambda(t)\,dN(t) \right).$$
We obtain that the Fisher information contained in the observation can be written as
$$I_T(\theta) = \int_0^T E\left( \frac{(\partial_\theta \hat\lambda(t))(\partial_\theta \hat\lambda(t))^T}{\hat\lambda^2(t)}\,\lambda(t) \right) dt. \qquad (4.12)$$
Assuming that the initial state $x(0)$ is known, $\hat\lambda(t) = \lambda(t)$ when $\theta$ is the true parameter. Note that if $\theta$ equals the true parameter, with an arbitrary initialization we have $\hat\lambda(t) = \lambda(t) + c^T e^{-At} x(0)$, thus the error decays exponentially fast. From identity (4.12) and ergodicity we can see that $I_T / T$ has a limit, which we call the time-normalized or asymptotic Fisher information and denote by $I(\theta)$, i.e.,
$$I(\theta) = E\left( \frac{\psi_\theta \psi_\theta^T}{\lambda} \right),$$
where the expectation is taken with respect to the stationary distribution and $\psi_\theta$ stands for the derivative $\partial_\theta \hat\lambda$. Recall that $\psi_\theta(t)$ is the derivative of the calculated intensity.
Standard Hawkes processes. In the simplest case, as introduced in Hawkes (1971a), (4.6)-(4.7) reduces to
$$d\lambda(t) = -a(\lambda(t) - m)\,dt + \sigma\,dN(t), \qquad (4.13)$$
where $a$, $\sigma = bc$ and $m$ are positive real parameters and $\theta^T = (a, \sigma, m)$. For the standard Hawkes process the stability condition simplifies to $\sigma < a$.
4.2 Simulation of Hawkes processes
In order to examine Hawkes processes we have to simulate them with given parameters. This seemingly innocent problem is in fact quite challenging, see Moller and Rasmussen (2005), Moller and Rasmussen (2006). Unfortunately, these simulation methods are inherently off-line. An alternative procedure is to actually generate the sequence of events, such as news or credit events, using the following observation: if $t_n$ is a Poisson process with deterministic intensity $\lambda(t)$, then with
$$M(t) = \int_0^t \lambda_u\,du,$$
the process $\tau_n = M(t_n)$ is a homogeneous Poisson process with intensity 1. Thus, we can generate $t_n$ by the inverse mapping
$$t_n = M^{-1}(\tau_n).$$
Let us now assume that $T_n$ has been constructed for the Hawkes process with the given dynamics. Let $\tau_n$ be as above. Then solve (4.6)-(4.7) with $dN_t = 0$, and noting that the solution has a simple closed form, define $T_{n+1}$ by the equation
$$\int_{T_n}^{T_{n+1}} \lambda_u\,du = \tau_{n+1} - \tau_n.$$
For $T_n \le t < T_{n+1}$ the dynamics of $x$ can be written from (4.6) as
$$x(t) = e^{-A(t - T_n)} \left( x(T_n-) + b \right). \qquad (4.14)$$
Put $\Delta T_{n+1} = T_{n+1} - T_n$. Thus, having $x(T_n-)$ and $T_n$, our goal is to calculate $\Delta T_{n+1}$. Substituting (4.14) into (4.7) we get
$$\int_{T_n}^{T_{n+1}} \lambda_u\,du = m \Delta T_{n+1} + c^T A^{-1} \left( I - e^{-A \Delta T_{n+1}} \right) \left( x(T_n-) + b \right) = \tau_{n+1} - \tau_n,$$
where the increments $\tau_{n+1} - \tau_n$ are i.i.d. Exponential(1). The above equation can be solved by standard numerical root-finding procedures.
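For the standard Hawkes model (4.13) the compensator over an inter-event interval has a closed form, so the scheme above can be sketched directly. This is a minimal illustration; `brentq` stands in for the "standard numerical root-finding procedures" mentioned in the text:

```python
import numpy as np
from scipy.optimize import brentq

def simulate_standard_hawkes(a, sigma, m, n_events, seed=0):
    """Event-by-event simulation of d lambda = -a (lambda - m) dt + sigma dN
    via the time change: successive compensator increments are i.i.d. Exp(1)."""
    rng = np.random.default_rng(seed)
    times, lam, t = [], m, 0.0      # lam = intensity just after the last event
    for _ in range(n_events):
        e = rng.exponential()       # target compensator increment
        # integral of the intensity over (T_n, T_n + d):
        #   m*d + (lam - m) * (1 - exp(-a*d)) / a
        f = lambda d: m * d + (lam - m) * (1.0 - np.exp(-a * d)) / a - e
        # the compensator is at least m*d, so the root lies in [0, e/m]
        d = brentq(f, 0.0, e / m + 1e-9)
        t += d
        lam = m + (lam - m) * np.exp(-a * d) + sigma   # decay, then jump
        times.append(t)
    return np.array(times)

T_n = simulate_standard_hawkes(a=0.35, sigma=0.3, m=0.1, n_events=20000)
```

The long-run event rate should be close to $E(\lambda) = am/(a - \sigma) = 0.7$ for these parameters.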
Figure 4.1 shows the trajectories of $x$ and $\lambda$ of a simulated Hawkes process with parameters
$$A = \begin{pmatrix} 0.08 & 0 \\ 0 & 0.02 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad c = \begin{pmatrix} 0.02 \\ 0.01 \end{pmatrix}, \quad m = 0.02.$$
Figure 4.1: A simulated Hawkes process. The symbol × indicates the event points.
4.3 Identification of Hawkes models
To develop a maximum-likelihood estimator, choose an arbitrary parameter $\theta$ and define the frozen-parameter process $\hat\lambda(t) = \hat\lambda(t, \theta)$ by
$$\hat\lambda(t) = m + \int_{(0,t]} g(t - u, \theta)\,dN(u), \qquad (4.15)$$
with initial condition $\hat\lambda(0) = m$. The asymptotic negative log-likelihood function is defined as
$$W(\theta) = -\lim \frac{1}{T} \int_0^T \left( \log \hat\lambda(t)\,dN(t) - \hat\lambda(t)\,dt \right) = E\left( \hat\lambda(t) - \log \hat\lambda(t) \cdot \lambda(t) \right), \qquad (4.16)$$
assuming stationary initialization for $\hat\lambda(t) = \hat\lambda(t, \theta)$. Here we made use of the formal calculus $E(f(t)\,dN(t) \mid \mathcal{F}(t)) = f(t)\lambda(t)\,dt$ for any left-continuous, adapted process $f(t)$.
The maximum-likelihood estimate for Hawkes processes has been studied in depth in Ogata and Akaike (1982). The asymptotic properties of maximum-likelihood estimates for point processes have been discussed in Ogata (1978). Differentiating with respect to $\theta$ leads to the likelihood equation
$$\frac{\partial}{\partial\theta} L_T(\theta) = \int_0^T \frac{\partial}{\partial\theta} \hat\lambda_t \cdot \left( \frac{dN_t}{\hat\lambda_t} - dt \right) = 0. \qquad (4.17)$$
In asymptotic terms we would get
$$\frac{\partial}{\partial\theta} W(\theta) = -E\left( \hat\lambda_\theta(t) \cdot \left( \frac{\lambda(t)}{\hat\lambda(t)} - 1 \right) \right), \qquad (4.18)$$
assuming stationary initialization for $\hat\lambda_t = \hat\lambda_t(\theta)$. Obviously, this becomes 0 for $\theta = \theta^*$.
The asymptotic Fisher information matrix is
$$I(\theta^*) = \frac{\partial^2}{\partial\theta^2} W(\theta) \Big|_{\theta = \theta^*} = E\left( \frac{1}{\hat\lambda(t, \theta^*)} \cdot \hat\lambda_\theta(t, \theta^*)\,\hat\lambda_\theta^T(t, \theta^*) \right) \qquad (4.19)$$
with the right initialization, see also Ogata (1978). In practice, a wrong initialization does not really matter, since the error decays exponentially fast. The computation of the gradient process $\hat\psi_\theta(t) = \frac{\partial}{\partial\theta} \hat\lambda_t$ is straightforward.
The process $\hat\lambda_t$ and its derivatives are explicitly computable between any two events by solving a linear differential equation with constant coefficients. Having $\hat\lambda(t)$ and $\hat\lambda_\theta(t)$ at our disposal, the off-line ML (maximum-likelihood) estimate can easily be obtained by a gradient method.
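For the standard model (4.13) this between-event computation yields the log-likelihood in closed form. A minimal sketch of the resulting off-line criterion, which could then be handed to any gradient-based optimizer (the function name and interface are illustrative, not from the thesis):

```python
import numpy as np

def neg_log_likelihood(theta, times, horizon):
    """Exact negative log-likelihood of the frozen-parameter process
    d lambda_hat = -a (lambda_hat - m) dt + sigma dN, lambda_hat(0) = m:
    -L_T = integral of lambda_hat over [0, T] - sum_n log lambda_hat(T_n-)."""
    a, sigma, m = theta
    lam = m                 # intensity just after the previous event
    prev, logsum, integral = 0.0, 0.0, 0.0
    for t in times:
        d = t - prev
        decay = np.exp(-a * d)
        integral += m * d + (lam - m) * (1.0 - decay) / a
        lam_minus = m + (lam - m) * decay     # intensity just before the event
        logsum += np.log(lam_minus)
        lam = lam_minus + sigma               # jump at the event
        prev = t
    # tail piece from the last event to the end of the observation window
    d = horizon - prev
    integral += m * d + (lam - m) * (1.0 - np.exp(-a * d)) / a
    return integral - logsum
```

For $\sigma = 0$ the model degenerates to a homogeneous Poisson process with rate $m$, so the criterion reduces to $mT - N(T)\log m$, which provides an easy sanity check.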
The conceptual framework of the quasi-Newton-type recursive maximum-likelihood method (RML) in continuous time is given as follows:
$$d\hat\theta(t) = \frac{1}{t} \hat H^{-1}(t)\,g(t), \qquad (4.20)$$
$$d\hat H(t) = \frac{1}{t} \left( H(t) - \hat H(t) \right), \qquad (4.21)$$
where $\hat\theta(t)$ denotes the current estimate at time $t$, and $g(t)$ and $H(t)$ are on-line approximations of the gradient and Hessian of the asymptotic negative log-likelihood function, respectively. The initial Hessian $\hat H(0)$ and the initial parameter $\hat\theta(0)$ are set by the user, usually with $\hat H(0) = I$. In the case of our Hawkes processes (4.6)-(4.7) we apply
$$g(t) = \tilde\lambda_\theta(t) \cdot \left( \frac{dN(t)}{\tilde\lambda(t)} - dt \right), \qquad (4.22)$$
$$H(t) = \frac{1}{\tilde\lambda(t)} \cdot \tilde\lambda_\theta(t)\,\tilde\lambda_\theta^T(t), \qquad (4.23)$$
where $\tilde\lambda(t)$, $\tilde\lambda_\theta(t)$ are on-line estimates of $\hat\lambda(t, \theta(t))$, $\hat\lambda_\theta(t, \theta(t))$, respectively. Note that, instead of (4.21), we can apply the following recursion to calculate $\hat H^{-1}$ without matrix inversion:
inversion:
d
ˆ
H
?1
(t) = ?
ˆ
H
?1
(t)d
ˆ
H(t)
ˆ
H
?1
(t) = (4.24)
=
1
t
_
ˆ
H
?1
(t) ?
ˆ
H
?1
(t)
1
˜
?(t)
˜
?
?
(t)
˜
?
T
?
(t)
ˆ
H
?1
(t)
_
= (4.25)
=
1
t
_
ˆ
H
?1
(t) ?
1
˜
?(t)
_
ˆ
H
?1
(t)
˜
?
?
(t)
__
ˆ
H
?1
(t)
˜
?
?
(t)
_
T
_
, (4.26)
where we used that
ˆ
H
?1
(t) is symmetric. Applying recursion (4.26) saves the computational
time needed for the matrix inversion.
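Recursion (4.26) can be checked numerically: one Euler step of (4.21) followed by an exact inversion agrees, to first order in the step size, with the corresponding step of (4.26). The matrices and vectors below are arbitrary stand-ins for $\hat H(t)$, $\tilde\lambda_\theta(t)$ and $\tilde\lambda(t)$:

```python
import numpy as np

H0 = np.array([[2.0, 0.3], [0.3, 1.0]])   # stand-in for H_hat(t), symmetric PD
H0_inv = np.linalg.inv(H0)
psi = np.array([1.0, -0.5])               # stand-in for the gradient process
lam = 2.0                                 # stand-in for the intensity estimate
h = 1e-3                                  # stand-in for the factor dt/t

H_bar = np.outer(psi, psi) / lam          # H(t) from (4.23)
H1 = H0 + h * (H_bar - H0)                # one Euler step of (4.21)

v = H0_inv @ psi
H1_inv_rec = H0_inv + h * (H0_inv - np.outer(v, v) / lam)   # step of (4.26)

# first-order agreement: the discrepancy is O(h^2)
err = np.linalg.norm(np.linalg.inv(H1) - H1_inv_rec)
```

The symmetry of $\hat H^{-1}$ is exactly what turns the middle term of (4.25) into the rank-one outer product in (4.26).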
Identification of the standard Hawkes model. Let us now complete the differential equation system (4.20)-(4.23) with the specifications for $\tilde\lambda(t)$, $\tilde\lambda_\theta(t)$. For simplicity, we shall first consider the standard Hawkes model (4.13). Let the true parameters be denoted by $\theta^* = (a^*, \sigma^*, m^*)$. The frozen-parameter process $\hat\lambda(t, \theta)$ is then defined by
$$d\hat\lambda(t) = -a(\hat\lambda(t) - m)\,dt + \sigma\,dN(t). \qquad (4.27)$$
Differentiating (4.27) with respect to $\theta = (a, \sigma, m)$ we get:
$$d\hat\lambda_a(t) = -a\,\hat\lambda_a(t)\,dt - \left( \hat\lambda(t) - m \right) dt, \qquad (4.28)$$
$$d\hat\lambda_\sigma(t) = -a\,\hat\lambda_\sigma(t)\,dt + dN(t), \qquad (4.29)$$
$$d\hat\lambda_m(t) = -a\,\hat\lambda_m(t)\,dt + a\,dt. \qquad (4.30)$$
Recall that $\hat\theta(t) = (\hat a(t), \hat\sigma(t), \hat m(t))$ denotes the current estimate at time $t$, see (4.20). Define
$$d\tilde\lambda(t) = -\hat a(t) \left( \tilde\lambda(t) - \hat m(t) \right) dt + \hat\sigma(t)\,dN(t), \qquad (4.31)$$
and similarly for the approximations of the derivatives:
$$d\tilde\lambda_a(t) = -\hat a(t)\,\tilde\lambda_a(t)\,dt - \left( \tilde\lambda(t) - \hat m(t) \right) dt, \qquad (4.32)$$
$$d\tilde\lambda_\sigma(t) = -\hat a(t)\,\tilde\lambda_\sigma(t)\,dt + dN(t), \qquad (4.33)$$
$$d\tilde\lambda_m(t) = -\hat a(t)\,\tilde\lambda_m(t)\,dt + \hat a(t)\,dt. \qquad (4.34)$$
Thus, solving the differential equation system given by (4.20)-(4.23) and (4.31)-(4.34) gives the quasi-RML estimate $\hat\theta(t)$. Note that we apply a basic resetting mechanism at $T_n$ to keep $\hat\theta$ in the stability domain, see Gerencsér and Mátyás (2007a).
Figure 4.2 demonstrates the convergence of $\hat\theta(t)$ in a standard Hawkes model with $\theta^* = (0.35, 0.3, 0.1)$.
Identification of the general Hawkes model. In the class of Hawkes models (4.6)-(4.7) we are interested in estimating the true system parameters $A^*$, $b^*$, $c^*$, $m^*$. We can simplify the presentation by considering these variables as known functions of some real parameter vector $\theta$:
$$A = A(\theta), \quad b = b(\theta), \quad c = c(\theta), \quad m = m(\theta). \qquad (4.35)$$
As a simple example, $\theta$ could be a vector collecting the elements of $A$, $b$, $c$, $m$, with mappings connecting the corresponding elements of $\theta$ and $A$, $b$, $c$, $m$.
Figure 4.2: Recursive estimation of the parameters for the test problem.
Then, for any $\theta$, define
$$d\hat x(t) = -A\hat x(t)\,dt + b\,dN(t), \qquad (4.36)$$
$$\hat\lambda(t) = c^T \hat x(t-) + m, \qquad (4.37)$$
with $\hat x(0) = 0$. Differentiating (4.36)-(4.37) with respect to $\theta$ we get:
$$d\hat x_\theta(t) = -A_\theta \hat x(t)\,dt - A \hat x_\theta(t)\,dt + b_\theta\,dN(t), \qquad (4.38)$$
$$\hat\lambda_\theta(t) = c_\theta^T \hat x(t-) + c^T \hat x_\theta(t) + m_\theta. \qquad (4.39)$$
Put
$$\hat X(t) = \begin{pmatrix} \hat x(t) \\ \hat x_\theta(t) \end{pmatrix}, \qquad \hat\Lambda(t) = \begin{pmatrix} \hat\lambda(t) \\ \hat\lambda_\theta(t) \end{pmatrix}.$$
Then we can write (4.36)-(4.39) in matrix form:
$$d\hat X(t) = -\begin{pmatrix} A & 0 \\ A_\theta & A \end{pmatrix} \hat X(t)\,dt + \begin{pmatrix} b \\ b_\theta \end{pmatrix} dN(t), \qquad (4.40)$$
$$\hat\Lambda(t) = \begin{pmatrix} c^T & 0 \\ c_\theta^T & c^T \end{pmatrix} \hat X(t) + \begin{pmatrix} m \\ m_\theta \end{pmatrix}. \qquad (4.41)$$
Finally, having the current estimate $\hat\theta(t)$ at hand, we approximate $\hat X(t, \hat\theta(t))$ and $\hat\Lambda(t, \hat\theta(t))$ by
$$\tilde X(t) = \begin{pmatrix} \tilde x(t) \\ \tilde x_\theta(t) \end{pmatrix}, \qquad \tilde\Lambda(t) = \begin{pmatrix} \tilde\lambda(t) \\ \tilde\lambda_\theta(t) \end{pmatrix}, \qquad (4.42)$$
respectively, for which
$$d\tilde X(t) = -\begin{pmatrix} \hat A(t) & 0 \\ \hat A_\theta(t) & \hat A(t) \end{pmatrix} \tilde X(t)\,dt + \begin{pmatrix} \hat b(t) \\ \hat b_\theta(t) \end{pmatrix} dN(t), \qquad (4.43)$$
$$\tilde\Lambda(t) = \begin{pmatrix} \hat c^T(t) & 0 \\ \hat c_\theta^T(t) & \hat c^T(t) \end{pmatrix} \tilde X(t) + \begin{pmatrix} \hat m(t) \\ \hat m_\theta(t) \end{pmatrix}. \qquad (4.44)$$
Thus, solving the differential equation system given by (4.20)-(4.23) and (4.42)-(4.44) gives the quasi-RML estimate $\hat\theta(t)$.
Identifiability issues. Let the invertible matrix $T$ define a similarity transformation, and let us apply $T$ to the state $x$ of the system defined by $(A^*, b^*, c^*, m^*)$. Note that applying a similarity transformation has no effect on the impulse response function, as it only defines an alternative basis for the state vector. However, it transforms the system parameters into $(T^{-1} A^* T,\ T^{-1} b^*,\ (c^*)^T T,\ m^*)$. Thus, since $T$ can be chosen arbitrarily, there are infinitely many system parametrizations describing the observed point process $N(t)$. Hence we have to reduce the dimension of the parameter space for the system to be identifiable. To this end, we assume that
(i) $A^*$ is diagonal,
(ii) $b^* = (1, 1)^T$.
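The invariance of the impulse response under a similarity transformation can be verified numerically. The parameter values and the transformation matrix below are illustrative:

```python
import numpy as np
from scipy.linalg import expm

# A similarity transformation T leaves g(u) = c^T exp(-A u) b unchanged while
# changing (A, b, c), which is why a normalization such as A diagonal and
# b = (1, 1)^T is needed for identifiability.
A = np.diag([0.08, 0.02])
b = np.array([1.0, 1.0])
c = np.array([0.02, 0.01])

T = np.array([[1.0, 0.5], [0.2, 1.0]])    # an arbitrary invertible matrix
A2 = np.linalg.inv(T) @ A @ T
b2 = np.linalg.inv(T) @ b
c2 = T.T @ c                               # transformed output vector (c^T T)

g = lambda u, A, b, c: c @ expm(-A * u) @ b
vals = [(g(u, A, b, c), g(u, A2, b2, c2)) for u in (0.0, 1.0, 5.0, 20.0)]
```

At $u = 0$ both versions reduce to $c^T b = 0.03$, and they coincide for every $u$.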
Simulation results. We present simulation results for a model with two-dimensional $x$, where $\theta = (A_{11}, A_{22}, c_1, c_2, m)$. Figure 4.3 demonstrates the convergence of $\hat\theta(t)$; the true model parameters are those given in Section 4.2. The empirical eigenvalues of the Fisher information matrix calculated from data $T_n$ generated by this model are
$$z = \begin{pmatrix} 0.0586 \\ 0.5915 \\ 3.3189 \\ 10.3902 \\ 234.6782 \end{pmatrix}.$$
The condition number is moderately high (4004.7) in this setting. All of the eigenvalues are significantly bigger than zero, suggesting that the model is indeed identifiable. For comparison, we mention that we obtained an almost zero value ($1.22 \cdot 10^{-5}$) for the smallest eigenvalue in an overparametrized case, when $\theta$ includes the elements of $b$ as well.
Figure 4.3: $\hat\theta(T_n)$ in the Hawkes model with two-dimensional state.
4.4 The Fisher information matrix
Under general conditions the asymptotic accuracy of the identification is determined by the asymptotic Fisher information. Our goal in this section is to understand the behavior of the Fisher information matrix near the boundary of the stability domain. The investigations in this chapter are carried out in the standard Hawkes model; see Prokaj and Torma (2010) for a related analysis in the model defined by (4.6)-(4.7). We first present some observations regarding the limiting behavior of the Fisher information, which we then examine theoretically.
Recall that in the standard case, (4.6)-(4.7) reduces to
$$d\lambda(t) = -a(\lambda(t) - m)\,dt + \sigma\,dN(t),$$
where $a$, $\sigma = bc$ and $m$ are positive real parameters. In this model the parameter vector is $\theta = (a, \sigma, m)$. For the standard Hawkes process the stability condition simplifies to $\sigma < a$. Notice that the value of $m$ does not play any role in the stability condition.
Figure 4.4 illustrates the effect of approaching criticality. The values $\sigma = 0.3$ and $m = 0.1$ are kept fixed. On the left-hand side of Figure 4.4, $a = 0.35$, while on the right it is $a = \sigma + 10^{-6}$.
Figure 4.4: Intensity in the stable (left) and the nearly unstable (right) case.
The density of events, and also $E(\lambda)$, is much larger when the parametrization is close to the boundary of the stability domain, i.e., when $a - \sigma$ is small. Moreover, the nearly unstable intensity process shows diffusion characteristics.
Let us now turn to the time-normalized Fisher information with respect to the parameter $a$, i.e.,
$$I(a) = E\left( \frac{\psi_a^2}{\lambda} \right).$$
To evaluate $I(a)$, the joint law of $\psi_a$ and $\lambda$ is needed. In Figure 4.5 the scatter plot of $\psi_a$ against $\lambda$ is shown for decreasing decay factor $a$, where $a = 1$ is the critical value. We can see that the cloud gets narrower as $a$ gets closer to 1. This indicates an increasing correlation between $\lambda$ and $\psi_a$. It is easy to calculate the correlation coefficient, which indeed tends to $-1$ as $a$ goes to 1, see Proposition 4.4.2 below.
Comparing the expected values of $\lambda$ and $\psi_a$, one can see that they have the same order of magnitude, see (4.45) below. Then, at least at a heuristic level, we can expect that $\psi_a^2 / \lambda \approx \lambda$ and $I(a) \approx E(\lambda)$ for $a - \sigma \to 0$. This shows that the time-normalized Fisher information $I(a)$ goes to infinity as $a$ approaches the critical value.
In a similar manner one easily finds from Lemma 1 below that $\psi_a \approx \psi_\sigma \approx \lambda \approx (a - \sigma)^{-1}$:
$$E(\lambda) = \frac{am}{a - \sigma}, \qquad E(\psi_a) = \frac{-m\sigma}{a(a - \sigma)}, \qquad E(\psi_\sigma) = \frac{m}{a - \sigma}. \qquad (4.45)$$
Figure 4.5: $\psi_a$ vs. $\lambda$ in a standard Hawkes model with $\sigma = m = 1$, for $a = 3$, $a = 2$, $a = 1.4$ and $a = 1.1$.
Thus, the rescaled Fisher information matrix with respect to the parameters $a$ and $\sigma$ has the form
$$\lim_{a - \sigma \to 0} (a - \sigma)\,I(a, \sigma) = v v^T,$$
where $v$ is a vector with non-zero elements.
Let us now make these findings precise. The first step is to investigate the limiting behavior of the intensity process. Similar investigations have been carried out for branching processes with immigration in Ispány et al. (2005), Ispány and Pap (2007). See also the discussion on the analogies in the introduction of Prokaj and Torma (2010). Next, in Theorem 2 we calculate the stationary distribution of the appropriately rescaled intensity process, in which we use the following differentiation rule:
Proposition 4.4.1. Consider the (right continuous) process $\lambda$ which satisfies
$$d\lambda_t = a_t\,dt + \sigma\,dN_t,$$
4.4. THE FISHER INFORMATION MATRIX 81
where $N$ is a point process, and let $f$ be a continuously differentiable function. Then, for $s < t$,
$$f(\lambda_t) - f(\lambda_s) = \int_s^t f'(\lambda_u)\,a_u\,du + \int_s^t \left[f(\lambda_u) - f(\lambda_{u-})\right]dN_u,$$
or in differential form
$$df(\lambda_t) = f'(\lambda_t)\,a_t\,dt + \left[f(\lambda_t) - f(\lambda_{t-})\right]dN_t.$$
Theorem 2. Consider the stationary point process given in (4.13). Let $\sigma_0$ be a positive real number. Then, $(a-\sigma)\lambda$ converges in distribution to $\Gamma_{2m/\sigma_0,\,2/\sigma_0^2}$, as $a$ and $\sigma$ approach $\sigma_0$ such that $\sigma < a$.
Proof. In order to calculate the characteristic function, first we determine the dynamics of $e^{i\theta\lambda(t)}$. Applying Proposition 4.4.1 with $f(x) = e^{i\theta x}$, we can write
$$de^{i\theta\lambda(t)} = -i\theta e^{i\theta\lambda(t)}\,a(\lambda(t)-m)\,dt + \left[e^{i\theta(\lambda(t)+\sigma)} - e^{i\theta\lambda(t)}\right]dN_t.$$
Taking expectations on both sides and applying (4.3) we get
$$0 = \mathrm{E}\left[-i\theta e^{i\theta\lambda(t)}a(\lambda(t)-m) + e^{i\theta\lambda(t)}\left(e^{i\theta\sigma}-1\right)\lambda(t)\right],$$
where we set the left-hand side to zero, since the mean change is zero under stationarity. With $(\log\varphi)'(\theta)$ denoting the derivative of the logarithm of the characteristic function $\varphi(\theta)$ we have
$$(\log\varphi)'(\theta) = \frac{\theta a m}{e^{i\theta\sigma} - 1 - i\theta a}. \quad (4.46)$$
Applying basic calculus and elementary properties of the characteristic function we can write from (4.46)
$$\log\varphi(\theta(a-\sigma)) = \int_0^{\theta(a-\sigma)} \frac{x\,am}{e^{ix\sigma} - 1 - ixa}\,dx. \quad (4.47)$$
Let us now introduce
$$y = \frac{x}{a-\sigma}$$
and change variables in (4.47) to get
$$\log\varphi_{(a-\sigma)\lambda(t)}(\theta) = \int_0^{\theta} \frac{y\,am\,(a-\sigma)^2}{e^{i\sigma y(a-\sigma)} - 1 - iy(a-\sigma)a}\,dy. \quad (4.48)$$
Applying
$$-iy(a-\sigma)a = -iy(a-\sigma)\left(\sigma + (a-\sigma)\right) = -iy\sigma(a-\sigma) - iy(a-\sigma)^2$$
in the denominator of the integrand we can rewrite (4.48) as
$$\log\varphi_{(a-\sigma)\lambda(t)}(\theta) = \int_0^{\theta} \frac{y\,am}{(iy)^2 D(y,a,\sigma) - iy}\,dy, \quad (4.49)$$
with
$$D(y,a,\sigma) = \frac{e^{i\sigma y(a-\sigma)} - 1 - i\sigma y(a-\sigma)}{\left[iy(a-\sigma)\right]^2}.$$
Let us now take the limit $a, \sigma \to \sigma_0$, $\sigma < a$ in (4.49):
$$\lim_{\substack{a,\sigma\to\sigma_0\\ \sigma<a}} \log\varphi_{(a-\sigma)\lambda(t)}(\theta) = \int_0^{\theta} \frac{y\,\sigma_0 m}{(iy)^2\,\frac{\sigma_0^2}{2} - iy}\,dy \quad (4.50)$$
$$= -\frac{2m}{\sigma_0}\,\log\left(1 - \frac{i\theta\sigma_0^2}{2}\right). \quad (4.51)$$
In (4.50) we used
$$\lim_{\substack{a,\sigma\to\sigma_0\\ \sigma<a}} D(y,a,\sigma) = \lim_{z\to 0}\frac{e^{\sigma_0 z} - 1 - \sigma_0 z}{z^2} = \lim_{z\to 0}\frac{\left(1 + \sigma_0 z + \frac{(\sigma_0 z)^2}{2} + \dots\right) - 1 - \sigma_0 z}{z^2} = \frac{\sigma_0^2}{2},$$
with $z = iy(a-\sigma)$.
It follows from (4.51) that the characteristic function in the limit is
$$\lim_{\substack{a,\sigma\to\sigma_0\\ \sigma<a}} \varphi_{(a-\sigma)\lambda(t)}(\theta) = \left(1 - \frac{i\theta\sigma_0^2}{2}\right)^{-\frac{2m}{\sigma_0}},$$
which is the characteristic function of the Gamma distribution with parameters as in the theorem.
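As an illustrative numeric cross-check (not in the thesis), the first two moments of the $\Gamma_{2m/\sigma_0,\,2/\sigma_0^2}$ limit (shape $2m/\sigma_0$, rate $2/\sigma_0^2$) can be compared with the limits of the exact stationary moments (4.45) and (4.59) of the rescaled intensity $(a-\sigma)\lambda$:

```python
m, sigma0 = 1.0, 1.0
shape, rate = 2.0 * m / sigma0, 2.0 / sigma0**2
gamma_mean = shape / rate              # = m * sigma0
gamma_var = shape / rate**2            # = m * sigma0**3 / 2

def scaled_moments(a, sigma, m):
    """Mean and variance of (a - sigma) * lambda from (4.45) and (4.59)."""
    d = a - sigma
    e1 = a * m / d                                        # E(lambda)
    e2 = a * m * (2 * a * m + sigma**2) / (2 * d**2)      # E(lambda^2)
    return d * e1, d**2 * (e2 - e1**2)

for eps in (1e-2, 1e-4, 1e-6):
    mean, var = scaled_moments(sigma0 + eps, sigma0, m)
    print(mean - gamma_mean, var - gamma_var)   # both differences tend to 0
```

Both differences shrink linearly in $a - \sigma_0$, consistent with the Gamma limit of Theorem 2.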
Theorem 2 can be seen as a special case of Theorem 2 in Prokaj and Torma (2010) with a slightly different normalization factor. The next result, Theorem 3, says that the time-normalized Fisher information tends to infinity as we approach the boundary of the stability domain.
Theorem 3. Consider the stationary solution of (4.13). Let $\sigma_0$ be a positive real number. Then,
$$\lim_{\substack{a,\sigma\to\sigma_0\\ \sigma<a}} \mathrm{E}\left(\frac{\psi_a^2}{\lambda}\right) = \lim_{\substack{a,\sigma\to\sigma_0\\ \sigma<a}} \mathrm{E}\left(\frac{\psi_\sigma^2}{\lambda}\right) = \infty;$$
moreover, $\liminf\,(a-\sigma)\,\mathrm{E}\left(\psi_a^2\lambda^{-1}\right) > 0$ and similarly for $\psi_\sigma$.
The proof is based on the key observation that the intensity process and its derivatives are fully correlated in the limit, which is presented in Proposition 4.4.2. To show this we need the derivatives of $\lambda$ and a method for calculating the moments and covariances of the stationary $\lambda$, $\psi_a$, $\psi_\sigma$.
Put
$$\zeta_t = \begin{pmatrix} \lambda_t - m \\ \psi_{a,t} \\ \psi_{\sigma,t} \end{pmatrix}.$$
We can formulate the standard Hawkes model and its derivatives in matrix form as shown in the following lemma.
Lemma 1 (Derivatives).
$$d\zeta_t = -A\zeta_t\,dt + b\,dN_t,$$
where
$$A = \begin{pmatrix} a & 0 & 0 \\ 1 & a & 0 \\ 0 & 0 & a \end{pmatrix}, \qquad b = \begin{pmatrix} \sigma \\ 0 \\ 1 \end{pmatrix}.$$
Lemma 2 (Moments, covariance). Let $j, k, l$ denote nonnegative integers with $j + k + l \ge 1$. Then we have
$$\begin{aligned}
&-a(j+k+l)\,\mathrm{E}\left(\psi_a^j\psi_\sigma^k\lambda^l\right) + jm\,\mathrm{E}\left(\psi_a^{j-1}\psi_\sigma^k\lambda^l\right) - j\,\mathrm{E}\left(\psi_a^{j-1}\psi_\sigma^k\lambda^{l+1}\right) + aml\,\mathrm{E}\left(\psi_a^j\psi_\sigma^k\lambda^{l-1}\right)\\
&\qquad + \mathrm{E}\left(\psi_a^j(\psi_\sigma+1)^k(\lambda+\sigma)^l\lambda\right) - \mathrm{E}\left(\psi_a^j\psi_\sigma^k\lambda^{l+1}\right) = 0. \quad (4.52)
\end{aligned}$$
Proof. We compute the differential of $\psi_a^j(t)\psi_\sigma^k(t)\lambda^l(t)$:
$$d\left(\psi_a^j\psi_\sigma^k\lambda^l\right)(t) = \left(d\psi_a^j(t)\right)\psi_\sigma^k(t)\lambda^l(t) + \left(d\psi_\sigma^k(t)\right)\psi_a^j(t)\lambda^l(t) + \left(d\lambda^l(t)\right)\psi_a^j(t)\psi_\sigma^k(t), \quad (4.53)$$
where, since $\psi_\sigma$ and $\lambda$ jump at the same event times, the jump parts are understood to act jointly: at a jump of $N$ the product changes by $\psi_a^j\left[(\psi_\sigma+1)^k(\lambda+\sigma)^l - \psi_\sigma^k\lambda^l\right]$.
We can apply Proposition 4.4.1 to the derivatives given by Lemma 1 to calculate $d\lambda^l(t)$, $d\psi_a^j(t)$, $d\psi_\sigma^k(t)$:
$$d\lambda^l(t) = l\lambda^{l-1}(t)\left(-a\lambda(t) + am\right)dt + \left[\left(\lambda(t-)+\sigma\right)^l - \lambda^l(t-)\right]dN(t), \quad (4.54)$$
$$d\psi_a^j(t) = j\psi_a^{j-1}(t)\left(m - \lambda(t) - a\psi_a(t)\right)dt, \quad (4.55)$$
$$d\psi_\sigma^k(t) = k\psi_\sigma^{k-1}(t)\left(-a\psi_\sigma(t)\right)dt + \left[\left(\psi_\sigma(t-)+1\right)^k - \psi_\sigma^k(t-)\right]dN(t). \quad (4.56)$$
After inserting the right-hand sides of equations (4.54), (4.55), (4.56) into (4.53), we take expectations using (4.3). Note that the random variables in the expectations are all integrable, see Proposition 10 in Prokaj and Torma (2010). Under stationarity the average change is zero, that is,
$$d\,\mathrm{E}\left(\psi_a^j(t)\psi_\sigma^k(t)\lambda^l(t)\right) = 0.$$
Ordering the terms yields the stated equation for the moments.
Proposition 4.4.2 (Joint distribution of the scaled $\lambda$, $\psi_a$, $\psi_\sigma$). Consider the stationary solution of (4.13). Let $\sigma_0$ be a positive real number. Then $\left((a-\sigma)\lambda,\ -\sigma(a-\sigma)\psi_a,\ a(a-\sigma)\psi_\sigma\right)$ converges in distribution to $(X, X, X)$ with $X \sim \Gamma_{2m/\sigma_0,\,2/\sigma_0^2}$, as $a$ and $\sigma$ approach $\sigma_0$ such that $\sigma < a$.
Proof. We show that the $L_2$ norm of the scaled differences
$$(a-\sigma)\left(\lambda - (-\sigma\psi_a)\right) \quad\text{and}\quad (a-\sigma)\left(\lambda - a\psi_\sigma\right)$$
vanishes in the limit. For the $L_2$ norm of the first difference we have
$$\lim_{\substack{a,\sigma\to\sigma_0\\ \sigma<a}} \mathrm{E}\left[\left((a-\sigma)\lambda + \sigma(a-\sigma)\psi_a\right)^2\right] = \lim_{\substack{a,\sigma\to\sigma_0\\ \sigma<a}} (a-\sigma)^2\left(\mathrm{E}\left(\lambda^2\right) + 2\sigma\,\mathrm{E}(\psi_a\lambda) + \sigma^2\,\mathrm{E}\left(\psi_a^2\right)\right), \quad (4.57)$$
and for the second we have
$$\lim_{\substack{a,\sigma\to\sigma_0\\ \sigma<a}} \mathrm{E}\left[\left((a-\sigma)\lambda - a(a-\sigma)\psi_\sigma\right)^2\right] = \lim_{\substack{a,\sigma\to\sigma_0\\ \sigma<a}} (a-\sigma)^2\left(\mathrm{E}\left(\lambda^2\right) - 2a\,\mathrm{E}(\psi_\sigma\lambda) + a^2\,\mathrm{E}\left(\psi_\sigma^2\right)\right). \quad (4.58)$$
To calculate the expectations in (4.57) and (4.58), using Lemma 2 we build a linear system of equations, in which the equations are of type (4.52) with all $(j, k, l)$ triples from
$$\left\{(j', k', l') \;:\; 0 \le j', k', l' \le 2,\ j' + k' + l' \ge 1\right\}.$$
Solving the system for the expectations gives
$$\mathrm{E}\left(\lambda^2\right) = \frac{am(2am+\sigma^2)}{2(a-\sigma)^2}, \quad (4.59)$$
$$\mathrm{E}\left(\psi_a^2\right) = \frac{m\sigma^2\left(a^2 + 4am - 2m\sigma\right)}{2a^2(2a-\sigma)(a-\sigma)^2}, \quad (4.60)$$
$$\mathrm{E}\left(\psi_\sigma^2\right) = \frac{m\left(2a^2 + 4am - 2m\sigma - a\sigma\right)}{2(2a-\sigma)(a-\sigma)^2}, \quad (4.61)$$
$$\mathrm{E}(\psi_a\lambda) = -\frac{\sigma m\left(a\sigma - 2m\sigma + 4am\right)}{2(a-\sigma)^2(2a-\sigma)}, \quad (4.62)$$
$$\mathrm{E}(\psi_\sigma\lambda) = \frac{am\left(4am - 2m\sigma + 2a\sigma - \sigma^2\right)}{2(a-\sigma)^2(2a-\sigma)}. \quad (4.63)$$
Substituting these into (4.57) and (4.58) and taking the limit gives zero for both $L_2$ norms.
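That the scaled second moments in (4.57) and (4.58) really vanish can be checked numerically from the closed forms (4.59)-(4.63); an illustrative sketch:

```python
def moments(a, s, m):
    """Closed-form stationary moments (4.59)-(4.63); s stands for sigma."""
    d = a - s
    E_l2 = a * m * (2 * a * m + s**2) / (2 * d**2)                                   # (4.59)
    E_pa2 = m * s**2 * (a**2 + 4 * a * m - 2 * m * s) / (2 * a**2 * (2 * a - s) * d**2)  # (4.60)
    E_ps2 = m * (2 * a**2 + 4 * a * m - 2 * m * s - a * s) / (2 * (2 * a - s) * d**2)    # (4.61)
    E_pal = -s * m * (a * s - 2 * m * s + 4 * a * m) / (2 * d**2 * (2 * a - s))          # (4.62)
    E_psl = a * m * (4 * a * m - 2 * m * s + 2 * a * s - s**2) / (2 * d**2 * (2 * a - s))  # (4.63)
    return E_l2, E_pa2, E_ps2, E_pal, E_psl

sigma0, m = 1.0, 1.0
for eps in (1e-1, 1e-2, 1e-3):
    a = sigma0 + eps
    E_l2, E_pa2, E_ps2, E_pal, E_psl = moments(a, sigma0, m)
    lhs_457 = eps**2 * (E_l2 + 2 * sigma0 * E_pal + sigma0**2 * E_pa2)  # (4.57)
    lhs_458 = eps**2 * (E_l2 - 2 * a * E_psl + a**2 * E_ps2)            # (4.58)
    print(eps, lhs_457, lhs_458)   # both tend to 0 as eps -> 0
```

Although the individual terms blow up like $(a-\sigma)^{-2}$, the combinations cancel and the scaled norms go to zero, as the proposition requires.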
Proof of Theorem 3. Consider first the Fisher information in the parameter $a$. By Fatou's lemma and Proposition 4.4.2 we have
$$\liminf_{\substack{a,\sigma\to\sigma_0\\ \sigma<a}} \mathrm{E}\left[\frac{\left[-\sigma(a-\sigma)\psi_a\right]^2}{(a-\sigma)\lambda}\right] \ge \mathrm{E}\left[\liminf_{\substack{a,\sigma\to\sigma_0\\ \sigma<a}} \frac{\left[-\sigma(a-\sigma)\psi_a\right]^2}{(a-\sigma)\lambda}\right] = \mathrm{E}\left(\frac{X^2}{X}\right), \quad (4.64)$$
where $X \sim \Gamma_{2m/\sigma_0,\,2/\sigma_0^2}$. It follows that
$$\liminf_{\substack{a,\sigma\to\sigma_0\\ \sigma<a}} \sigma^2(a-\sigma)\,\mathrm{E}\left(\frac{\psi_a^2}{\lambda}\right) \ge \mathrm{E}(X),$$
from which we conclude. For the other limit the proof goes analogously, analyzing
$$\mathrm{E}\left[\frac{\left[a(a-\sigma)\psi_\sigma\right]^2}{(a-\sigma)\lambda}\right]$$
in the limit.
In the following theorem we examine the Fisher information with respect to the parameter $m$. Heuristically, we would expect that $I(m)$ vanishes near the stability boundary, since the variance of $\lambda$ becomes large there while $m$ remains constant.
Theorem 4. Consider the stationary solution of (4.13). Let $\sigma_0$ be a positive real number. Then,
$$\lim_{\substack{a,\sigma\to\sigma_0\\ \sigma<a}} \mathrm{E}\left(\frac{\psi_m^2}{\lambda}\right) = 0.$$
Proof. Differentiating (4.13) we get
$$d\psi_m(t) = -a\left(\psi_m(t) - 1\right)dt,$$
the stationary solution of which is $\psi_m(t) \equiv 1$. Thus,
$$I(m) = \mathrm{E}\left(\frac{\psi_m^2}{\lambda}\right) = \mathrm{E}\left(\frac{1}{\lambda}\right).$$
Since
$$\frac{1}{\lambda} \le \frac{1}{m},$$
and $\lim 1/\lambda = 0$ in distribution, we have $\lim \mathrm{E}(1/\lambda) = 0$ by the Dominated Convergence Theorem.
For the standard Hawkes process with parameter $a > 0$ we can give a rather precise estimate of the Fisher information. This is based on the identities given in the next statement.
Proposition 4.4.3. Consider the stationary point process given in (4.13) with $m = \sigma > 0$. Then, for any $k, l \in \mathbb{Z}$, $k \ge 0$, we have $\psi_a^k(\lambda - m)^l \in L_1(\Omega)$ and
$$\mathrm{E}\left(\psi_a^k\lambda^{l+1}\right) = \left(a(k+l)+m\right)\mathrm{E}\left(\psi_a^k(\lambda-m)^l\right) + k\,\mathrm{E}\left(\psi_a^{k-1}(\lambda-m)^{l+1}\right) + \mathrm{E}\left(\psi_a^k(\lambda-m)^{l+1}\right). \quad (4.65)$$
Proof. For $k, l \in \mathbb{Z}$, $k \ge 0$ and $\varepsilon > 0$, the integrability of $\psi_a^k(\lambda - m + \varepsilon)^l$ follows from Proposition 10 in Prokaj and Torma (2010).
Write the dynamics of $\psi_a^k(t)\left(\lambda(t) - m + \varepsilon\right)^l = x_2^k(t)\left(x_1 + \varepsilon\right)^l(t)$, where $x_1 = \lambda - m$ and $x_2 = \psi_a$ as in Lemma 1, using Proposition 4.4.1 and the change of variable formula:
$$d(x_1+\varepsilon)^l(t) = -al\,(x_1+\varepsilon)^{l-1}(t)\,x_1(t)\,dt + \left[\left(x_1(t-) + \sigma + \varepsilon\right)^l - \left(x_1(t-) + \varepsilon\right)^l\right]dN(t),$$
$$dx_2^k(t) = -k\,x_2^{k-1}(t)\left(x_1(t) + a\,x_2(t)\right)dt,$$
$$d\left(x_1(t)+\varepsilon\right)^l x_2^k(t) = (x_1+\varepsilon)^l(t)\,dx_2^k(t) + x_2^k(t)\,d(x_1+\varepsilon)^l(t).$$
Since the process $(x_1(t)+\varepsilon)^l x_2^k(t)$ is stationary and in $L_1$ for all $t$, its mean change is zero. Writing this out, but omitting the actual time $t$, we obtain
$$-k\,\mathrm{E}\left[x_2^{k-1}x_1(x_1+\varepsilon)^l\right] - ka\,\mathrm{E}\left[x_2^k(x_1+\varepsilon)^l\right] - la\,\mathrm{E}\left[x_1(x_1+\varepsilon)^{l-1}x_2^k\right] + \mathrm{E}\left[x_2^k(x_1+\sigma+\varepsilon)^l\lambda\right] - \mathrm{E}\left[x_2^k(x_1+\varepsilon)^l\lambda\right] = 0.$$
Rearranging and letting $\varepsilon \to 0+$ gives the relation (4.65), using $\sigma = m$. For $l \ge 0$ the Dominated Convergence Theorem, and for $l < 0$ the Beppo Levi Theorem, can be used to see that we can take the limit inside the expectation.
For a given $l$, the integrability of $\psi_a^k(\lambda - m)^l$ for all $k \ge 0$ follows from Proposition 10 in Prokaj and Torma (2010) if $l \ge 0$, while for $l < 0$ it follows from (4.65) by induction on $-l$.
Theorem 5. In model (4.13) with fixed $m = 1$ and $\sigma = 1$, $a > 1$, we have
$$\frac{2}{(a-1)a(a+1)} - 1 < \mathrm{E}\left(\frac{\psi_a^2}{\lambda}\right) < \frac{2}{(a-1)a(a+1)}.$$
Proof. First note that
$$\frac{2}{(a-1)a(a+1)} = \mathrm{E}\left(\frac{\psi_a^2}{\lambda-1}\right).$$
This can be easily seen by applying Proposition 4.4.3 with $k = 2$, $l = -1$ and with $k = 1$, $l = 0$, which yield
$$\mathrm{E}\left(\frac{\psi_a^2}{\lambda-1}\right) = -\frac{2}{a+1}\,\mathrm{E}(\psi_a)$$
and
$$\mathrm{E}(\psi_a) = -\frac{1}{a(a-1)},$$
respectively. Thus, we have to prove
$$\mathrm{E}\left(\frac{\psi_a^2}{\lambda-1}\right) - 1 < \mathrm{E}\left(\frac{\psi_a^2}{\lambda}\right) < \mathrm{E}\left(\frac{\psi_a^2}{\lambda-1}\right). \quad (4.66)$$
The second inequality is trivial, because
$$\frac{\psi_a^2}{\lambda} < \frac{\psi_a^2}{\lambda-1}.$$
Let us now work out the first inequality. Applying Proposition 4.4.3 with $k = 2$, $l = -2$ yields
$$\mathrm{E}\left(\frac{\psi_a^2}{\lambda}\right) = 2\,\mathrm{E}\left(\frac{\psi_a}{\lambda-1}\right) + \mathrm{E}\left(\frac{\psi_a^2\,\lambda}{(\lambda-1)^2}\right). \quad (4.67)$$
We can calculate the expectation in the first term on the right-hand side of (4.67) as
$$\mathrm{E}\left(\frac{\psi_a}{\lambda-1}\right) = \mathrm{E}\left(\frac{\psi_a\left(\lambda-(\lambda-1)\right)}{\lambda-1}\right) = \mathrm{E}\left(\frac{\psi_a\lambda}{\lambda-1} - \psi_a\right) = -1, \quad (4.68)$$
where
$$\mathrm{E}\left(\frac{\psi_a\lambda}{\lambda-1}\right) = \mathrm{E}(\psi_a) - 1$$
from Proposition 4.4.3 applied with $k = 1$, $l = -1$. For the second term we have
$$\mathrm{E}\left(\frac{\psi_a^2\,\lambda}{(\lambda-1)^2}\right) - \mathrm{E}\left(\frac{\psi_a^2}{\lambda-1}\right) = \mathrm{E}\left(\frac{\psi_a^2}{(\lambda-1)^2}\right) = \left[\mathrm{E}\left(\frac{\psi_a}{\lambda-1}\right)\right]^2 + D^2\left(\frac{\psi_a}{\lambda-1}\right) = 1 + D^2\left(\frac{\psi_a}{\lambda-1}\right), \quad (4.69)$$
where $D^2(x)$ denotes the variance of $x$. Combining (4.67) with (4.69) we get
$$\mathrm{E}\left(\frac{\psi_a^2}{\lambda}\right) = \mathrm{E}\left(\frac{\psi_a^2}{\lambda-1}\right) - 1 + D^2\left(\frac{\psi_a}{\lambda-1}\right). \quad (4.70)$$
Since the variance is nonnegative (and in fact positive here, as $\psi_a/(\lambda-1)$ is not constant), the first inequality in (4.66) holds as well.
Numerical results. Next, we present a simulation experiment. The time-normalized Fisher information matrix is approximated with the time average
$$\hat{I}(\theta) = \frac{1}{T}\int_0^T \frac{\partial_\theta\hat\lambda(t,\theta)\,\partial_\theta\hat\lambda(t,\theta)^T}{\hat\lambda(t,\theta)}\,dt$$
for $T$ large in a long simulation of the standard Hawkes process. We keep the parameters $\sigma = 0.3$ and $m = 0.1$ fixed. Figure 4.6 shows the diagonal elements of this empirical matrix as $a$ approaches $\sigma$ from above. The Fisher information with respect to the parameters $a$ and $\sigma$ is a decreasing function of $a - \sigma$, while $\hat{I}(m)$ is increasing. The graphs are in accordance with the analytical results of Theorems 3 and 4, namely that $I(a)$, $I(\sigma)$ tend to infinity and $I(m)$ tends to zero as $a - \sigma \to 0$ with $m$ fixed.
Figure 4.6: Diagonals $\hat I(a)$, $\hat I(\sigma)$, $\hat I(m)$ of the empirical Fisher information matrix in the standard Hawkes case, plotted against $a \in [0.3, 0.36]$.
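The time average $\hat I(\theta)$ can be reproduced in outline by simulating the intensity together with its derivatives (Lemma 1, with $\psi_m \equiv 1$ in the stationary regime by Theorem 4) and discretizing the integral. This is a rough sketch of my own (Euler time-stepping instead of exact event simulation), not the implementation used for the figures:

```python
import math
import random

def empirical_fisher(a, sigma, m, T=1000.0, dt=0.01, seed=1):
    """Approximate (1/T) * int_0^T grad(lambda) grad(lambda)^T / lambda dt,
    with grad taken w.r.t. (a, sigma, m); psi_m = 1 in the stationary regime."""
    rng = random.Random(seed)
    lam, psi_a, psi_s = m, 0.0, 0.0
    acc = [[0.0] * 3 for _ in range(3)]
    for _ in range(int(T / dt)):
        g = (psi_a, psi_s, 1.0)
        for i in range(3):
            for j in range(3):
                acc[i][j] += g[i] * g[j] / lam * dt
        # Euler step for the derivative processes of Lemma 1, exact decay for lambda
        psi_a += (-(lam - m) - a * psi_a) * dt
        psi_s += -a * psi_s * dt
        lam = m + (lam - m) * math.exp(-a * dt)
        if rng.random() < lam * dt:        # an event of N falls in this step
            lam += sigma                   # lambda jumps by sigma ...
            psi_s += 1.0                   # ... and psi_sigma jumps by 1
    return [[v / T for v in row] for row in acc]

I_hat = empirical_fisher(a=0.35, sigma=0.3, m=0.1)
print([I_hat[i][i] for i in range(3)])
```

Since $\lambda \ge m$ along the path, the $(m,m)$ entry of $\hat I$ is bounded by $1/m$, consistently with $I(m) = \mathrm{E}(1/\lambda)$ in the proof of Theorem 4.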
From a practical point of view the inverse of the Fisher information matrix, $I^{-1}(\theta)$, is even more important than $I(\theta)$ itself, since $I^{-1}(\theta)$ indicates the accuracy of parameter estimation. For example, in the standard Hawkes model asymptotic normality holds for the maximum likelihood estimator, see Ogata (1978); the asymptotic covariance matrix is $I^{-1}(\theta)$. Note also that in the standard Hawkes case the overparametrization issue is resolved by introducing $\sigma = bc$.
The inverse of the Fisher information matrix with $a = 0.312$, its eigenvalues $z$ and the condition number $\kappa$ are
$$\hat{I}^{-1}(a,\sigma,m) = \begin{pmatrix} 0.8737 & 0.8134 & 0.2007 \\ 0.8134 & 0.8059 & 0.1188 \\ 0.2007 & 0.1188 & 0.6605 \end{pmatrix}, \qquad z = \begin{pmatrix} 0.0210 \\ 0.6156 \\ 1.7034 \end{pmatrix}, \qquad \kappa = 81.11.$$
The parameters $a$ and $\sigma$ can be estimated by the maximum likelihood method approximately equally accurately, but the estimation errors with respect to these two parameters are highly correlated in this nearly unstable case (the correlation coefficient is 0.9694). Moreover, the condition number is only moderately high, indicating that standard iterative numerical procedures are applicable for maximum likelihood estimation of $\theta$ in this model.
In Figure 4.7 the trace of $\hat{I}^{-1}(\theta)$ is shown. Simple theoretical considerations imply that $\mathrm{Tr}\,(I^{-1}(\theta))$ should first decrease and then tend to infinity as $\theta$ approaches criticality with $m$ fixed. The curve confirms the decreasing part, but it is incomplete on the left due to the immense computational burden arising very close to criticality.
Figure 4.7: The trace of the empirical asymptotic covariance matrix in the standard Hawkes case, plotted against $a \in [0.3, 0.36]$.
4.5 Conclusion
In this chapter a Hawkes model was presented which captures the self-exciting effects of market news. We developed algorithms for the simulation and recursive identification of this model. We investigated the Fisher information matrix in the simplest (standard Hawkes) model. In particular, we showed that some of the diagonal entries of the asymptotic Fisher information matrix tend to infinity, while others tend to zero, as the parameters approach the boundary of the stability domain. As a first step we calculated the limit distribution of the appropriately rescaled intensity process.
The presented model describes the dynamics of news arrival, where the news is generated by multiple analysts with respect to a single company. A natural extension of the model is to allow for interdependencies between companies, i.e., to capture the assumption that news about a company can increase the analyst coverage of another company. To this end, we can introduce multiple Poisson channels as follows:
$$dx(t) = -Ax(t)\,dt + B\,dN(t), \quad (4.71)$$
$$\lambda(t) = C^T x(t-) + m, \quad (4.72)$$
where $m$, $\lambda(t)$ are vector-valued and $A$, $B$, $C$ are matrices of appropriate dimensions.
It would be highly interesting to check empirically how well our news arrival model
describes real processes. The major obstacle to empirical investigations is that historical news data is difficult to acquire.
Bibliographical remarks
The linear state Hawkes model, the algorithms for simulation of Hawkes processes and recursive identification have been presented at the 18th International Symposium on Mathematical Theory of Networks and Systems at Virginia Tech, Blacksburg, Virginia, USA, held July 28 - August 1, 2008, see Gerencsér et al. (2008b), authored by Gerencsér, L., Matias, C., Vágó, Zs., Torma, B., Weiss, B. The Fisher information matrix related results have been presented at the International Conference on Probability and Statistics with Applications, dedicated to the 100th anniversary of the birthday of Béla Gyires; the accompanying paper has been accepted for publication in Publicationes Mathematicae Debrecen, see Prokaj and Torma (2010).
The linear state Hawkes model was proposed by László Gerencsér; its application to model news arrival processes was proposed by Balázs Torma. The simulation and identification methods (Sections 4.2, 4.3) are based on the ideas of László Gerencsér; numerical investigations have been carried out by Balázs Torma. The theoretical investigations regarding the Fisher information matrix (Section 4.4) have been carried out by Vilmos Prokaj and Balázs Torma in cooperation; all numerical experiments have been conducted by Balázs Torma.
Chapter 5
Regression analysis based on
high-frequency data
5.1 Introduction
A general property of security prices, such as stock prices, is that the price is an integer multiple of the so-called minimum price increment or tick size, say $h$. For example, on the NASDAQ stock exchange $h = \$0.01$. On futures markets a higher $h$ is applied: on the Chicago Mercantile Exchange, say, $h = \$12.50$ for the E-mini S&P 500 futures contract. The need for a minimum price increment is a consequence of the pricing algorithms applied on the exchanges, as they aggregate demand and supply on equidistant price levels, where the distance of two consecutive price levels is $h$. Thus we can interpret the market price as a price observed under aggregation. Aggregation in this form is also called quantization.
The loss of information due to quantization is especially high when $h$ is large relative to the price volatility. This is the case when dealing with high-frequency data, that is, when the price process is sampled at a high frequency, 100 times a second, say. The analysis of order book data, such as exploring relationships between the price dynamics and order quantities at various price levels, has recently gained considerable attention.
A scalar quantizer is defined as a mapping $q$ from $\mathbb{R}$ to a discrete, finite or countable set $Y \subset \mathbb{R}$ representing the so-called quantization levels, assigning to each $x \in \mathbb{R}$ its quantized version
$$y = q(x). \quad (5.1)$$
The simplest scalar quantizer is the uniform quantizer, where the set of quantization levels is given by the integer multiples of a fixed positive number $h$, called the sensitivity of the quantizer, and if $x$ is a real number then we set
$$q(x) = kh \quad \text{for } x \in I_k = \{x : kh - h/2 < x \le kh + h/2\}. \quad (5.2)$$
A more realistic model of quantization is a quantizer with saturation, see Brockett and Liberzon (2000), defined as above in the range
$$-(M + 1/2)h < x \le (M + 1/2)h,$$
with $M$ being a positive integer, and setting $q(x) = \pm Mh$ outside the above interval. Thus there are altogether $2M + 3$ quantization domains, and we will denote them again by $I_k$, with $k \in K$, where $K$ is the set of possible indices. See Widrow and Kollár (2008) for a recent survey of the statistical theory of quantization.
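The uniform quantizer (5.2), optionally with the saturation just described, can be written compactly; an illustrative sketch (the boundary convention follows the half-open cells $I_k$):

```python
import math

def quantize(x, h, M=None):
    """Uniform quantizer: q(x) = kh for kh - h/2 < x <= kh + h/2.
    With a positive integer M, saturate at +/- M*h outside the admissible range."""
    k = math.ceil(x / h - 0.5)        # the unique k with kh - h/2 < x <= kh + h/2
    if M is not None:
        k = max(-M, min(M, k))        # saturation: q(x) = +/- M*h outside
    return k * h

print(quantize(0.26, 0.1))           # 0.3 (up to floating-point rounding)
print(quantize(12.0, 1.0, M=10))     # saturates at 10.0
```

The `math.ceil(x / h - 0.5)` form maps boundary points $x = kh + h/2$ into cell $I_k$, matching the half-open intervals of (5.2).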
Motivated by the application described above, in this chapter we consider the regression problem of reconstructing the coefficient vector $\theta^* \in \mathbb{R}^d$ of the finite-valued regressor process $(\varphi) = \varphi_n \in \mathbb{R}^d$ when measured with additive Gaussian noise, followed by quantization. I.e., the observed values are of the form
$$y_n = q\left(\varphi_n^T\theta^* + e_n\right), \quad (5.3)$$
where $e_n$ is an i.i.d. Gaussian sequence with mean 0 and known variance $\sigma^2 = (\sigma^*)^2$, see e.g. Masry and Cambanis (1980). Let $J = \left\{\bar\varphi_1, \dots, \bar\varphi_M\right\}$ denote the set $\varphi_n$ can take values from. To get rid of overparametrization issues we further assume that $J$ generates $\mathbb{R}^d$ and $P(\varphi_n = \bar\varphi_j) > 0$ for all $n$ and $j = 1, \dots, M$. The assumed knowledge of $\sigma^2$ may be unrealistic in many applications, but it greatly simplifies the presentation. We shall discuss the possibility of handling unknown $\sigma$ later.
An efficient randomized EM-method to solve the off-line maximum likelihood estimation problem, based on say $N$ observations, has been developed in Finesso et al. (1999a). In the course of this procedure we generate a sequence of estimators $\theta_t$ that converges to the off-line maximum likelihood estimator $\hat\theta_N$ almost surely, under appropriate technical conditions. A real-time version of this method, exhibiting excellent convergence properties, has been developed in Finesso et al. (1999b). In the real-time scheme we generate a sequence of estimators $\hat\theta_t$ such that $\hat\theta_t$ converges to $\theta^*$ almost surely, under appropriate technical conditions.
It is well known from the theory of stochastic approximation that, in the case of a weighted stochastic gradient method based on maximum likelihood estimation, the best available asymptotic covariance is the inverse of the Fisher information, see Benveniste et al. (1990), Section 3.2.3. This is achieved by a stochastic Newton method.
5.2 The EM-method for estimating $\theta^*$
Consider first the case of off-line estimation, i.e., when the number of samples $N$ is fixed. For each $y$ in the observation set define the quantization domain $I(y) = \{x : q(x) = y\}$. Write as usual
$$\phi(x; \varphi, \theta, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x - \varphi^T\theta)^2}{2\sigma^2}},$$
where $\varphi = \varphi_n$ is the regressor. Then for any $\varphi$ and $\theta$ the $\theta$-probability of observing $y$ is, with $\sigma^2 = (\sigma^*)^2$,
$$P(I(y); \varphi, \theta) = \int_{I(y)} \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x - \varphi^T\theta)^2}{2\sigma^2}}\,dx = \int_{I(y)} \phi(x; \varphi, \theta, \sigma^2)\,dx. \quad (5.4)$$
Having $\varphi^N = (\varphi_1, \dots, \varphi_N)$, for any $\theta$ the logarithm of the $\theta$-probability of observing $y^N = (y_1, \dots, y_N)$ is
$$L_N(y^N; \varphi^N, \theta) = \sum_{n=1}^N \log P(I(y_n); \varphi_n, \theta) = \sum_{n=1}^N L(y_n; \varphi_n, \theta). \quad (5.5)$$
In Figure 5.1 we plot the expected likelihood function against the running parameter $\theta$, with $\sigma$ kept fixed at $\sigma^*$ (left), and against the running parameter $\sigma$ with $\theta$ kept fixed at $\theta^*$ (right). Two one-dimensional problems are considered: problem I (solid line) with parameters
$$\theta^* = 0.5, \qquad \sigma^{*2} = 0.08,$$
and problem II (dashed line) with parameters
$$\theta^* = 0, \qquad \sigma^{*2} = 0.1.$$
In all experiments we assume $h = 1$ and $M = 10$; here we apply the regressor $\varphi = 1$ for simplicity. According to these figures the expected likelihood function is likely to be concave with a unique maximum.
Figure 5.1: The expected log-likelihood function (left: against $\theta$ with $\sigma^2 = \sigma^{*2}$; right: against $\sigma^2$ with $\theta = \theta^*$).
The ML estimator $\hat\theta_N$ is obtained by maximizing $L_N(y^N; \varphi^N, \theta)$, or by solving the likelihood equation
$$\frac{d}{d\theta}L_N(y^N; \varphi^N, \theta) = 0. \quad (5.6)$$
Introducing the conditional density of $x$ given $y$,
$$\phi(x \mid y; \varphi, \theta, \sigma^2) = \frac{\phi(x; \varphi, \theta, \sigma^2)}{P(I(y); \varphi, \theta)}\,\chi_{I(y)}(x), \quad (5.7)$$
where $\chi_E(x)$ denotes the indicator of the set $E$, the likelihood equation is equivalent to the following:
The quantized normal equation:
$$\left(\sum_{n=1}^N \varphi_n\varphi_n^T\right)\theta = \sum_{n=1}^N \varphi_n \int x\,\phi(x \mid y_n; \varphi_n, \theta, \sigma^2)\,dx. \quad (5.8)$$
Notice that this equation is non-linear in $\theta$. Solving the likelihood equation directly would require the computation of an integral in each step of the iteration, which is not feasible if $\theta^*$ is vector-valued.
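The integral on the right-hand side of (5.8) is the mean of a Gaussian truncated to $I(y_n)$, which is available in closed form through the standard normal density and distribution function. A sketch for the scalar case (the helper names are my own):

```python
import math

def std_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_mean(mu, sigma, lo, hi):
    """E[X | lo < X <= hi] for X ~ N(mu, sigma^2): the integral in (5.8) over one cell."""
    alpha, beta = (lo - mu) / sigma, (hi - mu) / sigma
    mass = std_cdf(beta) - std_cdf(alpha)          # P(I(y); phi, theta) as in (5.4)
    return mu + sigma * (std_pdf(alpha) - std_pdf(beta)) / mass

# cell I_0 = (-1/2, 1/2] with h = 1: the conditional mean is pulled from mu toward 0
print(truncated_mean(0.4, 0.3, -0.5, 0.5))
```

This closed form is what makes the one-dimensional M-step computable exactly; in higher dimensions the integrals become the expensive part, motivating the MCMC approach of Section 5.3.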
The EM-method: This difficulty has been circumvented in Finesso et al. (1999a) by using a Markov chain Monte Carlo (MCMC) method for computing the integrals. Since the likelihood of the observations $x_n = \varphi_n^T\theta^* + e_n$ is easily obtained, and $y_n = q(x_n)$, a natural approach to solving the likelihood equation is to use the EM-method. Following the basic steps of the EM-method we replace the log-likelihood function by an auxiliary function (see e.g. Dempster et al. (1977); McLachlan and Krishnan (1996))
$$Q(y; \theta, \bar\theta) = \mathrm{E}_{\bar\theta}\left[\log P(X, \theta, \sigma^2) \mid y\right] = \mathrm{E}\left[\log P(X, \theta, \sigma^2) \mid y; \bar\theta\right], \quad (5.9)$$
where $\bar\theta$ is a current best estimate, and the random variable $X = \varphi^T\bar\theta + e$ is the unknown assumed state given the regressor $\varphi$. Thus we have
$$Q(y; \varphi, \theta, \bar\theta) = \int_{I(y)} \left[-\log\left(\sqrt{2\pi}\,\sigma\right) - \frac{(x - \varphi^T\theta)^2}{2\sigma^2}\right]\phi(x \mid y; \varphi, \bar\theta, \sigma^2)\,dx. \quad (5.10)$$
For $N$ independent observations we set
$$Q_N(y^N; \theta, \bar\theta) = \sum_{n=1}^N Q(y_n; \theta, \bar\theta) = \mathrm{E}\left[\log P(X^N, \theta) \mid y^N; \bar\theta\right], \quad (5.11)$$
where $X_n = \varphi_n^T\bar\theta + e_n$ is the unknown assumed state at time $n$. The so-called M-step, maximizing $Q_N$ in $\theta$, gives an updated estimate that will replace $\bar\theta$.
To simplify the notation consider now the case of uniform quantization without saturation. Let $I_k$ be the $k$-th interval, $I_k = \{x : q(x) = kh\}$, and let
$$N_{j,k} = \#\{n : 1 \le n \le N,\ y_n = kh,\ \varphi_n = \bar\varphi_j\} \quad (5.12)$$
be the number of times that $kh$ is observed in the sequence $y^N$ while at the same time the regressor takes the value $\bar\varphi_j$. Then the M-step is equivalent to solving the linear equation
$$\frac{d}{d\theta}Q_N(y^N; \varphi^N, \theta, \bar\theta) = \sum_j\sum_k N_{j,k}\int_{I_k} \bar\varphi_j\,\frac{x - \bar\varphi_j^T\theta}{\sigma^2}\,\phi(x \mid kh; \bar\varphi_j, \bar\theta, \sigma^2)\,dx = 0. \quad (5.13)$$
Note that all information on the data is now contained in the counting numbers $N_{j,k}$. We can write (5.13) as
$$\sum_j\sum_k N_{j,k}\,\bar\varphi_j\bar\varphi_j^T\theta \int_{I_k} \phi(x \mid kh; \bar\varphi_j, \bar\theta, \sigma^2)\,dx = \sum_j\sum_k N_{j,k}\,\bar\varphi_j \int_{I_k} x\,\phi(x \mid kh; \bar\varphi_j, \bar\theta, \sigma^2)\,dx.$$
Taking into account that the integral of a density function is one, we arrive at the following
updating formula:
The M-step:
$$\Phi_N\,\theta = \sum_{j,k} N_{j,k}\,\bar\varphi_j \int_{I_k} x\,\phi(x \mid kh; \bar\varphi_j, \bar\theta, \sigma^2)\,dx, \quad (5.14)$$
where
$$\Phi_N = \sum_{n=1}^N \varphi_n\varphi_n^T.$$
In the course of the EM-method we set $\bar\theta = \theta_t$, and we get $\theta = \theta_{t+1}$.
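In one dimension the integrals in (5.14) can be evaluated exactly via the truncated-Gaussian mean, so the EM iteration can be run directly; the randomized method of Section 5.3 replaces these integrals by MCMC averages. A toy sketch of my own (scalar $\theta$, regressor $\varphi \equiv 1$, $h = 1$, no saturation; the data layout mirrors the counts $N_{j,k}$ of (5.12)):

```python
import math
import random

def std_pdf(z): return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
def std_cdf(z): return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def cell_mean(mu, sigma, k, h):
    """Exact value of int_{I_k} x phi(x | kh) dx for I_k = (kh - h/2, kh + h/2]."""
    a, b = (k * h - h / 2 - mu) / sigma, (k * h + h / 2 - mu) / sigma
    return mu + sigma * (std_pdf(a) - std_pdf(b)) / (std_cdf(b) - std_cdf(a))

# synthetic data: y_n = q(theta* + e_n) with phi_n = 1 and h = 1
rng = random.Random(0)
theta_star, sigma, h, N = 0.5, math.sqrt(0.08), 1.0, 4000
counts = {}
for _ in range(N):
    x = theta_star + rng.gauss(0.0, sigma)
    k = math.ceil(x / h - 0.5)
    counts[k] = counts.get(k, 0) + 1

theta = 0.0                        # initial guess
for _ in range(50):                # EM iterations: M-step (5.14), here Phi_N = N
    theta = sum(n_k * cell_mean(theta, sigma, k, h) for k, n_k in counts.items()) / N

print(theta)   # close to the ML estimate, hence near theta* = 0.5 for large N
```

Each pass is one E-step (conditional means under the current $\bar\theta$) followed by one M-step; the estimate settles at the fixed point of (5.14).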
Basic inequalities. The basic inequality connecting the likelihood function and the $Q$-function is the following: for any $y$ and for given fixed $\bar\theta$ we have, for any $\theta$,
$$L(y, \theta) = Q(y; \theta, \bar\theta) + D(\bar\theta \,\|\, \theta) + H(\bar\theta), \quad (5.15)$$
where $D(\bar\theta \,\|\, \theta) \ge 0$ for all $\theta$. (In fact $D(\bar\theta \,\|\, \theta)$ is a divergence between two conditional probability densities, and $H(\bar\theta)$ is an entropy, which depends only on $\bar\theta$.) It follows that the function $L(y, \theta) - Q(y; \theta, \bar\theta)$ is minimized at $\theta = \bar\theta$; thus, if $\bar\theta$ is interior relative to the parameter domain, then we have for any $N$, $y^N$, $\varphi^N$
$$Q'_N(y^N; \varphi^N, \theta, \theta) = \frac{\partial}{\partial\theta}Q_N(y^N; \varphi^N, \theta, \bar\theta)\Big|_{\bar\theta = \theta} = \frac{\partial}{\partial\theta}L(y^N; \varphi^N, \theta). \quad (5.16)$$
It follows that the solution of the likelihood equation $\frac{\partial}{\partial\theta}L_N(y^N; \varphi^N, \theta) = 0$ is obtained by solving the equation
$$Q'_N(y^N; \varphi^N, \theta, \theta) = 0. \quad (5.17)$$
Asymptotic log-likelihood. Assuming ergodic regressors,
$$\bar N_{j,k} = \lim_{N\to\infty} \frac{N_{j,k}(N)}{N}$$
exists for every $j$ and $k$ (here we made the dependence of $N_{j,k}$ on $N$ explicit). Then, using $Y_j = q\left(\bar\varphi_j^T\theta^* + e\right)$, the asymptotic log-likelihood function can be written in the following form:
$$\bar L(\theta) = \lim_{N\to\infty}\frac{1}{N}L_N(y^N, \theta) \quad (5.18)$$
$$= \lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^N \log P(I(y_n), \theta) \quad (5.19)$$
$$= \lim_{N\to\infty}\frac{1}{N}\sum_{j\in J}\sum_{k\in K} N_{j,k}\,\log P(I(k); \bar\varphi_j, \theta) \quad (5.20)$$
$$= \sum_{j\in J}\sum_{k\in K} \bar N_{j,k}\,\log P(I(k); \bar\varphi_j, \theta). \quad (5.21)$$
Note that we have
$$\frac{\partial}{\partial\theta}\bar L(\theta)\Big|_{\theta=\theta^*} = \sum_{j,k}\bar N_j\,P(I(k); \bar\varphi_j, \theta^*)\,\frac{\frac{\partial}{\partial\theta}P(I(k); \bar\varphi_j, \theta)\big|_{\theta=\theta^*}}{P(I(k); \bar\varphi_j, \theta^*)} \quad (5.22)$$
$$= \sum_j \bar N_j\,\frac{\partial}{\partial\theta}\sum_k P(I(k); \bar\varphi_j, \theta)\Big|_{\theta=\theta^*} = \sum_j \bar N_j\cdot 0 \quad (5.23)$$
$$= 0, \quad (5.24)$$
where we used $\bar N_{j,k} = \bar N_j\,P(I(k); \bar\varphi_j, \theta^*)$, with
$$\bar N_j = \sum_k \bar N_{j,k} = \lim N_j/N, \qquad N_j = \#\{n : 1 \le n \le N,\ \varphi_n = \bar\varphi_j\}.$$
Similarly, define the asymptotic $Q$-function, with $X_j = \bar\varphi_j^T\theta^* + e$ and $Y_j = q(X_j)$, as
$$\bar Q(\theta, \bar\theta) = \sum_{j,k}\bar N_{j,k}\,Q(Y_j = kh, \theta, \bar\theta) \quad (5.25)$$
$$= \sum_{j,k}\bar N_{j,k}\,\mathrm{E}\left[\log P(X_j, \theta) \mid Y_j = kh, \bar\theta\right]. \quad (5.26)$$
To relate the asymptotic conditional $Q$-function to the asymptotic conditional likelihood function the simplest, although formal, procedure is to divide both sides of (5.16) by $N$ and take the limit to get
$$\bar Q'(\theta, \theta) = \frac{\partial}{\partial\theta}\bar Q(\theta, \bar\theta)\Big|_{\bar\theta=\theta} = \frac{\partial}{\partial\theta}\bar L(\theta). \quad (5.27)$$
Thus the asymptotic problem of determining $\theta^*$ can be formulated as solving the equation
$$\bar Q'(\theta, \theta) = 0. \quad (5.28)$$
This equation could be derived directly, but the context of the EM-method gives a convenient computational framework that will be exploited subsequently.
The asymptotic Fisher information. The asymptotic Fisher information for the problem of estimating $\theta^*$ with known $\sigma = \sigma^*$ will be denoted
$$I^* = -\frac{\partial^2}{\partial\theta^2}\bar L(\theta)\Big|_{\theta=\theta^*}. \quad (5.29)$$
It is well known that it can be expressed via the score function
$$\frac{\partial}{\partial\theta}L(y_n; \theta) = \int_{I(y_n)} \varphi_n\,\frac{x - \varphi_n^T\theta}{\sigma^2}\,\phi(x \mid y_n; \varphi_n, \theta, \sigma^2)\,dx \quad (5.30)$$
as
$$I^* = \lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^N \left(\frac{\partial}{\partial\theta}L(y_n; \theta)\right)\left(\frac{\partial}{\partial\theta}L(y_n; \theta)\right)^T\Big|_{\theta=\theta^*}. \quad (5.31)$$
Equivalently, taking into account (5.16), we can write
$$I^* = \lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^N Q'(y_n; \theta^*, \theta^*)\,Q'(y_n; \theta^*, \theta^*)^T, \quad (5.32)$$
or as
$$I^* = \sum_{j\in J}\sum_{k\in K}\bar N_{j,k}\,\frac{\bar\varphi_j\bar\varphi_j^T}{\sigma^4}\left(\int_{I(k)}\left(x - \bar\varphi_j^T\theta^*\right)\phi(x \mid kh; \bar\varphi_j, \theta^*, \sigma^2)\,dx\right)^2. \quad (5.33)$$
Note that in the limiting case, when $h$ tends to 0, we get
$$I^* = \frac{\mathrm{E}\left(\varphi\varphi^T\right)}{\sigma^2} = \frac{\mathrm{E}\left(\varphi\varphi^T\right)}{(\sigma^*)^2},$$
as expected. It is easy to see that in any case the loss of information due to quantization decreases the Fisher information, i.e.,
$$I^* \le \frac{\mathrm{E}\left(\varphi\varphi^T\right)}{\sigma^2}. \quad (5.34)$$
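For a scalar problem with a single regressor value, (5.33) reduces to a sum of one-dimensional Gaussian integrals, so the information loss in (5.34) can be tabulated against $h$. An illustrative sketch (using $\int_I (x-\mu)\phi\,dx = \sigma\left(\phi_{0,1}(\alpha) - \phi_{0,1}(\beta)\right)$ in standardized form; function names are my own):

```python
import math

def std_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fisher_quantized(mu, sigma, h):
    """Evaluate (5.33) for scalar theta with a single regressor phi = 1:
    each cell I_k contributes (pdf difference)^2 / (cell mass), divided by sigma^2."""
    K = int((8.0 * sigma + abs(mu)) / h) + 1   # cells covering mu +/- 8 sigma
    total = 0.0
    for k in range(-K, K + 1):
        a = (k * h - h / 2.0 - mu) / sigma
        b = (k * h + h / 2.0 - mu) / sigma
        mass = std_cdf(b) - std_cdf(a)
        if mass > 0.0:
            total += (std_pdf(a) - std_pdf(b)) ** 2 / mass
    return total / sigma**2

mu, sigma = 0.3, 1.0
for h in (2.0, 1.0, 0.1, 0.01):
    print(h, fisher_quantized(mu, sigma, h))   # increases toward 1/sigma^2 = 1 as h -> 0
```

By the Cauchy-Schwarz inequality each cell's contribution is at most the corresponding piece of $\sigma^2$, which gives the bound (5.34) directly.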
5.3 A randomized EM-method
The integrals on the right-hand side of (5.14) are expectations with respect to a conditional Gaussian density, and it is therefore natural to approximate them using a Markov chain Monte Carlo (MCMC) algorithm, see Hammersley and Handscomb (1967); Metropolis et al. (1953). A combination of the latter with the EM-algorithm leads to a stochastic approximation scheme called a randomized EM-method, first presented in Finesso et al. (1999a,b). A similar method has been developed independently for the problem of log-linear regression in Solo (1999).
The MCMC method. Thus, to compute $\int_{I_k} x\,\phi(x \mid kh, \bar\varphi_j; \bar\theta, \sigma^2)\,dx$ we generate an ergodic Markov chain $\bar\xi_\tau^{j,k}(\bar\theta)$ on the state space $I_k$, which is an interval of length $h$ (in case of no saturation), such that its invariant measure is $\phi(x \mid kh, \bar\varphi_j; \bar\theta, \sigma^2)$, or $\phi(x \mid I_k, \bar\varphi_j; \bar\theta, \sigma^2)$.
For this purpose we use the Metropolis-Hastings method, with the un-normalized target density
$$\pi(x) = \pi(x, \bar\theta) = e^{-\left(x - \bar\varphi_j^T\bar\theta\right)^2/(2\sigma^2)}\,\chi_{I(y_k)}(x).$$
Let the initial transition kernel for the Metropolis-Hastings method be $q(x, y) = 1/h$ for all $x, y \in I(y_k)$, i.e., let the initial chain be simply an i.i.d. sequence with uniform distribution. Then we have a classic Metropolis algorithm defined by the acceptance probabilities
$$\alpha(x, y; \bar\theta) = \min\left\{\frac{\pi(y, \bar\theta)}{\pi(x, \bar\theta)},\ 1\right\} = \min\left\{e^{-\frac{(y-x)\left(y+x-2\bar\varphi_j^T\bar\theta\right)}{2\sigma^2}},\ 1\right\}. \quad (5.35)$$
For the generation of $\bar\xi_\tau^{j,k}(\bar\theta)$ we will need an i.i.d. sequence of random vectors $(U_\tau, V_\tau)$, $\tau \ge 1$, with uniform distribution on $[0,1]\times[0,1]$, independent also of the initial state $\bar\xi_0^{j,k}(\bar\theta) = \bar\xi_0^{j,k}$. The first component, $U_\tau$, is used to generate the next state of the initial chain, while the second component, $V_\tau$, is used to realize the acceptance or rejection. We will thus use the following shorthand notation for the generation of $\bar\xi_\tau^{j,k}(\bar\theta)$:
The frozen-parameter Markov chain on $I_k$ with regressor $\bar\varphi_j$:
$$\bar\xi_{\tau+1}^{j,k}(\bar\theta) = F\left(\bar\xi_\tau^{j,k}(\bar\theta),\ U_{\tau+1},\ V_{\tau+1};\ \bar\theta\right). \quad (5.36)$$
Here $F$ depends on $\bar\theta$ via the acceptance probability $\alpha(x, y; \bar\theta)$.
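One update of the frozen-parameter chain (5.36), with the uniform proposal on $I_k$ and the acceptance rule (5.35), can be sketched as follows (helper names are my own; the long run below estimates the conditional mean needed in (5.14)):

```python
import math
import random

def metropolis_step(x, u, v, k, h, mean, sigma):
    """One update F(x, U, V; theta_bar) of the chain on I_k = (kh - h/2, kh + h/2];
    mean stands for phibar_j^T theta_bar."""
    y = k * h - h / 2.0 + u * h                          # uniform proposal on I_k
    log_alpha = -(y - x) * (y + x - 2.0 * mean) / (2.0 * sigma**2)
    return y if v <= math.exp(min(log_alpha, 0.0)) else x

rng = random.Random(3)
x, total, n = 0.0, 0.0, 0
for step in range(20000):
    x = metropolis_step(x, rng.random(), rng.random(), 0, 1.0, 0.4, 0.3)
    if step >= 2000:                                     # discard burn-in
        total += x
        n += 1
est = total / n
print(est)   # approximates the Gaussian mean conditioned on the cell (-0.5, 0.5]
```

The running average of the chain converges to the truncated-Gaussian conditional mean, which is exactly the integral the randomized EM-method needs.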
Let $K_N$ be the set of indices of quantization domains that show up in the observation sequence of length $N$, and let $J_N$ be the set of indices of regressor values that show up in the regressor sequence of length $N$. Let $k \in K_N$, $j \in J_N$ and let the current state of the corresponding Markov chain be $\bar\xi_\tau^{j,k}(\bar\theta)$. Then for large $L$ a good approximation of (5.14) is given by
$$\Phi_N\,\theta = \sum_{k\in K_N}\sum_{j\in J_N} N_{j,k}\,\bar\varphi_j\,\frac{1}{L}\sum_{\tau=1}^L \bar\xi_\tau^{j,k}(\bar\theta). \quad (5.37)$$
Allowing time-variation. When the above approximation is applied in an EM-iteration it is reasonable to run the EM-algorithm and the MCMC method in parallel. Let us now write $\bar\theta = \theta_t$, with $\theta_t$ still to be specified, and consider the time-varying Markovian dynamics
$$\xi_{t+1}^{j,k} = F\left(\xi_t^{j,k},\ U_{t+1},\ V_{t+1};\ \theta_t\right). \quad (5.38)$$
Here $\theta_t$ is the current approximation of the maximum likelihood estimator $\hat\theta_N$, which
is in turn updated by an approximation of (5.37) as follows:
$$\Phi_N\,\theta_{t+1} = \sum_{k\in K_N}\sum_{j\in J_N} N_{j,k}\,\bar\varphi_j\,\frac{1}{t+1}\sum_{m=1}^{t+1}\xi_m^{j,k}. \quad (5.39)$$
The above algorithm, defined by (5.38) and (5.39), is called a randomized EM-method. The following simple derivation yields a recursive equation for (5.39):
$$\Phi_N\,\theta_{t+1} = \sum_{j,k} N_{j,k}\,\bar\varphi_j\,\frac{1}{t+1}\sum_{m=1}^{t+1}\xi_m^{j,k} \quad (5.40)$$
$$= \frac{t}{t+1}\sum_{j,k} N_{j,k}\,\bar\varphi_j\,\frac{1}{t}\sum_{m=1}^{t}\xi_m^{j,k} + \frac{1}{t+1}\sum_{j,k} N_{j,k}\,\bar\varphi_j\,\xi_{t+1}^{j,k} \quad (5.41)$$
$$= \frac{t}{t+1}\,\Phi_N\,\theta_t + \frac{1}{t+1}\sum_{j,k} N_{j,k}\,\bar\varphi_j\,\xi_{t+1}^{j,k} \quad (5.42)$$
$$= \Phi_N\,\theta_t\left(1 - \frac{1}{t+1}\right) + \frac{1}{t+1}\sum_{j,k} N_{j,k}\,\bar\varphi_j\,\xi_{t+1}^{j,k} \quad (5.43)$$
$$= \Phi_N\,\theta_t + \frac{1}{t+1}\left(\sum_{j,k} N_{j,k}\,\bar\varphi_j\,\xi_{t+1}^{j,k} - \Phi_N\,\theta_t\right). \quad (5.44)$$
Multiplying from the left by $\Phi_N^{-1}$ yields the following equation:
A randomized EM-method:
$$\theta_{t+1} = \theta_t + \frac{1}{t+1}\left(\sum_{k\in K_N}\sum_{j\in J_N} N_{j,k}\,\Phi_N^{-1}\bar\varphi_j\,\xi_{t+1}^{j,k} - \theta_t\right). \quad (5.45)$$
Let us stress again that the number of observations is fixed, and $\theta_t$ is expected to converge to $\hat\theta_N$, rather than to $\theta^*$.
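As a sanity check, the recursion (5.45) can be verified to reproduce the batch average it was derived from. The scalar ($d = 1$) sketch below does this with placeholder chain outputs; taking $\Sigma_N = \sum_{j,k} N_{j,k}\,\bar\varphi_j\bar\varphi^T_j$ (here a scalar) is an assumption about a definition fixed earlier in the chapter.

```python
import random

# Scalar check that recursion (5.45) reproduces the batch M-step (5.39).
random.seed(1)
phibar = {0: 0.5, 1: 1.5}                   # regressor values phibar_j
N_jk = {(0, 0): 3, (0, 1): 2, (1, 0): 4}    # visit counts N_{j,k}
Sigma_N = sum(n * phibar[j] ** 2 for (j, k), n in N_jk.items())

T = 50
# stand-ins for the Markov-chain states xi^{j,k}_m, m = 1..T
xi = {jk: [random.gauss(0.8, 0.1) for _ in range(T)] for jk in N_jk}

theta = 0.0
for t in range(T):                          # recursion (5.45)
    drive = sum(n * phibar[j] * xi[(j, k)][t]
                for (j, k), n in N_jk.items()) / Sigma_N
    theta += (drive - theta) / (t + 1)

# batch form (5.39), rearranged for theta
batch = sum(n * phibar[j] * sum(xi[(j, k)]) / T
            for (j, k), n in N_jk.items()) / Sigma_N
assert abs(theta - batch) < 1e-12           # the two forms agree
```

The recursion is just a running mean of the per-step drives, which is exactly what the telescoping derivation above states.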
5.4 A real-time recursive randomized EM-method
Consider now the situation when $N$ is not fixed; instead we have a flow of data, and for each new measurement the estimator of $\theta^*$ is updated. Since, for increasing $N$, every integer $k$ will eventually occur in the observation sequence, we would need to generate an infinite number of Markov chains. This is not practical. Hence we confine ourselves to the case of quantization with saturation. The price of this is that the state space is non-compact, and the generation of the MCMC method below and above the saturation
102 CHAPTER 5. REGRESSION ANALYSIS BASED ON HIGH-FREQUENCY DATA
level requires extra care. We mitigate this problem by choosing a fairly wide saturation interval, so that the probability of the state falling below or above the saturation level is negligible. The quantization intervals will be denoted by $I_k$ as before, with $k \in K$. If $|K|$ is large, it is unreasonable to update all the Markovian states at all times. Instead, at any time $T$, we update a single Markov chain, say $\bar\xi^{j,k}(\bar\theta)$, where $k = k_T$ is the index of the current observation and $j = j_T$ is the index of the current regressor value.
The first step is to modify the approximation to the M-step (5.37) so as to take into account the real time $T$. Let $N_{j,k,T}$ denote the number of visits to the domain $I_k$ for a given regressor value $\bar\varphi_j$ up to time $T = N$, i.e. set
$$N_{j,k,T} = \#\{\, n : x_n \in I_k,\ \varphi_n = \bar\varphi_j,\ n \le T \,\}. \tag{5.46}$$
A convenient and reasonable approximation of (5.37) is obtained if we set $L = N_{j,k,T}$ for the quantization domain $I_k$ and regressor value $\bar\varphi_j$; then (5.37) reduces to:

The M-step for increasing sample size:
$$\Sigma_T \theta = \sum_{j \in J} \sum_{k \in K} \sum_{t=1}^{N_{j,k,T}} \bar\varphi_j\, \bar\xi^{j,k}_{t}(\bar\theta). \tag{5.47}$$
Synchronization. To synchronize the internal times of the individual Markov chains $\bar\xi^{j,k}_t(\bar\theta)$, let us define, for each $j, k$, a new, piecewise constant extension of $\bar\xi^{j,k}_t(\bar\theta)$ as follows: first let $Z^{j,k}_t$ be the indicator of the event $(x_t \in I_k) \cap (\varphi_t = \bar\varphi_j)$, i.e.
$$Z^{j,k}_t = \chi_{I_k}(x_t) \cdot \chi_{\{\varphi_t = \bar\varphi_j\}}.$$
Define the new Markov chain $\bar\xi^{*,j,k}_t = \bar\xi^{*,j,k}_t(\bar\theta)$ so that it stays constant at any time $t$, unless $x_t \in I_k$ and $\varphi_t = \bar\varphi_j$, in which case it follows the dynamics of $\bar\xi^{j,k}_t(\bar\theta)$. Thus we get:
$$\bar\xi^{*,j,k}_{t+1} = Z^{j,k}_t\, F\bigl(\bar\xi^{*,j,k}_t,\, U_{t+1},\, V_{t+1};\, \bar\theta\bigr) + \bigl(1 - Z^{j,k}_t\bigr)\, \bar\xi^{*,j,k}_t. \tag{5.48}$$
Let the initial condition be $\bar\xi^{*,j,k}_0 = \bar\xi^{j,k}_0$. Then $(\bar\xi^{*,j,k}_t, Z^{j,k}_t)$ is a Markov-process for each $j, k$, and so is
$$(\bar\xi^{*}_t, Z_t) = \bigl(\bar\xi^{*,j,k}_t,\, Z^{j,k}_t\bigr), \qquad k \in K,\ j \in J.$$
Also, the processes $(\bar\xi^{*,j,k}_t, Z^{j,k}_t)$ are independent as $k$ and $j$ vary. Thus we can write the M-step (5.47) as
The M-step for the synchronized Markov-chain:
$$\Sigma_T \theta = \sum_{j \in J} \sum_{k \in K} \sum_{t=1}^{T} Z^{j,k}_t\, \bar\xi^{*,j,k}_{t}(\bar\theta)\, \varphi_t. \tag{5.49}$$
Our goal in this section is to develop an on-line recursive quasi maximum likelihood estimation algorithm, which takes the following general form:
$$\hat\theta_{t+1} = \hat\theta_t - \frac{1}{t}\, \hat H^{-1}_{t+1}\, g_{t+1}, \tag{5.50}$$
$$\hat H_{t+1} = \hat H_t + \frac{1}{t}\bigl( H_{t+1} - \hat H_t \bigr), \tag{5.51}$$
where $g_t$, $H_t$ are on-line approximations of the gradient and the Hessian of the asymptotic negative log-likelihood function, respectively. In our case the EM-method is involved in the maximum likelihood procedure, thus we consider first the derivatives of $\bar Q(\theta, \bar\theta)$; recall that
$$\frac{\partial}{\partial\theta}\, \bar Q(\theta, \theta) = \bar Q_\theta(\theta, \theta) = \frac{\partial}{\partial\theta}\, \bar L(\theta).$$
Differentiating (5.26) with respect to $\theta$ we get
$$-\frac{\partial}{\partial\theta}\, \bar Q(\theta, \bar\theta) = \sum_{j,k} \bar N_{j,k}\, \frac{\bar\varphi_j}{\sigma^2} \int_{I(k)} \bigl(x - \bar\varphi^T_j \theta\bigr)\, \psi\bigl(x \mid kh;\, \bar\varphi_j, \bar\theta, \sigma^2\bigr)\, dx. \tag{5.52}$$
It is easy to see that in the case of a stationary regressor we have $EZ^{j,k}_t = \bar N_{j,k}$. In addition, assuming stationary initialization for $\bar\xi^{*,j,k}_t(\bar\theta)$, the integral in (5.52) equals
$$E\bigl[\bar\xi^{*,j,k}_t(\bar\theta) - \bar\varphi^T_j \theta\bigr].$$
Thus, let us define for each $t$
$$G_t(\theta, \bar\theta) = \sum_{j \in J} \sum_{k \in K} Z^{j,k}_t\, \frac{\bar\varphi_j}{\sigma^2} \bigl(\bar\xi^{*,j,k}_t(\bar\theta) - \bar\varphi^T_j \theta\bigr), \tag{5.53}$$
for which we then have
$$-\frac{\partial}{\partial\theta}\, \bar Q(\theta, \bar\theta) = E\, G_t(\theta, \bar\theta). \tag{5.54}$$
To get a real-time randomized EM-method we proceed in the usual manner: let $\hat\theta_t$ be
the estimate of $\theta^*$ at time $t$. Then generate the next state of a non-homogeneous Markov chain $(\xi^{*,j,k}_{t+1})$ by
$$\xi^{*,j,k}_{t+1} = Z^{j,k}_t\, F\bigl(\xi^{*,j,k}_t,\, U_{t+1},\, V_{t+1};\, \hat\theta_t\bigr) + \bigl(1 - Z^{j,k}_t\bigr)\, \xi^{*,j,k}_t. \tag{5.55}$$
To update $\hat\theta_t$ we use a stochastic Newton method to maximize $\bar L(\theta)$. First we estimate the gradient $\frac{\partial}{\partial\theta} \bar L(\hat\theta_t) = \bar Q_\theta(\hat\theta_t, \hat\theta_t)$ by $G_{t+1}(\hat\theta_t, \hat\theta_t)$, which in turn is estimated on-line, see (5.53), by
$$g_{t+1} = -\sum_{j \in J} \sum_{k \in K} Z^{j,k}_{t+1}\, \frac{\bar\varphi_j}{\sigma^2} \bigl(\xi^{*,j,k}_{t+1} - \bar\varphi^T_j \hat\theta_t\bigr) = -\frac{\varphi_{t+1}}{\sigma^2} \bigl(\xi^{j^*,k^*}_{N_{j^*,k^*,t+1}} - \varphi^T_{t+1} \hat\theta_t\bigr), \tag{5.56}$$
where $k^* = k^*_{t+1}$ and $j^* = j^*_{t+1}$ are the indexes observed at time $t+1$. The definition of the non-homogeneous Markov chain $\xi^{j,k}_t$ is self-explanatory.
Recall that $H_t$ is specified such that its real-time average $\hat H_t$ approximates the asymptotic Fisher information matrix $I^*$, see (5.31). Using similar arguments as for (5.54) we can easily see that, assuming stationary initialization for $\bar\xi^{*,j,k}_t(\theta^*)$, we have
$$I^* = E\bigl[ G_t(\theta^*, \theta^*)\, G^T_t(\theta^*, \theta^*) \bigr].$$
Thus we first estimate $G_t(\theta^*, \theta^*)\, G^T_t(\theta^*, \theta^*)$ by $G_t(\hat\theta_t, \hat\theta_t)\, G^T_t(\hat\theta_t, \hat\theta_t)$, which is in turn estimated on-line by
$$H_t = g_t\, g^T_t. \tag{5.57}$$
In summary, the real-time stochastic Newton method is given by equations (5.50)-(5.51) and (5.56)-(5.57).
The ideas behind this algorithm are similar to those presented in Finesso et al. (2000), but there they were given without a justification of convergence. The above derivation lends itself to a direct application of the BMP theory, see Gerencsér et al. (2008a).
5.5 Estimating the variance
Let us now return to our original model $y_n = q(\varphi^T_n \theta^* + e_n)$ and consider now the case when $\sigma^*$, the standard deviation of the additive noise, is unknown. It is easy to see that the M-step of the
EM-method leads to the following updating formulas:
$$\Sigma_N \theta = \sum_{j \in J} \sum_{k \in K} N_{j,k}\, \bar\varphi_j \int_{I_k} x\, \psi\bigl(x \mid kh;\, \bar\varphi_j, \theta, \sigma^2\bigr)\, dx, \tag{5.58}$$
$$\sigma^2 = \sum_{j \in J} \sum_{k \in K} \frac{N_{j,k}}{N} \int_{I_k} \bigl(x - \bar\varphi^T_j \theta\bigr)^2\, \psi\bigl(x \mid kh;\, \bar\varphi_j, \theta, \sigma^2\bigr)\, dx. \tag{5.59}$$
Notice in (5.58) that $\theta$ does not depend on $\sigma$, thus we can solve the above equations successively. Then, in analogy with the estimation of the regression coefficient, we arrive at a real-time, randomized EM-method, in which (5.50)-(5.51) and (5.56)-(5.57) are extended with the following gradient method estimating $\sigma^*$:
$$\hat\sigma^2_{t+1} = \hat\sigma^2_t + \frac{1}{t+1} \Bigl( \bigl(\xi^{j^*,k^*}_{N_{j^*,k^*,t+1}} - \varphi^T_{t+1}\hat\theta_{t+1}\bigr)^2 - \hat\sigma^2_t \Bigr), \tag{5.60}$$
where the dynamics of the time-varying Markov-chain now depends both on $\hat\theta_t$ and $\hat\sigma^2_t$, and $\hat\sigma^2_{t+1}$ is applied in (5.50)-(5.51), (5.56)-(5.57).
To convert the above procedure into a fully stochastic Newton-method we need to estimate the $(d+1)\times(d+1)$ Fisher information matrix, say $I^\sigma$, which now contains elements related to $\hat\sigma^2$ in addition, in row $d+1$ and column $d+1$, say. In order to examine these additional elements we have carried out numerical experiments to calculate the $2\times 2$ Fisher information matrix $\tilde I(\theta, \sigma)$ in the simplified model
$$y_n = q(\theta + e_n), \qquad e_n \sim N(0, \sigma^2)\ \text{i.i.d.},$$
with scalar location parameter $\theta \in \mathbb{R}$, see Gerencsér et al. (2008a). We found that the
off-diagonal elements of $\tilde I(\theta, \sigma)$ are zero. From this finding and the chain rule it follows that $I^\sigma_{d+1,i} = I^\sigma_{i,d+1} = 0$, $i = 1, \dots, d$; thus we only need to estimate the scalar Fisher information $I^\sigma_{d+1,d+1} = (i^\sigma)^*$ to get a fully stochastic Newton-method. The estimation of $(i^\sigma)^*$ can be carried out along the lines described above for the Fisher information with respect to the parameter $\theta$. We can summarize the algorithm as follows:
Real-time stochastic Newton-method estimating $\theta^*$ and $\sigma^*$:
$$\hat\theta_{t+1} = \hat\theta_t + \frac{1}{t}\, \hat H^{-1}_{t+1}\, g_{t+1}, \qquad \hat\sigma^2_{t+1} = \hat\sigma^2_t + \frac{1}{t}\, g^\sigma_{t+1} \big/ \hat\imath^{\,\sigma}_{t+1},$$
$$\hat H_{t+1} = \hat H_t + \frac{1}{t}\bigl( g_{t+1}\, g^T_{t+1} - \hat H_t \bigr), \qquad \hat\imath^{\,\sigma}_{t+1} = \hat\imath^{\,\sigma}_t + \frac{1}{t}\bigl( (g^\sigma_{t+1})^2 - \hat\imath^{\,\sigma}_t \bigr),$$
$$g_{t+1} = \varphi_{t+1}\bigl(\xi^{j^*,k^*}_{N_{j^*,k^*,t+1}} - \varphi^T_{t+1}\hat\theta_t\bigr)\big/\hat\sigma^2_t, \qquad g^\sigma_{t+1} = \bigl(\xi^{j^*,k^*}_{N_{j^*,k^*,t+1}} - \varphi^T_{t+1}\hat\theta_t\bigr)^2 - \hat\sigma^2_t.$$
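A runnable scalar ($d = 1$) sketch of the boxed recursion may help fix ideas. Everything not stated in the text is an assumption: the midpoint quantizer, the uniform-proposal Metropolis-Hastings realization of $F$, the initial values, and the floors and clamps that stand in for the resetting mechanism handled by the BMP theory.

```python
import math, random

random.seed(2)
theta_star, sigma_star, h = 1.0, 0.3, 0.5    # illustrative true values
phibar = [0.5, 1.5]                          # finite regressor range

def q(x):                                    # quantizer with cells I_k
    return h * math.floor(x / h + 0.5)       # centered at kh (assumed)

def mh_step(xi, mean, sig2, a, b):           # one step of the kernel F
    y = a + random.random() * (b - a)        # uniform proposal on I_k
    r = ((xi - mean) ** 2 - (y - mean) ** 2) / (2.0 * sig2)
    return y if r >= 0 or random.random() < math.exp(r) else xi

state = {}                                   # chain states xi^{j,k}
theta, sig2 = 0.5, 0.25                      # initial estimates
H, i_sig = 1.0, 1.0                          # Hessian / Fisher estimates
for t in range(1, 100001):
    j = random.randrange(2)                  # regressor index j_t
    phi = phibar[j]
    kh = q(phi * theta_star + random.gauss(0.0, sigma_star))
    xi = state.get((j, kh), kh)              # update only the observed chain
    xi = mh_step(xi, phi * theta, sig2, kh - h / 2, kh + h / 2)
    state[(j, kh)] = xi
    g = phi * (xi - phi * theta) / sig2      # gradient estimate g_{t+1}
    g_s = (xi - phi * theta) ** 2 - sig2     # gradient w.r.t. sigma^2
    H = max(H + (g * g - H) / t, 1.0)        # floored, a safeguard
    i_sig = max(i_sig + (g_s * g_s - i_sig) / t, 0.01)
    theta += g / (t * H)                     # Newton step for theta
    sig2 = min(max(sig2 + g_s / (t * i_sig), 0.02), 2.0)
print(round(theta, 2), round(sig2, 2))
```

With these (assumed) settings the estimates drift toward $\theta^*$ and $(\sigma^*)^2$, mirroring the behaviour reported in Section 5.6.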
To save computational time on the inversion of $\hat H$, we can apply the Matrix Inversion Lemma along the lines of the related derivation in the GARCH case in Section 2.5.
5.6 Numerical experiments
To demonstrate the viability of our method we present a simulation experiment. In the test problem we have $d = 2$, and the true parameters are
$$\theta^* = \begin{pmatrix} -0.9 \\ 1.1 \end{pmatrix}, \qquad \sigma^* = 0.3.$$
We generated the regressor process $\varphi_t$ according to the following model: the $\varphi_t$ are independent random variables distributed uniformly over $J = \{\bar\varphi_1, \bar\varphi_2\}$ with
$$\bar\varphi_1 = \begin{pmatrix} 0.4 \\ 0.6 \end{pmatrix}, \qquad \bar\varphi_2 = \begin{pmatrix} 0.7 \\ 0.3 \end{pmatrix}.$$
Figure 5.2 depicts $\hat\theta_t$ and $\hat\sigma^2_t$ estimated recursively from simulated data of length $N = 5\cdot 10^5$. The figure shows that $\hat\theta_t$ and $\hat\sigma^2_t$ converge nicely to $\theta^*$ and $(\sigma^*)^2$, respectively.
For comparison purposes we also calculated the least squares estimator $\hat\theta_{LS}$:
$$\hat\theta_{LS} = \Bigl( \sum_{n=1}^{N} \varphi_n \varphi^T_n \Bigr)^{-1} \sum_{n=1}^{N} \varphi_n\, y_n = \begin{pmatrix} -0.7459 \\ 0.9136 \end{pmatrix}.$$
The off-line least squares estimator shows a significant bias:
$$\theta^* - \hat\theta_{LS} = \begin{pmatrix} -0.2541 \\ 0.1864 \end{pmatrix},$$
because it does not take quantization effects into account (the variance of $\hat\theta_{LS}$ is less than $0.0001$ when $N = 5\cdot 10^5$). Note that this bias does not decrease with increasing sample size. The bias of the off-line least squares estimator induced by quantization depends mainly on the magnitude of $\sigma^*/h$: the bias decreases as $\sigma^*/h$ increases.
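This dependence of the least squares bias on $\sigma^*/h$ can be illustrated with a scalar simulation; the parameter values below are illustrative and differ from those of the experiment above.

```python
import math, random

# Scalar sketch of the least squares bias under quantized observations
# y_n = q(phi_n * theta + e_n); illustrative parameter values.
random.seed(3)
theta_star, h = 1.0, 1.0
phis = [0.7, 1.3]

def ls_estimate(sigma, N):
    sxy = sxx = 0.0
    for _ in range(N):
        phi = random.choice(phis)
        y = h * math.floor((phi * theta_star + random.gauss(0, sigma)) / h + 0.5)
        sxy += phi * y
        sxx += phi * phi
    return sxy / sxx

coarse = ls_estimate(sigma=0.1, N=100000)   # sigma/h small: persistent bias
fine = ls_estimate(sigma=1.0, N=100000)     # sigma/h large: bias shrinks
print(round(coarse, 2), round(fine, 2))
```

With $\sigma/h = 0.1$ nearly every observation is rounded to the same cell, so the bias survives no matter how large $N$ is; with $\sigma/h = 1$ the quantization error averages out and the estimate sits close to $\theta^* = 1$.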
Figure 5.2: The convergence of the estimators $\hat\theta_t$ and $\hat\sigma^2_t$.
5.7 Conclusion
In this chapter we presented a recursive algorithm for estimating the coefficients of a linear regression from quantized observations. Numerical experiments indicate that the bias of the standard linear regression estimator can be significant, while our method yields unbiased estimators.
Another important model in high-frequency finance is the following:
$$x_{t+1} = \mu^* + x_t + e_{t+1}, \qquad e_{t+1} \sim N\bigl(0, (\sigma^*)^2\bigr)\ \text{i.i.d.}$$
This model can be seen as a discrete-time version of the well-known Brownian motion, where $\mu^*$ controls the trend and $\sigma^*$ controls the volatility. An interesting (and more involved) problem for future research is to recursively estimate $(\mu^*, \sigma^*)$ from quantized observations, i.e. when
$$y_{t+1} = q(x_{t+1}).$$
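A minimal simulation of this quantized random-walk observation model (with illustrative values for $\mu^*$, $\sigma^*$ and the quantizer step) shows the kind of data such an estimator would face; naive differencing of $y_t$ recovers the trend only roughly, and a proper recursive estimator is the stated open problem.

```python
import math, random

# Simulating quantized observations of a random walk with drift;
# mu*, sigma* and the quantizer step h are illustrative assumptions.
random.seed(4)
mu_star, sigma_star, h = 0.01, 0.2, 0.5
x, ys = 0.0, []
for _ in range(1000):
    x += mu_star + random.gauss(0.0, sigma_star)
    ys.append(h * math.floor(x / h + 0.5))   # y_{t+1} = q(x_{t+1})

# a crude trend estimate from the quantized path
drift = (ys[-1] - ys[0]) / (len(ys) - 1)
print(drift)
```

The crude estimate is noisy and ignores the quantization structure entirely, which is precisely why a dedicated recursive method is of interest.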
Bibliographical remarks

The algorithm presented in this chapter is an adaptation, to the quantized linear regression problem, of the algorithm developed for estimating the parameters of a Gaussian random variable from quantized observations, published in Communications in Information and Systems, see Gerencsér et al. (2008a).

The original algorithm relies on ideas of László Gerencsér and Lorenzo Finesso; the numerical investigations carried out by Ildikó Kmecs have been reproduced and extended by Balázs Torma. The algorithm has been extended with the estimation of the variance parameter by Balázs Torma. The adaptation of the original algorithm to the linear regression problem presented in this chapter has been the work of Balázs Torma, initiated by László Gerencsér.
Appendix A
Transaction requests
In this appendix we provide some more technical insight into the agents' order placing rules. According to the belief $\hat p_t$ the agent calculates his transaction request as follows. Recall first our trading protocol, in which the agent has to submit orders before he knows at which price $p_t$ he will actually transact. In the following, we derive simple order placing rules that ensure the rational trading goal that, under the assumption of exact prediction ($p_{t+1} = \hat p_t$) and self-financing portfolios, the transaction results in a higher wealth:
$$B_t + \hat p_t S_t \ge B_{t-1} + \hat p_t S_{t-1}, \tag{A.1}$$
where $B_t$, $S_t$ denote cash and stock amounts, respectively. We can rewrite condition (A.1) as
$$B_{t-1} - d_t p_t + (S_{t-1} + d_t)\,\hat p_t - (B_{t-1} + S_{t-1}\hat p_t) = d_t(\hat p_t - p_t) \ge 0, \tag{A.2}$$
where it is worth recalling that the transaction takes place at price $p_t$. Unsurprisingly, inequality (A.2) suggests that the agent should buy if the market price turns out to be less than the belief, and sell if the opposite is true.

It is quite easy to achieve this using limit orders. One only has to place a limit buy order one cent below $\hat p$ and simultaneously place a limit sell order one cent above $\hat p$. Then, if $\hat p > p$, only the buy order executes, and if $\hat p < p$, only the sell order executes, hence (A.2) is satisfied. If $p$ happens to equal $\hat p$, the agent does not transact. The working mechanism of this order placing strategy is similar to the "straddle" strategy widely used by option traders.
The last component of our transaction request model is the concept of trading positions. Opening or entering a long (short) position is the term used in practice for buying (selling) stocks that the trader later wants to sell (buy back). A trader closes or exits a position by transacting in the direction opposite to the opening transaction of that position. By closing a position the trader realizes the book profit or loss he has made on the position so far. The aggregate effect of the opening and closing transactions of a given position on the stock amount the trader holds is zero, because the two transactions cancel each other out. Assume, for example, that the agent has closed all his positions by time $t^+$. Then $S_{t^+} = S_0$, i.e. he has the same amount of stocks on the brokerage account as after the initial endowment.
To complete the description of the order placing rules we have yet to specify the order quantities. In our model, every trading position contains exactly one stock. Thus, at time $t^-$, the agent has
$$\bigl|S_{t^-} - S_0\bigr| \tag{A.3}$$
open positions, the sign of the difference indicating whether the open positions are long ($+$) or short ($-$). Myopic agents would like to close open positions as soon as possible, which they try to achieve by extending the demand in the order of the appropriate direction: an agent increases the quantity in the sell order if he has a surplus in shares, or he increases the quantity in the buy order in case of a shortfall. Thus, the transaction request of agent $i$, $1 \le i \le N$, consists of $N_i \le 2$ orders
$$(d, b)^{i,1}_t = \Bigl( 1 + \bigl(S^i_{t^-} - S^i_0\bigr)^-,\ \hat p^i_t - 0.01 \Bigr),$$
$$(d, b)^{i,2}_t = \Bigl( -1 - \bigl(S^i_{t^-} - S^i_0\bigr)^+,\ \hat p^i_t + 0.01 \Bigr).$$
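The order placing rules above can be collected into a small helper; the function name and the representation of an order as a (quantity, limit price) pair are illustrative choices, not part of the model specification.

```python
# Sketch of an agent's transaction request: a one-cent straddle around
# the belief p_hat plus position-closing quantities, following (A.2)
# and the order formulas above.
def transaction_request(p_hat, stock_now, stock_initial):
    diff = stock_now - stock_initial
    neg = max(-diff, 0)                 # (.)^- : open short positions
    pos = max(diff, 0)                  # (.)^+ : open long positions
    buy = (1 + neg, p_hat - 0.01)       # limit buy just below the belief
    sell = (-1 - pos, p_hat + 0.01)     # limit sell just above the belief
    return buy, sell

buy, sell = transaction_request(p_hat=10.00, stock_now=3, stock_initial=5)
print(buy, sell)   # the agent is short two positions, so the buy order
                   # asks for 1 + 2 = 3 shares
```

If the market price comes out below the belief only the buy order executes, and vice versa, so each executed transaction satisfies (A.2) while surplus quantities work toward closing open positions.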
Appendix B
The arti?cial stock market simulator
In this appendix we briefly describe the simulator program we developed for investigating the agent-based model presented in Section 2.2. It deserves an introductory review because it has been designed as a general, easily customizable object-oriented framework capable of hosting other orderbook-level ACF models.

The program is written in the Java programming language. We next outline the main functionalities along with the main components of the framework; related base classes are indicated in italics.
1. Market exchange: The market Exchange gathers the orders of trading agents in the OrderBook corresponding to the traded stock. It uses ClearingAlgorithm to match the orders of agents and books the transactions on the agents' accounts.

2. Trading agent: Trading strategies are implemented in classes of type Behavior by the method determineDemand, which submits one or several Orders to the market exchange. The class TradingAccount keeps track of the stock and cash balance of the agents and also checks budget constraints before trading.

3. Data acquisition: All market entities (orderbook, behaviors, accounts) use a unified concept to gather data for analysis. In each trading period, classes inherited from SimuComponent record a snapshot of the state of the market and other trading related variables, e.g. account balances, order quantities, limit prices, trading profits, stock price, trading volume, bid-ask spread. Time series of these variables are available for analysis after the simulation.

4. Market setup: Parameter settings, such as the length of the simulation horizon, behavior parameters, the market fractions of different agent types and the initial endowment of agents, can be defined via the class SimuConf. In addition, some basic statistics to be calculated on the generated time series can be specified here.
The typical steps of analysis using the program are as follows. First, the market setup needs to be defined by creating a class of type SimuConf. After compiling and starting the program, a simple user interface appears where the user can start the simulation. After running a simulation, a result chart appears where various time series can be selected and plotted. Time series can be exported into a plain text file for further analysis in a statistical software environment, such as MATLAB or R. In order to save time, the user can start simulations with different parameter settings simultaneously on a multi-processor machine.
Short summary

In this thesis I propose a detailed ACF model and I present statistical algorithms based on technical (econometric) models. The ACF model is analysed using technical models. The fundamental model extends well-established concepts of agent-based computational finance with a novel element for information arrival processes, which is modelled by a discrete-time version of Hawkes processes. I show by numerical experiments that, in contrast to classical ACF models, stylized facts emerge even with constant market fractions of chartists and fundamentalists. I further validate the ACF model using a widely accepted technical model, the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model. In particular, a qualitative relationship between the market structure and the best-fitting GARCH(1,1) model is established, which motivates applying GARCH(1,1) for indirect statistical inference on the market structure. The use of GARCH models for detecting changes in the market structure is justified by a general principle stating that change detection is feasible using misspecified models. A real-time change detection method for GARCH processes is presented, based on the MDL (Minimum Description Length) approach to modelling. I provide an economic interpretation of the GARCH-based change alarms. The performance of the proposed algorithm is evaluated on simulated data. Change-alarms based on real data are reported.

Motivated by the problem of quasi Maximum Likelihood estimation of GARCH parameters, I propose a new efficient nonlinear optimization method which combines the advantages of descent methods and cutting plane approaches. I also present a technique with which the dimension of the GARCH fitting problem can be reduced.

For modelling the dynamics of information arrival, I propose Hawkes processes in which the feedback path is defined by a finite dimensional linear system. I propose algorithms for the simulation and real-time estimation of this type of Hawkes processes. I show that, in the case of one-dimensional feedback, some parts of the diagonal of the asymptotic Fisher information matrix go to infinity, while other parts of the diagonal go to zero, as the parameters approach the boundary of the stability domain. As a first step I calculate the limit distribution of the appropriately rescaled intensity process.

Finally, I present a real-time method for estimating the parameters of a linear regression from quantized observations, when the regressor is finite-valued. The algorithm applies Expectation Maximization and Markov Chain Monte Carlo techniques.
Rövid összefoglaló (in Hungarian)

Az értekezésben egy új multiágens tőzsdemodellt és matematikailag kezelhető technikai (ökonometriai) modellek alapján fejlesztett statisztikai algoritmusokat mutatok be, illetve ezek segítségével elemzem a multiágens modellt. A fundamentális modell az irodalomban elterjedt multiágens modellek koncepcióit egészíti ki egy új hírfolyamatmodellel, amely egy diszkrét idejű verziója a jól ismert Hawkes-folyamatoknak. Numerikus kísérletekben megmutatom, hogy a modell reprodukálja a piaci idősorokban fellelhető tipikus mintákat, az ún. stilizált tényeket a fundamentalisták és technikai kereskedők fix aránya mellett is, szemben klasszikus modellekkel. A tőzsdemodell validálásának egy további lépéseként a generált árfolyamatot egy széles körben elfogadott ökonometriai modell, a Generalized Autoregressive Conditional Heteroskedasticity (GARCH) modell segítségével elemzem. Ezen belül kvantitatív összefüggést létesítek a piac struktúrája és a legjobban illeszkedő GARCH modell között. Ezen összefüggés alapján a GARCH(1,1) modellt használom a piacstruktúrában bekövetkező változás detektálására, kihasználva a változás-detektálás probléma jól ismert tulajdonságát, mely szerint az végrehajtható rosszul specifikált modell felhasználásával is. Bemutatok egy GARCH alapú, valós idejű változás-detektáló algoritmust, amely a Minimális Leíróhossz módszer elveit követi. A GARCH alapú riasztást közgazdaságilag értelmezem. Az algoritmust sikeresen alkalmaztam a fenti tőzsdemodellben létrejövő piaci struktúra megváltozásának detektálására, illetve futtattam valódi árfolyamadatokon is.

A GARCH modell paramétereinek kvázi Maximum Likelihood alapú becslésére egy új, általános nemlineáris optimalizáló algoritmust javasolok, amely vágósík alapú technikák és (kvázi-)Newton módszerek keveréke. Bemutatok egy technikát, amellyel a GARCH illesztési probléma dimenziója csökkenthető.

A pénzügyi piacokon fellépő hírfolyamatokat egy ún. Hawkes folyamattal modellezzük, melyekben a visszacsatolást az eseményfolyamat és az intenzitás között egy véges dimenziós lineáris rendszerrel írjuk le. Algoritmusokat javasolok e Hawkes folyamatok szimulációjára és rekurzív identifikációjára. Egydimenziós visszacsatolású Hawkes folyamatokra megmutatom, hogy a Fisher információs mátrix diagonálisának egy része a végtelenbe tart, másik része nullához, ha a paraméterekkel a stabilitási tartomány széléhez tartunk. Első lépésként kiszámolom a megfelelően átskálázott folyamat intenzitásának határeloszlását.

Végül bemutatok egy algoritmust, amellyel egy lineáris regresszió paramétereit becsülhetjük kvantált megfigyelésekből, ha a regresszor értékkészlete véges. Az algoritmus Expectation Maximization és Markov Chain Monte Carlo módszereken alapul.
Bibliography
Aknouche, A., Guerbyenne, H., 2006. Recursive estimation of GARCH models. Communications in Statistics. Simulation and Computation 35 (4), 925–938.
Alfarano, S., 2006. An agent-based stochastic volatility model. Ph.D. thesis, Christian-Albrechts-Universität zu Kiel.
Alfarano, S., Lux, T., Wagner, F., 2005. Estimation of agent-based models: The case of an asymmetric
herding model. Computational Economics 26, 19–49.
Amilon, H., 2008. Estimation of an adaptive stock market model with heterogeneous agents. Journal of
Empirical Finance 15 (2), 342 – 362.
Armijo, L., 1966. Minimization of functions having Lipschitz continuous first partial derivatives. Pacific J. Math. 16 (1), 1–3.
Auslender, A., Silva, P. J., Teboulle, M., 2007. Nonmonotone projected gradient methods based on barrier and Euclidean distances. Comput. Optim. Appl. 38 (3), 305–327.
Basseville, M., Nikiforov, I., 1993. Detection of Abrupt Changes: Theory and Application. Prentice Hall.
Bauwens, L., Preminger, A., Rombouts, J. V., 2007. Theory and inference for a Markov switching GARCH model. Cahiers de recherche 07-09, HEC Montréal, Institut d'économie appliquée.
Benveniste, A., M´etivier, M., Priouret, P., 1990. Adaptive algorithms and stochastic approximations.
Springer-Verlag, Berlin.
Berkes, I., Gombay, E., Horváth, L., Kokoszka, P., 2004. Sequential change-point detection in GARCH(p,q) models. Econometric Theory 20 (06), 1140–1167.
Berkes, I., Horváth, L., Kokoszka, P., 2003. GARCH processes: structure and estimation. Bernoulli 9, 201–217.
Berndt, E., Hall, B., Hall, R., Hausman, J., 1974. Estimation and inference in nonlinear structural models.
Annals of Economic and Social Measurement 3, 653–665.
Bollerslev, T., 1986. Generalized autoregressive conditional heteroscedasticity. Journal of Econometrics
31 (3), 307–327.
Bollerslev, T., Wooldridge, J. M., 1992. Quasi-maximum likelihood estimation and inference in dynamic
models with time-varying covariances. Econometric Reviews 11, 143–172.
Boswijk, P., Hommes, C., Manzan, S., 2007. Behavioral heterogeneity in stock prices. Journal of Economic
Dynamics and Control 31, 1938–1970.
Boyd, S., 2008. Notes on subgradient methods. Unpublished note for EE364B available at
www.stanford.edu/class/ee364b/.
Boyd, S., Vandenberghe, L., 2004. Convex Optimization. Cambridge University Press.
Brémaud, P., 1981. Point processes and queues. Martingale dynamics. Springer Series in Statistics. Springer, New York.
Brock, W. A., Hommes, C. H., 1998. Heterogeneous beliefs and routes to chaos in a simple asset pricing
model. Journal of Economic Dynamics and Control 22, 1235–1274.
Brockett, R., Liberzon, D., 2000. Quantized feedback stabilization of linear systems. IEEE Transactions
on Automatic Control 75 (7), 1279–1289.
Campillo, F., Kutoyants, Y., Gland, F. L., 2000. Small noise asymptotics of the GLR test for off-line change detection in misspecified diffusion processes. Stochastics and Stochastics Reports 70 (1–2), 109–129.
Cesa-Bianchi, N., Lugosi, G., 2006. Prediction, Learning, and Games. Cambridge University Press, New
York, NY, USA.
Chen, S. H., Chang, C. L., Du, Y. R., 2009. Agent-based economic models and econometrics. Knowledge
Engineering Review, forthcoming.
Chiarella, C., He, X.-Z., Hommes, C., 2006. A dynamic analysis of moving average rules. Journal of
Economic Dynamics and Control 30 (9-10), 1729 – 1753.
Cont, R., 2001. Empirical properties of asset returns: stylized facts and statistical issues. Quantitative
Finance 1, 223–236.
Crummey, T. P., Farshadnia, R., Fleming, P. J., Grace, A. C. W., Hancock, S. D., 1991. An optimization
toolbox for MATLAB. IEE conference publication 2 (332), 744–749.
Daley, D. J., Vere-Jones, D., 2003a. An introduction to the theory of point processes. Vol. I: Elementary
theory and methods. 2nd ed. Probability and Its Applications. Springer, New York.
Daley, D. J., Vere-Jones, D., 2003b. An introduction to the theory of point processes. Vol. I. Springer-
Verlag.
De Grauwe, P., Grimaldi, M., 2004. Exchange rate puzzles: A tale of switching attractors. Working Paper
Series 163, Sveriges Riksbank (Central Bank of Sweden).
Dempster, A. P., Laird, N. M., Rubin, D. B., 1977. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B 39, 1–38.
Diks, C., van der Weide, R., 2005. Herding, a-synchronous updating and heterogeneity in memory in a
cbs. Journal of Economic Dynamics and Control 29, 741–763.
Engle, R. F., 1982. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50 (4), 987–1007.
Finesso, L., Gerencsér, L., Kmecs, I., 1999a. Estimation of parameters from quantized noisy observations. In: Proceedings of the European Control Conference ECC'99, AM-3, F589, Karlsruhe, Germany. 6p.
Finesso, L., Gerencsér, L., Kmecs, I., 1999b. A randomized EM-algorithm for estimating quantized linear Gaussian regression. In: Proceedings of the 38th IEEE Conference on Decision and Control. CDC'99, Phoenix, Arizona, USA. pp. 5100–5101.
Finesso, L., Gerencsér, L., Kmecs, I., 2000. A recursive randomized EM-algorithm for estimation under quantization error. In: Proceedings of the American Control Conference, Chicago, Illinois. IEEE Control System Society, pp. 790–791.
Fletcher, R., 1980. Practical Methods of Optimization. Volume 1: Unconstrained Optimization, 1st Edi-
tion. John Wiley and Sons Ltd.
Gerencsér, L., 1996. On fixed gain recursive estimation processes. J. of Mathematical Systems, Estimation and Control 6, 355–358, retrieval code for full electronic manuscript: 56854.
Gerencsér, L., 2006. A representation theorem for the error of recursive estimators. SIAM J. Control and Optimization 44, 2123–2188.
Gerencsér, L., Baikovicius, J., 1991. Change-point detection using stochastic complexity. Identification and system parameter estimation. Selected papers from the 9th IFAC-IFORS Symposium on Budapest 1, 73–78.
Gerencsér, L., Baikovicius, J., 1992. Change-point detection as model selection. Informatica 3, 3–20.
Gerencsér, L., Kmecs, I., Torma, B., 2008a. Quantization with adaptation, estimation of Gaussian linear models. Communications in Information and Systems 8 (The Brockett Legacy Special Issue. In honor of Roger W. Brockett on his 75th birthday (guest eds.: John Baillieul, John S. Baras, Anthony Bloch, P. S. Krishnaprasad and Jan C. Willems)), 223–244.
Gerencsér, L., Matias, C., Vágó, Z., Torma, B., Weiss, B., 2008b. Self-exciting point processes with applications in finance and medicine. In: 18th International Symposium on Mathematical Theory of Networks and Systems.
Gerencsér, L., Mátyás, Z., 2007a. Almost sure and L^q-convergence of the re-initialized BMP scheme. In: Castanon, D., Spall, J. (Eds.), Proc. 46th IEEE Conference on Decision and Control, Invited session, New Orleans, LA, USA, 13-15 December 2007. Omnipress, pp. 965–974, WEB11.4.
Gerencsér, L., Mátyás, Z., 2007b. A behavioral stock market model. Mathematical Methods of Operations Research 67, 43–63.
Gerencsér, L., Orlovits, Z., Torma, B., 2010. Recursive estimation of GARCH processes. The 19th International Symposium on Mathematical Theory of Networks and Systems, (MTNS 2010), Budapest, Hungary, forthcoming.
Giesecke, K., Goldberg, L., Backshall, T., 2004. Credit Risk Modeling. In: Fabozzi, F. (Ed.), Handbook
of Fixed Income Securities. Wiley.
Hammersley, J., Handscomb, D., 1967. Monte Carlo methods. London: Methuen & Co.
Hawkes, A., 1971a. Spectra of some self-exciting and mutually exciting point processes. Biometrika 58,
83–90.
Hawkes, A. G., 1971b. Point spectra of some mutually exciting point processes. J. Roy. Statist. Soc. Ser.
B 33, 438–443.
Hawkes, A. G., Oakes, D., 1974. A cluster process representation of a self-exciting process. J. appl. Probab.
11, 493–503.
Hinkley, D., 1971. Inference about a change-point from cumulative sum tests. Biometrika 58, 509–523.
Hommes, C. H., 2006. Heterogeneous agent models in economics and finance. In: Tesfatsion, L., Judd, K. (Eds.), Handbook of Computational Economics. Vol. 2. Elsevier, pp. 1109–1186.
Ispány, M., Pap, G., 2007. Weak convergence of step processes and an application for critical multitype branching processes with immigration. arXiv:math/0701803v1.
Ispány, M., Pap, G., van Zuijlen, M. C. A., 2005. Fluctuation limit of branching processes with immigration and estimation of the means. Advances in Applied Probability 37, 523–538.
Kalev, P. S., Liu, W.-M., Pham, P. K., Jarnecic, E., 2004. Public information arrival and volatility of
intraday stock returns. Journal of Banking and Finance 28 (6), 1441–1467.
Kaufman, P. J., 1998. Trading systems and methods. Wiley Trading.
Kelley, C. T., 1999. Iterative methods for optimization. Frontiers in applied mathematics. SIAM, Philadel-
phia, PA, USA.
Kirman, A., 1995. The behaviour of the foreign exchange market. Bank of England Quarterly Bulletin 15,
286–293.
Kozhan, R., Salmon, M., 2009. Uncertainty aversion in a heterogeneous agent model of foreign exchange
rate formation. Journal of Economic Dynamics and Control 33 (5), 1106 – 1122.
LeBaron, B., 2000. Agent-based computational finance: suggested readings and early research. Journal of
Economic Dynamics and Control 24, 679–702.
Ljung, L., Söderström, T., 1983. Theory and practice of recursive identification. The MIT Press.
Luenberger, D. G., 1973. Introduction to linear and nonlinear programming. Addison-Wesley Publishing
Company, Inc.
Lux, T., 1998. The socio-economic dynamics of speculative markets: interacting agents, chaos, and the fat
tails of return distribution. Journal of Economic Behavior and Organization 33, 143–165.
Mangelsdorff, L., Weber, M., 1994. Testing Choquet expected utility. Journal of Economic Behavior &
Organization 25 (3), 437 – 457.
Masry, E., Cambanis, S., 1980. Signal Identification after Noisy Nonlinear Transformation. IEEE Transactions on Information Theory IT-26, 50–58.
Matteo, T. D., Aste, T., Dacorogna, M. M., 2004. Using the scaling analysis to characterize financial
markets. Finance 0402014, EconWPA.
McLachlan, G. J., Krishnan, T., 1996. The EM Algorithm and Extensions. Wiley-Interscience.
Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., 1953. Equations of state calculations
by fast computing machines. J. Chem. Phys. 21, 1087–1091.
Mitchell, M. L., Mulherin, J. H., 1994. The impact of public information on the stock market. The Journal
of Finance 49, 923–950.
Møller, J., Rasmussen, J. G., 2005. Perfect simulation of Hawkes processes. Adv. Appl. Probab. 37 (3),
629–646.
Møller, J., Rasmussen, J. G., 2006. Approximate simulation of Hawkes processes. Methodol. Comput.
Appl. Probab. 8 (1), 53–64.
Moré, J. J., Garbow, B. S., Hillstrom, K. E., 1981. Testing unconstrained optimization software. ACM
Trans. Math. Softw. 7 (1), 17–41.
Murray, W., Overton, M. L., 1978. Steplength algorithms for minimizing a class of nondifferentiable
functions. Tech. rep., Stanford University, Stanford, CA, USA.
Ogata, Y., 1978. The asymptotic behaviour of maximum likelihood estimators for stationary point pro-
cesses. Ann. Inst. Stat. Math. 30, 243–261.
Ogata, Y., Akaike, H., 1982. On linear intensity models for mixed doubly stochastic Poisson and self-
exciting point processes. J. R. Stat. Soc., Ser. B 44, 102–107.
Polyak, R. A., 2007. Regularized Newton method for unconstrained convex optimization. Mathematical Programming, published online.
Prokaj, V., Torma, B., 2010. Identification of almost unstable Hawkes processes. Publicationes Mathematicae Debrecen, forthcoming.
Rissanen, J., 1989. Stochastic complexity in statistical inquiry. World Scientific Publisher.
Runggaldier, W., 2003. Jump Diffusion Models. In: Rachev, S. (Ed.), Handbook of Heavy Tailed Distributions in Finance. Elsevier/North-Holland, pp. 169–209, Handbooks in Finance, Book 1 (W. Ziemba, Series Ed.).
Shi, Z.-J., 2004. Convergence of line search methods for unconstrained optimization. Appl. Math. Comput.
157 (2), 393–405.
Söderström, T., Stoica, P., 1989. System Identification. Prentice Hall International.
Solo, V., 1999. Adaptive algorithms and Markov chain Monte Carlo methods. In: Proceedings of the
38th IEEE Conference on Decision and Control, Phoenix, Arizona. IEEE Control System Society, pp.
1775–1778.
Sonnevend, G., 1988. New algorithms in convex programming based on a notion of "centre" (for systems of analytic inequalities) and on rational extrapolation. In: Hoffman, K., Hiriart-Urruty, J., Lemaréchal, C., Zowe, J. (Eds.), Trends in Mathematical Optimization: Proceedings of the 4th French-German Conference in Optimization in Irsee. Vol. 84. Birkhäuser Verlag, pp. 311–327.
Spall, J. C., 2003. Introduction to Stochastic Search and Optimization. John Wiley & Sons, Inc., New
York, NY, USA.
Taylor, S. J., 2007. Asset Price Dynamics, Volatility, and Prediction. Princeton University Press.
Torma, B., G.-Tóth, B., 2010. An efficient descent direction method with cutting planes. Central European
Journal of Operations Research, forthcoming.
Wakker, P. P., 2001. Testing and characterizing properties of nonadditive measures through violations of
the sure-thing principle. Econometrica 69 (4), 1039–1059.
Welch, I., 2000. Herding among security analysts. Journal of Financial Economics 58, 369–396.
Widrow, B., Kollár, I., 2008. Quantization Noise in Digital Computation, Signal Processing, and Control.
Cambridge University Press.
Winker, P., Gilli, M., Jeleskovic, V., 2007. An objective function for simulation based inference on exchange
rate data. Journal of Economic Interaction and Coordination 2, 125–145.
Ye, Y., 1997. Complexity analysis of the analytic center cutting plane method that uses multiple cuts.
Math. Program. 78 (1), 85–104.
Zakoian, J.-M., 1994. Threshold heteroskedastic models. Journal of Economic Dynamics and Control 18 (5),
931–955.
Zhang, H., Hager, W. W., 2004. A nonmonotone line search technique and its application to unconstrained
optimization. SIAM J. on Optimization 14 (4), 1043–1056.