Dissertation on Empirical Industrial Organization

nityaaroma · Jul 11, 2013

Description
Industrial organization is the field of economics that builds on the theory of the firm in examining the structure of, and boundaries between, firms and markets.

ABSTRACT

Title of dissertation: ESSAYS IN EMPIRICAL INDUSTRIAL ORGANIZATION

Matthew Chesnes, Doctor of Philosophy, 2009

Dissertation directed by: Professor John Rust and Professor Ginger Jin, Department of Economics

Chapter 1: Capacity and Utilization Choice in the US Oil Refining Industry

This paper presents a new dynamic model of the operating and investment decisions of US oil refiners. The model enables me to predict how shocks to crude oil prices and refinery shutdowns (e.g., in response to hurricanes) affect the price of gasoline, refinery profits, and overall welfare. There have been no new refineries built in the last 32 years, and although existing refineries have expanded their capacity by almost 13% since 1995, the demand for refinery products has grown even faster. As a result, capacity utilization rates are now near their maximum sustainable levels, and when combined with record high crude oil prices, this creates a volatile environment for energy markets. Shocks to the price of crude oil and even minor disruptions to refining capacity can have a large effect on the downstream prices of refined products. Due to the extraordinary dependence by other industries on petroleum products, this can have a large effect on the US economy as a whole. I use the generalized method of moments to estimate a dynamic model of capacity and utilization choice by oil refiners. Plants make short-run utilization rate choices to maximize their expected discounted profits and may make costly long-term investments in capacity to meet the growing demand and reduce the potential for breaking down. I show that the model fits the data well, in both in-sample and out-of-sample predictive tests, and I use the model to conduct a number of counterfactual experiments. My model predicts that a 20% increase in the price of crude oil is only partially passed on to consumers, resulting in higher gasoline prices, lower profits for the refinery, and a 45% decrease in total welfare.
A disruption to refining capacity, such as the one caused by Hurricane Katrina in 2005, raises gasoline prices by almost 16% and has a small negative effect on overall welfare: the higher profits of refineries partially offset the large reduction in consumer surplus. As the theory predicts, these shocks have a smaller effect on downstream prices when consumer demand is more elastic, resulting in a larger share of total welfare going to the consumer.

Chapter 2: Consumer Search for Online Drug Information

Consumers are increasingly turning to the internet and using search engines to find information on medicinal drugs. Between 2001 and 2007, the number of adults using the internet as an alternative source of health information doubled. At the same time, online and offline advertising spending by drug companies is growing rapidly. I seek to understand how consumers use search engines to find drug information and how this activity is influenced by direct-to-consumer advertising. I utilize a database of user click-through data from America Online to analyze the search behavior of consumers seeking drug information online. Compared with other searches, users submitting drug-related queries are more likely to click on more than one result in a search session, and when they do, they click more rapidly through the results and tend to migrate away from dot-com sites and toward those ending in dot-org and dot-net. Offline advertising on a drug serves to increase the frequency and intensity of these searches.

Chapter 3: Drug Information via Online Search Engines

This paper utilizes a database of organic and sponsored search results from four large search engines to analyze the supply of drug-related information available on the internet. I show that the information varies significantly across search engines, domain extensions, and between organic and sponsored results. Regression results reveal that websites with relatively more promotional content are pushed down in the search results while informational sites (including those ending in dot-gov and dot-org) are more likely to appear on page one of the results.

ESSAYS IN EMPIRICAL INDUSTRIAL ORGANIZATION

by Matthew William Chesnes

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2009

Advisory Committee:
Professor John Rust, Co-Chair
Professor Ginger Jin, Co-Chair
Professor Peter Cramton
Professor Pablo D’Erasmo
Professor Erik Lichtenberg

© Copyright by Matthew William Chesnes 2009

Acknowledgments

I thank my advisors, John Rust, Ginger Jin and Peter Cramton, for their invaluable guidance, as well as Pablo D’Erasmo and Erik Lichtenberg for participating in my defense. I am grateful to the Department of Economics at the University of Maryland and the Economic Club of Washington for their financial support. I also want to thank John Shea, Mark Duggan, Adam Copeland, David Givens, Ariel BenYishay, and seminar participants at the University of Maryland, the Federal Reserve Board of Governors, the International Industrial Organization Conference, and La Pietra-Mondragone Workshop in Economics for their suggestions and comments. All remaining errors are my own.


Table of Contents
List of Tables
List of Figures
List of Abbreviations
1 Capacity and Utilization Choice in the US Oil Refining Industry
  1.1 Introduction
  1.2 The US Oil Refining Industry
    1.2.1 Competition
    1.2.2 Capacity and Utilization
    1.2.3 Refinery Maintenance and Outages
  1.3 Data
  1.4 Model
    1.4.1 A Firm’s Problem
    1.4.2 Per-Period Profit
    1.4.3 Demand
    1.4.4 Probability of Breakdown
    1.4.5 Production and Investment Costs
  1.5 Empirical Estimation Strategy
    1.5.1 Demand
    1.5.2 Breakdown Probability
    1.5.3 Production Cost Parameters
  1.6 Results
    1.6.1 Model Fit
    1.6.2 First Stage Estimates: Demand and Breakdown
    1.6.3 Second Stage Estimates: Costs
    1.6.4 Policy Function
  1.7 Counterfactuals
    1.7.1 Methodology
    1.7.2 Results of Experiments
  1.8 Conclusion
2 Consumer Search for Online Drug Information
  2.1 Introduction
  2.2 Data
  2.3 Descriptive Analysis
  2.4 Regression Analysis
    2.4.1 Frequency Regressions
    2.4.2 Depth Regressions
  2.5 Conclusion
3 Drug Information via Online Search Engines
  3.1 Introduction
  3.2 Data
  3.3 Descriptive Analysis
    3.3.1 Supply
    3.3.2 Content
    3.3.3 Rank and Content Comparisons
    3.3.4 Kernel Density Plots of Content
    3.3.5 Probit Analysis
  3.4 Conclusion
A Chapter 1 Supplement
  A.1 The Distillation Process
  A.2 Crude Oil Quality
  A.3 Estimation Algorithm
  A.4 Additional Tables
B Chapter 2 Supplement
C Chapter 3 Supplement
Bibliography

List of Tables

1.1 Refinery Downtime
1.2 Industry Summary
1.3 Demand Estimates
1.4 Breakdown Probability Estimates
1.5 The Effect of a 20% Increase in the Crude Oil Price
1.6 The Effect of a 25% Loss in Capacity
2.1 Basic Statistics
2.2 Search Activity by Drug Class
2.3 Search Activity by Drug Age
2.4 Search Activity by Drug Type
2.5 Transitions between extensions
2.6 Transitions between ranks
2.7 Regression Results - Frequency of Search
2.8 Regression Results - Depreciation Analysis
2.9 Regression Results - Depth of Search
3.1 Basic Statistics
3.2 Regression Results: Probit of Pr(Page 1)
A.1 Crude Qualities
A.2 Industry Concentration
A.3 Cost Estimates
B.1 20 Most Actively Searched Drugs
B.2 20 Most Advertised Drugs
B.3 Variables Used in Regressions
C.1 List of Search Queries
C.2 Keywords Used in Classification Algorithm
C.3 Variable Definitions

List of Figures

1.1 Production Process
1.2 Average Yields
1.3 Refinery Locations (Scaled by Capacity)
1.4 Major Refined Product Pipelines
1.5 Capacity and Number of Refineries
1.6 Non-Zero Changes in Capacity, All Plants, 1986-2007
1.7 Capacity Utilization Rate and Crack Spread
1.8 District Breakdowns
1.9 Model Fit (In Sample)
1.10 Model Fit (Out of Sample)
1.11 Estimated Production and Investment Cost Functions
1.12 Optimal Utilization Rate Versus Month
1.13 Optimal Utilization Rate Versus Month and Crude Price
1.14 Price Elasticity of Demand
1.15 Crude Oil Price
1.16 Loss in Capacity: Hurricane Katrina
1.17 Crude Oil Counterfactual: Simulation
1.18 Capacity Counterfactual: Simulation
2.1 Total DTCA Spending on all Prescription Drugs
2.2 DTCA Breakdown by Media Type
2.3 Extension Popularity in the First 10 Clicks
2.4 Rank Popularity in the First 10 Clicks
2.5 Drill Down Behavior
3.1 Distribution of Organic and Sponsored Results
3.2 Extension Popularity - Organic Results
3.3 Extension Popularity - Sponsored Results
3.4 Average Rank by Extension - Organic Results
3.5 Content of Summary Field - Organic Results
3.6 Content of Summary Field - Sponsored Results
3.7 Content of Summary Field - Organic Results - By Extension
3.8 Content of Summary Field - Sponsored Results - By Extension
3.9 Rank Comparison - Organic Results - Google vs Yahoo
3.10 Summary Content Comparison - Organic Results - Google vs Yahoo
3.11 Organic Rank Dynamics
3.12 Kernel Density of Summary Content
A.1 Refinery Operations
A.2 Average Crude Oil Quality: Heavier and More Sour
C.1 Kernel Density of Summary Content, Organic Results, By Extension
C.2 Kernel Density of Summary Content, Sponsored Results, By Extension

List of Abbreviations
API     American Petroleum Institute
DTCA    Direct-To-Consumer Advertising
EIA     Energy Information Administration
FCC     Fluid Catalytic Cracking
FDA     Food and Drug Administration
GMM     Generalized Method of Moments
HHI     Herfindahl-Hirschman Index
HTML    HyperText Markup Language
IANA    Internet Assigned Numbers Authority
MTBE    Methyl Tertiary Butyl Ether
NAMCS   National Ambulatory Medical Care Survey
NDC     National Drug Code
OPEC    Organization of the Petroleum Exporting Countries
OTC     Over-The-Counter
PADD    Petroleum Administration for Defense Districts
RLD     Reference Listed Drug
URL     Uniform Resource Locator

Chapter 1
Capacity and Utilization Choice in the US Oil Refining Industry

1.1 Introduction
The United States is the largest consumer of crude oil in the world and this resource accounts for 40% of the country’s total energy needs.1 Although a majority of this oil comes from foreign sources, almost all is refined domestically. Refineries distill crude oil into a large number of products such as gasoline, distillate (heating oil), and jet fuel. While much attention has been paid to the upstream crude oil production industry (see Hamilton (1983) and Hubbard (1986)), and the downstream retail sector (see Borenstein (1991 & 1997)), very little research has focused on the role of the refining industry.

Two important dynamic decisions faced by refiners are their investment in capacity and the utilization rate at which they run their plant. These choices are defined over different time horizons.2 The optimal choice of capacity accumulation, i.e., the increased ability to distill crude oil into higher valued products, is a long-term decision. Capacity is expensive to build and may take time to come online so forecasts of future market conditions are crucial. A shorter-term problem involves a refiner’s choice of capacity utilization. This rate measures the intensity with which a firm uses its capital, which for a refinery may include the use
1 Source: 2007 Annual Energy Review, Energy Information Administration (EIA).
2 In addition, they must solve a complicated linear programming problem because their relative output prices are constantly changing and they have the choice of utilizing different types of crude oil, some of which are better adapted to producing certain products.

of boilers, distillation columns, and downstream cracking units.3 The refiner’s problem is further complicated by changing market conditions, geopolitical tensions, and unexpected events, such as hurricanes. The largest component of refiners’ output is gasoline. New alternative technologies, such as hybrid cars, and changing perceptions of the environmental impact of gas-powered vehicles have affected the sensitivity of consumer demand to the price of gasoline.4 This affects the ability of refiners to pass through shocks to the price of crude oil resulting from, for example, reduced production from OPEC countries or a war in the Middle East. With about one-half of US refining capacity located along the Gulf of Mexico, the potential for hurricanes can also dramatically affect the ability of the industry to supply a consistent flow of gasoline and other products to the rest of the country.

This paper develops and estimates a new dynamic model of the operating and investment decisions of US oil refiners. These refiners face the possibility of breaking down if they run their plant too intensively, so they make costly investments in capacity to reduce this potential and to meet the growing demand for their products. My model assumes that firms are Cournot competitors in the refined product market. With many small firms, each is approximately a price-taker in the market, so the model of Kreps and Scheinkman (1983), with quantity pre-commitment (capacity choice) and Bertrand price competition, is similar to my approach. The model enables me to predict how shocks to crude oil prices and refinery shutdowns (e.g., in response to hurricanes) affect the price of gasoline, refinery profits, and overall
3 More details on the refining process can be found in section 2 and in appendix A.
4 Knittle et al. (2008) and Espey (1996) both study the recent changes in consumers’ price elasticity of demand for gasoline.

welfare.5 I also estimate how a change in the price sensitivity of consumers may affect the results of these shocks, particularly with regard to the division of welfare between the refiner and the consumer.

I estimate a fully dynamic model of the oil refining industry incorporating key decisions made by plants that affect both contemporaneous and future profitability. The refining industry is inherently forward-looking and decisions made today rely heavily on forecasts of future market conditions. A static model would not, for example, account for the increased breakdown potential of a plant from high utilization rates or the appropriate long-term investments of a refiner facing rising crude oil costs and uncertain demand. My estimation algorithm involves classic policy function iteration nested inside a GMM optimization, which allows me to compute the equilibrium value and policy functions.6 This approach allows me to run various counterfactual experiments and determine the optimal policy and future discounted profits of each firm. Several recent papers, including Bajari et al. (2007) and Ryan (forthcoming), estimate dynamic models of firm behavior using a 2-step method that reduces the computational complexity of finding the structural parameters, but does not allow one to compute the equilibrium under counterfactual environments.

My model predicts that a 20% increase in the price of crude oil is only partially passed on to consumers, resulting in a 13% increase in gasoline prices, lower profits for the refinery, and a 45% decrease in total welfare. The pass-through result is fairly close to the historic rate of about 50%.7 Consumer surplus falls following the
5 I define total welfare to be the sum of consumer surplus and refiner profit.
6 See Rust (2008).
7 See Borenstein and Shepard (1996) and Goldberg and Hellerstein (2008) for related literature on price pass-through.

shock, but the change in the overall distribution of welfare depends on the sensitivity of consumer demand to the prices of refined products. More sensitive consumers sacrifice less and receive a larger share of the (smaller) surplus. I also show that a disruption to refining capacity, such as the one caused by Hurricane Katrina in 2005, raises gasoline prices by almost 16% and has a small negative effect on overall welfare: the higher profits of operating refineries partially offset the large reduction in consumer surplus. When Hurricane Katrina hit the Gulf Coast in August 2005, the actual wholesale gasoline price rose by 14% the following month.

Much of the literature on retail gasoline markets has focused on the asymmetric response of gasoline prices to crude oil shocks, the so-called rockets and feathers phenomenon (for example, see Borenstein (1997), Bacon (1991), and Noel (2007)).8 Recent research on the wholesale gasoline market includes Hastings et al. (2008), which analyzes wholesale prices and the effects of new environmental regulations, and studies by The Government Accountability Office (2006), the Federal Trade Commission (2006), and the Energy Information Administration (2007).

To my knowledge, this is the first dynamic model of the US oil refining industry. Refiners play an important role as an intermediary between upstream crude suppliers and downstream retail markets. A complete analysis of the oil industry must account for the important effects of the refiners’ dynamic decisions. I show that the model fits the data well and can be used to generate insights into the pass-through of crude oil shocks and the impacts of refinery shutdowns on consumers. The model’s
8 The market power gained by the refining industry due to a tight capacity environment is one potential explanation. Others include search costs in the retail market, inventory management by consumers who may fill their tank more frequently as prices rise, but are less eager to “top-off” when prices are falling, and adjustment costs at the refinery.

main features include a dynamic decision process, long-term investment choices, and the possibility of plant breakdown. The framework could be applied to other energy markets as well as industries, such as shipping, that make large investments in capacity based on expectations of future market conditions.

The remainder of this paper is organized as follows. In section 2, I provide an overview of the oil refining industry to better understand the complicated problem facing the refiner. I describe my data in section 3 and lay out a dynamic model of the industry in section 4. Section 5 provides the details of my empirical strategy and I summarize the fit and results of the model in section 6. Finally, in section 7, I use my estimated parameters to run several counterfactual experiments involving shocks to the price of crude oil, refining capacity, and consumers’ price elasticity of demand. Section 8 concludes and provides a discussion of potential extensions.
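The estimation approach described in this introduction, policy function iteration nested inside a method-of-moments search, can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and not the dissertation's model: a plant is either operating or broken, the profit function is `MARGIN * u - theta * u**2`, the breakdown probability is linear in utilization, the constants and function names are hypothetical, a single moment (mean utilization) is matched with an identity weight, and a grid search stands in for a numerical optimizer.

```python
BETA = 0.95         # discount factor (assumed)
MARGIN = 10.0       # profit per unit of utilization before costs (assumed)
BREAK_SLOPE = 0.2   # breakdown probability is BREAK_SLOPE * u (assumed)
REPAIR_PROB = 0.5   # chance a broken plant restarts next period (assumed)
U_GRID = [0.5 + 0.01 * i for i in range(51)]  # utilization rates 0.50 to 1.00


def evaluate(u, theta):
    """Policy evaluation: solve V = r + BETA * P V for the two-state chain.

    State 0 is an operating plant run at utilization u; state 1 is a broken
    plant that earns nothing. The 2x2 linear system is solved by Cramer's rule.
    """
    b = BREAK_SLOPE * u
    r0 = MARGIN * u - theta * u ** 2
    a11, a12 = 1 - BETA * (1 - b), -BETA * b
    a21, a22 = -BETA * REPAIR_PROB, 1 - BETA * (1 - REPAIR_PROB)
    det = a11 * a22 - a12 * a21
    return r0 * a22 / det, -a21 * r0 / det


def solve_policy(theta, max_iter=100):
    """Inner loop: policy iteration; returns the optimal utilization rate."""
    idx = 0
    for _ in range(max_iter):
        v0, v1 = evaluate(U_GRID[idx], theta)

        def q(i):
            # Value of running at U_GRID[i] today, then following the current policy.
            u = U_GRID[i]
            b = BREAK_SLOPE * u
            return MARGIN * u - theta * u ** 2 + BETA * ((1 - b) * v0 + b * v1)

        new_idx = max(range(len(U_GRID)), key=q)
        if new_idx == idx:
            break  # policy is stable, so it is optimal on this grid
        idx = new_idx
    return U_GRID[idx]


def gmm_objective(theta, observed_mean_u):
    """One moment, identity weight: squared gap between model and data."""
    return (solve_policy(theta) - observed_mean_u) ** 2


def estimate_theta(observed_mean_u, theta_grid):
    """Outer loop: re-solve the model at each candidate theta and keep the best."""
    return min(theta_grid, key=lambda t: gmm_objective(t, observed_mean_u))
```

Because the model is re-solved at every candidate parameter, the same `solve_policy` routine can afterwards be rerun under a changed environment (for example a different `MARGIN`), which is the property the text contrasts with 2-step methods.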

1.2 The US Oil Refining Industry
The oil industry is broadly comprised of several vertically oriented segments. They include crude oil exploration and extraction, refineries which distill crude oil into other products, pipeline distribution networks, terminals which store the finished product near major cities, and tanker trucks which transport products to retail outlets.9 The largest refined product, gasoline, accounts for about 50% of total production, while distillate makes up another quarter. A full 68% of output from the oil refining industry is used in the transportation industry. Figures 1.1
9 75% of terminals in the US are owned by companies not involved in the upstream exploration and refining.

and 1.2 provide a description of the production process and average product yields. The main distillation process produces some final products like gasoline, but it is complemented by other units that extract more of the highest valued products. Technical details of the refining process and background on the types of crude oil available can be found in the appendix.

Figure 1.1: Production Process

The market for refined oil products is large and growing, with the US consuming 388 million gallons of gasoline each day and one quarter of the world’s crude oil.10 Aside from refining crude oil into gasoline, refineries produce many products that are important inputs into other industries. Retail gasoline prices have recently experienced increased variability in the US and in summer 2008 hit an all-time high of $4.11 per gallon. Wholesale prices peaked around $3.40 a gallon in the same period.11 Many justify the high prices as a result of the growing demand for gasoline
10 Annual world consumption of crude oil totals 30 billion barrels, of which the US accounts for 7.5 billion barrels. About 60% of crude oil used by refineries is imported and US consumption of refined gasoline represents 40% of world consumption.
11 US regular gasoline, source: EIA.

Figure 1.2: Average Yields (October 2007): Gasoline 46%, Distillate 27%, Kerosene Jet Fuel 9%, Other 9%, Petroleum Coke 5%, Residual Fuel Oil 4%

and supply limitations, including the scarcity of crude oil, Middle East uncertainty, hurricanes, and the OPEC cartel. Others claim the high prices result from coordinated anticompetitive behavior by big oil companies. It may be that the strategic capacity investment and utilization choices by oil refineries play a significant role in affecting downstream prices, profits, and consumer welfare.

1.2.1 Competition
Concentration

The refining industry is fairly competitive, with 144 refineries owned by 54 refining companies in January 2006. About one-half of US production occurs near the Gulf of Mexico in Texas and Louisiana, though there are significant operations in the Northeast, the Midwest, and California. During World War II, the country was


divided into Petroleum Administration for Defense Districts (PADDs) to aid in the allocation of petroleum products. Figure 1.3 displays a map of refinery locations along with delineations of PADDs and PADD districts. PADDs are often used by regulators such as antitrust authorities when assessing market concentration. See table A.2 in appendix A for concentration ratios and Herfindahl-Hirschman Indices (HHIs) for various PADDs and regions at the refiner level. The degree of market concentration is clearly dependent upon how one defines the relevant geographic market.12
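As a rough illustration of the two concentration measures mentioned above, the sketch below computes a four-firm concentration ratio and an HHI from a list of market shares. The shares and the function names are hypothetical, chosen only to show the arithmetic; they are not the figures reported in table A.2.

```python
def concentration_ratio(shares_pct, n=4):
    """Sum of the n largest market shares, in percent."""
    return sum(sorted(shares_pct, reverse=True)[:n])


def hhi(shares_pct):
    """Herfindahl-Hirschman Index: sum of squared percentage shares (0 to 10,000)."""
    return sum(s ** 2 for s in shares_pct)


# Hypothetical shares (percent) for ten refiners in one market; they sum to 100.
example_shares = [20, 15, 12, 10, 10, 9, 8, 7, 5, 4]

cr4 = concentration_ratio(example_shares)
index = hhi(example_shares)
print(f"CR4 = {cr4}%, HHI = {index}")
```

Redrawing the market boundary (for example, pooling several PADDs) changes every firm's share, so both measures move with the market definition.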

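Figure 1.3: Refinery Locations (Scaled by Capacity)
Map legend: PADD I (districts 1-2), PADD II (districts 3-5), PADD III (districts 6-7), PADD IV (district 8), PADD V (district 9). Areas labeled: East Coast, Midwest, Upper Midwest, Central Plains, Louisiana, Texas, New Mexico, Rockies, West Coast.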

Market Definition

While retail markets for gasoline tend to be very small, markets for wholesale gasoline are relatively large due to the extensive pipeline network used to transport most refined products. While a PADD may have roughly approximated a market in 1945, these delineations were made before the pipeline network had been fully developed,
12 At the national level, the top four refiners (who each own multiple refineries) controlled 44.1% of the market in 2007. The HHI for refiners on the Gulf Coast was about 1,100, which would be classified as moderately concentrated according to the Horizontal Merger Guidelines.

so they are now just a convenient way to report statistics on the industry.13 A map of major crude oil and production pipelines is shown in figure 1.4. With important pipelines connecting the Gulf Coast production center to the population centers in the Northeast and the Midwest, I combine PADDs 1, 2, and 3 into one large market for wholesale gasoline. I denote the Rocky Mountain region, PADD 4, as another market, because it is isolated from the rest of the country and imports only limited refined product from other regions. Finally, my third market is the West Coast, PADD 5, which includes California, a state that, due to strict environmental regulations, is limited in its ability to use products that are refined in other states.

Figure 1.4: Major Refined Product Pipelines

Aside from the domestic refining industry, US refiners face limited competition from abroad. While the US is very dependent on foreign oil, domestic production accounts for about 90% of US gasoline consumption, though the import share has
13 For instance, the Colonial pipeline, which runs from the Gulf Coast up to the Northeast, was built in 1968. Pipelines now carry 70% of all refined products shipped between PADDs.

grown since the mid-1990s. These imports come primarily into the Northeast, which receives 45% of its supply from sources such as the US Virgin Islands, the United Kingdom, the Netherlands, and Canada. Recent US regulations limiting certain types of fuel additives, combined with increased European dependence on diesel fuel, have limited the ability of US markets to rely on foreign imports.

1.2.2 Capacity and Utilization
Capacity utilization rates at US refineries have been steadily rising and are now at their maximum sustainable levels. From 2000 to 2008, the average utilization rate in US manufacturing industries was 77%, while in the refining industry it was 91%.14 At the same time, no new refineries have been built in the US since 1976. In fact, many plants have closed and the number of refineries has fallen from 223 in 1985 to just 144 today. However, most of the plants that closed were small and inefficient, and those that remain have expanded, so total operable capacity has grown from 15.6 million barrels per day (bbl/day) in 1985 to almost 17 million bbl/day today. This figure is still lower than in 1981, when capacity was 18.6 million bbl/day. The overall number of refineries, along with their production capacity, is displayed in figure 1.5. The average plant size has increased from 74,000 bbl/day in 1985 to almost 124,000 bbl/day in 2007.

Building a new refinery is very expensive, and environmental requirements and permits create significant hurdles.15 Evidence from a 2002 US Senate hearing
14 See http://www.federalreserve.gov/releases/G17/caputl.htm.
15 One of the few new plants in development is in Yuma, Arizona. The builder of the 150,000 bbl/day refinery has spent 30 million dollars over 6 years to acquire all the permits. If not blocked,

Figure 1.5: Capacity and Number of Refineries (left axis: total refining capacity in million bbl/day; right axis: number of refineries; horizontal axis: year, 1982-2007)

estimated the cost of building a 250,000 bbl/day refinery at around 2.5 billion dollars, with a completion time of 5-7 years (Senate (2002)). This assumes the various environmental hurdles and community objections are satisfied. No one wants a dirty refinery operating near them.16 In May 2007, the chief economist at Tesoro, Bruce Smith, was quoted as saying that the investment costs in building a new refinery are so high that “you’d need 10 to 15 years of today’s margins [at the time, around 20%] to pay it back.”17 Even without new refineries, existing refineries have invested to expand capacity. The distribution of historical investment rates is shown in figure 1.6. While the mean investment has been 1.3% per year, the median is zero
construction on the new re?nery will begin in 2009. 16 Commonly referred to as “NIMBY,” an acronym for Not In My Back Yard. 17 The National Petrochemical & Re?ners Association estimates that the average return on investment in the re?ning industy between 1993-2002 was 5.5%. The S&P 500 averaged over 12% for the same period. See “Lack of Capacity Fuels Oil Re?ning Pro?ts” available online at http: //www.npr.org/templates/story/story.php?storyId=10554471 (downloaded: 09/13/2008).

11

No. of Refineries

17.5

as plants tend to make very infrequent investments. Even restricting the sample to non-zero changes as shown in the graph, investments tend to be small, with almost 85% of the non-zero changes less than 10%.
Figure 1.6: Non-Zero Changes in Capacity, All Plants, 1986-2007 [histogram; x-axis: Investment (Percent), from -100 to 100; y-axes: Frequency and Density]

Although oil refining has historically been an industry plagued by thin profit margins, oil producers are now starting to make higher profits from their refining business. One simple measure of the profit margin at a refinery is the “crack spread.” For every barrel of crude oil the refinery uses, technological constraints require that about half of it goes into gasoline production and about a quarter into distillate. So the crack spread, expressed in dollars per barrel, is calculated as:

\[ \text{Crack} = \frac{1 \cdot \text{Price(distillate)} + 2 \cdot \text{Price(gasoline)} - 3 \cdot \text{Price(crude oil)}}{3}. \tag{1.1} \]
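As a quick illustration of equation 1.1, the following minimal Python sketch computes the spread from three prices. The function name and the example prices are hypothetical and are not taken from the data used in this chapter.

```python
def crack_spread(price_gasoline, price_distillate, price_crude):
    """Crack spread in dollars per barrel, following equation 1.1."""
    return (1 * price_distillate + 2 * price_gasoline - 3 * price_crude) / 3


# Hypothetical prices in $/bbl, chosen only for illustration.
example = crack_spread(price_gasoline=90.0, price_distillate=84.0, price_crude=70.0)
print(example)
```

When all three prices are equal the spread is zero, which is a convenient sanity check on the weights.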

The crack spread along with the utilization rates of refineries are shown in figure 1.7. The crack spread hit a record high of nearly $30 per barrel in July 2006. Some argue that based on this measure of profitability, it is surprising that more refiners have not overcome the setup costs and entered this industry. The increase in the crack spread after 2000 occurred after the utilization rate had already been at a very high level. This may imply that a refiner’s ability to pass through their crude oil cost has changed since 2000, perhaps due to the scarcity of crude oil, an increase in industry concentration, or an increase in the demand for gasoline.
Figure 1.7: Capacity Utilization Rate and Crack Spread [time series by year, 1983-2007; axes: Utilization Rate (percent) and Crack Spread ($/Bbl); plot note: Crack Spread = 1*Heating Oil + 2*Gas - 3*Crude, July of each year, 2006 dollars]

While total refining capacity has risen in the past 10 years, it has not kept up with demand growth. Capacity of oil refiners has increased by 10% in the past 10 years, while demand for gasoline has increased about 17%. The gap has been filled by higher utilization rates and, to a lesser degree, growing imports. New regulations requiring the shift from MTBE[18] oxygenates to ethanol pose a problem for this segment of supply because foreign refiners have not invested in the facilities to produce ethanol-blended gasoline. With capacity tight and supply alternatives limited, even a minor supply disruption (or a major one like Hurricane Katrina) can have a large price impact.[19]

1.2.3 Refinery Maintenance and Outages
An oil refinery is a complex operation that requires frequent maintenance, ranging from small repairs to major overhauls.[20] The regular maintenance episodes tend to be short and have minimal impact on production as they are strategically scheduled for low demand periods. Unplanned major outages, by definition, can take place at any time and can have a major impact on production capability. The EIA divides refinery outages into four classes, summarized in table 1.1.

Table 1.1: Refinery Downtime

Type                   Typical Length of Outage   Frequency
Planned Shutdowns      1-2 Weeks                  Every year
Unplanned Shutdowns    2-4 Weeks
Planned Turnarounds    3-9 Weeks                  Every 3-5 years
Emergency Shutdowns    Varies                     -

Source: EIA.

[18] Methyl Tertiary Butyl Ether.
[19] Following Hurricane Katrina on 9/23/05, capacity fell by 5 MBbl/Day. This represented a full one third of US refining capacity. Inventories are also limited as there is only about 20-25 days worth of gasoline in storage at any time.
[20] Refinery maintenance is crucial not only for production sustainability, but also for the safety of the plant. A 2005 fire at BP’s Texas City refinery killed 15 workers and injured over 100 more.

Planned turnarounds are major refinery overhauls, while planned shutdowns bridge the gap between turnarounds. Unplanned shutdowns involve unexpected issues that may allow for some strategic planning of the downtime, but often may force a refinery to reduce production sub-optimally. Finally, emergency shutdowns are those that cause an immediate plant breakdown like a refinery fire. Organization for planned turnarounds typically starts years in advance and costs millions of dollars to implement, in addition to the revenue lost from suspending production. Because turnarounds rely on outside personnel and skilled labor is in short supply, major refineries often have to plan them at different times. Given the typical seasonal variation in product demand, the ideal periods for maintenance are the first and third quarters of the year, though in some northern refineries, cold winter weather forces shifts in planned downtimes.

Even though refineries consist of several components, such as distillation columns, reformers and cracking units, these components are dependent on one another, so a breakdown of any one component can affect the production capability of the entire refinery. Downstream units include hydrocrackers, reformers, fluid catalytic cracking (FCC) units, alkylation units, and coking units. They are responsible for breaking down hydrocarbons into more valuable products and removing impurities such as sulfur. For example, in a typical refinery, only 5% of gasoline is produced from the primary distillation process; the rest comes from hydrocrackers (5%), reformers (30%), FCC and alkylation units (50%), and coking units (10%). Not all refineries have all of these components, so such refineries are even more affected when one component goes down (EIA (2007)).

At the PADD level, EIA reports that in the 1999-2005 period, refineries experienced reductions in monthly gasoline and distillate production of up to 35% due to outages. At the monthly frequency, there is little effect of outages on product prices. This is primarily because most (planned) outages occur during the low-demand months when markets are not tight; most outages last less than a month; and the availability of imports, increased production from other refineries, and inventories provide a cushion to supply. However, major outages, like those caused by a hurricane, still affect the downstream prices and profitability of all refineries.

Overall, the oil refining industry features several economic puzzles, some of which I explore in this paper. While the industry is relatively competitive, refiners have recently been earning significant profits, as measured by the growing crack spread. However, entrants have yet to overcome the regulations and costs of setting up a new plant, and existing firms have been cautious in their expansion. As a result, plants run at high rates of utilization, which leads to instability in the face of unexpected capacity disruptions.

1.3 Data
The EIA publishes data on the oil refining industry at various frequencies and levels of aggregation.[21] I observe monthly district level data, which is publicly available on EIA’s website.[22] For every month in the years from 1995 to 2006, and for each of the 9 refining districts, I have the following data:

• Wholesale gasoline production, sales, and prices.
• Wholesale distillate production, sales, and prices.
• Crude oil first purchase price and inputs into refineries.
• The capacity utilization rate.

This provides 1,296 observations (12 years × 12 months × 9 districts). I also have annual firm level data for the same years on the capacity to distill crude oil. The reported capacity, called the atmospheric crude oil distillation capacity, measures the number of barrels of crude oil that a refinery can process through the initial distillation process. This measure is calculated on a stream-day basis.[23] There are 246 unique plants in the dataset, with 179 active in 1995 and 144 active in 2006. Overall, I observe a total of 1,959 plant-year observations. Table 1.2 summarizes the data by district and indicates the market definitions I use in my estimation. The number of plants and aggregate capacity are for January 2006.

[21] Although monthly plant level data is collected from individual refineries on EIA form 810, this data remains proprietary and unavailable to academic researchers. A new program, joint with the National Institute for Statistical Sciences (NISS), called the NISS-EIA Energy Micro Data Research Program, may allow access to this data (http://www.niss.org/eia/niss-eia-microdata.html). The dataset includes monthly observations for all refineries in the US on production, capacity, utilization, and inputs into production. The program is currently on hold.

Table 1.2: Industry Summary

Market  District  States                                                                    No. Plants  Ref. Cap. (Mbbl)
1       1         CT, DE, DC, FL, GA, ME, MD, MA, NH, NJ, NY, NC, PA, RI, SC, VT, VA, WV    14          659
1       2         IL, IN, KY, MI, OH, TN                                                    14          913
1       3         MN, ND, SD, WI                                                            4           171
1       4         IA, KS, MO, NE, OK                                                        8           306
1       5         TX                                                                        23          1,812
1       6         AL, AR, LA, MS                                                            27          1,353
2       7         NM                                                                        3           42
2       8         CO, ID, MT, UT, WY                                                        16          232
3       9         AK, AZ, CA, HI, NV, OR, WA                                                35          1,220
Total                                                                                       144         6,709

Proceeding with the district level data on production and utilization combined with capacity at the firm level requires some discussion. Implicitly, I must make the strong assumption that all firms within a district are identical and respond the same way to shocks. When aggregating to the district, one firm that increases production may be cancelled out by another that breaks down. Thus, results from this approach will be meaningful only in terms of assessing the “average” behavior of a firm within a district. However, there is significant variation in district production levels as well as in the breakdown episodes described below. Also, aggregating to the district level when I estimate my model avoids having to account for the complicated linear programming problem faced by an individual refinery. These idiosyncratic differences should be smoothed out in the higher level data.

[22] See http://tonto.eia.doe.gov/dnav/pet/pet_pnp_top.asp. There are 9 refining districts, including the East Coast, the Midwest, the upper Midwest, the Central Plains, Louisiana, Texas, New Mexico, the Rockies, and the West Coast.
[23] Capacity reported in barrels per stream-day equals the maximum number of barrels of oil that a refinery can process on a given day under optimal operating conditions. Calendar-day capacities assume usual rather than optimal operating conditions, though these two numbers are frequently reported as identical.

1.4 Model
Firms make annual investments to increase or decrease their available capacity. I assume these investments increase or decrease capacity immediately and that firms then choose their utilization rates each month. While empirically, some plants make major investments in capacity that take years to complete, the average investment is small and can be completed quickly.[24] Though plants require a certain minimum level of maintenance each year (usually carried out just before the summer driving season), running a plant at a high utilization rate in one month increases the probability of a plant breakdown or an extended maintenance episode in the next month. Thus, faced with relatively high product prices or low crude oil input prices (a high refining margin or crack spread), firms may want to run their plants at a high rate of utilization to maximize profits. However, this intensive use of capital may increase the possibility of a breakdown next month when prices may be even higher.

I model the competitive environment by assuming that plants are price-takers in the market for crude oil but are Cournot competitors with some (small) market power in the downstream refined products market. Since I do not observe plant level production choices, the model is best described as a representative-agent Cournot model. In each period, a firm optimally chooses its utilization rate in response to its estimate of the aggregate production of its competitors. With the development of a network of pipelines across the US after World War II, markets tend to be large and feature many firms producing a homogeneous product. Firms are differentiated not only by their capacity to turn crude oil into gasoline and other products, but also by their technical capabilities to utilize varying types of crude oil in their production. I focus on the capacity differentiation and average firm behavior to smooth over the technical production heterogeneity.

[24] These small investments, known as capacity creep, include both additional infrastructure and improved through-put of existing capital.

1.4.1 A Firm’s Problem
Consider the problem of firm i in month m.[25] I will focus only on gasoline and distillate production by refineries, since these account for about three-quarters of the production of an average refinery. Denote production of gasoline and distillate as $q^g_{im}$ and $q^d_{im}$, and the capacity of the refinery as $\bar{q}_{iy}$, where y indexes the current year. Given the investment behavior of firms, I assume that investments in capacity are made only once per year and the resulting capacity is fixed for the entire year. Let $r_{iy}$ denote the investment of the firm, expressed as the proportional increase or decrease in capacity. A firm’s problem can be written as:

\[ \max_{\{r_{iy}\}_{y=0}^{\infty}} E\left[ \sum_{y=0}^{\infty} \beta^{y}\, \Pi_{iy}(r_{iy}; x_{iy}) \right], \tag{1.2} \]

\[ \Pi_{iy} = \max_{\{u_{im}\}_{m=1}^{12}} E\left[ \sum_{m=1}^{12} \mu^{m-1}\, \pi_{im}(u_{im}; x_{im}, \bar{q}_{iy}) \right]. \tag{1.3} \]

I assume capacity evolves according to:

\[ \bar{q}_{iy} = \bar{q}_{i,y-1}(1 + r_{iy}), \tag{1.4} \]

where $r_{iy}$ is net of any depreciation of existing capital. The utilization rate can be expressed as:

\[ u_{im} = \frac{q_{im}}{\bar{q}_{iy}}, \tag{1.5} \]

where $q_{im} = q^g_{im} + q^d_{im}$. While this is not a classic utilization rate, in that it does not assess the proportion of available inputs that are actively being used, technical constraints on the proportion of total capacity that can be used to produce gasoline and distillate make this ratio approximately a scaled-down version of the actual rate. $\pi_{im}(\cdot)$ is the per-period profit function, $x_{im}$ and $x_{iy}$ are vectors of state variables, and $\beta$ and $\mu$ are the discount factors, with $\beta = \mu^{12}$. Note that $\bar{q}_{iy}$ appears as a state variable in equation 1.3 and equals last year’s capacity plus or minus the investment made at the beginning of the current year. Throughout a given year, state variables observable to the firm include the following:

$P^c_{jm}$        The price of crude oil
$B_{im}$          An indicator equal to 1 if the firm is in a breakdown episode
$Q_{-i,m}$        The estimated aggregate competing production by other firms in the market
$\bar{q}_{iy}$    A firm’s capacity
Time              Month & year

I explicitly include a district j index on the crude oil price because, while I assume this price is exogenous, there are differences in the quality and price of oil in different districts. The competing production state is needed to calculate the price of a firm’s output. With the large number of firms in the industry, each firm has only a small impact on the prices of gasoline and distillate.[26] Firms form a statistical forecast of competing production as follows:

\[ E[Q_{-i,m}] = Q_{-i,m-1}(1 + g_m), \tag{1.6} \]

where $g_m$ is the historical growth rate of production in the market between months m - 1 and m. The month of the year is included to capture the obvious and important seasonal effects. For example, a refinery operator may forgo preventative maintenance measures during the summer high-demand period to capitalize on the high prices and profit margins. The expectation operator is taken over the future profile of the state variables, some of which are deterministic (month and year), others of which evolve according to the firm’s choices (capacity and breakdown), and still others of which are stochastic, for which firms base their expectations on historical values (the crude price and competing production). Due to breakdowns, only a portion of $\bar{q}_{iy}$ will be available in a given month. I denote the available capacity as $\tilde{q}_{iy}$. Because the numerator in equation 1.5 is the volume of downstream products and the denominator is the number of barrels of crude oil that a refinery can distill, the utilization rate may be greater than 1 in some cases. This occurs because chemicals called blending components are added in the distillation process (such as oxygenates like MTBE and ethanol).

[25] I assume that firms are individual plants and use the two terms interchangeably.
[26] With plant-level production data, I could explicitly solve for the (asymmetric) Cournot equilibrium in each period. I plan to adopt this approach in future research.
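The capacity, utilization, and forecasting rules in equations 1.4 to 1.6 are simple enough to state directly in code. The sketch below is only illustrative: the function names are hypothetical and the numbers are invented, not drawn from the EIA data.

```python
def next_capacity(prev_capacity, r):
    """Equation 1.4: capacity after a proportional net investment r."""
    return prev_capacity * (1.0 + r)


def utilization(q_gasoline, q_distillate, capacity):
    """Equation 1.5: gasoline plus distillate output over distillation capacity."""
    return (q_gasoline + q_distillate) / capacity


def forecast_competing(q_prev, growth):
    """Equation 1.6: last month's competing output scaled by historical growth."""
    return q_prev * (1.0 + growth)


# Invented numbers for illustration only.
capacity = next_capacity(100.0, 0.05)
u = utilization(60.0, 30.0, capacity)
q_hat = forecast_competing(2000.0, 0.02)
```

Because the numerator counts refined products while the denominator counts crude distillation capacity, nothing in `utilization` prevents a value above 1, in line with the remark about blending components.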

Note that the firm’s objective function can be written recursively. Let $V(\cdot)$ denote the present discounted value of the stream of the refiner’s profits under optimal choices. Then, after dropping subscripts and discretizing the state space, the Bellman equation can be written:

\[ V(x) = \max_{r} \left[ \Pi(r; x) + \beta \sum_{x'} V(x')\, P(x' \mid x, r) \right]. \tag{1.7} \]

Here $P(\cdot)$ is the annual probability transition matrix, and it reflects the transition between average annual values of the state variables. To solve for $\Pi(r; x)$, I apply backward induction from December back to January. For example, the expected value of a refiner’s aggregate discounted profit from July onward is:

\[ W_6 = \max_{u_6} \left[ \pi_6(u_6; x_6, \bar{q}) + \mu \sum_{x_7} W_7(x_7)\, P^{*}(x_7 \mid u_6, x_6, \bar{q}) \right]. \tag{1.8} \]

Here, $P^{*}(\cdot)$ is conditional on u and $\bar{q}$ because plants that do not invest in new capacity and choose to operate more intensively increase their probability of breaking down.
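The within-year recursion behind equation 1.8 can be sketched in a few lines of Python. This is a toy version under strong assumptions that are not in the text: the only state is the breakdown indicator, capacity is normalized to 1, the profit function and every numerical value are invented, and the recovery probability is held constant rather than depending on utilization.

```python
import math

MU = 0.95 ** (1 / 12)                 # monthly discount factor implied by an annual 0.95
U_GRID = [0.70, 0.80, 0.90, 1.00]     # hypothetical utilization grid


def p_break(u, g0=-8.0, g1=6.0):
    """Logit probability of a breakdown next month; g0 and g1 are made up."""
    return 1.0 / (1.0 + math.exp(-(g0 + g1 * u)))


def profit(u, broken, margin=10.0, cost=4.0, share=0.9):
    """Toy monthly profit: linear margin minus a quadratic cost in output."""
    q = u * (share if broken else 1.0)
    return margin * q - cost * q ** 2


def solve_year(p_recover=0.6):
    """Backward induction from December to January.

    Returns W[m][b] (value) and policy[m][b] (best utilization) for
    months m = 1..12 and breakdown states b in {0, 1}.
    """
    W = {13: [0.0, 0.0]}              # nothing is earned after December
    policy = {}
    for m in range(12, 0, -1):
        W[m], policy[m] = [0.0, 0.0], [None, None]
        for b in (0, 1):
            best_val, best_u = -math.inf, None
            for u in U_GRID:
                # Probability of being broken down next month.
                pb = p_break(u) if b == 0 else 1.0 - p_recover
                cont = (1.0 - pb) * W[m + 1][0] + pb * W[m + 1][1]
                val = profit(u, b) + MU * cont
                if val > best_val:
                    best_val, best_u = val, u
            W[m][b], policy[m][b] = best_val, best_u
    return W, policy


W, policy = solve_year()
```

In December there is no continuation value, so the toy firm simply maximizes current profit; in earlier months the breakdown risk enters through the continuation term, which is the trade-off the model is built around.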

1.4.2 Per-Period Profit
Prices are determined at the market level, which I index by k. Per-period profit is defined as gasoline and distillate revenue less production costs and investment costs. Thus, in month m, profits of firm i are:

\[
\begin{aligned}
\pi_{im}(u_{im}; P^c_{jm}, B_{im}, Q_{-i,m}, \bar{q}_{iy}, m, y) ={}& u_{im}\, \tilde{q}_{iy} \left[ \text{yield}^g\, P^g_{km}(Q^g_{km}; m, y) + (1 - \text{yield}^g)\, P^d_{km}(Q^d_{km}; m, y) \right] \\
& - C_{im}(u_{im}; P^c_{jm}, \tilde{q}_{iy}) - \frac{1}{12}\, C^r_{iy}(r_{iy}),
\end{aligned}
\tag{1.9}
\]

where

\[
\tilde{q}_{iy} =
\begin{cases}
\bar{q}_{iy} & \text{if } B_{im} = 0 \\
\lambda\, \bar{q}_{iy} & \text{if } B_{im} = 1.
\end{cases}
\tag{1.10}
\]

The term $\text{yield}^g$ represents the proportion of available capacity that can be distilled into gasoline. It is fixed over time and across firms. Functional forms for the demand and cost functions will be specified below. The last term in the profit function is the investment cost, which is spread equally across the 12 months of a year. Note that $\lambda \in [0, 1)$ is the share of capacity that remains available to a refinery during a breakdown. While I allow this term to vary stochastically, the data suggest this value averages around 0.9 and can fall as low as 0.7. In other words, district level breakdowns occur that result in a 30% reduction in capacity relative to normal levels. It should be noted that a 25% capacity reduction in a given month could result from one week of complete breakdown and three weeks of optimal operation.

1.4.3 Demand
The prices of gasoline and distillate are determined at the “market” level. The three markets defined earlier are: the East Coast, Midwest and Gulf Coast; the Rocky Mountain region; and the West Coast. The first is by far the largest, with several large pipelines connecting the major production area near the Gulf of Mexico with the population centers on the East Coast and in the Midwest. I estimate the demand for wholesale gasoline (and similarly for distillate) according to:

\[ \log Q^g_{km}(P^g_{km}) = \alpha^g_0 + \alpha^g_1(\text{Month}) + \alpha^g_2(\text{Year}) + \alpha^g_3(\log P^g_{km} \times \text{Year}) + \epsilon^g_{km}. \tag{1.11} \]

$P^g$ and $Q^g$ are the price and sales of wholesale gasoline. Here I specify a log-linear demand equation with month and year fixed effects to account for the strong seasonal variation and the growth in demand over time. I allow the price elasticity of demand to vary by year to account for the changes in the sensitivity of consumers to prices. Note that the East Coast receives a significant amount of its refined product from abroad (mostly from Europe and the Caribbean). Imports increase in periods of high demand or tight supply, as the price must be high enough to justify the transportation costs. Thus the demand for refined products from US refineries may be affected by the availability of imports, though robustness checks reveal that the effect is small relative to the size of the East Coast’s overall market (which includes the Midwest and Gulf Coast).

1.4.4 Probability of Breakdown
Consider the following specification for the likelihood of a plant breakdown or extended period of maintenance beyond the regular minimum level:

\[ \Pr(\text{breakdown in month } m) = F(\gamma u_{i,m-1}) = \frac{\exp(\gamma_0 + \gamma_1 u_{i,m-1})}{1 + \exp(\gamma_0 + \gamma_1 u_{i,m-1})}, \tag{1.12} \]

which assumes the probability follows the logistic distribution. The same specification is used to model the likelihood that a plant recovers from a breakdown next period, conditional on being broken down this period. With more detailed firm-level data, an ordered probit may be the ideal specification, as it would account for both the magnitude and length of the breakdown episode. Modeling the breakdown dynamics based solely upon last month’s utilization rate, and not, say, the average rate over the last six months, is primarily a computational simplification. The results using only last month’s utilization rate are robust to other specifications.[27] See below for how I define a breakdown using district-level production data.

1.4.5 Production and Investment Costs
I assume the following production cost specification:

\[ C_{im}(u_{im}; P^c_{jm}, \tilde{q}_{iy}) = \theta_0\, q_{im} + \theta_1\, q_{im}^2 + \theta_2\, q_{im}\, P^c_{jm}, \tag{1.13} \]

where $q_{im} = u_{im}\, \tilde{q}_{iy}$, the firm’s actual production of gasoline and distillate in the current month. I assume firms face increasing costs as they near their capacity constraint. To model this, I suppose firms have a quadratic production cost function and also include a term, $\theta_2$, reflecting the major input of the refiner, crude oil. Refiners take this crude oil price as exogenous since the price is determined on the world market. As firms produce near their capacity, they may face increasing costs due to less time for maintenance, excess wear on their capital, and other effects that raise their marginal costs. Investments in capacity are available immediately, and capacity is fixed within the year. This is a strong assumption since firms likely make investment decisions far in advance and spread the costs over a long time period. In future work, I will relax this assumption, allowing for a one-year “time-to-build.” Investments come at a cost:

\[ C^r_{iy}(r_{iy}) = \theta_3\, (\bar{q}_{i,y-1}\, r_{iy}) + \theta_4\, (\bar{q}_{i,y-1}\, r_{iy})^2. \tag{1.14} \]

The parameters, $\theta_3$ and $\theta_4$, reflect the cost of capacity expansion. They embody both the cost of physical expansion and any regulatory costs faced by the plant. Unfortunately, I will not be able to differentiate these two components with currently available data. Note that the investment cost parameters reflect the cost of a change in the number of barrels of capacity that is created or destroyed. Large plants may benefit from economies of scale in capacity expansion as compared with smaller plants, but since I am estimating my model for an average capacity firm, this consideration is not necessary.

[27] Specifications involving the prior 3-month average rate or last month’s deviation from historical rates yielded similar results. With firm-level data on production, one could also include the age of the refinery and perhaps the length of time since the last significant maintenance period.
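To make the two cost specifications concrete, the following Python sketch writes out the production cost, its derivative with respect to output, and the investment cost. The argument names `theta0` to `theta4` are placeholder labels for the five cost parameters, and the values used in any call are assumptions, not estimates from this chapter.

```python
def production_cost(q, crude_price, theta0, theta1, theta2):
    """Equation 1.13: quadratic in output plus a crude-oil input term."""
    return theta0 * q + theta1 * q ** 2 + theta2 * q * crude_price


def marginal_cost(q, crude_price, theta0, theta1, theta2):
    """Derivative of equation 1.13 with respect to output q."""
    return theta0 + 2.0 * theta1 * q + theta2 * crude_price


def investment_cost(prev_capacity, r, theta3, theta4):
    """Equation 1.14: cost of changing capacity by prev_capacity * r barrels."""
    dq = prev_capacity * r
    return theta3 * dq + theta4 * dq ** 2
```

With a positive quadratic coefficient, `marginal_cost` rises with output, which is how the specification captures increasing costs near the capacity constraint.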

1.5 Empirical Estimation Strategy
In general, I split the estimation into two stages. I first estimate the demand parameters, $(\alpha^g_0, \alpha^g_1, \alpha^g_2, \alpha^g_3, \alpha^d_0, \alpha^d_1, \alpha^d_2, \alpha^d_3)$, via GMM. This is a static relationship between the market price and quantity. I also estimate the logit parameters governing the probability of breakdown, $(\gamma_0, \gamma_1)$, via maximum likelihood. In the second stage, I take the demand and breakdown coefficients as given and solve the firms’ dynamic utilization and investment choice problem using a nested fixed-point GMM algorithm to recover the cost parameters $(\theta_0, \theta_1, \theta_2, \theta_3, \theta_4)$ for each market. I allow for the cost parameters to vary each year to reflect changes in technology over time. I assume an annual discount factor of $\beta = 0.95$, implying a monthly factor of $\mu = 0.996$. When a firm enters a breakdown episode, I assume the share of its capacity that remains available is a random amount, $\lambda$, which follows a beta distribution with mean 0.9.[28] The firms’ dynamic problem can be thought of as a finite-horizon monthly utilization choice problem nested inside an infinite-horizon annual investment choice problem. The annual investments in capacity can raise or lower the optimal utilization rate throughout the year (e.g., a larger investment allows for the same level of output with a lower level of utilization). Recall that the problem can be written:

\[ \max_{\{r_{iy}\}_{y=0}^{\infty}} E\left[ \sum_{y=0}^{\infty} \beta^{y}\, \Pi_{iy}(r_{iy}; x_{iy}) \right], \tag{1.15} \]

\[ \Pi_{iy} = \max_{\{u_{im}\}_{m=1}^{12}} E\left[ \sum_{m=1}^{12} \mu^{m-1}\, \pi_{im}(u_{im}; x_{im}, \bar{q}_{iy}) \right]. \tag{1.16} \]

The aggregate discounted profits of the firm over the course of the year become the per-period (annual) payoff of the investment choice problem. Given the frequency with which refiners adjust their capacity and their utilization rate, this modeling strategy is not only realistic but also computationally appealing. Solving the finite horizon problem in equation 1.16 is simply a matter of backward induction. The state variables available to the firm are the same in both sub-problems, aside from the month of the year, which is only relevant in the utilization choice problem. For the annual investment choice, the firm considers the average values of last year’s crude oil price and market production, the proportion of time the refinery was broken down in the last 12 months, and the current level of capacity.

[28] Formally, $\lambda \sim B(9, 1)$.
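The two calibrated quantities above can be checked with a short Python sketch: the monthly discount factor implied by the annual one, and draws from a Beta(9, 1) distribution for the breakdown share. The seed, sample size, and helper names are arbitrary choices made for illustration.

```python
import random

BETA = 0.95                        # annual discount factor from the text
MU = BETA ** (1.0 / 12.0)          # monthly factor, so that MU**12 equals BETA


def annual_value(monthly_profits, mu=MU):
    """Discounted sum of 12 monthly profits, as in equation 1.16."""
    return sum(mu ** (m - 1) * p for m, p in enumerate(monthly_profits, start=1))


def draw_available_share(rng):
    """One draw of the capacity share left during a breakdown, Beta(9, 1)."""
    return rng.betavariate(9.0, 1.0)


rng = random.Random(0)             # fixed seed so the sketch is reproducible
draws = [draw_available_share(rng) for _ in range(20000)]
mean_share = sum(draws) / len(draws)
```

A Beta(a, b) variable has mean a / (a + b), so Beta(9, 1) has mean 0.9, and the simulated average should land close to that value.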

1.5.1 Demand
The demand parameters, the $\alpha$’s, are estimated in the first stage using 2-stage least squares with appropriate instruments. Given the endogeneity of P and Q, I need to find instruments, $Z_{km}$, that are correlated with the price, $\mathrm{Cov}(P_{km}, Z_{km}) \neq 0$, and unrelated to the error term, $\mathrm{Cov}(\epsilon_{km}, Z_{km}) = 0$.[29] An obvious cost shifter in the oil refining industry is the price of crude oil, which should be exogenous as it is determined in the world market. However, it is likely that the market for crude oil and the market for refined products are both subject to the same demand shocks, which invalidates the contemporaneous crude oil price as a good instrument. Therefore, I instrument for the price of wholesale products with the lagged crude oil price, indicators of supply disruptions (such as those caused by hurricanes and pipeline outages), and the inventories of gasoline, distillate, and crude oil. These are industry-wide inventories, not just at the refinery. These should all be related to the price of a refiner’s products though unrelated to the downstream demand. I can use the $R^2$ from the first stage to test for the correlation between my instruments and the endogenous price. Since I have instrumented for price in the first stage, in the second stage I regress the log of $Q_{km}$ on the fitted log price, along with year and month fixed effects.

[29] Essentially, I need cost shifters that move around the supply curve to trace out a demand curve.
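The logic of instrumenting for price can be illustrated with simulated data. The Python sketch below is a simplification, not the estimation used here: it has a single instrument and no fixed effects, in which case two-stage least squares reduces to a ratio of covariances. The data-generating process and all names are invented.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def ols_slope(x, y):
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """With one instrument, 2SLS reduces to cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


rng = random.Random(1)
TRUE_ELASTICITY = -0.3
z, log_p, log_q = [], [], []
for _ in range(20000):
    cost_shift = rng.gauss(0, 1)       # instrument: moves price, not demand
    demand_shock = rng.gauss(0, 1)     # unobserved; raises price and quantity
    p = cost_shift + 0.5 * demand_shock + rng.gauss(0, 1)
    q = 1.0 + TRUE_ELASTICITY * p + demand_shock
    z.append(cost_shift)
    log_p.append(p)
    log_q.append(q)

b_ols = ols_slope(log_p, log_q)
b_iv = iv_slope(z, log_p, log_q)
```

In this simulation the demand shock pushes price and quantity up together, so the OLS slope is biased toward zero, while the instrumented slope stays near the true elasticity.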

1.5.2 Breakdown Probability
The parameters of the breakdown logit, $\gamma_0$ and $\gamma_1$, are estimated by maximum likelihood. This is done separately for estimating the likelihood that a breakdown occurs and for the likelihood that a plant recovers once broken down. I define a “breakdown” in district j as a month when the observed utilization rate $u_{jm}$ (published by EIA, reflecting gross inputs of crude oil divided by the capacity to distill crude oil) drops below $\underline{u}_{jm}$, defined as:

\[ \underline{u}_{jm} = \min\left\{ \frac{1}{9} \sum_{i=1}^{9} u_{im},\; \frac{1}{4} \sum_{i=1}^{4} u_{j,m-12i} \right\}. \]

So the threshold is the smaller of the contemporaneous average across all districts and the average of the selected district’s own rate in the same month over the last 4 years. A breakdown is therefore only triggered when 1) a district is producing relatively less than all other districts in the current month, and 2) the district is producing relatively less than it has historically in the same month. Figure 1.8 displays the breakdown dynamics for districts that experience a breakdown. The plots show that districts that run their plants more intensively in one month are more likely to break down the following month. Once a breakdown episode is started, a district may stay below the threshold for a period of months. The data show that the median episode length is 1 month, the mean is 2.3 months, and the maximum is 15 months.[30]
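The threshold rule can be written as a short function. The Python sketch below assumes a hypothetical data layout, a dictionary keyed by (district, month index), and uses a toy panel with three districts instead of nine; none of the numbers come from the EIA data.

```python
def breakdown_threshold(u, district, month, districts, years_back=4):
    """Smaller of the cross-district mean this month and the district's own
    mean in the same calendar month over the previous `years_back` years.
    `u` maps (district, month_index) to a utilization rate."""
    cross_section = sum(u[(d, month)] for d in districts) / len(districts)
    own_history = sum(
        u[(district, month - 12 * k)] for k in range(1, years_back + 1)
    ) / years_back
    return min(cross_section, own_history)


def is_breakdown(u, district, month, districts):
    """True when the observed rate falls below the threshold."""
    return u[(district, month)] < breakdown_threshold(u, district, month, districts)


# Toy panel: 3 hypothetical districts, months indexed 0..48, flat at 0.90.
districts = [1, 2, 3]
u = {(d, m): 0.90 for d in districts for m in range(49)}
u[(2, 48)] = 0.80                  # district 2 dips in the latest month
```

Taking the minimum of the two averages means a month is flagged only if the district is low relative to both benchmarks, matching the two conditions listed in the text.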

Figure 1.8: District Breakdowns [scatter plot; x-axis: Utilization Rate in the Month t; y-axis: Utilization Rate in the Month t+1; legend: All Months, Breakdown Months, 45-degree line]

[30] The 15 month episode occurred in district 9 (the West Coast) from February 1999 - May 2000. It resulted from two California refinery fires at the Tosco Refinery in Avon on 02/23/99 and at the Chevron Refinery in Richmond on 03/25/99. The fall in gasoline production from these two fires was only 7% but due to California’s strict environmental standards for gasoline, shipments from other (less regulated) districts were impossible so prices rose by about 25%. This implies a demand elasticity for retail gasoline of -0.28.

1.5.3 Production Cost Parameters
The cost parameters, $(\theta_0, \theta_1, \theta_2, \theta_3, \theta_4)$, are estimated by GMM in the second stage dynamic optimization. In order to solve for the production and investment cost parameters, I need to solve a dynamic optimization problem. To achieve this, I first discretize the state space, which includes deterministic time states. The transition probability for the crude price is found using the empirical distribution of its historical series. The transition probabilities between breakdown states depend on the choice variable in the previous period according to the logit estimation done in the first stage. In a given year, the transition matrix for months reflects moving from one month to the next with certainty. Therefore, I can simplify the analysis by taking advantage of the cyclic nature of the month state. This dramatically reduces the computational time; see Rust (forthcoming). Further details of the estimation algorithm can be found in appendix C. For a candidate parameter vector, I iterate on the policy function until convergence. I then interpolate the policy function on the actual states in my data and estimate the utilization rate for each district-month observation. Since the optimization is performed at the firm level, I aggregate to the market level and form the following moments:

\[ M_1 = J^{-1} \sum_{j} (u_{mj} - \hat{u}_{mj}), \qquad M_2 = N_j^{-1} \sum_{i} (r_{ijy} - \hat{r}_{ijy}), \]

where $\hat{u}_{mj}$ is the average utilization rate in district j and month m and $\hat{r}_{ijy}$ is the estimated investment rate by firm i located in district j in year y. I average the utilization rate moments over districts and the investment rate moments over firms and then stack them to form a moment vector: $M(\theta) = (M_1, M_2)'$. I then numerically solve the following problem:

\[ \min_{\theta}\; M(\theta)'\, \Omega^{-1} M(\theta), \tag{1.17} \]

where $\Omega$ is the variance-covariance matrix of the moment vector. With estimated parameters in hand, I estimate the standard errors of the cost estimates using Hansen’s GMM estimator of the VC matrix. Given the matrix G of numerical derivatives, where (for parameter k and moment l),[31]

\[ G_{lk} = \frac{M_l(\theta_k^{+}) - M_l(\theta_k^{-})}{\theta_k \times 1\%}, \tag{1.18} \]

with $\theta_k^{+}$ and $\theta_k^{-}$ denoting the parameter perturbed above and below its estimate, I can then compute:

\[ VC(\theta) = \frac{1}{N} (G'\, \Omega^{-1} G)^{-1}. \tag{1.19} \]
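The numerical derivative matrix in equation 1.18 can be sketched as follows. This Python example uses a toy moment function with known derivatives so the result can be checked by hand; the function names are hypothetical, and the real moment function would require solving the dynamic model at each perturbed parameter.

```python
def numerical_jacobian(moments, theta, window=0.01):
    """G[l][k]: central difference of moment l with respect to parameter k,
    perturbing theta[k] by half the window above and below its value.
    Assumes every theta[k] is non-zero, since the step is proportional."""
    n_moments = len(moments(theta))
    G = [[0.0] * len(theta) for _ in range(n_moments)]
    for k, value in enumerate(theta):
        up, down = list(theta), list(theta)
        up[k] = value * (1.0 + window / 2.0)
        down[k] = value * (1.0 - window / 2.0)
        m_up, m_down = moments(up), moments(down)
        for l in range(n_moments):
            G[l][k] = (m_up[l] - m_down[l]) / (value * window)
    return G


def toy_moments(theta):
    """Stand-in moment vector with analytic derivatives [[2, 1], [t1, t0]]."""
    return [2.0 * theta[0] + theta[1], theta[0] * theta[1]]


G = numerical_jacobian(toy_moments, [1.5, 4.0])
```

Given G and an estimate of the moment covariance matrix, equation 1.19 then only requires matrix products and one inversion.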

[31] For a 1% window, I perturb the parameter by 0.5% above and below the estimate.

1.6 Results

1.6.1 Model Fit
I first assess the fit of the dynamic model by plotting actual and estimated values of key variables in figure 1.9. This is an in-sample analysis and shows that, on average, the estimated values match the data fairly well. Prices are estimated very precisely due to the flexibility gained by including monthly fixed effects. The estimated utilization rate is more variable than the actual rate, though the month-to-month fluctuations are approximated well. The model does not do as well at predicting the level of investment because firms tend to make lumpy investments every few years instead of updating their plant continuously. This means the median investment in any given year is zero and the reduced variation makes identification more difficult.

Figure 1.9: Model Fit (In Sample) [six panels by year, 1996-2006, Actual vs. Estimated: Gas Price ($/Bbl), Distillate Price ($/Bbl), Utilization Rate, Firm Capacity Investment (MBbl), Aggregate Production (MBbl), Crack Spread ($/Bbl)]
Figure 1.10: Model Fit (Out of Sample)
[Six panels, actual versus simulated, 2006 to 2008: gasoline price ($/Bbl), distillate price ($/Bbl), firm capacity investment (MBbl) by refining district, utilization rate, aggregate production (MBbl), and crack spread ($/Bbl).]

Finally, though the model tracks the movements in the crack spread very well, it tends to predict a value that is below the actual spread. This occurs because the estimated prices of gasoline and distillate are also biased down, because I do not account for inventories in my model. Since a small portion of refinery production is stored, my estimates of downstream demand are biased up, which pushes down the estimated prices.

In figure 1.10, I do an out-of-sample test of the model, where I use the parameter estimates based on data through 2006 and simulate the investment and utilization policy of firms in 2007. The predicted prices of gasoline and distillate are close to the data for the beginning of 2007 but then begin to deviate. This pattern, also shown in the crack spread plot, is partially a result of unprecedented levels of the price of crude oil in 2007. The model predicts that refineries should optimally respond to these high input prices by cutting their utilization rate to drive up their product prices and maintain their profit margin.

1.6.2 First Stage Estimates: Demand and Breakdown
Tables 1.3 and 1.4 present the results of the first stage demand and breakdown estimations. Most of the demand coefficients are significant at the 1% level and have the expected signs. The monthly fixed effects estimates show the peak in gasoline demand during the summer months and distillate toward the fall. The elasticity estimates show a growing sensitivity to wholesale gasoline prices over the years. These estimates are higher than those reported for retail gasoline in other studies (see Knittel (2008)). However, unlike the branded retail product, wholesale gasoline is very homogeneous and downstream buyers can more easily substitute to a competing supplier. Also, the ability to store gasoline at terminals would imply the wholesale elasticity should be higher than the retail estimate. The R2 from the first stage regression of price on the instruments is 0.87.

The logit estimation of breakdown reveals an increasing probability of breakdown as a refiner runs the plant more intensively. Estimating the probability of breakdown next period conditional on being broken down this period reveals that refiners with more severe breakdowns are less likely to recover in the next period.
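To make the elasticity estimates concrete, here is a minimal Python sketch that treats each gasoline Log(P)*Year coefficient in Table 1.3 as a constant price elasticity and computes the implied proportional change in quantity for a given price change. The 10% price increase is a hypothetical input, and the calculation holds everything else in the demand equation fixed, which is an assumption of this example rather than a result from the text.

```python
# Gasoline Log(P)*Year coefficients transcribed from Table 1.3.
GASOLINE_ELASTICITY = {
    1995: -0.81, 1996: -0.81, 1997: -1.07, 1998: -1.03,
    1999: -1.08, 2000: -0.91, 2001: -0.93, 2002: -1.06,
    2003: -1.04, 2004: -0.96, 2005: -0.88, 2006: -0.69,
}

def quantity_change(elasticity, price_change):
    """Proportional change in quantity under log-log demand.

    With ln Q = a + e * ln P, the ratio Q1/Q0 equals (P1/P0) ** e.
    """
    return (1.0 + price_change) ** elasticity - 1.0

# Hypothetical 10% wholesale price increase, all else held fixed.
change_2006 = quantity_change(GASOLINE_ELASTICITY[2006], 0.10)
print(f"Implied change in 2006 gasoline sales: {change_2006:.1%}")
```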

1.6.3 Second Stage Estimates: Costs
The cost coefficients are generally significant and reflect a production cost function that is increasing and convex. I display the cost functions at the average values of the estimates in figure 1.11 and report all estimates in appendix D, table A.3. The cost functions show that firms in market 2, the isolated Rocky Mountain region, are the most sensitive to production changes and have the highest overall production costs. Market 1 enjoys relatively easy access to crude supplies in the Gulf region and has the lowest production costs. The curvature of the production cost functions shows that refiners face increasing marginal costs as they approach the limitations of their capacity. I use a constant crude oil price of $50/bbl in my estimated production cost function.

The estimates of investment cost functions reflect an almost linear relationship, with the quadratic term often insignificant. While the figure shows the average investment costs over time, table A.3 displays the increase in expansion costs that refiners have faced in recent years. The Senate’s (2002) estimated cost of building a new 2,700 barrel/day refinery was about $27 million. I estimate the cost of the same size expansion at around $10 million, further evidence that expanding existing sites is more cost-effective than building a new plant.

Table 1.3: Demand Estimates

                     Gasoline                  Distillate
Parameter            Coefficient  Std. Err.    Coefficient  Std. Err.
Constant             -0.27        0.44         5.20***      1.73
Year '95             2.51***      0.60         -0.99        2.43
Year '96             2.64***      0.64         0.93         2.70
Year '97             3.49***      0.62         0.90         2.55
Year '98             3.08***      0.56         -1.38        2.28
Year '99             3.44***      0.58         -1.82        2.35
Year '00             3.18***      0.67         -0.19        2.81
Year '01             3.19***      0.63         0.55         2.62
Year '02             3.59***      0.61         -0.09        2.47
Year '03             3.72***      0.65         1.58         2.75
Year '04             3.65***      0.65         0.52         2.84
Year '05             3.55***      0.65         1.33         2.96
Year '06             2.84***      0.61         1.24         2.86
Month 2              0.05***      0.01         0.04         0.04
Month 3              0.11***      0.01         0.13***      0.04
Month 4              0.17***      0.01         0.18***      0.04
Month 5              0.22***      0.01         0.14***      0.04
Month 6              0.25***      0.01         0.14***      0.04
Month 7              0.25***      0.01         0.11***      0.04
Month 8              0.28***      0.01         0.27***      0.04
Month 9              0.21***      0.01         0.33***      0.04
Month 10             0.19***      0.01         0.39***      0.05
Month 11             0.13***      0.01         0.26***      0.04
Month 12             0.11***      0.01         0.13***      0.04
Log(P)*Year '95      -0.81***     0.13         -1.79***     0.57
Log(P)*Year '96      -0.81***     0.14         -2.25***     0.64
Log(P)*Year '97      -1.07***     0.13         -2.27***     0.59
Log(P)*Year '98      -1.03***     0.12         -1.73***     0.52
Log(P)*Year '99      -1.08***     0.12         -1.50***     0.53
Log(P)*Year '00      -0.91***     0.14         -1.76***     0.63
Log(P)*Year '01      -0.93***     0.13         -2.02***     0.58
Log(P)*Year '02      -1.06***     0.12         -1.90***     0.54
Log(P)*Year '03      -1.04***     0.13         -2.25***     0.61
Log(P)*Year '04      -0.96***     0.12         -1.80***     0.59
Log(P)*Year '05      -0.88***     0.12         -1.82***     0.58
Log(P)*Year '06      -0.69***     0.10         -1.74***     0.53

***, **, * Significant at the 1%, 5%, and 10% level respectively. Dependent variables: log of gasoline and distillate sales. First stage regression of price on hurricane and pipeline disruptions, lagged crude oil price, and stocks of crude oil, gasoline and distillate.

Table 1.4: Breakdown Probability Estimates

                     Conditional on No Breakdown    Conditional on Breakdown
Parameter            Coefficient  Std. Err.         Coefficient  Std. Err.
Constant             -2.40***     0.44              0.91**       0.45
Utilization (t-1)    0.74         0.62              -4.03***     0.67

Maximum likelihood estimates. ***, **, * Significant at the 1%, 5%, and 10% level respectively. Dependent variable = breakdown indicator.

Figure 1.11: Estimated Production and Investment Cost Functions
[Two panels for Markets 1 to 3: production cost (millions of dollars) against production (MBbl), and investment cost (millions of dollars) against capacity investment (MBbl).]

1.6.4 Policy Function
In figure 1.12, I plot the optimal policy function over the course of a year at the average values of the other state variables. The optimal utilization rate increases during the late winter and early spring but then falls off around April and May, before rising again to a peak in August. A likely explanation is that refiners, anticipating the high demand summer driving season in July and August, scale back operations in the late spring to prevent the possibility of a breakdown occurring during the peak. This pattern is replicated in most markets and years.

Figure 1.13 displays the optimal policy function in 3-dimensional space, varying by both the month of the year and the crude oil price. It shows that refiners cut back production when the oil price rises, a competitive response to a rising input price. The pattern across months is replicated at each crude oil price.
Figure 1.12: Optimal Utilization Rate Versus Month
[Optimal utilization rate by month (1 to 12), plotted separately for Breakdown = No and Breakdown = Yes.]
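The kind of computation behind a policy function like this can be sketched with a small dynamic program. The Python example below is only a toy under stated assumptions: the state is (month, breakdown status), the month advances deterministically and wraps around, the breakdown transition uses the logit coefficients from Table 1.4 with utilization assumed to be measured as a fraction, and the seasonal margin, cost curvature, output loss during a breakdown, and discount factor are all hypothetical numbers. It uses plain value iteration, not the policy iteration described for the estimation, and it omits the crude price state.

```python
import numpy as np

BETA = 0.95                                   # hypothetical discount factor
U_GRID = np.linspace(0.5, 1.0, 51)            # candidate utilization rates
LOGIT = {0: (-2.40, 0.74), 1: (0.91, -4.03)}  # Table 1.4: (constant, utilization)
# Hypothetical seasonal margin by month, peaking in late summer.
MARGIN = 10.0 + 3.0 * np.sin(2 * np.pi * (np.arange(12) - 4) / 12)

def breakdown_prob(u, b):
    """Probability of breakdown next period given utilization u and status b."""
    const, slope = LOGIT[b]
    return 1.0 / (1.0 + np.exp(-(const + slope * u)))

def payoff(m, u, b):
    """Hypothetical per-period profit: seasonal revenue minus convex cost."""
    scale = 0.5 if b == 1 else 1.0            # assumed output loss when broken down
    return scale * MARGIN[m] * u - 6.0 * u ** 2

def solve(tol=1e-8, max_iter=10_000):
    """Value iteration over (month, breakdown) with a cyclic month state."""
    V = np.zeros((12, 2))
    policy = np.zeros((12, 2))
    probs = {b: breakdown_prob(U_GRID, b) for b in (0, 1)}
    for _ in range(max_iter):
        V_new = np.empty_like(V)
        for m in range(12):
            nxt = (m + 1) % 12                # December is followed by January
            for b in (0, 1):
                p = probs[b]
                values = payoff(m, U_GRID, b) + BETA * (
                    (1.0 - p) * V[nxt, 0] + p * V[nxt, 1]
                )
                best = int(np.argmax(values))
                V_new[m, b] = values[best]
                policy[m, b] = U_GRID[best]
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, policy
        V = V_new
    raise RuntimeError("value iteration did not converge")

V, policy = solve()
```

Here `policy[m, b]` is the grid value of utilization chosen in month `m` (0 for January) with breakdown status `b`; plotting its two columns against month gives a picture in the spirit of figure 1.12, though the numbers depend entirely on the hypothetical payoff.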

1.7 Counterfactuals
With a fully estimated dynamic model of the US oil refining industry, I can now use the model to determine the effects of various shocks that may occur. There are many interesting questions that could be examined with my model given the importance of oil refining in US and global energy markets. I focus on three stylized facts that I believe to be particularly important in the following analysis: crude oil prices are rising to unprecedented levels; there is little to no excess capacity in the oil refining industry; and end-use consumers of refined products are becoming increasingly sensitive to the prices they face (see Knittel et al. (2008)). Elasticities may be changing due to the availability of other fuels or because of changing perceptions of the environmental impact of oil usage (see figure 1.14). As a result, I will consider two experiments:

1. What are the effects of an increase in the crude oil price and how do the results change when the demand for refined products is more elastic?
2. What are the effects of a fall in available capacity and how do the results change when the demand for refined products is more elastic?

Figure 1.13: Optimal Utilization Rate Versus Month and Crude Price
[Surface plot of the utilization policy against month (1 to 12) and crude price.]

Figure 1.14: Price Elasticity of Demand
[Absolute price elasticity of demand for gasoline and distillate by year, 1986 to 2006.]

1.7.1 Methodology
Both counterfactuals are based on the coefficients and policy functions from 2006, the most recent year in my data. I shock the crude oil price in May to determine the effects throughout the peak demand summer months. The shock is permanent and I compute the average effects throughout the remainder of the year. I shock capacity in August to approximate the effects of a late summer hurricane hitting the Gulf of Mexico. I compute impacts assuming both the actual estimated elasticity in 2006 and an elasticity that is higher by 2.5% (in absolute terms) for both gasoline and distillate. Even this small increase in the sensitivity of consumers is enough to induce a dramatic response.

In my sample, the maximum observed real crude oil price is around $70/bbl. However, as shown in figure 1.15, crude oil prices have been driven to record levels more recently, exceeding $115/bbl (in real 2006 dollars). Thus, I simulate the effects of a 20% increase in the price of crude oil to determine the impact on prices of gasoline and distillate and the resulting crack spread. Since the price elasticity of demand is one of the parameters estimated in the first stage and it influences the per-period payoff of the firm, I must solve my model at each new elasticity estimate. The optimal policy functions change as a result. Since the crude oil price is a state variable, I extrapolate my policy functions to the new crude prices.

Figure 1.15: Crude Oil Price
[Real crude price ($/Bbl), 2005 to 2008.]

About one-half of the US refining capacity is located on the Gulf of Mexico. Major hurricanes like Katrina and Rita in 2005, and more recently, Gustav and Ike in 2008, reduced US oil refining capacity by 25% to 35% and had a major impact on downstream prices and refiners’ profit margins (see figure 1.16). Therefore, in my second counterfactual experiment, I simulate the effects of a 25% reduction in capacity on downstream prices, the crack spread, refiner profits, and consumer welfare.

Figure 1.16: Loss in Capacity: Hurricane Katrina
[Monthly utilization rate (left axis) and crack spread in $/Bbl (right axis), nationally and for the Gulf, with Katrina marked.]
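Evaluating a policy function at crude prices outside the solved grid requires some extrapolation rule, and the text does not say which one is used. A minimal Python sketch of one simple option, linear extrapolation from the two nearest grid points, is shown below. The price grid and policy values are hypothetical placeholders, and clipping the result to the unit interval is an added assumption.

```python
import numpy as np

def extrapolate_policy(price_grid, policy, new_price):
    """Linear interpolation inside the grid, linear extrapolation outside it."""
    price_grid = np.asarray(price_grid, dtype=float)
    policy = np.asarray(policy, dtype=float)
    if new_price <= price_grid[0]:
        i = 0                                   # use the first segment
    elif new_price >= price_grid[-1]:
        i = len(price_grid) - 2                 # use the last segment
    else:
        i = int(np.searchsorted(price_grid, new_price)) - 1
    slope = (policy[i + 1] - policy[i]) / (price_grid[i + 1] - price_grid[i])
    value = policy[i] + slope * (new_price - price_grid[i])
    return float(np.clip(value, 0.0, 1.0))      # utilization stays in [0, 1]

# Hypothetical grid of crude prices ($/bbl) and utilization policy values.
PRICE_GRID = [10.0, 20.0, 30.0, 40.0]
POLICY = [0.68, 0.64, 0.60, 0.56]

shocked_price = 40.0 * 1.20                     # a 20% increase on the top grid price
shocked_policy = extrapolate_policy(PRICE_GRID, POLICY, shocked_price)
```

Linear extrapolation preserves the local slope of the policy but can drift far from the true solution when the shock moves well beyond the grid, which is one reason to treat results at extrapolated prices with some caution.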

1.7.2 Results of Experiments
The effect of a 20% increase in the price of crude oil (from 2006 prices) is shown in figure 1.17 and summarized in table 1.5. Note, the price and crack spread changes in the table are the average changes relative to the baseline prediction following the shock for the remainder of the year. The changes in surplus, profit and welfare are based on totals for the remainder of the year following the shock. The graphs in figure 1.17 show the future path of product prices, the utilization rate, and the crack spread through the remainder of the year.

Figure 1.17: Crude Oil Counterfactual: Simulation
[Top panel: actual and counterfactual crude price by month. Below, two columns of panels (Actual Elasticity, More Elastic) showing gas price, utilization rate, and crack spread by month.]

Table 1.5: The Effect of a 20% Increase in the Crude Oil Price

Percent Change       Actual Elasticity    More Elastic
Gasoline Price       12.7                 10.2
Distillate Price     8.1                  6.7
Crack Spread         -10.8                -30.1
Consumer Surplus     -58.3                -34.1
Refiner Profit       -37.1                -70.8
Total Welfare        -45.2                -49.7

The first column of graphs corresponds to the actual estimated elasticity (in 2006) and the second column of graphs assumes more sensitive demand estimates. The price of gasoline and distillate both rise following the crude oil price shock, though the price increases do not cover the entire cost increase as refiner profits fall after the shock. The amount of the increase that can be “passed through” to consumers appears to vary over the year. The crack spread graph reflects this, as it shows that although refiners are immediately hurt by the crude oil shock, they recover during the summer months by reducing their utilization rates before the spread falls again in September with weaker product demand.

Table 1.6: The Effect of a 25% Loss in Capacity

Percent Change       Actual Elasticity    More Elastic
Gasoline Price       15.9                 3.0
Distillate Price     9.8                  2.0
Crack Spread         47.9                 11.9
Consumer Surplus     -69.0                -17.6
Refiner Profit       15.4                 -4.8
Total Welfare        -11.1                -11.3

Comparing the two levels of demand sensitivity, we see that refiners are less able to pass on the crude price increase to more sensitive consumers, and thus their crack spread is dramatically reduced immediately following the shock. In addition to analyzing the effects on prices and profit margins, it is interesting to calculate the distribution of welfare between consumers and refiners. Total welfare declines by 45% in the months following the shock. According to table 1.5, overall welfare falls for both the actual and more sensitive elasticity estimates, although more sensitive consumers end up with a larger share of the surplus following the shock.
Figure 1.18: Capacity Counterfactual: Simulation
[Top panel: actual and counterfactual capacity by month. Below, two columns of panels (Actual Elasticity, More Elastic) showing gas price, utilization rate, and crack spread by month.]

Figure 1.18 and table 1.6 display the results of my second counterfactual experiment, in which I reduce the size of the average refinery by 25%. Again, the table shows the average response to the shocks and figure 1.18 shows the longer-term effects for different levels of demand sensitivity. My counterfactual assumes that all refiners are hit equally hard by the shock, though in reality, some plants close completely while others operate even more intensively following events like Katrina.

The impact of the shock on the crack spread depends strongly on the demand elasticity. With the crude oil price the same in both cases and the percentage increases in the prices of gasoline and distillate about five times higher at the actual elasticity, the refiners facing more sensitive consumers benefit immediately following the shock, though the longer-term crack spread is higher for the less sensitive consumer group. Utilization rates change only slightly following the shock and the real cost is borne by consumers in the form of gasoline prices, which rise by almost 16%, reducing consumer surplus by 69%. In terms of the distribution of welfare, the overall pie decreases by about the same amount in both cases, but at the actual elasticity, the increase in profits at operating refineries partially offsets the loss in consumer surplus. However, the more sensitive consumers retain a larger proportion of welfare following the shock. It’s important to note that my measure of total welfare puts equal weight on consumer surplus and refiner profit and makes no consideration for the variability of prices faced by consumers. Given the economy’s extraordinary reliance on gasoline, an extra dollar per gallon paid at the pump may hurt consumers more than it helps refiners.
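If total welfare is the unweighted sum of consumer surplus and refiner profit, the percent changes in tables 1.5 and 1.6 pin down the consumers' share of welfare before and after each shock. Writing s for the baseline consumer share and g for proportional changes, g_W = s * g_CS + (1 - s) * g_profit, so s = (g_W - g_profit) / (g_CS - g_profit). The Python sketch below applies this identity to the table values; the resulting shares are back-of-the-envelope implications of rounded table entries, not figures reported in the text.

```python
# Percent changes transcribed from Tables 1.5 and 1.6, as fractions.
TABLES = {
    ("crude +20%", "actual"): {"cs": -0.583, "profit": -0.371, "welfare": -0.452},
    ("crude +20%", "more elastic"): {"cs": -0.341, "profit": -0.708, "welfare": -0.497},
    ("capacity -25%", "actual"): {"cs": -0.690, "profit": 0.154, "welfare": -0.111},
    ("capacity -25%", "more elastic"): {"cs": -0.176, "profit": -0.048, "welfare": -0.113},
}

def baseline_consumer_share(g_cs, g_profit, g_welfare):
    """Consumer share of welfare before the shock, assuming W = CS + profit."""
    return (g_welfare - g_profit) / (g_cs - g_profit)

def post_shock_consumer_share(share, g_cs, g_welfare):
    """Consumer share of welfare after the shock."""
    return share * (1.0 + g_cs) / (1.0 + g_welfare)

results = {}
for key, g in TABLES.items():
    before = baseline_consumer_share(g["cs"], g["profit"], g["welfare"])
    after = post_shock_consumer_share(before, g["cs"], g["welfare"])
    results[key] = (before, after)

for (experiment, demand), (before, after) in results.items():
    print(f"{experiment:14s} {demand:13s} share before {before:.2f}, after {after:.2f}")
```

Under this identity the post-shock consumer share is larger in the more elastic case for both experiments, which is in line with the statement that more sensitive consumers retain a larger proportion of welfare.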

1.8 Conclusion
In this paper, I have developed and estimated a new dynamic model of the US oil refining industry. Energy markets, and in particular, the production and distribution of gasoline, are a hot topic in both academic research and the popular media. While the focus has tended to be on the upstream supply of crude oil (from both foreign and domestic sources) and the downstream retail stations, relatively little attention has been given to the role that oil refiners play in the industry. My analysis helps clarify and quantify the crucial role of the refiners in the transmission of crude oil and capacity shocks into downstream product prices, refiner profits, and consumer surplus.

The model matches the historical data and provides reasonably good out-of-sample predictions of key variables. I show that refiners are only partially able to pass through crude oil shocks to consumers and this ability varies across months of the year. As consumers have become more sensitive to changes in the price of gasoline, refiners face an even tougher competitive environment. Capacity disruptions, such as those caused by hurricanes, increase industry profits because the resulting price increase outweighs the loss in profits caused by reduced production. The effect on overall welfare is negative, though fairly small because the large loss in consumer surplus is partially offset by a rise in refiner profits.

My analysis not only models the behavior of refiners and the role they play in an important energy market, it also may have policy implications regarding optimal environmental regulations. In conversations with refiners, I found that current regulatory policies regarding both the building of new plants and the expansion of existing sites are the main hurdle that managers face when making their investment decisions. Regulatory policies have, at the very least, contributed to the current situation where capacity is tight and small shocks can have large effects. Realizing the importance of production flexibility in the refining industry means that new policies must balance responsible environmental concerns with incentives for capacity investment to meet the growing demand for refined products.

There are many extensions to this work that could provide further insights into the industry, though some require access to plant-level data which the EIA is considering making available. While this paper only addresses the production and investment decisions of active firms, including the possibility of exit may improve the model. Firms would likely follow a cut-off rule, exiting if the expected discounted stream of future profits fell below some critical level.

Another potentially important determinant of firm behavior in this industry is a refiner’s relationship with upstream crude oil producers. Currently, 60% of refiners are part of an integrated oil company, and although they benefit from a consistent supply of their major input, they are also constrained by having to exhaust their partner’s stream of crude oil before seeking other, potentially more cost-effective sources. Independent refiners tend to invest in technologies that allow them to utilize different types of crude oil more flexibly, though they may suffer relatively more when there is a supply disruption. Modeling the decisions of each type of refiner and the interaction between the two could help clarify the role of these vertical relationships. I leave these extensions for future work.

Chapter 2 Consumer Search for Online Drug Information

2.1 Introduction
There is a growing availability of medicinal drug information on the internet. A consumer seeking this complicated information faces the additional hurdle that the providers, e.g., drug companies, government regulators, and informational websites, all may have different incentives for providing accurate and unbiased information. While consumers formerly relied on their doctor as the primary source of information about the drugs they were taking, now they increasingly turn to the internet.[1] Use of the internet worldwide nearly doubled between 2004 and 2008.[2] When consumers go online, they are more likely to start with a search engine, as the number of internet users accessing a search engine grew 69% between 2002 and 2008.[3] Thus, it is clear that search engines like Google and Yahoo are an important gateway to the internet. Also between 2002 and 2007, spending on Direct To Consumer Advertising (DTCA) for prescription drugs by pharmaceutical companies doubled, with a small but growing portion of the spending going online via banner ads and paid search advertising.[4] The pharmaceutical company GlaxoSmithKline spent $2.5 billion on advertising in 2007, of which $29 million (1.1%) was online spending.[5] A policy initiated by the Food and Drug Administration (FDA) in 1997 allowed detailed drug information to move to the internet with only essential side-effects and information provided in a television advertisement. The following year, DTCA on television more than tripled.[6]

[1] “In 2007, 56% of American adults – more than 122 million people – sought information about a personal health concern from a source other than their doctor, up from 38%, or 72 million people, in 2001.” (HSC August 2008). According to another survey, “approximately 40% of respondents with internet access reported using the internet to look for advice or information about health or health care in 2001.” (JAMA 2003).
[2] http://www.allaboutmarketresearch.com/internet.htm. 0.757 billion in May 2004 compared to 1.463 billion in June 2008.
[3] http://pewinternet.org/pdfs/PIP_Search_Aug08.pdf. Pew Internet and the American Life Project (2008).
[4] Source: TNS Media Intelligence.
[5] Source: www.Adage.com. Note this does not include paid search advertising.
[6] Television DTCA increased from $168 million in 1997 to $613 million in 1998.

My goal is to determine how consumers search for information and what characteristics of their query may determine how they navigate through the engine’s results. This analysis focuses on the click behavior of consumers using AOL’s internet search engine. I look at searches for brand name prescription drugs and those for consumer electronics as a comparison group. There are many reasons that consumers search and, like drug queries, a search for an electronics product may be motivated by a desire for product information which may lead to a purchase decision. Restricting to a specific group of products also allows me to define search sessions, discussed below, which are more difficult to determine in the entire universe of search queries.

Within drug queries, I analyze the effects of DTCA, drug age, and other drug characteristics (such as drug class) on consumer search. Given that consumers search, I also analyze how they do it: how in-depth (length of a search session, number of clicks, session time) and which types of links they click (extensions, ranks). I also analyze the different drill-down behavior between drug and electronics searches. This is the frequent practice by users of submitting a query, processing the results, which may include clicking on one or more links, and then revising their initial search query.

I focus on drug-related search because typical consumers have limited information about the drugs that they are taking or are thinking about taking. The information also has many dimensions such as efficacy, side-effects, and interactions with other medications. Consumers face a wide variety of information sources both online and offline. My study complements the analysis in Day (2006), which investigates how consumers process and understand drug information via offline DTCA, though I only consider their initial search. As a result, consumers’ understanding of the information they find is only relevant for this study in how it affects the way they search. For example, if consumers are frequently unsuccessful in finding the information they seek on dot-com sites, they may be more likely to click on other extensions in future search sessions.

The remainder of this paper is organized as follows. In section 2, I provide a description of the data which includes click-through data from AOL, drug information from the FDA, and advertising data from TNS Media Intelligence. Sections 3 and 4 include a descriptive and regression analysis on which types of search results are popular with consumers and how DTCA affects online search behavior, both in terms of the frequency and the intensity of search. Section 5 concludes and provides some directions for future work.


2.2 Data
AOL Click-Through Data

I focus on search and click-through data from AOL which span a period from March to May, 2006. The data come from AOL Research, who posted the data on the web for research purposes on August 4, 2006. Due to privacy concerns, AOL later removed their own link to the data, but it is still available for download on many other websites.[7] The data have been used to study several topics including the determinants of search and how social networks could improve search engine performance.[8] To ensure privacy protection, I do not use any information specific to individual users and only report aggregate statistics in this paper.

The data are a representative sample of over 650,000 AOL users and include an anonymous user id, a date/time-stamp, a search query, and if the user clicked on a result, the domain portion of the click-through URL and its rank.[9] An overview of this data can be found in Chowdhury et al. (2006). In this analysis, I use the term query for a search event, which may or may not be followed by a click-through on a subsequent search result. If a user submits a query, clicks on a result, and then returns to the same search page (e.g., by clicking back on her internet browser) and clicks again, two observations are reported in the dataset with the same time stamp. If a user clicks on a result on page one of the search results (ranks 1-10), and then moves to page two, two observations are reported, but with different time stamps. Only organic results and not sponsored/paid results are included in the AOL database.

Drug Information

To create the database of queries, I use the FDA’s Orange Book, which includes all drugs that have been approved by the FDA and attributes of each. These include the drug’s age (years since FDA approval), drug class (16 broad classes), drug type (prescription, over-the-counter (OTC), or discontinued), and an indicator if the drug is the Reference Listed Drug (RLD).[10] I select queries appearing in the AOL database that contain an FDA brand name somewhere in the query (i.e., it may appear among other terms). Of the 23,390 drug brand names appearing in the FDA Orange Book, 514 appear in the AOL database and account for 65,038 queries.

Advertising Data

I also gather data on DTCA for each drug in the sample. I have monthly data from 1994 through 2008 from TNS Media Intelligence. In 2008, the data include 327 drugs and the advertising expenditure is broken down by media type. Figures 2.1 and 2.2 display the growth of DTCA over time and the distribution of 2008 spending across media types. The growth of total DTCA is clearly evident and although TV and magazine advertising accounts for over 95% of total expenditures, spending on the internet is a new and growing outlet. TNS only reports internet ad spending on display or banner ads which appear, for example, across the top of many websites and some search engines. It does not include spending on sponsored/paid search results, which is reported to be twice the size of display ad spending.[11]

[7] See http://www.gregsadetsky.com/aol-data/.
[8] See http://www.cond.org/applications/paper3.pdf and http://www.stanford.edu/~koutrika/res/Publications/2008_wsdm.pdf.
[9] If a user clicked on the link www.fda.gov/drug/warnings.html, only www.fda.gov is reported.
[10] A drug is an RLD if it is used as the chemical standard when generic versions of the drug are developed. New drugs have to be “bio-equivalent” to the RLD to gain approval by the FDA.
Figure 2.1: Total DTCA Spending on all Prescription Drugs
[Total annual direct-to-consumer advertising for all prescription drugs, in millions of dollars, 1994 to 2008, by media type: TV, magazines, newspapers, internet, radio.]

Figure 2.2: DTCA Breakdown by Media Type
[2008 DTCA by media: TV 62.6%, magazines 32.0%, internet 2.9%, newspapers 2.1%, radio 0.4%, outdoor 0.1%.]

[11] “Gap Widens in Online Advertising,” The Wall Street Journal, September 4, 2008.

Electronics

For electronics queries, I combined lists from consumer reports on popular electronics product and brand names with a list of manufacturers from tigerdirect.com, a major seller of consumer electronics. This resulted in 804 potential consumer electronics queries, of which 126 appear in the AOL database and account for 509,833 queries.

Generating Search Sessions

One challenge with analyzing search behavior on the internet is to group a sequence of potentially changing queries and click-throughs together to form a search session. Grouping identical queries together is frequently insufficient because users often revise their queries throughout a session. Therefore, I consider the following three approaches for defining a search session:

1. A sequence of queries with or without click-through such that the query is identical and the time between queries is less than one hour. The query needs to contain one of the drug brand names or electronics product words, but it may also include other words. However, the overall query may not change within a session, which means the list of search results that the user sees is not changing. I use this definition for determining the popularity and transitions between website extensions and ranks.

2. A sequence of queries with or without click-through such that two adjacent queries are in the same session if any of the words appearing in the first query also appear in the second query. The time between queries is less than one hour.[12] I use this definition for the “All Queries” column of table 2.1.

3. A sequence of “query-topics” with or without click-through where a keyword (such as a drug brand name) appears in all queries though other words may appear and change throughout the session. Again the time between adjacent queries must be less than one hour. This definition captures the drill-down behavior that users often exhibit when performing a search. I use this session definition for all other tables and regressions in the analysis.

[12] There is a potential weakness in this definition. The three queries, “flights to Europe”, “discount flights to London”, “hotels in London” would all be classified in the same session though it is likely that the intent of the search changed in the third query.
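The third session definition can be sketched in a few lines of Python. The example below is a simplified illustration, not the procedure actually run on the AOL data: the event tuples, user ids, and the keywords "drugx" and "drugy" are hypothetical, keywords are matched as whole lowercase tokens (so multi-word brand names are not handled), and a session is keyed on a user and keyword pair.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(hours=1)  # adjacent queries must be less than one hour apart

def sessions_by_keyword(events, keywords):
    """Group (user_id, timestamp, query) events into keyword sessions.

    A query joins the user's current session for a keyword when it contains
    that keyword and follows the previous such query by less than MAX_GAP.
    """
    sessions = []   # list of (user_id, keyword, [queries])
    last = {}       # (user_id, keyword) -> (last timestamp, session query list)
    for user, ts, query in sorted(events, key=lambda e: (e[0], e[1])):
        words = query.lower().split()
        for kw in keywords:
            if kw not in words:
                continue
            key = (user, kw)
            prev = last.get(key)
            if prev is not None and ts - prev[0] < MAX_GAP:
                prev[1].append(query)          # drill-down within the same session
                last[key] = (ts, prev[1])
            else:
                queries = [query]              # gap too long, or first query seen
                sessions.append((user, kw, queries))
                last[key] = (ts, queries)
    return sessions

# Hypothetical events for illustration only.
EVENTS = [
    (1, datetime(2006, 3, 1, 10, 0), "drugx"),
    (1, datetime(2006, 3, 1, 10, 5), "drugx side effects"),
    (1, datetime(2006, 3, 1, 12, 0), "drugx"),
    (2, datetime(2006, 3, 1, 10, 1), "drugy dosage"),
]
KEYWORDS = ["drugx", "drugy"]

sessions = sessions_by_keyword(EVENTS, KEYWORDS)
multi_query_share = sum(len(q) > 1 for _, _, q in sessions) / len(sessions)
```

Definition one would follow from the same loop by requiring the whole query string, not just the keyword, to match the previous query in the session.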

2.3 Descriptive Analysis
Table 2.1 displays basic descriptive statistics of the AOL database including all queries and breakdowns for electronics and drug-related queries. Note that for the first column, I define sessions using method two, while for the drug and electronics sessions, I use method three. The method used for the all queries column is the most liberal in grouping adjacent queries together in a session, which should increase the number of multiple-query sessions. However, there are also many single-query sessions in the overall sample, which actually results in relatively fewer multiple-query sessions compared with drugs and electronics. Compared with electronics sessions, drug sessions are more likely to feature multiple clicks and are shorter in time, though longer in the number of clicks per session. As a result, the turn-over or average time between clicks is shorter for drug sessions than for electronics. This may be the result of a user in a drug-related session seeking a specific piece of information while an electronics session may involve a user attempting to get general information about a product. Electronics queries


Basic Statistics: AOL User Data

                                  All Queries  Electronics Queries  Drug Queries
Observations (Queries)             35,383,114              509,833        65,038
Click-Throughs                     19,133,334              281,557        44,885
Num. Sessions                      16,548,366              245,988        28,679
Users                                 651,559              110,261        17,459
Unique Query-Topics                                            126           514
Mean Users Per Query-Topic                                  875.09         33.97
Mean Queries per Session                 2.14                 2.07          2.27
Mean Session Length (Minutes)            2.71                 3.55          3.00
Mean Time Between Queries                2.38                 3.32          2.36

Multiple Query Sessions
Num. Sessions                       6,232,843               95,298        12,442
Proportion Mult Query Session             38%                  39%           43%
Mean Queries per Session                 4.02                 3.77          3.92
Mean Session Length (Minutes)            7.19                 9.16          6.90
Mean Time Between Queries                2.38                 3.31          2.36

Table 2.1: Basic Statistics

are also dominated by several very popular terms as on average there are 875 users searching each query topic compared with only 34 for drug-related searches. Table 2.2 is a breakdown of search activity and advertising by drug class.[13] The two largest classes account for over 42% of the search sessions but only 26% of the advertising spending. The lack of strong correlation between advertising and search is surprising if television ads (the largest component of DTCA) direct consumers to seek more information on the web. I will further investigate this relationship in the regression section below. Two further slices of the data appear in tables 2.3 and 2.4 which show search and advertising activity by age and drug type respectively. Interestingly, though there is fairly high spending for younger drugs (1-3 years old), there is also relatively large DTCA on older drugs with the most on drugs that are 8 years old. Search
[13] Tables displaying the 20 most actively searched and advertised drugs in the sample can be found in tables B.1 and B.2 in the appendix.


Search Activity by Drug Classes

                               Class  Num. of   Num. of  Num. of  Mean Queries  Ad Spending
Drug Class                      Num.    Drugs  Sessions  Queries   Per Session   (Millions)
central nervous system agents      1       90     7,065   17,040          2.41      $666.70
psychotherapeutic agents           2       33     5,080   11,670          2.30      $220.17
metabolic agents                   7       37     2,752    5,869          2.13      $474.76
anti-infectives                    5       74     2,560    5,437          2.12       $92.69
cardiovascular agents              8       50     2,192    4,378          2.00       $32.24
miscellaneous agents               3       21     1,825    5,189          2.84      $353.92
hormones                           6       48     1,709    3,996          2.34      $344.22
antineoplastics                    9       36     1,299    3,113          2.40       $61.10
respiratory agents                10       21     1,283    2,514          1.96      $238.51
gastrointestinal agents           11       23     1,213    2,083          1.72      $417.57
topical agents                     4       49     1,014    2,215          2.18      $452.17
coagulation modifiers             12        7       460    1,040          2.26      $110.21
not applicable                    16       18        95      280          2.95        $0.10
nutritional products              14        5        91      151          1.66        $0.45
immunological agents              13        2        41       65          1.59        $0.17
biologicals                       15                                                  $0.00
Total                                     514    28,679   65,040          2.27    $3,464.98

Ad spending is total expenditure on all forms of DTCA in 2005.

Table 2.2: Search Activity by Drug Class

activity is fairly evenly spread among the different aged drugs though most activity is, again, on the 8 year old subset. The breakdown by drug type reveals far more activity on prescription drugs and even those classified as discontinued as compared with OTC drugs.[14] Non-innovator drugs, or drugs that are not designated as the RLD, receive a large share of both search activity and slightly more advertising spending. Next, I analyze the search activity in the sample by looking at the popularity and transitions between various website extensions and ranks. Figures 2.3 and 2.4 display the percentage of clicks on each extension class of website and the percentage of clicks (within the first 10 clicks of a session) on each search result rank. I see that users in drug sessions click on relatively fewer dot-com results compared with electronics related searches. As expected, there is more attention paid to dot-gov and dot-org/net/info sites and this continues to grow in longer sessions (not shown).
[14] The advertising data only includes spending on prescription drugs (hence the $0 for OTC advertising spending), though in some cases I see spending on an OTC drug that has the same trade name as a prescription drug.
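The extension classes used in the figures and transition tables (com, gov, edu, org/net/info, us/uk/ca, other) can be assigned from a clicked URL. The sketch below is one possible way to do this and is an assumption, not the procedure used in the analysis; the function name and the handling of URLs without a scheme are illustrative choices.

```python
from urllib.parse import urlparse

# Extension classes as they appear in the figures and transition tables.
EXTENSION_CLASSES = {
    "com": "com",
    "gov": "gov",
    "edu": "edu",
    "org": "org/net/info",
    "net": "org/net/info",
    "info": "org/net/info",
    "us": "us/uk/ca",
    "uk": "us/uk/ca",
    "ca": "us/uk/ca",
}


def extension_class(url):
    """Map a clicked URL to its extension class; anything unlisted is "other"."""
    # urlparse only fills in the hostname when a scheme or "//" is present.
    if "//" not in url:
        url = "//" + url
    host = urlparse(url).hostname or ""
    top_level = host.rsplit(".", 1)[-1]  # text after the last dot
    return EXTENSION_CLASSES.get(top_level, "other")


apple_class = extension_class("www.apple.com")
fda_class = extension_class("http://www.FDA.gov/drugs")
```

Here `www.apple.com` is classed as com and the FDA address as gov; a site such as `example.de` would fall into other.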


Search Activity by Drug Age

        Num. of   Num. of   Num. of   Ad Spending
Age       Drugs  Sessions   Queries    (Millions)
<1           35       609     1,368         $1.02
1            26     1,636     3,434       $442.72
2            30     1,171     2,668       $393.88
3            31     2,062     4,437       $287.30
4            25     1,064     2,481       $110.56
5            36     1,438     2,954       $263.13
6            25     1,555     3,503        $35.76
7            30     1,693     4,364         $8.73
8            31     2,771     6,706       $589.13
9            38     2,352     4,724       $317.41
10           23     1,285     3,262       $249.17
11           18       708     1,588       $115.27
12           17     1,110     2,344         $8.70
13           17     1,349     2,938       $323.53
14           16     1,424     2,976        $95.25
15            9       271       484         $5.30
16            6       280       561         $0.77
17            8       131       310         $0.22
18            8       570     1,273         $0.52
19            8       164       398         $0.32
20            9       566     1,389       $108.24
21            8       164       314         $0.61
22            1       384     1,127         $0.00
23            3       123       258         $0.00
24            3       316       867         $0.20
>24          53     3,483     8,312       $107.23
Total       514    28,679    65,040     $3,464.98

Ad spending is total expenditure on all forms of DTCA in 2005. Age is equal to years since FDA approval to March 2006.

Table 2.3: Search Activity by Drug Age

The rank popularity figure shows that attention by rank in drug sessions is less skewed toward the number one ranked sites. Users are more likely to click further down in the search results. The spike in the rank one popularity for electronics sessions is mostly driven by navigational searches (e.g., a search for apple.com and


Search Activity by Drug Type

                     Num. of   Num. of   Num. of   Ad Spending
Type                   Drugs  Sessions   Queries    (Millions)
Prescription             411    23,467    52,796        $3,163
Over-the-Counter          10       584     1,289           N/A
Discontinued              93     4,628    10,955          $302
Non-Innovator/RLD        267    18,023    41,214        $1,915
Innovator/RLD            247    10,656    23,826        $1,550

Ad spending is total expenditure on all forms of DTCA in 2005. An innovator is the original developer (pioneer) of the drug.

Table 2.4: Search Activity by Drug Type
[Bar chart: percent of clicks (vertical axis, 0% to 90%) by extension (com, gov, edu, org/net/info, us/uk/ca, other), with separate bars for drug and electronics sessions.]

Figure 2.3: Extension Popularity in the First 10 Clicks

immediate click on the first search result, www.apple.com). I also investigated how users make the transition between extensions and ranks in multiple click sessions. For tables 2.5 and 2.6, I generate sessions using method one to guarantee that the set of search results seen by the user on each click is identical. The table shows that transitions within extensions are less likely in drug search sessions compared with electronics sessions, though they both feature approximately the same exit rates. Transitions from other extensions to dot-gov and

[Bar chart: percent of clicks (vertical axis, 0% to 45%) by search result rank (1 through 10, and 11+), with separate bars for drug and electronics sessions.]

Figure 2.4: Rank Popularity in the First 10 Clicks
Drug Queries
                                         EXTENSION (t+1)
EXTENSION (t)      com    gov    edu  org/net/info  us/uk/ca  other   exit  Total
com              52.1%   2.9%   1.2%          9.6%      1.6%   3.0%  29.6%   100%
gov              41.2%  14.2%   1.2%         11.5%      1.5%   1.8%  28.6%   100%
edu              36.6%   3.1%  13.9%         12.3%      2.3%   4.4%  27.3%   100%
org/net/info     40.6%   3.6%   2.2%         20.7%      2.0%   2.9%  28.1%   100%
us/uk/ca         37.1%   3.3%   1.8%         11.7%     11.1%   4.8%  30.3%   100%
other            40.9%   1.6%   1.8%          9.0%      2.5%  17.1%  27.1%   100%

Electronics Queries
                                         EXTENSION (t+1)
EXTENSION (t)      com    gov    edu  org/net/info  us/uk/ca  other   exit  Total
com              61.0%   0.1%   0.5%          3.9%      2.0%   1.9%  30.6%   100%
gov              31.9%  20.8%   3.1%         10.2%      2.9%   1.5%  29.5%   100%
edu              36.5%   1.1%  22.8%          6.3%      3.6%   3.5%  26.3%   100%
org/net/info     42.8%   0.4%   1.1%         21.6%      2.4%   3.2%  28.5%   100%
us/uk/ca         49.0%   0.2%   1.2%          5.2%     14.8%   3.0%  26.7%   100%
other            39.7%   0.1%   0.7%          5.6%      2.6%  20.7%  30.6%   100%

These tables include the probability of transitioning from one extension to another during a search session. All search sessions are included as long as the user clicked on at least one link. A session is defined as a sequence of clicks following the identical query where the time between clicks is less than 60 minutes.

Table 2.5: Transitions between extensions

dot-org/net/info sites are more likely in drug searches. The table on rank transitions reveals that users in electronics sessions are more likely to find what they are looking for on the rank one result as they are more

Drugs
                                RANK (t+1)
RANK (t)         1       2       3       4      5+    exit   Total
1             8.5%   26.6%   14.8%    9.4%   23.5%   17.1%    100%
2            11.2%    5.0%   19.6%   11.5%   26.4%   26.3%    100%
3             8.7%    6.1%    4.2%   18.2%   34.4%   28.4%    100%
4             6.8%    4.6%    4.5%    4.6%   49.0%   30.6%    100%
5+            3.5%    2.2%    1.7%    1.9%   56.3%   34.4%    100%

Electronics
                                RANK (t+1)
RANK (t)         1       2       3       4      5+    exit   Total
1            25.0%   17.4%    9.9%    5.9%   14.9%   27.0%    100%
2            12.5%   11.8%   17.6%    9.1%   21.0%   28.1%    100%
3             8.9%    6.3%    9.6%   16.6%   30.8%   27.7%    100%
4             7.0%    4.3%    4.9%    8.0%   46.0%   29.7%    100%
5+            4.3%    2.2%    2.1%    2.0%   55.9%   33.5%    100%

These tables include the probability of transitioning from one rank to another during a search session. All search sessions are included as long as the user clicked on at least one link. A session is defined as a sequence of clicks following the identical query where the time between clicks is less than 60 minutes.

Table 2.6: Transitions between ranks

likely to revisit it immediately (rank 1 to rank 1). They are also more likely to exit following a click on rank one. Given that sessions here are defined as unique queries, it may also be that electronics users are more likely to reformulate/refine their query after clicking on the rank one result, which is classified as an exit. This turns out to be the case, as shown in the analysis of the potential for query reformulation, or “drill-down,” in figure 2.5. Several key features can be seen in the figure depicting drill-down behavior. Drug sessions are more likely than electronics sessions to involve a query followed by a click versus a query without a resulting click. Following a query with or without a click, users in drug sessions are more likely to issue the same query, less likely to


[Tree diagram. Each cell below reads Drug Queries / Electronics Queries.]

Last observation                   Same Query   Revised Query        Exit
Query with Click (69% / 55%)        54% / 47%         6% / 9%   40% / 44%
Query without Click (31% / 45%)     33% / 27%       14% / 19%   54% / 54%

Percentages represent the likelihood of each event conditional on a query with (or without) a click on the last observation.

Figure 2.5: Drill Down Behavior

revise their query, and approximately equally likely to exit as electronics users. If a user submits a query and clicks on a result, they are more likely to maintain the same query on the next click, less likely to revise and less likely to exit the search. It appears that query revisions are an important part of search behavior, but relatively more popular in electronics sessions compared with drug-related sessions. Next, I turn to a more detailed analysis of consumers’ search behavior where I investigate the determinants of both the frequency and intensity of drug related search. Without data on product attributes or advertising for consumer electronics products, the next section will focus only on drug-related search.
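Transition shares like those in tables 2.5 and 2.6 can be computed from click sequences as row-normalized counts. The sketch below is illustrative only: the function name, the treatment of the end of a session as a move to "exit", and the example sessions are assumptions rather than the code behind the tables.

```python
from collections import Counter, defaultdict


def transition_shares(sessions):
    """Return row-normalized shares of moving from state t to state t+1.

    Each session is a list of states (extension classes or ranks) in click
    order. The end of a session is counted as a move to "exit".
    """
    counts = defaultdict(Counter)
    for clicks in sessions:
        for current, following in zip(clicks, clicks[1:] + ["exit"]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


# Made-up click sequences for illustration only.
example_sessions = [
    ["com", "gov"],
    ["com", "com", "org/net/info"],
    ["gov"],
    ["com"],
]
shares = transition_shares(example_sessions)
```

Each row of `shares` sums to one, matching the 100% row totals in the tables. The same function applies to rank transitions if each session is a list of rank labels such as "1" or "5+".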


2.4 Regression Analysis
In this section, I report regression results explaining the determinants of consumers’ search patterns. I look at drug-level regressions to determine how drug attributes affect search. Then in the session-level probit regressions, I determine how the intensity of search is affected by drug attributes and DTCA. A description of all variables included in these regressions is shown in table B.3 in the appendix.

2.4.1 Frequency Regressions
In the following set of results, I assess how DTCA and drug characteristics affect the frequency of search using drug-level data. I include the effects of both overall DTCA and also each individual media category that is available. This breakdown by media is important because DTCA via different media channels may have different effects on consumer search patterns. Television advertising, especially since the new FDA regulations in 1997 lessening the requirements on what needs to be conveyed during the ad, tends to only highlight the main benefits and potential side effects of a drug. Magazine ads usually include two pages: one with the highlights of the drug in full color and dramatic fonts, and the other with the details in fine print. The internet ads captured in the data are so-called banner ads and would likely have a similar effect to television DTCA with only the highlights presented. The same could be said for radio and outdoor ads while DTCA in newspapers is likely presenting similar information to magazine ads. Therefore television, internet, radio and outdoor ads, given their lack of detailed information, may have a stronger


positive effect on search compared to magazines and newspapers. Since I only observe DTCA on prescription drugs, I restrict the analysis by excluding OTC drugs. I investigate the effects of a drug’s age on search as well as if the drug is designated as the RLD. Table 2.7 displays the results of a regression where the dependent variable is “total sessions,” or the total number of search sessions I observe in the dataset over the three-month period for a given drug. Sessions are defined using method three, so they allow for keywords to change throughout the session as long as the drug name appears in each query.
Dependent Variable: Total Sessions

                  DTCA - Stock        Components - Stock
Parameter         Estimate      SE     Estimate      SE
Intercept         59.19***   23.10    155.30***   27.82
age                   0.26    0.57       1.20**    0.56
dtca               5.17***    0.75
alltv                                   2.71***    0.75
allmags                                    0.85    0.71
allnewsp                                  -0.26    0.77
allradio                                 2.15**    0.98
outdoor                                    1.51    1.42
internet                                3.74***    0.80
rld                -11.70*    8.33     -16.06**    7.86
observations           510                  510
R2                    0.21                 0.31

Notes: ***, **, * Significant at the 1%, 5%, and 10% level respectively. Drug class fixed effects included but not shown. All advertising variables are in logs.

Table 2.7: Regression Results - Frequency of Search
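Because the advertising variables enter in logs while the dependent variable is a count of sessions, the estimates in table 2.7 can be read as level-log effects. The following is only an illustrative reading, assuming natural logs and a specification that is linear in the log of the DTCA stock, with all other regressors held fixed:

```latex
\Delta \text{Sessions} \approx \hat{\beta}_{dtca} \, \Delta \ln(\text{DTCA})
= 5.17 \times \ln(1.10) \approx 5.17 \times 0.0953 \approx 0.49
```

Under those assumptions, a 10% larger advertising stock corresponds to roughly half of an additional search session per drug over the three-month window.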

The two columns of the table each contain a different breakdown of DTCA. Specification one includes the cumulative stock of advertising on a drug from January 1994 through February 2006, just prior to the time-frame of the AOL click-through data. The second specification includes a breakdown of spending by media type. I


attempt to limit the endogeneity that may exist between DTCA and search activity by including only DTCA spending prior to the period I observe the search sessions. In addition, most DTCA is offline with only 3% of total DTCA in the form of online spending.[15] Overall DTCA is positive and significant, meaning that increased ad spending leads to an increase in the number of search sessions performed on a drug. Focusing on the breakdowns by media category reveals that television, internet, and radio have positive and significant effects, consistent with the notion that these ads provide relatively less detailed information and may leave a consumer wanting to seek out additional sources. Spending on outdoor advertisements is insignificant, though the result may be misleading due to this category being the smallest of the types. Also as expected, newspaper and magazine spending is largely insignificant, which may imply that consumers are able to find all the information they need in these ads. The drug’s age since original FDA approval is positive in both regressions (and significant in the second), which is somewhat surprising, but it may be driven by a few older but very popular drugs. Finally, a drug that is the innovator or pioneering version of a medicine reduces the number of search sessions and drug class fixed effects (not shown in the table) are largely insignificant, though central nervous system drugs and psychotherapeutic agents are searched upon relatively more frequently. Jin and Iizuka (2005) find that the effect of a drug’s DTCA on the propensity of consumers to visit their doctor regarding that drug depreciates by only
[15] Online spending on paid-search advertising is larger, though not included in the data.


about 4% per month. However, Jin and Iizuka (2007) find that the effect of DTCA on the likelihood that a doctor prescribes a drug is small and depreciates almost immediately. In table 2.8 I present the results of two additional specifications which assess the rate at which DTCA spending depreciates in terms of its influence on search activity.
Dependent Variable: Total Sessions. Depreciation Analysis

                     DTCA - Quarters     Components - Quarters
Parameter             Estimate      SE     Estimate      SE
Intercept            115.30***   23.26    431.57***  118.21
age                    1.77***    0.55      1.74***    0.52
dtca_1_qtrb4           3.15***    1.16
dtca_2_qtrb4           3.29***    1.39
dtca_3_qtrb4              0.76    1.35
dtca_4_qtrb4              0.65    1.09
alltv_1_qtrb4                               8.04***    2.22
alltv_2_qtrb4                               -6.54**    3.01
alltv_3_qtrb4                                 -1.37    3.10
alltv_4_qtrb4                                  2.64    2.30
allmags_1_qtrb4                                1.25    1.24
allmags_2_qtrb4                               2.02*    1.47
allmags_3_qtrb4                               -0.38    1.41
allmags_4_qtrb4                               -1.43    1.17
allnewsp_1_qtrb4                            8.57***    2.63
allnewsp_2_qtrb4                               0.60    2.18
allnewsp_3_qtrb4                             -3.77*    2.34
allnewsp_4_qtrb4                               2.61    2.61
allradio_1_qtrb4                              -1.34    3.87
allradio_2_qtrb4                              -4.48    3.69
allradio_3_qtrb4                            10.24**    4.47
allradio_4_qtrb4                              -2.53    3.97
outdoor_1_qtrb4                            15.84***    4.40
outdoor_2_qtrb4                               -8.74    8.10
outdoor_3_qtrb4                             -9.17**    4.58
outdoor_4_qtrb4                             12.42**    5.63
internet_1_qtrb4                             3.30**    1.73
internet_2_qtrb4                               0.04    2.07
internet_3_qtrb4                               0.13    2.32
internet_4_qtrb4                              2.63*    1.99
rld                  -22.50***    7.94    -18.15***    7.54
observations               510                  510
R2                        0.30                 0.43

Notes: ***, **, * Significant at the 1%, 5%, and 10% level respectively. Drug class fixed effects included but not shown. All advertising variables are in logs.

Table 2.8: Regression Results - Depreciation Analysis

The first regression specification includes overall DTCA separately for the


last four quarters and the second displays the results of a similar regression on the components of DTCA. The results show that after two quarters (or six months), the positive and significant effect of DTCA disappears. However, in the regression on the advertising components, I see that although television and internet advertising are very effective in the most recent quarter, the effect is zero or even negative for quarters two through four. While magazine spending remains insignificant, spending on newspapers in the most recent quarter is strongly positive and significant and then fades for less recent spending.

2.4.2 Depth Regressions
In addition to investigating the influence of DTCA on search frequency, I also present evidence on how advertising affects the intensity of search. Consumers who are exposed to a drug advertisement on television may go to a search engine for additional details about the drug or information relating to price and purchase availability. This information may come from multiple sources such as the pharmaceutical companies (e.g., pfizer.com), government sites (e.g., FDA.gov), and advertising-driven medical information sites (e.g., webmd.com). Different forms of offline advertising may affect how intensively a consumer searches these sites. I analyzed several measures of search intensity including the number of clicks in a search session, the length of a session in minutes, and the number of query revisions (drill downs) performed. One important measure, which is reported here, is the likelihood that a search session goes beyond the first page of results. Since


drug information tends to be complicated and has many dimensions, consumers may be more prone to search deep into the results to find accurate and unbiased information about a drug, especially following an advertisement that provides very little detailed information. Table 2.9 displays the results of a probit regression modeling the probability that a user clicks on a result beyond the first page in a search session.
Dependent Variable = 1 if user clicked beyond page 1 of the search results

                             DTCA - Prev. Quarter  Components - Prev. Quarter
Parameter                Mean       dY/dX   z-stat        dY/dX   z-stat
age                    10.000   0.0006***   2.9482    0.0008***   4.2705
dtca                  -10.754   0.0005***   2.5683
alltv                 -12.711                        -0.0008***  -3.1519
allmags               -11.397                           -0.0002  -0.7384
allnewsp              -13.419                         0.0018***   5.6919
allradio              -13.561                         0.0022***   5.2635
outdoor               -13.777                           -0.0294  -0.0765
internet              -12.278                          0.0006**   1.7678
rld                     0.481      0.0016   0.5704       0.0016   0.6100
observations                       28,679                28,679
percent concordant                   52.1                  55.3

Notes: ***, **, * Significant at the 1%, 5%, and 10% level respectively. Drug class fixed effects included but not shown. All advertising variables are in logs.

Table 2.9: Regression Results - Depth of Search

Similar to the regressions in the last section, I report two specifications including the effects of overall DTCA and its components, focusing on the quarter immediately prior to the search sessions. I report the mean of the variables as well as their marginal effects (i.e., the predicted change in the probability for a one unit change in the independent variable at the mean). The z-statistics are also reported.[16]
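For a probit model, the marginal effect of a regressor evaluated at the mean is the standard normal density at the mean index multiplied by that regressor's coefficient. The sketch below illustrates this textbook calculation; the function names and the example coefficient are hypothetical and are not estimates from table 2.9.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def marginal_effects_at_mean(coefs, means, intercept=0.0):
    """Probit marginal effects: phi(intercept + sum_j b_j * xbar_j) * b_k."""
    index = intercept + sum(coefs[name] * means[name] for name in coefs)
    density = normal_pdf(index)
    return {name: density * b for name, b in coefs.items()}


# Hypothetical one-regressor example: coefficient 2.0 evaluated at a mean of 0.
example_effects = marginal_effects_at_mean({"x": 2.0}, {"x": 0.0})
```

For a binary regressor such as rld, this derivative is only an approximation; the difference in predicted probabilities between the values 1 and 0 is a common alternative.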
[16] Note that to measure the fit of the model, I report the percent concordant, which is the percent of observation pairs such that the observation with the higher ordered response corresponds to the higher predicted response. In my sample, only about 5% of sessions involve clicks beyond page one, so the dependent variable in my regression is unbalanced and the predicted probabilities are skewed towards zero. Calculating a pseudo-R2 by defining a correct prediction of success as a


Overall DTCA has a positive and significant effect on the likelihood of a more intense search session. Considering the breakdown by media category in the second specification, positive and significant effects are found for newspaper, radio and internet ads, which is the same result I found in the regressions on search frequency. The negative effect of television ads is consistent with the notion that television ads refer a consumer to the drug’s website for more information and this site often appears high in the ranks. Therefore a consumer simply using the search engine as a navigational tool to reach a predetermined page (e.g., lipitor.com) instead of typing in the URL directly, will have a less intense search session.
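The percent concordant reported as the fit measure in table 2.9 compares every pair made of one session that went beyond page one and one that did not. A minimal sketch follows, assuming that tied predictions are not counted as concordant; the function name and the example values are illustrative.

```python
def percent_concordant(outcomes, predicted):
    """Percent of (1, 0) outcome pairs in which the 1 has the higher prediction."""
    ones = [p for y, p in zip(outcomes, predicted) if y == 1]
    zeros = [p for y, p in zip(outcomes, predicted) if y == 0]
    pairs = len(ones) * len(zeros)
    if pairs == 0:
        raise ValueError("need at least one success and one failure")
    concordant = sum(1 for p1 in ones for p0 in zeros if p1 > p0)
    return 100.0 * concordant / pairs


# Made-up outcomes and predicted probabilities for illustration only.
example_outcomes = [1, 0, 0, 1]
example_predicted = [0.09, 0.04, 0.06, 0.05]
example_concordance = percent_concordant(example_outcomes, example_predicted)
```

In the example, three of the four success-failure pairs are ordered correctly, so the measure is 75. Unlike a 0.5 classification threshold, the measure depends only on the ranking of predictions, which is why it remains informative when most predicted probabilities are close to zero.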

2.5 Conclusion
The analysis has shown that consumers seek diverse information about prescription drugs online and their behavior is influenced by the online and offline advertising to which they are exposed. Offline advertising not only increases the likelihood that a user searches for a drug, but also increases the depth of search within a search session. Consumers searching for drug information also behave differently than those seeking information about consumer products like electronics. Overall, drug sessions tend to feature more clicks on different search results and these clicks come faster than in electronics sessions. It may be that consumers are seeking specific information about a drug and can quickly determine if a search result is going to provide it.
[16, continued] predicted probability above 0.5, as Greene and others suggest, would result in a very high measure of fit, but only because most of the observed and predicted outcomes are zero.


Among the drug searches, activity is evenly spread among younger and older drugs. Advertising spending on those drugs is slightly skewed toward younger drugs though there is still significant spending on drugs that are 8-10 years old. Click patterns within a search session reveal, as expected, more clicks on dot-gov and dot-org/net/info results and the popularity of these sites grows in longer sessions. Consumers may be immediately clicking on the first or second result (usually a dot-com) but then will make a transition away to results with other extensions, perhaps in an effort to seek unbiased information. The distribution of clicks by search result rank also reveals that consumers are more likely to click on lower ranked results further down the results page. In the regression analysis, I analyzed the effects of DTCA on search frequency and depth. Overall DTCA increases both the frequency and depth of search, though the various types of DTCA (via different media) each affect search differently. DTCA that provides only a major statement regarding a drug and few additional details, such as television and internet banner ads, increases the frequency of search. This may be the result of the FDA regulation stating that these ads must direct consumers to seek additional details at the drug company’s website. If they are simply using the search engine to find this site, their search session will likely be very short and the evidence suggests this effect with a significant and negative coefficient on television DTCA in the depth regression. Finally, the analysis of depreciation shows that the effects of DTCA spending disappear after about six months. Television and internet advertising have a strong positive effect on search in the near term, though it quickly fades even after just three

months. Moving forward, I plan to focus on the effects of television ads, the largest class of DTCA, on search activity using a detailed dataset from TNS which includes the exact time and placement of an ad during a broadcast. With the growing accessibility of laptop computers including netbooks, consumers are likely reacting quickly to television advertisements and immediately seeking further information on the internet. Combining this with either the AOL dataset analyzed here, or a new dataset from comScore which also tracks household internet use, I can determine the effects of television DTCA, including how varying demographics influence the effects of an ad on consumer search behavior.


Chapter 3

Drug Information via Online Search Engines

3.1 Introduction
Search engines are the gateway to the internet as 94% of internet users access engines to find information on the web.[1] According to Nielsen Rankings, over 9.5 billion searches were executed on the top 10 search engines in the US in March of 2009, 16.7% higher than the year before. The five largest engines by number of searches are Google (64.2%), Yahoo (15.8%), MSN (10.3%), AOL (3.7%), and Ask (2.1%), with Google driving most of the growth in search.[2] The availability of health care and drug information on the internet is arguably one of the more important areas in need of study given the important public health consequences. In this paper, I document the supply and content of this type of information on four large search engines and across time. Given the vast amount of information on the internet, one could study the supply of search results related to many different industries, though I focus on the prescription drug market. A 2008 Nielsen study found that health websites are consumers’ second most important source of medical information behind their doctor. About 50% of the US internet population visited a health-related website in July of 2008. In the Nielsen study, 82.6% of subjects reported having visited a
[1] See Ghose and Yang (2008).
[2] See www.nielsen-online.com.


website for health information at some time in the past, and a third of those used a search engine to find the information they were seeking. Overall, drug queries involve the potential for users seeking a wide variety of complicated information, so the summary text and the source (domain and extension) of a search result will likely be important determinants of a user’s attention and click behavior. A complication that one faces when studying the supply and demand of information via a search engine is that it is a very dynamic market that is constantly evolving. The supply (search results) influences the demand (consumer search behavior) and vice versa by way of the engine’s ranking algorithm, and this creates an endogeneity problem for the analysis. Since the algorithms are proprietary, it is impossible to know how much, for example, the rank of a search result is purely a function of its relevance to the search query versus a function of the attention garnered from being of a certain rank in the past. One way to mitigate this problem is to average certain metrics across time, which I do frequently in the analysis.

There are two types of search results that appear on a search engine when a user submits a query. Organic results are those generated by the engine’s algorithm as being the most relevant to the user’s query. Relevance is determined differently by each engine and may include determinants such as past click traffic and the number of inbound links to a site from other relevant websites. The title and summary text appearing on the search engine is determined endogenously by the engine itself. Sponsored or paid results are those that appear (at times) above, below, and to the right of the organic results. See Athey and Ellison (working paper) and Varian (2007) for details on the auction mechanism and optimal bidding strategies for

sponsored results.[3] Their placement is driven both by relevance and by the amount that the advertiser has paid to be listed. The title and summary text is chosen by the advertiser. It is often difficult to distinguish between organic and sponsored results, undoubtedly because the search engine generates revenue from them only when a user clicks on a sponsored result.[4] I will analyze the different content and domain extensions between the two types of results, though it is clear that sponsored results tend to be more promotionally driven and, for drug searches, dominated by online pharmacies. Ghose and Yang (working paper) analyze the substitution pattern between organic and sponsored links for a specific website address, or Uniform Resource Locator (URL), and generally find that there are positive and asymmetric spillovers from one type of link to the other.[5] I consider four large search engines: Ask, Google, MSN, and Yahoo. I do not include AOL’s search engine, which has a similar market share to Ask, because through a partnership with Google, AOL uses Google’s algorithm to generate both their organic and sponsored links.[6] Ask also partners with Google to display their sponsored links in addition to Ask’s self-generated links. In the analysis, I show the popularity of different website extensions, also called top-level domains, such as dot-com and dot-gov. Consumers may choose to click relatively more frequently on, for example, a dot-gov site in order to find accurate and unbiased information, knowing that only the US government can register a
[3] See also: Edelman et al. (2007) and Ghose and Yang (2008).
[4] Sponsored results often appear with a slightly different background than the organic results and in my experience, it is increasingly difficult to tell them apart.
[5] In future work, I hope to extend this type of substitution analysis to drug-queries.
[6] See http://www.nytimes.com/2007/04/09/technology/09iht-aol.1.5197096.html?_r=1.


website with a dot-gov extension.[7] These extensions are maintained by the Internet Assigned Numbers Authority (IANA), which regulates which sites can have an address ending in each extension.[8]

The remainder of this paper is organized as follows. Section 2 provides a description of the data including the list of drugs I use and a method for generating the content of each search result. A descriptive and regression analysis of the data is developed in section 3, including the differences in supply and content across engines and the dynamics of a URL’s rank over time. Section 4 concludes and provides some directions for future work.

3.2 Data
Drug Selection

To select the list of queries, I started with the 2004 National Ambulatory Medical Care Survey (NAMCS) and determined the 20 most popular drug classes based on “drug visits,” the number of visits to a doctor in which a given drug is prescribed.9 Of these, I decided to focus on the top 95% of drugs in three National Drug Code (NDC) classes: antidepressants, cholesterol, and diabetes, due to their relatively high advertising intensity online and offline. Since NAMCS only contained drugs approved through 2004, I supplemented the list with recently approved drugs
7 See Huh and Cude (2004), which analyzes medical-related websites to calculate a measure of bias based on the type of information appearing on each page.
8 A complete list of top-level domains and their requirements can be found at: http://www.iana.org/domains/root/db/.
9 See http://www.cdc.gov/nchs/about/major/ahcd/ahcd1.htm.


in each of the three classes from FDA’s Orange Book.10 Starting in 2006, NAMCS began using a different coding system for all drugs. Each drug can belong to up to four categories, which sometimes span the classes from the old NDC system. I use the old class codes in this paper as they are broadly in line with the new system. This yielded 99 unique brand names that formed the basic search list. I supplemented these queries in several ways. First, I paired the top five drugs in each class (based on total search results) with each other to assess queries where a consumer was seeking comparison information about two similar drugs. I also added keywords to the top five drugs in each class, where the keywords were determined using Google’s Adwords tool.11 These include risk-related keywords like “interactions” and “side effects” as well as sales promotion keywords like “discount” and “price.” Finally, for brand name comparisons and brand names paired with keywords, I included searches with and without quotes. In all, this yielded 458 search queries.12

Crawler Data

With the help of two excellent research assistants,13 we designed a web crawler that submitted the list of 458 search queries to four large search engines (Ask, Google, MSN, and Yahoo) every day at 12:00pm during the period from February to September 2007. The crawler saved the top 100 organic search results, which appeared on the first 10 pages. These first 10 pages also contain sponsored search
10 See http://www.fda.gov/cder/orange/obreadme.htm.
11 See https://adwords.google.com/select/KeywordToolExternal.
12 See the appendix for the complete list.
13 Chien (Daniel) Yin and Chris Wasko.


results, the number of which varies depending on the query.14 Since the crawler program returned the raw HTML files containing the search results for each engine-day-query, we then wrote a parsing program to separate out the following fields/variables for each result: rank, title text, summary text, URL (displayed and actual),15 result type (organic or sponsored), and result position (for sponsored results).
Basic Statistics

Organic Results
  Ask                              4,200,829
  Google                          10,604,000
  MSN                             10,724,339
  Yahoo                           10,725,091
Sponsored Results
  Ask                              3,384,565
  Google                           2,667,023
  MSN                              3,949,351
  Yahoo                            6,364,854
Date Range                        Feb - Sep 30, 2007*
Unique Queries                    458
Query Types
  Drug Name Only                   99
  Drug + Informational Keyword    195
  Drug + Promotional Keyword       96
  Drug + Drug                      68
Drug Classes
  Depression                      161
  Cholesterol                     133
  Diabetes                        164
*Data from the Ask search engine are only available through May and do not include the organic links ranked 91-100.

Table 3.1: Basic Statistics

Table 3.1 displays the basic statistics of the data collected by the crawler. Due to a parsing error, only a limited sample was gathered from the Ask search engine.
14 We faced several challenges in collecting the data, including adapting to formatting changes on each engine that occurred during the time period and adding a random time increment between queries to avoid the search engine (correctly) flagging us as a crawler. We assume our own search activity has minimal impact on the supply of search results.
15 These are frequently different, especially for sponsored results, which are routed through the search engine first (so the engine can charge the advertiser) before taking the user to their destination page.


Classification Algorithm

In order to determine the type of search results that were appearing following each query, I devised an algorithm to classify each search result as being either informational, promotional, or neutral. This was accomplished with the following steps:

1. For all 4 engines and for one week, first collect all words appearing in the top 100 organic and all sponsored search results following two types of queries:
   • drug name + “buy” or drug name + “cheap” (likely promotional sites)
   • drug name + “information” or drug name + “side effects” (likely informational sites)
   where drug name was one of the 99 brand names in the sample. I do this separately for titles and summaries and for organic and sponsored results, which provides 8 lists of words.
2. Create a frequency table of all of the words appearing in each list and save the top 200 most popular words in each list.
3. Eliminate any words that appear in both categories (informational and promotional) and save the top 50 unique words in each category.16
4. Analyze every search result in the database and calculate the proportion of words in each text field that also appear in the corresponding top 50 list. E.g., an organic summary text field may have 25% promotional words and 10% informational words.
16 The uniqueness requirement also eliminates common words that frequently appear in text fields, but are unhelpful in classifying content.


With these proportions in hand, I can form a metric called the “average content” of a search result, which is simply the difference between the proportion of words that are promotional and the proportion that are informational. I can also create a binary indicator of content and, for example, classify a result as promotional if it contains a relatively higher proportion of promotional keywords. The keywords used in the classification are shown in table C.2 in the appendix. Note that some of the words are actually numbers, which are very common in promotional results, and therefore helpful in their classification.
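The classification steps and the “average content” metric can be illustrated with a minimal Python sketch. The tokenizer, the function names, and the treatment of ties as neutral are assumptions made for illustration; the cutoffs of 200 frequent words and 50 retained keywords follow the description above.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text field and split it into words and numbers (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_words(texts, n):
    """Return the n most frequent words across a list of text fields."""
    counts = Counter(word for text in texts for word in tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def build_keyword_lists(promo_texts, info_texts, n_frequent=200, n_keep=50):
    """Steps 2 and 3: keep frequent words, drop those shared by both categories."""
    promo_top = top_words(promo_texts, n_frequent)
    info_top = top_words(info_texts, n_frequent)
    shared = set(promo_top) & set(info_top)
    promo = [w for w in promo_top if w not in shared][:n_keep]
    info = [w for w in info_top if w not in shared][:n_keep]
    return set(promo), set(info)


def average_content(text, promo_words, info_words):
    """Step 4: proportion of promotional words minus proportion of informational words."""
    words = tokenize(text)
    if not words:
        return 0.0
    prop_promo = sum(w in promo_words for w in words) / len(words)
    prop_info = sum(w in info_words for w in words) / len(words)
    return prop_promo - prop_info


def classify(text, promo_words, info_words):
    """Binary-style indicator; treating a zero score as neutral is an assumption."""
    score = average_content(text, promo_words, info_words)
    if score > 0:
        return "promotional"
    if score < 0:
        return "informational"
    return "neutral"


# Hypothetical training snippets standing in for one week of crawler output.
promo_texts = ["buy cheap zocor online", "cheap zocor discount buy now"]
info_texts = ["zocor side effects information", "zocor drug information and side effects"]
promo_words, info_words = build_keyword_lists(promo_texts, info_texts)
```

In this toy example the drug name appears in both categories, so the uniqueness requirement removes it from both keyword lists, as footnote 16 describes for common words.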

3.3 Descriptive Analysis

3.3.1 Supply
[Chart: Total Supply. Vertical axis: Percent of Results; horizontal axis: Search Engine (Ask, Google, MSN, Yahoo); series: Organic, Sponsored.]

Figure 3.1: Distribution of Organic and Sponsored Results

Figure 3.1 shows the overall supply of results on each engine. Note that even


with the limited sample from the Ask engine, it has relatively more sponsored results than the other engines, given that it displays its own results along with those from Google. Among the other three engines, which all have about the same number of organic results, Google has the largest proportion of organic results. There are usually 100 organic results collected per query-day, but for some queries there are fewer.17
[Chart: Extension Supply - Organic Links. Vertical axis: Percent of URLs; horizontal axis: Extension (com, gov, edu, org/net/info, us/uk/ca, other); series: Ask, Google, MSN, Yahoo.]

Figure 3.2: Extension Popularity - Organic Results

Organic and sponsored result popularity by extension are shown in figures 3.2 and 3.3, respectively. Google’s organic results feature the fewest dot-com and the most dot-gov, dot-edu and dot-org/net/info results.18 MSN has the largest percentage of dot-com results and the fewest dot-govs. Among sponsored results, most have dot-com extensions, except for MSN, which has relatively more dot-org/net/info
17 In theory, with 235 days and 458 search queries, I could observe a maximum of 235*458*100 = 10,763,000 observations per engine. For Google, MSN, and Yahoo, I observe 99% of this theoretical maximum.
18 The differences between the engines are largely statistically significant. For example, Google’s percentage of dot-gov results is statistically higher than that of each of the other three engines.


[Chart: Extension Supply - Sponsored Links. Vertical axis: Percent of URLs; horizontal axis: Extension (com, gov, edu, org/net/info, us/uk/ca, other); series: Ask, Google, MSN, Yahoo.]

Figure 3.3: Extension Popularity - Sponsored Results

extensions among their sponsored results.
[Chart: Average Rank. Vertical axis: Average Rank (0 to 60); horizontal axis: Extension (com, gov, edu, org/net/info, us/uk/ca, other); series: Google, MSN, Yahoo.]
*I have omitted the results from Ask in this graph because data from page 10 (results 91-100) were often missing.

Figure 3.4: Average Rank by Extension - Organic Results

I finally break down the average rank of organic results by extension in figure


3.4. I omit Ask because of the parser problem. If the results were spread evenly, each extension would have a mean rank of about 50, but here dot-gov sites tend to be pushed toward the top of the page (lower numbered ranks). Of the three engines, the dot-gov sites on Yahoo are most likely to appear high in the search results.

3.3.2 Content
Using the rank popularity from the AOL click-through database (among all queries), I calculate an attention index for each organic rank because links appearing toward the top of the results are more likely to receive a click than those lower in the results. The index is simply the proportion of clicks on each organic rank, from 1 to 100.19 Then I calculate the percent of organic results, weighted by the attention index, whose summaries are classified as promotional, informational, or neutral. E.g., a result is promotional if it contains a higher proportion of promotional keywords than informational keywords. Figure 3.5 displays the attention-weighted content of each engine. MSN’s results tend to be more promotional than those of other engines and Google’s are more informational. Classifications reflecting the actual proportions are reported in the kernel density figures. Figure 3.6 is the same breakdown for sponsored results. Here, Google and Yahoo tend to be relatively more promotional and Ask and MSN are more informational.

19 For example, because users click on the first result much more often than other results, the first rank receives a weight of 0.423 while the fifth rank has a weight of 0.049.

[Chart: Attention Weighted Organic Summary Content. Vertical axis: Percent of URLs; horizontal axis: Class (Promotional, Informational, Other); series: Ask, Google, MSN, Yahoo. Attention determined by popularity of ranks from AOL database.]

Figure 3.5: Content of Summary Field - Organic Results

[Chart: Attention Weighted Sponsored Summary Content. Vertical axis: Percent of URLs; horizontal axis: Class (Promotional, Informational, Other); series: Ask, Google, MSN, Yahoo. Attention determined by popularity of ranks from AOL database.]

Figure 3.6: Content of Summary Field - Sponsored Results

Figures 3.7 and 3.8 display the organic and sponsored summary content broken down by extension. For organic results, dot-com and dot-gov sites are more informational, while surprisingly, dot-edus are relatively more promotional for all engines. Further investigation revealed that, e.g., the engines are picking up on comments left on university bulletin boards by online pharmacies trying to sell their


[Charts, one panel per engine (Ask, Google, MSN, Yahoo): Organic Links - Summary Content by Extension. Vertical axis: Proportion of Links; horizontal axis: Extension (com, gov, edu, org/net/info, us/uk/ca, other); series: Promotional, Informational.]

Figure 3.7: Content of Summary Field - Organic Results - By Extension
[Charts, one panel per engine (Ask, Google, MSN, Yahoo): Sponsored Links - Summary Content by Extension. Vertical axis: Proportion of Links; horizontal axis: Extension (com, gov, edu, org/net/info, us/uk/ca, other); series: Promotional, Informational.]

Figure 3.8: Content of Summary Field - Sponsored Results - By Extension

drugs. As for sponsored results, Ask, Google and Yahoo feature mostly dot-coms and these tend to be relatively promotional as expected. Sponsored results ending


in dot-org/net/info tend to be informational in content. MSN is unique in that all of its sponsored results tend to be more informational, in line with the large proportion of its results ending in dot-org/net/info.
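The attention-weighted shares used in this section could be computed along the following lines. This is a sketch under stated assumptions: only the weights for ranks 1 and 5 (0.423 and 0.049) come from the text, the weights for ranks 2 to 4 are hypothetical placeholders, and the function name is illustrative.

```python
def attention_weighted_shares(results, weights):
    """Share of each content class, weighting every result by its rank's click share.

    results: list of (rank, label) pairs for one engine.
    weights: dict mapping rank -> proportion of clicks received by that rank.
    """
    totals = {}
    total_weight = 0.0
    for rank, label in results:
        weight = weights.get(rank, 0.0)
        totals[label] = totals.get(label, 0.0) + weight
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no result carries positive attention weight")
    return {label: value / total_weight for label, value in totals.items()}


# Ranks 1 and 5 use the weights reported in the text; ranks 2-4 are hypothetical.
weights = {1: 0.423, 2: 0.120, 3: 0.085, 4: 0.060, 5: 0.049}
results = [(1, "informational"), (2, "promotional"), (5, "promotional")]
shares = attention_weighted_shares(results, weights)
```

Because the first rank carries most of the attention, a single informational result in rank 1 outweighs two promotional results further down the page.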

3.3.3 Rank and Content Comparisons
An additional approach to comparing the results from a query across search engines is to analyze the rank and contents of a set of organic results. In figures 3.9 and 3.10, I display a comparison of Google and Yahoo. The first scatter plot shows the ranks of identical URLs (following the same query on the same day). The difference in algorithms is clear from the figure and from the weak correlation of 0.37. However, when comparing the proportion of promotional keywords on the same two engines, a stronger correlation (0.44) is revealed. Thus, though the algorithms differ in how they rank the results, the process for selecting which words and phrases to include in the summary text is similar. Repeating this for other engine comparisons reveals roughly the same pattern, though the correlations are weaker. Though the rank of a search result may be very different across engines on a given day, there may be some relationship between the changes in the rank over time. In figure 3.11, I track the rank of the same URL (following the same query) across time for the three engines for which I have complete data. Here I see that the ranks on MSN and Google are fairly stable, though there are frequent spikes in Yahoo’s rank. These may be due to algorithm testing by the engine throughout the year.20
20 In the future, I will analyze how exogenous shocks, such as an FDA news story about a drug, affect the rank dynamics of specific URLs or extension classes across search engines.
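The cross-engine rank comparison can be sketched as follows. The text does not say which correlation coefficient was used, so the Pearson coefficient here is an assumption, as are the function names and the (query, date, URL) key layout.

```python
import math


def matched_ranks(engine_a, engine_b):
    """Pair the ranks of identical URLs returned for the same query on the same day.

    Each argument maps a (query, date, url) key to the organic rank on one engine.
    """
    keys = sorted(engine_a.keys() & engine_b.keys())
    return [engine_a[k] for k in keys], [engine_b[k] for k in keys]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ranks for one query-day; d.com appears on only one engine and is dropped.
google = {("zocor", "20070301", "a.com"): 1,
          ("zocor", "20070301", "b.gov"): 2,
          ("zocor", "20070301", "c.org"): 3}
yahoo = {("zocor", "20070301", "a.com"): 2,
         ("zocor", "20070301", "b.gov"): 4,
         ("zocor", "20070301", "c.org"): 6,
         ("zocor", "20070301", "d.com"): 1}
google_ranks, yahoo_ranks = matched_ranks(google, yahoo)
```

The same `pearson` helper could be applied to the promotional keyword proportions of the matched results to reproduce the content comparison.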


[Chart: Rank Comparison. Scatter plot; vertical axis: Yahoo Rank (0 to 100); horizontal axis: Google Rank (0 to 100).]

Figure 3.9: Rank Comparison - Organic Results - Google vs Yahoo

3.3.4 Kernel Density Plots of Content
As a final analysis of the content differences between search engines and across different extensions and result types, I estimate Gaussian kernel density distributions using the difference between the proportions of promotional and informational keywords in each result. I first drop the search results that have no promotional or informational keywords (i.e., those that would be classified as neutral/other). The variable plotted (PropPromo - PropInfo) ranges from -1 to +1, with -1 corresponding to a result that is completely informational and +1 meaning the result is completely promotional. A value of zero means that a result contained an equal (non-zero) number of informational and promotional keywords.

[Chart: Promotional Proportion Comparison. Scatter plot; vertical axis: Yahoo Proportion (0% to 20%); horizontal axis: Google Proportion (0% to 20%).]

Figure 3.10: Summary Content Comparison - Organic Results - Google vs Yahoo

[Chart: URL Dynamics, Query = zocor, URL = nlm.nih.gov/medlineplus/druginfo/medmaster/a692030.html. Vertical axis: Rank (0 to 60); horizontal axis: Date (20070209 to 20070921); series: Google, MSN, Yahoo!.]

Figure 3.11: Organic Rank Dynamics

[Charts, two panels: Organic Summary and Sponsored Summary. Vertical axis: Density; horizontal axis: PropPromo - PropInfo from -0.5 (informational) to 0.5 (promotional); series: Ask, Google, MSN, Yahoo!.]

Figure 3.12: Kernel Density of Summary Content

Figure 3.12 shows that organic results on all engines tend to be more informational, with a spike around -0.05, though MSN has the highest density of more promotional sites. Sponsored results tend to be either very informational or very promotional, as revealed by the heavy tails in each distribution. MSN tends to have the most informational sponsored results (consistent with the large percentage of dot-org/net/info sites in their sponsored results). In the appendix, I display a breakdown of summary content of organic and sponsored results by extension (see figures C.1 and C.2). For organic results, dot-com and dot-gov results are again shown to be more informational for all engines. Dot-edu results are about as likely to be promotional as informational, with the heaviest tail for promotional content on Google. For dot-org/net/info, most


of the results are informational as expected. Among the sponsored results, dot-com sites again account for most of the sponsored results and tend to be either very informational or very promotional. There are very few dot-gov and dot-edu results among the sponsored results and the dot-org/net/info sites tend to be very informational.
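The density estimates in this subsection could be produced by a Gaussian kernel estimator such as the following sketch. The bandwidth is not reported in the text, so the value of 0.1 is an assumption, as are the function names and the sample data.

```python
import math


def content_differences(results):
    """PropPromo - PropInfo for each result, dropping results with no keywords at all."""
    return [promo - info for promo, info in results if promo > 0 or info > 0]


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates the Gaussian kernel density estimate at x."""
    if not sample or bandwidth <= 0:
        raise ValueError("need a non-empty sample and a positive bandwidth")
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return density


# Hypothetical (PropPromo, PropInfo) pairs; the (0, 0) result is neutral and dropped.
results = [(0.25, 0.10), (0.0, 0.0), (0.0, 0.20), (0.05, 0.10)]
diffs = content_differences(results)
density = gaussian_kde(diffs, bandwidth=0.1)  # bandwidth is an assumed value

# Evaluate on a grid like the one in the figures, from -0.5 to 0.5.
grid = [i / 100.0 for i in range(-50, 51)]
curve = [density(x) for x in grid]
```

A smaller bandwidth would sharpen the spike near zero and the tails, while a larger one would smooth them together, so the visual impression of “heavy tails” depends partly on this choice.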

3.3.5 Probit Analysis
Finally, I report the results of a simple probit regression analyzing the determinants of rank in a search engine’s results. While I could consider the likelihood that a URL achieves a given rank using an ordered probit approach, since most users do not venture beyond the first page of search results, I only consider the probability that a result appears on the first page. In this regression, I consider organic results from March 2007 on all 4 engines in the sample. Since characteristics of individual drugs (like drug age and advertising intensity) do not vary across the ranks in the search results, I cannot analyze the influence of these variables on where a URL appears. However, I can interact them with the extension of a search result’s URL, since extensions do vary by rank. I can then determine, for example, how the age of a drug may affect the likelihood of a dot-gov URL appearing on page one of the search results. Table 3.2 displays the result of a probit estimation on the probability that a result appears on page one as a function of website extensions, extension/age and extension/advertising interactions, and result summary contents. Definitions for


Dependent Variable: Pr(Page1)

                      Ask                  Google               MSN                  Yahoo
Parameter             Estimate    SE       Estimate    SE       Estimate    SE       Estimate    SE
Intercept             -1.711***   0.023    -2.220***   0.039    -1.704***   0.032    -1.730***   0.023
dotcom                 0.492***   0.025     0.670***   0.040     0.312***   0.033     0.220***   0.025
dotgov                 1.049***   0.049     0.810***   0.051     0.732***   0.053     0.804***   0.041
dotedu                 0.357***   0.110    -0.481***   0.107     0.127      0.158    -0.141**    0.069
dotorgnetinfo          0.465***   0.033     0.553***   0.046     0.222***   0.040     0.247***   0.032
dotintl                0.289***   0.030     0.069*     0.049    -0.033      0.046    -0.121***   0.035
dotcom_age             0.001      0.001     0.002**    0.001     0.007***   0.001     0.003***   0.001
dotgov_age            -0.016***   0.003     0.006***   0.002    -0.010***   0.003     0.001      0.003
dotedu_age             0.021***   0.006     0.067***   0.005    -0.034***   0.011     0.035***   0.004
dotorg_age            -0.007***   0.002     0.000      0.002     0.010***   0.002     0.010***   0.002
dotcom_dtc            -0.222***   0.030    -0.220***   0.030     0.098***   0.025    -0.060**    0.029
dotgov_dtc             0.530***   0.104     0.379***   0.095    -1.284***   0.211    -0.181**    0.099
dotedu_dtc            -2.036***   0.500    -0.219      0.173     0.870***   0.331    -1.625***   0.249
dotorg_dtc             0.215***   0.064     0.675***   0.058    -0.663***   0.074     0.120**    0.061
prop_promo_summary    -0.730***   0.135    -0.945***   0.155    -2.393***   0.145    -0.163      0.143
prop_info_summary      1.468***   0.086     5.894***   0.075     4.500***   0.105     4.642***   0.083
prop_promo_title       1.893***   0.086    -2.113***   0.153    -2.220***   0.133     0.091      0.086
prop_info_title        0.771***   0.039     1.467***   0.031     1.524***   0.046     0.921***   0.034
observations           148,718              176,700              176,431              176,680
percent concordant     61.6                 72.5                 65.9                 65.7

Notes: ***, **, * Significant at the 1%, 5%, and 10% level respectively. Omitted categories: extension = other, summary content = other, title content = other, drug class = diabetes. Organic results from March 2007.

Table 3.2: Regression Results: Probit of Pr(Page 1)

all variables used in the regression are summarized in table C.3 in the appendix. Promotional sites are uniformly pushed down and informational sites are more likely to appear on page one. Dot-gov sites are the most likely of all extensions to appear on page one with the greatest effect for Google’s engine. In all but Ask’s engine, dot-edu and international sites tend to get pushed off of page one. The interaction terms reveal that, for most engines, older drugs are more likely to have dot-com sites appearing on page one. The reason may be that younger drugs have few promotional dot-com sites appearing high in the ranks.21 I am unable to perform a similar analysis predicting the probability that a sponsored link appears on page one because the dataset does not include the page
21 Note that to assess the fit of the model, I report the percent concordant, as explained in chapter 2.


on which a sponsored link appears and there are often a different number of sponsored links on each page. However, since I do observe the rank, I estimated an ordered probit predicting sponsored links’ overall rank, which revealed that, as expected, dot-com sites are more likely to appear high in the ranks and that advertising intensity does not have a consistent effect on rank (via its interaction with the website extensions).
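The fit measure reported in Table 3.2 can be illustrated with a short sketch. The definition used here is the common one (the share of event/non-event pairs in which the event has the higher predicted probability); the exact definition in chapter 2 may differ, for instance in its handling of ties, so this is an assumption. The function names and sample values are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, which maps a probit index x'b into Pr(Page 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percent_concordant(probabilities, outcomes):
    """Percent of (event, non-event) pairs where the event has the higher prediction."""
    events = [p for p, y in zip(probabilities, outcomes) if y == 1]
    non_events = [p for p, y in zip(probabilities, outcomes) if y == 0]
    pairs = len(events) * len(non_events)
    if pairs == 0:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(1 for pe in events for pn in non_events if pe > pn)
    return 100.0 * concordant / pairs


# Hypothetical predicted probabilities and page-one indicators for four results.
predicted = [0.9, 0.2, 0.6, 0.4]
on_page_one = [1, 0, 0, 1]
fit = percent_concordant(predicted, on_page_one)
```

Of the four event/non-event pairs in this toy sample, three are ordered correctly, so the measure is 75 percent.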

3.4 Conclusion
In addition to many offline sources, there is a large and diverse quantity of prescription drug information accessible online. Consumers are likely filtering this information and making their decisions about which sites to visit based on a search engine’s results page, which includes the result’s rank, title and summary text, classification as organic or sponsored, and the extension of the URL. I have shown that the information varies significantly across engines, over time, and between different website extensions. The descriptive analysis shows that Ask has relatively more sponsored links compared with other engines, perhaps because of their agreement to deliver sponsored links from Google along with those generated from their own algorithm. Google’s organic results feature relatively more dot-gov and dot-edu links and MSN’s engine returns the most dot-com results. On all engines, dot-gov sites appear higher in the ranks compared with other extensions, because the engines’ algorithms rank them higher for their relevance and/or because users frequently click these results. I also analyze the content of the summary text in order to classify individual


results as informational, promotional or neutral. Overall, Google’s results are relatively more informational and MSN’s the most promotional, in line with the popular extensions on each engine. However, classifying websites solely on their extension may be misleading as I found that dot-edu sites actually tend to be more promotional. Among sponsored links, dot-com results are by far the most popular and, as expected, they tend to be relatively more promotional. Kernel density estimates confirm these results and also show that sponsored links tend to be either very informational or very promotional, as revealed by heavy tails in the distribution. Since the website owners are paying for each click on a sponsored link, they are likely trying to provide a very clear summary of what information the user will find if they click on the result. Finally, the probit analysis revealed that informational sites are more likely to appear on page one of the results. Dot-gov sites are also relatively more likely to appear high in the results and the effect is largest for Google’s engine. By including interaction terms of the drugs’ ages and website extensions, I also show that younger drugs are less likely to have dot-com results high in the ranks. In future work, I hope to track the dynamics of specific URLs (e.g., an fda.gov site) following a major news story about a drug being issued by the FDA. I expect to see a displacement of more promotional dot-com sites by the informational sites. While some analysis can be accomplished with the current dataset, other research will be possible once I have a complete picture of both the supply and demand for drug information from the same time period. I will soon have access to data from comScore’s Media Metrix product which includes individual click-through behavior

for a set of consumers in the same time period as the crawler data. Matching these two data sources will allow me to investigate both the probability of a click as a function of result characteristics (e.g., rank, content, and extension) as well as determine the substitution/complementary effects of organic and sponsored links appearing in the same set of search results.


Appendix A

Chapter 1 Supplement

A.1 The Distillation Process
Since the various components of crude oil have different boiling points, a refinery’s essential task is to boil the crude oil and separate it into the more valuable components. Figure A.1 displays a simplified diagram of a typical refinery’s operations. The first and most important step in the refining process is called fractional distillation. The steps of fractional distillation are as follows:

1. Heat the crude oil with high pressure steam to 1,112 degrees Fahrenheit.
2. As the mixture boils, vapor forms and rises through the fractional distillation column, passing through trays that have holes to let the vapor through.
3. As the vapor rises, it cools and eventually reaches its boiling point, at which time it condenses on one of the trays.
4. The substances with the lowest boiling point (such as gasoline) will condense near the top of the distillation column.

While some gasoline is produced from pure distillation, refineries normally employ several downstream processes to increase the yield of high valued products by removing impurities such as sulfur. Cracking is the process of breaking down large hydrocarbons into smaller molecules through heating and/or adding a catalyst. Cracking was first used in 1913 and thus changed the problem of the refiner from

Figure A.1: Refinery Operations

choosing how much crude oil to distill to choosing an appropriate mix of products (within some range). Refineries practice two main types of cracking:
• Catalytic cracking: a medium conversion process which increases the gasoline yield to 45% (and the total yield to 104%).
• Coking/residual construction: a high conversion process which increases the gasoline yield to 55% (and the total yield to 108%).
The challenge of choosing the right input and output mix given the available technology creates a massive linear programming problem.

A.2 Crude Oil Quality
Crude oil is a flammable black liquid composed primarily of hydrocarbons and other organic compounds. The three largest oil producing countries are Saudi

Arabia, Russia and the United States.1 Crude oil is the most important input into refineries and this raw material can vary in its ability to produce refined products like gasoline. The two main characteristics of crude that determine its quality are American Petroleum Institute (API) gravity and sulfur content. The former is a measure (on an arbitrary scale) of the density of a petroleum liquid relative to water.2 Table A.1 summarizes these characteristics and includes some common crude types and their gasoline yield from the initial distillation process.

Table A.1: Crude Qualities

API Gravity    Sulfur Content < 0.7%                    Sulfur Content > 0.7%
< 22°          Heavy Sweet                              Heavy Sour - 14% yield (Maya, Western Canadian)
22° - 38°      Medium Sweet                             Medium Sour - 21% yield (Mars, Arab light)
> 38°          Light Sweet - 30% yield (WTI, Brent)     Light Sour

Source: EIA.
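
The API gravity formula in footnote 2 and the grid in Table A.1 can be combined in a small sketch. The cutoffs of 22 and 38 degrees and 0.7% sulfur come from the table; how values falling exactly on a cutoff are assigned is an assumption, as are the function names and the sample crude in the usage lines.

```python
def api_gravity(specific_gravity):
    """API gravity in degrees from specific gravity measured at 60 degrees Fahrenheit."""
    if specific_gravity <= 0:
        raise ValueError("specific gravity must be positive")
    return 141.5 / specific_gravity - 131.5


def classify_crude(api, sulfur_percent):
    """Place a crude in the Table A.1 grid; boundary handling is an assumption."""
    if api < 22:
        weight = "heavy"
    elif api <= 38:
        weight = "medium"
    else:
        weight = "light"
    sourness = "sweet" if sulfur_percent < 0.7 else "sour"
    return f"{weight} {sourness}"


# Water has a specific gravity of 1, which gives an API gravity of 10 degrees.
water_api = api_gravity(1.0)

# Illustrative values only: a low-density, low-sulfur crude lands in the light sweet cell.
sample_api = api_gravity(0.827)
sample_class = classify_crude(sample_api, 0.24)
```

Because specific gravity sits in the denominator, denser crudes have lower API gravity, which is why the heavy category corresponds to the smallest API values.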

Worldwide, light/sweet crude is the most expensive and accounts for 35% of consumption. Medium/sour is less expensive and accounts for 50% of consumption while heavy/sour is the least costly and accounts for 15%. Figure A.2 shows how the average crude oil used by US refiners is becoming heavier and more sour over time. This means that the production costs of a gallon of gasoline are changing as refineries must invest in more sophisticated technology in order to process lower
1 Production in this sense refers to the quantity extracted from a country’s endowment.
2 Technically, API gravity = (141.5 / specific gravity of crude at 60°F) - 131.5. Water has an API gravity of 10°.


quality crude oil. Since crude oil by itself has very little value to any industry, the price of a barrel of oil reflects the net value of the downstream products that can be created from it. The two major sources of movements in the crude oil price are upstream supply shocks (due to OPEC’s quotas and hurricanes affecting oil rigs in the Gulf of Mexico) and downstream demand shocks (due to consumers’ demand for refined products). Another source often cited by industry experts is refinery inventories of crude oil. Maintaining stocks of crude oil allows the refinery to respond quickly to downstream shocks like an unexpectedly cold winter increasing the demand for heating oil.
[Chart: Sulfur Content (%, left axis, 0.8 to 1.5) and API Gravity (degrees, right axis, 30 to 33) by Year, 1985 to 2006.]

Figure A.2: Average Crude Oil Quality: Heavier and More Sour

Within the various types of crude oil, the prices of each quality respond differently to shocks. The “light/heavy” differential is one measure that indicates the benefit a refiner can achieve by investing in sophisticated equipment to process


heavier crude oil into highly-valued refined products. The differential has varied significantly over the last 10 years from 3 dollars per barrel to almost 20 dollars per barrel. An oil refinery faces a unique decision when making its production choice, one that provides for both flexibility and complexity. On one hand, consumers do not care about the type of crude oil, oxygenates, or distillation process used to make, for example, the gasoline they put in their cars. They just want their car to run well. While this would appear to make a refiner’s problem easier, choosing heterogeneous inputs, such as crude oil, and satisfying federal, state and city environmental regulations, all while maximizing profits, makes for an enormously complex optimization.

A.3 Estimation Algorithm
My estimation strategy involves matching utilization and investment moments. This requires that I solve for a policy function for each of these decisions and interpolate the functions to the realizations of the state variables in the data. The monthly utilization choice problem is a simple finite horizon dynamic program that I am able to solve by backward induction. So, for a given level of investment which induces a capacity for the plant, I can write the problem as:

12

?iy = M ax{uim }12 E m=1
m=1

µm?1 ?im (uim ; xim , q iy ) .

(A.1)

Then, ?iy , the aggregate discounted annual pro?t of the plant, becomes the payo? function for the in?nite horizon problem. The Bellman equation for that problem 102

is:

V (x) = M axr ?iy (r; x) + ?V (x )P (x |x, r) .

(A.2)

To solve this equation, I could have used several di?erent methods including successive approximations or collocation, but I chose policy function iteration, also known as the Howard Policy Improvement Algorithm. The ?rst step is to guess a candidate policy function, which I call, ?t (x), where t indexes the iteration. Since this policy governs investment which a?ects optimal utilization, which in turn a?ects the probability of breakdown, I have to calculate the transition matrix given the policy: P (x |x, ?t (x)). Then comes the “policy evaluation step” which is to solve A.2, i.e.:

$$V_t(x) = \big[ I - \beta P(x' \mid x, \sigma_t(x)) \big]^{-1} \pi_{iy}(\sigma_t(x); x). \tag{A.3}$$

For a state space of size $K$, this involves inverting a $K \times K$ matrix, which makes it difficult to estimate the model with too fine a discretization. With the value function in hand, I move to the “policy improvement step,” which updates the policy function:

$$\sigma_{t+1}(x) = \arg\max_{r} \Big[ \pi_{iy}(r; x) + \beta \sum_{x'} V_t(x')\, P(x' \mid x, r) \Big]. \tag{A.4}$$

Finally, I compare $\sigma_{t+1}(x)$ to $\sigma_t(x)$ and repeat the process until convergence.
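
The evaluation and improvement steps can be sketched in a few lines of Python with NumPy. This is a minimal illustration on a discretized state space, not the code used for estimation: the function name, the array layout, and the two-state example (low or high capacity, invest or not) are all assumptions made here.

```python
import numpy as np


def policy_iteration(payoff, trans, beta, max_iter=1000):
    """Howard policy iteration on a discretized state space.

    payoff[r, x]    : annual payoff from action r in state x (illustrative)
    trans[r, x, x2] : probability of moving from x to x2 under action r
    beta            : annual discount factor, 0 <= beta < 1
    """
    n_actions, n_states = payoff.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)   # initial guess: action 0 everywhere
    for _ in range(max_iter):
        # Policy evaluation: solve (I - beta * P_policy) V = payoff_policy.
        P_policy = trans[policy, states, :]
        payoff_policy = payoff[policy, states]
        V = np.linalg.solve(np.eye(n_states) - beta * P_policy, payoff_policy)
        # Policy improvement: best action in each state against the current V.
        Q = payoff + beta * (trans @ V)       # shape (n_actions, n_states)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy
    raise RuntimeError("policy iteration did not converge")


# Hypothetical example: state 0 = low capacity, state 1 = high capacity.
# Action 0 = do not invest, action 1 = invest (costs 0.5, moves to state 1).
payoff = np.array([[1.0, 2.0],
                   [0.5, 1.5]])
trans = np.array([[[1.0, 0.0], [0.0, 1.0]],   # no investment: state unchanged
                  [[0.0, 1.0], [0.0, 1.0]]])  # investment: move to high capacity
V, policy = policy_iteration(payoff, trans, beta=0.9)
# policy is [1, 0]: invest only when capacity is low; V is approximately [18.5, 20.0]
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly, but the cost still grows quickly with the number of states, which is the discretization constraint noted above.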

A.4 Additional Tables

Table A.2: Industry Concentration
1970 US 4-Firm (%) 8-Firm (%) HHI PADD 1 4-Firm (%) 8-Firm (%) HHI PADD 2 4-Firm (%) 8-Firm (%) HHI PADD 3 4-Firm (%) 8-Firm (%) HHI PADD 4 4-Firm (%) 8-Firm (%) HHI PADD 5 4-Firm (%) 8-Firm (%) HHI California 4-Firm (%) 8-Firm (%) HHI Gulf Coast 4-Firm (%) 8-Firm (%) HHI PADDs 1 & 3 4-Firm (%) 8-Firm (%) HHI PADDs 2 & 3 4-Firm (%) 8-Firm (%) HHI PADDs 1, 2, & 3 4-Firm (%) 8-Firm (%) HHI 31.4 52.2 437.0 59.2 88.7 1,225.0 38.3 59.7 37.4 60.0 39.3 65.0 675.0 36.3 58.5 578.0 55.8 83.6 1,080.0 53.8 74.2 965.0 58.9 82.5 1,184.0 40.2 61.6 611.0 80.7 99.0 2,158.0 50.9 75.6 961.0 48.4 66.5 851.0 58.1 86.9 1,179.0 60.2 86.9 1,148.0 68.7 95.1 1,481.0 44.4 69.4 728.0 76.7 97.9 1,943.0 57.1 82.6 1,063.0 56.3 78.8 1,018.0 46.1 81.2 944.0 62.4 92.7 1,246.0 66.2 96.3 1,475.0 43.0 68.4 727.0 85.8 99.4 2,505.0 57.1 82.6 1,059.0 56.0 78.2 1,005.0 45.7 80.4 935.0 62.4 92.8 1,247.0 66.5 96.3 1,475.0 45.8 72.0 776.4 87.3 99.4 2,537.5 59.6 85.0 1,114.0 57.8 81.2 1,052.2 50.9 85.5 1,047.7 59.1 89.5 1,162.2 62.3 92.1 1,354.9 59.1 83.5 1,107.9 40.9 62.3 35.0 55.0 36.7 57.2 561.0 30.7 56.5 455.0 35.2 58.0 30.7 49.2 30.2 53.6 460.0 44.6 65.3 741.0 42.5 64.9 681.0 39.4 63.5 638.0 54.6 76.1 919.0 46.2 75.6 826.0 45.9 73.1 789.0 52.5 75.5 890.0 45.9 75.2 818.0 44.5 72.6 783.0 55.4 79.5 967.9 50.0 79.9 894.6 49.2 78.3 872.7 44.1 69.5 730.3 87.3 99.4 2,540.2 55.5 80.9 1,031.3 56.0 77.6 976.7 50.7 85.2 1,031.5 59.2 89.6 1,168.7 62.5 93.2 1,367.2 60.1 83.1 1,110.5 54.0 76.6 991.1 47.5 76.2 822.7 47.1 75.1 807.9 41.2 63.7 644.2 87.0 99.4 2,524.7 50.5 75.9 950.8 50.9 73.2 909.2 58.7 84.3 1,405.5 61.8 89.4 1,195.7 63.0 93.2 1,368.8 53.7 76.7 995.0 50.2 72.8 861.2 44.4 70.3 742.9 43.9 69.6 731.4 1980 1991 2001 2004 2005 2006 2007 2008

44.0 64.8

36.2 54.5

53.5 81.7

48.0 75.3

66.5 95.2

54.4 76.5

Source: EIA. Concentration based on operating capacity of crude oil distillation measured per calendar day on January 1st of the given year. The FTC generated the table through 2004 and I extended it through 2008. Upper Midwest: Illinois, Indiana, Kentucky, Michigan, and Ohio. Increase from 2004 to 2005 HHI's in PADDs I and III primarily due to the merger between Valero and Premcor. Capacities used in this table are at the corporate level (multiple refineries owned by the same corporation are aggregated).
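
For readers who want to reproduce measures of this kind, the sketch below computes an n-firm concentration ratio and the HHI from corporate-level capacities, using the standard definitions (sum of the largest n percentage shares, and sum of squared percentage shares on a 0 to 10,000 scale). The function name and the capacity figures are hypothetical and are not EIA data.

```python
def concentration(capacities, top_n=4):
    """Return (n-firm concentration ratio in %, HHI) from firm capacities."""
    total = sum(capacities)
    # Market shares in percent, largest first.
    shares = sorted((100.0 * c / total for c in capacities), reverse=True)
    ratio = sum(shares[:top_n])
    hhi = sum(s * s for s in shares)
    return ratio, hhi


# Hypothetical capacities (e.g., thousand barrels per calendar day) for five firms.
capacities = [300, 250, 200, 150, 100]
four_firm, hhi = concentration(capacities, top_n=4)
# four_firm is 90.0 and hhi is 2250.0 for these illustrative numbers
```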


Table A.3: Cost Estimates
Year  Parameter         Market 1                Market 2                Market 3
                        Coefficient Std. Err.   Coefficient Std. Err.   Coefficient Std. Err.
1995  Q (?0)            3.45***     0.01        0.36***     0.10        7.99***     0.75
      Q² (?1)           2.70***     0.01        10.86       11.18       5.45***     0.21
      Q*Pc (?2)         0.29***     0.00        0.06***     0.02        0.28***     0.04
      Investment (?3)   4.41***     0.14        4.56        5.36        7.80        8.70
      Investment² (?4)  -4.41***    0.07        -2.99***    0.74        -5.52       5.01
1996  Q (?0)            3.48***     0.00        2.62***     0.38        0.05        2.09
      Q² (?1)           6.19***     0.01        5.21***     0.31        6.02***     0.44
      Q*Pc (?2)         0.03***     0.00        0.03*       0.02        1.00***     0.03
      Investment (?3)   4.01***     0.15        5.58        51.23       3.84        11.82
      Investment² (?4)  -1.27***    0.05        -0.97       8.91        -2.09**     1.03
1997  Q (?0)            0.05*       0.03        0.92***     0.19        1.08        1.98
      Q² (?1)           5.14***     0.05        7.85***     0.15        7.30***     0.04
      Q*Pc (?2)         0.08***     0.00        0.05***     0.01        0.38***     0.00
      Investment (?3)   4.25***     0.03        3.60**      1.64        8.88***     0.21
      Investment² (?4)  -0.81***    0.01        1.03        1.88        -1.86***    0.04
1998  Q (?0)            0.17***     0.03        0.05        26.36       1.16***     0.31
      Q² (?1)           1.00***     0.04        3.68        8.20        3.40***     0.24
      Q*Pc (?2)         1.00***     0.01        0.02        55.93       0.86***     0.08
      Investment (?3)   -17.65      110.67      3.28        6.13        5.15        95.97
      Investment² (?4)  25.35       33.80       -4.30       32.06       -1.91       1.75
1999  Q (?0)            2.70***     0.04        0.44        51.57       6.94        35.07
      Q² (?1)           5.79***     0.18        2.13        6.43        7.35***     0.05
      Q*Pc (?2)         0.01***     0.00        0.27        3.96        0.12        19.64
      Investment (?3)   4.65        14.90       5.90        11.03       9.53***     0.73
      Investment² (?4)  -0.82       1.31        -6.05       58.11       -0.92***    0.13
2000  Q (?0)            6.19***     0.57        0.04        0.19        10.29***    1.57
      Q² (?1)           5.89***     0.11        11.36***    0.63        6.36***     0.41
      Q*Pc (?2)         0.00        0.00        0.00        0.00        0.01        0.06
      Investment (?3)   5.65*       4.16        4.08        4.40        11.85***    1.33
      Investment² (?4)  -2.82***    0.44        -0.99       2.13        5.26        9.43
2001  Q (?0)            0.32***     0.06        0.05***     0.01        0.03        2.92
      Q² (?1)           5.75***     0.06        23.84***    1.07        2.63**      1.19
      Q*Pc (?2)         0.02***     0.00        0.00***     0.00        1.00***     0.20
      Investment (?3)   4.56***     0.53        3.91***     0.35        9.74        15.17
      Investment² (?4)  1.12***     0.07        -4.79***    0.36        -5.05***    0.99
2002  Q (?0)            2.24***     0.74        0.12***     0.03        0.58        0.52
      Q² (?1)           4.51***     0.10        3.70***     0.74        6.90***     0.58
      Q*Pc (?2)         0.16***     0.03        0.98***     0.08        0.28***     0.05
      Investment (?3)   17.48**     9.18        5.49**      2.74        6.75        1,402.90
      Investment² (?4)  3.49        14.69       -1.09       0.86        -0.87       6.75
2003  Q (?0)            0.88***     0.18        13.42       394.71      0.03        0.22
      Q² (?1)           5.87***     0.11        0.56        27.99       4.50***     0.24
      Q*Pc (?2)         0.08***     0.01        0.32        3.94        0.79***     0.04
      Investment (?3)   4.32***     0.70        5.43**      3.15        4.73***     1.64
      Investment² (?4)  2.75***     0.89        -1.02       1.88        -3.08*      2.14
2004  Q (?0)            3.18***     0.22        0.17***     0.07        0.15        0.45
      Q² (?1)           8.04***     0.13        28.65***    8.47        11.49***    0.68
      Q*Pc (?2)         0.00***     0.00        0.01***     0.00        0.00        0.02
      Investment (?3)   7.48***     0.70        5.35***     1.19        7.07        7.23
      Investment² (?4)  2.09***     0.10        -5.14***    0.57        -2.84       4.96
2005  Q (?0)            0.34***     0.02        0.90***     0.11        0.04        34.02
      Q² (?1)           2.85***     0.05        8.52***     0.07        1.39        13.95
      Q*Pc (?2)         1.00***     0.01        1.00***     0.01        1.00***     0.05
      Investment (?3)   10.42       24.93       11.69***    1.40        10.74***    4.42
      Investment² (?4)  2.05        1.60        -2.97***    0.71        -1.15       2.36
2006  Q (?0)            2.92***     0.06        0.01        0.19        1.01***     0.35
      Q² (?1)           1.39***     0.02        4.67***     0.93        4.79***     0.34
      Q*Pc (?2)         1.00***     0.00        1.00***     0.03        1.00***     0.03
      Investment (?3)   9.44        443.89      8.42        7.81        7.43        493.22
      Investment² (?4)  2.85        134.06      0.01        130.15      0.15        157.43

***, **, * Significant at the 1%, 5%, and 10% level respectively.


Appendix B Chapter 2 Supplement
Top 20 Most Actively Searched Drugs
Drug Name    Num. of Sessions   Num. of Queries   Mean Queries Per Session   Ad Spending (Millions)
viagra       778                2,544             3.27                       $80.56
lexapro      728                1,734             2.38                       $1.18
depo         661                1,437             2.17                       $0.00
xanax        583                1,497             2.57                       $0.00
zoloft       566                1,305             2.31                       $46.73
wellbutrin   489                1,193             2.44                       $108.14
ambien       484                1,012             2.09                       $130.20
cymbalta     477                1,060             2.22                       $6.33
lyrica       430                886               2.06                       $0.58
effexor      405                897               2.21                       $4.05
insulin      384                1,127             2.93                       $0.00
lipitor      384                754               1.96                       $93.54
paxil        358                873               2.44                       $0.11
prozac       330                757               2.29                       $0.52
celebrex     290                744               2.57                       $3.59
cialis       284                830               2.92                       $110.94
seroquel     267                521               1.95                       $2.16
lithium      265                767               2.89                       $0.05
oxycontin    258                1,006             3.90                       $0.00
toprol       253                493               1.95                       $0.00
Total        8,674              21,437            2.48                       $588.66

Ad spending is total expenditure on all forms of DTCA in 2005. These 20 drugs account for 30% of all search sessions, 33% of clicks, and 17% of DTCA spending.

Table B.1: 20 Most Actively Searched Drugs


Top 20 Most Actively Advertised Drugs
Drug Name    Num. of Sessions   Num. of Queries   Mean Queries Per Session   Ad Spending (Millions)
nexium       250                439               1.76                       $226.34
lunesta      185                383               2.07                       $215.14
vytorin      181                330               1.82                       $155.26
crestor      226                441               1.95                       $141.82
ambien       484                1,012             2.09                       $130.20
nasonex      79                 143               1.81                       $124.16
flonase      65                 113               1.74                       $112.82
cialis       284                830               2.92                       $110.94
lamisil      117                276               2.36                       $110.51
plavix       199                371               1.86                       $110.16
wellbutrin   489                1,193             2.44                       $108.14
singulair    141                323               2.29                       $105.05
lipitor      384                754               1.96                       $93.54
imitrex      40                 106               2.65                       $82.21
viagra       778                2,544             3.27                       $80.56
valtrex      161                307               1.91                       $72.11
prevacid     154                242               1.57                       $71.88
allegra      184                379               2.06                       $71.04
boniva       87                 178               2.05                       $66.45
zelnorm      103                150               1.46                       $62.45
Total        4,591              10,514            2.10                       $2,250.77

Ad spending is total expenditure on all forms of DTCA in 2005. These 20 drugs account for 16% of all search sessions and clicks, and 65% of DTCA spending.

Table B.2: 20 Most Advertised Drugs

Variable Definition for OLS and Probit Models

Dependent Variables   Description
Total Sessions        total search sessions for a drug
Beyond Page 1         0/1 indicator; 1 if a user clicks on a link on page 2 or higher

Independent Variables
age                   years since FDA approval
dtca                  total DTCA spending, available 1994 - February 2006, logs
alltv                 total DTCA spending on TV, available 1994 - February 2006, logs
allmags               total DTCA spending in magazines, available 1994 - February 2006, logs
allnewsp              total DTCA spending in newspapers, available 1994 - February 2006, logs
allradio              total DTCA spending on radio, available 1994 - February 2006, logs
outdoor               total DTCA spending on outdoor media, available 1994 - February 2006, logs
internet              total DTCA spending on the internet, available 1994 - February 2006, logs
X_Y_qtrb4             total spending on X in the Y quarter prior to search, logs
rld                   0/1 indicator; 1 if producer of drug is the innovator/pioneer

Note: Stock regressions include the total spending for a drug for all months between January 1994 and February 2006. Regressions involving previous quarter data include spending from December 2005 - February 2006. The depreciation analysis also includes spending from three previous quarters in 2005.

Table B.3: Variables Used in Regressions.


Appendix C Chapter 3 Supplement
[Figure: four panels titled Organic Summary / Dot-Com, Organic Summary / Dot-Gov, Organic Summary / Dot-Edu, and Organic Summary / Dot-OrgNetInfo. Each panel plots density (vertical axis) against summary content on a scale from -0.5 (informational) to 0.5 (promotional), with separate curves for Ask, Google, MSN, and Yahoo!.]

Figure C.1: Kernel Density of Summary Content, Organic Results, By Extension


Complete List of Search Queries

Table C.1: List of Search Queries


1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58

actos actos "actos plus" actos "blood glucose" actos "side effects" actos "weight gain" actos "weight loss" actos actos plus actos blood glucose actos buy actos cheap actos cost actos diabetes actos discount actos generic actos glucophage actos information actos insulin actos interactions actos metformin actos plus actos plus "blood glucose" actos plus "side effects" actos plus "weight gain" actos plus "weight loss" actos plus actos actos plus actos actos plus blood glucose actos plus buy actos plus buy actos plus cheap actos plus cheap actos plus cost actos plus cost actos plus diabetes actos plus diabetes actos plus discount actos plus discount actos plus generic actos plus generic actos plus glucophage actos plus glucophage actos plus information actos plus information actos plus insulin actos plus insulin actos plus interactions actos plus interactions actos plus metformin actos plus metformin actos plus price actos plus price actos plus risks actos plus risks actos plus sale actos plus sale actos plus side effects actos plus weight gain actos plus weight loss

59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116

actos price actos risks actos sale actos side effects actos weight gain actos weight loss advicor altocor amaryl amitriptyline anafranil avandamet avandaryl avandia bupropion buspar byetta celexa celexa "long term" celexa "side effects" celexa "weight gain" celexa "weight loss" celexa buy celexa cheap celexa cost celexa depression celexa discount celexa generic celexa information celexa interactions celexa lexapro celexa long term celexa paxil celexa price celexa prozac celexa risks celexa sale celexa side effects celexa weight gain celexa weight loss celexa zoloft cholestyramine citalopram clozapine colestid colestipol crestor crestor "adverse effects" crestor "blood pressure" crestor "side effects" crestor adverse effects crestor blood pressure crestor buy crestor cheap crestor cholesterol crestor cost crestor discount crestor generic

117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174

crestor information crestor interactions crestor lipitor crestor lovastatin crestor pravachol crestor price crestor risks crestor sale crestor side effects crestor statins crestor zocor cymbalta desyrel doxepin duetact edronax effexor elavil endep escitalopram fenofibrate fluoxetine fluvoxamine galvus gemfibrozil glimepiride glipizide glucophage glucophage "actos plus" glucophage "blood glucose" glucophage "side effects" glucophage "weight gain" glucophage "weight loss" glucophage actos glucophage actos plus glucophage blood glucose glucophage buy glucophage cheap glucophage cost glucophage diabetes glucophage discount glucophage generic glucophage information glucophage insulin glucophage interactions glucophage metformin glucophage price glucophage risks glucophage sale glucophage side effects glucophage weight gain glucophage weight loss glucotrol glucovance glyburide glyset humalog humulin

175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232

insulin insulin "actos plus" insulin "blood glucose" insulin "side effects" insulin "weight gain" insulin "weight loss" insulin actos insulin actos plus insulin blood glucose insulin buy insulin cheap insulin cost insulin diabetes insulin discount insulin generic insulin glucophage insulin information insulin interactions insulin metformin insulin price insulin risks insulin sale insulin side effects insulin weight gain insulin weight loss januvia lantus lescol lexapro lexapro "long term" lexapro "side effects" lexapro "weight gain" lexapro "weight loss" lexapro buy lexapro celexa lexapro cheap lexapro cost lexapro depression lexapro discount lexapro generic lexapro information lexapro interactions lexapro long term lexapro paxil lexapro price lexapro prozac lexapro risks lexapro sale lexapro side effects lexapro weight gain lexapro weight loss lexapro zoloft lipitor lipitor "adverse effects" lipitor "blood pressure" lipitor "side effects" lipitor adverse effects lipitor blood pressure

233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290

lipitor buy lipitor cheap lipitor cholesterol lipitor cost lipitor crestor lipitor discount lipitor generic lipitor information lipitor interactions lipitor lovastatin lipitor pravachol lipitor price lipitor risks lipitor sale lipitor side effects lipitor statins lipitor zocor lopid lovastatin lovastatin "adverse effects" lovastatin "blood pressure" lovastatin "side effects" lovastatin adverse effects lovastatin blood pressure lovastatin buy lovastatin cheap lovastatin cholesterol lovastatin cost lovastatin crestor lovastatin discount lovastatin generic lovastatin information lovastatin interactions lovastatin lipitor lovastatin pravachol lovastatin price lovastatin risks lovastatin sale lovastatin side effects lovastatin statins lovastatin zocor ludiomil luvox metaglip metformin metformin "actos plus" metformin "blood glucose" metformin "side effects" metformin "weight gain" metformin "weight loss" metformin actos metformin actos plus metformin blood glucose metformin buy metformin cheap metformin cost metformin diabetes metformin discount

291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348

metformin generic metformin glucophage metformin information metformin insulin metformin interactions metformin price metformin risks metformin sale metformin side effects metformin weight gain metformin weight loss mevacor mirtazapine nardil niaspan norpramin nortriptyline novolin novolog nph insulin pamelor parnate paroxetine paxil paxil "long term" paxil "side effects" paxil "weight gain" paxil "weight loss" paxil buy paxil celexa paxil cheap paxil cost paxil depression paxil discount paxil generic paxil information paxil interactions paxil lexapro paxil long term paxil price paxil prozac paxil risks paxil sale paxil side effects paxil weight gain paxil weight loss paxil zoloft pertofrane prandin pravachol pravachol "adverse effects" pravachol "blood pressure" pravachol "side effects" pravachol adverse effects pravachol blood pressure pravachol buy pravachol cheap pravachol cholesterol

349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406

pravachol cost pravachol crestor pravachol discount pravachol generic pravachol information pravachol interactions pravachol lipitor pravachol lovastatin pravachol price pravachol risks pravachol sale pravachol side effects pravachol statins pravachol zocor pravastatin precose prozac prozac "long term" prozac "side effects" prozac "weight gain" prozac "weight loss" prozac buy prozac celexa prozac cheap prozac cost prozac depression prozac discount prozac generic prozac information prozac interactions prozac lexapro prozac long term prozac paxil prozac price prozac risks prozac sale prozac side effects prozac weight gain prozac weight loss prozac zoloft questran remeron rezulin sarafem sertraline serzone simvastatin sinequan starlix strattera surmontil tofranil tranylcypromine trazodone tricor trimipramine venlafaxine vestra

407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458

vivactil vytorin welchol wellbutrin zetia zocor zocor "adverse effects" zocor "blood pressure" zocor "side effects" zocor adverse effects zocor blood pressure zocor buy zocor cheap zocor cholesterol zocor cost zocor crestor zocor discount zocor generic zocor information zocor interactions zocor lipitor zocor lovastatin zocor pravachol zocor price zocor risks zocor sale zocor side effects zocor statins zoloft zoloft "long term" zoloft "side effects" zoloft "weight gain" zoloft "weight loss" zoloft buy zoloft celexa zoloft cheap zoloft cost zoloft depression zoloft discount zoloft generic zoloft information zoloft interactions zoloft lexapro zoloft long term zoloft paxil zoloft price zoloft prozac zoloft risks zoloft sale zoloft side effects zoloft weight gain zoloft weight loss

Popular Keyword Lists
Each list below gives the 50 keywords for one category, in order of observation (1 to 50).

Organic results, summary, promotional: phentermine, pills, shipping, price, viagra, now, save, offers, lowest, cialis, net, fast, cheapest, delivery, tramadol, purchase, compare, easy, blog, levitra, pill, soma, worldwide, meds, link, pharmacies, xanax, homepages, snewman, anti, stmartin, discussionboard, store, aciphex, great, prescriptions, quality, cost, licensed, day, products, valium, shop, sale, search, make, overnight, 2006, posted, ultram

Organic results, summary, informational: effect, interactions, including, possible, serious, common, statin, oral, cause, patient, important, occur, include, includes, lowering, pdf, provides, see, safety, nausea, symptoms, such, warnings, risk, learn, muscle, consumer, over, following, know, experience, levels, problems, prescribed, help, usage, hydrochloride, prescribing, heart, precautions, potential, well, sexual, diarrhea, adverse, clinical, out, many, exercise, additional

Organic results, title, promotional: purchase, phentermine, shipping, save, hosting, offer, cialis, genuine, viagra, TRUE, #3634, #3619, #3585, delivery, tramadol, truly, brand, trusted, sale, shop, home, catalog, fast, cost, online:, xanax, store, easy, top, guaranteed, #3610, #1072, compare, pharmacies, link, pump, india, #3629, topic, usa, #3633, diabetic, aciphex, name, #3637, bravenet, levitra, #1086, #3591, xanga

Organic results, title, informational: patient, oral, lawsuit, statin, lawyer, hydrochloride, withdrawal, treatments, webmd, lawyers, antidepressant, heart, answers, handout, information:, safety, rhabdomyolysis, encyclopedia, medlineplus, class, niacin, type, lowering, suicide, attorney, lawsuits, warnings, product, precautions, antidepressants, articles, pcos, resistance, learn, syndrome, koop, library, injury, anxiety, indications, defective, mayoclinic, wikipedia, blood, revolution, litigation, hydrobromide, ssri, statins, oxalate

Sponsored results, summary, promotional: canada, orders, cheap, risk, hidden, fees, fedex, beat, wholesale, overnight, x30, accredited, ranked, 10mg, 844, 891, brand, onlineover, off, x90, discount, competitors, satisfaction, customer, 1800, pharmaciesfind, x60, guaranteeaccredited, 500mg, beaten, fastdelivery, medisave, please, give, processing, meds, tablets, fee, available, x180, convenient, lowest, fda, minutes, quality, medication, sale, ringtones, 30mg, home

Sponsored results, summary, informational: treatment, natural, tips, options, out, anti, depressant, expert, breaking, anxiety, birth, defects, use, calcium, member, support, doctor, linked, code, zip, check, there, choose, lawyer, which, membersupport, proven, smarter, questions, really, works, taking, nutrition, ebay, join, know, breakthrough, exciting, one, recommend, most, productsfind, solution, damage, loss, statins, medicine, research, contact, work

Sponsored results, title, promotional: cheap, 20mg, cost, sold, 500mg, guaranteed, canadapharmacy, prescription, better, 30mg, 10mg, pills, safe, assistance, huge, only, capsules, 100mg, medicine, rosuvastatin, canadian, amp, savings, canadadrugs, download, comparison, program, trusted, deals, iipitor, celexxa, 45mg, meds, off, pioglitazone, direct, clearance, starting, today, china, directly, incredibly, great, rxdrugcard, medication, name, selling, sale, tickets, nordisk

Sponsored results, title, informational: legal, limited, promo, samples, cure, right, injury, inhaled, avoid, possible, lawyer, reviewed, exubera, linked, damage, kidney, birth, take, discovery, attorney, lower, dna, test, fatigue, natural, hair, lamictal, aid, performance, sexual, lawsuit, drugs?, defect, hypertension, infant, pulmonary, news, defects, contraceptive, missed, oral, safe?, detailed, prevachol, resistance, reverse, locating, need, products, review

Table C.2: Keywords Used in Classification Algorithm


[Figure: four panels titled Sponsored Summary / Dot-Com, Sponsored Summary / Dot-Gov, Sponsored Summary / Dot-Edu, and Sponsored Summary / Dot-OrgNetInfo. Each panel plots density (vertical axis) against summary content on a scale from -0.5 (informational) to 0.5 (promotional), with separate curves for Ask, Google, MSN, and Yahoo!.]

Figure C.2: Kernel Density of Summary Content, Sponsored Results, By Extension

Variable Summary for Probit Regression

Dependent Variable    Description
Page1                 0/1 indicator; 1 if the result appears on page 1 of the result

Independent Variables
dtca_stock            total DTCA spending, 1994 - February 2007, billions
dotcom                0/1 indicator; 1 if dot-com
dotgov                0/1 indicator; 1 if dot-gov
dotedu                0/1 indicator; 1 if dot-edu
dotorgnetinfo         0/1 indicator; 1 if dot-org, net, or info
dotintl               0/1 indicator; 1 if dot-us, uk, or ca
dotcom_age            interaction term: dotcom and age
dotgov_age            interaction term: dotgov and age
dotedu_age            interaction term: dotedu and age
dotorg_age            interaction term: dotorg and age
dotcom_dtc            interaction term: dotcom and dtca_stock
dotgov_dtc            interaction term: dotgov and dtca_stock
dotedu_dtc            interaction term: dotedu and dtca_stock
dotorg_dtc            interaction term: dotorg and dtca_stock
prop_promo_summary    Proportion of words in the summary of organic links that are promotional
prop_info_summary     Proportion of words in the summary of organic links that are informational
prop_promo_title      Proportion of words in the title of organic links that are promotional
prop_info_title       Proportion of words in the title of organic links that are informational

Table C.3: Variable Definitions
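
To make the proportion variables concrete, the following Python sketch computes the share of promotional and informational words in a single title or summary. It is only an illustration: the function name, the whitespace tokenization, and the lowercase matching are assumptions made here, and the two short keyword sets are a hand-picked subset of the organic summary lists in Table C.2.

```python
def keyword_proportions(text, promo_words, info_words):
    """Return (prop_promo, prop_info) for one title or summary."""
    words = text.lower().split()     # assumed tokenization: split on whitespace
    if not words:
        return 0.0, 0.0
    promo = sum(w in promo_words for w in words)
    info = sum(w in info_words for w in words)
    return promo / len(words), info / len(words)


# Small illustrative subsets of the Table C.2 keyword lists.
promo_words = {"shipping", "price", "save", "lowest"}
info_words = {"interactions", "safety", "warnings", "risk"}

# Hypothetical summary text with four words, two of them promotional.
prop_promo, prop_info = keyword_proportions(
    "lipitor lowest price online", promo_words, info_words)
```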

