Technical Paper Series Congressional Budget Office Washington, D.C.
FORECASTING CAPITAL GAINS REALIZATIONS
Preston Miller Federal Reserve Bank of Minneapolis and Congressional Budget Office Larry Ozanne Congressional Budget Office
August 2000 2000-5 Technical papers in this series are preliminary and are circulated to stimulate discussion and critical comment. They are not subject to CBO’s formal review and editing processes. The analysis and conclusions expressed in them are those of the authors and do not necessarily represent the position of the Congressional Budget Office, the Federal Reserve Bank of Minneapolis, or the Federal Reserve System. References in publications should be cleared with the authors. Papers in this series can be obtained by sending an email to [email protected]. For additional information about this paper, contact Larry Ozanne at (202) 226-2684 or by email at [email protected]. The authors wish to thank Nicholas Bull, Christopher Sims, Richard M. Todd, and Christopher Williams for their comments on this paper.
ABSTRACT As an input to its estimates of federal revenues, the Congressional Budget Office (CBO) requires an estimate of capital gains realizations in the current calendar year and forecasts of realizations for the next 10 years. The purpose of our study is to improve the accuracy of the current-year estimates. As background, we describe the models and methods that CBO now uses and briefly mention models developed at other institutions. We then discuss in detail our method for constructing a new model and compare the model’s performance with that of other models by determining how well they would have estimated realizations had they been used in the past. Those comparisons are conducted using increasingly realistic assumptions about the amount of information available at the time the estimates are made. We find that if CBO had used our new model throughout the 1990s, with only the information available when the estimates of realizations were made, CBO’s estimates of current-year realizations would have been more accurate.
CONTENTS

I. INTRODUCTION
II. CURRENT PRACTICES
    Current-Year Estimation
    Out-Year Projections
III. FORMULATING A NEW MODEL
    Model Formulation
    Dependent Variable
    Explanatory Variables
        Tax Rates
        Business Cycle
        Equity Net Gains
        Property Net Gains
    Modeling the Gains in Current and Future Years Separately
IV. IN-SAMPLE EVALUATION AND COMPARISONS
V. OUT-OF-SAMPLE EVALUATION AND COMPARISONS
VI. REAL-TIME EVALUATION AND COMPARISONS
I. INTRODUCTION
Each December, the Congressional Budget Office (CBO) completes its budget estimates (known as the budget baseline) for the current and 10 succeeding fiscal years. As an input to the revenue estimates, CBO’s Tax Analysis Division requires, by early December, estimates of capital gains realizations for the current and 10 succeeding calendar years.1 At the time the estimates are made, last year’s and most of the current year’s capital gains have been realized. CBO cannot observe those gains until people file their tax returns and the Internal Revenue Service (IRS) processes them, a sequence that takes over a year. Last year’s realizations can be closely approximated from preliminary data from the IRS, but no such data are available for the current year. However, other factors that indicate the amount of gains likely to be realized—such as the year’s movements in stock prices, the strength of the economy, and any changes in tax rates on capital gains—are largely known by the time of the budget estimates. Consequently, CBO estimates the amount of gains realized in the current year by using historically estimated equations that relate capital gains realizations to those other aggregate factors. Projecting realizations in future years is more difficult because the macroeconomic factors used to estimate gains in the current year are unknown in the future. CBO does forecast major macroeconomic variables, but stock prices and other asset prices, which are important determinants of capital gains, are very difficult to forecast. Consequently, CBO does not project future realizations using the same equations that it uses to estimate current-year realizations. Instead, CBO assumes that realizations gradually revert to their historical size relative to gross domestic product (GDP), and it combines that assumption with its projection of GDP to project gains. The empirical research described in this paper was designed to improve CBO’s estimates of current-year realizations. 
We originally intended to address current-year estimation and future-year prediction with a single model, but we found that the two had different enough problems to benefit from separate treatments. We will discuss the problems of forecasting future gains in a second paper. This paper describes CBO’s current practice for both current-year estimation and future-year projection but discusses only our efforts to improve the former. It also briefly mentions models developed at other institutions because they provide ideas for refining CBO’s model of current-year realizations.
1. The Tax Analysis Division combines the estimates of realizations with estimates of wages and other income sources and runs them through its microsimulation tax calculator to generate estimates of individual income tax revenues. The contribution of capital gains to total income tax liability is not separately identified on tax returns, and therefore it cannot be forecast directly by time-series or other methods.
Our efforts had some success. We conclude that if CBO had used our new model throughout the 1990s with only the information available when the estimates were made, its estimates of current-year realizations would have been improved. (Our new model, like CBO’s model, consists of four variants of a basic equation.) Between 1991 and 1998, the CBO model had a root mean squared error of 15.1 percentage points. Our model has a root mean squared error of 11.7 percentage points. But even those smaller errors can still lead to substantial errors in the dollar level of realizations and revenues. The new model has not yet withstood the test of time, however. Some of its coefficient estimates were honed by extreme events, and there were not nearly enough such events to have much confidence in the estimates. As a result, similar extreme events in the future could well lead to large errors from our model as well as major shifts in some of its coefficients. Nevertheless, a major advantage of the model is that it affords the opportunity to refine the estimates of some coefficients in the face of future extreme events, whereas most other models do not offer that opportunity.
II. CURRENT PRACTICES
The measure of capital gains realizations that CBO estimates is the annual total of net gains that are reported by taxpayers who have net gains. Taxpayers who have net losses are excluded and their losses are estimated separately. Net losses are much smaller than net gains and grow more regularly because of a $3,000 limit on losses per return.

Historically, realizations have tended to grow at the same rate as GDP but with much greater year-to-year fluctuation. Thus, the ratio of gains to GDP changes from year to year but shows little trend (see Figure 1). Its average from 1952 to 1998 was 2.7 percent. Gains reached their high-water mark of 7.4 percent of GDP in 1986, when people rushed to realize gains ahead of a tax increase that was enacted that year but did not take effect until 1987. In 1998, the most recent year for which tax return processing is largely complete, realizations were at their second highest level, 5.1 percent. That peak was reached after an uncharacteristically steady climb beginning in the early 1990s.

Although many assets generate capital gains, the two most important classes of assets for tax revenues are equities and real estate (see Table 1). Equities accounted for 30 percent to 40 percent of classifiable asset sales in the 1970s and 1980s, and the tremendous increases in the value of equity holdings in the 1990s have undoubtedly made gains from equities even more important. Taxable gains on real estate come from rental residential and commercial real estate. Some of those gains are passed through partnerships and thus account for some of the gains attributed to partnerships in Table 1. Few gains on owner-occupied homes are taxed, and all such gains are excluded from Table 1.

Current-Year Estimation

The model that CBO uses to estimate current-year capital gains realizations consists of single-equation regressions based on macroeconomic time series. CBO’s baseline estimate of realizations is usually a central tendency of estimates from the group of equations.

Macroeconomic regressions that explain gains were developed in the 1980s by analysts in the Treasury, CBO, and academia. The focus of their work was to measure how realizations respond to changes in tax rates. But the basic idea behind the equations (to explain realizations in terms of the outstanding pool of gains held by taxpayers and the cost of realizing those gains) carries over to predicting aggregate gains.
Figure 1: Ratio of Capital Gains Realizations to GDP
[Chart not reproduced. Vertical axis: percent, 0 to 8. Horizontal axis: years, 1955 to 1995.]
SOURCES: Capital gains realizations are from the Department of the Treasury, Office of Tax Analysis, and GDP is from the Department of Commerce, Bureau of Economic Analysis.
TABLE 1: DISTRIBUTION OF CAPITAL GAINS BY ASSET TYPE

                                                      Gross Capital Gains        Distribution of Gains
                                                     (Millions of dollars)             (Percent)
Asset Type                                           1977     1981      1985      1977    1981    1985
Total (a)                                          53,066   97,057   194,689       100     100     100
Corporate stock, CGD                               14,783   39,447    81,814        28      41      42
Other securities (bonds)                              560    1,065     3,054         1       1       2
Options and futures contracts                       1,689    3,683     3,406         3       4       2
Partnerships, S-corporations, trusts and estates    5,112    9,485    42,977        10      10      22
Residential rental property                         4,596    8,229    18,748         9       8      10
Depreciable business personal property              2,256    3,576     1,335         4       4       1
Depreciable business real property                  3,410    3,420    14,067         6       4       7
Other assets                                       20,660   28,152    29,290        39      29      15

SOURCE: Internal Revenue Service, Statistics of Income Bulletin (Winter 1985-86 and Spring 1999).
NOTES: Data are not fully comparable among years, especially 1977 versus 1981 and 1985. CGD is capital gains distributions from mutual funds.
a. Excludes all capital gains on personal residences.
Outstanding capital gains are those that have accrued during the current and prior years less those that were realized in previous years or were exempted from tax because their owner died. Accrued gains are estimated and reported in the Federal Reserve Board’s flow-of-funds accounts as revaluations of corporate and noncorporate equity held by the household sector. Those two revaluation measures were rarely used in the early development of the equations. Instead, analysts approximated accrued gains on stocks using the value of corporate equities held by the household sector (also from the flow-of-funds accounts) or stock price indexes. GDP was commonly used to approximate accrued gains on assets other than stocks. No reliable and timely data on gains exempted at death are available, so that factor was ignored in most equations. The cost of realizing gains was represented by tax rates on capital gains. In addition, equations developed after 1987 included a variable to capture the transitory effects of the large increase in the tax rate on capital gains in 1987 that was passed in 1986. CBO’s equations used a dummy variable to isolate those effects. Another cost, that of selling an asset, has not been incorporated in the equations, although the cost of trading stocks has been falling since the 1970s and could be affecting the willingness to realize gains. Empirically, the stage of the business cycle has seemed to affect people’s willingness to realize gains. CBO began estimating capital gains at the end of 1986, but it did not adopt an equation approach until 1988. From 1988 through 1990, CBO used regressions that explain the logarithm of realized gains to estimate gains in the current year and forecast gains over a five-year horizon. In 1991, it shifted to its current model, which explains changes in the logarithm of gains, or, roughly speaking, the annual growth rate of gains. 
In most years, CBO uses four variants of a basic equation, which differ in their inclusion of multifamily housing starts and an error-correction term (see Table 2).2 The equations are estimated from 1955 through the last complete year and then used to predict gains in the current year. In most years, CBO averages the current-year estimates from the four equations. In 1999, however, the regressions with error correction were omitted because of their large errors the previous year. Also, in some years the estimates are adjusted for special factors, such as the initial effect of the 1997 tax cut. Predictions made by the four equations in early December 1999 are shown in Table 3. The predictions have large standard errors, as indicated by their 95 percent confidence intervals.
2. The error-correction term is the lagged residual from an equation that explains the logarithm of the ratio of gains to GDP as a function of the capital gains tax rate and the 1986 dummy variable. The residual indicates whether gains differ from their expected long-run size relative to GDP.
TABLE 2: CURRENT CBO EQUATIONS
(Dependent variable is change in log of gains, estimated 1955-1998)

                                          No Error Correction                 Error Correction
                                       No Starts         Starts          No Starts         Starts
Explanatory Variables                Coeff.  t-stat   Coeff.  t-stat   Coeff.  t-stat   Coeff.  t-stat
Constant term                        -0.060   -0.8    -0.096   -1.6    -0.148   -2.3    -0.120   -2.0
Growth rate of prices                 1.024    1.0     1.691    1.8     1.788    1.7     1.985    2.1
Real growth rate of household
  equity holdings                     0.528    4.3     0.533    4.9     0.517    4.3     0.484    4.4
Growth rate of real GDP               2.562    1.9     2.921    3.1     4.335    4.5     3.304    3.5
Acceleration of real GDP              1.287    1.4
Growth rate of multifamily
  housing starts                                       0.243    3.0                      0.223    2.8
Change in maximum tax rate           -0.027   -3.5    -0.025   -3.6    -0.025   -3.4    -0.026   -3.7
Indicator: 1986 = 1, 1987 = -1
  (0 otherwise)                       0.544    5.7     0.523    6.0     0.581    6.1     0.555    6.3
Error-correction term                                                  -0.203   -2.0    -0.163   -1.7
Adjusted R-squared                    0.725            0.767            0.737            0.778
Durbin Watson                         2.072            2.018            1.856            1.888

SOURCE: CBO calculations.
TABLE 3: FORECAST OF CHANGE IN LOGARITHM AND LEVEL OF CAPITAL GAINS FOR 1999
(Based on CBO forecasts of December 7 and 9, 1999)

                                                                         Level of Gains
                                      Change in Log of Gains          (Billions of dollars)
                                            Standard   95% C.I.                95% C.I.
Equations                           Mean     Error    Low     High     Mean    Low    High
No error correction, no starts     0.142     0.129   -0.117   0.401     507    391     657
No error correction, starts        0.127     0.119   -0.111   0.365     500    394     634
Error correction, no starts        0.027     0.141   -0.255   0.309     452    341     600
Error correction, starts           0.028     0.129   -0.231   0.287     452    349     586
Average of change in logs          0.081             -0.179   0.340     477    368     618

SOURCE: CBO calculations.
NOTE: The estimated level of gains in 1999 is based on preliminary 1998 gains of $440 billion. C.I. is confidence interval.
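The link between the two halves of Table 3 can be sketched in a few lines. Because the equations predict the change in the logarithm of gains, a predicted level follows from multiplying the preliminary 1998 base of $440 billion by the exponential of the predicted change. The function and variable names below are illustrative assumptions, not CBO's code, and the results differ from the published levels by small rounding amounts because the table's inputs are themselves rounded.

```python
from math import exp

# Preliminary 1998 gains in billions of dollars, from the note to Table 3.
BASE_1998 = 440.0

def level_from_log_change(base, dlog):
    """Convert a predicted change in log gains into a level of gains."""
    return base * exp(dlog)

# Mean predicted change in log gains for 1999, from Table 3.
mean_dlog = {
    "No error correction, no starts": 0.142,
    "No error correction, starts": 0.127,
    "Error correction, no starts": 0.027,
    "Error correction, starts": 0.028,
    "Average of change in logs": 0.081,
}

levels = {name: level_from_log_change(BASE_1998, d) for name, d in mean_dlog.items()}

for name, level in levels.items():
    print(f"{name}: about {level:.0f} billion dollars")
```

The same conversion applied to the low and high ends of each confidence interval gives the interval for the level, which is why the level intervals are not symmetric around their means.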
In addition to the regressions developed at CBO, regression models of capital gains realizations have been developed by analysts at other institutions to provide input to their forecasts. Table 4 highlights some of the salient differences in the selection of variables among four of those models. The four were developed by Nicholas Bull and David Richardson of the Treasury Department; Randall Mariger, formerly of the Federal Reserve Board; Prawpan Siwapradit of the New York State Division of the Budget; and Thomas Stinson of the Minnesota Department of Finance.3 For 1999, CBO estimated several equations adapted from the Bull-Richardson, Mariger, and Siwapradit models. The equations generated current-year estimates similar to those of the CBO equations without error correction. CBO used the estimates from those adapted equations in settling on its 1999 baseline estimate.

The accuracy of CBO’s current-year predictions can be evaluated for 1986 through 1998 (see Table 5). CBO was farthest off in 1986, when it underestimated how much people would respond to the impending tax increase in 1987. Other large errors occurred when gains were overestimated in 1989 and 1990 and underestimated in 1996. The errors also show a cyclical pattern, overestimating growth in 1989 through 1991 and underestimating it in most years since then. The root mean squared error on predicted annual growth rates from 1986 to 1998 is 26 percent, compared with growth rates that ranged from an increase of 90 percent to a decrease of 56 percent. Looking just at the experience from 1991 to 1998, which excludes the unusual years of 1986 and 1987 and coincides with the use of CBO’s current equations, the root mean squared error is 16 percent. During that period, growth rates ranged from a 45 percent increase to a 10 percent decline. The errors since 1991 are still large compared with the growth rates of gains, and they reflect the substantial uncertainty in predicting capital gains even when other macroeconomic variables are largely known. That uncertainty is reflected in the standard errors of the equations themselves.

CBO’s record in estimating current-year realizations and some of the statistical properties of its equations suggest there is room for improvement. First and foremost, the current-year estimates are often far off the mark. Second, the coefficients on some variables, such as the growth rates of inflation and real GDP, vary considerably based on the set of other explanatory variables. Third, the coefficients on the error-correction terms vary considerably over time. Finally, the use of dummy variables for 1986 and 1987 prevents the equations from estimating the transitory effects of a future large change in tax rates, should one occur.
3. The models do not fully describe the process of forecasting gains at the four institutions. The institutions have access to other models and consider factors outside any specific model.
TABLE 4: OTHER MODELS OF CAPITAL GAINS REALIZATIONS

Model: Bull-Richardson
    Dependent Variable: Dollar change in capital gains realizations.
    Explanatory Variables: Current-year change in tax rate. Next year’s change in tax rate if positive; zero otherwise. Accumulated monthly increases in S&P 500 per year. Accumulated monthly decreases in S&P 500 per year.

Model: Mariger
    Dependent Variable: Change in the ratio of capital gains realizations to nominal potential GDP.
    Explanatory Variables: Change in the ratio of actual GDP to potential GDP. Change in the ratio of equities held by households to potential GDP. Dummy for 1986.

Model: Siwapradit
    Dependent Variable: Change in the log of capital gains realizations.
    Explanatory Variables: Change in the log of the value of shares traded on the New York, Nasdaq, and American stock exchanges. Tax rate combines federal, New York State, and New York City maximums. Dummy for 1986.

Model: Stinson
    Dependent Variable: Change in the ratio of capital gains realizations to the value of household assets.
    Explanatory Variables: Growth in the value of household assets and GDP. Dummy for 1986.
TABLE 5: CURRENT-YEAR FORECASTS AND ACTUAL CAPITAL GAINS

                Level of Gains                      Growth Rate
             (Billions of dollars)                 (Percent) (a)
Year      Actual     Forecast    Error       Actual   Forecast    Error
1985        171
1986        324        223        101         90.2      30.7       59.5
1987        144        144          0        -55.6     -55.6        0.1
1988        162        151         11         12.3       4.7        7.5
1989        154        224        -70         -5.2      38.4      -43.5
1990        124        170        -46        -19.4      11.7      -31.1
1991        112        132        -20         -9.8      10.0      -19.8
1992        127        119          8         13.5      10.2        3.3
1993        152        131         21         20.2       6.1       14.1
1994        153        150          3          0.3       3.4       -3.1
1995        180        177 (b)      3         17.9      14.9        2.9
1996        261        196         65         45.0       8.9       36.1
1997        365 (c)    382        -17         39.8      46.4       -6.6
1998        450        418         32         23.4      13.0       10.4
1999                   500                              13.6

Root mean squared error, 1986-1998                                 25.6
Root mean squared error, 1991-1998                                 16.1

SOURCE: CBO calculations.
a. Growth rate forecasts are from preliminary values for the prior year.
b. The December 1995 forecast was modified to $175 billion in the delayed baseline of March 1996.
c. Preliminary.
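The two summary statistics at the bottom of Table 5 can be reproduced from the growth-rate errors listed in the table. The following is a minimal sketch; the function name and the dictionary layout are illustrative choices, and the error values are copied from the table's growth-rate error column for 1986 through 1998.

```python
from math import sqrt

# Growth-rate forecast errors (actual minus forecast, in percentage points),
# copied from the last column of Table 5.
growth_errors = {
    1986: 59.5, 1987: 0.1, 1988: 7.5, 1989: -43.5, 1990: -31.1,
    1991: -19.8, 1992: 3.3, 1993: 14.1, 1994: -3.1, 1995: 2.9,
    1996: 36.1, 1997: -6.6, 1998: 10.4,
}

def rmse(errors_by_year, start, end):
    """Root mean squared error over the years start..end, inclusive."""
    values = [e for year, e in errors_by_year.items() if start <= year <= end]
    return sqrt(sum(e * e for e in values) / len(values))

rmse_full = rmse(growth_errors, 1986, 1998)
rmse_recent = rmse(growth_errors, 1991, 1998)
print(f"1986-1998: {rmse_full:.1f}  1991-1998: {rmse_recent:.1f}")
```

Because the errors are squared before averaging, the single large miss in 1986 accounts for a substantial share of the full-period figure, which is consistent with the text's decision to report the 1991-1998 period separately.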
Out-Year Projections

Because capital gains have shown little trend relative to GDP, CBO projects that in the years after the current year, gains will move back to their expected size relative to GDP. That expected size is the historical average adjusted by a regression equation for differences between current and historical average tax rates on capital gains. For example, at the end of 1999, CBO’s equations predicted that realizations that year would be around $500 billion, or 5.5 percent of GDP. As noted earlier, gains have averaged 2.7 percent of GDP historically, but because tax rates are currently below their historical average, the expected ratio is about 3.1 percent. Thus, CBO projects that the ratio of gains to GDP will fall toward 3.1 percent starting in 2000.

The rate at which that ratio declines is based on the estimated coefficients of the error-correction terms in CBO’s two equations with such terms. Those coefficients suggest a rate of decline per year equal to about 20 percent of the gap between the previous year’s ratio and expected long-run ratio. The 20 percent rate was used in the December 1999 projection. Given the rate of decline, ratios of gains to GDP can be calculated for each year in the projection period. Multiplying those ratios by CBO’s forecast of GDP in each year of the projection period gives the out-year projections of capital gains. As can be seen in Table 6, the 1999 projection shows gains declining from 1999 through 2004 and then growing back to about $500 billion in 2010.
TABLE 6: OUT-YEAR PROJECTIONS OF CAPITAL GAINS

                  GDP            Capital Gains       Ratio of Gains    Growth Rate
             (Billions of        (Billions of           to GDP          of Gains
Year           dollars)            dollars)            (Percent)        (Percent)
Past and current
1997            8,301                365                 4.39
1998            8,760                440                 5.02             20.61
1999            9,235                500                 5.41             13.64
Projected
2000            9,692                480                 4.96             -3.94
2001           10,154                466                 4.59             -3.00
2002           10,610                456                 4.29             -2.19
2003           11,069                449                 4.06             -1.39
2004           11,544                447                 3.87             -0.54
2005           12,054                449                 3.72              0.36
2006           12,589                453                 3.60              1.06
2007           13,148                461                 3.50              1.65
2008           13,734                471                 3.43              2.16
2009           14,362                483                 3.37              2.70
2010           15,024                498                 3.32              3.07

Assumptions:
500 = Equations' predictions for 1999
3.12% = target ratio of realizations to GDP
20% = approximate rebound rate in error correction equations

SOURCES: GDP in 1997 and 1998 are from the Bureau of Economic Analysis as of December 1999. GDP in 1999 through 2010 is from CBO's forecast of December 9, 1999. Capital gains in 1997 are from the Statistics of Income Division, Internal Revenue Service. Other data are from CBO projections and calculations.
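The out-year procedure described above can be sketched as a short calculation: start from the 1999 ratio of gains to GDP, close 20 percent of the gap to the 3.12 percent target each year, and multiply each projected ratio by that year's GDP forecast. The function name is an illustrative assumption; the inputs are the three stated assumptions and the GDP figures from Table 6. This is a sketch of the arithmetic, not CBO's own program, and results may differ from the published figures in the last digit.

```python
# GDP in billions of dollars, from Table 6.
GDP = {
    1999: 9235, 2000: 9692, 2001: 10154, 2002: 10610, 2003: 11069,
    2004: 11544, 2005: 12054, 2006: 12589, 2007: 13148, 2008: 13734,
    2009: 14362, 2010: 15024,
}

def project_ratios(start_ratio, target_ratio, rebound_rate, n_years):
    """Each year, move the ratio toward the target by a fixed share of the gap."""
    ratios = []
    ratio = start_ratio
    for _ in range(n_years):
        ratio -= rebound_rate * (ratio - target_ratio)
        ratios.append(ratio)
    return ratios

# Stated assumptions: $500 billion in 1999, 3.12 percent target, 20 percent rebound.
start_ratio = 100 * 500 / GDP[1999]          # percent of GDP in 1999
years = list(range(2000, 2011))
ratios = dict(zip(years, project_ratios(start_ratio, 3.12, 0.20, len(years))))
gains = {year: ratios[year] / 100 * GDP[year] for year in years}

for year in years:
    print(f"{year}: ratio {ratios[year]:.2f} percent, gains {gains[year]:.0f} billion")
```

The projected dollar level first falls and then rises because the ratio declines quickly at first, when the gap is wide, and more slowly later, while forecast GDP keeps growing throughout.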
III. FORMULATING A NEW MODEL
This section describes the steps taken to formulate our new model. The sections that follow examine the model’s performance and compare it with alternative models. We examined performance in three stages:

• In-Sample. For this stage, we estimate coefficients once over the 1952-1998 period using the data set as it existed at the end of 1999. We generate estimates of realizations for any given year by applying those coefficients to the actual values of the explanatory variables for that year.

• Out-of-Sample. This stage uses the same data set as the in-sample stage, but we estimate coefficients and realizations on the basis of what is sometimes called “recursive regression.” For instance, we estimate an equation over the years 1952 through 1959 and apply those coefficients to the actual values of explanatory variables in 1960 to estimate realizations in 1960. We then estimate the model over the years 1952 through 1960 and apply those coefficients to the actual values of explanatory variables in 1961 to estimate realizations in 1961. We repeat that step until we estimate the model from 1952 through 1998 and use it to generate an estimate of realizations in 1999. The set of coefficients estimated in this last step, using data from 1952 through 1998, is the same as the single set estimated for the in-sample stage.

• Real-Time. For this stage, we use recursive regression as we did in the previous stage but with two important differences. First, at each step in the recursive regression, the coefficients are based on the data set that actually existed at the end of November in that year. Second, values for all explanatory variables are forecast through the end of the current year on the basis of the partial and preliminary data available at the end of November.
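The recursive regression used in the out-of-sample stage can be illustrated with a deliberately simplified sketch. It assumes a single explanatory variable and ordinary least squares, whereas the paper's equations have several regressors, and the data are hypothetical toy values rather than the paper's series. All function names are illustrative.

```python
from statistics import mean

def fit_ols(x, y):
    """Least-squares intercept and slope for one explanatory variable."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def recursive_forecasts(years, x, y, first_target):
    """For each target year, fit on all earlier years, then predict that year
    from its actual explanatory value (an expanding estimation window)."""
    forecasts = {}
    for i, year in enumerate(years):
        if year < first_target:
            continue
        intercept, slope = fit_ols(x[:i], y[:i])   # data through year - 1 only
        forecasts[year] = intercept + slope * x[i]
    return forecasts

# Hypothetical toy data with an exact linear relation, for illustration only.
years = list(range(1952, 1962))
x = [0.1 * k for k in range(len(years))]
y = [1.0 + 2.0 * xi for xi in x]

forecasts = recursive_forecasts(years, x, y, first_target=1960)
errors = {year: y[years.index(year)] - f for year, f in forecasts.items()}
print(forecasts)
```

In this sketch the 1960 forecast uses coefficients estimated on 1952 through 1959, and the 1961 forecast uses 1952 through 1960, mirroring the sequence described in the text. A real-time version would additionally replace `x` and `y` at each step with the data vintage available at that date, which this sketch does not attempt.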
The ultimate question we are addressing is, Which model will generate the most accurate estimates of capital gains realizations now and into the future? Our methodology assumes that useful information on that question can be derived by examining hypothetical past performance, in particular by looking at which model would have generated the most accurate estimates over past periods had it been available to analysts. We examine that question in the out-of-sample stage and, with more realism, in the real-time stage.

Arguably, examining actual or hypothetical past performance is not relevant to determining the best model now. For instance, although a model may have generated large errors before, perhaps it was “learning” along the way and may do well from now on. We acknowledge that point of view by examining the in-sample fit. However, we believe that looking at how models did, or would have done, in the past also yields important information. In particular, analysts can have more confidence in using a model in the future if, over a considerable period, its coefficients have been stable and its errors in estimating gains have been small.

Model Formulation

Our search for an improved model was guided by several principles. One was to choose a reasonable set of explanatory variables. By reasonable, we mean few in number, with each variable having a clear economic interpretation. It also means that each variable has a coefficient that is of a plausible magnitude and stable across specifications and time. A second principle was to replace dummy variables with tax variables, so that there would be a way in the future to calculate the effects of tax rate changes and to learn from actual experience. In addition, that principle reflects our philosophy that extreme events can be useful in refining coefficient estimates and so should not be discarded as aberrations. A third principle was to allow for dynamic effects. We wanted to account for both the delayed effects of past developments and the effects of anticipated future developments. A fourth principle was to make the model useful in real time; that is, it should estimate current-year realizations well using the data that are available when the estimates are made. That principle requires that explanatory variables be reported with little delay and be subject to little revision.

Our set of explanatory variables was chosen to capture four influences:

• Capital gains tax rates,
• The business cycle,
• Net gains on equity, and
• Net gains on property.
Our original specification was determined by in-sample fit. Variables for inclusion were evaluated according to the significance of their coefficients as measured by t-statistics and their contribution to fit as measured by the adjusted R-squareds of the regressions. The evaluations were done sequentially in the order of the influences listed above. That is, we first selected tax rate variables based on the t-statistics and adjusted R-squareds. Then we selected a business cycle variable based on the same criteria but applied to the regression including our selected tax variables. After the business cycle variable was chosen and evaluated in the regression, we went on to select the variable representing equity net gains and finally the variable representing property net gains.
Dependent Variable

For the dependent variable, we took the difference in the ratio of capital gains realizations to nominal potential GDP, using the series constructed by CBO for the denominator. Using potential GDP as the scale variable is equivalent to assuming that gains change in proportion to potential GDP, controlling for other influences. The idea of using a scale variable was also employed by Stinson and Mariger. We followed Mariger’s choice of the scale variable partly because CBO uses the ratio of gains to GDP in projecting realizations in future years, and the projection of GDP follows the growth of potential GDP. Our initial thought in selecting this variable was that one equation could improve both current-year estimation and future-year forecasting.

We used the first difference in the ratio to increase the likelihood that our dependent variable is stationary. The ratio itself had appeared to be stationary when CBO’s current method was developed in 1991 using data from 1955 through 1990. That appearance was the basis for the assumption used in the out-year projections that gains would revert to their historical size relative to GDP. Tests on data from 1955 through 1993 also soundly rejected the hypothesis of a unit root in the ratio. However, the ratio appears less stationary now because of two data changes in the current analysis. First, we expanded the sample period by adding 1952 through 1954. We added no years before 1952 to be able to test various lag lengths of explanatory variables. Second, the ratio can now be observed through 1998. The newly added early years are record lows in the ratio, and the newly available recent years are unusual highs. A Dickey-Fuller test of the ratio of gains to GDP now barely rejects the hypothesis of a unit root at the 5 percent probability level. Using the first difference in the ratio of gains to GDP removes all evidence of a unit root, returning us to a more clearly stationary dependent variable. We tested for the addition of lags in the dependent variable, but their coefficients were insignificant, and they added nothing to the fit.

Explanatory Variables

Tax Rates. As our measure of capital gains tax rates, we use the rate faced by taxpayers in the highest income tax bracket. And, as is done in microstudies, we separate the steady-state and transitory effects of changes in that rate.4 The transitory effects are generally expected to be larger than the steady-state effects because the former allow substitution of realizations between years, but the latter do not. Bull
4. See Alan Auerbach and Jonathan Siegel, “Capital-Gains Realizations of the Rich and Sophisticated,” American Economic Review (May 2000), pp. 276-282.
and Richardson followed a similar strategy, although they specified their tax terms differently.

For the steady-state tax rate variable, MTRNEXT, we use the rate legislated as of the current year that is to be in effect in the next year.5 The steady-state tax rate enters the realizations regression linearly (which implies that revenues are quadratic in the steady-state tax rate). For our transitory tax rate variable, MTRTRANS, we take the difference between this year’s rate and the steady-state rate, square that quantity, and preserve the sign.

The basic idea behind the nonlinear specification of the transitory tax rate term is that large anticipated tax changes can be expected to have disproportionately larger effects on realizations than small anticipated tax changes will. That idea can be motivated by a transactions-cost theory of financial markets. Transactions costs create thresholds for investors. For small changes in anticipated tax rates, few investors would cross their thresholds and change their portfolios. However, more and more would cross their thresholds and rearrange their portfolios as the anticipated tax change grew larger. Transactions costs lessen in importance over the long run, which is consistent with the linear form of the steady-state tax term.

More specifically, we assume that the level of gains (ignoring potential GDP for the moment) is explained as follows:

$$\mathrm{GAINS} = \alpha \cdot \mathrm{MTRNEXT} + \beta \cdot \mathrm{MTRTRANS} + \gamma \cdot X + \mu$$

where

$$\mathrm{MTRTRANS} = (\mathrm{MTR} - \mathrm{MTRNEXT})^{2} \cdot \operatorname{sign}(\mathrm{MTR} - \mathrm{MTRNEXT})$$

and MTR is the current-year tax rate; X is other variables; α, β, and γ are coefficients; and µ is the unexplained residual. Furthermore, because we transform our dependent variable to first differences, our estimating equation becomes:

$$D(\mathrm{GAINS}) = \alpha \cdot D(\mathrm{MTRNEXT}) + \beta \cdot D(\mathrm{MTRTRANS}) + \gamma \cdot D(X) + D(\mu)$$

where D(.) indicates the change from the previous year.
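The construction of the two tax terms can be sketched in a few lines, using the top rates around the Tax Reform Act of 1986: a current-year rate of .20 in 1985 and 1986 and .28 in 1987, and a legislated next-year rate of .20 in 1985 and .28 from 1986 on. The function names are illustrative assumptions, and the 1984 values are an assumption as well, set equal to the 1985 values so that the 1985 changes come out to zero.

```python
def mtrtrans(mtr, mtrnext):
    """Signed square of the gap between this year's rate and the steady-state rate."""
    diff = mtr - mtrnext
    sign = (diff > 0) - (diff < 0)      # +1, 0, or -1
    return sign * diff ** 2

def first_difference(series):
    """Change from the previous year, for every year that has a predecessor."""
    return {yr: series[yr] - series[yr - 1] for yr in series if yr - 1 in series}

# Top capital gains rates. The 1984 entries are assumed equal to 1985.
MTR = {1984: 0.20, 1985: 0.20, 1986: 0.20, 1987: 0.28}
MTRNEXT = {1984: 0.20, 1985: 0.20, 1986: 0.28, 1987: 0.28}

MTRTRANS = {yr: mtrtrans(MTR[yr], MTRNEXT[yr]) for yr in MTR}

d_mtrnext = first_difference(MTRNEXT)
d_mtrtrans = first_difference(MTRTRANS)

for yr in (1985, 1986, 1987):
    print(yr, round(d_mtrnext[yr], 4), round(d_mtrtrans[yr], 4))
```

Under these inputs the steady-state term changes only once, in 1986, while the transitory term moves in 1986 and reverses in 1987, which is the pattern that lets the equation separate a permanent response from a timing response.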
To illustrate the results of this tax rate specification, consider the years surrounding the Tax Reform Act of 1986, which raised the top tax rate on capital gains to 28 percent starting in 1987. The actual top tax rate on gains, our MTR, was .20 in 1985 and 1986 and .28 in 1987 and thereafter. But the legislated rate to be in effect next year, our MTRNEXT, was .20 in 1985 and .28 in 1986 and thereafter.
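The construction of the two tax terms for the years around 1986 can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `signed_square` and `first_difference` are hypothetical, and the 1984 values are not given in the text but are set to .20 so that a 1985 difference can be formed.

```python
def signed_square(x):
    """Square x but keep its sign, as in the definition of MTRTRANS."""
    return x * abs(x)


def first_difference(series):
    """Return {year: value - previous year's value} for a {year: value} dict."""
    years = sorted(series)
    return {y: series[y] - series[prev] for prev, y in zip(years, years[1:])}


# Top rates on capital gains around the Tax Reform Act of 1986 (from the text).
# Assumption: the 1984 entries are added only so that D(.) exists for 1985.
mtr = {1984: 0.20, 1985: 0.20, 1986: 0.20, 1987: 0.28}
mtrnext = {1984: 0.20, 1985: 0.20, 1986: 0.28, 1987: 0.28}

# Transitory term: signed square of (current rate - legislated next-year rate).
mtrtrans = {year: signed_square(mtr[year] - mtrnext[year]) for year in mtr}

d_mtrnext = first_difference(mtrnext)
d_mtrtrans = first_difference(mtrtrans)

for year in (1985, 1986, 1987):
    print(year, round(d_mtrnext[year], 4), round(d_mtrtrans[year], 4))
```

Writing the signed square as `x * abs(x)` avoids a separate sign function and returns zero when the two rates coincide.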
5. The construction of our legislated tax variable and its values year by year are described in a memo by Larry Ozanne, which is available on request.
Thus, D(MTRNEXT) has a value of 0 in 1985, .08 in 1986, and 0 again in 1987. With the coefficient on D(MTRNEXT) negative, this pattern implies that taxpayers will permanently reduce their realizations starting in 1986. The transitory tax term, MTRTRANS, which depends on the difference between MTR and MTRNEXT, is (.20-.20)^2 = 0 in 1985, -(.20-.28)^2 = -.0064 in 1986, and (.28-.28)^2 = 0 in 1987. Thus, D(MTRTRANS) is 0 in 1985, -.0064 in 1986, and +.0064 in 1987. With the coefficient on D(MTRTRANS) also negative, this pattern implies that taxpayers will increase their realizations in 1986 to take advantage of the temporarily low rate and then reverse that increase in 1987. The estimated magnitudes of the two tax coefficients should be such that the transitory effect of a big announced tax change is larger than the permanent effect, thereby accounting for the increase in gains in 1986 and the reduction in 1987.

Business Cycle. For our measure of the business cycle, we employed the ratio of annual GDP to annual potential GDP, as used in Mariger’s model. We chose annual averages so that the business cycle variable is measured consistently with the scale variable used as the denominator of the dependent variable. Since the equation is in first-difference form, the business cycle variable also enters as a first difference. That formulation, which separates the scale and business cycle variables, fixes a problem with CBO’s current equations. In them, GDP often plays both roles, which makes its coefficient difficult to interpret and probably contributes to making its value vary from equation to equation.

Equity Net Gains. For the variable representing net gains in equity, we chose the difference in the logarithm of the Standard & Poor’s (S&P) 500 index from the fourth quarter of the previous year to the fourth quarter of the current year. Although that measure was not our first choice on conceptual grounds, it turned out to be more practical than the alternatives.
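As a minimal sketch of the equity variable, the log difference of fourth-quarter averages can be computed as follows. The index levels are hypothetical round numbers rather than actual S&P 500 values, the function name `dlog_q4` is illustrative, and averaging three monthly observations is an assumption about how the quarterly average is formed.

```python
import math
from statistics import mean


def dlog_q4(q4_average_prev, q4_average_now):
    """Difference in the logarithm of the fourth-quarter average index level."""
    return math.log(q4_average_now) - math.log(q4_average_prev)


# Hypothetical October-December index levels (illustrative only).
q4_prev = mean([950.0, 1000.0, 1050.0])   # previous year, averages to 1000
q4_now = mean([1150.0, 1200.0, 1250.0])   # current year, averages to 1200

growth = dlog_q4(q4_prev, q4_now)
print(round(growth, 4))
```

For an estimate made before the quarter ends, `q4_now` would have to be formed from the observations available up to that date.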
Initially, we considered three main contenders for this variable: one based on flow-of-funds revaluations, one based on stock market volume, and one based on stock prices. Conceptually, the first one, a revaluations variable, seemed most closely related to our dependent variable, as discussed in Section I. And empirically, with full-sample information, it was the best. In first-difference form and entering with three lags, it provided the best fit, but only marginally better than that of the current year’s growth in the S&P 500.

Consideration of real-time problems, however, tipped the balance in favor of the S&P 500. In real time, in early December of any year, the S&P 500 is known up to the day the current-year estimate of realizations is made. However, at that time, there is no figure for revaluations in that year and only a preliminary figure for revaluations in the previous year, which is likely to face major revisions. So if the revaluations measure were used as an explanatory variable, its current-year value would have to be estimated using current stock prices. When we experimented by using the current year’s growth in stock prices and the lagged growth in revaluations, the lagged terms were insignificant. Thus, we dropped revaluations from consideration.

We also did not choose the second option, the dollar-volume measure used by Siwapradit. On conceptual grounds, we felt that its relationship to realizations has been changing. One reason is that the share of stock market volume accounted for by institutions not subject to the capital gains tax has been steadily rising since the 1950s (although that change would also affect the relationship of stock prices to realizations). Another reason for discarding stock market volume is that the costs of trading stocks have declined sharply in recent years, especially with trading over the Internet. The result is almost certainly an increasing amount of sales with smaller gains per trade. On empirical grounds, we found that the current year’s growth in the S&P 500 fit better than the volume variable over the whole sample period.

Even choosing to use a stock price variable to represent equity net gains does not necessarily suggest that the measure should be simply the current year’s growth in the S&P 500. For example, we allowed for up to four lagged growth rates in the index, but they did not significantly improve the fit. We also considered a broader-based stock index, the Wilshire 5000, which includes all stocks on the New York, Nasdaq, and American exchanges. Surprisingly, we found that in the years when both the S&P 500 and Wilshire 5000 are available, they track each other very closely. So even though Nasdaq and S&P 500 stock prices have behaved differently over the past few years, the difference has been offset in the Wilshire 5000 with the behavior of the prices of other stocks that are in neither the S&P nor Nasdaq. Since the S&P 500 is available over our whole sample period but the Wilshire 5000 is not, we opted to use the S&P 500.

Property Net Gains.
For the variable representing property net gains, we chose multifamily housing starts. Although that is also the choice in the current CBO equations, its inclusion seems, on first thought, to be strange. It won an empirical process of elimination, however, and we have a hypothesis for why it contributes to our model’s fit. We were seeking a measure of accrued gains on real estate investments and other unincorporated business assets held by households. Conceptually, the variable closest to that desired measure is the revaluations on noncorporate equity in the flow-of-funds accounts. Empirically, however, that variable is not measured well, probably because prices of existing property are not measured well. Moreover, that variable would present the same real-time problems of delayed availability and large revisions that revaluations of corporate equity do. Further, when we included it in our regression, it added nothing to fit.
Our next choice was an index of real estate prices, excluding single-family homes. We tried measures of prices from the National Association of Real Estate Investment Trusts, the National Council of Real Estate Investment Fiduciaries, the National Real Estate Index (compiled by CB Real Estate), and the national income and product accounts (NIPA). None of them helped—the first three suffer from short histories, limited coverage of real estate, and failure to measure change in the market price of the same basket of properties, and the NIPA measures reflect the cost of investment rather than the price of existing properties. Our third choice was NIPA investment variables. But they, too, did not help. Thus, we ended up using the series for multifamily housing starts, which does contribute marginally to fit. Our hypothesis for that result is based on a Tobin’s Q story for real estate investment. The story is that changes in demand or supply of real estate first cause changes in the prices of existing properties, which change the ratio of prices of new structures to prices of existing structures (Tobin’s Q). Changes in Tobin’s Q then lead to changes in the rate of investment in new structures. Although changes in multifamily housing starts lag behind changes in prices of existing structures, the former are measured much more accurately. Meanwhile, starts lead investment, which is just a distributed lag of past starts. When we examine the contribution of starts to fit, it appears small over the whole period. But in another sense, it is large, because virtually all of that contribution comes during roughly 10 percent of the sample period (in the late 1980s and early 1990s). During those years, realizations were much weaker than would be expected on the basis of stock prices and the other explanatory variables in our model. 
A reasonable explanation, supported by limited data from tax returns, is that part of the shortfall during those years resulted from the collapse in real estate markets. The multifamily housing starts variable picks up that effect.

Modeling the Gains in Current and Future Years Separately

A final exercise did not change our model, but it did change our research strategy in important ways. We experimented with including an interest-rate-spread variable (that is, the difference between a long-term and a short-term interest rate). We reasoned that the spread is commonly found to be useful as a leading indicator and as a variable in time-series forecasts. We found that the first lag of the spread marginally contributed to fit, but its coefficient was unstable. All of its contribution came early in the sample period, and it seemed to be capturing the process of intermediation that occurred under the Federal Reserve’s Regulation Q. In the latter part of the sample, its role changed to leading economic indicator, but the main variables it was predicting were already largely known quantities on the right-hand side of our equation, such as the business cycle and stock prices. As a result, we did not include the interest rate spread in our model.

Nevertheless, we changed our research strategy because of that experiment. It led us to separate current-year estimation from out-year forecasting because of the difference in the predictability of right-hand-side variables. For instance, for current-year estimation, stock prices are largely known, and including them as an explanatory variable essentially removes the usefulness of the previous year’s interest rate spread. We suspect that for forecasting, even one year ahead, the reverse would be true: the current year’s value of the interest rate spread would mostly be known, but stock prices over the next year would have to be predicted (probably badly).

Thus, although we originally chose the ratio form for our model because we felt it would be useful for out-year forecasting, we concluded that was no longer necessary. We decided to approach current-year estimation and out-year forecasting using different models. The criterion of the first would be to minimize a sum of squared residuals, and the criterion of the second would be to minimize a sum of squared forecast errors. The variables in the two models could be different, as could the forms of the equations. That reasoning suggested that we need not state our current-year estimation regression in ratio form; instead, we could choose whatever form seemed to work best.
IV. IN-SAMPLE EVALUATION AND COMPARISONS
All the regressions we evaluated in this stage of our model’s development were estimated by ordinary least squares over calendar years 1952 through 1998 based on the full set of data as it existed at the end of 1999. In this section, we evaluate the fit of two versions of our model—without and with multifamily housing starts—and compare their fit to the fit of our adaptations of other models. Although our model does relatively well compared with our versions of other models, the comparison suggested some additional avenues that we pursued in the out-of-sample evaluation. Our estimated regression without starts is displayed in Table 7. Its coefficients seem plausible. With realized gains scaled to potential GDP, the coefficient on the ratio of GDP to potential GDP indicates a strong business cycle effect. The stock price effect is also strong and significant. The coefficients on the tax terms are both of the expected sign and significant. The extraordinary significance of the coefficient on D(MTRTRANS) arises from the variable explaining the spike in realizations in 1986. The reasonableness of the coefficients on the tax terms can be judged by the elasticities they imply: a permanent elasticity of -.36 for a tax increase from 20 percent to 28 percent (which is near the lower end of results found in other studies) and a transitory elasticity of 1.8 for realizations in the current year from an announced tax increase next year from 20 percent to 28 percent (which is near the lower end of results found in studies of panels of taxpayers).6 The coefficients in Table 7 also vary little across various specifications of other variables. We checked for the stability over time of the coefficients by dividing the sample period in half and estimating the regression separately over each half. Although most coefficients were comparable in both subperiods, the coefficient on the transitory tax term seemed to shift by a sizable magnitude. 
We reasoned that the shift could result either from a fundamental instability or from the fact that the large tax increase in 1987 that was legislated in 1986 was an extreme event that allowed
6. The specification of MTRTRANS also allows inferences about a pure transitory tax change such as would occur if 1986 legislation had reduced the 1986 capital gains tax rate by 8 percentage points but left future tax rates at 20 percent. The coefficient of MTRTRANS in Table 7 implies a pure transitory elasticity of -2.2 for such a change. A few caveats are important to keep in mind when comparing our elasticities with those in earlier studies. In most equations that have been reported, the elasticity varies with the marginal tax rate and possibly other factors, so any direct comparison of elasticities is flawed unless the conditions under which they were evaluated are known. A second distinction is that our tax rate is the maximum capital gains rate, whereas many behavioral responses attempt to measure responses to the average marginal capital gains tax rate. Third, our measure of capital gains includes net short-term gains, whereas most behavioral estimates focus on net long-term gains because those gains are directly affected by the capital gains tax rate. Two recent studies reporting elasticities and references to earlier studies are Matthew Eichner and Todd Sinai, “Capital Gains Tax Realizations and Tax Rates,” National Tax Journal (forthcoming in 2000); and Auerbach and Siegel, “Capital-Gains Realizations of the Rich and Sophisticated.”
TABLE 7: OUR RATIO EQUATION WITHOUT MULTIFAMILY HOUSING STARTS

Dependent Variable: D(TOTGAIN2/GDPFE)
Method: Least Squares
Sample: 1952-1998

Variable          Coefficient   Standard Error   t-Statistic   Probability
D(MTRNEXT)            -0.0802           0.0197       -4.0718        0.0002
D(MTRTRANS)           -6.0706           0.3969      -15.2954        0.0000
D(GDP/GDPFE)           0.0955           0.0233        4.0908        0.0002
DLOG(SP500Q4)          0.0181           0.0029        6.1303        0.0000

R-squared                                    0.877522
Adjusted R-squared                           0.868977
Standard error of regression                 0.00337
Sum of squared residuals                     0.000488
Log likelihood                               202.9677
Durbin-Watson statistic                      1.700098
Mean dependent variable                      0.000695
Standard deviation of dependent variable     0.009309
Akaike info criterion                        -8.46671
Schwarz criterion                            -8.30925
F-statistic                                  102.6949
Probability (F-statistic)                    0

Key:
D is the first difference operator: D[x(t)] = x(t) - x(t-1)
TOTGAIN2 is capital gains realizations
GDPFE is nominal potential GDP
MTRNEXT is our permanent tax rate term
MTRTRANS is our transitory tax rate term
SP500Q4 is the average value of the S&P 500 index in the fourth quarter
SOURCE: CBO calculations.
the coefficient to be estimated more precisely. The regression, split at the midpoint, passes a formal Chow test for stability, which supports the second explanation for the transitory tax term coefficient.

Our estimated regression with multifamily housing starts is displayed in Table 8. Including starts modestly lowers the coefficients on the business cycle and S&P 500 variables and increases the adjusted R-squared by only a little more than 1 percentage point. Although overall the improvement is small, it all comes in the roughly 10 percent of the sample period from the late 1980s to the early 1990s. As before, we split the sample period in two and estimated the regression including starts over the two halves. This time, the coefficient on multifamily housing starts also seems to shift by a sizable magnitude. Again, the regression passes a formal Chow test, which suggests that the change in the value of the coefficient reflects the greater precision afforded by the extreme event of the real estate collapse that began in the late 1980s.

Neither equation suffers from serial correlation of its residuals, based on the Durbin-Watson statistic and a correlogram of the residuals. Neither equation fails the White test for heteroskedastic residuals.

TABLE 8: OUR RATIO EQUATION WITH MULTIFAMILY HOUSING STARTS

Dependent Variable: D(TOTGAIN2/GDPFE)
Method: Least Squares
Sample: 1952-1998

Variable          Coefficient   Standard Error   t-Statistic   Probability
D(MTRNEXT)            -0.0837           0.0186       -4.4982        0.0001
D(MTRTRANS)           -6.0210           0.3744      -16.0808        0.0000
D(GDP/GDPFE)           0.0704           0.0241        2.9165        0.0057
DLOG(SP500Q4)          0.0161           0.0029        5.5864        0.0000
DLOG(STARTS)           0.0055           0.0022        2.5391        0.0149

R-squared                                    0.893821
Adjusted R-squared                           0.883709
Standard error of regression                 0.003175
Sum of squared residuals                     0.000423
Log likelihood                               206.3236
Durbin-Watson statistic                      1.65308
Mean dependent variable                      0.000695
Standard deviation of dependent variable     0.009309
Akaike info criterion                        -8.566962
Schwarz criterion                            -8.370138
F-statistic                                  88.38997
Probability (F-statistic)                    0

Key: STARTS is the number of dwelling units started in structures with two or more dwellings. Other variables are defined in Table 7.

SOURCE: CBO calculations.

We next compared the fit of the two versions of our model with our adaptations of other models. To make the statistics of fit comparable among models with different dependent variables, we judged the fit of all models in terms of how well they explain the annual growth rate of realizations in-sample. Thus, we converted the predicted output of each model to a predicted growth rate in each year and compared it with actual growth rates. The comparison included the following regressions:

• The four current CBO equations specified in Table 2 (with and without starts and with and without error correction).

• Four equations with Siwapradit’s dollar-volume variable. Value of volume and GDP enter nominally, and no price index is included; otherwise, the equations match the CBO equations (with and without starts and with and without error correction). These equations are referred to as the dollar-volume model.

• Three equations (following those of Bull and Richardson) with separate measures of stock price increases and decreases, which we refer to as the SPUD (stock price up and down) model. The three equations differ in the number of variables with lagged terms: none, two (the stock price changes), and three (those two plus the dependent variable). The tax rate terms in those equations are described in Table 4.

• The two versions of our equation in difference-in-ratio form (with and without starts).
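Putting models with different dependent variables on a common growth-rate basis can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function names are hypothetical, the growth rate is taken to be the log difference (as the DLOG(GAINS) label suggests), and R-squared is assumed to be computed as one minus the ratio of squared errors to total variation, which allows negative values.

```python
import math


def growth_from_ratio_change(pred_d_ratio, gains_prev, gdpfe_prev, gdpfe_now):
    """Convert a predicted change in GAINS/GDPFE into a log growth rate of GAINS."""
    ratio_now = gains_prev / gdpfe_prev + pred_d_ratio
    gains_now = ratio_now * gdpfe_now
    return math.log(gains_now / gains_prev)


def growth_from_dollar_change(pred_d_gains, gains_prev):
    """Convert a predicted dollar change in GAINS into a log growth rate."""
    return math.log((gains_prev + pred_d_gains) / gains_prev)


def rmse(actual, predicted):
    """Root mean squared error of predicted growth rates."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def r_squared(actual, predicted):
    """1 - SSE/SST; negative when predictions fit worse than the sample mean."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - sse / sst


# Hypothetical numbers: gains of 100, potential GDP rising from 1000 to 1100.
g_ratio = growth_from_ratio_change(0.01, 100.0, 1000.0, 1100.0)
g_dollar = growth_from_dollar_change(21.0, 100.0)
print(round(g_ratio, 4), round(g_dollar, 4))
```

Under that definition of R-squared, an equation fitted to dollar changes can score below zero once its predictions are restated as growth rates, even if it fits its own dependent variable well.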
The statistics of fit that we used to compare models, in terms of annual growth rates, are R-squared, adjusted R-squared, and root mean squared error over the entire sample period, and root mean squared error in the 1990s. Those statistics are displayed in Table 9.7 Examining those statistics leads to a number of conclusions. First, our equations seem to do about as well overall as the current CBO and dollar-volume equations, even though the latter have two distinct advantages: they are specified in growth-rate form and the criterion is in terms of growth rates; and they use dummy variables for 1986 and 1987, whereas ours do not. Second, across different models, multifamily housing starts helped somewhat. Third, using the dollar-volume variable in a growth-rate specification helped in the 1990s. Finally, although the SPUD equations do not fit the entire period well, they do fit the 1990s well.

That final conclusion led us to experiment with our method of estimating coefficients. We observed that the dollar value of capital gains realizations has grown over time. We reasoned that the SPUD equations, which minimize the sum of squared errors in the dollar changes in capital gains realizations, are like weighted regressions that give more weight to more recent observations. We therefore experimented with our equations using both weighted regressions and Kalman-based, time-varying coefficients. The improvement was marginal, however, and we judged that it did not justify the added complexity.

Although our first conclusion about the relatively good fit of our new model was encouraging, other conclusions relating to the success of different specifications in the 1990s left us open to alternative specifications of the model. Thus, we continued to experiment based on out-of-sample comparisons and, as a result, changed the preferred specification of our model.
7. The Durbin-Watson statistic failed to find significant serial correlation of residuals in any equation.
TABLE 9: COMPARING IN-SAMPLE ERRORS OF EQUATIONS

                                                    Adjusted    Root Mean Squared Error
                                       R-squared   R-squared    1952-1998    1990-1998

Growth-Rate Equations [DLOG(GAINS)]
  Current CBO equations
    No error correction, no starts        0.705       0.661        0.134        0.140
    No error correction, starts           0.767       0.733        0.119        0.126
    Error correction, no starts           0.702       0.657        0.135        0.143
    Error correction, starts              0.777       0.737        0.117        0.131
  Dollar-volume equations
    No error correction, no starts        0.676       0.646        0.141        0.122
    No error correction, starts           0.766       0.737        0.120        0.114
    Error correction, no starts           0.699       0.663        0.136        0.132
    Error correction, starts              0.776       0.743        0.117        0.120

Change in Dollar Gains (SPUD)
    No lags                              -3.147      -3.542        0.504        0.157
    Lagged SPUD                          -2.348      -2.850        0.453        0.110
    Lagged SPUD and gains                -2.663      -3.321        0.474        0.115

Change in Ratio of Gains to GDPFE
    Without starts or constant            0.704       0.683        0.135        0.122
    With starts, no constant              0.756       0.732        0.122        0.119

SOURCE: CBO calculations.
NOTE: SPUD = stock price up and down; GDPFE = potential nominal GDP.
V. OUT-OF-SAMPLE EVALUATION AND COMPARISONS

The purpose of the out-of-sample exercise was to determine the performance of different models when there was more limited knowledge of future events at the time each estimate was made than in the in-sample exercise. Both the in-sample and out-of-sample exercises used the data set as it existed at the end of 1999. And both exercises assumed that values were known for right-hand-side variables in the current year and for lagged dependent variables in the previous year. The difference in the two exercises was in how the models were estimated. For the in-sample exercise, the models were estimated over the entire sample, so the fitted values of the dependent variable at each date built in knowledge of events that occurred at future dates.

For the out-of-sample exercise, that problem of building in knowledge of future events was limited by using recursive regressions. For example, regressions were estimated from 1952 through 1959, and the coefficients were used to estimate capital gains realizations for 1960. Next, the regressions were estimated from 1952 through 1960, and the new coefficients were used to generate realization estimates for 1961. The process was repeated through 1999. That process parallels the annual updating of equations that CBO actually uses, although CBO has not used the same equations for so long a period.

The versions of models selected for the initial out-of-sample comparison were the same as those for the in-sample comparison, with one exception. We included only one regression with error correction and examined its performance only for the 1990s. That exception was made because recursive regression is more difficult when there is an error-correction term and because we knew from experience that equations with error-correction terms produced large errors in the late 1990s.
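The recursive (expanding-window) procedure can be sketched as below. This is an illustration under stated assumptions, not the code behind the reported results: the helper names are hypothetical, the small least-squares solver stands in for a statistical package, and the data are synthetic and noise-free so that the mechanics are easy to check.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def recursive_errors(X, y, first):
    """One-step-ahead errors: fit on observations before t, then estimate y[t].

    Right-hand-side values for period t are treated as known, as in the text.
    """
    errors = []
    for t in range(first, len(y)):
        b = ols(X[:t], y[:t])                      # only data before period t
        estimate = sum(bi * xi for bi, xi in zip(b, X[t]))
        errors.append(y[t] - estimate)             # actual less estimate
    return errors


def rmse(errors):
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


# Synthetic, noise-free data (illustrative only): y = 2*x1 - 3*x2.
X = [[float(t), float((t * t) % 7)] for t in range(1, 13)]
y = [2.0 * x1 - 3.0 * x2 for x1, x2 in X]

errors = recursive_errors(X, y, first=4)
print(len(errors), rmse(errors))
```

In the setting described above, index 0 would correspond to 1952 and `first` to the position of 1960, so that each year's estimate uses coefficients fitted only on earlier years.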
We judged the performance of models by their out-of-sample root mean squared errors (RMSEs).8 If, over the full estimation period, a model’s recursively estimated coefficients were unchanging, its in-sample and out-of-sample RMSEs would be the same. Thus, a big deterioration in a model’s performance in this stage of testing compared with its in-sample performance indicates coefficient instability. We also investigated coefficient instability more directly by applying a Chow test to the model’s coefficients estimated over the first and second halves of the sample and by examining plots of the coefficients and their associated standard errors computed in each year of the recursive regressions.
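A Chow test of the kind mentioned above compares the residual sum of squares from one pooled regression with the combined residual sums of squares from separate regressions on each subsample. The sketch below is deliberately simplified and rests on assumptions not taken from the text: a single regressor with no constant (so k = 1), hypothetical function names, and synthetic data with a deterministic disturbance.

```python
def ssr_through_origin(x, y):
    """Sum of squared residuals from regressing y on x with no constant."""
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    syy = sum(b * b for b in y)
    return syy - sxy * sxy / sxx


def chow_f(x, y, split):
    """Chow F statistic for a break at index `split`, with k = 1 coefficient."""
    k = 1
    n = len(y)
    pooled = ssr_through_origin(x, y)
    separate = (ssr_through_origin(x[:split], y[:split])
                + ssr_through_origin(x[split:], y[split:]))
    return ((pooled - separate) / k) / (separate / (n - 2 * k))


# Synthetic data (illustrative only): 20 observations, alternating disturbance.
x = [float(i) for i in range(1, 21)]
e = [0.5 * (-1) ** i for i in range(1, 21)]

# Same slope in both halves versus a slope that shifts from 2 to 3 at the midpoint.
y_stable = [2.0 * xi + ei for xi, ei in zip(x, e)]
y_break = [(2.0 if xi <= 10 else 3.0) * xi + ei for xi, ei in zip(x, e)]

f_stable = chow_f(x, y_stable, split=10)
f_break = chow_f(x, y_break, split=10)
print(f_stable, f_break)
```

A regression "passes" the test when the statistic falls below the critical value of the F distribution with k and n - 2k degrees of freedom; that comparison is left out of the sketch.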
8. Care must be exercised in interpreting the out-of-sample RMSEs because the uncertainty surrounding estimates of realizations changes from year to year. For example, uncertainty tends to be larger in earlier years because fewer observations are available to estimate an equation’s coefficients. Thus, it may be appropriate to weight each error by the uncertainty surrounding the forecast at the time. Comparing root mean squared errors by decade controls for some differences in uncertainty.
Our general conclusions from comparing accuracy and stability across equations were that our difference-in-ratio model and the dollar-volume model seemed best, but neither was clearly superior. Both models had lower errors than the current CBO and SPUD models over the whole period. In particular, error correction hurt accuracy in the late 1990s, judging by the CBO equation in that form, because realizations did not revert back to their historical norm relative to GDP. Meanwhile, the difference-in-ratio and dollar-volume regressions had similar root mean squared errors. Our regressions were more accurate in the 1980s and over the 1980-1998 period, but the dollar-volume regressions were more accurate in the 1970s and over the entire 1960-1998 period. The out-of-sample RMSEs are compared in Table 10. Our difference-in-ratio regressions have more stable coefficients over the 1980s and 1990s than do the SPUD regressions. Figures 2 and 3 compare the recursive coefficients from versions of each model. (The model underlying Figure 2 is shown in Table 8, and the one underlying Figure 3 is shown in Table 11.) Note that when coefficients change in our difference-in-ratio equation, the new values are still well within the 95 percent confidence intervals of previous coefficient estimates. In contrast, the changes in coefficients in the SPUD equation following the 1987 tax increase are well beyond the confidence intervals of previously estimated coefficients. The coefficients change because tax terms in the SPUD equation must treat the fallback in realizations in 1987 proportionately to responses to other tax changes. That leaves a large part of the fallback to be explained by other variables. Nevertheless, both the SPUD equation and ours pass a Chow test when the sample is divided in halves. 
(The stability of the current CBO and dollar-volume regressions could not be tested over the full period because their inclusion of dummy variables essentially assumes a structural break in 1986 and 1987.)

Based on those general conclusions, we experimented with changing the functional form and variables of our model. Our aim was to see whether we could improve the model by borrowing from some successes of other models, especially the dollar-volume model. Tables 12 and 13 show the in-sample and out-of-sample fits, respectively, of several alternative versions of our model. (Those tables show results for the new versions as additions to the results shown in Tables 9 and 10.)

In terms of functional form, we compared the performance of our equations in their original difference-in-ratio form with difference-in-logarithm and percentage change forms. Each equation used the explanatory variables from our original equations plus a constant term. (We added the constant term because it improved the in-sample fit and out-of-sample accuracy, although it worsened the White test scores for homoskedasticity.) All three forms have similar adjusted R-squared statistics over the full sample, but the difference-in-logarithm form is slightly better than the
TABLE 10: COMPARING OUT-OF-SAMPLE ROOT MEAN SQUARED ERRORS
(Errors are actual growth rates of gains less estimates)

                                                                                                        Addendum:
                                                                                                        Forecast of
                                                                                                        Growth Rate
                                     1960-1998  1980-1998  1960-1969  1970-1979  1980-1989  1990-1998   for 1999

Growth-Rate Equations [DLOG(GAINS)]
  Current CBO equations
    No error correction, no starts      0.295      0.200      0.147      0.493      0.237      0.148      0.198
    No error correction, starts         0.312      0.198      0.174      0.525      0.243      0.133      0.171
    Error correction, no starts           --         --         --         --         --       0.157      0.105
    Error correction, starts              --         --         --         --         --       0.146      0.093
  Dollar-volume equations
    No error correction, no starts      0.198      0.219      0.138      0.208      0.276      0.128      0.268
    No error correction, starts         0.195      0.214      0.159      0.190      0.272      0.121      0.231

Change in Dollar Gains (SPUD)
    No lags                             0.246      0.286      0.106      0.266      0.355      0.179      0.125
    Lagged SPUD                         0.229      0.257      0.143      0.242      0.318      0.165      0.212
    Lagged SPUD and gains               0.313      0.361      0.217      0.296      0.472      0.165      0.168

Change in Ratio of Gains to GDPFE
    Without starts or constant          0.221      0.189      0.137      0.322      0.232      0.126      0.138
    With starts, no constant            0.224      0.180      0.160      0.328      0.218      0.125      0.122

SOURCE: CBO calculations.
NOTE: Data used in the forecasts differ slightly from those used in the in-sample fits. SPUD = stock price up and down; GDPFE = potential nominal GDP.
Figure 2: Recursive Regression Coefficients (and Two Standard Errors Up or Down) of Difference-in-Ratio Equation with Starts, by Year
[Figure: five panels, one per coefficient, each plotting the recursive estimate ± 2 S.E. by year (axis ticks from 1965 to 1995): D(MTRNEXT), D(MTRTRANS), D(GDP/GDPFE), DLOG(SP500Q4), and DLOG(STARTS).]
SOURCE: CBO calculations. NOTE: For details of the ratio equation, see Table 8.
Figure 3: Recursive Regression Coefficients (and Two Standard Errors Up or Down) of Simple SPUD Equation with Starts, by Year
[Figure: five panels, one per coefficient, each plotting the recursive estimate ± 2 S.E. by year (axis ticks from 1965 to 1995): Constant, SPUP, SPDOWN, D(CBOMTR), and CBOPOS(1).]
SOURCE: CBO calculations. NOTE: For details of the simple SPUD equation, see Table 11.
TABLE 11: SIMPLE SPUD MODEL

Dependent Variable: D(TOTGAIN2)
Method: Least Squares
Sample: 1952-1998

Variable          Coefficient   Standard Error   t-Statistic   Probability
Constant             0.286711         3.116311      0.092003        0.9271
SPUP                 0.326442         0.053031      6.155634        0
SPDOWN               0.555197         0.141336      3.928194        0.0003
D(CBOMTR)          -11.74883          1.309981     -8.968706        0
CBOPOS(1)           12.13001          1.448326      8.375189        0

R-squared                                    0.863226
Adjusted R-squared                           0.8502
Standard error of regression                 16.47897
Sum of squared residuals                     11405.37
Log likelihood                               -195.7449
Durbin-Watson statistic                      2.213039
Mean dependent variable                      9.437381
Standard deviation of dependent variable     42.57696
Akaike info criterion                        8.542335
Schwarz criterion                            8.739159
F-statistic                                  66.2692
Probability (F-statistic)                    0

Key:
D(TOTGAIN2) is change in capital gains, in billions of dollars
SPUP is the sum of the monthly increases in the S&P 500 index for those months with a net increase over the previous month
SPDOWN is analogous to SPUP for decreases in the S&P 500
D(CBOMTR) is the change in an average tax rate on capital gains
CBOPOS(1) is the increase in CBOMTR in the coming year when the rate increases; 0 otherwise
SOURCE: CBO calculations. NOTE: SPUD = stock price up and down.
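The two stock price variables in the key to Table 11 can be illustrated with a short sketch. The function name `spud`, the sample index levels, and the sign convention for SPDOWN (kept negative here, so that the two sums add up to the net change in the index) are assumptions; the key does not say whether decreases are recorded as negative or positive numbers.

```python
def spud(monthly_index):
    """Split month-to-month index changes into summed increases and decreases."""
    changes = [now - prev for prev, now in zip(monthly_index, monthly_index[1:])]
    spup = sum(c for c in changes if c > 0)     # months with a net increase
    spdown = sum(c for c in changes if c < 0)   # assumption: kept as a negative number
    return spup, spdown


# Hypothetical month-end index levels (illustrative only).
levels = [100, 105, 103, 110, 108]
spup, spdown = spud(levels)
print(spup, spdown)
```

Separating the two sums lets a regression assign different coefficients to rising and falling months, which a single net-change variable cannot do.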
TABLE 12: COMPARING IN-SAMPLE ERRORS OF MORE EQUATIONS

                                                    Adjusted    Root Mean Squared Error
                                       R-squared   R-squared    1952-1998    1990-1998

Growth-Rate Equations [DLOG(GAINS)]
  Current CBO equations
    No error correction, no starts        0.705       0.661        0.134        0.140
    No error correction, starts           0.767       0.733        0.119        0.126
    Error correction, no starts           0.702       0.657        0.135        0.143
    Error correction, starts              0.777       0.737        0.117        0.131
  Dollar-volume equations
    No error correction, no starts        0.676       0.646        0.141        0.122
    No error correction, starts           0.766       0.737        0.120        0.114
    Error correction, no starts           0.699       0.663        0.136        0.132
    Error correction, starts              0.776       0.743        0.117        0.120

Change in Dollar Gains (SPUD)
    No lags                              -3.147      -3.542        0.504        0.157
    Lagged SPUD                          -2.348      -2.850        0.453        0.110
    Lagged SPUD and gains                -2.663      -3.321        0.474        0.115

Change in Ratio of Gains to GDPFE
    Without starts or constant            0.704       0.683        0.135        0.122
    With starts, no constant              0.756       0.732        0.122        0.119
    Without starts, with constant         0.720       0.693        0.131        0.115
    With starts and constant              0.770       0.741        0.119        0.113

Scaled Growth Rate [DLOG(GAINS/GDPFE)]
    S&P 500, without starts               0.716       0.689        0.132        0.121
    S&P 500, with starts                  0.783       0.757        0.115        0.114
    Dollar volume, without starts         0.696       0.667        0.136        0.109
    Dollar volume, with starts            0.760       0.731        0.121        0.110

Percentage Growth Rate of Gains/GDPFE
    Constant, no starts                   0.732       0.707        0.128        0.119
    Constant, starts                      0.766       0.738        0.120        0.115

SOURCE: CBO calculations.
NOTE: SPUD = stock price up and down; GDPFE = potential nominal GDP. The scaled growth rate and percentage growth rate equations include a constant term.
TABLE 13: COMPARING OUT-OF-SAMPLE ROOT MEAN SQUARED ERRORS IN MORE EQUATIONS (Errors are actual growth rates of gains less estimates) Addendum: Forecast of Growth Rate for 1999
1960-1998 1980-1998
1960-1969 1970-1979 1980-1989 1990-1998
Growth-Rate Equations [DLOG(GAINS)] Current CBO equations No error correction, no starts No error correction, starts Error correction, no starts Error correction, starts Dollar-volume equations No error correction, no starts No error correction, starts 0.295 0.312 0.200 0.198 0.147 0.174 0.493 0.525 0.237 0.243 0.148 0.133 0.157 0.146 0.198 0.171 0.105 0.093
0.198 0.195
0.219 0.214
0.138 0.159
0.208 0.190
0.276 0.272
0.128 0.121
0.268 0.231
No lags Lagged SPUD Lagged SPUD and gains
0.246 0.229 0.313
Change in Dollar Gains (SPUD) 0.286 0.106 0.266 0.257 0.143 0.242 0.361 0.217 0.296 Change in Ratio of Gains to GDPFE 0.189 0.137 0.322 0.180 0.160 0.328 0.184 0.145 0.311 0.176 0.167 0.317 Scaled Growth Rate [DLOG(GAINS/GDPFE)] 0.157 0.192 0.234 0.139 0.208 0.227 0.249 0.150 0.230 0.231 0.175 0.221 Percentage Growth Rate of Gains/GDPFE 0.143 0.259 0.245 0.150 0.277 0.228
0.355 0.318 0.472
0.179 0.165 0.165
0.125 0.212 0.168
Without starts or constant With starts, no constant Without starts, with constant With starts and constant
0.221 0.224 0.216 0.219
0.232 0.218 0.226 0.214
0.126 0.125 0.121 0.121
0.138 0.122 0.129 0.116
S&P 500, no starts S&P 500, starts Dollar volume, no starts Dollar volume, starts
0.188 0.184 0.223 0.215
0.180 0.155 0.327 0.298
0.126 0.120 0.113 0.116
0.216 0.183 0.285 0.242
Constant, no starts Constant, starts
0.206 0.209
0.158 0.171
0.124 0.122
SOURCE: CBO calculations. NOTE: Data used in the forecasts differ slightly from those used in the in-sample fits. SPUD = stock price up and down; GDPFE = potential nominal GDP. The scaled growth rate and percentage growth rate equations include a constant term.
others in terms of out-of-sample root mean squared errors. It tends to have relatively smaller errors in years when all of the equations make large errors. A Dickey-Fuller test could not reject the presence of a unit root in the logarithm of the ratio of gains to potential GDP, but it could in the first difference, suggesting that the difference-in-logarithm variable is stationary.
In terms of variables, beyond the constant term, we experimented in our difference-in-logarithm equations with substituting the dollar-volume variable for the S&P 500 variable. Replacing the S&P 500 variable with the dollar-volume variable was not a clear success. The equations with the S&P 500 fit slightly better over the whole sample period. Their out-of-sample errors are smaller over the 1960-1998 and 1980-1998 periods and during the 1980s. The equations with dollar volume have slightly lower errors in the 1990s and much lower errors in the 1970s. Equations with either variable do not have autocorrelated residuals. The S&P 500 equations pass the Chow test but fail the White test. The dollar-volume equations pass the White test, but the equation without starts fails the Chow test and the one with starts barely passes it. Moreover, in the recursive regressions, the S&P 500 equations have more stable coefficients on the variables representing the business cycle and equity net gains.
Based on those results, our preferred equation uses the difference-in-log form (see Table 14). It includes a constant; the difference in the permanent tax rate (the legislated tax rate for next year); the difference in the transitory tax rate (the current rate less the permanent rate, squared but with the sign preserved); the difference in the log of the ratio of GDP to potential GDP; the difference in the log of the S&P 500; and the difference in the log of multifamily housing starts. It has the highest adjusted R-squared.
It is most accurate over the entire period and over 1980 through 1998, although it does not dominate in the 1990s. It fails the White test but passes the Chow test, and its coefficients are more stable over the 1980s and 1990s than those of the other equations.
The coefficients of our preferred specification also seem reasonable. The size and significance of the coefficient on DLOG(GDP/GDPFE) confirm that the excessive size of the coefficient on GDP in the current CBO equations (shown in Table 2) resulted from the sensitivity of realizations to the business cycle. The coefficient on DLOG(SP500Q4) indicates that a 1 percent increase in the S&P 500 leads to a 0.7 percent increase in the growth rate of gains relative to potential GDP. Finally, the tax terms imply a permanent elasticity of -0.47 and a transitory elasticity of 1.8 for changes in tax rates like those in 1986.
Despite our model’s apparent good performance, we believe some caveats are necessary. First, there is good reason to suspect that the coefficients on the transitory tax term and multifamily housing starts are more uncertain than their standard errors
TABLE 14: PREFERRED EQUATION FROM IN-SAMPLE AND OUT-OF-SAMPLE TESTS
Dependent Variable: DLOG(TOTGAIN2/GDPFE)
Method: Least Squares
Sample: 1952-1998

Variable            Coefficient   Standard Error   t-Statistic   Probability
Constant              -0.042628         0.019472     -2.189185        0.0343
D(MTRNEXT)            -2.585064         0.675268     -3.828206        0.0004
D(MTRTRANS)         -115.4395          13.61065      -8.481552        0
DLOG(GDP/GDPFE)        2.358513         0.871639      2.705837        0.0099
DLOG(SP500Q4)          0.736638         0.121054      6.085178        0
DLOG(STARTS)           0.218479         0.079682      2.741875        0.009

R-squared                                  0.802356
Adjusted R-squared                         0.778253
Standard error of regression               0.11529
Sum of squared residuals                   0.544968
Log likelihood                            38.05353
Durbin-Watson statistic                    1.910765
Mean dependent variable                    0.020822
Standard deviation of dependent variable   0.24483
Akaike info criterion                     -1.36398
Schwarz criterion                         -1.127791
F-statistic                               33.2887
Probability (F-statistic)                  0

Key:
D is the first difference operator: D[x(t)] = x(t) - x(t-1)
LOG(x) is the logarithm of x
TOTGAIN2 is capital gains realizations
GDPFE is nominal potential GDP
MTRNEXT is our permanent tax rate term
MTRTRANS is our transitory tax rate term
SP500Q4 is the average value of the S&P 500 index in the fourth quarter
STARTS is the number of dwelling units started in structures with two or more dwellings
SOURCE: CBO calculations.
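To make the arithmetic of Table 14 concrete, the sketch below evaluates the preferred equation in Python. The function names and the example inputs are hypothetical; the coefficients come from Table 14, and the signed-square helper follows the description of the transitory tax term as the current rate less the permanent rate, squared with the sign preserved. The units in which the tax rates enter the regression are not stated in this section, so they are left to the caller.

```python
import math

# Coefficients copied from Table 14 (preferred equation).
TABLE_14 = {
    "constant": -0.042628,
    "d_mtrnext": -2.585064,
    "d_mtrtrans": -115.4395,
    "dlog_gdp_gap": 2.358513,
    "dlog_sp500q4": 0.736638,
    "dlog_starts": 0.218479,
}


def signed_square(x):
    """Square x but keep its sign, so rate cuts and increases stay distinct."""
    return math.copysign(x * x, x)


def transitory_rate(current_rate, permanent_rate):
    """Level of the transitory tax term; the regression uses its first difference."""
    return signed_square(current_rate - permanent_rate)


def predicted_dlog_gains_ratio(d_mtrnext, d_mtrtrans, dlog_gdp_gap,
                               dlog_sp500q4, dlog_starts):
    """Fitted value of DLOG(TOTGAIN2/GDPFE) from the Table 14 coefficients."""
    c = TABLE_14
    return (c["constant"]
            + c["d_mtrnext"] * d_mtrnext
            + c["d_mtrtrans"] * d_mtrtrans
            + c["dlog_gdp_gap"] * dlog_gdp_gap
            + c["dlog_sp500q4"] * dlog_sp500q4
            + c["dlog_starts"] * dlog_starts)


# Hypothetical year: no tax changes, GDP gap and starts unchanged,
# and a rise of 0.10 in the log of the fourth-quarter S&P 500.
example = predicted_dlog_gains_ratio(0.0, 0.0, 0.0, 0.10, 0.0)
```

In this hypothetical case the fitted growth in gains relative to potential GDP is the constant plus 0.736638 times 0.10, or roughly 0.031 in log terms.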
indicate. The coefficient on the transitory tax term is estimated primarily from one event: the large tax increase legislated in 1986 that took effect in 1987. Similarly, the coefficient on multifamily housing starts is estimated mainly from one event: the real estate collapse that occurred from the late 1980s to the early 1990s. Although those two coefficients are estimated reasonably based on historical information, we suspect they could be subject to large revisions if significant changes occurred in the future to anticipated capital gains tax rates or real estate market conditions. Another caveat is that the superiority of our equation largely comes in the 1980s, when it had an advantage. None of the other models had a way to forecast out-of-sample the effects on realizations of the anticipated 1987 tax increase. (The SPUD model could anticipate the reaction in 1986 but only part of the reaction in 1987.) Thus, it is fair to argue that knowledge of what happened in 1986 and 1987 guided our specification. However, the specification still has an advantage over the others. Should changes in tax rates be announced in the future, our equation can estimate the effect on realizations, even if that estimate is imprecise. And the new observation will allow the coefficient estimate to be refined. In contrast, equations that use dummies for 1986 and 1987 cannot generate such estimates, and the SPUD equation will have trouble estimating the effect of the tax change in the year following the change.
VI. REAL-TIME EVALUATION AND COMPARISONS
Out-of-sample comparisons are better than in-sample comparisons at indicating how analysts would have fared using alternative models. The advantage of out-of-sample comparisons is that they limit the degree to which unknown future developments can affect the realization estimates of alternative models. But out-of-sample comparisons do not go far enough. In this section, we examine how analysts would have fared in estimating realizations using different models if they had only the information available at the date the estimates were made.
Such real-time comparisons differ from out-of-sample comparisons in two fundamental ways. First, out-of-sample model estimation is based on the data set as it currently exists. However, that data set has been revised several times and can differ greatly from the set that analysts had when they made their estimates of realizations. In real time, by contrast, the models are estimated as of November 30 of each year based on the data the modeler would have had at that time. Second, the realizations estimates from out-of-sample model estimation assume that the lagged dependent variable and the right-hand-side variables are known with certainty. But, of course, that is not actually the case. Realizations in the previous year have not been fully tabulated by the Internal Revenue Service as of November 30, so that figure must be extrapolated. Meanwhile, right-hand-side variables are known for only part, if any, of the current year and must be projected through the end of the year.
The main questions we posed in our real-time experiment were: How well would CBO have estimated the current-year growth of capital gains in the 1990s had it had access to the models we consider? Which of those models would have performed best? And would that model have improved on the estimates that CBO actually made?
Thus, for the extrapolations of realizations in the previous year, we used the ones CBO actually made in each December from 1991 through 1998. For 1990, we extrapolated the way that CBO would have, assuming it used the same method it did in later years. For projections of right-hand-side variables for the year, we used those actually produced by CBO’s Macroeconomic Analysis Division (MAD) for the variables they project. For variables not projected by MAD, such as stock prices or value of volume, we used simple time-series methods of calculation. In particular, we assumed that each of those series follows a continuous-time random walk with drift. Thus, for each series, we took the value at the end of November and increased it by its historical average monthly gain (through that November) to get its December value. Because real-time considerations about the reliability of data and the predictability of right-hand-side variables could change the ranking of equations in
terms of estimating accuracy, we included four versions of our difference-in-logarithm specification:
• With and without multifamily housing starts, and
• With an S&P 500 variable or a dollar-volume variable.
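The December projection described above for series that MAD does not forecast (the end-of-November value plus the historical average monthly gain) can be sketched in Python as follows. The function name and the sample values are hypothetical, and measuring the gain as an average change in levels rather than in logarithms is an assumption.

```python
def project_next_month(monthly_values):
    """Project the month after the last observation as a random walk with drift.

    The drift is the average month-to-month change over the history supplied,
    measured in levels (an assumption; a log version would compound instead).
    """
    if len(monthly_values) < 2:
        raise ValueError("need at least two monthly observations")
    changes = [b - a for a, b in zip(monthly_values, monthly_values[1:])]
    drift = sum(changes) / len(changes)
    return monthly_values[-1] + drift


# Hypothetical series ending in November (not actual index data).
history_through_november = [100.0, 102.0, 101.0, 106.0]
december_projection = project_next_month(history_through_november)
```

Because the monthly changes telescope, the drift equals the total change over the history divided by the number of months elapsed, so the projection depends only on the first value, the last value, and the length of the history.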
For comparison, we also included four of CBO’s current forecasting equations, with and without starts and with and without error-correction terms. Background information for the real-time comparisons appears in Table 15. The first line shows the annual growth rate of realizations based on the data provided by the IRS for complete years. Although the actual figures computed by the IRS are available after a considerable delay (and would not be available in real time), they still are the ones that CBO is trying to estimate. Thus, the accuracy of realizations estimates is measured with respect to the figures in the first line. However, because of the delay, an actual figure for 1999 is not yet available, and the figure for 1998 could change slightly. The second line of Table 15 shows the baseline estimates of growth in realizations that CBO made in early December of each year. In 1991 through 1999, those estimates relied on input from the current CBO equations as well as on judgment. In 1990, the estimate was based on a different model. The next five lines show the estimates of the four current CBO equations and their average. Those estimates were made in 1991 through 1999 and provided the starting point for the forecast shown in the second line of the table. The average of those equations differs noticeably from the baseline estimate in 1991, 1994, 1997, and 1999, when additional considerations were incorporated. The next five lines show revised estimates from the current CBO equations, which we reestimated to make them more comparable with our new equations. The current equations’ sample period was extended from 1955 back to 1952, the same as for the new equations. Also, the value of corporate equities in the current year was projected using the growth rate of the S&P 500, as we projected that index for December in our new equations. 
Originally, corporate equities were updated using the New York Stock Exchange composite index without any drift added for the remainder of the year. The last five lines of Table 15 parallel the previous five lines, but the estimates are from our new equations. They predict much stronger growth in realizations in 1999 than the current equations or CBO’s baseline estimate do. Only time will tell which estimate is closer to the actual outcome.
TABLE 15: ACTUAL AND ESTIMATED GROWTH RATES OF CAPITAL GAINS REALIZATIONS
                                      1990    1991    1992    1993    1994    1995    1996    1997    1998    1999
Actual                              -0.196  -0.098   0.135   0.202   0.003   0.179   0.447   0.399   0.234    n.a.
CBO Baseline                         0.117   0.100   0.102   0.061   0.034   0.150   0.089   0.464   0.130   0.136

Current CBO Equations a
Dependent Variable = DLOG(GAINS)
  No error correction, no starts      n.a.   0.052   0.111   0.061   0.041   0.140   0.105   0.534   0.201   0.152
  No error correction, starts         n.a.  -0.053   0.056   0.036   0.131   0.161   0.103   0.464   0.182   0.135
  Error correction, no starts         n.a.   0.052   0.093   0.093   0.055   0.170   0.058   0.301   0.068   0.028
  Error correction, starts            n.a.  -0.029   0.102   0.077   0.152   0.175   0.073   0.294   0.073   0.028
  Average                             n.a.   0.005   0.090   0.067   0.095   0.162   0.085   0.394   0.129   0.084

Reestimated CBO Equations
Dependent Variable = DLOG(GAINS)
  No error correction, no starts    -0.085  -0.019   0.083   0.055   0.058   0.188   0.144   0.460   0.284   0.121
  No error correction, starts       -0.136  -0.127   0.004   0.022   0.141   0.203   0.141   0.404   0.255   0.099
  Error correction, no starts       -0.136  -0.047   0.019   0.058   0.046   0.183   0.097   0.308   0.132   0.026
  Error correction, starts          -0.152  -0.115   0.032   0.041   0.142   0.190   0.112   0.293   0.124   0.024
  Average                           -0.127  -0.077   0.035   0.044   0.097   0.191   0.124   0.366   0.199   0.067

New Equations
Dependent Variable = DLOG(GAINS/GDPFE)
  S&P 500, no starts                -0.114   0.046   0.073   0.102   0.054   0.245   0.175   0.507   0.252   0.197
  S&P 500, starts                   -0.140  -0.052   0.076   0.073   0.148   0.243   0.180   0.472   0.239   0.168
  Dollar volume, no starts          -0.081   0.000   0.081   0.200   0.090   0.186   0.148   0.380   0.213   0.269
  Dollar volume, starts             -0.114  -0.091   0.082   0.155   0.183   0.190   0.156   0.362   0.204   0.229
  Average                           -0.112  -0.024   0.078   0.133   0.119   0.216   0.165   0.430   0.227   0.215
SOURCE: CBO calculations. NOTE: GDPFE = potential nominal GDP; n.a. = not applicable. a. The current CBO equations were first used in 1991.
Table 16 compares the degree to which the estimates shown in Table 15 differ from the actual growth rates. That comparison of errors yields three major conclusions:
• The new equations perform best overall;
• The average of the four new equations slightly outperforms any one of the new equations individually; and
• The reestimated CBO equations do better than the current CBO ones, which in turn do better than CBO’s baseline estimates.9
Compared with the reestimated equations, the average improvement in accuracy from the new equations is on the order of 15 percent. The improvement is not uniform, however, since the current equation without multifamily housing starts or error correction does only slightly worse than some of the new equations. For all of the specifications, multifamily housing starts help only in our new equation with the S&P 500.
The average estimate of our new equations improves on the baseline estimates that CBO made in the 1991-1998 period (when CBO was using the current equations) by reducing the root mean squared error from .160 to .117, roughly a 27 percent improvement. Most of that improvement comes from reducing the root mean squared error of the current-equation estimates from .151 to .117, roughly a 23 percent improvement.
The average estimate from the four new equations does better than any of the new equations individually. The equation with dollar volume but not starts does the best of the four, particularly from 1991 through 1998. The equation with the S&P 500 and starts, chosen as our preferred equation on the basis of its in-sample and out-of-sample performance, comes in second place. The superiority of the dollar-volume equations in the real-time comparisons is consistent with their superiority for the 1990s in both in-sample and out-of-sample comparisons. However, those equations did not dominate out-of-sample in the period since the 1960s and did particularly poorly in the 1980s. Moreover, the fact that the average did the best suggests that all
9. The relative ranking of the equations remains unchanged when their target is growth from the preliminary level of gains in the previous year to the actual level of gains reached in the current year. Recall that prior-year realizations are based on incomplete tax return information and therefore contain errors. Thus, even if an equation accurately predicts the growth rate in gains as shown by final tax return data, it can still miss the level of gains reached in the current year because its growth is from the wrong base. An alternative target for comparing the equations is how well they grow from the estimated prior-year base to the correct current-year gain. When the equations are judged by that alternative standard, their root mean squared errors change slightly, but the rankings do not change. The root mean squared error for the average of the new equations falls from .114 to .113, and that for the reestimated CBO equations rises from .132 to .136. The changes in root mean squared errors are small because the errors in the prior-year level of gains are small in most years. The relative rankings are also preserved because the errors in prior-year gains appear to be unrelated to errors that the equations make in predicting current-year growth.
TABLE 16: ERRORS IN ESTIMATES OF GROWTH RATES OF CAPITAL GAINS REALIZATIONS
                                                                                                            Root Mean Squared Error
                                      1990    1991    1992    1993    1994    1995    1996    1997    1998   1990-1998  1991-1998
CBO Baseline                        -0.314  -0.198   0.033   0.141  -0.031   0.029   0.358  -0.064   0.104       0.184      0.160

Current CBO Equations a
Dependent Variable = DLOG(GAINS)
  No error correction, no starts      n.a.  -0.150   0.024   0.141  -0.038   0.039   0.342  -0.135   0.032        n.a.      0.151
  No error correction, starts         n.a.  -0.046   0.080   0.165  -0.128   0.019   0.344  -0.064   0.051        n.a.      0.149
  Error correction, no starts         n.a.  -0.150   0.043   0.109  -0.052   0.009   0.389   0.099   0.166        n.a.      0.169
  Error correction, starts            n.a.  -0.069   0.033   0.125  -0.149   0.005   0.374   0.105   0.161        n.a.      0.166
  Average                             n.a.  -0.104   0.045   0.135  -0.092   0.018   0.363   0.005   0.104        n.a.      0.151

Reestimated CBO Equations
Dependent Variable = DLOG(GAINS)
  No error correction, no starts    -0.111  -0.080   0.052   0.147  -0.055  -0.009   0.303  -0.061  -0.050       0.127      0.128
  No error correction, starts       -0.060   0.029   0.131   0.179  -0.138  -0.024   0.306  -0.004  -0.022       0.136      0.143
  Error correction, no starts       -0.060  -0.052   0.116   0.144  -0.043  -0.003   0.350   0.091   0.102       0.143      0.150
  Error correction, starts          -0.044   0.017   0.103   0.161  -0.139  -0.011   0.336   0.107   0.109       0.147      0.155
  Average                           -0.069  -0.021   0.101   0.158  -0.094  -0.012   0.324   0.033   0.035       0.132      0.138

New Equations
Dependent Variable = DLOG(GAINS/GDPFE)
  S&P 500, no starts                -0.083  -0.144   0.062   0.100  -0.051  -0.066   0.273  -0.107  -0.018       0.122      0.126
  S&P 500, starts                   -0.057  -0.046   0.059   0.128  -0.145  -0.064   0.267  -0.072  -0.006       0.119      0.124
  Dollar volume, no starts          -0.115  -0.099   0.054   0.002  -0.087  -0.006   0.299   0.019   0.020       0.117      0.118
  Dollar volume, starts             -0.082  -0.007   0.053   0.047  -0.180  -0.011   0.291   0.038   0.029       0.121      0.125
  Average                           -0.084  -0.074   0.057   0.069  -0.116  -0.037   0.283  -0.031   0.006       0.114      0.117
SOURCE: CBO calculations. NOTE: GDPFE = potential nominal GDP; n.a. = not applicable. a. The current CBO equations were first used in 1991.
of the new equations should be carried into the future until finer discriminations among them are possible. Finally, the reestimated CBO equations do better than the original baseline for two reasons. First, the reestimated equations are more accurate than the ones estimated at the time (the RMSE of the average falls from .151 to .138), apparently because of the longer sample period and alternative projection of current-year equity values. Second, adjustments made to the current-equation estimates to reflect additional considerations worsened the estimates (those adjustments raised the RMSE from .151 to .160). Adjustments made in 1991 and 1997 increased the error, and one made in 1994 reduced it, but not by enough to offset the other two. It is too early to know how the adjustment in 1999 affected accuracy.
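The root mean squared errors quoted in the text can be checked against the yearly errors in Table 16. The Python sketch below recomputes them for 1991-1998 from the tabulated errors (actual growth less estimate); the variable and function names are hypothetical, and small differences from the published figures reflect rounding of the tabulated errors to three decimals.

```python
import math


def rmse(errors):
    """Root mean squared error of a list of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Errors for 1991 through 1998, copied from Table 16.
cbo_baseline = [-0.198, 0.033, 0.141, -0.031, 0.029, 0.358, -0.064, 0.104]
current_avg = [-0.104, 0.045, 0.135, -0.092, 0.018, 0.363, 0.005, 0.104]
reestimated_avg = [-0.021, 0.101, 0.158, -0.094, -0.012, 0.324, 0.033, 0.035]
new_avg = [-0.074, 0.057, 0.069, -0.116, -0.037, 0.283, -0.031, 0.006]

results = {
    "CBO baseline": rmse(cbo_baseline),
    "Current equations, average": rmse(current_avg),
    "Reestimated equations, average": rmse(reestimated_avg),
    "New equations, average": rmse(new_avg),
}

# Proportional reduction in RMSE from moving to the new-equation average.
gain_vs_baseline = 1.0 - results["New equations, average"] / results["CBO baseline"]
gain_vs_current = 1.0 - results["New equations, average"] / results["Current equations, average"]

for label, value in results.items():
    print(f"{label}: {value:.3f}")
```

The squared 1996 error dominates every sum, which is why all four series have RMSEs well above their typical yearly error.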
CONTENTS
I. INTRODUCTION
II. CURRENT PRACTICES
    Current-Year Estimation
    Out-Year Projections
III. FORMULATING A NEW MODEL
    Model Formulation
    Dependent Variable
    Explanatory Variables
        Tax Rates
        Business Cycle
        Equity Net Gains
        Property Net Gains
    Modeling the Gains in Current and Future Years Separately
IV. IN-SAMPLE EVALUATION AND COMPARISONS
V. OUT-OF-SAMPLE EVALUATION AND COMPARISONS
VI. REAL-TIME EVALUATION AND COMPARISONS
I. INTRODUCTION
Each December, the Congressional Budget Office (CBO) completes its budget estimates (known as the budget baseline) for the current and 10 succeeding fiscal years. As an input to the revenue estimates, CBO’s Tax Analysis Division requires, by early December, estimates of capital gains realizations for the current and 10 succeeding calendar years.1 At the time the estimates are made, last year’s and most of the current year’s capital gains have been realized. CBO cannot observe those gains until people file their tax returns and the Internal Revenue Service (IRS) processes them, a sequence that takes over a year. Last year’s realizations can be closely approximated from preliminary data from the IRS, but no such data are available for the current year. However, other factors that indicate the amount of gains likely to be realized—such as the year’s movements in stock prices, the strength of the economy, and any changes in tax rates on capital gains—are largely known by the time of the budget estimates. Consequently, CBO estimates the amount of gains realized in the current year by using historically estimated equations that relate capital gains realizations to those other aggregate factors. Projecting realizations in future years is more difficult because the macroeconomic factors used to estimate gains in the current year are unknown in the future. CBO does forecast major macroeconomic variables, but stock prices and other asset prices, which are important determinants of capital gains, are very difficult to forecast. Consequently, CBO does not project future realizations using the same equations that it uses to estimate current-year realizations. Instead, CBO assumes that realizations gradually revert to their historical size relative to gross domestic product (GDP), and it combines that assumption with its projection of GDP to project gains. The empirical research described in this paper was designed to improve CBO’s estimates of current-year realizations. 
We originally intended to address current-year estimation and future-year prediction with a single model, but we found that the two had different enough problems to benefit from separate treatments. We will discuss the problems of forecasting future gains in a second paper. This paper describes CBO’s current practice for both current-year estimation and future-year projection but discusses only our efforts to improve the former. It also briefly mentions models developed at other institutions because they provide ideas for refining CBO’s model of current-year realizations.
1. The Tax Analysis Division combines the estimates of realizations with estimates of wages and other income sources and runs them through its microsimulation tax calculator to generate estimates of individual income tax revenues. The contribution of capital gains to total income tax liability is not separately identified on tax returns, and therefore it cannot be forecast directly by time-series or other methods.
Our efforts had some success. We conclude that if CBO had used our new model throughout the 1990s with only the information available when the estimates were made, its estimates of current-year realizations would have been improved. (Our new model, like CBO’s model, consists of four variants of a basic equation.) Between 1991 and 1998, the CBO model had a root mean squared error of 15.1 percentage points. Our model has a root mean squared error of 11.7 percentage points. But even those smaller errors can still lead to substantial errors in the dollar level of realizations and revenues. The new model has not yet withstood the test of time, however. Some of its coefficient estimates were honed by extreme events, and there were not nearly enough such events to have much confidence in the estimates. As a result, similar extreme events in the future could well lead to large errors from our model as well as major shifts in some of its coefficients. Nevertheless, a major advantage of the model is that it affords the opportunity to refine the estimates of some coefficients in the face of future extreme events, whereas most other models do not offer that opportunity.
II. CURRENT PRACTICES
The measure of capital gains realizations that CBO estimates is the annual total of net gains that are reported by taxpayers who have net gains. Taxpayers who have net losses are excluded and their losses are estimated separately. Net losses are much smaller than net gains and grow more regularly because of a $3,000 limit on losses per return.
Historically, realizations have tended to grow at the same rate as GDP but with much greater year-to-year fluctuation. Thus, the ratio of gains to GDP changes from year to year but shows little trend (see Figure 1). Its average from 1952 to 1998 was 2.7 percent. Gains reached their high-water mark of 7.4 percent of GDP in 1986, when people rushed to realize gains ahead of a tax increase that was enacted that year but did not take effect until 1987. In 1998, the most recent year for which tax return processing is largely complete, realizations were at their second highest level, 5.1 percent. That peak was reached after an uncharacteristically steady climb beginning in the early 1990s.
Although many assets generate capital gains, the two most important classes of assets for tax revenues are equities and real estate (see Table 1). Equities accounted for 30 percent to 40 percent of classifiable asset sales in the 1970s and 1980s, and the tremendous increases in the value of equity holdings in the 1990s have undoubtedly made gains from equities even more important. Taxable gains on real estate come from rental residential and commercial real estate. Some of those gains are passed through partnerships and thus account for some of the gains attributed to partnerships in Table 1. Few gains on owner-occupied homes are taxed, and all such gains are excluded from Table 1.
Current-Year Estimation
The model that CBO uses to estimate current-year capital gains realizations consists of single-equation regressions based on macroeconomic time series.
CBO’s baseline estimate of realizations is usually a central tendency of estimates from the group of equations. Macroeconomic regressions that explain gains were developed in the 1980s by analysts in the Treasury, CBO, and academia. The focus of their work was to measure how realizations respond to changes in tax rates. But the basic idea behind the equations—to explain realizations in terms of the outstanding pool of gains held by taxpayers and the cost of realizing those gains—carries over to predicting aggregate gains.
Figure 1: Ratio of Capital Gains Realizations to GDP
[Figure: vertical axis in percent, from 0 to 8; horizontal axis in years, from 55 to 95.]
SOURCES: Capital gains realizations are from the Department of the Treasury, Office of Tax Analysis, and GDP is from the Department of Commerce, Bureau of Economic Analysis.
TABLE 1: DISTRIBUTION OF CAPITAL GAINS BY ASSET TYPE
                                                     Gross Capital Gains         Distribution of Gains
                                                    (Millions of dollars)              (Percent)
                                                     1977     1981     1985      1977    1981    1985
Total a                                            53,066   97,057  194,689       100     100     100
Corporate stock, CGD                               14,783   39,447   81,814        28      41      42
Other securities (bonds)                              560    1,065    3,054         1       1       2
Options and futures contracts                       1,689    3,683    3,406         3       4       2
Partnerships, S-corporations, trusts and estates    5,112    9,485   42,977        10      10      22
Residential rental property                         4,596    8,229   18,748         9       8      10
Depreciable business personal property              2,256    3,576    1,335         4       4       1
Depreciable business real property                  3,410    3,420   14,067         6       4       7
Other assets                                       20,660   28,152   29,290        39      29      15
SOURCE: Internal Revenue Service, Statistics of Income Bulletin (Winter 1985-86 and Spring 1999). NOTES: Data are not fully comparable among years, especially 1977 versus 1981 and 1985. CGD is capital gains distributions from mutual funds. a. Excludes all capital gains on personal residences.
Outstanding capital gains are those that have accrued during the current and prior years less those that were realized in previous years or were exempted from tax because their owner died. Accrued gains are estimated and reported in the Federal Reserve Board’s flow-of-funds accounts as revaluations of corporate and noncorporate equity held by the household sector. Those two revaluation measures were rarely used in the early development of the equations. Instead, analysts approximated accrued gains on stocks using the value of corporate equities held by the household sector (also from the flow-of-funds accounts) or stock price indexes. GDP was commonly used to approximate accrued gains on assets other than stocks. No reliable and timely data on gains exempted at death are available, so that factor was ignored in most equations. The cost of realizing gains was represented by tax rates on capital gains. In addition, equations developed after 1987 included a variable to capture the transitory effects of the large increase in the tax rate on capital gains in 1987 that was passed in 1986. CBO’s equations used a dummy variable to isolate those effects. Another cost, that of selling an asset, has not been incorporated in the equations, although the cost of trading stocks has been falling since the 1970s and could be affecting the willingness to realize gains. Empirically, the stage of the business cycle has seemed to affect people’s willingness to realize gains. CBO began estimating capital gains at the end of 1986, but it did not adopt an equation approach until 1988. From 1988 through 1990, CBO used regressions that explain the logarithm of realized gains to estimate gains in the current year and forecast gains over a five-year horizon. In 1991, it shifted to its current model, which explains changes in the logarithm of gains, or, roughly speaking, the annual growth rate of gains. 
In most years, CBO uses four variants of a basic equation, which differ in their inclusion of multifamily housing starts and an error-correction term (see Table 2).[2] The equations are estimated from 1955 through the last complete year and then used to predict gains in the current year. In most years, CBO averages the current-year estimates from the four equations. In 1999, however, the regressions with error correction were omitted because of their large errors the previous year. Also, in some years the estimates are adjusted for special factors, such as the initial effect of the 1997 tax cut. Predictions made by the four equations in early December 1999 are shown in Table 3. The predictions have large standard errors, as indicated by their 95 percent confidence intervals.

[2] The error-correction term is the lagged residual from an equation that explains the logarithm of the ratio of gains to GDP as a function of the capital gains tax rate and the 1986 dummy variable. The residual indicates whether gains differ from their expected long-run size relative to GDP.
TABLE 2: CURRENT CBO EQUATIONS
(Dependent variable is change in log of gains, estimated 1955-1998; coefficients with t-statistics in parentheses)

                                                      No Error Correction              Error Correction
Explanatory Variable                              No Starts        Starts          No Starts        Starts
Constant term                                    -0.060 (-0.8)   -0.096 (-1.6)   -0.148 (-2.3)   -0.120 (-2.0)
Growth rate of prices                             1.024 (1.0)     1.691 (1.8)     1.788 (1.7)     1.985 (2.1)
Real growth rate of household equity holdings     0.528 (4.3)     0.533 (4.9)     0.517 (4.3)     0.484 (4.4)
Growth rate of real GDP                           2.562 (1.9)     2.921 (3.1)     4.335 (4.5)     3.304 (3.5)
Acceleration of real GDP                          1.287 (1.4)     --              --              --
Growth rate of multifamily housing starts         --              0.243 (3.0)     --              0.223 (2.8)
Change in maximum tax rate                       -0.027 (-3.5)   -0.025 (-3.6)   -0.025 (-3.4)   -0.026 (-3.7)
Indicator: 1986 = 1, 1987 = -1 (0 otherwise)      0.544 (5.7)     0.523 (6.0)     0.581 (6.1)     0.555 (6.3)
Error-correction term                             --              --             -0.203 (-2.0)   -0.163 (-1.7)
Adjusted R-squared                                0.725           0.767           0.737           0.778
Durbin-Watson                                     2.072           2.018           1.856           1.888

SOURCE: CBO calculations.
TABLE 3: FORECAST OF CHANGE IN LOGARITHM AND LEVEL OF CAPITAL GAINS FOR 1999
(Based on CBO forecasts of December 7 and 9, 1999)

                                        Change in Log of Gains             Level of Gains (Billions of dollars)
                                           Standard   95% Confidence Interval          95% Confidence Interval
Equation                            Mean   Error      Low       High          Mean     Low       High
No error correction, no starts      0.142  0.129     -0.117     0.401         507      391       657
No error correction, starts         0.127  0.119     -0.111     0.365         500      394       634
Error correction, no starts         0.027  0.141     -0.255     0.309         452      341       600
Error correction, starts            0.028  0.129     -0.231     0.287         452      349       586
Average of change in logs           0.081  --        -0.179     0.340         477      368       618

SOURCE: CBO calculations.
NOTE: The estimated level of gains in 1999 is based on preliminary 1998 gains of $440 billion.
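The averaging step and the conversion from a predicted change in the logarithm to a dollar level can be illustrated with a short sketch. It assumes, following the note to Table 3, that each level equals the preliminary 1998 base of $440 billion multiplied by the exponential of the predicted change in logs. The function and variable names are illustrative and are not CBO's.

```python
import math

# Predicted 1999 changes in the log of gains from the four equations (Table 3).
predicted_dlog = {
    "no error correction, no starts": 0.142,
    "no error correction, starts": 0.127,
    "error correction, no starts": 0.027,
    "error correction, starts": 0.028,
}

BASE_1998 = 440.0  # preliminary 1998 gains, billions of dollars (Table 3 note)


def level_from_dlog(dlog, base=BASE_1998):
    """Convert a predicted change in log gains into a level of gains."""
    return base * math.exp(dlog)


# Average the predictions in logs, then convert the average to a level.
average_dlog = sum(predicted_dlog.values()) / len(predicted_dlog)
average_level = level_from_dlog(average_dlog)

for name, dlog in predicted_dlog.items():
    print(f"{name}: {level_from_dlog(dlog):.0f}")
print(f"average of change in logs: {average_dlog:.3f} -> {average_level:.0f}")
```

Averaging in logs is equivalent to taking the geometric mean of the four predicted levels, so the result is slightly below their arithmetic mean.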
In addition to the regressions developed at CBO, regression models of capital gains realizations have been developed by analysts at other institutions to provide input to their forecasts. Table 4 highlights some of the salient differences in the selection of variables among four of those models. The four were developed by Nicholas Bull and David Richardson of the Treasury Department; Randall Mariger, formerly of the Federal Reserve Board; Prawpan Siwapradit of the New York State Division of the Budget; and Thomas Stinson of the Minnesota Department of Finance.[3] For 1999, CBO estimated several equations adapted from the Bull-Richardson, Mariger, and Siwapradit models. The equations generated current-year estimates similar to those of the CBO equations without error correction. CBO used the estimates from those adapted equations in settling on its 1999 baseline estimate.

The accuracy of CBO’s current-year predictions can be evaluated for 1986 through 1998 (see Table 5). CBO was farthest off in 1986, when it underestimated how much people would respond to the impending tax increase in 1987. Other large errors occurred when gains were overestimated in 1989 and 1990 and underestimated in 1996. The errors also show a cyclical pattern, overestimating growth in 1989 through 1991 and underestimating it in most years since then. The root mean squared error on predicted annual growth rates from 1986 to 1998 is 26 percent, compared with growth rates that ranged from an increase of 90 percent to a decrease of 56 percent. Looking just at the experience from 1991 to 1998, which excludes the unusual years of 1986 and 1987 and coincides with the use of CBO’s current equations, the root mean squared error is 16 percent. During that period, growth rates ranged from a 45 percent increase to a 10 percent decline.
The errors since 1991 are still large compared with the growth rates of gains, and they reflect the substantial uncertainty in predicting capital gains even when other macroeconomic variables are largely known. That uncertainty is reflected in the standard errors of the equations themselves.

CBO’s record in estimating current-year realizations and some of the statistical properties of its equations suggest there is room for improvement. First and foremost, the current-year estimates are often far off the mark. Second, the coefficients on some variables, such as the growth rates of inflation and real GDP, vary considerably depending on the set of other explanatory variables. Third, the coefficients on the error-correction terms vary considerably over time. Finally, the use of dummy variables for 1986 and 1987 prevents the equations from estimating the transitory effects of a future large change in tax rates, should one occur.

[3] The models do not fully describe the process of forecasting gains at the four institutions. The institutions have access to other models and consider factors outside any specific model.
TABLE 4: OTHER MODELS OF CAPITAL GAINS REALIZATIONS

Bull-Richardson
  Dependent variable: Dollar change in capital gains realizations.
  Explanatory variables: Current-year change in tax rate. Next year’s change in tax rate if positive; zero otherwise. Accumulated monthly increases in S&P 500 per year. Accumulated monthly decreases in S&P 500 per year.

Mariger
  Dependent variable: Change in the ratio of capital gains realizations to nominal potential GDP.
  Explanatory variables: Change in the ratio of actual GDP to potential GDP. Change in the ratio of equities held by households to potential GDP. Dummy for 1986.

Siwapradit
  Dependent variable: Change in the log of capital gains realizations.
  Explanatory variables: Change in the log of the value of shares traded on the New York, Nasdaq, and American stock exchanges. Tax rate combines federal, New York State, and New York City maximums. Dummy for 1986.

Stinson
  Dependent variable: Change in the ratio of capital gains realizations to the value of household assets.
  Explanatory variables: Growth in the value of household assets and GDP. Dummy for 1986.
TABLE 5: CURRENT-YEAR FORECASTS AND ACTUAL CAPITAL GAINS

          Level of Gains (Billions of dollars)      Growth Rate (Percent) (a)
Year      Actual     Forecast    Error              Actual    Forecast    Error
1985      171        --          --                 --        --          --
1986      324        223         101                90.2      30.7        59.5
1987      144        144         0                  -55.6     -55.6       0.1
1988      162        151         11                 12.3      4.7         7.5
1989      154        224         -70                -5.2      38.4        -43.5
1990      124        170         -46                -19.4     11.7        -31.1
1991      112        132         -20                -9.8      10.0        -19.8
1992      127        119         8                  13.5      10.2        3.3
1993      152        131         21                 20.2      6.1         14.1
1994      153        150         3                  0.3       3.4         -3.1
1995      180        177 (b)     3                  17.9      14.9        2.9
1996      261        196         65                 45.0      8.9         36.1
1997      365 (c)    382         -17                39.8      46.4        -6.6
1998      450        418         32                 23.4      13.0        10.4
1999      --         500         --                 --        13.6        --

Root mean squared error of growth rate, 1986-1998: 25.6
Root mean squared error of growth rate, 1991-1998: 16.1

SOURCE: CBO calculations.
a. Growth rate forecasts are from preliminary values for the prior year.
b. The December 1995 forecast was modified to $175 billion in the delayed baseline of March 1996.
c. Preliminary.
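The two root mean squared errors reported with Table 5 can be recomputed from the growth-rate errors in the table. The sketch below is only a check of that arithmetic; the helper name `rmse` and the dictionary layout are illustrative choices.

```python
import math

# Growth-rate forecast errors (actual minus forecast, percentage points), Table 5.
growth_errors = {
    1986: 59.5, 1987: 0.1, 1988: 7.5, 1989: -43.5, 1990: -31.1,
    1991: -19.8, 1992: 3.3, 1993: 14.1, 1994: -3.1, 1995: 2.9,
    1996: 36.1, 1997: -6.6, 1998: 10.4,
}


def rmse(errors_by_year, first_year, last_year):
    """Root mean squared error over the years first_year..last_year inclusive."""
    selected = [e for y, e in errors_by_year.items() if first_year <= y <= last_year]
    return math.sqrt(sum(e * e for e in selected) / len(selected))


rmse_1986_1998 = rmse(growth_errors, 1986, 1998)
rmse_1991_1998 = rmse(growth_errors, 1991, 1998)
print(f"RMSE 1986-1998: {rmse_1986_1998:.1f}")
print(f"RMSE 1991-1998: {rmse_1991_1998:.1f}")
```

Dropping 1986 through 1990 removes the three largest absolute errors, which is why the second figure is so much smaller than the first.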
Out-Year Projections

Because capital gains have shown little trend relative to GDP, CBO projects that in the years after the current year, gains will move back to their expected size relative to GDP. That expected size is the historical average adjusted by a regression equation for differences between current and historical average tax rates on capital gains. For example, at the end of 1999, CBO’s equations predicted that realizations that year would be around $500 billion, or 5.5 percent of GDP. As noted earlier, gains have averaged 2.7 percent of GDP historically, but because tax rates are currently below their historical average, the expected ratio is about 3.1 percent. Thus, CBO projects that the ratio of gains to GDP will fall toward 3.1 percent starting in 2000.

The rate at which that ratio declines is based on the estimated coefficients of the error-correction terms in CBO’s two equations with such terms. Those coefficients suggest a rate of decline per year equal to about 20 percent of the gap between the previous year’s ratio and the expected long-run ratio. The 20 percent rate was used in the December 1999 projection. Given the rate of decline, ratios of gains to GDP can be calculated for each year in the projection period. Multiplying those ratios by CBO’s forecast of GDP in each year of the projection period gives the out-year projections of capital gains. As can be seen in Table 6, the 1999 projection shows gains declining from 1999 through 2004 and then growing back to about $500 billion in 2010.
TABLE 6: OUT-YEAR PROJECTIONS OF CAPITAL GAINS

              In Billions of Dollars            In Percent
Year          GDP         Capital gains         Ratio of gains to GDP    Growth rate of gains
Past and current
1997          8,301       365                   4.39                     --
1998          8,760       440                   5.02                     20.61
1999          9,235       500                   5.41                     13.64
Projected
2000          9,692       480                   4.96                     -3.94
2001          10,154      466                   4.59                     -3.00
2002          10,610      456                   4.29                     -2.19
2003          11,069      449                   4.06                     -1.39
2004          11,544      447                   3.87                     -0.54
2005          12,054      449                   3.72                     0.36
2006          12,589      453                   3.60                     1.06
2007          13,148      461                   3.50                     1.65
2008          13,734      471                   3.43                     2.16
2009          14,362      483                   3.37                     2.70
2010          15,024      498                   3.32                     3.07

Assumptions:
500 = Equations' predictions for 1999
3.12% = target ratio of realizations to GDP
20% = approximate rebound rate in error-correction equations

SOURCES: GDP in 1997 and 1998 are from the Bureau of Economic Analysis as of December 1999. GDP in 1999 through 2010 is from CBO's forecast of December 9, 1999. Capital gains in 1997 are from the Statistics of Income Division, Internal Revenue Service. Other data are from CBO projections and calculations.
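The out-year arithmetic can be sketched from the three assumptions listed under Table 6. The sketch reads the text as saying that each year's ratio of gains to GDP closes 20 percent of the remaining gap to the 3.12 percent target, and that gains are that ratio times forecast GDP. That reading, and the name `project_gains`, are assumptions of this illustration rather than CBO's actual procedure.

```python
# Assumptions listed under Table 6.
START_GAINS_1999 = 500.0  # equations' prediction for 1999, billions of dollars
TARGET_RATIO = 3.12       # target ratio of realizations to GDP, percent
REBOUND_RATE = 0.20       # share of the remaining gap closed each year

GDP_1999 = 9235.0         # billions of dollars (Table 6)
gdp_forecast = {          # CBO forecast of December 9, 1999 (Table 6)
    2000: 9692, 2001: 10154, 2002: 10610, 2003: 11069, 2004: 11544,
    2005: 12054, 2006: 12589, 2007: 13148, 2008: 13734, 2009: 14362,
    2010: 15024,
}


def project_gains(start_gains, start_gdp, gdp_by_year,
                  target=TARGET_RATIO, rate=REBOUND_RATE):
    """Return {year: (ratio in percent, gains in billions)} for each forecast year."""
    ratio = 100.0 * start_gains / start_gdp
    projection = {}
    for year in sorted(gdp_by_year):
        ratio -= rate * (ratio - target)  # close part of the gap to the target
        projection[year] = (ratio, ratio / 100.0 * gdp_by_year[year])
    return projection


projection = project_gains(START_GAINS_1999, GDP_1999, gdp_forecast)
for year, (ratio, gains) in projection.items():
    print(f"{year}: ratio {ratio:.2f}%, gains {gains:.0f}")
```

Under these assumptions the projected level of gains bottoms out in 2004, because by then GDP growth outweighs the shrinking annual decline in the ratio.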
III. FORMULATING A NEW MODEL
This section describes the steps taken to formulate our new model. The sections that follow examine the model’s performance and compare it with alternative models. We examined performance in three stages:

• In-Sample. For this stage, we estimate coefficients once over the 1952-1998 period using the data set as it existed at the end of 1999. We generate estimates of realizations for any given year by applying those coefficients to the actual values of the explanatory variables for that year.

• Out-of-Sample. This stage uses the same data set as the in-sample stage, but we estimate coefficients and realizations on the basis of what is sometimes called “recursive regression.” For instance, we estimate an equation over the years 1952 through 1959 and apply those coefficients to the actual values of explanatory variables in 1960 to estimate realizations in 1960. We then estimate the model over the years 1952 through 1960 and apply those coefficients to the actual values of explanatory variables in 1961 to estimate realizations in 1961. We repeat that step until we estimate the model from 1952 through 1998 and use it to generate an estimate of realizations in 1999. The set of coefficients estimated in this last step, using data from 1952 through 1998, is the same as the single set estimated for the in-sample stage.

• Real-Time. For this stage, we use recursive regression as we did in the previous stage but with two important differences. First, at each step in the recursive regression, the coefficients are based on the data set that actually existed at the end of November in that year. Second, values for all explanatory variables are forecast through the end of the current year on the basis of the partial and preliminary data available at the end of November.
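The recursive-regression procedure used in the out-of-sample stage can be sketched as an expanding-window loop. The example below is deliberately simplified: it uses a single explanatory variable with a closed-form least-squares fit, and the data are synthetic values made up for illustration, not the paper's series. The function names are hypothetical.

```python
def fit_ols(x, y):
    """Ordinary least squares for y = a + b*x with a single regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def recursive_estimates(years, x, y, first_target):
    """Expanding-window, one-step-ahead estimates.

    For each target year from first_target on, fit the equation on all
    earlier years and apply the coefficients to the target year's actual x.
    The target year's own y is never used in the fit.
    """
    estimates = {}
    for i, year in enumerate(years):
        if year < first_target:
            continue
        intercept, slope = fit_ols(x[:i], y[:i])
        estimates[year] = intercept + slope * x[i]
    return estimates


# Synthetic illustration only (not the paper's data).
years = list(range(1952, 1962))
x = [0.5, -1.0, 2.0, 1.5, -0.5, 0.0, 1.0, -2.0, 0.8, 1.2]
y = [1.0 + 2.0 * xi for xi in x]  # exactly linear, so estimates match actuals

estimates = recursive_estimates(years, x, y, first_target=1960)
for year, value in estimates.items():
    print(year, round(value, 3))
```

The real-time stage would differ in two ways that this sketch does not model: each fit would use the data vintage available at the end of November of the target year, and the target year's explanatory values would themselves be forecasts.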
The ultimate question we are addressing is, Which model will generate the most accurate estimates of capital gains realizations now and into the future? Our methodology assumes that useful information on that question can be derived by examining hypothetical past performance: in particular, by looking at which model would have generated the most accurate estimates over past periods had it been available to analysts. We examine that question in the out-of-sample stage and, with more realism, in the real-time stage.

Arguably, examining actual or hypothetical past performance is not relevant to determining the best model now. For instance, although a model may have generated large errors before, perhaps it was “learning” along the way and may do well from now on. We acknowledge that point of view by examining the in-sample fit. However, we believe that looking at how models did, or would have done, in the past also yields important information. In particular, analysts can have more confidence in using a model in the future if, over a considerable period, its coefficients have been stable and its errors in estimating gains have been small.

Model Formulation

Our search for an improved model was guided by several principles. One was to choose a reasonable set of explanatory variables. By reasonable, we mean few in number, with each variable having a clear economic interpretation. It also means that each variable has a coefficient that is of a plausible magnitude and stable across specifications and time. A second principle was to replace dummy variables with tax variables, so that there would be a way in the future to calculate the effects of tax rate changes and to learn from actual experience. In addition, that principle reflects our philosophy that extreme events can be useful in refining coefficient estimates and so should not be discarded as aberrations. A third principle was to allow for dynamic effects. We wanted to account for both the delayed effects of past developments and the effects of anticipated future developments. A fourth principle was to make the model useful in real time; that is, it should estimate current-year realizations well using the data that are available when the estimates are made. That principle requires that explanatory variables be reported with little delay and be subject to little revision.

Our set of explanatory variables was chosen to capture four influences:

• Capital gains tax rates,
• The business cycle,
• Net gains on equity, and
• Net gains on property.
Our original specification was determined by in-sample fit. Variables for inclusion were evaluated according to the significance of their coefficients as measured by t-statistics and their contribution to fit as measured by the adjusted R-squareds of the regressions. The evaluations were done sequentially in the order of the influences listed above. That is, we first selected tax rate variables based on the t-statistics and adjusted R-squareds. Then we selected a business cycle variable based on the same criteria but applied to the regression including our selected tax variables. After the business cycle variable was chosen and evaluated in the regression, we went on to select the variable representing equity net gains and finally the variable representing property net gains.
Dependent Variable

For the dependent variable, we took the difference in the ratio of capital gains realizations to nominal potential GDP, using the series constructed by CBO for the denominator. Using potential GDP as the scale variable is equivalent to assuming that gains change in proportion to potential GDP, controlling for other influences. The idea of using a scale variable was also employed by Stinson and Mariger. We followed Mariger’s choice of the scale variable partly because CBO uses the ratio of gains to GDP in projecting realizations in future years, and the projection of GDP follows the growth of potential GDP. Our initial thought in selecting this variable was that one equation could improve both current-year estimation and future-year forecasting.

We used the first difference in the ratio to increase the likelihood that our dependent variable is stationary. The ratio itself had appeared to be stationary when CBO’s current method was developed in 1991 using data from 1955 through 1990. That appearance was the basis for the assumption used in the out-year projections that gains would revert to their historical size relative to GDP. Tests on data from 1955 through 1993 also soundly rejected the hypothesis of a unit root in the ratio. However, the ratio appears less stationary now because of two data changes in the current analysis. First, we expanded the sample period by adding 1952 through 1954. We added no years before 1952 to be able to test various lag lengths of explanatory variables. Second, the ratio can now be observed through 1998. The newly added early years are record lows in the ratio, and the newly available recent years are unusual highs. A Dickey-Fuller test of the ratio of gains to GDP now barely rejects the hypothesis of a unit root at the 5 percent probability level. Using the first difference in the ratio of gains to GDP removes all evidence of a unit root, returning us to a more clearly stationary dependent variable.
We tested for the addition of lags in the dependent variable, but their coefficients were insignificant, and they added nothing to the fit.

Explanatory Variables

Tax Rates. As our measure of capital gains tax rates, we use the rate faced by taxpayers in the highest income tax bracket. And, as is done in microstudies, we separate the steady-state and transitory effects of changes in that rate.[4] The transitory effects are generally expected to be larger than the steady-state effects because the former allow substitution of realizations between years, but the latter do not. Bull and Richardson followed a similar strategy, although they specified their tax terms differently.

For the steady-state tax rate variable, MTRNEXT, we use the rate legislated as of the current year that is to be in effect in the next year.[5] The steady-state tax rate enters the realizations regression linearly (which implies that revenues are quadratic in the steady-state tax rate). For our transitory tax rate variable, MTRTRANS, we take the difference between this year’s rate and the steady-state rate, square that quantity, and preserve the sign.

The basic idea behind the nonlinear specification of the transitory tax rate term is that large anticipated tax changes can be expected to have disproportionately larger effects on realizations than small anticipated tax changes will. That idea can be motivated by a transactions-cost theory of financial markets. Transactions costs create thresholds for investors. For small changes in anticipated tax rates, few investors would cross their thresholds and change their portfolios. However, more and more would cross their thresholds and rearrange their portfolios as the anticipated tax change grew larger. Transactions costs lessen in importance over the long run, which is consistent with the linear form of the steady-state tax term.

More specifically, we assume that the level of gains (ignoring potential GDP for the moment) is explained as follows:

$$\mathrm{GAINS} = \alpha \cdot \mathrm{MTRNEXT} + \beta \cdot \mathrm{MTRTRANS} + \gamma \cdot X + \mu$$

where

$$\mathrm{MTRTRANS} = (\mathrm{MTR} - \mathrm{MTRNEXT})^2 \cdot \operatorname{sign}(\mathrm{MTR} - \mathrm{MTRNEXT})$$

and MTR is the current-year tax rate; X is other variables; $\alpha$, $\beta$, and $\gamma$ are coefficients; and $\mu$ is the unexplained residual. Furthermore, because we transform our dependent variable to first differences, our estimating equation becomes:

$$D(\mathrm{GAINS}) = \alpha \cdot D(\mathrm{MTRNEXT}) + \beta \cdot D(\mathrm{MTRTRANS}) + \gamma \cdot D(X) + D(\mu)$$

where D(.) indicates the change from the previous year.

[4] See Alan Auerbach and Jonathan Siegel, “Capital-Gains Realizations of the Rich and Sophisticated,” American Economic Review (May 2000), pp. 276-282.
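The signed-square definition of MTRTRANS and the first-difference operator D(.) can be made concrete with a small sketch. It uses the top statutory rates around the Tax Reform Act of 1986 (.20 in 1985 and 1986, .28 from 1987, with the 28 percent rate already legislated in 1986). The 1984 entries are an assumption of this sketch, taken as unchanged from 1985 so that the 1985 changes come out to zero; the function names are illustrative.

```python
import math


def mtrtrans(mtr, mtrnext):
    """Signed square of the gap between the current and legislated next-year rate."""
    gap = mtr - mtrnext
    return math.copysign(gap * gap, gap)


def first_difference(series):
    """D(.): change from the previous year, keyed by the later year."""
    years = sorted(series)
    return {later: series[later] - series[earlier]
            for earlier, later in zip(years, years[1:])}


# Top rate in effect each year (MTR) and rate legislated for the next year (MTRNEXT).
# 1984 values are assumed equal to 1985 for illustration.
mtr = {1984: 0.20, 1985: 0.20, 1986: 0.20, 1987: 0.28}
mtrnext = {1984: 0.20, 1985: 0.20, 1986: 0.28, 1987: 0.28}

trans = {year: mtrtrans(mtr[year], mtrnext[year]) for year in mtr}
d_mtrnext = first_difference(mtrnext)
d_mtrtrans = first_difference(trans)

for year in (1985, 1986, 1987):
    print(year, round(d_mtrnext[year], 4), round(d_mtrtrans[year], 4))
```

With negative coefficients on both terms, the 1986 row (a positive change in MTRNEXT and a negative change in MTRTRANS) combines a permanent reduction with a temporary increase in realizations, and the 1987 row reverses only the temporary part.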
To illustrate the results of this tax rate specification, consider the years surrounding the Tax Reform Act of 1986, which raised the top tax rate on capital gains to 28 percent starting in 1987. The actual top tax rate on gains, our MTR, was .20 in 1985 and 1986 and .28 in 1987 and thereafter. But the legislated rate to be in effect next year, our MTRNEXT, was .20 in 1985 and .28 in 1986 and thereafter. Thus, D(MTRNEXT) has a value of 0 in 1985, .08 in 1986, and 0 again in 1987. With $\alpha$, the coefficient on D(MTRNEXT), negative, this pattern implies that taxpayers will permanently reduce their realizations starting in 1986. The transitory tax term, MTRTRANS, which depends on the difference between MTR and MTRNEXT, is $(.20 - .20)^2 = 0$ in 1985, $-(.20 - .28)^2 = -.0064$ in 1986, and $(.28 - .28)^2 = 0$ in 1987. Thus, D(MTRTRANS) is 0 in 1985, -.0064 in 1986, and +.0064 in 1987. With $\beta$, the coefficient on D(MTRTRANS), negative, this pattern implies that taxpayers will increase their realizations in 1986 to take advantage of the temporarily low rate and then reverse that increase in 1987. The estimated magnitudes of $\alpha$ and $\beta$ should be such that the transitory effect of a big announced tax change is larger than the permanent effect, thereby accounting for the increase in gains in 1986 and the reduction in 1987.

[5] The construction of our legislated tax variable and its values year by year are described in a memo by Larry Ozanne, which is available on request.

Business Cycle. For our measure of the business cycle, we employed the ratio of annual GDP to annual potential GDP, as used in Mariger’s model. We chose annual averages so that the business cycle variable is measured consistently with the scale variable used as the denominator of the dependent variable. Since the equation is in first-difference form, the business cycle variable also enters as a first difference. That formulation, which separates the scale and business cycle variables, fixes a problem with CBO’s current equations. In them, GDP often plays both roles, which makes its coefficient difficult to interpret and probably contributes to making its value vary from equation to equation.

Equity Net Gains. For the variable representing net gains in equity, we chose the difference in the logarithm of the Standard & Poor’s (S&P) 500 index from the fourth quarter of the previous year to the fourth quarter of the current year. Although that measure was not our first choice on conceptual grounds, it turned out to be more practical than the alternatives.
Initially, we considered three main contenders for this variable: one based on flow-of-funds revaluations, one based on stock market volume, and one based on stock prices. Conceptually, the first one, a revaluations variable, seemed most closely related to our dependent variable, as discussed in Section I. And empirically, with full-sample information, it was the best. In first-difference form and entering with three lags, it provided the best fit, but only marginally better than that of the current year’s growth in the S&P 500.

Consideration of real-time problems, however, tipped the balance in favor of the S&P 500. In real time, in early December of any year, the S&P 500 is known up to the day the current-year estimate of realizations is made. However, at that time, there is no figure for revaluations in that year and only a preliminary figure for revaluations in the previous year, which is likely to face major revisions. So if the revaluations measure were used as an explanatory variable, its current-year value would have to be estimated using current stock prices. When we experimented by using the current year’s growth in stock prices and the lagged growth in revaluations, the lagged terms were insignificant. Thus, we dropped revaluations from consideration.

We also did not choose the second option, the dollar-volume measure used by Siwapradit. On conceptual grounds, we felt that its relationship to realizations has been changing. One reason is that the share of stock market volume accounted for by institutions not subject to the capital gains tax has been steadily rising since the 1950s (although that change would also affect the relationship of stock prices to realizations). Another reason for discarding stock market volume is that the costs of trading stocks have declined sharply in recent years, especially with trading over the Internet. The result is almost certainly an increasing amount of sales with smaller gains per trade. On empirical grounds, we found that the current year’s growth in the S&P 500 fit better than the volume variable over the whole sample period.

Even choosing to use a stock price variable to represent equity net gains does not necessarily suggest that the measure should be simply the current year’s growth in the S&P 500. For example, we allowed for up to four lagged growth rates in the index, but they did not significantly improve the fit. We also considered a broader-based stock index, the Wilshire 5000, which includes all stocks on the New York, Nasdaq, and American exchanges. Surprisingly, we found that in the years when both the S&P 500 and Wilshire 5000 are available, they track each other very closely. So even though Nasdaq and S&P 500 stock prices have behaved differently over the past few years, the difference has been offset in the Wilshire 5000 by the behavior of the prices of other stocks that are in neither the S&P 500 nor Nasdaq. Since the S&P 500 is available over our whole sample period but the Wilshire 5000 is not, we opted to use the S&P 500.

Property Net Gains. For the variable representing property net gains, we chose multifamily housing starts. Although that is also the choice in the current CBO equations, its inclusion seems, on first thought, to be strange. It won an empirical process of elimination, however, and we have a hypothesis for why it contributes to our model’s fit.

We were seeking a measure of accrued gains on real estate investments and other unincorporated business assets held by households. Conceptually, the variable closest to that desired measure is the revaluations on noncorporate equity in the flow-of-funds accounts. Empirically, however, that variable is not measured well, probably because prices of existing property are not measured well. Moreover, that variable would present the same real-time problems of delayed availability and large revisions that revaluations of corporate equity do. Further, when we included it in our regression, it added nothing to fit.
Our next choice was an index of real estate prices, excluding single-family homes. We tried measures of prices from the National Association of Real Estate Investment Trusts, the National Council of Real Estate Investment Fiduciaries, the National Real Estate Index (compiled by CB Real Estate), and the national income and product accounts (NIPA). None of them helped—the first three suffer from short histories, limited coverage of real estate, and failure to measure change in the market price of the same basket of properties, and the NIPA measures reflect the cost of investment rather than the price of existing properties. Our third choice was NIPA investment variables. But they, too, did not help. Thus, we ended up using the series for multifamily housing starts, which does contribute marginally to fit. Our hypothesis for that result is based on a Tobin’s Q story for real estate investment. The story is that changes in demand or supply of real estate first cause changes in the prices of existing properties, which change the ratio of prices of new structures to prices of existing structures (Tobin’s Q). Changes in Tobin’s Q then lead to changes in the rate of investment in new structures. Although changes in multifamily housing starts lag behind changes in prices of existing structures, the former are measured much more accurately. Meanwhile, starts lead investment, which is just a distributed lag of past starts. When we examine the contribution of starts to fit, it appears small over the whole period. But in another sense, it is large, because virtually all of that contribution comes during roughly 10 percent of the sample period (in the late 1980s and early 1990s). During those years, realizations were much weaker than would be expected on the basis of stock prices and the other explanatory variables in our model. 
A reasonable explanation, supported by limited data from tax returns, is that part of the shortfall during those years resulted from the collapse in real estate markets. The multifamily housing starts variable picks up that effect.

Modeling the Gains in Current and Future Years Separately

A final exercise did not change our model, but it did change our research strategy in important ways. We experimented with including an interest-rate-spread variable, that is, the difference between a long-term and a short-term interest rate. We reasoned that the spread is commonly found to be useful as a leading indicator and as a variable in time-series forecasts. We found that the first lag of the spread marginally contributed to fit, but its coefficient was unstable. All of its contribution came early in the sample period, and it seemed to be capturing the process of intermediation that occurred under the Federal Reserve’s Regulation Q. In the latter part of the sample, its role changed to leading economic indicator, but the main variables it was predicting were already largely known quantities on the right-hand side of our equation, such as the business cycle and stock prices. As a result, we did not include the interest rate spread in our model.

Nevertheless, we changed our research strategy because of that experiment. It led us to separate current-year estimation from out-year forecasting because of the difference in the predictability of right-hand-side variables. For instance, for current-year estimation, stock prices are largely known, and including them as an explanatory variable essentially removes the usefulness of the previous year’s interest rate spread. We suspect that for forecasting, even one year ahead, the reverse would be true: the current year’s value of the interest rate spread would mostly be known, but stock prices over the next year would have to be predicted (probably badly). Thus, although we originally chose the ratio form for our model because we felt it would be useful for out-year forecasting, we concluded that was no longer necessary.

We decided to approach current-year estimation and out-year forecasting using different models. The criterion of the first would be to minimize a sum of squared residuals, and the criterion of the second would be to minimize a sum of squared forecast errors. The variables in the two models could be different, as could the forms of the equations. That reasoning suggested that we need not state our current-year estimation regression in ratio form; instead, we could choose whatever form seemed to work best.
IV. IN-SAMPLE EVALUATION AND COMPARISONS
All the regressions we evaluated in this stage of our model’s development were estimated by ordinary least squares over calendar years 1952 through 1998 based on the full set of data as it existed at the end of 1999. In this section, we evaluate the fit of two versions of our model (without and with multifamily housing starts) and compare their fit to the fit of our adaptations of other models. Although our model does relatively well compared with our versions of other models, the comparison suggested some additional avenues that we pursued in the out-of-sample evaluation.

Our estimated regression without starts is displayed in Table 7. Its coefficients seem plausible. With realized gains scaled to potential GDP, the coefficient on the ratio of GDP to potential GDP indicates a strong business cycle effect. The stock price effect is also strong and significant. The coefficients on the tax terms are both of the expected sign and significant. The extraordinary significance of the coefficient on D(MTRTRANS) arises from the variable explaining the spike in realizations in 1986. The reasonableness of the coefficients on the tax terms can be judged by the elasticities they imply: a permanent elasticity of -0.36 for a tax increase from 20 percent to 28 percent (which is near the lower end of results found in other studies) and a transitory elasticity of 1.8 for realizations in the current year from an announced tax increase next year from 20 percent to 28 percent (which is near the lower end of results found in studies of panels of taxpayers; see footnote 6). The coefficients in Table 7 also vary little across various specifications of other variables.

We checked for the stability over time of the coefficients by dividing the sample period in half and estimating the regression separately over each half. Although most coefficients were comparable in both subperiods, the coefficient on the transitory tax term seemed to shift by a sizable magnitude.
We reasoned that the shift could result either from a fundamental instability or from the fact that the large tax increase in 1987 that was legislated in 1986 was an extreme event that allowed the coefficient to be estimated more precisely. The regression, split at the midpoint, passes a formal Chow test for stability, which supports the second explanation for the transitory tax term coefficient.

6. The specification of MTRTRANS also allows inferences about a pure transitory tax change such as would occur if 1986 legislation had reduced the 1986 capital gains tax rate by 8 percentage points but left future tax rates at 20 percent. The coefficient of MTRTRANS in Table 7 implies a pure transitory elasticity of -2.2 for such a change. A few caveats are important to keep in mind when comparing our elasticities with those in earlier studies. In most equations that have been reported, the elasticity varies with the marginal tax rate and possibly other factors, so any direct comparison of elasticities is flawed unless the conditions under which they were evaluated are known. A second distinction is that our tax rate is the maximum capital gains rate, whereas many behavioral responses attempt to measure responses to the average marginal capital gains tax rate. Third, our measure of capital gains includes net short-term gains, whereas most behavioral estimates focus on net long-term gains because those gains are directly affected by the capital gains tax rate. Two recent studies reporting elasticities and references to earlier studies are Matthew Eichner and Todd Sinai, “Capital Gains Tax Realizations and Tax Rates,” National Tax Journal (forthcoming in 2000); and Auerbach and Siegel, “Capital-Gains Realizations of the Rich and Sophisticated.”

TABLE 7: OUR RATIO EQUATION WITHOUT MULTIFAMILY HOUSING STARTS

Dependent Variable: D(TOTGAIN2/GDPFE)
Method: Least Squares
Sample: 1952-1998

Variable             Coefficient  Standard Error  t-Statistic  Probability
D(MTRNEXT)               -0.0802          0.0197      -4.0718       0.0002
D(MTRTRANS)              -6.0706          0.3969     -15.2954       0.0000
D(GDP/GDPFE)              0.0955          0.0233       4.0908       0.0002
DLOG(SP500Q4)             0.0181          0.0029       6.1303       0.0000

R-squared                                 0.877522
Adjusted R-squared                        0.868977
Standard error of regression              0.00337
Sum of squared residuals                  0.000488
Log likelihood                            202.9677
Durbin-Watson statistic                   1.700098
Mean dependent variable                   0.000695
Standard deviation of dependent variable  0.009309
Akaike info criterion                     -8.46671
Schwarz criterion                         -8.30925
F-statistic                               102.6949
Probability (F-statistic)                 0

Key:
D is the first difference operator: D[x(t)] = x(t) - x(t-1)
TOTGAIN2 is capital gains realizations
GDPFE is nominal potential GDP
MTRNEXT is our permanent tax rate term
MTRTRANS is our transitory tax rate term
SP500Q4 is the average value of the S&P 500 index in the fourth quarter

SOURCE: CBO calculations.

Our estimated regression with multifamily housing starts is displayed in Table 8. Including starts modestly lowers the coefficients on the business cycle and S&P 500 variables and increases the adjusted R-squared by only a little more than 1 percentage point. Although overall the improvement is small, it all comes in the roughly 10 percent of the sample period from the late 1980s to the early 1990s. As before, we split the sample period in two and estimated the regression including starts over the two halves. This time, the coefficient on multifamily housing starts also seems to shift by a sizable magnitude. Again, the regression passes a formal Chow test, which suggests that the change in the value of the coefficient reflects the greater precision afforded by the extreme event of the real estate collapse that began in the late 1980s. Neither equation suffers from serial correlation of its residuals, based on the Durbin-Watson statistic and a correlogram of the residuals. Neither equation fails the White test for heteroskedastic residuals.

TABLE 8: OUR RATIO EQUATION WITH MULTIFAMILY HOUSING STARTS

Dependent Variable: D(TOTGAIN2/GDPFE)
Method: Least Squares
Sample: 1952-1998

Variable             Coefficient  Standard Error  t-Statistic  Probability
D(MTRNEXT)               -0.0837          0.0186      -4.4982       0.0001
D(MTRTRANS)              -6.0210          0.3744     -16.0808       0.0000
D(GDP/GDPFE)              0.0704          0.0241       2.9165       0.0057
DLOG(SP500Q4)             0.0161          0.0029       5.5864       0.0000
DLOG(STARTS)              0.0055          0.0022       2.5391       0.0149

R-squared                                 0.893821
Adjusted R-squared                        0.883709
Standard error of regression              0.003175
Sum of squared residuals                  0.000423
Log likelihood                            206.3236
Durbin-Watson statistic                   1.65308
Mean dependent variable                   0.000695
Standard deviation of dependent variable  0.009309
Akaike info criterion                     -8.566962
Schwarz criterion                         -8.370138
F-statistic                               88.38997
Probability (F-statistic)                 0

Key: STARTS is the number of dwelling units started in structures with two or more dwellings. Other variables are defined in Table 7.

SOURCE: CBO calculations.

We next compared the fit of the two versions of our model with our adaptations of other models. To make the statistics of fit comparable among models with different dependent variables, we judged the fit of all models in terms of how well they explain the annual growth rate of realizations in-sample. Thus, we converted the predicted output of each model to a predicted growth rate in each year and compared it with actual growth rates. The comparison included the following regressions:
• The four current CBO equations specified in Table 2 (with and without starts and with and without error correction).
• Four equations with Siwapradit’s dollar-volume variable. Value of volume and GDP enter nominally, and no price index is included; otherwise, the equations match the CBO equations (with and without starts and with and without error correction). These equations are referred to as the dollar-volume model.
• Three equations (following those of Bull and Richardson) with separate measures of stock price increases and decreases, which we refer to as the SPUD (stock price up and down) model. The three equations differ in the number of variables with lagged terms: none, two (the stock price changes), and three (those two plus the dependent variable). The tax rate terms in those equations are described in Table 4.
• The two versions of our equation in difference-in-ratio form (with and without starts).
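The conversion to growth rates described above can be sketched for the difference-in-ratio equations as follows. This is a minimal illustration rather than the authors' code. It assumes that the growth rate is measured as a log difference (as the DLOG(GAINS) label in Table 9 suggests), that the previous year's realizations and both years' potential GDP are known, and that R-squared is computed as one minus the ratio of squared errors to total variation. The function names and the numbers in the usage line are hypothetical.

```python
import math

def growth_from_ratio_change(pred_d_ratio, gains_prev, gdpfe_prev, gdpfe_now):
    """Turn a predicted change in gains/GDPFE into a predicted log growth rate of gains."""
    ratio_prev = gains_prev / gdpfe_prev                  # last year's actual ratio
    gains_hat = (ratio_prev + pred_d_ratio) * gdpfe_now   # implied level this year
    return math.log(gains_hat / gains_prev)

def rmse(actual, predicted):
    """Root mean squared error of predicted growth rates."""
    sq = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq) / len(sq))

def r_squared(actual, predicted):
    """1 - SSE/SST; negative when predictions do worse than the sample mean."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1.0 - sse / sst

# Hypothetical numbers, for illustration only (not data from the paper).
g_hat = growth_from_ratio_change(0.002, gains_prev=300.0, gdpfe_prev=8000.0, gdpfe_now=8400.0)
```

Under this definition, R-squared is recomputed on growth rates rather than on each model's own dependent variable, so it is not bounded below by zero. That would be consistent with the negative values reported for the SPUD equations in Table 9.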
The statistics of fit that we used to compare models, in terms of annual growth rates, are R-squared, adjusted R-squared, and root mean squared error over the entire sample period, and root mean squared error in the 1990s. Those statistics are displayed in Table 9 (see footnote 7).

Examining those statistics leads to a number of conclusions. First, our equations seem to do about as well overall as the current CBO and dollar-volume equations, even though the latter have two distinct advantages: they are specified in growth-rate form and the criterion is in terms of growth rates; and they use dummy variables for 1986 and 1987, whereas ours do not. Second, across different models, multifamily housing starts helped somewhat. Third, using the dollar-volume variable in a growth-rate specification helped in the 1990s. Finally, although the SPUD equations do not fit the entire period well, they do fit the 1990s well.

That final conclusion led us to experiment with our method of estimating coefficients. We observed that the dollar value of capital gains realizations has grown over time. We reasoned that the SPUD equations, which minimize the sum of squared errors in the dollar changes in capital gains realizations, are like weighted regressions that give more weight to more recent observations. We therefore experimented with our equations using both weighted regressions and Kalman-based, time-varying coefficients. The improvement was marginal, however, and we judged that it did not justify the added complexity.

Although our first conclusion about the relatively good fit of our new model was encouraging, other conclusions relating to the success of different specifications in the 1990s left us open to alternative specifications of the model. Thus, we continued to experiment based on out-of-sample comparisons and, as a result, changed the preferred specification of our model.
7. The Durbin-Watson statistic failed to find significant serial correlation of residuals in any equation.
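The weighting argument made above for the SPUD equations can be made explicit. As a sketch, let G_t denote realizations in dollars and g_t = (G_t - G_{t-1})/G_{t-1} their percentage growth rate, with hats marking a model's predictions (this notation is ours, not the paper's). An error in the dollar change is then the growth-rate error scaled by the previous year's level:

```latex
\Delta G_t - \widehat{\Delta G}_t = G_{t-1}\,\bigl(g_t - \hat g_t\bigr)
\quad\Longrightarrow\quad
\sum_t \bigl(\Delta G_t - \widehat{\Delta G}_t\bigr)^2
  = \sum_t G_{t-1}^{2}\,\bigl(g_t - \hat g_t\bigr)^2 .
```

Read this way, minimizing squared dollar errors amounts to minimizing squared growth-rate errors with weights G_{t-1}^2, and those weights rise over time as nominal realizations grow.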
TABLE 9: COMPARING IN-SAMPLE ERRORS OF EQUATIONS

Equation                              R-squared  Adjusted R-squared  RMSE 1952-1998  RMSE 1990-1998

Growth-Rate Equations [DLOG(GAINS)]
  Current CBO equations
    No error correction, no starts        0.705               0.661           0.134           0.140
    No error correction, starts           0.767               0.733           0.119           0.126
    Error correction, no starts           0.702               0.657           0.135           0.143
    Error correction, starts              0.777               0.737           0.117           0.131
  Dollar-volume equations
    No error correction, no starts        0.676               0.646           0.141           0.122
    No error correction, starts           0.766               0.737           0.120           0.114
    Error correction, no starts           0.699               0.663           0.136           0.132
    Error correction, starts              0.776               0.743           0.117           0.120
Change in Dollar Gains (SPUD)
  No lags                                -3.147              -3.542           0.504           0.157
  Lagged SPUD                            -2.348              -2.850           0.453           0.110
  Lagged SPUD and gains                  -2.663              -3.321           0.474           0.115
Change in Ratio of Gains to GDPFE
  Without starts or constant              0.704               0.683           0.135           0.122
  With starts, no constant                0.756               0.732           0.122           0.119

SOURCE: CBO calculations.
NOTE: RMSE = root mean squared error; SPUD = stock price up and down; GDPFE = potential nominal GDP.
V. OUT-OF-SAMPLE EVALUATION AND COMPARISONS

The purpose of the out-of-sample exercise was to determine the performance of different models when there was more limited knowledge of future events at the time each estimate was made than in the in-sample exercise. Both the in-sample and out-of-sample exercises used the data set as it existed at the end of 1999. And both exercises assumed that values were known for right-hand-side variables in the current year and for lagged dependent variables in the previous year. The difference in the two exercises was in how the models were estimated. For the in-sample exercise, the models were estimated over the entire sample, so the fitted values of the dependent variable at each date built in knowledge of events that occurred at future dates. For the out-of-sample exercise, that problem of building in knowledge of future events was limited by using recursive regressions. For example, regressions were estimated from 1952 through 1959, and the coefficients were used to estimate capital gains realizations for 1960. Next, the regressions were estimated from 1952 through 1960, and the new coefficients were used to generate realization estimates for 1961. The process was repeated through 1999. That process parallels the annual updating of equations that CBO actually uses, although CBO has not used the same equations for so long a period.

The versions of models selected for the initial out-of-sample comparison were the same as those for the in-sample comparison, with one exception. We included only one regression with error correction and examined its performance only for the 1990s. That exception was made because recursive regression is more difficult when there is an error-correction term and because we knew from experience that equations with error-correction terms produced large errors in the late 1990s.
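The recursive procedure described above can be sketched in a few lines. The example below is a simplified illustration, not CBO's code: it uses a single regressor and no constant so that the least-squares coefficient has a closed form, and the function names and data are hypothetical.

```python
import math

def ols_slope(x, y):
    """Least-squares slope for y = b*x (no constant term)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def recursive_forecast_errors(x, y, first_window):
    """Expanding-window one-step-ahead errors.

    For each t >= first_window, estimate on observations 0..t-1 only,
    then predict observation t using its known right-hand-side value.
    """
    errors = []
    for t in range(first_window, len(y)):
        b = ols_slope(x[:t], y[:t])      # coefficient uses data before t only
        errors.append(y[t] - b * x[t])   # actual less estimate
    return errors

def rmse(errors):
    """Root mean squared error of a list of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical data: 8 observations, first estimate made for the 5th.
x = [0.5, -0.2, 0.8, 0.1, -0.4, 0.9, 0.3, -0.6]
y = [1.1, -0.3, 1.5, 0.3, -0.9, 1.7, 0.7, -1.1]
errs = recursive_forecast_errors(x, y, first_window=4)
print(len(errs), round(rmse(errs), 4))
```

With annual data starting in 1952, `first_window=8` would correspond to estimating over 1952 through 1959 and producing the first estimate for 1960.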
We judged the performance of models by their out-of-sample root mean squared errors (RMSEs; see footnote 8). If, over the full estimation period, a model’s recursively estimated coefficients were unchanging, its in-sample and out-of-sample RMSEs would be the same. Thus, a big deterioration in a model’s performance in this stage of testing compared with its in-sample performance indicates coefficient instability. We also investigated coefficient instability more directly by applying a Chow test to the model’s coefficients estimated over the first and second halves of the sample and by examining plots of the coefficients and their associated standard errors computed in each year of the recursive regressions.
8. Care must be exercised in interpreting the out-of-sample RMSEs because the uncertainty surrounding estimates of realizations changes from year to year. For example, uncertainty tends to be larger in earlier years because fewer observations are available to estimate an equation’s coefficients. Thus, it may be appropriate to weight each error by the uncertainty surrounding the forecast at the time. Comparing root mean squared errors by decade controls for some differences in uncertainty.
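The half-sample stability check mentioned above is commonly computed as a Chow F-statistic. The sketch below is illustrative only: it assumes a single regressor with no constant (so each regime has one coefficient), uses hypothetical data, and leaves the comparison with an F(1, n - 2) critical value to the reader.

```python
def rss_no_constant(x, y):
    """Residual sum of squares from regressing y on x with no constant."""
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))

def chow_f(x, y, split):
    """Chow statistic for a break at index `split`, one coefficient per regime."""
    k = 1                      # number of coefficients in each regime
    n = len(y)
    rss_pooled = rss_no_constant(x, y)
    rss_split = (rss_no_constant(x[:split], y[:split])
                 + rss_no_constant(x[split:], y[split:]))
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))

# Hypothetical data: the slope is about 1 in the first half and about 2 in the second.
x = [1, 2, 3, 4, 1, 2, 3, 4]
y = [1.1, 1.9, 3.2, 3.9, 2.1, 3.9, 6.2, 7.9]
f_stat = chow_f(x, y, split=4)   # a large value signals unstable coefficients
```

An equation "passes" in the sense used in the text when the statistic falls below the chosen critical value, so that equal coefficients in the two halves cannot be rejected.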
Our general conclusions from comparing accuracy and stability across equations were that our difference-in-ratio model and the dollar-volume model seemed best, but neither was clearly superior. Both models had lower errors than the current CBO and SPUD models over the whole period. In particular, error correction hurt accuracy in the late 1990s, judging by the CBO equation in that form, because realizations did not revert back to their historical norm relative to GDP. Meanwhile, the difference-in-ratio and dollar-volume regressions had similar root mean squared errors. Our regressions were more accurate in the 1980s and over the 1980-1998 period, but the dollar-volume regressions were more accurate in the 1970s and over the entire 1960-1998 period. The out-of-sample RMSEs are compared in Table 10. Our difference-in-ratio regressions have more stable coefficients over the 1980s and 1990s than do the SPUD regressions. Figures 2 and 3 compare the recursive coefficients from versions of each model. (The model underlying Figure 2 is shown in Table 8, and the one underlying Figure 3 is shown in Table 11.) Note that when coefficients change in our difference-in-ratio equation, the new values are still well within the 95 percent confidence intervals of previous coefficient estimates. In contrast, the changes in coefficients in the SPUD equation following the 1987 tax increase are well beyond the confidence intervals of previously estimated coefficients. The coefficients change because tax terms in the SPUD equation must treat the fallback in realizations in 1987 proportionately to responses to other tax changes. That leaves a large part of the fallback to be explained by other variables. Nevertheless, both the SPUD equation and ours pass a Chow test when the sample is divided in halves. 
(The stability of the current CBO and dollar-volume regressions could not be tested over the full period because their inclusion of dummy variables essentially assumes a structural break in 1986 and 1987.)

Based on those general conclusions, we experimented with changing the functional form and variables of our model. Our aim was to see whether we could improve the model by borrowing from some successes of other models, especially the dollar-volume model. Tables 12 and 13 show the in-sample and out-of-sample fits, respectively, of several alternative versions of our model. (Those tables show results for the new versions as additions to the results shown in Tables 9 and 10.)

In terms of functional form, we compared the performance of our equations in their original difference-in-ratio form with difference-in-logarithm and percentage change forms. Each equation used the explanatory variables from our original equations plus a constant term. (We added the constant term because it improved the in-sample fit and out-of-sample accuracy, although it worsened the White test scores for homoskedasticity.) All three forms have similar adjusted R-squared statistics over the full sample, but the difference-in-logarithm form is slightly better than the
TABLE 10: COMPARING OUT-OF-SAMPLE ROOT MEAN SQUARED ERRORS
(Errors are actual growth rates of gains less estimates)

Equation                              1960-1998  1980-1998  1960-1969  1970-1979  1980-1989  1990-1998   Addendum: Forecast of Growth Rate for 1999

Growth-Rate Equations [DLOG(GAINS)]
  Current CBO equations
    No error correction, no starts        0.295      0.200      0.147      0.493      0.237      0.148   0.198
    No error correction, starts           0.312      0.198      0.174      0.525      0.243      0.133   0.171
    Error correction, no starts                                                                  0.157   0.105
    Error correction, starts                                                                     0.146   0.093
  Dollar-volume equations
    No error correction, no starts        0.198      0.219      0.138      0.208      0.276      0.128   0.268
    No error correction, starts           0.195      0.214      0.159      0.190      0.272      0.121   0.231
Change in Dollar Gains (SPUD)
  No lags                                 0.246      0.286      0.106      0.266      0.355      0.179   0.125
  Lagged SPUD                             0.229      0.257      0.143      0.242      0.318      0.165   0.212
  Lagged SPUD and gains                   0.313      0.361      0.217      0.296      0.472      0.165   0.168
Change in Ratio of Gains to GDPFE
  Without starts or constant              0.221      0.189      0.137      0.322      0.232      0.126   0.138
  With starts, no constant                0.224      0.180      0.160      0.328      0.218      0.125   0.122

SOURCE: CBO calculations.
NOTE: Data used in the forecasts differ slightly from those used in the in-sample fits. SPUD = stock price up and down; GDPFE = potential nominal GDP.
Figure 2: Recursive Regression Coefficients (and Two Standard Errors Up or Down) of Difference-in-Ratio Equation with Starts, by Year
[Figure not reproduced. Five panels plot the recursive coefficient estimates, each with a band of ± 2 S.E., by year from 1965 through 1995: D(MTRNEXT), D(MTRTRANS), D(GDP/GDPFE), DLOG(SP500Q4), and DLOG(STARTS).]
SOURCE: CBO calculations. NOTE: For details of the ratio equation, see Table 8.
Figure 3: Recursive Regression Coefficients (and Two Standard Errors Up or Down) of Simple SPUD Equation with Starts, by Year
[Figure not reproduced. Five panels plot the recursive coefficient estimates, each with a band of ± 2 S.E., by year from 1965 through 1995: Constant, SPUP, SPDOWN, D(CBOMTR), and CBOPOS(1).]
SOURCE: CBO calculations. NOTE: For details of the simple SPUD equation, see Table 11.
TABLE 11: SIMPLE SPUD MODEL

Dependent Variable: D(TOTGAIN2)
Method: Least Squares
Sample: 1952-1998

Variable             Coefficient  Standard Error  t-Statistic  Probability
Constant                0.286711        3.116311     0.092003       0.9271
SPUP                    0.326442        0.053031     6.155634            0
SPDOWN                  0.555197        0.141336     3.928194       0.0003
D(CBOMTR)              -11.74883        1.309981    -8.968706            0
CBOPOS(1)               12.13001        1.448326     8.375189            0

R-squared                                 0.863226
Adjusted R-squared                        0.8502
Standard error of regression              16.47897
Sum of squared residuals                  11405.37
Log likelihood                            -195.7449
Durbin-Watson statistic                   2.213039
Mean dependent variable                   9.437381
Standard deviation of dependent variable  42.57696
Akaike info criterion                     8.542335
Schwarz criterion                         8.739159
F-statistic                               66.2692
Probability (F-statistic)                 0

Key:
D(TOTGAIN2) is change in capital gains, in billions of dollars
SPUP is the sum of the monthly increases in the S&P 500 index for those months with a net increase over the previous month
SPDOWN is analogous to SPUP for decreases in the S&P 500
D(CBOMTR) is the change in an average tax rate on capital gains
CBOPOS(1) is the increase in CBOMTR in the coming year when the rate increases; 0 otherwise

SOURCE: CBO calculations.
NOTE: SPUD = stock price up and down.
TABLE 12: COMPARING IN-SAMPLE ERRORS OF MORE EQUATIONS

Equation                              R-squared  Adjusted R-squared  RMSE 1952-1998  RMSE 1990-1998

Growth-Rate Equations [DLOG(GAINS)]
  Current CBO equations
    No error correction, no starts        0.705               0.661           0.134           0.140
    No error correction, starts           0.767               0.733           0.119           0.126
    Error correction, no starts           0.702               0.657           0.135           0.143
    Error correction, starts              0.777               0.737           0.117           0.131
  Dollar-volume equations
    No error correction, no starts        0.676               0.646           0.141           0.122
    No error correction, starts           0.766               0.737           0.120           0.114
    Error correction, no starts           0.699               0.663           0.136           0.132
    Error correction, starts              0.776               0.743           0.117           0.120
Change in Dollar Gains (SPUD)
  No lags                                -3.147              -3.542           0.504           0.157
  Lagged SPUD                            -2.348              -2.850           0.453           0.110
  Lagged SPUD and gains                  -2.663              -3.321           0.474           0.115
Change in Ratio of Gains to GDPFE
  Without starts or constant              0.704               0.683           0.135           0.122
  With starts, no constant                0.756               0.732           0.122           0.119
  Without starts, with constant           0.720               0.693           0.131           0.115
  With starts and constant                0.770               0.741           0.119           0.113
Scaled Growth Rate [DLOG(GAINS/GDPFE)]
  S&P 500, without starts                 0.716               0.689           0.132           0.121
  S&P 500, with starts                    0.783               0.757           0.115           0.114
  Dollar volume, without starts           0.696               0.667           0.136           0.109
  Dollar volume, with starts              0.760               0.731           0.121           0.110
Percentage Growth Rate of Gains/GDPFE
  Constant, no starts                     0.732               0.707           0.128           0.119
  Constant, starts                        0.766               0.738           0.120           0.115

SOURCE: CBO calculations.
NOTE: RMSE = root mean squared error; SPUD = stock price up and down; GDPFE = potential nominal GDP. The scaled growth rate and percentage growth rate equations include a constant term.
TABLE 13: COMPARING OUT-OF-SAMPLE ROOT MEAN SQUARED ERRORS IN MORE EQUATIONS
(Errors are actual growth rates of gains less estimates)

Equation                              1960-1998  1980-1998  1960-1969  1970-1979  1980-1989  1990-1998   Addendum: Forecast of Growth Rate for 1999

Growth-Rate Equations [DLOG(GAINS)]
  Current CBO equations
    No error correction, no starts        0.295      0.200      0.147      0.493      0.237      0.148   0.198
    No error correction, starts           0.312      0.198      0.174      0.525      0.243      0.133   0.171
    Error correction, no starts                                                                  0.157   0.105
    Error correction, starts                                                                     0.146   0.093
  Dollar-volume equations
    No error correction, no starts        0.198      0.219      0.138      0.208      0.276      0.128   0.268
    No error correction, starts           0.195      0.214      0.159      0.190      0.272      0.121   0.231
Change in Dollar Gains (SPUD)
  No lags                                 0.246      0.286      0.106      0.266      0.355      0.179   0.125
  Lagged SPUD                             0.229      0.257      0.143      0.242      0.318      0.165   0.212
  Lagged SPUD and gains                   0.313      0.361      0.217      0.296      0.472      0.165   0.168
Change in Ratio of Gains to GDPFE
  Without starts or constant              0.221      0.189      0.137      0.322      0.232      0.126   0.138
  With starts, no constant                0.224      0.180      0.160      0.328      0.218      0.125   0.122
  Without starts, with constant           0.216      0.184      0.145      0.311      0.226      0.121   0.129
  With starts and constant                0.219      0.176      0.167      0.317      0.214      0.121   0.116
Scaled Growth Rate [DLOG(GAINS/GDPFE)]
  S&P 500, no starts                      0.188      0.157      0.192      0.234      0.180      0.126   0.216
  S&P 500, starts                         0.184      0.139      0.208      0.227      0.155      0.120   0.183
  Dollar volume, no starts                0.223      0.249      0.150      0.230      0.327      0.113   0.285
  Dollar volume, starts                   0.215      0.231      0.175      0.221      0.298      0.116   0.242
Percentage Growth Rate of Gains/GDPFE
  Constant, no starts                     0.206      0.143      0.259      0.245      0.158      0.124
  Constant, starts                        0.209      0.150      0.277      0.228      0.171      0.122

SOURCE: CBO calculations.
NOTE: Data used in the forecasts differ slightly from those used in the in-sample fits. SPUD = stock price up and down; GDPFE = potential nominal GDP. The scaled growth rate and percentage growth rate equations include a constant term.
others in terms of out-of-sample root mean squared errors. It tends to have relatively smaller errors in years when all of the equations make large errors. A Dickey-Fuller test could not reject the presence of a unit root in the logarithm of the ratio of gains to potential GDP, but it could in the first difference, suggesting that the difference-in-logarithm variable is stationary.

In terms of variables, beyond the constant term, we experimented in our difference-in-logarithm equations with substituting the dollar-volume variable for the S&P 500 variable. Replacing the S&P 500 variable with the dollar-volume variable was not a clear success. The equations with the S&P 500 fit slightly better over the whole sample period. Their out-of-sample errors are smaller over the 1960-1998 and 1980-1998 periods and during the 1980s. The equations with dollar volume have slightly lower errors in the 1990s and much lower errors in the 1970s. Equations with either variable do not have autocorrelated residuals. The S&P 500 equations pass the Chow test but fail the White test. The dollar-volume equations pass the White test, but the equation without starts fails the Chow test and the one with starts barely passes it. Moreover, in the recursive regressions, the S&P 500 equations have more stable coefficients on the variables representing the business cycle and equity net gains.

Based on those results, our preferred equation uses the difference-in-log form (see Table 14). It includes a constant; the difference in the permanent tax rate (the legislated tax rate for next year); the difference in the transitory tax rate (the current rate less the permanent rate, squared but with the sign preserved); the difference in the log of the ratio of GDP to potential GDP; the difference in the log of the S&P 500; and the difference in the log of multifamily housing starts. It has the highest adjusted R-squared.
It is most accurate over the entire period and over 1980 through 1998, although it does not dominate in the 1990s. It fails the White test but passes the Chow test, and its coefficients are more stable over the 1980s and 1990s than those of the other equations.

The coefficients of our preferred specification also seem reasonable. The size and significance of the coefficient on DLOG(GDP/GDPFE) confirm that the excessive size of the coefficient on GDP in the current CBO equations (shown in Table 2) resulted from the sensitivity of realizations to the business cycle. The coefficient on DLOG(SP500Q4) indicates that a 1 percent increase in the S&P 500 is associated with an increase of about 0.7 percent in gains relative to potential GDP (equivalently, about 0.7 percentage point in their growth rate). Finally, the tax terms imply a permanent elasticity of -0.47 and a transitory elasticity of 1.8 for changes in tax rates like those in 1986.

Despite our model’s apparent good performance, we believe some caveats are necessary. First, there is good reason to suspect that the coefficients on the transitory tax term and multifamily housing starts are more uncertain than their standard errors indicate. The coefficient on the transitory tax term is estimated primarily from one event: the large tax increase legislated in 1986 that took effect in 1987. Similarly, the coefficient on multifamily housing starts is estimated mainly from one event: the real estate collapse that occurred from the late 1980s to the early 1990s. Although those two coefficients are estimated reasonably based on historical information, we suspect they could be subject to large revisions if significant changes occurred in the future to anticipated capital gains tax rates or real estate market conditions.

Another caveat is that the superiority of our equation largely comes in the 1980s, when it had an advantage. None of the other models had a way to forecast out-of-sample the effects on realizations of the anticipated 1987 tax increase. (The SPUD model could anticipate the reaction in 1986 but only part of the reaction in 1987.) Thus, it is fair to argue that knowledge of what happened in 1986 and 1987 guided our specification. However, the specification still has an advantage over the others. Should changes in tax rates be announced in the future, our equation can estimate the effect on realizations, even if that estimate is imprecise. And the new observation will allow the coefficient estimate to be refined. In contrast, equations that use dummies for 1986 and 1987 cannot generate such estimates, and the SPUD equation will have trouble estimating the effect of the tax change in the year following the change.

TABLE 14: PREFERRED EQUATION FROM IN-SAMPLE AND OUT-OF-SAMPLE TESTS

Dependent Variable: DLOG(TOTGAIN2/GDPFE)
Method: Least Squares
Sample: 1952-1998

Variable             Coefficient  Standard Error  t-Statistic  Probability
Constant               -0.042628        0.019472    -2.189185       0.0343
D(MTRNEXT)             -2.585064        0.675268    -3.828206       0.0004
D(MTRTRANS)            -115.4395        13.61065    -8.481552            0
DLOG(GDP/GDPFE)         2.358513        0.871639     2.705837       0.0099
DLOG(SP500Q4)           0.736638        0.121054     6.085178            0
DLOG(STARTS)            0.218479        0.079682     2.741875        0.009

R-squared                                 0.802356
Adjusted R-squared                        0.778253
Standard error of regression              0.11529
Sum of squared residuals                  0.544968
Log likelihood                            38.05353
Durbin-Watson statistic                   1.910765
Mean dependent variable                   0.020822
Standard deviation of dependent variable  0.24483
Akaike info criterion                     -1.36398
Schwarz criterion                         -1.127791
F-statistic                               33.2887
Probability (F-statistic)                 0

Key:
D is the first difference operator: D[x(t)] = x(t) - x(t-1)
LOG(x) is the logarithm of x
TOTGAIN2 is capital gains realizations
GDPFE is nominal potential GDP
MTRNEXT is our permanent tax rate term
MTRTRANS is our transitory tax rate term
SP500Q4 is the average value of the S&P 500 index in the fourth quarter
STARTS is the number of dwelling units started in structures with two or more dwellings

SOURCE: CBO calculations.
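The elasticities quoted for the preferred equation can be approximately reproduced from the Table 14 coefficients. The sketch below rests on assumptions that are ours rather than the paper's: tax rates are entered as fractions (0.20 and 0.28), the transitory term is the signed square of the current rate less next year's legislated rate, and an elasticity is the percentage change in realizations divided by the percentage change in the tax rate. Under those assumptions the results round to the reported values of -0.47 and 1.8.

```python
import math

# Coefficients on the two tax terms, taken from Table 14.
B_MTRNEXT = -2.585064
B_MTRTRANS = -115.4395

def signed_square(v):
    """Square v but keep its sign."""
    return math.copysign(v * v, v)

old_rate, new_rate = 0.20, 0.28                      # rates as fractions (assumption)
pct_rate_change = (new_rate - old_rate) / old_rate   # a 40 percent rise in the rate

# Permanent effect: only the legislated-rate term moves.
d_log_perm = B_MTRNEXT * (new_rate - old_rate)
perm_elasticity = (math.exp(d_log_perm) - 1.0) / pct_rate_change

# Announcement year: next year's rate rises while the current rate stays at 20 percent,
# so the transitory term moves from zero to the signed square of (0.20 - 0.28).
d_mtrtrans = signed_square(old_rate - new_rate)
d_log_trans = d_log_perm + B_MTRTRANS * d_mtrtrans
trans_elasticity = (math.exp(d_log_trans) - 1.0) / pct_rate_change

print(round(perm_elasticity, 2), round(trans_elasticity, 1))
```

The announcement-year figure combines both tax terms, which is one reading of "transitory elasticity"; a calculation using the transitory term alone would give a larger number.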
VI. REAL-TIME EVALUATION AND COMPARISONS

Out-of-sample comparisons are better than in-sample comparisons at indicating how analysts would have fared using alternative models. The advantage of out-of-sample comparisons is that they limit the degree to which unknown future developments can affect the realization estimates of alternative models. But out-of-sample comparisons do not go far enough. In this section, we examine how analysts would have fared in estimating realizations using different models if they had only the information available at the date the estimates were made.

Such real-time comparisons differ from out-of-sample comparisons in two fundamental ways. First, out-of-sample model estimation is based on the data set as it currently exists. However, that data set has been revised several times and can differ greatly from the set that analysts had when they made their estimates of realizations. In real time, by contrast, the models are estimated as of November 30 of each year based on the data the modeler would have had at that time. Second, the realizations estimates from out-of-sample model estimation assume that the lagged dependent variable and the right-hand-side variables are known with certainty. But, of course, that is not actually the case. Realizations in the previous year have not been fully tabulated by the Internal Revenue Service as of November 30, so that figure must be extrapolated. Meanwhile, right-hand-side variables are known for only part, if any, of the current year and must be projected through the end of the year.

The main questions we posed in our real-time experiment were: How well would CBO have estimated the current-year growth of capital gains in the 1990s had it had access to the models we consider? Which of those models would have performed best? And would that model have improved on the estimates that CBO actually made?
Thus, for the extrapolations of realizations in the previous year, we used the ones CBO actually made in each December from 1991 through 1998. For 1990, we extrapolated the way that CBO would have, assuming it used the same method it did in later years. For projections of right-hand-side variables for the year, we used those actually produced by CBO’s Macroeconomic Analysis Division (MAD) for the variables they project. For variables not projected by MAD, such as stock prices or value of volume, we used simple time-series methods of calculation. In particular, we assumed that each of those series follows a continuous-time random walk with drift. Thus, for each series, we took the value at the end of November and increased it by its historical average monthly gain (through that November) to get its December value. Because real-time considerations about the reliability of data and the predictability of right-hand-side variables could change the ranking of equations in
terms of estimating accuracy, we included four versions of our difference-in-logarithm specification:
• With and without multifamily housing starts, and
• With an S&P 500 variable or a dollar-volume variable.
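The random-walk-with-drift projection described above, used for variables that MAD does not project (such as stock prices or value of volume), can be sketched as follows. This is a minimal illustration under stated assumptions, not CBO's actual procedure: the function name `project_next_month`, the `use_logs` switch, and the sample values are hypothetical. The text does not say whether the average monthly gain is measured in levels or in logarithms, so both readings are offered.

```python
import math


def project_next_month(monthly_values, use_logs=False):
    """Project one month ahead under a random walk with drift.

    monthly_values: historical month-end observations, oldest first,
    ending with the latest known value (end of November in the text).
    use_logs: assumption flag (not from the paper). If True, the drift is
    the average monthly log change; otherwise it is the average monthly
    change in levels.
    """
    if len(monthly_values) < 2:
        raise ValueError("need at least two observations to estimate drift")
    series = [math.log(v) for v in monthly_values] if use_logs else list(monthly_values)
    # The mean of the one-month changes equals (last - first) / number of changes.
    drift = (series[-1] - series[0]) / (len(series) - 1)
    projected = series[-1] + drift
    return math.exp(projected) if use_logs else projected


# Hypothetical index values, for illustration only.
november_history = [100.0, 102.0, 104.0, 106.0]
december_estimate = project_next_month(november_history)
```

With the hypothetical series above, the average monthly gain is 2 index points, so the December value is the November value plus 2.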
For comparison, we also included four of CBO’s current forecasting equations, with and without starts and with and without error-correction terms. Background information for the real-time comparisons appears in Table 15. The first line shows the annual growth rate of realizations based on the data provided by the IRS for complete years. Although the actual figures computed by the IRS are available after a considerable delay (and would not be available in real time), they still are the ones that CBO is trying to estimate. Thus, the accuracy of realizations estimates is measured with respect to the figures in the first line. However, because of the delay, an actual figure for 1999 is not yet available, and the figure for 1998 could change slightly. The second line of Table 15 shows the baseline estimates of growth in realizations that CBO made in early December of each year. In 1991 through 1999, those estimates relied on input from the current CBO equations as well as on judgment. In 1990, the estimate was based on a different model. The next five lines show the estimates of the four current CBO equations and their average. Those estimates were made in 1991 through 1999 and provided the starting point for the forecast shown in the second line of the table. The average of those equations differs noticeably from the baseline estimate in 1991, 1994, 1997, and 1999, when additional considerations were incorporated. The next five lines show revised estimates from the current CBO equations, which we reestimated to make them more comparable with our new equations. The current equations’ sample period was extended from 1955 back to 1952, the same as for the new equations. Also, the value of corporate equities in the current year was projected using the growth rate of the S&P 500, as we projected that index for December in our new equations. 
Originally, corporate equities were updated using the New York Stock Exchange composite index without any drift added for the remainder of the year. The last five lines of Table 15 parallel the previous five lines, but the estimates are from our new equations. They predict much stronger growth in realizations in 1999 than the current equations or CBO’s baseline estimate do. Only time will tell which estimate is closer to the actual outcome.
TABLE 15: ACTUAL AND ESTIMATED GROWTH RATES OF CAPITAL GAINS REALIZATIONS

                                      1990    1991    1992    1993    1994    1995    1996    1997    1998    1999
Actual                              -0.196  -0.098   0.135   0.202   0.003   0.179   0.447   0.399   0.234    n.a.
CBO Baseline                         0.117   0.100   0.102   0.061   0.034   0.150   0.089   0.464   0.130   0.136

Current CBO Equations (a)
Dependent Variable = DLOG(GAINS)
  No error correction, no starts      n.a.   0.052   0.111   0.061   0.041   0.140   0.105   0.534   0.201   0.152
  No error correction, starts         n.a.  -0.053   0.056   0.036   0.131   0.161   0.103   0.464   0.182   0.135
  Error correction, no starts         n.a.   0.052   0.093   0.093   0.055   0.170   0.058   0.301   0.068   0.028
  Error correction, starts            n.a.  -0.029   0.102   0.077   0.152   0.175   0.073   0.294   0.073   0.028
  Average                             n.a.   0.005   0.090   0.067   0.095   0.162   0.085   0.394   0.129   0.084

Reestimated CBO Equations
Dependent Variable = DLOG(GAINS)
  No error correction, no starts    -0.085  -0.019   0.083   0.055   0.058   0.188   0.144   0.460   0.284   0.121
  No error correction, starts       -0.136  -0.127   0.004   0.022   0.141   0.203   0.141   0.404   0.255   0.099
  Error correction, no starts       -0.136  -0.047   0.019   0.058   0.046   0.183   0.097   0.308   0.132   0.026
  Error correction, starts          -0.152  -0.115   0.032   0.041   0.142   0.190   0.112   0.293   0.124   0.024
  Average                           -0.127  -0.077   0.035   0.044   0.097   0.191   0.124   0.366   0.199   0.067

New Equations
Dependent Variable = DLOG(GAINS/GDPFE)
  S&P 500, no starts                -0.114   0.046   0.073   0.102   0.054   0.245   0.175   0.507   0.252   0.197
  S&P 500, starts                   -0.140  -0.052   0.076   0.073   0.148   0.243   0.180   0.472   0.239   0.168
  Dollar volume, no starts          -0.081   0.000   0.081   0.200   0.090   0.186   0.148   0.380   0.213   0.269
  Dollar volume, starts             -0.114  -0.091   0.082   0.155   0.183   0.190   0.156   0.362   0.204   0.229
  Average                           -0.112  -0.024   0.078   0.133   0.119   0.216   0.165   0.430   0.227   0.215

SOURCE: CBO calculations.
NOTE: GDPFE = potential nominal GDP; n.a. = not applicable.
a. The current CBO equations were first used in 1991.
Table 16 compares the degree to which the estimates shown in Table 15 differ from the actual growth rates. That comparison of errors yields three major conclusions:
• The new equations perform best overall;
• The average of the four new equations slightly outperforms any one of the new equations individually; and
• The reestimated CBO equations do better than the current CBO ones, which in turn do better than CBO’s baseline estimates.9
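The paper does not state the sign convention for the errors in Table 16, but the published figures are consistent with error = actual growth minus estimated growth, so that a positive error means the estimate was too low. The following sketch checks that reading against the Actual and CBO Baseline rows of Table 15 for 1990 through 1998. The function name `estimation_errors` is hypothetical, and the small remaining gaps are presumably due to rounding in the published tables.

```python
# Growth rates copied from Table 15 (Actual and CBO Baseline rows), 1990-1998.
actual = [-0.196, -0.098, 0.135, 0.202, 0.003, 0.179, 0.447, 0.399, 0.234]
cbo_baseline = [0.117, 0.100, 0.102, 0.061, 0.034, 0.150, 0.089, 0.464, 0.130]

# CBO Baseline errors as printed in Table 16, 1990-1998.
table16_errors = [-0.314, -0.198, 0.033, 0.141, -0.031, 0.029, 0.358, -0.064, 0.104]


def estimation_errors(actual_growth, estimated_growth):
    """Assumed convention: error = actual growth - estimated growth."""
    return [round(a - e, 3) for a, e in zip(actual_growth, estimated_growth)]


errors = estimation_errors(actual, cbo_baseline)

# Largest absolute gap between the recomputed and the published errors.
max_gap = max(abs(c - p) for c, p in zip(errors, table16_errors))
```

In 1996, for example, actual growth was 0.447 and the baseline estimate was 0.089, giving the error of 0.358 shown in Table 16.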
Compared with the reestimated equations, the average improvement in accuracy from the new equations is on the order of 15 percent. The improvement is not uniform, however, since the current equation without multifamily housing starts or error correction does only slightly worse than some of the new equations. For all of the specifications, multifamily housing starts help only in our new equation with the S&P 500.

The average estimate of our new equations improves on the baseline estimates that CBO made in the 1991-1998 period (when CBO was using the current equations) by reducing the root mean squared error from .160 to .117, roughly a 27 percent improvement. Most of that improvement comes from reducing the root mean squared error of the current-equation estimates from .151 to .117, roughly a 23 percent improvement.

The average estimate from the four new equations does better than any of the new equations individually. The equation with dollar volume but not starts does the best of the four, particularly from 1991 through 1998. The equation with the S&P 500 and starts, chosen as our preferred equation on the basis of its in-sample and out-of-sample performance, comes in second place.

The superiority of the dollar-volume equations in the real-time comparisons is consistent with their superiority for the 1990s in both in-sample and out-of-sample comparisons. However, those equations did not dominate out-of-sample in the period since the 1960s and did particularly poorly in the 1980s. Moreover, the fact that the average did the best suggests that all
9. The relative ranking of the equations remains unchanged when their target is growth from the preliminary level of gains in the previous year to the actual level of gains reached in the current year. Recall that prior-year realizations are based on incomplete tax return information and therefore contain errors. Thus, even if an equation accurately predicts the growth rate in gains as shown by final tax return data, it can still miss the level of gains reached in the current year because its growth is from the wrong base. An alternative target for comparing the equations is how well they grow from the estimated prior-year base to the correct current-year gain. When the equations are judged by that alternative standard, their root mean squared errors change slightly, but the rankings do not change. The root mean squared error for the average of the new equations falls from .114 to .113, and that for the reestimated CBO equations rises from .132 to .136. The changes in root mean squared errors are small because the errors in the prior-year level of gains are small in most years. The relative rankings are also preserved because the errors in prior-year gains appear to be unrelated to errors that the equations make in predicting current-year growth.
TABLE 16: ERRORS IN ESTIMATES OF GROWTH RATES OF CAPITAL GAINS REALIZATIONS

                                                                                                          Root Mean Squared Error
                                      1990    1991    1992    1993    1994    1995    1996    1997    1998  1990-1998  1991-1998
CBO Baseline                        -0.314  -0.198   0.033   0.141  -0.031   0.029   0.358  -0.064   0.104      0.184      0.160

Current CBO Equations (a)
Dependent Variable = DLOG(GAINS)
  No error correction, no starts      n.a.  -0.150   0.024   0.141  -0.038   0.039   0.342  -0.135   0.032       n.a.      0.151
  No error correction, starts         n.a.  -0.046   0.080   0.165  -0.128   0.019   0.344  -0.064   0.051       n.a.      0.149
  Error correction, no starts         n.a.  -0.150   0.043   0.109  -0.052   0.009   0.389   0.099   0.166       n.a.      0.169
  Error correction, starts            n.a.  -0.069   0.033   0.125  -0.149   0.005   0.374   0.105   0.161       n.a.      0.166
  Average                             n.a.  -0.104   0.045   0.135  -0.092   0.018   0.363   0.005   0.104       n.a.      0.151

Reestimated CBO Equations
Dependent Variable = DLOG(GAINS)
  No error correction, no starts    -0.111  -0.080   0.052   0.147  -0.055  -0.009   0.303  -0.061  -0.050      0.127      0.128
  No error correction, starts       -0.060   0.029   0.131   0.179  -0.138  -0.024   0.306  -0.004  -0.022      0.136      0.143
  Error correction, no starts       -0.060  -0.052   0.116   0.144  -0.043  -0.003   0.350   0.091   0.102      0.143      0.150
  Error correction, starts          -0.044   0.017   0.103   0.161  -0.139  -0.011   0.336   0.107   0.109      0.147      0.155
  Average                           -0.069  -0.021   0.101   0.158  -0.094  -0.012   0.324   0.033   0.035      0.132      0.138

New Equations
Dependent Variable = DLOG(GAINS/GDPFE)
  S&P 500, no starts                -0.083  -0.144   0.062   0.100  -0.051  -0.066   0.273  -0.107  -0.018      0.122      0.126
  S&P 500, starts                   -0.057  -0.046   0.059   0.128  -0.145  -0.064   0.267  -0.072  -0.006      0.119      0.124
  Dollar volume, no starts          -0.115  -0.099   0.054   0.002  -0.087  -0.006   0.299   0.019   0.020      0.117      0.118
  Dollar volume, starts             -0.082  -0.007   0.053   0.047  -0.180  -0.011   0.291   0.038   0.029      0.121      0.125
  Average                           -0.084  -0.074   0.057   0.069  -0.116  -0.037   0.283  -0.031   0.006      0.114      0.117

SOURCE: CBO calculations.
NOTE: GDPFE = potential nominal GDP; n.a. = not applicable.
a. The current CBO equations were first used in 1991.
of the new equations should be carried into the future until finer discriminations among them are possible. Finally, the reestimated CBO equations do better than the original baseline for two reasons. First, the reestimated equations are more accurate than the ones estimated at the time (the RMSE of the average falls from .151 to .138), apparently because of the longer sample period and alternative projection of current-year equity values. Second, adjustments made to the current-equation estimates to reflect additional considerations worsened the estimates (those adjustments raised the RMSE from .151 to .160). Adjustments made in 1991 and 1997 increased the error, and one made in 1994 reduced it, but not by enough to offset the other two. It is too early to know how the adjustment in 1999 affected accuracy.
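The root mean squared errors and percentage improvements quoted in this section can be reproduced from the annual errors in Table 16. The sketch below does so for the CBO Baseline row and for the Average row of the new equations. The function names `rmse` and `percent_improvement` are hypothetical helpers introduced for illustration, and the results match the published figures only to the three decimal places reported in the table.

```python
import math

# Annual errors copied from Table 16, 1990-1998.
baseline_errors = [-0.314, -0.198, 0.033, 0.141, -0.031, 0.029, 0.358, -0.064, 0.104]
new_average_errors = [-0.084, -0.074, 0.057, 0.069, -0.116, -0.037, 0.283, -0.031, 0.006]


def rmse(errors):
    """Root mean squared error of a list of annual errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def percent_improvement(old_rmse, new_rmse):
    """Percentage reduction in RMSE when moving from old_rmse to new_rmse."""
    return 100.0 * (old_rmse - new_rmse) / old_rmse


# Dropping the first element (1990) gives the 1991-1998 period.
baseline_9098 = rmse(baseline_errors)
baseline_9198 = rmse(baseline_errors[1:])
new_avg_9098 = rmse(new_average_errors)
new_avg_9198 = rmse(new_average_errors[1:])

# Improvements quoted in the text, using the published RMSE values.
vs_baseline = percent_improvement(0.160, 0.117)
vs_current_equations = percent_improvement(0.151, 0.117)
```

Rounded to three decimals, the computed values agree with the table's .184 and .160 for the baseline and .114 and .117 for the average of the new equations. The two improvement figures come out near 27 percent and 23 percent, as stated in the text.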