Stock Price Prediction Report

Description
The objective of the research is: “To predict the share prices/movement of the company based on various internal and external factors.”

Stock Price Prediction
Data Mining for Business Intelligence

Table of Contents
Executive Summary
Research Objectives
Scope of the Project
Methodology
The Microsoft Decision Tree Viewer
Decision Tree
Predicting Discrete Attributes
Predicting Continuous Attributes
Infosys Decision Tree
Patni Decision Tree
TCS Decision Tree
Dependency Network
Mining Legend
Infosys Dependency Network
Patni Dependency Network
TCS Dependency Network
Linear Regression Model
Linear Regression on Infosys data
Linear Regression on Patni data
Linear Regression on TCS data
Conclusion
Future Scope


Executive Summary:
Predicting the share prices of companies has always been of great importance to investors. Many people have tried to predict the movement of share prices and beat the market, but no one can accurately predict the movement of the share price of a particular company listed on the stock exchange. IT professionals have attempted to approach stock price prediction through Data Mining and Business Intelligence.

The most basic factors that influence the price of a share are demand and supply. If most people buy, prices move up; if most people sell, prices go down. Government policies and the performance and potential of the company and its industry affect the demand behaviour of investors in both the primary and secondary markets. The factors affecting the price of an equity share can be viewed from macroeconomic and microeconomic perspectives.

This project attempts to predict the share prices, or the movement of the share prices, of companies in the same sector (Information Technology) and to develop a model on that basis. We predict the share prices of three major companies in the Information Technology sector: Infosys Technologies, Tata Consultancy Services (TCS) and Patni Computer Systems Limited. The variables used for predicting the share prices are Earnings Per Share (EPS), the Gross Domestic Product (GDP) of India, Interest Rates and Forex Reserve. The data was collected from PROWESS and the Indiastat.com website.

Research Objectives:
The objective of the research is: “To predict the share prices/movement of the company based on various internal and external factors.”

Scope of the Project:
1. The model is limited to the IT industry; however, it can easily be extended to other industries/sectors.
2. The analysis is restricted to descriptive analysis.


Methodology:
The model was developed based on the research paper “Determinants of Equity Prices in the Stock Prices”, published in the International Research Journal of Finance and Economics. A model defined by Al-Tamimi was used to regress the variables (stock prices, earnings per share, gross domestic product, lending interest rate and foreign exchange rate) after testing for multicollinearity among the independent variables.

SP = f (EPS, GDP, INT, FX)

Where,
SP - Share Price
EPS - Earnings Per Share
GDP - Gross Domestic Product
INT - Interest Rate
FX - Forex Reserve

Two methods have been used to predict the share prices/performance:
1. Decision Tree
2. Linear Regression Model

The data is stored in Excel files, which are uploaded to Microsoft SQL Server. Microsoft Visual Studio and SPSS are used for analyzing the data.
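The report does not say which multicollinearity test was applied. One common check is the variance inflation factor (VIF), which can be read off the diagonal of the inverse of the predictor correlation matrix. The sketch below assumes that approach and uses the predictor correlations reported for the Infosys data later in this report; the function name is illustrative only.

```python
import numpy as np

# Predictor order: EPS, Forex, Interest Rate, GDP.
# Pearson correlations as reported in the Infosys correlation table.
names = ["EPS", "Forex", "Interest Rate", "GDP"]
corr = np.array([
    [1.000, -0.125, -0.129, -0.256],
    [-0.125, 1.000, 0.302, 0.464],
    [-0.129, 0.302, 1.000, 0.355],
    [-0.256, 0.464, 0.355, 1.000],
])


def variance_inflation_factors(corr_matrix):
    """Return the VIF of each predictor.

    VIF_j equals the j-th diagonal element of the inverse of the
    predictor correlation matrix, i.e. 1 / (1 - R_j^2), where R_j^2 comes
    from regressing predictor j on the remaining predictors.
    """
    return np.diag(np.linalg.inv(corr_matrix))


vif = variance_inflation_factors(corr)
for name, value in zip(names, vif):
    print(f"{name}: VIF = {value:.2f}")
```

A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity; values close to 1 indicate little overlap among the predictors.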


The Microsoft Decision Tree Viewer:
The Microsoft Tree Viewer in Microsoft SQL Server Analysis Services displays decision trees that are built with the Microsoft Decision Trees algorithm. The Microsoft Decision Trees algorithm is a hybrid decision tree algorithm that supports both classification and regression. The Microsoft Decision Trees algorithm is used for predictive modeling of both discrete and continuous attributes. When you browse a mining model in Analysis Services, the model is displayed on the Mining Model Viewer tab of Data Mining Designer in the appropriate viewer for the model. The Microsoft Tree Viewer includes the following tabs and panes:
- Decision Tree
- Dependency Network
- Mining Legend

Decision Tree:
When you build a decision tree model, Analysis Services builds a separate tree for each predictable attribute. You can view an individual tree by selecting it from the Tree list on the Decision Tree tab of the viewer. A decision tree is composed of a series of splits, with the most important split, as determined by the algorithm, at the left of the viewer in the All node. Additional splits occur to the right. The split in the All node is most important because it contains the strongest split-causing conditional in the dataset, and therefore it caused the first split.
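To make the idea of the "most important split" concrete, the sketch below picks the first split of a regression tree by variance reduction. This is a generic textbook criterion and is not the scoring method used by the Microsoft Decision Trees algorithm; the rows, field names and values are hypothetical.

```python
from statistics import pvariance


def best_split(rows, target, features):
    """Return (feature, threshold, gain) for the split that most reduces
    the size-weighted variance of the target."""
    parent = pvariance([r[target] for r in rows]) * len(rows)
    best = None
    for feature in features:
        levels = sorted({r[feature] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(levels, levels[1:]):
            threshold = (low + high) / 2
            left = [r[target] for r in rows if r[feature] <= threshold]
            right = [r[target] for r in rows if r[feature] > threshold]
            child = pvariance(left) * len(left) + pvariance(right) * len(right)
            gain = parent - child
            if best is None or gain > best[2]:
                best = (feature, threshold, gain)
    return best


# Hypothetical observations: the price level is driven mostly by GDP.
rows = [
    {"gdp": 5.0, "eps": 10.0, "price": 100.0},
    {"gdp": 5.5, "eps": 30.0, "price": 110.0},
    {"gdp": 8.0, "eps": 12.0, "price": 300.0},
    {"gdp": 8.5, "eps": 28.0, "price": 310.0},
]
feature, threshold, gain = best_split(rows, "price", ["gdp", "eps"])
print(feature, threshold, gain)
```

In this toy data the split on `gdp` separates the low prices from the high prices almost perfectly, so it would appear first (leftmost) in a tree built with this criterion.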

Predicting Discrete Attributes:
When a tree is built with a discrete predictable attribute, the viewer displays the following on each node in the tree:
- The condition that caused the split.
- A histogram that represents the distribution of the states of the predictable attribute, ordered by popularity.


The Histogram option can be used to change the number of states that appear in the histograms in the tree. This is useful if the predictable attribute has many states. The states appear in a histogram in order of popularity from left to right; if the number of states that you choose to display is fewer than the total number of states in the attribute, the least popular states are displayed collectively in gray. The background color of each node represents the concentration of cases of the particular attribute state that you select by using the Background option. You can use this option to highlight nodes that contain a particular target in which you are interested.

Predicting Continuous Attributes:
When a tree is built with a continuous predictable attribute, the viewer displays a diamond chart, instead of a histogram, for each node in the tree. The diamond chart has a line that represents the range of the attribute. The diamond is located at the mean for the node, and the width of the diamond represents the variance of the attribute at that node. A thinner diamond indicates that the node can create a more accurate prediction. The viewer also displays the regression equation, which is used to determine the split in the node.


Infosys Decision Tree:

The above figure shows the decision tree for the share price of Infosys Technologies. As explained above, the decision tree splits as we move from left to right. For Infosys, the first split is on GDP, as it has the strongest link with the share price. The tree is further split on Forex and EPS. The figure shows a three-level expansion of the decision tree. The stock price can be determined from the regression equation at each split.


Patni Decision Tree:

The above figure shows the decision tree for the share price of Patni Computer Systems. As explained above, the decision tree splits as we move from left to right. For Patni, the first split is on GDP, as it has the strongest link with the share price. The tree is further split on Forex and EPS. The third-level split is on EPS and Interest Rate. The figure shows a three-level expansion of the decision tree. The stock price can be determined from the regression equation at each split.


TCS Decision Tree:

The above figure shows the decision tree for the share price of TCS. For TCS, the first split is on EPS, as it has the strongest link with the share price. The tree is further split on Forex and Interest Rate. The figure shows a two-level expansion of the decision tree. The stock price can be determined from the regression equation at each split.


Dependency Network:
The Dependency Network displays the dependencies between the input attributes and the predictable attributes in the model. The slider at the left of the viewer acts as a filter that is tied to the strengths of the dependencies. If you lower the slider, only the strongest links are shown in the viewer.

Mining Legend:
The Mining Legend displays the following information when you select a node in the decision tree model:
- The number of cases in the node, broken down by the states of the predictable attribute.
- The probability of each case of the predictable attribute for the node.
- A histogram that includes a count for each state of the predictable attribute.
- The conditions that are required to reach a specific node, also known as the node path.
- For linear regression models, the regression formula.


Infosys Dependency Network:

For Infosys, the stock price depends most strongly on the independent variable GDP, followed by EPS and Forex. The weakest dependency is on Interest Rate. This indicates that the stock price of Infosys is influenced mainly by changes in GDP, with EPS and Forex as the other strong factors. Changes in the interest rate have little influence on the stock price.


Patni Dependency Network:

For Patni, as with Infosys, the stock price depends most strongly on the independent variable GDP, followed by EPS and Forex. The weakest dependency is on Interest Rate. This indicates that the stock price of Patni is influenced mainly by changes in GDP, with EPS and Forex as the other strong factors. Changes in the interest rate have little influence on the stock price.


TCS Dependency Network:

For TCS, the stock price depends most strongly on the independent variable EPS, followed by Forex and Interest Rate. An interesting point is that the stock price of TCS does not depend on GDP: the dependency network shows no link between GDP and the dependent variable, Stock Price. Hence, according to the dependency network, the stock price of TCS responds only to changes in EPS, Forex and Interest Rate.


Linear Regression Model:
Linear regression is an approach to modeling the relationship between a dependent variable, denoted y, and one or more independent variables, denoted X, such that the model depends linearly on the unknown parameters to be estimated from the data. Such a model is called a "linear model." Most commonly, linear regression refers to a model in which the conditional mean of y given the value of X is an affine function of X. In our case, the dependent variable y is Share Price and the independent variables X are GDP, Interest Rate, Forex Reserve and EPS. The linear regression equation is generated using the statistical software SPSS.
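The report's equations were produced in SPSS. As an illustration of what such a fit does, the following minimal sketch estimates the intercept and slopes by ordinary least squares with NumPy. The data is hypothetical and generated from a known equation so the recovered coefficients can be checked; the function name is an assumption, not part of the original analysis.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares fit; returns [intercept, b1, ..., bk]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefficients


# Hypothetical data generated exactly from y = 2 + 3*x1 - 1*x2.
X = [[1, 0], [0, 1], [2, 1], [3, 5], [4, 2]]
y = [2 + 3 * x1 - x2 for x1, x2 in X]

coefficients = fit_linear_regression(X, y)
print(coefficients)
```

With real share price data the fit is not exact, and the residuals left over are what the R Square and ANOVA tables in the following sections summarize.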

Linear Regression on Infosys data:

Correlations (N = 501 for every pair)

Pearson Correlation
                 Stock Price    EPS      Forex    Interest Rate    GDP
Stock Price      1.000          .895     .061     .112             -.109
EPS              .895           1.000    -.125    -.129            -.256
Forex            .061           -.125    1.000    .302             .464
Interest Rate    .112           -.129    .302     1.000            .355
GDP              -.109          -.256    .464     .355             1.000

Sig. (1-tailed)
                 Stock Price    EPS      Forex    Interest Rate    GDP
Stock Price      .              .000     .086     .006             .007
EPS              .000           .        .003     .002             .000
Forex            .086           .003     .        .000             .000
Interest Rate    .006           .002     .000     .                .000
GDP              .007           .000     .000     .000             .

The correlation between Stock Price and EPS is strong, with a Pearson correlation coefficient of .895. The correlations of Stock Price with Interest Rate (.112) and GDP (-.109) are weak, and the correlation with Forex (.061) is very weak.
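For reference, the Pearson coefficient reported by SPSS is the covariance of two series divided by the product of their standard deviations. A minimal sketch in plain Python follows; the EPS and price series are hypothetical and chosen to be perfectly linear, so the coefficient is at its upper limit.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical series: price is exactly 30 times EPS.
eps = [10.0, 12.0, 15.0, 19.0, 24.0]
price = [30.0 * value for value in eps]

r = pearson_r(eps, price)
print(round(r, 3))
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one, which is how the coefficients in the table above are read.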

Model Summary (b)
Model    R           R Square    Adjusted R Square    Std. Error of the Estimate    Durbin-Watson
1        .931 (a)    .866        .865                 160.96886                     .120

a. Predictors: (Constant), GDP, EPS, Interest Rate, Forex
b. Dependent Variable: Stock Price

The value of R Square is .866. This means that 86.6% of the variance in stock prices is explained by the independent variables in the regression equation.
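The R Square and Adjusted R Square figures can be cross-checked against the sums of squares in the Infosys ANOVA table that follows. The sketch below uses the values as printed by SPSS (rounded to four significant digits), so small rounding differences are expected, for example in the F statistic.

```python
# Sums of squares and sample size from the Infosys ANOVA table in this report.
ss_regression = 8.320e7
ss_residual = 1.285e7
ss_total = 9.605e7
n = 501  # number of observations
k = 4    # number of predictors (EPS, Forex, Interest Rate, GDP)

# Share of total variance explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R Square penalizes the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# F statistic: mean square of the regression over mean square of the residual.
f_statistic = (ss_regression / k) / (ss_residual / (n - k - 1))

print(f"R Square          = {r_squared:.3f}")
print(f"Adjusted R Square = {adjusted_r_squared:.3f}")
print(f"F                 = {f_statistic:.1f}")
```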

ANOVA (b)
Model 1       Sum of Squares    df     Mean Square    F          Sig.
Regression    8.320E7           4      2.080E7        802.705    .000 (a)
Residual      1.285E7           496    25910.973
Total         9.605E7           500

a. Predictors: (Constant), GDP, EPS, Interest Rate, Forex
b. Dependent Variable: Stock Price

The last column reports a p-value of .000, which means p < .0005 after rounding to three decimal places. The p-value indicates the significance of the F value, so the model is statistically significant at any conventional significance level.

Coefficients (a)
(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.)
Model 1          B            Std. Error    Beta     t         Sig.
(Constant)       -2470.962    331.225                -7.460    .000
EPS              30.689       .557          .936     55.062    .000
Forex            49.391       8.052         .115     6.134     .000
Interest Rate    77.418       7.055         .196     10.974    .000
GDP              4.488        11.171        .008     .402      .688

a. Dependent Variable: Stock Price

Stock Price = f (EPS, Forex, Interest Rate, GDP)

The equation of stock price can be written as:

Stock price = -2470.96 + 30.69 (EPS) + 49.39 (Forex) + 77.42 (Interest Rate) + 4.49 (GDP)
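The fitted equation can be applied to a set of input values as in the sketch below. The coefficients are the unrounded B values from the Infosys coefficients table; the input values are hypothetical placeholders, because the report does not state the units of the variables.

```python
# Unstandardized coefficients (B) from the Infosys coefficients table.
INFOSYS_COEFFICIENTS = {
    "(Constant)": -2470.962,
    "EPS": 30.689,
    "Forex": 49.391,
    "Interest Rate": 77.418,
    "GDP": 4.488,
}


def predict_price(coefficients, inputs):
    """Evaluate: price = constant + sum(B_i * x_i) over the supplied inputs."""
    price = coefficients["(Constant)"]
    for name, value in inputs.items():
        price += coefficients[name] * value
    return price


# Hypothetical inputs, for illustration only (units are an assumption).
example_inputs = {"EPS": 100.0, "Forex": 10.0, "Interest Rate": 10.0, "GDP": 5.0}
predicted = predict_price(INFOSYS_COEFFICIENTS, example_inputs)
print(f"Predicted stock price: {predicted:.2f}")
```

The same function works for the Patni and TCS equations by passing a dictionary with their coefficients.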


Linear Regression on Patni data:
Correlations (N = 501 for every pair)

Pearson Correlation
                 Stock Price    EPS      Forex    Interest Rate    GDP
Stock Price      1.000          .071     -.051    .317             -.339
EPS              .071           1.000    -.757    -.578            -.751
Forex            -.051          -.757    1.000    .302             .464
Interest Rate    .317           -.578    .302     1.000            .355
GDP              -.339          -.751    .464     .355             1.000

Sig. (1-tailed)
                 Stock Price    EPS      Forex    Interest Rate    GDP
Stock Price      .              .057     .129     .000             .000
EPS              .057           .        .000     .000             .000
Forex            .129           .000     .        .000             .000
Interest Rate    .000           .000     .000     .                .000
GDP              .000           .000     .000     .000             .

The correlation of Stock Price with Interest Rate and GDP is moderate, with coefficients of .317 and -.339 respectively. The correlation with EPS (.071) and Forex (-.051) is very weak.


Model Summary (b)
Model    R           R Square    Adjusted R Square    Std. Error of the Estimate    Durbin-Watson
1        .580 (a)    .337        .332                 44.1588                       .066

a. Predictors: (Constant), GDP, Interest Rate, Forex, EPS
b. Dependent Variable: Stock Price

The R Square value is .337, which means that only 33.7% of the variance in stock prices is explained by the independent variables in the regression equation.

ANOVA (b)
Model 1       Sum of Squares    df     Mean Square    F         Sig.
Regression    491521.927        4      122880.482     63.016    .000 (a)
Residual      967201.483        496    1950.003
Total         1458723.410       500

a. Predictors: (Constant), GDP, Interest Rate, Forex, EPS
b. Dependent Variable: Stock Price

The last column reports a p-value of .000, which means p < .0005 after rounding to three decimal places, so the model is statistically significant at any conventional significance level.

Coefficients (a)
(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.)
Model 1          B          Std. Error    Beta     t         Sig.
(Constant)       590.803    181.565                3.254     .001
EPS              -1.067     1.461         -.068    -.730     .466
Forex            .948       3.191         .018     .297      .767
Interest Rate    23.108     2.316         .474     9.978     .000
GDP              -39.805    4.112         -.567    -9.680    .000

a. Dependent Variable: Stock Price

Stock Price = f (EPS, Forex, Interest Rate, GDP)

The equation of stock price can be written as:

Stock price = 590.80 - 1.07 (EPS) + 0.95 (Forex) + 23.11 (Interest Rate) - 39.81 (GDP)
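The t column of the Patni coefficients table is each coefficient B divided by its standard error, which shows how much weight each term of the equation can bear. The sketch below recomputes t from the printed values and flags coefficients with |t| above 1.96, the approximate two-sided 5% critical value for a large sample. The cut-off is a conventional assumption; the report itself does not state a threshold.

```python
# (B, Std. Error) pairs from the Patni coefficients table.
PATNI_COEFFICIENTS = {
    "(Constant)": (590.803, 181.565),
    "EPS": (-1.067, 1.461),
    "Forex": (0.948, 3.191),
    "Interest Rate": (23.108, 2.316),
    "GDP": (-39.805, 4.112),
}

CRITICAL_T = 1.96  # approximate two-sided 5% critical value for large samples

t_values = {name: b / se for name, (b, se) in PATNI_COEFFICIENTS.items()}
significant = {name for name, t in t_values.items() if abs(t) > CRITICAL_T}

for name, t in t_values.items():
    flag = "significant" if name in significant else "not significant"
    print(f"{name}: t = {t:.3f} ({flag})")
```

By this check, EPS and Forex do not contribute significantly to the Patni equation, which agrees with the Sig. values of .466 and .767 in the table.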


Linear Regression on TCS data:
Correlations (N = 501 for every pair)

Pearson Correlation
                 Stock Price    EPS      Forex    Interest Rate    GDP
Stock Price      1.000          .890     .099     -.010            .016
EPS              .890           1.000    -.029    -.221            -.190
Forex            .099           -.029    1.000    .302             .464
Interest Rate    -.010          -.221    .302     1.000            .355
GDP              .016           -.190    .464     .355             1.000

Sig. (1-tailed)
                 Stock Price    EPS      Forex    Interest Rate    GDP
Stock Price      .              .000     .013     .415             .357
EPS              .000           .        .261     .000             .000
Forex            .013           .261     .        .000             .000
Interest Rate    .415           .000     .000     .                .000
GDP              .357           .000     .000     .000             .

The correlation of Stock Price with EPS is strong, with a coefficient of .890, whereas the correlations with Forex (.099), Interest Rate (-.010) and GDP (.016) are weak.


Model Summary (b)
Model    R           R Square    Adjusted R Square    Std. Error of the Estimate    Durbin-Watson
1        .921 (a)    .847        .846                 112.6741                      .086

a. Predictors: (Constant), GDP, EPS, Interest Rate, Forex
b. Dependent Variable: Stock Price

The R Square value is .847, which means that 84.7% of the variance in stock prices is explained by the independent variables in the regression equation.

ANOVA (b)
Model 1       Sum of Squares    df     Mean Square    F          Sig.
Regression    3.495E7           4      8736902.055    688.191    .000 (a)
Residual      6296946.762       496    12695.457
Total         4.124E7           500

a. Predictors: (Constant), GDP, EPS, Interest Rate, Forex
b. Dependent Variable: Stock Price

The p-value is reported as .000, which means p < .0005 after rounding to three decimal places, so the model is statistically significant at any conventional significance level.


Coefficients (a)
(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.)
Model 1          B           Std. Error    Beta     t         Sig.
(Constant)       -936.437    228.491                -4.098    .000
EPS              32.304      .620          .949     52.089    .000
Forex            5.511       5.664         .020     .973      .331
Interest Rate    37.991      5.018         .147     7.571     .000
GDP              50.508      7.735         .135     6.530     .000

a. Dependent Variable: Stock Price

Stock Price = f (EPS, Forex, Interest Rate, GDP)

The equation of stock price can be written as

Stock price = -936.44 + 32.30 (EPS) + 5.51 (Forex) + 37.99 (Interest Rate) + 50.51 (GDP)


Conclusion:
From the project we conclude that macroeconomic factors and the performance of the company can affect the company's share price. We used both a decision tree and linear regression to analyze the data. The model developed is specific to the IT industry, but it could easily be extended to other sectors by taking the relevant variables.

Future Scope:
1. The project could also be implemented using an Artificial Neural Network (ANN).
2. The project scope can be extended to different sectors.



