Stock Market Prediction Using Online Data: Fundamental and Technical Approaches
Nikhil Bakshi
Master’s Thesis in Computer Science
ETH Zurich, August 2008
3. Evaluating and visualizing trading strategies
The system should evaluate and visualize the financial performance of the simulated strategies. This all
1.5 System Overview
The system consists of three main components: a crawler, a simulation server and a client interface. Figure 1.2 visualizes the syste
trading strategies when instructed by the client. The server’s logic is written in Java and is equipped with unit tests.
3. Client
The client offers users
Chapter 2
The Crawler
2.1 Data Sources
In an initial phase, a large number of websites were studied and the ones most suitable for the project were identi
Marketwatch
• Regeneron reports favorable data from obesity trial [9:47am 05/19/03]
• Incyte to cut 57% of jobs, close Calif. facility [4:24pm 02/02/04]
2.1.3 Yahoo Finance Historical Prices
After analyzing OpenTick[1] and Yahoo Finance, Yahoo’s historical stock quotes were selected. They consist of dail
2.2.1 Preprocessing the News
The goal of the news preprocessing phase is to parse headlines and their exact timestamps from the raw HTML. Below are some
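As a minimal sketch of the timestamp-parsing step, the Java snippet below splits a headline line of the form shown in the Marketwatch samples ("headline [h:mmam MM/dd/yy]") into its text and a timestamp. It assumes the headline has already been extracted from the HTML as plain text. The class HeadlineParser and its nested Headline type are hypothetical names, not classes from the thesis, and mapping two-digit years to the 2000s is an assumption as well.

```java
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not one of the thesis's crawler classes. */
final class HeadlineParser {

    /** A parsed headline and its publication time. */
    static final class Headline {
        final String text;
        final LocalDateTime timestamp;

        Headline(String text, LocalDateTime timestamp) {
            this.text = text;
            this.timestamp = timestamp;
        }
    }

    // Matches "<headline> [h:mm(am|pm) MM/dd/yy]".
    private static final Pattern LINE = Pattern.compile(
            "^(.+?)\\s*\\[(\\d{1,2}):(\\d{2})(am|pm) (\\d{2})/(\\d{2})/(\\d{2})\\]$");

    /** Returns the parsed headline, or empty if the line does not match. */
    static Optional<Headline> parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        int hour12 = Integer.parseInt(m.group(2));
        int minute = Integer.parseInt(m.group(3));
        boolean pm = m.group(4).equals("pm");
        // 12am -> 0, 12pm -> 12, otherwise add 12 hours for pm.
        int hour24 = (hour12 % 12) + (pm ? 12 : 0);
        int month = Integer.parseInt(m.group(5));
        int day = Integer.parseInt(m.group(6));
        // Assumption: two-digit years all fall in the 2000s.
        int year = 2000 + Integer.parseInt(m.group(7));
        try {
            return Optional.of(new Headline(
                    m.group(1), LocalDateTime.of(year, month, day, hour24, minute)));
        } catch (DateTimeException e) {
            return Optional.empty(); // e.g. month 13 or minute 75
        }
    }
}
```

Returning an empty Optional for lines that do not match lets a caller count and inspect unparsed lines instead of failing on the first malformed one.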
2.2.2 Preprocessing the Analyst Recommendations
Two issues surfaced while parsing analyst recommendations.
Parsing research firms
Several notations were b
2.4 Data Statistics
A total of 381’479 historical quotes, 4’222 analyst recommendations, 31’651 Marketwatch and 13’907 Reuters news articles were collec
Java Package                   Description
server.data.crawler            The implementation of the crawler
server.data.crawler.analyst    The analyst recommendations subcrawler, in-cl
Acknowledgement
I would like to thank my advisor Prof. Gaston Gonnet and my mentor Prof. Friedemann Mattern for the opportunity to work on this topic an
Chapter 3
The Simulation Server
3.1 Introduction
The simulation server’s job is to simulate trading strategies on the data collected by the crawler. Table
3.1.2 Step 2: Computing Trading Signals
The fundamental and technical signals for evaluating companies are described in sections 3.2 and 3.3. A company
3.2 Fundamental Trading Signals
3.2.1 News
Initially, the Text Mining Handbook[4], the crawled news articles and existing research papers in the area of
hope that the good news will be followed by a continued positive price trend. Accordingly, 0.0 can be regarded as a signal for short selling due to bad
Figure 3.1: Share price and analyst sentiment (Cephalon Inc.)
3.3 Technical Trading Signals
The book ‘New Trading Systems and Methods’[5] covers technic
3.3.1 Moving Average
A moving average is a simple technique to suggest buying and selling points on a stock price chart. For this purpose, the average s
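The idea can be illustrated with a short Java sketch. It computes a simple moving average over the last n closing prices and emits a signal when the price crosses it. The crossover rule, the class name MovingAverageSketch and the reuse of the 1.0 / 0.0 / 0.5 signal values are assumptions made for illustration; the thesis's exact rule may differ.

```java
/** Illustrative sketch only; names and the crossover rule are assumptions. */
final class MovingAverageSketch {

    /** Arithmetic mean of the n closing prices ending at index t (inclusive). */
    static double sma(double[] closes, int t, int n) {
        if (n <= 0 || t - n + 1 < 0 || t >= closes.length) {
            throw new IllegalArgumentException("window out of range");
        }
        double sum = 0.0;
        for (int i = t - n + 1; i <= t; i++) {
            sum += closes[i];
        }
        return sum / n;
    }

    /**
     * Returns 1.0 (buy) when the price crosses above its n-day average on day t,
     * 0.0 (sell) when it crosses below, and 0.5 (no signal) otherwise.
     * Requires t >= n because the previous day's average is needed as well.
     */
    static double signal(double[] closes, int t, int n) {
        double previous = closes[t - 1] - sma(closes, t - 1, n);
        double current = closes[t] - sma(closes, t, n);
        if (previous <= 0 && current > 0) {
            return 1.0;
        }
        if (previous >= 0 && current < 0) {
            return 0.0;
        }
        return 0.5;
    }
}
```

Comparing the previous and the current day restricts signals to the day of the crossing, so a price that stays above its average does not trigger a buy signal every day.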
3.3.2 Bollinger Bands
Bollinger Bands are volatility-based upper and lower bands around the Moving Average. Buy and sell signals are only triggered wh
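A minimal Java sketch of the band computation follows. It assumes the conventional construction: the bands lie k standard deviations above and below the n-day moving average, using the population standard deviation of the same window. The class BollingerSketch, the parameter k and the helper outsideBands are hypothetical; the sketch deliberately does not assign buy or sell meaning to either band, since the exact trigger rule is not visible in the text above.

```java
/** Illustrative sketch only; names and the band width k are assumptions. */
final class BollingerSketch {

    /** Returns {lower, middle, upper} for the n closing prices ending at index t. */
    static double[] bands(double[] closes, int t, int n, double k) {
        if (n <= 0 || t - n + 1 < 0 || t >= closes.length) {
            throw new IllegalArgumentException("window out of range");
        }
        double sum = 0.0;
        for (int i = t - n + 1; i <= t; i++) {
            sum += closes[i];
        }
        double mean = sum / n;

        // Population standard deviation of the same window.
        double squares = 0.0;
        for (int i = t - n + 1; i <= t; i++) {
            double d = closes[i] - mean;
            squares += d * d;
        }
        double sd = Math.sqrt(squares / n);

        return new double[] {mean - k * sd, mean, mean + k * sd};
    }

    /** True when the price lies strictly outside the band interval. */
    static boolean outsideBands(double price, double[] bands) {
        return price < bands[0] || price > bands[2];
    }
}
```

Because the band width grows with volatility, a price move has to be large relative to recent fluctuations before it leaves the bands.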
3.4 Combining Trading Signals
A trading strategy can use one or more of the signals specified in sections 3.2 and 3.3. When using more than one signal, a
Figure 3.4: The neural network setup
Inputs
• Moving Average
The Moving Average signal can be expressed in a price-independent way by computing price / moving
Hidden Layer
One hidden layer is used with a configurable number of neurons. A regular sigmoid function is used as a transfer function.
Output
The output i
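The hidden layer with its sigmoid transfer function can be sketched in Java as below. The weight layout (one row per hidden neuron), the bias terms and the class name HiddenLayerSketch are assumptions for illustration; the thesis's network implementation is not shown here, and the output stage is left out.

```java
/** Illustrative sketch only; weight layout and bias terms are assumptions. */
final class HiddenLayerSketch {

    /** Logistic sigmoid, mapping any real input into the interval (0, 1). */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /**
     * Computes the activations of one hidden layer.
     * weights[j][i] connects input i to hidden neuron j; biases[j] is added
     * to neuron j's weighted sum before the sigmoid is applied.
     */
    static double[] activate(double[] inputs, double[][] weights, double[] biases) {
        double[] out = new double[weights.length];
        for (int j = 0; j < weights.length; j++) {
            double z = biases[j];
            for (int i = 0; i < inputs.length; i++) {
                z += weights[j][i] * inputs[i];
            }
            out[j] = sigmoid(z);
        }
        return out;
    }
}
```

The number of rows in weights corresponds to the configurable number of hidden neurons.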
Contents
1 Overview
1.1 Introduction
1.2 Basics
3.5 Architecture
The simulation server underwent several iterations during the course of the thesis. The final architecture was designed with two primary
3.5.1 TimeLine
Figure 3.5 shows the components of the timeline. It consists of a TimeSeries per company containing TimePoints. Each TimePoint holds the
3.6 Unit Testing
When working on a large project, small bugs can creep in and easily go unnoticed for some time (e.g. array indices off by one). Particul
Chapter 4
The Client
4.1 Architecture
The client lets users specify and simulate trading strategies. It is a browser-based interface built using the open
implement the AsyncCallback interface. All data transferred between the client and the simulation server must be serializable and must implement the IsS
4.2.1 General Settings
Clicking the tab ‘Simulation Setup’ brings up some general settings (see figure 4.3). This includes the simulation time period (st
Figure 4.4 shows the available options for a strategy. Each strategy can be given a name (1) and a list of trading signals (2). Each selected signal’s p
Figure 4.6: Portfolio value chart
Strategy Details
For each of the strategies, the following details are listed (see figure 4.7 for an example).
(1) Annual ROI
Figure 4.7: Performance statistics for each strategy
4.3 Source Code Organization
Below is a list of the client’s source code packages.
Java Package    Desc
Chapter 5
Experimental Results
5.1 Experiment Design
The dataset was split into two disjoint sets and used for two experiment phases I and II. The followi
3.1.5 Step 5: Iteration
3.2 Fundamental Trading Signals
3.2.1 News
5.2 Phase I Results
5.2.1 Moving Average and Bollinger Bands
This simulation compares the Moving Average and Bollinger Bands signals with different window
Observation
Several observations can be made. Firstly, using Bollinger Bands reduces the number of signals compared to the Moving Average. This is reflec
Figure 5.2: Simulation of MACD, RSI and Stochastic
5.2.3 Analyst Sentiment
This simulation tests analyst sentiment signals. Different values for the min-
Figure 5.3: Simulation of the analyst sentiment signal
Observation
Based on figure 5.3, one can observe that the strategies tend to follow the general tre
Strategies and Results
Strategy              Annual ROI    # Positions    Average Duration
Buy And Hold Index    -2.93%        -              -
5% News               +40.86%       804            119 days
7% News               +49.71%       551            168
5.2.5 Simple Combinations
These simulations test strategies that combine a technical and a fundamental signal. The resulting annual ROI values are liste
5.3 Phase II Results
Based on the annual ROI values of the simulations in phase I, the following signals were selected for phase II.
• 63-Day Bollinger B
RSI is a low-performance strategy in both phases. Figure 5.5 visualizes the portfolio value over time for the strategies during phase II.
Figure 5.5: Si
Common Setup
Initial cash    Cash to invest    Maximum per trade    Stop loss
10’000          50%               500                  -
Strategies and Results
Strategy    Annual ROI during phase I    Annual ROI d
Chapter 6
Conclusion
The simulation results and observations in chapter 5 can be summarized as follows. Note that these observations are restricted to th
6 Conclusion
A The Nasdaq Biotech Index
B Recommendation Phrases
C Database Schema
D Research Papers using News-Based Prediction
E Technical
• The combined technical and fundamental strategies that were simulated did not consistently show better results than using individual signals separatel
Appendix A
The Nasdaq Biotech Index
The following 152 companies make up the Nasdaq Biotech Index as of July 2008 [Source: Yahoo Finance].
Symbol    Company N
Symbol    Company Name                  Symbol    Company Name
DRRX      Durect Corp.                  DSCO      Discovery Laboratories Inc.
DVAX      Dynavax Technologies Corp.    DYAX      Dyax Corp.
ENDP      Endo Phar
Symbol    Company Name        Symbol    Company Name
THRX      Theravance Inc.     TRCA      Tercica Inc.
TRMS      Trimeris Inc.       TWTI      Third Wave Technologies Inc.
UTHR      United Therapeut
Appendix B
Recommendation Phrases
The following 96 phrases were found in the analyst recommendations dataset.
Buy
Above Average, Accumulate, Add, Attracti
Appendix C
Database Schema
The diagrams below are simple representations of the database schema.
marketwatch news pages
id    symbol    url    html
bigint(20)    varch
yahoo analyst recommendations
id            symbol        date    researchfirm    action          fromOpinion     ...
bigint(20)    varchar(8)    date    varchar(128)    varchar(128)    varchar(128)    ...
Appendix D
Research Papers using News-Based Prediction
Below is a list of the relevant research papers that were studied; they are sorted chronologically
but the performance is not clearly documented. Two other research papers describing SVM-based approaches are [19] and [20].
2004. Forecasting Intraday S
Appendix E
Technical Trading Signals
In addition to the Moving Average and Bollinger Bands covered in section 3.3, here are summaries of the other three
List of Figures
1.1 Nasdaq Biotech Index (2002 - 2008)
1.2 System architecture
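$$\text{signal} = \begin{cases} 1.0 & \text{if } RSI(n) < 0.3 \\ 0.0 & \text{if } RSI(n) > 0.7 \\ 0.5 & \text{elsewhere} \end{cases} \qquad RSI(n) = \frac{RS(n)}{1 + RS(n)}$$
and RS(n) is the ratio of the total upward price movements in the last n days to the total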
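The RSI signal defined above can be sketched in Java as follows. The denominator of RS(n) is cut off in the text; the sketch assumes it is the total downward price movements in the last n days, as in the standard RSI definition. It computes RSI as up / (up + down), which is algebraically equal to RS / (1 + RS) and avoids a division by zero when there were no downward movements. The class name RsiSketch and the neutral value returned for completely flat prices are assumptions.

```java
/** Illustrative sketch only; the denominator of RS(n) is an assumption. */
final class RsiSketch {

    /**
     * RSI over the n daily price changes ending at index t, in the range [0, 1].
     * Requires t >= n, since n changes need n + 1 closing prices.
     */
    static double rsi(double[] closes, int t, int n) {
        if (n <= 0 || t - n < 0 || t >= closes.length) {
            throw new IllegalArgumentException("window out of range");
        }
        double up = 0.0;
        double down = 0.0;
        for (int i = t - n + 1; i <= t; i++) {
            double change = closes[i] - closes[i - 1];
            if (change > 0) {
                up += change;
            } else {
                down -= change; // accumulate downward moves as positive totals
            }
        }
        if (up + down == 0.0) {
            return 0.5; // assumption: flat prices count as neutral
        }
        // Same value as RS / (1 + RS) with RS = up / down.
        return up / (up + down);
    }

    /** Maps an RSI value to the trading signal: 1.0 buy, 0.0 sell, 0.5 neutral. */
    static double signal(double rsi) {
        if (rsi < 0.3) {
            return 1.0;
        }
        if (rsi > 0.7) {
            return 0.0;
        }
        return 0.5;
    }
}
```

For example, price changes of +1, -1, +2, -1 give up = 3 and down = 2, so RSI = 3 / 5 = 0.6 and the signal is neutral.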
References
[1] OpenTick is a project that offers free historical stock market data. http://www.opentick.com
[2] The Apache httpclient library is an open s
[8] Neural Networks for Technical Analysis: A Study on KLCI
Yao, J.; Tan, C.; Poh, H.-L.
International Journal of Theoretical and Applied Finance, 1999
[
[17] Language Models for Financial News Recommendation
Lavrenko, V.; Schmill, M.; Lawrie, D.; Ogilvie, P.; Jensen, D.; Allan, J.
Ninth International Co
Project Source Code (CD)
• /src contains the source code
• /test contains the test cases
• /doc contains the documentation
• /lib contains the libraries
List of Tables
2.1 Data sources
2.2 Sample analyst recommendations for Amgen
Chapter 1
Overview
1.1 Introduction
From mainstream books offering investing advice to research papers analyzing mathematical prediction models, the stoc
1.2 Basics
In order to clarify the goal of the thesis, two dominant schools of thought on investing must first be introduced.
Fundamental analysis
This app