Spuddy
-Interested User-
Posts: 4
Joined: Aug 1, 2004
|
Posted: Jun 13, 2005 02:06 PM
Msg. 1 of 4
Hi, I've been developing my own T/A software using DTN's feeds for nearly 10 months now, and right from day 1 I've been concerned about the accuracy of the historical data. Suppose I connect to DTN's servers and read as much historical data as possible (120 calendar days) using, say, "HM,QQQQ,120,1;", parse the incoming data and plot an OHLC bar chart of the minute data. I'll notice minute bars that are very obviously wrong - they'll have a very long spike top or tail on the minute bar, indicating a bad tick has sneaked into its calculation. I notice these spikes on pretty much every stock, with a frequency of bad data of maybe 1 in 5 days (very rough guess). Needless to say, this makes the historical data very ugly without any filtering. Let me give you an example.
Today is Monday 06/13/05 and I just retrieved the last 3 trading days' worth of SPY minute data using the "HM,SPY,5,1" request (5 calendar days). After plotting a chart of the minute data for these last 3 days, 2 huge spikes and 1 possibly dodgy-looking spike immediately came to my attention. On closer examination of the data returned from DTN, it's obvious that the following Lows, which I've marked >>Low<<, are badly wrong:
2005-6-13 9:32:00, 120.000000, 119.910000, 119.960000, 119.980000
2005-6-13 9:33:00, 120.090000, 119.950000, 120.000000, 120.060000
2005-6-13 9:34:00, 120.140000, >>112.040000<<, 120.040000, 120.130000
2005-6-13 9:35:00, 120.140000, 120.080000, 120.100000, 120.110000
2005-6-13 9:36:00, 120.130000, 120.090000, 120.110000, 120.090000

2005-6-13 11:08:00, 121.020000, 120.840000, 120.850000, 120.980000
2005-6-13 11:09:00, 121.080000, 120.910000, 121.000000, 121.000000
2005-6-13 11:10:00, 121.050000, >>120.050000<<, 121.000000, 121.010000
2005-6-13 11:11:00, 121.050000, 120.970000, 121.010000, 120.970000
2005-6-13 11:12:00, 120.990000, 120.940000, 120.980000, 120.950000

2005-6-9 12:24:00, 120.270000, 120.230000, 120.260000, 120.270000
2005-6-9 12:25:00, 120.330000, 120.250000, 120.260000, 120.320000
2005-6-9 12:26:00, 120.450000, >>119.340000<<, 120.330000, 120.410000
2005-6-9 12:27:00, 120.540000, 120.420000, 120.420000, 120.520000
2005-6-9 12:28:00, 120.570000, 120.500000, 120.500000, 120.560000
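For illustration, here is a minimal Python sketch of how rows like these could be scanned for suspicious spikes. It assumes the field order is timestamp, High, Low, Open, Close, which is only inferred from the values above, and the names `Bar`, `parse_bar`, `suspect_bars` and the 0.5% wick threshold are hypothetical choices for this example, not anything from DTN.

```python
from typing import List, NamedTuple


class Bar(NamedTuple):
    stamp: str
    high: float
    low: float
    open: float
    close: float


def parse_bar(line: str) -> Bar:
    # ASSUMPTION: fields are timestamp, High, Low, Open, Close
    # (inferred from the sample rows, not from any documentation).
    stamp, high, low, open_, close = [f.strip() for f in line.split(",")]
    return Bar(stamp, float(high), float(low), float(open_), float(close))


def suspect_bars(bars: List[Bar], max_wick_pct: float = 0.5) -> List[Bar]:
    """Return bars whose top or tail extends more than max_wick_pct
    percent beyond the open/close body of the same bar."""
    flagged = []
    for bar in bars:
        body_top = max(bar.open, bar.close)
        body_bottom = min(bar.open, bar.close)
        upper_pct = (bar.high - body_top) / body_top * 100
        lower_pct = (body_bottom - bar.low) / body_bottom * 100
        if upper_pct > max_wick_pct or lower_pct > max_wick_pct:
            flagged.append(bar)
    return flagged


# Three of the rows from the post, with the >>...<< markers removed.
sample = [
    "2005-6-13 9:33:00, 120.090000, 119.950000, 120.000000, 120.060000",
    "2005-6-13 9:34:00, 120.140000, 112.040000, 120.040000, 120.130000",
    "2005-6-13 9:35:00, 120.140000, 120.080000, 120.100000, 120.110000",
]
flagged = suspect_bars([parse_bar(row) for row in sample])
print([bar.stamp for bar in flagged])
```

A check like this only flags the bar. It cannot say what the real High or Low was, so the flagged bar still needs to be rebuilt from tick data or clipped by hand.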
I have written very simple filtering algorithms to scan through the minute data I receive and chop the tops and tails off the bars that are obviously wrong, but this leaves me with an inaccurate minute bar in my data, as I don't know what the actual High and Low were. Also, sometimes it's not quite so obvious that a minute bar has bad data - I would have to delve into the actual tick data and filter out each bad tick in order to obtain an accurate result.
Filtering bad ticks isn't that difficult actually, and that's what I've been doing recently. I walk through all the ticks at the end of the day and look for a tick with a price that pops up or down outside a threshold percentage I set, and then pops back on the next tick, suggesting that it was bad. Ticks that are wildly outside the bid and ask are also suspects. So it's not that difficult to write a tick filtering algorithm. I added a fairly effective one to my software in less than a day.
However, it takes quite a long time for my software to retrieve every tick for a particular stock at the end of each day. I'm looking at maybe 20 to 60 seconds per stock per day of tick data. This might not seem like long, but I want to retrieve accurate minute data for a thousand or so stocks at the end of each day so I can scan for setups, and retrieving the tick data for a thousand stocks just so I can filter out the bad ticks to obtain accurate minute data is not really an option. Also, if I go on holiday for over a week and can't run my software, then my data will be at the mercy of your minute database, as you only let us retrieve up to 6 trading days of ticks.
I also suspect you'd rather we didn't hog your bandwidth downloading tons of ticks, so .... is there ANY chance that when DTN calculates the intraday and end of day minute data for each stock, it can apply some simple bad tick filtering?
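The "pops out and pops back" tick check described above could look roughly like the following Python sketch. This is not my actual code or anything DTN uses. The function name `filter_bad_ticks`, the plain list of trade prices and the 0.5% default threshold are all assumptions made for the example, and the bid/ask check is left out.

```python
from typing import List


def filter_bad_ticks(prices: List[float], threshold_pct: float = 0.5) -> List[float]:
    """Drop any tick that jumps more than threshold_pct percent away from
    the last accepted tick when the very next tick is back within
    threshold_pct of that last accepted tick."""
    if len(prices) < 3:
        return list(prices)

    clean = [prices[0]]  # the first tick is accepted unchecked
    for i in range(1, len(prices) - 1):
        prev, cur, nxt = clean[-1], prices[i], prices[i + 1]
        jumped = abs(cur - prev) / prev * 100 > threshold_pct
        reverted = abs(nxt - prev) / prev * 100 <= threshold_pct
        if jumped and reverted:
            continue  # looks like a one-tick spike, so skip it
        clean.append(cur)
    clean.append(prices[-1])  # the last tick has no successor to compare with
    return clean


# Hypothetical ticks for one minute, with one bad print at 112.04.
ticks = [120.04, 120.10, 112.04, 120.12, 120.13]
clean = filter_bad_ticks(ticks)
print(min(clean), max(clean))  # Low and High rebuilt from the cleaned ticks
```

A real move that does not snap back on the next tick passes through untouched, which is the point of requiring the reversal and not just the jump. Two bad ticks in a row would slip through this version.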
It shouldn't take a competent programmer very long to write some code that filters out bad ticks and also cleans up your archives of minute data. I don't like retrieving historical data that is very obviously bad a lot of the time. The accuracy of the minute data also makes me wonder about the accuracy of other time frames, such as 60 minute, daily etc., if they're calculated from the minute data. It makes your historical data much less useful if it's bad. I did post a message about this about 6 months ago, but nothing productive appears to have come of it. Thanks!
|
ER2
-Interested User-
Posts: 4
Joined: Jul 24, 2005
|
Posted: Jul 24, 2005 09:35 AM
Msg. 4 of 4
It's been a long time coming, but we are wrapping up development on new servers which will process tick corrections from the exchange. In addition, our market data group is implementing their own tick scrubbing algorithms, which should greatly enhance the cleanliness of our data. Once development is complete, it will go to QA for testing. Once everything has tested OK, it will be released live and you should notice better minute data at that time.
-------------------------------------------------------------------------------- Jay Froscheiser DTN - Trading Markets
Greetings Jay ...
I'm not a developer, but if I may put in my nickel's worth.
I'm a new customer of IQ Feed, and I've noticed as well that the "minute" chart accuracy is a bit below par. After updating from the 1 second (tick) data, the data was quite accurate.
Jay, I'm sure you're aware of how much we chartists long for accurate data, and I wish you godspeed in the endeavor.
Thanks and Regards ~
If you can't be a good example, you can only be a horrible warning.
|