@chicago-joe
Created November 20, 2019 17:05
C:\Users\jloss\venv\ITCH50parser\Scripts\python.exe C:\Users\jloss\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\192.7142.56\helpers\pydev\pydevconsole.py --mode=client --port=10634
import sys; print('Python %s on %s' % (sys.version, sys.platform))
sys.path.extend(['C:\\Users\\jloss\\PyCharmProjects\\NASDAQ-ITCH-5.0-VWAP-PARSER', 'C:\\Users\\jloss\\PyCharmProjects\\NASDAQ-ITCH-5.0-VWAP-PARSER\\src', 'C:\\Users\\jloss\\PyCharmProjects\\NASDAQ-ITCH-5.0-VWAP-PARSER\\data', 'C:/Users/jloss/PyCharmProjects/NASDAQ-ITCH-5.0-VWAP-PARSER'])
PyDev console: starting.
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] on win32
>>> #!/usr/bin/env python
... # coding: utf-8
...
... # # Working with Order Book Data: NASDAQ ITCH
...
... # The primary source of market data is the order book, which is continuously updated in real-time throughout the day to reflect all trading activity. Exchanges typically offer this data as a real-time service and may provide some historical data for free.
... #
... # The trading activity is reflected in numerous messages about trade orders sent by market participants. These messages typically conform to the electronic Financial Information eXchange (FIX) communications protocol for real-time exchange of securities transactions and market data or a native exchange protocol.
...
... # ## Background
...
... # ### The FIX Protocol
...
... # Just like SWIFT is the message protocol for back-office messaging (e.g., trade settlement), the [FIX protocol](https://www.fixtrading.org/standards/) is the de facto messaging standard for communication before and during trade execution between exchanges, banks, brokers, clearing firms, and other market participants. Fidelity Investments and Salomon Brothers introduced FIX in 1992 to facilitate electronic communication between broker-dealers and institutional clients, who until then had exchanged information over the phone.
... #
... # It became popular in global equity markets before expanding into foreign exchange, fixed income and derivatives markets, and further into post-trade to support straight-through processing. Exchanges provide access to FIX messages as a real-time data feed that is parsed by algorithmic traders to track market activity and, for example, identify the footprint of market participants and anticipate their next move.
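On the wire, a FIX message is a sequence of tag=value pairs delimited by the SOH byte (`\x01`). As a toy illustration (the order details below are made up; tags 35, 55, 54, and 38 are the standard MsgType, Symbol, Side, and OrderQty fields):

```python
# FIX messages are ASCII tag=value pairs delimited by the SOH byte (\x01).
# Toy fragment of a new-order-single (35=D); all values are hypothetical.
raw = '8=FIX.4.2\x0135=D\x0155=TSLA\x0154=1\x0138=100\x01'
fields = dict(pair.split('=', 1) for pair in raw.strip('\x01').split('\x01'))
# Tag 55 = Symbol, 54 = Side (1 = Buy), 38 = OrderQty
print(fields['55'], fields['54'], fields['38'])  # -> TSLA 1 100
```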
...
... # ### Nasdaq TotalView-ITCH Order Book data
...
... # While FIX has a dominant market share, exchanges also offer native protocols. Nasdaq offers a [TotalView-ITCH direct data-feed protocol](http://www.nasdaqtrader.com/content/technicalsupport/specifications/dataproducts/NQTVITCHspecification.pdf) that allows subscribers to track
... # individual orders for equity instruments from placement to execution or cancellation.
... #
... # As a result, it allows for the reconstruction of the order book, which keeps track of the active limit buy and sell orders for a specific security or financial instrument. The order book reveals the market depth throughout the day by listing the number of shares being bid or offered at each price point. It may also identify the market participant responsible for specific buy and sell orders unless orders are placed anonymously. Market depth is a key indicator of liquidity and of the potential price impact of sizable market orders.
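To make the notion of market depth concrete, here is a toy order book snapshot (all prices and sizes hypothetical):

```python
# Toy order book snapshot: bids sorted by descending price, asks ascending.
# Market depth is the shares available at each level; the inside spread is
# the gap between best bid and best ask. All numbers are hypothetical.
bids = [(175.01, 300), (175.00, 1200), (174.99, 800)]
asks = [(175.03, 500), (175.04, 900), (175.05, 400)]
best_bid, best_ask = bids[0][0], asks[0][0]
depth_bid = sum(shares for _, shares in bids)  # total shares bid: 2300
print('spread: {:.2f}, bid depth: {}'.format(best_ask - best_bid, depth_bid))
# -> spread: 0.02, bid depth: 2300
```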
...
... # The ITCH v5.0 specification declares over 20 message types related to system events, stock characteristics, the placement and modification of limit orders, and trade execution. It also contains information about the net order imbalance before the open and closing cross.
...
... # ## Imports
...
... # In[1]:
...
...
... import gzip
... import shutil
... from pathlib import Path
... from urllib.request import urlretrieve
... from urllib.parse import urljoin
... from clint.textui import progress
... from datetime import datetime
... import pandas as pd
... import numpy as np
... import matplotlib.pyplot as plt
... from matplotlib.ticker import FuncFormatter
... from struct import unpack
... from collections import namedtuple, Counter
... from datetime import timedelta
... from time import time
...
...
... # ## Get NASDAQ ITCH Data from FTP Server
...
... # The Nasdaq offers [samples](ftp://emi.nasdaq.com/ITCH/) of daily binary files for several months.
... #
... # We now illustrate how to parse a sample file of ITCH messages and reconstruct both the executed trades and the order book for any given tick.
...
... # The data is fairly large, and running the entire example can take considerable time and require substantial memory (16GB+). Also, the sample file used in this example may no longer be available because NASDAQ occasionally updates the sample files.
...
... # The following table shows the field layout of an Add Order with MPID Attribution ('F') message as defined in the specification:
...
... # | Name | Offset | Length | Value | Notes |
... # |-------------------------|---------|---------|------------|--------------------------------------------------------------------------------------|
... # | Message Type            | 0       | 1       | F          | Add Order with MPID Attribution message                                              |
... # | Stock Locate            | 1       | 2       | Integer    | Locate code identifying the security                                                 |
... # | Tracking Number | 3 | 2 | Integer | Nasdaq internal tracking number |
... # | Timestamp | 5 | 6 | Integer | Nanoseconds since midnight |
... # | Order Reference Number | 11 | 8 | Integer | The unique reference number assigned to the new order at the time of receipt. |
... # | Buy/Sell Indicator | 19 | 1 | Alpha | The type of order being added. B = Buy Order. S = Sell Order. |
... # | Shares | 20 | 4 | Integer | The total number of shares associated with the order being added to the book. |
... # | Stock | 24 | 8 | Alpha | Stock symbol, right padded with spaces |
... # | Price | 32 | 4 | Price (4) | The display price of the new order. Refer to Data Types for field processing notes. |
... # | Attribution | 36 | 4 | Alpha | Nasdaq Market participant identifier associated with the entered order |
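To see how the offsets and lengths in this table translate into a `struct` format string, here is a self-contained sketch that builds and unpacks a fake 40-byte message (all field values are made up; ITCH uses big-endian byte order, hence the `>` prefix):

```python
from struct import unpack, calcsize

# Format string matching the field layout in the table above (big-endian).
fmt = '>sHH6sQsI8sI4s'
assert calcsize(fmt) == 40  # the offsets/lengths in the table sum to 40 bytes

# Build a fake 40-byte message to demonstrate the round trip.
raw = (b'F'                                         # message type
       + (7181).to_bytes(2, 'big')                  # stock locate
       + (0).to_bytes(2, 'big')                     # tracking number
       + (34_200_000_000_000).to_bytes(6, 'big')    # ns since midnight (9:30)
       + (123456).to_bytes(8, 'big')                # order reference number
       + b'B'                                       # buy
       + (100).to_bytes(4, 'big')                   # shares
       + b'AAPL    '                                # stock, right-padded to 8
       + (1_750_000).to_bytes(4, 'big')             # price_4: 175.0000 * 10,000
       + b'NSDQ')                                   # attribution
fields = unpack(fmt, raw)
print(fields[7].strip(), fields[8] / 1e4)  # -> b'AAPL' 175.0
```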
...
... # ### Set Data paths
...
... # We will store the download in a `data` subdirectory and convert the result to HDF5 format (discussed in the last section of chapter 2).
...
... # In[80]:
...
...
... data_path = Path('C://Users//jloss//PyCharmProjects//NASDAQ-ITCH-5.0-VWAP-PARSER//data')
... itch_store = str(data_path / 'itch.h5')
... order_book_store = data_path / 'order_book.h5'
...
...
... # The FTP address, filename and corresponding date used in this example:
...
... # This is already updated from the 2018 example used in the book:
...
... # In[22]:
...
...
... FTP_URL = 'ftp://emi.nasdaq.com/ITCH/Nasdaq_ITCH/'
... SOURCE_FILE = '01302019.NASDAQ_ITCH50.gz'
...
...
... # ### Download & unzip
...
... # In[25]:
...
...
... def may_be_download(url):
...     """Download & unzip ITCH data if not yet available"""
...     filename = data_path / url.split('/')[-1]
...     if not data_path.exists():
...         print('Creating directory')
...         data_path.mkdir()
...     if not filename.exists():
...         print('Downloading...', url)
...         urlretrieve(url, filename)
...     unzipped = data_path / (filename.stem + '.bin')
...     if not unzipped.exists():
...         print('Unzipping to', unzipped)
...         with gzip.open(str(filename), 'rb') as f_in:
...             with open(unzipped, 'wb') as f_out:
...                 shutil.copyfileobj(f_in, f_out)
...     return unzipped
...
...
... # This will download 5.1GB data that unzips to 12.9GB.
...
... # In[26]:
...
...
... file_name = may_be_download(urljoin(FTP_URL, SOURCE_FILE))
... date = file_name.name.split('.')[0]
...
...
... # ## ITCH Format Settings
...
... # ### The `struct` module for binary data
...
... # The ITCH tick data comes in binary format. Python provides the `struct` module (see the [docs](https://docs.python.org/3/library/struct.html)) to parse binary data using format strings that identify the message elements by indicating the length and type of the various components of the byte string as laid out in the specification.
...
... # From the docs:
... #
... # > This module performs conversions between Python values and C structs represented as Python bytes objects. This can be used in handling binary data stored in files or from network connections, among other sources. It uses Format Strings as compact descriptions of the layout of the C structs and the intended conversion to/from Python values.
...
... # Let's walk through the critical steps to parse the trading messages and reconstruct the order book:
...
... # ### Defining format strings
...
... # The parser uses format strings according to the following formats dictionaries:
...
... # In[58]:
...
...
... event_codes = {'O': 'Start of Messages',
... 'S': 'Start of System Hours',
... 'Q': 'Start of Market Hours',
... 'M': 'End of Market Hours',
... 'E': 'End of System Hours',
... 'C': 'End of Messages'}
...
...
... # In[59]:
...
...
... encoding = {'primary_market_maker': {'Y': 1, 'N': 0},
... 'printable' : {'Y': 1, 'N': 0},
... 'buy_sell_indicator' : {'B': 1, 'S': -1},
... 'cross_type' : {'O': 0, 'C': 1, 'H': 2},
... 'imbalance_direction' : {'B': 0, 'S': 1, 'N': 0, 'O': -1}}
...
...
... # In[60]:
...
...
... formats = {
... ('integer', 2): 'H',
... ('integer', 4): 'I',
... ('integer', 6): '6s',
... ('integer', 8): 'Q',
... ('alpha', 1) : 's',
... ('alpha', 2) : '2s',
... ('alpha', 4) : '4s',
... ('alpha', 8) : '8s',
... ('price_4', 4): 'I',
... ('price_8', 8): 'Q',
... }
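For instance, looking up each field's `(value, length)` pair in this dictionary and concatenating the results yields the format string for a message. A small sketch (the field specs below are a hypothetical subset of a real message):

```python
# Sketch: resolve (value, length) field specs to struct codes via the
# formats dict above and join them into a big-endian format string.
formats = {('integer', 2): 'H', ('integer', 6): '6s',
           ('alpha', 8): '8s', ('price_4', 4): 'I'}
field_specs = [('integer', 2),   # stock locate
               ('integer', 2),   # tracking number
               ('integer', 6),   # timestamp
               ('alpha', 8),     # stock
               ('price_4', 4)]   # price
fstring = '>' + ''.join(formats[spec] for spec in field_specs)
print(fstring)  # -> >HH6s8sI
```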
...
...
... # ### Create message specs for binary data parser
...
... # The ITCH parser relies on message specifications that we create in the following steps.
...
... # #### Load Message Types
...
... # The file `message_types.xlsx` contains the message type specs as laid out in the [documentation](https://www.nasdaqtrader.com/content/technicalsupport/specifications/dataproducts/NQTVITCHSpecification.pdf).
...
... # In[61]:
...
...
... message_data = (pd.read_excel('C://Users//jloss//PyCharmProjects//NASDAQ-ITCH-5.0-VWAP-PARSER//src//message_types.xlsx',
... sheet_name='messages',
... encoding='latin1')
... .sort_values('id')
... .drop('id', axis=1))
...
...
... # #### Basic Cleaning
...
... # The function `clean_message_types()` just runs a few basic string cleaning steps.
...
... # In[62]:
...
...
... def clean_message_types(df):
...     df.columns = [c.lower().strip() for c in df.columns]
...     df.value = df.value.str.strip()
...     df.name = (df.name
...                .str.strip()  # remove whitespace
...                .str.lower()
...                .str.replace(' ', '_')
...                .str.replace('-', '_')
...                .str.replace('/', '_'))
...     df.notes = df.notes.str.strip()
...     df['message_type'] = df.loc[df.name == 'message_type', 'value']
...     return df
...
...
... # In[63]:
...
...
... message_types = clean_message_types(message_data)
...
...
... # #### Get Message Labels
...
... # We extract message type codes and names so we can later make the results more readable.
...
... # In[64]:
...
...
... message_labels = (message_types.loc[:, ['message_type', 'notes']]
... .dropna()
... .rename(columns={'notes': 'name'}))
... message_labels.name = (message_labels.name
... .str.lower()
... .str.replace('message', '')
... .str.replace('.', '')
... .str.strip().str.replace(' ', '_'))
... # message_labels.to_csv('message_labels.csv', index=False)
... message_labels.head()
...
...
... # ### Finalize specification details
...
... # Each message consists of several fields that are defined by offset, length and type of value. The `struct` module will use this format information to parse the binary source data.
...
... # In[65]:
...
...
... message_types.message_type = message_types.message_type.ffill()
... message_types = message_types[message_types.name != 'message_type']
... message_types.value = (message_types.value
... .str.lower()
... .str.replace(' ', '_')
... .str.replace('(', '')
... .str.replace(')', ''))
... message_types.info()
...
...
... # In[68]:
...
...
... message_types.head()
...
...
... # Optionally, persist/reload from file:
...
... # In[67]:
...
...
... message_types.to_csv('message_types.csv', index=False)
... message_types = pd.read_csv('message_types.csv')
...
...
... # The parser translates the message specs into format strings and namedtuples that capture the message content. First, we create `(type, length)` formatting tuples from ITCH specs:
...
... # In[72]:
...
...
... message_types.loc[:, 'formats'] = (message_types[['value', 'length']]
... .apply(tuple, axis=1).map(formats))
...
...
... # Then, we extract formatting details for alphanumerical fields
...
... # In[73]:
...
...
... alpha_fields = message_types[message_types.value == 'alpha'].set_index('name')
... alpha_msgs = alpha_fields.groupby('message_type')
... alpha_formats = {k: v.to_dict() for k, v in alpha_msgs.formats}
... alpha_length = {k: v.add(5).to_dict() for k, v in alpha_msgs.length}
...
...
... # We generate message classes as named tuples and format strings
...
... # In[74]:
...
...
... message_fields, fstring = {}, {}
... for t, message in message_types.groupby('message_type'):
...     message_fields[t] = namedtuple(typename=t, field_names=message.name.tolist())
...     fstring[t] = '>' + ''.join(message.formats.tolist())
...
...
... # Fields of `alpha` type (alphanumeric) require post-processing as defined in the `format_alpha` function:
...
... # In[75]:
...
...
... def format_alpha(mtype, data):
...     """Process byte strings of type alpha"""
...     for col in alpha_formats.get(mtype).keys():
...         if mtype != 'R' and col == 'stock':
...             data = data.drop(col, axis=1)
...             continue
...         data.loc[:, col] = data.loc[:, col].str.decode("utf-8").str.strip()
...         if encoding.get(col):
...             data.loc[:, col] = data.loc[:, col].map(encoding.get(col))
...     return data
...
...
... # ## Process Binary Message Data
...
... # The binary file for a single day contains over 350,000,000 messages amounting to more than 12 GB.
...
... # In[76]:
...
...
... def store_messages(m):
...     """Handle occasional storing of all messages"""
...     with pd.HDFStore(itch_store) as store:
...         for mtype, data in m.items():
...             # convert to DataFrame
...             data = pd.DataFrame(data)
...
...             # parse timestamp info
...             data.timestamp = data.timestamp.apply(int.from_bytes, byteorder='big')
...             data.timestamp = pd.to_timedelta(data.timestamp)
...
...             # apply alpha formatting
...             if mtype in alpha_formats.keys():
...                 data = format_alpha(mtype, data)
...
...             s = alpha_length.get(mtype)
...             if s:
...                 s = {c: s.get(c) for c in data.columns}
...             dc = ['stock_locate']
...             if mtype == 'R':
...                 dc.append('stock')
...             store.put(mtype,
...                       data,
...                       format='t',
...                       min_itemsize=s,
...                       data_columns=dc)
...
...
... # In[77]:
...
...
... messages = {}
... message_count = 0
... message_type_counter = Counter()
...
...
... # To stay within memory constraints, the script iteratively appends the parsed results to a file in the fast HDF5 format using the `store_messages()` function we just defined (see the last section in chapter 2 for more on this format).
...
... # The following (simplified) code processes the binary file and produces the parsed orders stored by message type:
...
... # In[78]:
...
...
... start = time()
... with file_name.open('rb') as data:
...     while True:
...
...         # determine message size in bytes
...         message_size = int.from_bytes(data.read(2), byteorder='big', signed=False)
...
...         # get message type by reading first byte
...         message_type = data.read(1).decode('ascii')
...
...         # create data structure to capture result
...         if not messages.get(message_type):
...             messages[message_type] = []
...
...         message_type_counter.update([message_type])
...
...         # read & store message
...         record = data.read(message_size - 1)
...         message = message_fields[message_type]._make(unpack(fstring[message_type], record))
...         messages[message_type].append(message)
...
...         # deal with system events
...         if message_type == 'S':
...             timestamp = int.from_bytes(message.timestamp, byteorder='big')
...             print('\n', event_codes.get(message.event_code.decode('ascii'), 'Error'))
...             print('\t{0}\t{1:,.0f}'.format(timedelta(seconds=timestamp * 1e-9),
...                                            message_count))
...             if message.event_code.decode('ascii') == 'C':
...                 store_messages(messages)
...                 break
...
...         message_count += 1
...         if message_count % 2.5e7 == 0:
...             timestamp = int.from_bytes(message.timestamp, byteorder='big')
...             print('\t{0}\t{1:,.0f}\t{2}'.format(timedelta(seconds=timestamp * 1e-9),
...                                                 message_count,
...                                                 timedelta(seconds=time() - start)))
...             store_messages(messages)
...             messages = {}
...
...
... print(timedelta(seconds=time() - start))
...
...
... # ## Summarize Trading Day
...
... # ### Trading Message Frequency
...
... # In[79]:
...
...
... counter = pd.Series(message_type_counter).to_frame('# Trades')
... counter['Message Type'] = counter.index.map(message_labels.set_index('message_type').name.to_dict())
... counter = counter[['Message Type', '# Trades']].sort_values('# Trades', ascending=False)
... print(counter)
...
...
... # In[81]:
...
...
... with pd.HDFStore(itch_store) as store:
...     store.put('summary', counter)
...
...
... # ### Top Equities by Traded Value
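The section title suggests ranking stocks by traded value, i.e., shares times price. A minimal sketch of that computation (using a toy DataFrame here; in the notebook the same columns would come from the stored 'P' trade messages, with `price_4` values carrying four implied decimals):

```python
import pandas as pd

# Rank stocks by traded value = shares * price. Toy data stands in for the
# trade ('P') messages stored above; prices are fixed-point with 4 decimals.
trades = pd.DataFrame({'stock': ['TSLA', 'AAPL', 'TSLA'],
                       'shares': [100, 200, 50],
                       'price': [2_950_000, 1_750_000, 2_951_000]})
trades['value'] = trades.shares * trades.price / 1e4
by_value = trades.groupby('stock')['value'].sum().sort_values(ascending=False)
print(by_value)
```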
...
C:\Users\jloss\venv\ITCH50parser\lib\site-packages\pandas\core\generic.py:5208: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[name] = value
<class 'pandas.core.frame.DataFrame'>
Int64Index: 152 entries, 1 to 172
Data columns (total 6 columns):
name 152 non-null object
offset 152 non-null int64
length 152 non-null int64
value 152 non-null object
notes 152 non-null object
message_type 152 non-null object
dtypes: int64(2), object(4)
memory usage: 8.3+ KB
Start of Messages
3:03:59.687761 0
Start of System Hours
4:00:00.000181 219,799
Start of Market Hours
9:30:00.000036 10,532,163
9:39:49.689879 25,000,000 0:01:19.884404
10:01:44.569840 50,000,000 0:04:43.833597
10:26:57.655610 75,000,000 0:07:58.918175
10:55:11.316923 100,000,000 0:11:11.258512
11:23:10.732310 125,000,000 0:14:24.130732
11:57:40.768604 150,000,000 0:17:37.293754
12:36:14.416343 175,000,000 0:20:49.210650
13:22:12.680450 200,000,000 0:24:02.729754
14:00:58.959369 225,000,000 0:27:20.667001
14:20:21.253174 250,000,000 0:30:37.013699
14:42:13.639272 275,000,000 0:33:49.794699
15:05:20.251304 300,000,000 0:37:03.289516
15:30:15.801362 325,000,000 0:40:24.247659
15:53:07.026351 350,000,000 0:43:42.851925
End of Market Hours
16:00:00.000113 365,323,584
End of System Hours
20:00:00.000021 368,335,086
End of Messages
20:05:00.000034 368,366,633
0:48:10.344354
Message Type # Trades
A add_order_no_mpid_attribution 162970455
D order_delete 158273361
U order_replace 27222746
E order_executed 8096995
X order_cancel 4669874
I noii 3684511
F add_order_mpid_attribution 1725898
P trade 1326184
L market_participant_position 193769
C order_executed_with_price 158886
Q cross_trade 17430
Y reg_sho_short_sale_price_test_restricted_indic... 8821
H stock_trading_action 8805
R stock_directory 8714
B broken_trade 116
J luld_auction_collar 62
S system_event 6
V market_wide_circuit_breaker_decline_level 1
>>> from main import *
...
...
... # build order book flow for the given day
... stock = 'TSLA'
... order_dict = {-1: 'sell', 1: 'buy'}
...
... # get all messages for the chosen stock
... def get_messages(date, stock=stock):
...     with pd.HDFStore(itch_store) as store:
...         stock_locate = store.select('R', where='stock = stock').stock_locate.iloc[0]
...         target = 'stock_locate = stock_locate'
...
...         data = {}
...         trading_msgs = ['A', 'F', 'E', 'C', 'X', 'D', 'U', 'P', 'Q']
...         for msg in trading_msgs:
...             data[msg] = store.select(msg, where=target).drop('stock_locate', axis=1).assign(type=msg)
...
...     # order attributes shared across message types
...     order_cols = ['order_reference_number', 'buy_sell_indicator', 'shares', 'price']
...
...     # 'A' and 'F' are Add Order messages (without and with MPID attribution)
...     orders = pd.concat([data['A'], data['F']], sort=False, ignore_index=True).loc[:, order_cols]
...
...     # merge order details into execution, cancel and delete messages
...     for msg in trading_msgs[2:-3]:  # ['E', 'C', 'X', 'D']
...         data[msg] = data[msg].merge(orders, how='left')
...
...     # 'U': an order on the book has been cancel-replaced
...     data['U'] = data['U'].merge(orders, how='left',
...                                 right_on='order_reference_number',
...                                 left_on='original_order_reference_number',
...                                 suffixes=['', '_replaced'])
...
...     # cross trade messages
...     data['Q'].rename(columns={'cross_price': 'price'}, inplace=True)
...
...     # order cancel messages
...     data['X']['shares'] = data['X']['cancelled_shares']
...     data['X'] = data['X'].dropna(subset=['price'])
...
...     data = pd.concat([data[msg] for msg in trading_msgs],
...                      ignore_index=True,
...                      sort=False)
...
...     data['date'] = pd.to_datetime(date, format='%m%d%Y')
...     data.timestamp = data['date'].add(data.timestamp)
...     data = data[data.printable != 0]
...
...     drop_cols = ['tracking_number', 'order_reference_number', 'original_order_reference_number',
...                  'cross_type', 'new_order_reference_number', 'attribution', 'match_number',
...                  'printable', 'date', 'cancelled_shares']
...     return data.drop(drop_cols, axis=1).sort_values('timestamp').reset_index(drop=True)
...
...
... messages = get_messages(date = date)
... messages.info(null_counts = True)
...
... with pd.HDFStore(order_book_store) as store:
...     key = '{}/messages'.format(stock)
...     store.put(key, messages)
...     print(store.info())
...
... # combine trade orders (reconstruct successful trades)
... def get_trades(msg):
...     """Combine C, E, P and Q messages into trading records"""
...     trade_dict = {'executed_shares': 'shares', 'execution_price': 'price'}
...     cols = ['timestamp', 'executed_shares']
...     trades = pd.concat([msg.loc[msg.type == 'E', cols + ['price']].rename(columns=trade_dict),
...                         msg.loc[msg.type == 'C', cols + ['execution_price']].rename(columns=trade_dict),
...                         msg.loc[msg.type == 'P', ['timestamp', 'price', 'shares']],
...                         msg.loc[msg.type == 'Q', ['timestamp', 'price', 'shares']].assign(cross=1),
...                         ], sort=False).dropna(subset=['price']).fillna(0)
...     return trades.set_index('timestamp').sort_index().astype(int)
...
...
... trades = get_trades(messages)
... print(trades.info())
...
... with pd.HDFStore(order_book_store) as store:
...     store.put('{}/trades'.format(stock), trades)
...
... # create orders - accumulate sell orders in ascending and buy orders in desc. order for given timestamps
... def add_orders(orders, buysell, nlevels):
...     new_order = []
...     items = sorted(orders.copy().items())
...     if buysell == 1:
...         items = reversed(items)
...     for i, (p, s) in enumerate(items, 1):
...         new_order.append((p, s))
...         if i == nlevels:
...             break
...     return orders, new_order
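A quick self-contained check of this logic on a toy buy-side book (prices in fixed-point, as elsewhere in this script; the function body repeats the one above so the snippet runs on its own):

```python
from collections import Counter

# Same logic as add_orders above: keep the best nlevels price levels,
# highest price first for buys (buysell == 1), lowest first for sells.
def add_orders(orders, buysell, nlevels):
    new_order = []
    items = sorted(orders.copy().items())
    if buysell == 1:
        items = reversed(items)
    for i, (p, s) in enumerate(items, 1):
        new_order.append((p, s))
        if i == nlevels:
            break
    return orders, new_order

buys = Counter({2950000: 100, 2949900: 250, 2950100: 50})
_, top = add_orders(buys, buysell=1, nlevels=2)
print(top)  # -> [(2950100, 50), (2950000, 100)]
```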
...
... # save orders
... def save_orders(orders, append=False):
...     cols = ['price', 'shares']
...     for buysell, book in orders.items():
...         df = (pd.concat([pd.DataFrame(data=data, columns=cols).assign(timestamp=t)
...                          for t, data in book.items()]))
...         key = '{}/{}'.format(stock, order_dict[buysell])
...         df.loc[:, ['price', 'shares']] = df.loc[:, ['price', 'shares']].astype(int)
...         with pd.HDFStore(order_book_store) as store:
...             if append:
...                 store.append(key, df.set_index('timestamp'), format='t')
...             else:
...                 store.put(key, df.set_index('timestamp'))
...
... ## iterate over all ITCH msgs to process orders/replacement orders as specified:
... order_book = {-1:{}, 1:{}}
... current_orders = {-1: Counter(), 1: Counter()}
... message_counter = Counter()
... nlevels = 100
...
... start = time()
... for msg in messages.itertuples():
...     i = msg[0]
...     if i % 1e5 == 0 and i > 0:
...         print('{:,.0f}\t\t{}'.format(i, timedelta(seconds=time() - start)))
...         save_orders(order_book, append=True)
...         order_book = {-1: {}, 1: {}}
...         start = time()
...     if np.isnan(msg.buy_sell_indicator):
...         continue
...     message_counter.update(msg.type)
...
...     buysell = msg.buy_sell_indicator
...     price, shares = None, None
...
...     if msg.type in ['A', 'F', 'U']:
...         price = int(msg.price)
...         shares = int(msg.shares)
...         current_orders[buysell].update({price: shares})
...         current_orders[buysell], new_order = add_orders(current_orders[buysell], buysell, nlevels)
...         order_book[buysell][msg.timestamp] = new_order
...
...     if msg.type in ['E', 'C', 'X', 'D', 'U']:
...         if msg.type == 'U':
...             if not np.isnan(msg.shares_replaced):
...                 price = int(msg.price_replaced)
...                 shares = -int(msg.shares_replaced)
...         else:
...             if not np.isnan(msg.price):
...                 price = int(msg.price)
...                 shares = -int(msg.shares)
...         if price is not None:
...             current_orders[buysell].update({price: shares})
...             if current_orders[buysell][price] <= 0:
...                 current_orders[buysell].pop(price)
...             current_orders[buysell], new_order = add_orders(current_orders[buysell], buysell, nlevels)
...             order_book[buysell][msg.timestamp] = new_order
...
...
... message_counter = pd.Series(message_counter)
... print(message_counter)
...
... with pd.HDFStore(order_book_store) as store:
...     print(store.info())
...
...
...
...
...
...
...
...
...
...
...
...
...
...
message_type name
0 S system_event
5 R stock_directory
23 H stock_trading_action
31 Y reg_sho_short_sale_price_test_restricted_indic...
37 L market_participant_position
<class 'pandas.core.frame.DataFrame'>
Int64Index: 152 entries, 1 to 172
Data columns (total 6 columns):
name 152 non-null object
offset 152 non-null int64
length 152 non-null int64
value 152 non-null object
notes 152 non-null object
message_type 152 non-null object
dtypes: int64(2), object(4)
memory usage: 8.3+ KB
('\n', None)
('\n', name offset length value notes message_type
1 stock_locate 1 2 integer Always 0 S)
('\n', 'Start of Messages')
3:03:59.687761 0
('\n', 'Start of System Hours')
4:00:00.000181 219,799
('\n', 'Start of Market Hours')
9:30:00.000036 10,532,163
9:39:49.689879 25,000,000 0:01:29.215000
10:01:44.569840 50,000,000 0:03:02.431030
10:26:57.655610 75,000,000 0:04:36.968581
10:55:11.316923 100,000,000 0:06:12.014789
11:23:10.732310 125,000,000 0:07:47.788707
11:57:40.768604 150,000,000 0:09:26.672726
12:36:14.416343 175,000,000 0:11:01.649949
13:22:12.680450 200,000,000 0:12:43.246565
14:00:58.959369 225,000,000 0:14:22.839424
14:20:21.253174 250,000,000 0:15:57.240454
14:42:13.639272 275,000,000 0:17:33.944669
15:05:20.251304 300,000,000 0:19:13.228699
15:30:15.801362 325,000,000 0:20:50.367699
15:53:07.026351 350,000,000 0:22:23.574699
('\n', 'End of Market Hours')
16:00:00.000113 365,323,584
('\n', 'End of System Hours')
20:00:00.000021 368,335,086
('\n', 'End of Messages')
20:05:00.000034 368,366,633
0:23:46.911519
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88335 entries, 0 to 88334
Data columns (total 9 columns):
timestamp 88335 non-null datetime64[ns]
buy_sell_indicator 68612 non-null float64
shares 70208 non-null float64
price 70208 non-null float64
type 88335 non-null object
executed_shares 8420 non-null float64
execution_price 7 non-null float64
shares_replaced 37 non-null float64
price_replaced 37 non-null float64
dtypes: datetime64[ns](1), float64(7), object(1)
memory usage: 6.1+ MB
<class 'pandas.io.pytables.HDFStore'>
File path: C:\Users\jloss\PyCharmProjects\NASDAQ-ITCH-5.0-VWAP-PARSER\data\order_book.h5
/TSLA/messages frame (shape->[88335,9])
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 13172 entries, 2019-01-30 15:53:07.518469637 to 2019-01-30 19:59:58.535897223
Data columns (total 3 columns):
shares 13172 non-null int32
price 13172 non-null int32
cross 13172 non-null int32
dtypes: int32(3)
memory usage: 257.3 KB
None
A 30103
P 5375
E 7789
D 25142
X 130
F 29
U 37
C 7
dtype: int64
<class 'pandas.io.pytables.HDFStore'>
File path: C:\Users\jloss\PyCharmProjects\NASDAQ-ITCH-5.0-VWAP-PARSER\data\order_book.h5
/TSLA/messages frame (shape->[88335,9])
/TSLA/trades frame (shape->[13172,3])