Interactive Brokers provide activity statements in various formats, including PDF or CSV. That said, if you have a large number of transactions, it is better to insert all your IB statements into a database, which you can query as needed. It saves a lot of time. It automates logging of your trades. It makes performance analysis, accounting, and taxes much easier.
Below I introduce the rough approach how I do it. I will not go into too much detail of individual table schemas and scripts, just the big picture and a few useful hints. I use Python and MariaDB, but the logic is also valid for other programming languages and DBMS.
Downloading CSV statements from IB
First step is to download the statements from IB.
Go to your Account Management, Statements, choose Activity Statements.
Choose the CSV format, which is easiest to process.
IB offers them in various frequencies – daily, monthly, annual, or custom periods.
For my database I choose daily, as I want daily mark-to-market and valuations.
I also download monthly and annual, but only use some of the data from those (the transactions are obviously the same).
I download all the statements manually, which is fast, as the IB statement download interface has improved recently (previously the form did not remember the date you requested last, so you always had to click through the calendars when downloading statements from a more distant past – not it does and everything is quick).
IB CSV statement structure
IB Activity Statement CSV has a variable number of columns, but the first two are always the same.
The first column is statement section, which can have the following values (there may be more if you trade something different than I do):
- Account Information
- Net Asset Value
- Change in NAV
- Mark-to-Market Performance Summary
- Realized & Unrealized Performance Summary
- Month & Year to Date Performance Summary
- Cash Report
- Open Positions
- Collateral for Customer Borrowing
- Forex Balances
- Interest Accruals
- Financial Instrument Information
- Base Currency Exchange Rate
- Notes/Legal Notes
Each of these statement sections has different structure (different column count and meaning), but there are a few useful features:
- Number and meaning of columns is consistent within the same section.
- Rows belonging to same section are consecutive.
- The first row of each section is header row.
- Second column in all sections indicates row type. It can be either "Header" or "Data".
For further processing it is best to split the statement CSV file into individual section CSV files.
Splitting statement CSV into sections
A few hints (Python):
Read the CSV with
readlines(). Something like
pandas.read_csv() will not work due to the inconsistent structure between sections.
in_path = ... # path to CSV file
with open(in_path, 'r') as f:
rows = f.readlines()
Because the first column is always section name, which we need to allocate the row to the right section, we can split each row by the first comma into two parts:
- Section name
- Rest of row
Just beware double quotes. Some section names contain a comma and may be enclosed in double quotes.
The output of this process can be a
dict with keys = section names and values = list of rows for that section. Then it is easy to create individual section CSVs from that.
Most of the above listed statement sections have a corresponding table in my database. Therefore, the schema is mostly determined by the IB CSV statement sections and columns (with only minor changes in most tables).
There are five parts of my IB statements database:
- Balance = balances at the end of statement period
- Performance = position value changes, interest accruals, and other items affecting total profit/loss for the statement period
- Statement = meta and aggreate information about each statement
- Structure = meta and helper tables with information about the database structure
- Transactions = trades, interest, fees, deposits and withdrawals during the statement period
Each part includes several tables, which are listed below.
bal_cash= "Forex Balances" CSV statement section
bal_complex_positions= "Complex Positions Summary"
bal_nav= "Net Asset Value"
bal_positions= "Open Positions"
per_cash_changes= "Cash Report"
per_interest_accruals= "Interest Accruals"
per_mtm= "Mark-to-Market Performance Summary"
per_nav_changes= "Change in NAV"
per_performance= "Realized & Unrealized Performance Summary"
per_ytd_performance= "Month & Year to Date Performance Summary"
Tables in this part contain meta and aggregate data for individual statements, such as exchange rates used to valuate non-base currency positions or total profit/loss and performance for the statement period.
smt_currency_rates= "Base Currency Exchange Rate"
smt_notes= "Notes-Legal Notes"
smt_statements= "Statement" + "Total P-L" + "Total Return for Statement Period"
smt_symbols= "Financial Instrument Information"
smt_symbols contains symbols and other specifications for any financial instrument that appears in the statements. I only insert each symbol the first time, from the first statement where it appears.
Tables in this part do not correspond to CSV statement sections. They are mainly for my own reference.
str_columns= dict of CSV statement section column names (values in the Header column) to database table column names
str_instrument_types= equity / futures / option / forex
str_sections= dict of CSV sections to database table names
str_statement_types= daily / monthly / yearly / custom range / special
str_columns are used by the update script to find the correct database table and column for every CSV statement section and column.
str_statement_types is used to give each statement a standardized name. For instance, all daily statements are named "Dyymmdd" (year, month, day), all monthly statements are named "Myymm" etc.
trn_funding= "Deposits & Withdrawals"
Transactions are the same in all statements covering a particular trading day. Therefore, I only insert them from the daily statements to avoid duplicates.