Ezequiel Mastrasso

View Original

Data Analysis meets Personal Finances – Part One

An insight into your finances

Gathering the information

In this blog entry i will go roughly through the process to get my financial transactions for the last 8 years into QIF format that any financial software can read.
Once you have all Incoming & Outgoing Financial Records into a dataset, you can graph, dissect, and make projections of your finances and savings.

Data wrangling – the bad

My bank doesn’t give you .csv files for transactions older than 12 months. This left me with a bunch of .pdf files with transactions for 8 years, and a csv file for the last year.

If your online banking gives you csv from the very beginning, you are lucky! skip this section about how i scrapped the information from different sources

This might as well be your case once you moved banks, you are left with pdfs, and no online banking to create the .csv.
As a starting point i wanted to have all the data as csv, a good old and reliable format that you can easily modify by hand if necessary.
I had to consolidate all this data from different sources into csv format.

Csv and QIF formats are widely used in personal financial software’s out there.

In python using a combination of PyPDF2, textract, regex -and some scrapping- I came up with a python code to decipher -my bank’s specifically- the pdf files. The code was different for the 1st page and last 2 pages to accommodate differences in tables (first page), balance forwards (2nd and Nnd pages), and closing balance on the last page.

This got me 3000+ entries in a .csv with all the transactions for the last 8 years! Great!

Categories Planning

Now it was time to do a bit of clean up and categorizing, of the .csv file. The file contained Date,Description, In,Out,Balance fields.
I started reading through the description list trying to make some sense. It was clear that it will need different approaches to group different transactions.

Before diving into assigning categories I sat down and started thinking what was my expending modus operandi.
Everyone has their own, either tapping for every little transaction, or extracting monies for the week, the month,
where does you daily commuting fares come from, do you use cash for dining out, or the card, etc.

Grouping them properly will be the foundation of any financial dissection, planning, and graphing you will make.
This is where you don’t jump into it: you draw up a plan of nested categories that would make sense according to how you dispose of your money.
I had a few attempts at categories, before deciding for one.

Data Primer – Categories – Applying categories

Ready to categorize all 3000+ entries!

Regex – Descriptions

Regular expressions were extremely useful to separate for example POS (Point of Sale) and ATM.
They start with POS or ATM, followed by a date.
Overseas extractions and expenses got a similar treatment where the last part is the amount extracted, followed by a @ the currency exchange rate.
If the name of a supermarket is in the name field, goes to groceries -unless it starts with POS or ATM- or ends with @
and currency rate. You get the idea.

Difflib – Descriptions

A second take on categories was done using difflib and string sequences approximation ratio.
Any entry that fell out of the regexs, were tackled by difflib by similarities in the Description.
After some trial an error to test approximation ratios of different description strings, it was easy to process the remaining un-categorized entries.

Other entries that were still non categorized went into the category unknown for later manual assignments.

The financial software

Having the csv file, categorized, and ready: It was the perfect time to start testing the different software with the actual data.

What my expectations were for the software:

  1. Online is not required.
    I won’t be checking the finances, entries, graphs, and planning anything on the go, or every that often to need it online

  2. Mobile not required, Desktop is!
    Doesn’t make much sense to see financial data in the a small screen, even for something simple as dissecting expenses of the last 3 months I want screen real state.

  3. Ability to get data in and out easily for analysis

  4. Ability to graph, and do some basic analysis of your expenses data in the software

  5. Simple, no “accountants” ubber-finances/double entry required, but enough to have different accounts, assets, liabilities, etc.

  6. Connection to my bank is not required (and not wanted!).
    Call me old-school, but i try not to connect things that do not need to be connected.
    More over, I wouldn’t be able to sleep at night knowing my “online financial software” is connected to my online bank!

  7. Linux version? is a nice to have plus -as i use RHEL- but I don’t mind what OS is for, if required running this on a VM is not a bother.

  8. Licensed or free, I don’t mind, for something as important as this for your financial future, you can spare a few bucks!

Tested software:

  • PL Cash

  • GnuCash

  • Buddi

  • AceMoney

  • Moneydance

  • YNAB

  • QUICKEN

  • Mint

  • HomeBank

  • KMyMoney

  • Banktivity

  • Money Manager Ex

  • FireFly

Given these expectations, some options were discarded after the basic testing, due to FOMO (fear of missing out) features I had to give them a chance anyway.
This was the list I wrote down to make sure I got the right software for me, some of them really caught my eye on simplicity, ui, etc.
Obviously certain aspects of each one of the software became part of my expectations and the list grew on the go.

I won’t go into details of each software as this is not a software review rant.

MoneyDance was the absolute winner of the list.
It’s simplicity to use, able to have different accounts, assets, budgets, budget tracking (categories driven), double entry is optional, and its amazing graphing flexibility was top notch.
Exactly what I was expecting from the software.

It also has a couple of extensions like:
-Custom home screen, to show specifics about your expenditures/savings ratios
-Balance predictor/money foresight that given an section of time in the past predicts your balances in the future.

2 handy extensions, I specially liked the balance predictor as a way to project future balance.

Graph here a random generated dataset with random expenditures peaks to show