Pattern sniffing
Every three months, a special kind of dread would settle over me. Accounting time. The ritual was always the same:
- Download bank statements as PDFs (why PDFs? Why not?)
- Gather all digital receipts from my inbox.
- Armed with green, yellow, blue and red highlighters in Adobe Acrobat, I start the matching game.
- Highlight the bank statement: green for «found the receipt», red for «receipt missing + an explanation why», blue for taxes, salaries or income, and yellow for anything out of the ordinary.
- Repeat for all 3 months in the quarter while wishing I could spend the time designing or coding something else.
By the end, I have a marked-up PDF that looks like this:
The better way
After building my previous custom tools, I started wondering: could an AI understand the visual patterns of invoice PDFs, sniff out the total and match it to the bank statement?
I gathered a stack of typical invoices (hosting providers, equipment rentals, software subscriptions) and asked Claude: «Can you identify patterns in where the total amount appears on different types of invoices?» Here is the first prompt:
I need some help finding and matching sums on incoming invoices to an account balance. Can you help me with:
- Look at the amounts in the «Belopp» column in Transaktioner_jan_2025.csv.
- Find an invoice in the «Invoices» directory that has a sum that also exists in the «Belopp» column in the account balance pdf.
- Invoices that have a matching sum shall be moved to the «Found» directory.
- Create a .md file that lists the names of all the invoices found and the sum that matched.
The path to the directory in which you find the files is: /a-pretty/long/path/where/my-bokföringsfiler-is/2025/Q1/
To help you find the amounts on the PDFs in the «Invoices» directory, the sums can be near words like:
«Att betala», «Total», «Belopp», «Summa att betala», «Total due», «Brutto» and «Amount paid».
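If you leave out reading the PDFs, the task in that prompt is a lookup: collect the amounts from the statement, then keep the invoices whose total is among them. Below is a minimal, standard-library-only Python sketch. It assumes the invoice totals have already been extracted into a dict. A few things in it are my own assumptions for illustration, not taken from a real bank export or from Claude's output:
- the semicolon delimiter;
- the «1 234,50» number format;
- the function names `parse_amount`, `load_belopp` and `file_matches`.

```python
import csv
import re
import shutil
from decimal import Decimal
from pathlib import Path


def parse_amount(raw: str) -> Decimal:
    """Parse '-1 234,50', '1.234,50' or '1,234.50' (assumes two decimals when present)."""
    s = raw.strip().replace(" ", "").replace("\u00a0", "")
    if re.search(r"[.,]\d{2}$", s):
        # The last separator is the decimal point; any earlier ones group thousands.
        return Decimal(re.sub(r"[.,]", "", s[:-3]) + "." + s[-2:])
    return Decimal(re.sub(r"[.,]", "", s))


def load_belopp(csv_path: Path, column: str = "Belopp", delimiter: str = ";") -> set[Decimal]:
    """Read the statement's amount column; payments are negative, so keep absolute values."""
    with csv_path.open(newline="", encoding="utf-8-sig") as f:
        rows = csv.DictReader(f, delimiter=delimiter)
        return {abs(parse_amount(row[column])) for row in rows if (row.get(column) or "").strip()}


def file_matches(invoice_totals: dict[str, Decimal], amounts: set[Decimal],
                 invoice_dir: Path, found_dir: Path, report: Path) -> dict[str, Decimal]:
    """Move invoices whose total is in the statement to found_dir and list them in a .md file."""
    matches = {name: total for name, total in invoice_totals.items() if total in amounts}
    found_dir.mkdir(parents=True, exist_ok=True)
    lines = ["| Invoice | Matched sum |", "| --- | --- |"]
    for name, total in sorted(matches.items()):
        shutil.move(str(invoice_dir / name), str(found_dir / name))
        lines.append(f"| {name} | {total} |")
    report.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return matches
```

The hard part is the step this sketch skips: producing `invoice_totals` from PDFs with different layouts.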
The short answer was «No». But it took a while to get there, and that is one of my big issues with AI: like dogs, they try their hardest to please and fulfil the task you give them. Claude tried and tried and tried, with a million different ways of trying:
… let’s examine each pdf.
Let’s improve the approach.
Let me try again with a simpler approach.
Let me try a different approach.
Let me manually check a few of the pdf.
Let me try a more systematic approach.
Let me try a more straightforward approach.
Let me try a simplified approach
So after a while, I interrupted and asked what the blockers were.
- Claude doesn’t have reliable PDF parsing capabilities.
- Each invoice likely has a different layout and format.
- Text recognition limitations.
- Some of the files have Swedish characters, which adds another layer of complexity to text extraction.
«Currently, I’m essentially trying to match numbers without being able to properly see what’s in the documents.» was the final answer.
Fine, Claude, no problem. But why didn’t you say that straight up? Let’s build an app instead!
Building without a prompt or wireframes
For the image bank solution, I wireframed what I wanted; for this one, I didn’t care. What mattered was how precisely the app would find and match amounts.
And since Claude had spent
After a bit of back and forth, it was decided the best solution would be a Python app that would try to find sums in close proximity to keywords like:
- «Total Due»
- «Amount Due»
- «Total»
- «Paid»
- «Att betala» (Swedish for «to pay»)
- «Fakturabelopp» (Swedish for «invoice amount»)
These amounts are usually set in a larger font, often at the bottom right of the document, and frequently preceded or followed by a currency symbol.
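Here is a hypothetical Python sketch of the keyword-proximity idea. It assumes the invoice text has already been extracted from the PDF as plain text; that step is not shown. The sketch works like this:
- It scans for the keywords above in an assumed priority order.
- It reads the lines bottom-up and takes the first amount within a few characters after a keyword.
- It ignores font size and position, because plain text does not carry them.

`find_total`, `KEYWORDS` and the 40-character gap are my own choices, not the app's actual code.

```python
import re
from decimal import Decimal
from typing import Optional

# The keywords listed above, most specific first; the order is my assumption.
KEYWORDS = ["total due", "amount due", "att betala", "fakturabelopp", "total", "paid"]

# Matches "1 234,50", "1.234,50", "1,234.50", "499.00" or "499". The lookarounds keep it
# from grabbing part of a longer number or a percentage such as "25%".
AMOUNT = re.compile(
    r"(?<![\d.,])(\d{1,3}(?:[ \u00a0.,]\d{3})+(?:[.,]\d{2})?|\d+(?:[.,]\d{2})?)(?!\d)(?!\s?%)"
)


def parse_amount(raw: str) -> Decimal:
    """Normalise Swedish or English formatting; a separator followed by two digits is decimal."""
    s = raw.replace(" ", "").replace("\u00a0", "")
    if re.search(r"[.,]\d{2}$", s):
        return Decimal(re.sub(r"[.,]", "", s[:-3]) + "." + s[-2:])
    return Decimal(re.sub(r"[.,]", "", s))


def find_total(text: str, max_gap: int = 40) -> Optional[Decimal]:
    """Return the first amount within max_gap characters after the best keyword, or None."""
    lines = text.splitlines()
    for keyword in KEYWORDS:
        keyword_re = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
        # Totals tend to sit near the bottom of the page, so scan lines bottom-up.
        for line in reversed(lines):
            for hit in keyword_re.finditer(line):
                amount = AMOUNT.search(line, hit.end())
                if amount and amount.start() - hit.end() <= max_gap:
                    return parse_amount(amount.group(1))
    return None
```

The `\b` word boundaries stop «Subtotal» from matching «total». Checking the most specific keyword first means «Total due» wins over a bare «Total».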
Claude also guided me on how to set up and run the Python server on my laptop (see more on servers and stuff in part
The new workflow
Unfortunately, the workflow still involves downloading the bank statements (now as a .csv) and downloading all invoice PDFs from my Gmail.
[Figure: the steps of the new workflow, including the file name pattern Costarea_Nofont_Month_Year_invoicename.pdf for each invoice that matches the bank statement.]
For matches it can’t make automatically, the app displays the invoice PDF and asks me to type in the amount and confirm the match manually.
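The manual fallback can be sketched as a loop that accepts automatic matches and otherwise asks for the amount. This is a hypothetical Python sketch, not the app's code. A few notes on it:
- `reconcile` is a made-up name.
- The `ask` parameter defaults to `input`, so it can be swapped for a GUI prompt or a test stub.
- Displaying the PDF is left out.

```python
from decimal import Decimal, InvalidOperation
from typing import Callable, Optional


def reconcile(
    invoices: dict[str, Optional[Decimal]],
    statement: set[Decimal],
    ask: Callable[[str], str] = input,
) -> dict[str, Decimal]:
    """Match invoices automatically where possible, otherwise ask for the amount."""
    matched: dict[str, Decimal] = {}
    for name, total in invoices.items():
        if total is not None and total in statement:
            matched[name] = total  # automatic match, no questions asked
            continue
        # Manual fallback: the real app shows the PDF here; this sketch only names the file.
        answer = ask(f"Amount on {name} (blank to skip): ").strip()
        if not answer:
            continue
        try:
            # Accepts Swedish-style input such as "1 250,00".
            typed = Decimal(answer.replace(" ", "").replace(",", "."))
        except InvalidOperation:
            continue
        if typed in statement and ask(f"Match {name} to {typed}? [y/n] ").strip().lower() == "y":
            matched[name] = typed
    return matched
```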
Since I didn’t care about wireframing, the app looked like any other AI-built app.
The pattern sniffing magic
What amazed me was how quickly the AI built a powerful algorithm and set of rules for reading different invoice formats. Swedish invoices, international invoices, receipts, subscription confirmations: the system adapted to each layout and correctly identified the key information.
It even handles edge cases I hadn’t considered:
- Invoices with multiple amounts (subtotal, VAT, total).
- Foreign currency conversions.
- Partial payments split across multiple transactions.
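I don’t know exactly how the app handles these cases internally. One plausible approach for split payments is a brute-force subset-sum search over the statement’s transactions. A small tolerance absorbs rounding when a foreign-currency total has been converted. In this hypothetical Python sketch, `to_sek` and `find_split_payment` are made-up names, and the exchange rate is an input you have to supply.

```python
from decimal import Decimal, ROUND_HALF_UP
from itertools import combinations
from typing import Optional, Sequence


def to_sek(amount: Decimal, rate: Decimal) -> Decimal:
    """Convert a foreign-currency total using a rate you supply, rounded to öre."""
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def find_split_payment(
    total: Decimal,
    transactions: Sequence[Decimal],
    max_parts: int = 3,
    tolerance: Decimal = Decimal("0"),
) -> Optional[tuple[int, ...]]:
    """Return the indices of 2..max_parts transactions whose amounts add up to total."""
    amounts = [abs(t) for t in transactions]  # payments are negative on the statement
    # Brute-force subset sum: fine for a month of transactions, slow for thousands.
    for size in range(2, max_parts + 1):
        for combo in combinations(range(len(amounts)), size):
            if abs(sum(amounts[i] for i in combo) - total) <= tolerance:
                return combo
    return None
```

The search grows combinatorially with `max_parts`. Limiting it to transactions from around the invoice’s date keeps it fast.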
The app in action
Final version of the app with my branding … now in hindsight it still kind of looks like shit, just with my fonts and colours on it. 🙈
The results speak Swedish
The first version of the app had a success rate of about
Next up: how to further automate and extend your apps using web browsers with built-in AI agents. In my case, speeding up the download of all invoice PDFs from my Gmail.