By Nathan Adlam
An interesting job posting I found recently inspired my latest project: a brief money-laundering analysis using Python. Some background reading highlighted several challenges in anti-money laundering (AML), so I decided to take a hands-on approach to learning more about the field.
After some thought I decided a realistic goal for this project would be to find at least 3 areas of suspicious activity in the dataset.
I found a simulated transaction dataset meant for developing transaction-monitoring approaches. Machine learning is not the focus of this exercise; instead, I decided to try some manual analysis and see where it takes me. Let's dive in.
Let's start with some basics to get a better idea of what we're working with. The first thing I looked at was the top 20 transactions by amount, to see how large they were.
I found some transactions of a very high amount (tens of millions of pounds), as well as some transactions between countries and currencies that could be considered risky.
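A ranking like this can be sketched in a few lines of plain Python. The records and field names (`sender`, `receiver`, `amount`, `currency`) below are made up for illustration and are not the dataset's actual columns.

```python
import heapq

# Hypothetical records: field names and values are illustrative assumptions.
transactions = [
    {"sender": "A1", "receiver": "B1", "amount": 120.50, "currency": "GBP"},
    {"sender": "A2", "receiver": "B2", "amount": 12_500_000.00, "currency": "GBP"},
    {"sender": "A3", "receiver": "B3", "amount": 4_450.00, "currency": "EUR"},
    {"sender": "A4", "receiver": "B4", "amount": 980_000.00, "currency": "USD"},
]


def top_transactions(rows, n=20):
    """Return the n largest transactions by amount, largest first."""
    return heapq.nlargest(n, rows, key=lambda row: row["amount"])


for row in top_transactions(transactions, n=3):
    print(row["sender"], row["receiver"], row["amount"], row["currency"])
```

Keeping the whole record (rather than just the amount) makes it easy to eyeball the currency or country alongside each large transfer.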
One useful view is a histogram of transaction amounts. It shows which amounts are most common and whether any could be flagged by size: either over a certain threshold, or just under one in an attempt to avoid detection (a practice known as structuring, often called smurfing in the money-laundering world).
For this histogram, deposits over 15,000 pounds were excluded because they skewed the data and made the plot less usable. Those could be examined separately.
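The binning behind such a histogram might look roughly like this. The 15,000 cap comes from the text above; the bin width of 500 and the sample amounts are assumptions chosen for illustration.

```python
from collections import Counter


def amount_histogram(amounts, bin_width=500, cap=15_000):
    """Count amounts per bin, skipping anything above the cap.

    Each key is the lower edge of a bin, so 4000 covers [4000, 4500).
    """
    counts = Counter()
    for amount in amounts:
        if amount > cap:
            continue  # large outliers are left for separate inspection
        counts[int(amount // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Hypothetical amounts in pounds.
sample_amounts = [10.96, 250, 499.99, 500, 4400, 4499, 20_000]
print(amount_histogram(sample_amounts))
```

The resulting dictionary can be handed to any plotting library as bar heights keyed by bin edge.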
While "small" transactions between 0 and 500 pounds are the most common, counts fall steadily until about 4,000 pounds, where the trend is interrupted. This could be a threshold after which some type of reporting may be necessary. After 4,500 pounds another smooth downward trend begins, which may represent larger verified transactions.
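One simple way to pull out the transactions sitting just below a suspected limit is a window filter. The 4,500 threshold is only the value suggested by the histogram, and the 500-pound window is an arbitrary assumption; neither is a known reporting rule.

```python
def just_under(amounts, threshold=4500, window=500):
    """Return amounts in the half-open range [threshold - window, threshold)."""
    lower = threshold - window
    return [a for a in amounts if lower <= a < threshold]


# Hypothetical amounts in pounds.
sample_amounts = [120.0, 3999.99, 4000.0, 4250.0, 4499.99, 4500.0, 9000.0]
print(just_under(sample_amounts))
```

Grouping the flagged amounts by sender afterwards would show whether a few accounts are responsible for most of them.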
Next, we can look at which accounts sent and received the highest number of transactions and some statistics regarding those transactions. Here are the top 10 senders in terms of transaction counts.
And here are the top 10 receivers in terms of transaction counts.
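Tables like these boil down to grouping transactions by account and collecting a count, minimum, maximum, and total. Here is a minimal sketch in plain Python; the field names and sample rows are hypothetical.

```python
def account_stats(rows, role):
    """Aggregate count, min, max and total amount per account.

    `role` is the field to group by, e.g. "sender" or "receiver".
    """
    stats = {}
    for row in rows:
        amount = row["amount"]
        entry = stats.setdefault(
            row[role], {"count": 0, "min": amount, "max": amount, "total": 0.0}
        )
        entry["count"] += 1
        entry["min"] = min(entry["min"], amount)
        entry["max"] = max(entry["max"], amount)
        entry["total"] += amount
    return stats


def top_accounts(rows, role, n=10):
    """Return the n accounts with the most transactions, busiest first."""
    stats = account_stats(rows, role)
    ranked = sorted(stats.items(), key=lambda item: item[1]["count"], reverse=True)
    return ranked[:n]


# Hypothetical rows for illustration.
rows = [
    {"sender": "S1", "receiver": "R1", "amount": 100.0},
    {"sender": "S1", "receiver": "R2", "amount": 50.0},
    {"sender": "S1", "receiver": "R1", "amount": 250.0},
    {"sender": "S2", "receiver": "R1", "amount": 75.0},
]
print(top_accounts(rows, "sender", n=1))
print(top_accounts(rows, "receiver", n=1))
```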
One thing that stands out here is that some accounts appear on both the sender and receiver lists. For example, account 2938210715 ranks 2nd by number of transactions among both senders and receivers. It sends 753 transactions and receives 745, which seems suspiciously similar.
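Spotting accounts that sit on both lists is a set intersection. The sketch below assumes transactions are reduced to hypothetical `(sender, receiver)` pairs and returns each overlapping account with its sent and received counts.

```python
from collections import Counter


def two_way_accounts(pairs, n=10):
    """Accounts in both the top-n senders and top-n receivers.

    Returns {account: (sent_count, received_count)}.
    """
    sent = Counter(sender for sender, _ in pairs)
    received = Counter(receiver for _, receiver in pairs)
    top_senders = {account for account, _ in sent.most_common(n)}
    top_receivers = {account for account, _ in received.most_common(n)}
    return {a: (sent[a], received[a]) for a in top_senders & top_receivers}


# Hypothetical (sender, receiver) pairs.
pairs = [
    ("X", "a"), ("X", "b"), ("X", "c"),
    ("a", "X"), ("b", "X"), ("c", "X"),
    ("a", "b"),
]
print(two_way_accounts(pairs, n=1))
```

Comparing the two counts for each returned account is what surfaces near-matches like 753 sent versus 745 received.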
Another account that stands out sends a minimum of 10.96 pounds and a maximum of 285805.91 pounds. That single maximum payment is more than half of the total amount the account sent, which seems like a rather peculiar usage pattern.
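A check for this kind of lopsided account is the share of the total taken up by the single largest payment. The amounts below are invented for illustration, and the 50% cut-off simply mirrors the "more than half" observation above.

```python
def max_share(amounts):
    """Fraction of the total amount contributed by the largest payment."""
    total = sum(amounts)
    return max(amounts) / total if total else 0.0


# Hypothetical payments from one account, in pounds.
payments = [10.0, 200.0, 50.0, 1000.0]
share = max_share(payments)
print(f"largest payment share: {share:.2%}", "-> flag" if share > 0.5 else "")
```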
The top 10 senders appear to be businesses, given their high number of transactions (to pay employees, suppliers, etc.), the wide range between their minimums and maximums (payments to both low and high-level employees), and even their similar total amounts. A couple of them have surprisingly small minimum payments.
I'm noticing some symmetry between the top sender and receiver accounts in terms of rough transaction counts, minimums, maximums, and total amounts. I expected the top receiver accounts to have fewer transactions than a business that has many payments to make, so maybe there's a use case I'm not seeing. I will keep that in mind going forward.
And while we're on the topic of count of transactions, we can examine the distribution of transaction counts for senders.
This distribution excludes accounts with 0-20 transactions, since a disproportionate number of accounts fall in that range. Among the rest, the range that stands out is around 75-100 transactions.
Measured over the course of about 11 months, that seems like a low risk for money laundering, but let's take a look at what the distribution looks like from 0 to 20.
Here we can see that by far the most common number of transactions per sender is 12. The data spans around that many months, so a sender with that many transactions could be an account paying someone monthly.
There are also a notable number of accounts that only sent 1 payment, which could be a risk for money laundering.
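The distribution plotted here is a "count of counts": first count transactions per account, then count how many accounts share each total. A minimal sketch, with hypothetical field names and rows:

```python
from collections import Counter


def count_distribution(rows, role):
    """Map each per-account transaction count to the number of accounts with it."""
    per_account = Counter(row[role] for row in rows)
    return Counter(per_account.values())


# Hypothetical rows: two monthly payers, one single-payment sender, one other.
rows = (
    [{"sender": "A", "receiver": "Z"}] * 12
    + [{"sender": "B", "receiver": "Z"}] * 12
    + [{"sender": "C", "receiver": "Y"}]
    + [{"sender": "D", "receiver": "Y"}] * 3
)
distribution = count_distribution(rows, "sender")
print(sorted(distribution.items()))
```

Passing `"receiver"` instead of `"sender"` gives the equivalent view for the receiving side.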
And on the receiver side, here is a histogram of transactions per receiver. This one is scaled from 25 to 325 because a disproportionate number of receiver accounts have between 15 and 20 transactions (almost 50% of the number of unique receivers).
There seems to be a strangely high number of receivers with somewhere around 75 transactions.
But for now, I would like to take a quick look at the part of the graph between 0 and 50, since I found a couple of interesting values there.
Zooming in between 0 and 35, the transaction counts that stand out are 1 and 13. Almost 100,000 accounts received only 1 transaction, and almost 200,000 received exactly 13. As there were about 650,000 receiving accounts, these two values together account for a large share of them. I can see a use case for an account receiving 1 transaction (laundering money), but I can't yet see one for 13.
Another thing to look at would be the senders with the most unique receivers. This could represent a business account that has a lot of people to pay, or it could be a great way to launder money. Here are the top senders by number of unique receivers, along with some of their statistics.
Comparing this to the senders with the highest transaction counts, the minimum amounts are typically lower here. This could represent a business account with a very small expense, or an attempt to throw off the system by including a low payment.
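Counting unique receivers per sender differs from counting transactions: repeat payments to the same account should only count once. A small sketch using sets, again with hypothetical `(sender, receiver)` pairs:

```python
from collections import defaultdict


def unique_receiver_counts(pairs):
    """Return (sender, number of distinct receivers), largest fan-out first."""
    receivers = defaultdict(set)
    for sender, receiver in pairs:
        receivers[sender].add(receiver)  # a set ignores repeat payments
    counts = [(sender, len(found)) for sender, found in receivers.items()]
    return sorted(counts, key=lambda item: item[1], reverse=True)


# Hypothetical pairs: S1 pays three different accounts, S2 pays one account twice.
pairs = [("S1", "R1"), ("S1", "R2"), ("S1", "R3"), ("S2", "R1"), ("S2", "R1")]
print(unique_receiver_counts(pairs))
```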
The final piece of this analysis, admittedly still in development but more ambitious than the rest, is to find loops where money leaves one account and eventually returns to it. This can be done with depth-first search (DFS) as part of a graph-theory approach.
I have been able to locate some cycles where money leaves one account and returns to it after one intermediate step. While not every such cycle is necessarily illicit, my impression is that this may qualify as suspicious activity.
Here is one specific example. One account sends about 43.5 thousand pounds to another account, which sends a similar but slightly smaller amount back around 6 weeks later.
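To make the idea concrete, here is a simplified depth-limited DFS over a directed graph of accounts. It is a sketch rather than the exact code used for this analysis: it works on hypothetical `(sender, receiver)` pairs, ignores amounts and dates, and caps the number of accounts in a cycle with an assumed `max_len` parameter.

```python
from collections import defaultdict


def find_cycles(edges, max_len=3):
    """Find simple cycles involving at most max_len distinct accounts.

    Each cycle is reported once, starting from its smallest account id.
    """
    graph = defaultdict(set)
    for sender, receiver in edges:
        if sender != receiver:  # ignore self-transfers
            graph[sender].add(receiver)

    cycles = []

    def dfs(start, node, path):
        for nxt in sorted(graph.get(node, ())):
            if nxt == start and len(path) >= 2:
                cycles.append(path + [start])  # money is back where it began
            elif nxt > start and nxt not in path and len(path) < max_len:
                dfs(start, nxt, path + [nxt])

    for start in sorted(graph):
        dfs(start, start, [start])
    return cycles


# Hypothetical transfers: A and B pay each other; B -> C -> D -> B is a longer loop.
edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "D"), ("D", "B"), ("E", "F")]
print(find_cycles(edges))
```

Only exploring accounts that sort after the starting account avoids reporting the same loop once per member, and the depth cap keeps the search tractable on a large transaction graph. A fuller version would also check that amounts are similar and that the transfers happen in chronological order.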
So to recap, some suspicious activity that was discovered includes:
large transactions to risky countries (Switzerland, Nigeria, Morocco, etc.)
transactions just under about 4500 pounds (structuring, or smurfing: transacting just under a possible reporting limit)
accounts that receive or send only 1 transaction
high-transaction accounts that both send and receive a similar number of transactions
accounts that cycle money back to themselves through an intermediate account