This interactive Neo4j graph tutorial shows how to detect a common fraud scheme called "carousel fraud". It was written by Scott Mongeau (Data Scientist @ SARK7) and Jean Villedieu (Co-Founder of Linkurious). Recently, a £1 billion VAT fraud connected to terrorism was uncovered.
Table of Contents
- Introduction
- Data Model
- Database Setup
- Cypher Query
- Conclusion
The Value Added Tax (VAT) is a consumption tax assessed on the value added to goods and services. In the countries that apply it, such as the European countries, consumers pay VAT every time they buy a product or service. It is a very important source of revenue: France, for example, collects 135 billion euros in VAT, twice what French citizens pay in income tax.
Unlike a sales tax, which is calculated and paid once, VAT involves a lot of paperwork: each company has to account for it on every transaction, whether it is dealing with a consumer or with another business.
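To make that bookkeeping concrete, here is a minimal sketch of how VAT is collected along a legitimate supply chain under the usual credit-invoice mechanism: each seller remits the VAT it charges on its sale minus the VAT it paid on its own purchase. The 20% rate, the three-step chain and the prices are hypothetical, chosen only for illustration; integer euros keep the arithmetic exact.

```python
from __future__ import annotations

VAT_RATE_PERCENT = 20  # hypothetical rate, for illustration only

def vat_remitted(net_prices: list[int], rate_percent: int = VAT_RATE_PERCENT) -> list[int]:
    """Return the VAT each seller in the chain hands over to the tax agency.

    net_prices[i] is the pre-tax price at which link i sells to link i + 1.
    The first link is assumed to have no taxable purchases of its own.
    """
    remitted = []
    input_vat = 0  # VAT the current seller paid when it bought the goods
    for price in net_prices:
        output_vat = price * rate_percent // 100  # VAT charged to the buyer
        remitted.append(output_vat - input_vat)   # seller keeps the difference for the agency
        input_vat = output_vat                    # the buyer can reclaim what it just paid
    return remitted

# Producer -> wholesaler -> retailer -> consumer, with hypothetical net prices
payments = vat_remitted([100, 150, 200])
print(payments)       # VAT remitted at each step
print(sum(payments))  # total collected by the agency
```

Each business only pays VAT on the value it added, and the amounts collected along the chain add up to the VAT on the final consumer price (20% of 200, i.e. 40, in this example).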
VAT is complex and involves large sums of money, so it is no surprise that fraudsters try to take advantage of it, particularly in Europe, where one such scheme is known as "carousel fraud".
In 2012 in the United Kingdom, a fraud ringleader was jailed for 17 years, along with fifteen of his accomplices convicted in five trials, for faking the sale of 4 million phones through ghost companies in a complex £176m VAT scam.
In a carousel fraud, a fraudster imports goods VAT-free. He then sells the goods to a company controlled by an accomplice and charges it VAT. The goods are then sold through a series of companies, each liable for VAT, and finally exported.
The first link in the chain always disappears: the fraudster vanishes with the VAT he charged his customer, which he should have reported and paid to the tax agency. The final link also disappears, but not before reclaiming from the tax agency the VAT it paid on its purchase.
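The loss to the tax agency can be worked out with the same credit-invoice bookkeeping. The sketch below is a simplified model under hypothetical assumptions (a 20% rate, integer prices, every intermediate company paying exactly what it owes): the missing trader never pays the VAT it charged, and the exporter, whose export is VAT-free, reclaims the VAT it paid on its purchase.

```python
from __future__ import annotations

RATE_PERCENT = 20  # hypothetical rate, for illustration only

def treasury_balance(sale_prices: list[int], rate_percent: int = RATE_PERCENT) -> int:
    """Net VAT received by the tax agency along a carousel chain.

    sale_prices[i] is the net price of the i-th domestic sale. The seller of
    sale 0 is the "missing trader": it imported the goods VAT-free, charges VAT
    and vanishes without paying it. The seller of each later sale is a middle
    company that remits output VAT minus input VAT. The buyer of the last sale
    exports the goods VAT-free and reclaims the VAT it paid.
    """
    vat = [p * rate_percent // 100 for p in sale_prices]
    balance = 0                   # the missing trader remits nothing
    for i in range(1, len(vat)):  # middle companies
        balance += vat[i] - vat[i - 1]
    balance -= vat[-1]            # the exporter's refund claim
    return balance

print(treasury_balance([1000, 1010, 1020]))  # -200
```

Whatever the number of intermediate companies, their payments cancel out against the exporter's refund, and the agency ends up short by exactly the VAT the missing trader pocketed (20% of 1000 here).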
The carousel fraud requires a lot of sophistication. The criminals have to invest money, create companies and execute a series of transactions in a short amount of time to be successful. The returns are substantial: in several cases in Europe, a few criminals appropriated more than a hundred million euros.
An added benefit for the fraudsters is that the scheme is hard to prove and even harder to detect before the criminals vanish. Thankfully, graphs can change that!
National tax agencies have data they could use to identify fraudsters: company information, business transactions, tax reports and a blacklist of known fraudsters. Each of these sources can help tax investigators identify potential criminals. The problem is that this information usually lives in separate silos, which makes it very hard to piece together into a complete picture; criminals use this to their advantage and slip through the cracks.
A graph data model helps us solve the technical side of this challenge:
Here we can see in a single picture how the entities relate to one another, which makes the data easier to understand. Furthermore, the graph data model lets us ask questions that span all the data at once, instead of focusing on one silo at a time.
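The sketch below illustrates the idea on the sample data used later in this GraphGist. Three silos (a blacklist, a register of company directors and a list of sales) are loaded into one adjacency-list graph, so that a single traversal can flag every sale in which the buyer or the seller is run by a known crook. Short codes such as CO4 are the variable names from the CREATE script; the in-memory graph and the reverse DIRECTED_BY edge are illustrative assumptions, not part of the Neo4j model.

```python
from collections import defaultdict

# Silo 1: people whose criminal_status is 'known crook' in the sample data
blacklist = {"Nicodemo Gionata", "Cam Esmond", "Madilyn Hailey"}

# Silo 2: company register, (director, company) pairs from DIRECTOR_OF
directors = [
    ("Alberico Goffredo", "CO1"), ("Charlie Walt", "CO2"), ("Cletis Bysshe", "CO3"),
    ("Nicodemo Gionata", "CO4"), ("Carmelo Achille", "CO5"), ("Edoardo Primo", "CO6"),
    ("Cam Esmond", "CO7"), ("Peyton Ewart", "CO8"), ("Vivian Vann", "CO9"),
    ("Madilyn Hailey", "CO10"), ("Charlie Walt", "CO11"), ("Nicodemo Gionata", "CO12"),
    ("Suzanna Salvage", "CO13"), ("John Hudson", "CO14"),
]

# Silo 3: transactions, (seller, buyer) pairs from SELLS_TO
sales = [("CO1", "CO3"), ("CO2", "CO3"), ("CO3", "CO4"), ("CO12", "CO6"),
         ("CO6", "CO7"), ("CO6", "CO11"), ("CO8", "CO9")]

def build_graph(directors, sales):
    """Merge the silos into one adjacency list: node -> [(relationship, neighbour)]."""
    graph = defaultdict(list)
    for person, company in directors:
        graph[person].append(("DIRECTOR_OF", company))
        graph[company].append(("DIRECTED_BY", person))  # reverse edge for lookups
    for seller, buyer in sales:
        graph[seller].append(("SELLS_TO", buyer))
    return graph

def flagged_sales(graph, blacklist):
    """Return (seller, buyer, person) for each sale involving a blacklisted director."""
    hits = []
    for seller, edges in graph.items():
        for rel, buyer in edges:
            if rel != "SELLS_TO":
                continue
            for party in (seller, buyer):
                # .get avoids inserting new keys while iterating over the graph
                for rel2, person in graph.get(party, []):
                    if rel2 == "DIRECTED_BY" and person in blacklist:
                        hits.append((seller, buyer, person))
    return hits

for hit in flagged_sales(build_graph(directors, sales), blacklist):
    print(hit)
```

Answering the same question from the separate silos would mean joining three tables by hand; in the merged graph it is a walk of two hops from each sale.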
For this GraphGist, we have prepared a small dataset of companies, transactions and people.
// Create entities
CREATE (CO1:Company { name: '1-Red Phonecard Co.', country: 'USA', type: 'LLC', creation_date: '19/09/2013', epoch:1375674103})
CREATE (CO2:Company { name: '2-Black Phonecard Co.', country: 'USA', type: 'LLC', creation_date: '11/09/2013', epoch:1376472512})
CREATE (CO3:Company { name: '3-Southern Europa Telco', country: 'Italy', type: 'SRL', creation_date: '24/08/2013', epoch:1375931887})
CREATE (CO4:Company { name: '4-Joint Bridge Co.', country: 'Italy', type: 'SRL', creation_date: '15/08/2013', epoch:1375399377})
CREATE (CO5:Company { name: '5-Joint Telco Co.', country: 'Italy', type: 'SRL', creation_date: '26/09/2013', epoch:1375795123})
CREATE (CO6:Company { name: '6-Swift Co.', country: 'Italy', type: 'SpA', creation_date: '18/08/2013', epoch:1377926936})
CREATE (CO7:Company { name: '7-Chips Trading Ltd.', country: 'UK', type: 'LTD', creation_date: '11/08/2013', epoch:1375385104})
CREATE (CO8:Company { name: '8-Chips Global', country: 'UK', type: 'LLC', creation_date: '11/09/2013', epoch:1377453990})
CREATE (CO9:Company { name: '9-Strand VI Co.', country: 'UK', type: 'LLC', creation_date: '15/09/2013', epoch:1375730265})
CREATE (CO10:Company { name: '10-Nexus Trading UK Ltd.', country: 'UK', type: 'LTD', creation_date: '17/09/2013', epoch:1377178159})
CREATE (CO11:Company { name: '11-Nexus Global US Ltd.', country: 'USA', type: 'LTD', creation_date: '15/09/2013', epoch:1376409943})
CREATE (CO12:HoldingCo { name: 'A-Joint IT Group', country: 'Italy', type: 'Holding', creation_date: '23/07/2013', epoch:1374132360})
CREATE (CO13:HoldingCo { name: 'B-Chips UK Group', country: 'UK', type: 'Holding', creation_date: '27/07/2013', epoch:1373826123})
CREATE (CO14:HoldingCo { name: 'C-Nexus Intl Group', country: 'UK', type: 'Holding', creation_date: '14/07/2013', epoch:1373562646})
CREATE (P01:Person { name: 'Alberico Goffredo', country: 'Italy', age: '51', criminal_status: 'nothing'})
CREATE (P02:Person { name: 'Charlie Walt', country: 'USA', age: '41', criminal_status: 'nothing'})
CREATE (P03:Person { name: 'Cletis Bysshe', country: 'USA', age: '54', criminal_status: 'nothing'})
CREATE (P04:Person { name: 'Nicodemo Gionata', country: 'Italy', age: '59', criminal_status: 'known crook'})
CREATE (P05:Person { name: 'Carmelo Achille', country: 'Italy', age: '48', criminal_status: 'nothing'})
CREATE (P06:Person { name: 'Edoardo Primo', country: 'Italy', age: '58', criminal_status: 'nothing'})
CREATE (P07:Person { name: 'Cam Esmond', country: 'UK', age: '41', criminal_status: 'known crook'})
CREATE (P08:Person { name: 'Peyton Ewart', country: 'UK', age: '44', criminal_status: 'nothing'})
CREATE (P09:Person { name: 'Vivian Vann', country: 'UK', age: '57', criminal_status: 'nothing'})
CREATE (P10:Person { name: 'Madilyn Hailey', country: 'UK', age: '30', criminal_status: 'known crook'})
CREATE (P11:Person { name: 'Suzanna Salvage', country: 'UK', age: '32', criminal_status: 'nothing'})
CREATE (P12:Person { name: 'John Hudson', country: 'UK', age: '36', criminal_status: 'nothing'})
// Create relationships
CREATE (CO1)-[:SELLS_TO{date: '41548', item_type: 'phone cards rights', epoch: 1380617873, amt: '10000000'}]->(CO3)
CREATE (CO2)-[:SELLS_TO{date: '41548', item_type: 'phone cards rights', epoch: 1380617873, amt: '15000000'}]->(CO3)
CREATE (CO3)-[:SELLS_TO{date: '41557', item_type: 'phone cards rights', epoch: 1381395473, amt: '25000000'}]->(CO4)
CREATE (CO12)-[:SELLS_TO{date: '41562', item_type: 'phone cards rights', epoch: 1381827473, amt: '25000000'}]->(CO6)
CREATE (CO6)-[:SELLS_TO{date: '41567', item_type: 'phone cards rights', epoch: 1382259473, amt: '25000000'}]->(CO7)
CREATE (CO6)-[:SELLS_TO{date: '41572', item_type: 'phone cards rights', epoch: 1382691473, amt: '25000000'}]->(CO11)
CREATE (CO8)-[:SELLS_TO{date: '41577', item_type: 'phone cards rights', epoch: 1383123473, amt: '25000000'}]->(CO9)
CREATE (CO3)-[:COLLECTS_VAT{date: '41557', item_type: 'VAT paid', epoch: 1381395473, amt: '10000000'}]->(CO4)
CREATE (CO12)-[:COLLECTS_VAT{date: '41562', item_type: 'VAT paid', epoch: 1381827473, amt: '10000000'}]->(CO6)
CREATE (CO12)-[:PARENT_OF{legal_status: 'parent company'}]->(CO4)
CREATE (CO12)-[:PARENT_OF{legal_status: 'parent company'}]->(CO5)
CREATE (CO13)-[:PARENT_OF{legal_status: 'parent company'}]->(CO7)
CREATE (CO13)-[:PARENT_OF{legal_status: 'parent company'}]->(CO8)
CREATE (CO14)-[:PARENT_OF{legal_status: 'parent company'}]->(CO10)
CREATE (CO14)-[:PARENT_OF{legal_status: 'parent company'}]->(CO11)
CREATE (P01)-[:DIRECTOR_OF{position: 'director'}]->(CO1)
CREATE (P02)-[:DIRECTOR_OF{position: 'director'}]->(CO2)
CREATE (P03)-[:DIRECTOR_OF{position: 'director'}]->(CO3)
CREATE (P04)-[:DIRECTOR_OF{position: 'director'}]->(CO4)
CREATE (P05)-[:DIRECTOR_OF{position: 'director'}]->(CO5)
CREATE (P06)-[:DIRECTOR_OF{position: 'director'}]->(CO6)
CREATE (P07)-[:DIRECTOR_OF{position: 'director'}]->(CO7)
CREATE (P08)-[:DIRECTOR_OF{position: 'director'}]->(CO8)
CREATE (P09)-[:DIRECTOR_OF{position: 'director'}]->(CO9)
CREATE (P10)-[:DIRECTOR_OF{position: 'director'}]->(CO10)
CREATE (P02)-[:DIRECTOR_OF{position: 'director'}]->(CO11)
CREATE (P04)-[:DIRECTOR_OF{position: 'director'}]->(CO12)
CREATE (P11)-[:DIRECTOR_OF{position: 'director'}]->(CO13)
CREATE (P12)-[:DIRECTOR_OF{position: 'director'}]->(CO14)
RETURN *
You can download the complete dataset here.
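The Cypher query in this GraphGist compares `epoch` values (seconds since 1970-01-01 UTC), while each SELLS_TO relationship also carries a `date` string such as '41548'. That string appears to be a spreadsheet serial date; this is an assumption, not something the dataset states. Under that assumption, the sketch below checks that both fields name the same calendar day for every transaction.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: SELLS_TO.date is a spreadsheet serial date, i.e. the number of days
# since 1899-12-30 (the Excel/LibreOffice convention for dates from March 1900 on).
SPREADSHEET_EPOCH = date(1899, 12, 30)

def serial_to_date(serial: str) -> date:
    """Convert a serial date string such as '41548' to a calendar date."""
    return SPREADSHEET_EPOCH + timedelta(days=int(serial))

def epoch_to_utc_date(seconds: int) -> date:
    """Convert a Unix timestamp to the corresponding UTC calendar date."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).date()

# (date, epoch) pairs copied from the SELLS_TO relationships of the dataset
transactions = [
    ("41548", 1380617873), ("41557", 1381395473), ("41562", 1381827473),
    ("41567", 1382259473), ("41572", 1382691473), ("41577", 1383123473),
]

for serial, seconds in transactions:
    print(serial, serial_to_date(serial), epoch_to_utc_date(seconds))
```

Under this reading, both fields place the transactions between 1 and 30 October 2013, and 1383123473, the reference timestamp used in the query, is the most recent of them.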
To identify potential fraudsters in our data, we look for:
- a chain of sales linking at least three companies (two or more transactions), in which the first and the last company are in different countries;
- a young company in the middle of the chain (fraudsters like to create dummy companies they can easily discard when they disappear);
- transactions that occur within a short amount of time.
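Before turning to Cypher, the three criteria can be checked by hand on the sample data. The sketch below is a plain-Python restatement, not the query itself: it walks every SELLS_TO chain between :Company nodes, keeps those linking at least three companies whose endpoints are in different countries, and requires a young intermediate company and a short time window. The 90-day and 15-day thresholds and the reference timestamp 1383123473 are taken from the Cypher query that follows; reading "young" as "created less than 90 days before the reference timestamp" is our interpretation.

```python
# :Company nodes of the sample dataset: code -> (country, epoch)
companies = {
    "CO1": ("USA", 1375674103), "CO2": ("USA", 1376472512),
    "CO3": ("Italy", 1375931887), "CO4": ("Italy", 1375399377),
    "CO5": ("Italy", 1375795123), "CO6": ("Italy", 1377926936),
    "CO7": ("UK", 1375385104), "CO8": ("UK", 1377453990),
    "CO9": ("UK", 1375730265), "CO10": ("UK", 1377178159),
    "CO11": ("USA", 1376409943),
}

# SELLS_TO relationships between :Company nodes as (seller, buyer, epoch);
# the sale made by holding company CO12 is omitted
sales = [
    ("CO1", "CO3", 1380617873), ("CO2", "CO3", 1380617873),
    ("CO3", "CO4", 1381395473), ("CO6", "CO7", 1382259473),
    ("CO6", "CO11", 1382691473), ("CO8", "CO9", 1383123473),
]

DAY = 24 * 60 * 60
REFERENCE = 1383123473  # epoch of the most recent transaction in the dataset
YOUNG = 90 * DAY        # assumed reading: created < 90 days before REFERENCE
WINDOW = 15 * DAY       # first and last sale < 15 days apart

def chains(start):
    """Yield every SELLS_TO path from `start`, using each sale at most once."""
    stack = [[sale] for sale in sales if sale[0] == start]
    while stack:
        path = stack.pop()
        yield path
        tail = path[-1][1]
        stack.extend(path + [sale] for sale in sales
                     if sale[0] == tail and sale not in path)

def suspicious_chains():
    """Return (first, young middle, last, seconds elapsed) for each matching chain."""
    hits = []
    for start in companies:
        for path in chains(start):
            nodes = [path[0][0]] + [buyer for _, buyer, _ in path]
            if len(nodes) < 3:  # at least three companies
                continue
            first, last = nodes[0], nodes[-1]
            if companies[first][0] == companies[last][0]:  # different countries
                continue
            young = [n for n in nodes[1:-1] if REFERENCE - companies[n][1] < YOUNG]
            elapsed = path[-1][2] - path[0][2]
            if young and elapsed < WINDOW:  # young middle company, short window
                hits.append((first, young[0], last, elapsed))
    return hits

print(suspicious_chains())
# [('CO1', 'CO3', 'CO4', 777600), ('CO2', 'CO3', 'CO4', 777600)]
```

Both matches share 3-Southern Europa Telco as the young middle company, and their first and last sales are 9 days (777,600 seconds) apart. Tracking used sales rather than visited companies mirrors the way Cypher never reuses a relationship within one path, which also keeps the walk finite if a chain loops back on itself, as a real carousel can.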
Together, these characteristics define a fraud pattern. Fraud analysts are experts at articulating such patterns, which reflect their experience of the scams and the warning signs they look out for. However, fraud analysts cannot manually analyze hundreds of millions of data points.
This is where Cypher comes in. With the help of Jim Biard, we have prepared a query that identifies suspicious transactions automatically:
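// Sales chains linking at least three companies (two or more SELLS_TO hops)
// whose first and last companies are in different countries
MATCH p = (a:Company)-[:SELLS_TO*2..]->(c:Company)
WHERE a.country <> c.country
// Keep the first middle company created less than 90 days before 1383123473,
// the epoch of the most recent transaction in the dataset
WITH a, c, relationships(p) AS rs,
     [n IN nodes(p)[1..-1] WHERE 1383123473 - n.epoch < 90*60*60*24] AS young
WITH a, c, rs, head(young) AS b
WHERE b IS NOT NULL
// d = seconds elapsed between the first and the last sale of the chain
WITH a, b, c, head(rs) AS r1, last(rs) AS rn
WITH a, b, c, r1, rn, rn.epoch - r1.epoch AS d
WHERE d < 15*60*60*24
RETURN a, b, c, d, r1, rn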
Two chains of transactions match our pattern. As fraud analysts, we would now want to dive deeper into the data (with a tool like Linkurious, for example).
Fraud detection is always challenging. For tax authorities, it means analyzing large volumes of complex, highly connected data. We have seen that Neo4j can help turn that data into real insights. Fraud analysts can then focus on visualizing the suspicious cases and stopping the criminals!
For more graph-related use cases, make sure to check out the Linkurious blog.