Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save johan--/b91027467be49f576151 to your computer and use it in GitHub Desktop.
Save johan--/b91027467be49f576151 to your computer and use it in GitHub Desktop.

Credit Card Fraud Detection

This interactive Neo4j graph tutorial covers a common credit card fraud detection scenario.


Table of Contents


Introduction to Problem

Banks, merchants and credit card processors companies lose billions of dollars every year to credit card fraud. Credit card data can be stolen by criminals but sometimes the criminal is simply the clerk that processes your card when you buy things.


Typical Scenario

In December 2013, the Abington Police arrested two Post Office employees for stealing credit cards and using it to buy more than $50,000 worth of merchandise. Their scheme is quite typical, here is how they operated :

  • the Post Office clerks copied the credit card information of some of the customer they served ;

  • they looked up the customers to get their address ;

  • they would make orders online for goods or gift cards and have them delivered at their victims' home address ;

  • an accomplice would post at the address and try to intercept the deliveries ;

It took police forces a few months before identifying the criminals.


Explanation of Solution

Graph databases can help find credit card thieves faster. By representing transactions as a graph, we can look for the common denominator in the fraud cases and find the point of origin of the scam.


Credit Card Fraud Graph Data Model

A series of credit card transactions can be represented as a graph. Each transaction involves two nodes : a person (the customer) and a merchant. The nodes are linked by the transaction itself. A transaction has a date and a status.

Legitimate transactions have the status "Undisputed". Fraudulent transactions are "Disputed".

The graph data model below represents how the data looks as a graph.

Credit Card Fraud

Sample Data Set

// Create customers
CREATE (Paul:person {id:'1', name:'Paul', gender:'man', age:'50'})
CREATE (Jean:person {id:'2', name:'Jean', gender:'man', age:'48'})
CREATE (Dan:person {id:'3', name:'Dan', gender:'man', age:'23'})
CREATE (Marc:person {id:'4', name:'Marc', gender:'man', age:'30'})
CREATE (John:person {id:'5', name:'John', gender:'man', age:'31'})
CREATE (Zoey:person {id:'6', name:'Zoey', gender:'woman', age:'52'})
CREATE (Ava:person {id:'7', name:'Ava', gender:'woman', age:'23'})
CREATE (Olivia:person {id:'8', name:'Olivia', gender:'woman', age:'58'})
CREATE (Mia:person {id:'9', name:'Mia', gender:'woman', age:'51'})
CREATE (Madison:person {id:'10', name:'Madison', gender:'woman', age:'37'})

// Create merchants
CREATE (Amazon:merchant {id:'11', name:'Amazon', street:'2626 Wilkinson Court', address:'San Bernardino, CA 92410'})
CREATE (Abercrombie:merchant {id:'12', name:'Abercrombie', street:'4355 Walnut Street', age:'San Bernardino, CA 92410'})
CREATE (Wallmart:merchant {id:'13', name:'Wallmart', street:'2092 Larry Street', age:'San Bernardino, CA 92410'})
CREATE (MacDonalds:merchant {id:'14', name:'MacDonalds', street:'1870 Caynor Circle', age:'San Bernardino, CA 92410'})
CREATE (American_Apparel:merchant {id:'15', name:'American Apparel', street:'1381 Spruce Drive', age:'San Bernardino, CA 92410'})
CREATE (Just_Brew_It:merchant {id:'16', name:'Just Brew It', street:'826 Anmoore Road', age:'San Bernardino, CA 92410'})
CREATE (Justice:merchant {id:'17', name:'Justice', street:'1925 Spring Street', age:'San Bernardino, CA 92410'})
CREATE (Sears:merchant {id:'18', name:'Sears', street:'4209 Elsie Drive', age:'San Bernardino, CA 92410'})
CREATE (Soccer_for_the_City:merchant {id:'19', name:'Soccer for the City', street:'86 D Street', age:'San Bernardino, CA 92410'})
CREATE (Sprint:merchant {id:'20', name:'Sprint', street:'945 Kinney Street', age:'San Bernardino, CA 92410'})
CREATE (Starbucks:merchant {id:'21', name:'Starbucks', street:'3810 Apple Lane', age:'San Bernardino, CA 92410'})
CREATE (Subway:merchant {id:'22', name:'Subway', street:'3778 Tenmile Road', age:'San Bernardino, CA 92410'})
CREATE (Apple_Store:merchant {id:'23', name:'Apple Store', street:'349 Bel Meadow Drive', age:'Kansas City, MO 64105'})
CREATE (Urban_Outfitters:merchant {id:'24', name:'Urban Outfitters', street:'99 Strother Street', age:'Kansas City, MO 64105'})
CREATE (RadioShack:merchant {id:'25', name:'RadioShack', street:'3306 Douglas Dairy Road', age:'Kansas City, MO 64105'})
CREATE (Macys:merchant {id:'26', name:'Macys', street:'2912 Nutter Street', age:'Kansas City, MO 64105'})

// Create transaction history
CREATE (Paul)-[:HAS_BOUGHT_AT {amount:'986', time:'4/17/2014', status:'Undisputed'}]->(Just_Brew_It)
CREATE (Paul)-[:HAS_BOUGHT_AT {amount:'239', time:'5/15/2014', status:'Undisputed'}]->(Starbucks)
CREATE (Paul)-[:HAS_BOUGHT_AT {amount:'475', time:'3/28/2014', status:'Undisputed'}]->(Sears)
CREATE (Paul)-[:HAS_BOUGHT_AT {amount:'654', time:'3/20/2014', status:'Undisputed'}]->(Wallmart)
CREATE (Jean)-[:HAS_BOUGHT_AT {amount:'196', time:'7/24/2014', status:'Undisputed'}]->(Soccer_for_the_City)
CREATE (Jean)-[:HAS_BOUGHT_AT {amount:'502', time:'4/9/2014', status:'Undisputed'}]->(Abercrombie)
CREATE (Jean)-[:HAS_BOUGHT_AT {amount:'848', time:'5/29/2014', status:'Undisputed'}]->(Wallmart)
CREATE (Jean)-[:HAS_BOUGHT_AT {amount:'802', time:'3/11/2014', status:'Undisputed'}]->(Amazon)
CREATE (Jean)-[:HAS_BOUGHT_AT {amount:'203', time:'3/27/2014', status:'Undisputed'}]->(Subway)
CREATE (Dan)-[:HAS_BOUGHT_AT {amount:'35', time:'1/23/2014', status:'Undisputed'}]->(MacDonalds)
CREATE (Dan)-[:HAS_BOUGHT_AT {amount:'605', time:'1/27/2014', status:'Undisputed'}]->(MacDonalds)
CREATE (Dan)-[:HAS_BOUGHT_AT {amount:'62', time:'9/17/2014', status:'Undisputed'}]->(Soccer_for_the_City)
CREATE (Dan)-[:HAS_BOUGHT_AT {amount:'141', time:'11/14/2014', status:'Undisputed'}]->(Amazon)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'134', time:'4/14/2014', status:'Undisputed'}]->(Amazon)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'336', time:'4/3/2014', status:'Undisputed'}]->(American_Apparel)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'964', time:'3/22/2014', status:'Undisputed'}]->(Wallmart)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'430', time:'8/10/2014', status:'Undisputed'}]->(Sears)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'11', time:'9/4/2014', status:'Undisputed'}]->(Soccer_for_the_City)
CREATE (John)-[:HAS_BOUGHT_AT {amount:'545', time:'10/6/2014', status:'Undisputed'}]->(Soccer_for_the_City)
CREATE (John)-[:HAS_BOUGHT_AT {amount:'457', time:'10/15/2014', status:'Undisputed'}]->(Sprint)
CREATE (John)-[:HAS_BOUGHT_AT {amount:'468', time:'7/29/2014', status:'Undisputed'}]->(Justice)
CREATE (John)-[:HAS_BOUGHT_AT {amount:'768', time:'11/28/2014', status:'Undisputed'}]->(American_Apparel)
CREATE (John)-[:HAS_BOUGHT_AT {amount:'921', time:'3/12/2014', status:'Undisputed'}]->(Just_Brew_It)
CREATE (Zoey)-[:HAS_BOUGHT_AT {amount:'740', time:'12/15/2014', status:'Undisputed'}]->(MacDonalds)
CREATE (Zoey)-[:HAS_BOUGHT_AT {amount:'510', time:'11/27/2014', status:'Undisputed'}]->(Abercrombie)
CREATE (Zoey)-[:HAS_BOUGHT_AT {amount:'414', time:'1/20/2014', status:'Undisputed'}]->(Just_Brew_It)
CREATE (Zoey)-[:HAS_BOUGHT_AT {amount:'721', time:'7/17/2014', status:'Undisputed'}]->(Amazon)
CREATE (Zoey)-[:HAS_BOUGHT_AT {amount:'353', time:'10/25/2014', status:'Undisputed'}]->(Subway)
CREATE (Ava)-[:HAS_BOUGHT_AT {amount:'681', time:'12/28/2014', status:'Undisputed'}]->(Sears)
CREATE (Ava)-[:HAS_BOUGHT_AT {amount:'87', time:'2/19/2014', status:'Undisputed'}]->(Wallmart)
CREATE (Ava)-[:HAS_BOUGHT_AT {amount:'533', time:'8/6/2014', status:'Undisputed'}]->(American_Apparel)
CREATE (Ava)-[:HAS_BOUGHT_AT {amount:'723', time:'1/8/2014', status:'Undisputed'}]->(American_Apparel)
CREATE (Ava)-[:HAS_BOUGHT_AT {amount:'627', time:'5/20/2014', status:'Undisputed'}]->(Just_Brew_It)
CREATE (Olivia)-[:HAS_BOUGHT_AT {amount:'74', time:'9/4/2014', status:'Undisputed'}]->(Soccer_for_the_City)
CREATE (Olivia)-[:HAS_BOUGHT_AT {amount:'231', time:'7/12/2014', status:'Undisputed'}]->(Wallmart)
CREATE (Olivia)-[:HAS_BOUGHT_AT {amount:'924', time:'10/4/2014', status:'Undisputed'}]->(Soccer_for_the_City)
CREATE (Olivia)-[:HAS_BOUGHT_AT {amount:'742', time:'8/12/2014', status:'Undisputed'}]->(Just_Brew_It)
CREATE (Mia)-[:HAS_BOUGHT_AT {amount:'276', time:'12/24/2014', status:'Undisputed'}]->(Soccer_for_the_City)
CREATE (Mia)-[:HAS_BOUGHT_AT {amount:'66', time:'4/16/2014', status:'Undisputed'}]->(Starbucks)
CREATE (Mia)-[:HAS_BOUGHT_AT {amount:'467', time:'12/23/2014', status:'Undisputed'}]->(MacDonalds)
CREATE (Mia)-[:HAS_BOUGHT_AT {amount:'830', time:'3/13/2014', status:'Undisputed'}]->(Sears)
CREATE (Mia)-[:HAS_BOUGHT_AT {amount:'240', time:'7/9/2014', status:'Undisputed'}]->(Amazon)
CREATE (Mia)-[:HAS_BOUGHT_AT {amount:'164', time:'12/26/2014', status:'Undisputed'}]->(Soccer_for_the_City)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'630', time:'10/6/2014', status:'Undisputed'}]->(MacDonalds)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'19', time:'7/29/2014', status:'Undisputed'}]->(Abercrombie)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'352', time:'12/16/2014', status:'Undisputed'}]->(Subway)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'147', time:'8/3/2014', status:'Undisputed'}]->(Amazon)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'91', time:'6/29/2014', status:'Undisputed'}]->(Wallmart)
CREATE (Paul)-[:HAS_BOUGHT_AT {amount:'1021', time:'7/18/2014', status:'Disputed'}]->(Apple_Store)
CREATE (Paul)-[:HAS_BOUGHT_AT {amount:'1732', time:'5/10/2014', status:'Disputed'}]->(Urban_Outfitters)
CREATE (Paul)-[:HAS_BOUGHT_AT {amount:'1415', time:'4/1/2014', status:'Disputed'}]->(RadioShack)
CREATE (Paul)-[:HAS_BOUGHT_AT {amount:'1849', time:'12/20/2014', status:'Disputed'}]->(Macys)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'1914', time:'7/18/2014', status:'Disputed'}]->(Apple_Store)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'1424', time:'5/10/2014', status:'Disputed'}]->(Urban_Outfitters)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'1721', time:'4/1/2014', status:'Disputed'}]->(RadioShack)
CREATE (Marc)-[:HAS_BOUGHT_AT {amount:'1003', time:'12/20/2014', status:'Disputed'}]->(Macys)
CREATE (Olivia)-[:HAS_BOUGHT_AT {amount:'1149', time:'7/18/2014', status:'Disputed'}]->(Apple_Store)
CREATE (Olivia)-[:HAS_BOUGHT_AT {amount:'1152', time:'8/10/2014', status:'Disputed'}]->(Urban_Outfitters)
CREATE (Olivia)-[:HAS_BOUGHT_AT {amount:'1884', time:'8/1/2014', status:'Disputed'}]->(RadioShack)
CREATE (Olivia)-[:HAS_BOUGHT_AT {amount:'1790', time:'12/20/2014', status:'Disputed'}]->(Macys)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'1925', time:'7/18/2014', status:'Disputed'}]->(Apple_Store)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'1374', time:'7/10/2014', status:'Disputed'}]->(Urban_Outfitters)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'1368', time:'7/1/2014', status:'Disputed'}]->(RadioShack)
CREATE (Madison)-[:HAS_BOUGHT_AT {amount:'1816', time:'12/20/2014', status:'Disputed'}]->(Macys)

RETURN *

You can download the complete dataset here : https://www.dropbox.com/s/4uij4gs2iyva5bd/credit%20card%20fraud.zip

Identify the Fraudulent Transactions

We collect all the fraudulent transactions.

MATCH (victim:person)-[r:HAS_BOUGHT_AT]->(merchant)
WHERE r.status = "Disputed"
RETURN victim.name as customer_name, merchant.name as store_name, r.amount as amount, r.time as transaction_time
ORDER BY transaction_time DESC

Identify the Point of Origin of the Fraud

Now we know which customers and which merchants are involved in our fraud case. But where is the criminal we are looking for? What’s going to help use here is the transaction date on each fraudulent transaction.

The criminal we are looking for is involved in a legitimate transaction during which he captures his victims credit card numbers. After that, he can execute his illegitimate transactions. That means that we not only want the illegitimate transactions but also the transactions happening before the theft.

MATCH (victim:person)-[r:HAS_BOUGHT_AT]->(merchant)
WHERE r.status = "Disputed"
MATCH victim-[t:HAS_BOUGHT_AT]->(othermerchants)
WHERE t.status = "Undisputed" AND t.time < r.time
WITH victim, othermerchants, t ORDER BY t.time DESC
RETURN victim.name as customer_name, othermerchants.name as store_name, t.amount as amount, t.time as transaction_time
ORDER BY transaction_time DESC

Zero in on the criminal

Now we want to find the common denominator. Is there a common merchant in all of these seemingly innocuous transactions? We just have to tweak the Cypher query to sort out the previous results according to the number of occurrences of each merchant.

MATCH (victim:person)-[r:HAS_BOUGHT_AT]->(merchant)
WHERE r.status = "Disputed"
MATCH victim-[t:HAS_BOUGHT_AT]->(othermerchants)
WHERE t.status = "Undisputed" AND t.time < r.time
WITH victim, othermerchants, t ORDER BY t.time DESC
RETURN DISTINCT othermerchants.name as suspicious_store, count(DISTINCT t) as count, collect(DISTINCT victim.name) as victims
ORDER BY count DESC
Where is the thief

Each time a fraudulent transactions has occurred in the days leading to it, the credit card user has visited Wallmart. We know the place and the moment when the customers credit cards numbers were stolen. With a graph visualization solution like Linkurious we could inspect the data to confirm our intuition. Now We can alert the authorities and the merchant on the situation. They should have enough information to do the rest!

For more graph-related use cases, make sure to check the blog of Linkurious : http://linkurio.us/blog

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment