Answer to this Stackoverflow Question
How would one model this kind of data in Neo4j
?:
> HOLDINGS Portfolio Holding Instrument Date BALANCE.USD 1 ABC Stock 1 Share Class A Stock 1 2013-12-31 25360291 2 ABC Stock 1 Share Class A Stock 1 2014-01-31 25302011 3 ABC Stock 1 Share Class B Stock 1 2013-12-31 12264011 4 ABC Stock 1 Share Class B Stock 1 2014-01-31 12893201 5 DEF Fund 1 Share Class EUR Series 1 Fund 1 2013-12-31 21012222 6 DEF Fund 1 Share Class EUR Series 1 Fund 1 2014-01-31 21632101 7 DEF Fund 1 Share Class EUR Series 2 Fund 1 2013-12-31 8214325 8 DEF Fund 1 Share Class EUR Series 2 Fund 1 2014-01-31 8292630 9 DEF Portfolio ABC Account Portfolio ABC 2013-12-31 155364592 10 DEF Portfolio ABC Account Portfolio ABC 2014-01-31 156202162
> FACTORS Instrument Factor ExposureStrength 1 Stock 1 North America:US 1.00 2 Stock 1 Industrials 1.00 3 Fund 1 Liquidity:Low 0.05 4 Fund 1 North America 0.70 5 Fund 1 Europe:Eurozone:Germany 0.20 6 Fund 1 Industrials:Capital Goods:Building Products 0.25
To give a little narrative, a portfolio invests into changing set of instruments (stocks, bonds, other portfolios …) with time-varying value. The instruments in turn are sensitive to factors, which can be hierarchical, such as sectors [(e.g. GICS classification)](http://www.spindices.com/documents/index-policies/methodology-gics.pdf), regions/countries, size etc. Although it is not reflected in the FACTORS
table above, ExposureStrength
can vary with time too.
Portfolio exposure to each factor is then calculated as Balance * ExposureStrength
. The aim is to later aggregate or slice&dice the exposures by different factor sets.
I am not yet an user of Neo4j
but considering learning Cypher
query language, so apologies for not posting reproducible example yet. Was curious how complex would Cypher
snippets look like for this kind of problem.
You could model the time variance by creating one HoldingBalance node that is connected to a Portfolio and Instrument with a date reference, which have :NEXT relationships and are at the same time anchored in a time-tree for quick access of sub-sequences.
Something like this:
(Portfolio)-->(Holding)<--(Instrument) | v (HoldingBalance)<-[:NEXT]-(HoldingBalance)-->(day)-->(month)-->(year)
Same goes for factors
(Factor)-->(Exposure)<--(Instrument) | v (ExposureStrength)<-[:NEXT]-(ExposureStrength)-->(day)-->(month)-->(year)
LOAD CSV WITH HEADERS FROM "https://gist.github.com/jexp/cace2732effa846d9fc2/raw/holdings.csv" AS line
MERGE (p:Portfolio {name:line.Portfolio})
MERGE (h:Holding {name:line.Holding})
MERGE (i:Instrument {name:line.Instrument})
MERGE (h)-[:IN_PORTFOLIO]->(p)
MERGE (h)-[:HOLDING_INSTRUMENT]->(i)
MERGE (hb:HoldingBalance {name:line.Holding, date:line.Date}) ON CREATE SET hb.balance=toInt(line.BalanceUsd)
MERGE (hb)-[:BALANCE_OF]->(h)
LOAD CSV WITH HEADERS FROM "https://gist.githubusercontent.com/jexp/cace2732effa846d9fc2/raw/factors.csv" AS line
WITH line, split(line.Factor,":") as factors
MERGE (f:Factor {name:factors[length(factors)-1]})
FOREACH (r in range(1,length(factors)-1) |
MERGE (f1:Factor {name:factors[r-1]})
MERGE (f2:Factor {name:factors[r]})
MERGE (f1)-[:SUB_FACTOR]->(f2))
MERGE (i:Instrument {name:line.Instrument})
MERGE (f)-[:FACTOR_INSTRUMENT]->(i)
MERGE (es:ExposureStrength {name:line.Factor, date:line.Date}) ON CREATE SET es.exposureStrength=toFloat(line.ExposureStrength)
MERGE (es)-[:EXPOSURE_STRENGTH]->(f)
MATCH (hb:HoldingBalance)
MERGE (y:Year {year:substring(hb.date,0,4)})
MERGE (y)<-[:IN_YEAR]-(m:Month {month:substring(hb.date,5,2)})
MERGE (m)<-[:IN_MONTH]-(d:Day {day:substring(hb.date,8,2)})
MERGE (hb)-[:FOR_DAY]->(d)
WITH hb
ORDER BY hb.date ASC
WITH hb.name as name, collect(hb) as events
FOREACH (r in range(0,size(events)-2) |
FOREACH (e1 in [events[r]] | FOREACH (e2 in [events[r+1]] | MERGE (e1)-[:NEXT]->(e2))))
MATCH (es:ExposureStrength)
MERGE (y:Year {year:substring(es.date,0,4)})
MERGE (y)<-[:IN_YEAR]-(m:Month {month:substring(es.date,5,2)})
MERGE (m)<-[:IN_MONTH]-(d:Day {day:substring(es.date,8,2)})
MERGE (es)-[:FOR_DAY]->(d)
WITH es
ORDER BY es.date ASC
WITH es.name as name, collect(es) as events
FOREACH (r in range(0,size(events)-2) |
FOREACH (e1 in [events[r]] | FOREACH (e2 in [events[r+1]] | MERGE (e1)-[:NEXT]->(e2))))
MATCH (i)<-[:HOLDING_INSTRUMENT]-(h)-[:IN_PORTFOLIO]->(p:Portfolio { name:"DEF" }),(hb)-[:BALANCE_OF]->(h),(es)-[:EXPOSURE_STRENGTH]->(f)-[:FACTOR_INSTRUMENT]->(i)
RETURN p.name AS portfolio, i.name AS instrument, f.name AS factor, hb.date AS date, hb.balance, es.exposureStrength, hb.balance * es.exposureStrength AS exposure