@cj2001
Created February 9, 2021 23:51
Add arXiv paper nodes and all edges
def add_papers(rows, batch_size=5000):
    # Adds paper nodes and the (:Author)-[:AUTHORED]->(:Paper) and
    # (:Paper)-[:IN_CATEGORY]->(:Category) relationships to the Neo4j
    # graph as a batch job.
    # Both MATCH clauses require the Category and Author nodes to exist
    # already; a paper with no matching category yields no rows for the
    # author step.
    query = '''
    UNWIND $rows AS row
    MERGE (p:Paper {id: row.id}) ON CREATE SET p.title = row.title
    // connect categories
    WITH row, p
    UNWIND row.category_list AS category_name
    MATCH (c:Category {category: category_name})
    MERGE (p)-[:IN_CATEGORY]->(c)
    // connect authors
    WITH DISTINCT row, p // reduce cardinality
    UNWIND row.cleaned_authors_list AS author
    MATCH (a:Author {name: author})
    MERGE (a)-[:AUTHORED]->(p)
    RETURN count(DISTINCT p) AS total
    '''
    # insert_data is not defined in this gist; it is expected to run the
    # query over `rows` in batches of `batch_size`.
    return insert_data(query, rows, batch_size)
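
The function above hands its query to `insert_data`, which this gist does not define. The following is a minimal sketch of what such a batching helper might look like, not the author's implementation. The name `insert_data_sketch`, the `run_query` callable, and the shape of the returned dictionary are all assumptions made for illustration. `run_query` stands in for whatever executes Cypher against the database and is assumed to return a list of records shaped like `[{"total": n}]`, matching the `RETURN ... AS total` clause of the query.

```python
from typing import Any, Callable, Dict, List

# Hypothetical runner type: takes the Cypher text and a parameter dict and
# returns a list of result records, e.g. [{"total": 3}].
QueryRunner = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


def insert_data_sketch(query: str,
                       rows: List[Dict[str, Any]],
                       batch_size: int,
                       run_query: QueryRunner) -> Dict[str, int]:
    """Send `rows` to the database in slices of at most `batch_size`.

    Assumption: this is an illustrative stand-in, not the gist's insert_data.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    total = 0
    batches = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # Each slice is bound to the $rows parameter that the query UNWINDs.
        records = run_query(query, {"rows": chunk})
        if records:
            total += records[0]["total"]
        batches += 1
    return {"total": total, "batches": batches}


def fake_runner(query: str, parameters: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Stand-in for a real database call: pretends every row created one paper.
    return [{"total": len(parameters["rows"])}]


# Twelve made-up rows with the four fields the query reads.
rows = [
    {
        "id": str(i),
        "title": f"Paper {i}",
        "category_list": ["cs.LG"],
        "cleaned_authors_list": ["A. Author"],
    }
    for i in range(12)
]

result = insert_data_sketch("UNWIND $rows AS row ...", rows,
                            batch_size=5, run_query=fake_runner)
print(result)
```

Passing the runner in as an argument keeps the sketch testable without a running Neo4j instance; in real use it would wrap a driver session or connection object.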