@rvanbruggen
Last active April 26, 2021 19:36
News Analysis with Neo4j, APOC and Google Cloud NLP

Quickly making sense of the news with Neo4j, APOC and Google's NLP engine

The setup

Search for articles on https://eventregistry.org/intelligence?type=articles

Download as .csv file

Or put the articles into a Google Sheet: https://docs.google.com/spreadsheets/d/1G1Dfh-6Ue2nK17CGoEMiNfIjhYO05h9EJaRX9sDUxJI/edit?usp=sharing, which can be downloaded as CSV from https://docs.google.com/spreadsheets/d/1G1Dfh-6Ue2nK17CGoEMiNfIjhYO05h9EJaRX9sDUxJI/gviz/tq?tqx=out:csv&sheet=news

Set up the indexes

create index on :Article(title);
create index on :Article(body);
create index on :Source(name);
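
The `create index on` form above is deprecated from Neo4j 4.0 onwards. As an alternative to those three statements (not in addition to them), here is a minimal sketch of the same indexes in the newer syntax, assuming Neo4j 4.0 or later. The index names are illustrative choices and not part of the original setup.

```cypher
// Same three indexes in the Neo4j 4.x syntax.
// The index names (article_title, ...) are hypothetical; pick your own.
CREATE INDEX article_title FOR (a:Article) ON (a.title);
CREATE INDEX article_body FOR (a:Article) ON (a.body);
CREATE INDEX source_name FOR (s:Source) ON (s.name);
```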

Load from the CSV

load csv with headers from "https://docs.google.com/spreadsheets/d/1G1Dfh-6Ue2nK17CGoEMiNfIjhYO05h9EJaRX9sDUxJI/gviz/tq?tqx=out:csv&sheet=news" as line 
    create (a:Article)
    set a = line;
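
Because `set a = line` copies every CSV column onto the node as a property, it can help to check what actually arrived before moving on. The later steps rely on the `source.title`, `source.uri`, `body` and `lang` properties, so a quick look along these lines (a sketch, not part of the original workflow) confirms they exist:

```cypher
// How many articles were loaded?
MATCH (a:Article)
RETURN count(a) AS articles;

// Which CSV columns ended up as properties on an article node?
MATCH (a:Article)
RETURN keys(a) AS properties
LIMIT 1;
```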

Refactor the article sources

match (a:Article)
merge (s:Source {name: a.`source.title`, uri: a.`source.uri`})
create (s)-[:PUBLISHES]->(a);
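
Note that the refactoring query uses `create` for the relationship, so running it a second time adds a duplicate `PUBLISHES` relationship per article. A simple way to inspect the result, sketched here as an optional check, is to list the sources with the most articles:

```cypher
// Top publishers by number of articles.
MATCH (s:Source)-[:PUBLISHES]->(a:Article)
RETURN s.name AS source, count(a) AS articles
ORDER BY articles DESC
LIMIT 10;
```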

Enrich the news articles with GCP NLP

Take a look at the APOC page: https://neo4j.com/labs/apoc/4.1/nlp/gcp/

It uses the Google Cloud Natural Language API: https://cloud.google.com/natural-language/

Install APOC

Install the dependencies on the local Neo4j server: https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/4.1.0.6/apoc-nlp-dependencies-4.1.0.6.jar

Prepare the GCP

Create an API key at https://console.cloud.google.com/apis/credentials?project=neo4j-mqls (substitute your own project for neo4j-mqls) and store it as a parameter:

:param apiKey =>("<<your API KEY>>")

Think about the languages

match (a:Article)
return a.lang, count(*);

If the languages are not taken into account, the procedure fails with an error like this:

Failed to invoke procedure apoc.nlp.gcp.entities.graph: Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: ...

So run a separate query for each language:

For Spanish:

MATCH (a:Article)
where a.lang = "spa"
CALL apoc.nlp.gcp.entities.graph(a, {
  key: $apiKey,
  nodeProperty: "body",
  scoreCutoff: 0.01,
  writeRelationshipType: "HAS_ENTITY",
  writeRelationshipProperty: "gcpEntityScore",
  write: true
})
YIELD graph AS g
RETURN "Success!";

For English:

MATCH (a:Article)
where a.lang = "eng"
CALL apoc.nlp.gcp.entities.graph(a, {
  key: $apiKey,
  nodeProperty: "body",
  scoreCutoff: 0.01,
  writeRelationshipType: "HAS_ENTITY",
  writeRelationshipProperty: "gcpEntityScore",
  write: true
})
YIELD graph AS g
RETURN "Success!";

For German:

MATCH (a:Article)
where a.lang = "deu"
CALL apoc.nlp.gcp.entities.graph(a, {
  key: $apiKey,
  nodeProperty: "body",
  scoreCutoff: 0.01,
  writeRelationshipType: "HAS_ENTITY",
  writeRelationshipProperty: "gcpEntityScore",
  write: true
})
YIELD graph AS g
RETURN "Success!";

For French:

MATCH (a:Article)
where a.lang = "fra"
CALL apoc.nlp.gcp.entities.graph(a, {
  key: $apiKey,
  nodeProperty: "body",
  scoreCutoff: 0.01,
  writeRelationshipType: "HAS_ENTITY",
  writeRelationshipProperty: "gcpEntityScore",
  write: true
})
YIELD graph AS g
RETURN "Success!";
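
The four queries above differ only in the language code. One way to avoid the repetition, sketched here under the assumption that you are working in Neo4j Browser, is to put the code in a second parameter and rerun the same query for each language. The parameter name `lang` is an illustrative choice. Run the `:param` line first, then the query.

```cypher
:param lang => "eng"

// Same call as above, with the language code taken from $lang.
// Change the parameter to "spa", "deu" or "fra" and run again.
MATCH (a:Article)
WHERE a.lang = $lang
CALL apoc.nlp.gcp.entities.graph(a, {
  key: $apiKey,
  nodeProperty: "body",
  scoreCutoff: 0.01,
  writeRelationshipType: "HAS_ENTITY",
  writeRelationshipProperty: "gcpEntityScore",
  write: true
})
YIELD graph AS g
RETURN "Success!";
```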

Result in Bloom!
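
Before opening Bloom, a plain Cypher query can give a first impression of what was written back. This sketch relies only on the `HAS_ENTITY` relationship type and the `gcpEntityScore` property configured in the calls above, and makes no assumption about the labels or properties of the entity nodes themselves:

```cypher
// Which kinds of entity nodes were created, and how often are they linked?
MATCH (:Article)-[r:HAS_ENTITY]->(e)
RETURN labels(e) AS entityLabels,
       count(r) AS mentions,
       avg(r.gcpEntityScore) AS avgScore
ORDER BY mentions DESC
LIMIT 10;
```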
