Skip to content

Instantly share code, notes, and snippets.

@georgebrock
Last active August 31, 2021 04:40
Show Gist options
  • Save georgebrock/df69c34fbbff53407963 to your computer and use it in GitHub Desktop.
Save georgebrock/df69c34fbbff53407963 to your computer and use it in GitHub Desktop.
Hockeypuck PGP key server: Postgres export / Neo4j import
COPY (SELECT substring(uuid for 16) as r_keyid, * FROM openpgp_pubkey)
TO '/tmp/openpgp_pubkey.csv'
WITH (FORMAT CSV, HEADER TRUE, FORCE_QUOTE *, ESCAPE '\');
COPY (SELECT * FROM openpgp_sig
WHERE subkey_uuid IS NULL
AND uid_uuid IS NULL
AND uat_uuid IS NULL
AND sig_uuid IS NULL)
TO '/tmp/openpgp_sig_pubkey.csv'
WITH (FORMAT CSV, HEADER TRUE, FORCE_QUOTE *, ESCAPE '\');
COPY (SELECT * FROM openpgp_sig
WHERE subkey_uuid IS NOT NULL)
TO '/tmp/openpgp_sig_subkey.csv'
WITH (FORMAT CSV, HEADER TRUE, FORCE_QUOTE *, ESCAPE '\');
COPY (SELECT * FROM openpgp_sig
WHERE uid_uuid IS NOT NULL)
TO '/tmp/openpgp_sig_uid.csv'
WITH (FORMAT CSV, HEADER TRUE, FORCE_QUOTE *, ESCAPE '\');
COPY (SELECT * FROM openpgp_sig
WHERE uat_uuid IS NOT NULL)
TO '/tmp/openpgp_sig_uat.csv'
WITH (FORMAT CSV, HEADER TRUE, FORCE_QUOTE *, ESCAPE '\');
COPY (SELECT substring(uuid for 16) as r_keyid, * FROM openpgp_subkey)
TO '/tmp/openpgp_subkey.csv'
WITH (FORMAT CSV, HEADER TRUE, FORCE_QUOTE *, ESCAPE '\');
COPY (SELECT * FROM openpgp_uat)
TO '/tmp/openpgp_uat.csv'
WITH (FORMAT CSV, HEADER TRUE, FORCE_QUOTE *, ESCAPE '\');
COPY (SELECT * FROM openpgp_uid)
TO '/tmp/openpgp_uid.csv'
WITH (FORMAT CSV, HEADER TRUE, FORCE_QUOTE *, ESCAPE '\');
#!/bin/sh
(date; neo4j-shell < import.cypher 2>&1; date) | tee output
// Create indices
// status: success
CREATE INDEX ON :UID(uuid);
CREATE INDEX ON :UID(pubkey_uuid);
CREATE INDEX ON :PubKey(uuid);
CREATE INDEX ON :PubKey(r_keyid);
CREATE INDEX ON :SubKey(uuid);
CREATE INDEX ON :SubKey(r_keyid);
CREATE INDEX ON :UAT(uuid);
RETURN TIMESTAMP();
// Load UIDs
// status: success (4433416 nodes created; matches CSV)
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/tmp/openpgp_uid.csv" AS row
CREATE (uid:UID {
uuid: row.uuid,
creation: row.creation,
expiration: row.expiration,
state: row.state,
packet: row.packet,
revsig_uuid: row.revsig_uuid,
keywords: row.keywords,
keywords_fulltext: row.keywords_fulltext,
pubkey_uuid: row.pubkey_uuid
});
RETURN TIMESTAMP();
// Load PubKeys
// status: success (3788102 nodes created; matches CSV)
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/tmp/openpgp_pubkey.csv" AS row
CREATE (pubkey:PubKey {
r_keyid: row.r_keyid,
uuid: row.uuid,
creation: row.creation,
expiration: row.expiration,
state: row.state,
packet: row.packet,
ctime: row.ctime,
mtime: row.mtime,
md5: row.md5,
sha256: row.sha256,
revsig_uuid: row.revsig_uuid,
primary_uid: row.primary_uid,
primary_uat: row.primary_uat,
algorithm: row.algorithm,
bit_len: row.bit_len,
unsupp: row.unsupp
});
RETURN TIMESTAMP();
// Create UID IDENTIFIES PubKey relationships
// status: success (4435853 relationships created)
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/tmp/openpgp_pubkey.csv" AS row
WITH *
MATCH (pubkey:PubKey {uuid: row.uuid})
MATCH (uid:UID {pubkey_uuid: row.uuid})
MERGE (uid)-[:IDENTIFIES]->(pubkey);
RETURN TIMESTAMP();
// Create PubKey PRIMARILY_IDENTIFIED_BY UID relationships
// status: success (3788697 relationships created)
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/tmp/openpgp_pubkey.csv" AS row
WITH *
MATCH (pubkey:PubKey {uuid: row.uuid})
MATCH (primary_uid:UID {uuid: row.primary_uid})
MERGE (pubkey)-[:PRIMARILY_IDENTIFIED_BY]->(primary_uid);
RETURN TIMESTAMP();
// Load SubKeys
// status: success (3524655 nodes created; matches CSV)
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/tmp/openpgp_subkey.csv" AS row
CREATE (subkey:SubKey {
r_keyid: row.r_keyid,
uuid: row.uuid,
creation: row.creation,
expiration: row.expiration,
state: row.state,
packet: row.packet,
revsig_uuid: row.revsig_uuid,
algorithm: row.algorithm,
bit_len: row.bit_len
});
RETURN TIMESTAMP();
// Create SubKey BELONGS_TO PubKey relationships
// status: success (3525090 relationships created; 435 more than CSV but there
// are some duplicate PubKeys in the database)
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/tmp/openpgp_subkey.csv" AS row
WITH *
MATCH (subkey:SubKey {uuid: row.uuid})
MATCH (pubkey:PubKey {uuid: row.pubkey_uuid})
MERGE (subkey)-[:BELONGS_TO]->(pubkey);
RETURN TIMESTAMP();
// Load UATs
// status: success (61235 nodes created; matches CSV)
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/tmp/openpgp_uat.csv" AS row
CREATE (uat:UAT {
uuid: row.uuid,
creation: row.creation,
expiration: row.expiration,
state: row.state,
packet: row.packet,
revsig_uuid: row.revsig_uuid
});
RETURN TIMESTAMP();
// Created UAT IDENTIFIES relationships
// status: success (61240 relationships created; more than the CSV)
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/tmp/openpgp_uat.csv" AS row
WITH *
MATCH (uat:UAT {uuid: row.uuid})
MATCH (pubkey:PubKey {uuid: row.pubkey_uuid})
MERGE (uat)-[:IDENTIFIES]->(pubkey);
RETURN TIMESTAMP();
// Create PubKey SIGNS PubKey relationships
// status: success (182678 relationships created)
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/tmp/openpgp_sig_pubkey.csv" AS row
WITH *
MATCH (signed:PubKey {uuid: row.pubkey_uuid})
MATCH (signer:PubKey {r_keyid: row.signer})
MERGE (signer)-[:SIGNS {
uuid: row.uuid,
creation: row.creation,
expiration: row.expiration,
state: row.state,
packet: row.packet,
sig_type: row.sig_type
}]->(signed);
RETURN TIMESTAMP();
// Create SubKey SIGNS PubKey relationships
// status: failed after 136ms (null)
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/tmp/openpgp_sig_pubkey.csv" AS row
WITH *
MATCH (signed:PubKey {uuid: row.pubkey_uuid})
OPTIONAL MATCH (signer_subkey:SubKey {r_keyid: row.signer})
MERGE (signer_subkey)-[:SIGNS {
uuid: row.uuid,
creation: row.creation,
expiration: row.expiration,
state: row.state,
packet: row.packet,
sig_type: row.sig_type
}]->(signed);
RETURN TIMESTAMP();
// Create SubKey SIGNS SubKey relationships
// status: success (311 relationships created; with PubKeys, 197 short of the CSV)
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/tmp/openpgp_sig_subkey.csv" AS row
WITH *
MATCH (signed:SubKey {uuid: row.subkey_uuid})
MATCH (signer:SubKey {r_keyid: row.signer})
MERGE (signer)-[:SIGNS {
uuid: row.uuid,
creation: row.creation,
expiration: row.expiration,
state: row.state,
packet: row.packet,
sig_type: row.sig_type
}]->(signed);
RETURN TIMESTAMP();
// Create PubKey SIGNS SubKey relationships
// status: success (3661324 relationships created; with SubKeys, 197 short of the CSV)
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/tmp/openpgp_sig_subkey.csv" AS row
WITH *
MATCH (signed:SubKey {uuid: row.subkey_uuid})
MATCH (signer:PubKey {r_keyid: row.signer})
MERGE (signer)-[:SIGNS {
uuid: row.uuid,
creation: row.creation,
expiration: row.expiration,
state: row.state,
packet: row.packet,
sig_type: row.sig_type
}]->(signed);
RETURN TIMESTAMP();
// Create SubKey SIGNS UID relationships
// status: success (1126 relationships created)
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/tmp/openpgp_sig_uid.csv" AS row
WITH *
MATCH (signed:UID {uuid: row.uid_uuid})
MATCH (signer:SubKey {r_keyid: row.signer})
MERGE (signer)-[:SIGNS {
uuid: row.uuid,
creation: row.creation,
expiration: row.expiration,
state: row.state,
packet: row.packet,
sig_type: row.sig_type
}]->(signed);
RETURN TIMESTAMP();
// Create PubKey SIGNS UID relationships
// status: failed (errored)
// java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is:
// java.net.SocketException: Operation timed out
// at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:228)
// at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:161)
// at java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:194)
// at java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:148)
// at com.sun.proxy.$Proxy1.interpretLine(Unknown Source)
// at org.neo4j.shell.impl.AbstractClient.evaluate(AbstractClient.java:110)
// at org.neo4j.shell.impl.AbstractClient.evaluate(AbstractClient.java:94)
// at org.neo4j.shell.impl.AbstractClient.grabPrompt(AbstractClient.java:74)
// at org.neo4j.shell.StartClient.grabPromptOrJustExecuteCommand(StartClient.java:357)
// at org.neo4j.shell.StartClient.startRemote(StartClient.java:303)
// at org.neo4j.shell.StartClient.start(StartClient.java:175)
// at org.neo4j.shell.StartClient.main(StartClient.java:120)
// Caused by: java.net.SocketException: Operation timed out
// at java.net.SocketInputStream.socketRead0(Native Method)
// at java.net.SocketInputStream.read(SocketInputStream.java:152)
// at java.net.SocketInputStream.read(SocketInputStream.java:122)
// at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
// at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
// at java.io.DataInputStream.readByte(DataInputStream.java:265)
// at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:214)
// ... 11 more
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/tmp/openpgp_sig_uid.csv" AS row
WITH *
MATCH (signed:UID {uuid: row.uid_uuid})
MATCH (signer:PubKey {r_keyid: row.signer})
MERGE (signer)-[:SIGNS {
uuid: row.uuid,
creation: row.creation,
expiration: row.expiration,
state: row.state,
packet: row.packet,
sig_type: row.sig_type
}]->(signed);
RETURN TIMESTAMP();
// Create SubKey SIGNS UAT relationships
// status: pending
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/tmp/openpgp_sig_uat.csv" AS row
WITH *
MATCH (signed:UAT {uuid: row.uat_uuid})
MATCH (signer:SubKey {r_keyid: row.signer})
MERGE (signer)-[:SIGNS {
uuid: row.uuid,
creation: row.creation,
expiration: row.expiration,
state: row.state,
packet: row.packet,
sig_type: row.sig_type
}]->(signed);
RETURN TIMESTAMP();
// Create PubKey SIGNS UAT relationships
// status: pending
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/tmp/openpgp_sig_uat.csv" AS row
WITH *
MATCH (signed:UAT {uuid: row.uat_uuid})
MATCH (signer:PubKey {r_keyid: row.signer})
MERGE (signer)-[:SIGNS {
uuid: row.uuid,
creation: row.creation,
expiration: row.expiration,
state: row.state,
packet: row.packet,
sig_type: row.sig_type
}]->(signed);
RETURN TIMESTAMP();
@twocathouse
Copy link

Would this be faster if we created the indices at the end, after all the nodes have been imported?

@mike-burns
Copy link

How much memory are you giving it now?

@georgebrock
Copy link
Author

I'm creating indices first to make the lookups faster when we try to build the relationships, and using these memory settings:

wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096
neostore.nodestore.db.mapped_memory=50M
neostore.relationshipstore.db.mapped_memory=500M
neostore.propertystore.db.mapped_memory=100M
neostore.propertystore.db.strings.mapped_memory=100M
neostore.propertystore.db.arrays.mapped_memory=0M

All of this is on the advice from this blog post: http://jexp.de/blog/2014/06/load-csv-into-neo4j-quickly-and-successfully/

@jakewins
Copy link

@Euraldius no, for these queries you'll want the indexes up-front, since MERGE will need to do a lookup by key for each entry you're merging - without the index it'll need to do a scan for each one.

On the queries that are "successful" but with differing data from the csv's - it might be worth using unique constraints for the keys there, rather than plain indexes. That way the db will ensure there are no duplicates.

I asked george just now to see if he could upload the CSVs to S3 or something, and I can run this in our lab with some probes plopped into neo, to see if we can figure out whats causing these hiccups. The 'null' error from the shell certainly is not helpful. I think that has been resolved, but I'm gonna verify that.

If we can't do that, we'll play a game of asking neo and the JVM to log a bunch of flight data and have a look at that to see if we can find the culprit(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment