@jruts
jruts / test.json
Last active September 6, 2021 14:26
[
{
"product": { "name": "testing shit", "product_number": "asdasdasd" },
"retailer": { "id": 1, "name": "shit"},
"branding": {
"manufacturer": { "id": 1, "name": "supermanufacturer" },
"brand": { "id": 1, "name": "superbrand" },
"sub_brand": { "id": 1, "name": "supersubrand" }
}
},
@jruts
jruts / neo4j_delete_duplicate_nodes.md
Last active September 28, 2023 14:22
How to delete duplicate nodes and their relationships in neo4j with cypher?

How to delete duplicate nodes and their relationships in neo4j with cypher based on a property of that node?

The problem is easy to understand. We have 'duplicate' nodes in our database based on the 'id' field on the node properties.

Well, this sounds easy enough until you actually have to do it.

First step

My first attempt was to figure out which nodes are actually duplicates (based on a property on the node). This seems pretty straightforward.
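
To make the idea of "duplicate based on a property" concrete, here is a minimal Python sketch of the grouping logic, not the query from this note. It assumes nodes are available as plain dictionaries; the `find_duplicates` helper, the `internal` key, and the sample data are hypothetical. In Cypher the same idea would roughly correspond to collecting nodes per `id` and keeping only the groups with more than one member.

```python
from collections import defaultdict


def find_duplicates(nodes, key="id"):
    """Group nodes by the value of `key` and keep groups with more than one node."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node[key]].append(node)
    # Only values shared by two or more nodes count as duplicates.
    return {value: group for value, group in groups.items() if len(group) > 1}


# Hypothetical sample data: "internal" stands in for the database's own node id,
# "id" is the business property the duplicates are defined on.
nodes = [
    {"internal": 1, "id": "a"},
    {"internal": 2, "id": "b"},
    {"internal": 3, "id": "a"},
]
duplicates = find_duplicates(nodes)
```

With this shape, deciding which node of each group to keep (for example the first one) and which ones to delete becomes a separate step.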

Cypher:

{
"nodes": [
{
"id": 516,
"label": "geo",
"title": "London"
},
{
"id": 1650,
"label": "geo",

PROPOSALS: Tag Storage in S3

Currently we are storing our tag information in JSON files in S3. The idea is good because we get versioning and replayability out of the box.

This is what the current structure looks like:

ci
├── amenity
├── geo
├── hotels

SUGGESTION: CloudSearch (taggy-ci)

We are using CloudSearch mainly for autocompletion and to get the ids linked to the completions. To make this work for multiple markets and multiple languages per market (even synonyms), we only need 3 fields.

Fields

context

The first field we need is a context that defines market and language.

market:language E.g.: uk:en
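
As an illustration of the `market:language` format, a context value could be built and parsed along these lines. This is only a sketch: the function names `make_context` and `parse_context` are hypothetical, and the validation rule (exactly one non-empty market and one non-empty language separated by a colon) is an assumption, not part of the proposal.

```python
def make_context(market, language):
    """Build a context string such as 'uk:en' from a market and a language."""
    return f"{market}:{language}"


def parse_context(context):
    """Split a context string into (market, language); reject malformed values."""
    market, sep, language = context.partition(":")
    # Assumption: both parts are required and the language contains no further colon.
    if not sep or not market or not language or ":" in language:
        raise ValueError(f"invalid context: {context!r}")
    return market, language


context = make_context("uk", "en")
market, language = parse_context(context)
```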