
@enlacee
Last active October 19, 2021 02:37
Delete duplicate data in DynamoDB: create a JSON file to process

Create a JSON file to delete duplicate items with the AWS CLI (aws dynamodb)

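Important: you need both key attributes of every item, the partition key (primary key) and the sort key. A delete request on this table must include both.
Reference: Delete several items at once on AWS DynamoDB using AWS CLI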

Requirements

  • AWS CLI
  • AWS credentials configured for the CLI (AWS login)
  • Node.js 14
  • MySQL
  • phpMyAdmin (database manager)
  • Adminer (database manager)
  • Visual Studio Code

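Step 1: Create the file resultsproducts.txt

Scan the table and save the key attributes (ID, dateTimeCreation) plus codeAX, company and product as tab-separated text. --max-items 1000 stops after 1000 items; raise or remove it to export the whole table.

  aws dynamodb scan --table-name UE1NPRODTESTDBADYNH2H014 --output text --max-items 1000 --query "Items[*].[ID.S, dateTimeCreation.N, codeAX.S, company.S, product.S]" > resultsproducts.txt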

Step 2: Open the file in an open-source spreadsheet (e.g. LibreOffice Calc)

Rename resultsproducts.txt to resultsproducts.csv, open it and choose Tab as the column separator.

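Step 3: Import resultsproducts.csv

Import the file with phpMyAdmin (this creates the records in a MySQL table). Then run this query to find the duplicate records: every row whose ID (COL 1) appears again in a row with a higher theid (the table's row id). The last copy of each ID is not returned, so it is the one that is kept.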

  SELECT S1.`COL 1`, S1.`COL 2`
  FROM `resultsproductsv2_ids` AS S1
  INNER JOIN `resultsproductsv2_ids` AS S2
    ON S1.`COL 1` = S2.`COL 1`
  WHERE S1.theid < S2.theid

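Sub-filter: return each (COL 1, COL 2) pair only once, so no item is deleted twice. This means fewer delete requests, and batch-write-item rejects a batch that repeats the same key. DISTINCT applies to the whole selected row: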

  SELECT DISTINCT `COL 1`, `COL 2` FROM `resultsproductsv2_ids`
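The duplicate selection of Step 3 (the self-join plus the DISTINCT sub-filter) can also be sketched in memory with Node.js, for example to double-check the SQL result. This sketch assumes the rows are [ID, dateTimeCreation] pairs in import order, with the row position standing in for theid. Like the query, it keeps the last row of each ID; unlike the plain query, it never returns the kept row's key even if that exact pair was imported twice. findDuplicateKeys is a hypothetical helper name.

```javascript
// Sketch: in-memory version of the Step 3 duplicate selection.
// rows: [ID, dateTimeCreation] pairs in import order. The row position
// stands in for the `theid` column (assumption).
function findDuplicateKeys(rows) {
  // For each ID remember the dateTimeCreation of its last row: that copy is
  // kept, like the row with the highest `theid` in the SQL query.
  const kept = new Map();
  for (const [id, dateTimeCreation] of rows) {
    kept.set(id, dateTimeCreation);
  }

  // Every other distinct (ID, dateTimeCreation) pair is a duplicate to delete.
  // The Set plays the role of SELECT DISTINCT.
  const seen = new Set();
  const duplicates = [];
  for (const [id, dateTimeCreation] of rows) {
    const key = JSON.stringify([id, dateTimeCreation]);
    if (dateTimeCreation !== kept.get(id) && !seen.has(key)) {
      seen.add(key);
      duplicates.push([id, dateTimeCreation]);
    }
  }
  return duplicates;
}

module.exports = { findDuplicateKeys };
```

Which copy counts as "last" depends on the order in which the rows were imported, exactly as with theid in the SQL version.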

Step 4: Copy the result of this SELECT into your text editor (Visual Studio Code)

Select all the rows and build the script delete-duplicate-data.js:

  // Keys of the duplicate items: [ID (S), dateTimeCreation (N)].
  // Paste the rows returned by the SELECT in Step 3 here.
  const IDS = [
      ["13a84e72-2d85-4852-9670-7adea92c8d4e", "1632326405429"],
      ["a010b860-7bd2-460e-8799-bc491670f294", "1632326405429"],
      ["1e543a6d-98e0-4150-b32a-097e688958b0", "1632326405429"],
      ["dada1596-c358-49f8-b477-35e79f294be7", "1632326405428"],
      ["9c3b17d6-a209-4dc9-aa3e-a72af3a554a8", "1632326405428"],
  ];

  const TABLE_NAME = "UE1NPRODTESTDBADYNH2H014";
  const BATCH_SIZE = 25; // batch-write-item accepts at most 25 requests
  const BATCH = 0;       // 0 = keys 0-24, 1 = keys 25-49, 2 = keys 50-74, ...

  // Build one DeleteRequest per key of the selected batch.
  const start = BATCH * BATCH_SIZE;
  const dynamoObject = {
    [TABLE_NAME]: IDS.slice(start, start + BATCH_SIZE).map(([id, dateTimeCreation]) => ({
      DeleteRequest: {
        Key: {
          ID: { S: id },                            // partition key
          dateTimeCreation: { N: dateTimeCreation } // sort key (numbers are sent as strings)
        }
      }
    }))
  };

  console.log(JSON.stringify(dynamoObject, null, 2));

Step 5: Execute the script and create the JSON file

This creates delete-duplicate-data.json, the request file used by the AWS CLI (aws dynamodb batch-write-item):

  node delete-duplicate-data.js > delete-duplicate-data.json

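Step 6: Run the deletion from your working folder (where all the files are)

Execute in bash.

First check that the table exists and is reachable (this scans a single item into results.txt), then send the delete requests: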

  aws dynamodb scan --table-name UE1NPRODTESTDBADYNH2H014 --output text --max-items 1 > results.txt
  aws dynamodb batch-write-item --request-items file://delete-duplicate-data.json

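IMPORTANT: aws dynamodb batch-write-item accepts at most 25 requests per call, so each JSON file must contain 25 delete requests or fewer. With more duplicates, generate one file per batch of 25 and run the command once per file.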
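Because of this 25-request limit, a table with many duplicates needs several request files. Instead of editing the script once per batch, all keys can be split into batches of 25 in one run. This is a sketch assuming the keys have the same [ID, dateTimeCreation] shape as IDS in Step 4; toBatches, writeBatchFiles and the delete-duplicate-data-N.json file names are hypothetical.

```javascript
// Sketch: split all keys into batch-write-item request files of 25 deletes.
// File names delete-duplicate-data-N.json are hypothetical.
const fs = require("fs");

const TABLE_NAME = "UE1NPRODTESTDBADYNH2H014";
const BATCH_SIZE = 25; // batch-write-item limit per call

// keys: [ID, dateTimeCreation] pairs (same shape as IDS in Step 4)
function toBatches(keys, tableName = TABLE_NAME, size = BATCH_SIZE) {
  const batches = [];
  for (let start = 0; start < keys.length; start += size) {
    batches.push({
      [tableName]: keys.slice(start, start + size).map(([id, dateTimeCreation]) => ({
        DeleteRequest: {
          Key: { ID: { S: id }, dateTimeCreation: { N: dateTimeCreation } },
        },
      })),
    });
  }
  return batches;
}

// Writes one JSON file per batch and returns the file names.
function writeBatchFiles(keys, prefix = "delete-duplicate-data") {
  return toBatches(keys).map((batch, n) => {
    const file = `${prefix}-${n}.json`;
    fs.writeFileSync(file, JSON.stringify(batch, null, 2));
    return file;
  });
}

module.exports = { toBatches, writeBatchFiles };
```

Each file is then sent on its own, e.g. aws dynamodb batch-write-item --request-items file://delete-duplicate-data-0.json, then -1.json, and so on.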
