Skip to content

Instantly share code, notes, and snippets.

@erikhansen
Last active April 2, 2024 01:48
Show Gist options
  • Star 26 You must be signed in to star a gist
  • Fork 5 You must be signed in to fork a gist
  • Save erikhansen/d74f8022902b99fd8ce1c28c987fc43b to your computer and use it in GitHub Desktop.
Save erikhansen/d74f8022902b99fd8ce1c28c987fc43b to your computer and use it in GitHub Desktop.
Magento 2 - Remove duplicate store view-specific product and category data
/*
* IMPORTANT: The queries below are written for Magento Enterprise. If you're going to run them on Magento Community, you need
* to replace all instances of ".row_id" with ".entity_id". See this for context: http://magento.stackexchange.com/questions/139740/magento-2-schema-changes-for-ee-catalog-staging
*
* When importing products in Magento 2, if you specify store view codes in the store_view_code column, product data will be set at
* both the global scope as well as the specific store view scope. This is not ideal because now you have duplicate
* data at two different scopes that shouldn't actually be duplicated. The scripts below clean up this data by finding
* data set at specific store view scopes and if it's an exact match to the data set at the global store view, then it
* deletes the data set at the specific store view scope.
*
* Note: the queries below use inefficient subselects to avoid an error: http://stackoverflow.com/questions/4429319/you-cant-specify-target-table-for-update-in-from-clause
*/
/*
* This first set of queries just shows you what data is duplicated at specific store view scopes. The results returned by these
* queries are what are deleted by the queries further down. You can use these queries to determine if you have data you need to clean up.
*/
-- catalog_product_entity_datetime
SELECT a.*
FROM (SELECT * FROM catalog_product_entity_datetime) AS a
-- This inner join finds all store view-specific rows that exactly match the global scope value. The b.store_id = 0 conditional is key as it targets the global row
INNER JOIN (SELECT * FROM catalog_product_entity_datetime) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id;
-- catalog_product_entity_decimal
SELECT a.*
FROM (SELECT * FROM catalog_product_entity_decimal) AS a
INNER JOIN (SELECT * FROM catalog_product_entity_decimal) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id;
-- catalog_product_entity_int
SELECT a.*
FROM (SELECT * FROM catalog_product_entity_int) AS a
INNER JOIN (SELECT * FROM catalog_product_entity_int) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id;
-- catalog_product_entity_text
SELECT a.*
FROM (SELECT * FROM catalog_product_entity_text) AS a
INNER JOIN (SELECT * FROM catalog_product_entity_text) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id;
-- catalog_product_entity_varchar
SELECT a.*
FROM (SELECT * FROM catalog_product_entity_varchar) AS a
INNER JOIN (SELECT * FROM catalog_product_entity_varchar) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id;
/*
* These queries handle deleting the values duplicated at the specific store view scopes
*/
DELETE FROM `catalog_product_entity_datetime`
WHERE `value_id` IN (
SELECT a.value_id AS value_ids
FROM (SELECT * FROM catalog_product_entity_datetime) AS a
INNER JOIN (SELECT * FROM catalog_product_entity_datetime) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id
);
DELETE FROM `catalog_product_entity_decimal`
WHERE `value_id` IN (
SELECT a.value_id AS value_ids
FROM (SELECT * FROM catalog_product_entity_decimal) AS a
INNER JOIN (SELECT * FROM catalog_product_entity_decimal) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id
);
DELETE FROM `catalog_product_entity_int`
WHERE `value_id` IN (
SELECT a.value_id AS value_ids
FROM (SELECT * FROM catalog_product_entity_int) AS a
INNER JOIN (SELECT * FROM catalog_product_entity_int) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id
);
DELETE FROM `catalog_product_entity_text`
WHERE `value_id` IN (
SELECT a.value_id AS value_ids
FROM (SELECT * FROM catalog_product_entity_text) AS a
INNER JOIN (SELECT * FROM catalog_product_entity_text) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id
);
DELETE FROM `catalog_product_entity_varchar`
WHERE `value_id` IN (
SELECT a.value_id AS value_ids
FROM (SELECT * FROM catalog_product_entity_varchar) AS a
INNER JOIN (SELECT * FROM catalog_product_entity_varchar) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id
);
/*
* Once you've run the queries above, you can find remaining store view-specific data using these queries.
* Results returned by these queries is store view-specfic data that is unique from the global data.
* You'll want to manually review these products in the admin to ensure nothing is awry.
*/
SELECT `eav_attribute`.`attribute_code`, `catalog_product_entity_datetime`.`row_id` as 'product_id', `catalog_product_entity_datetime`.`value` FROM catalog_product_entity_datetime LEFT JOIN eav_attribute ON catalog_product_entity_datetime.attribute_id = eav_attribute.attribute_id WHERE `store_id` <> '0' ORDER BY `row_id`;
SELECT `eav_attribute`.`attribute_code`, `catalog_product_entity_decimal`.`row_id` as 'product_id', `catalog_product_entity_decimal`.`value` FROM catalog_product_entity_decimal LEFT JOIN eav_attribute ON catalog_product_entity_decimal.attribute_id = eav_attribute.attribute_id WHERE `store_id` <> '0' ORDER BY `row_id`;
SELECT `eav_attribute`.`attribute_code`, `catalog_product_entity_int`.`row_id` as 'product_id', `catalog_product_entity_int`.`value` FROM catalog_product_entity_int LEFT JOIN eav_attribute ON catalog_product_entity_int.attribute_id = eav_attribute.attribute_id WHERE `store_id` <> '0' ORDER BY `row_id`;
SELECT `eav_attribute`.`attribute_code`, `catalog_product_entity_text`.`row_id` as 'product_id', `catalog_product_entity_text`.`value` FROM catalog_product_entity_text LEFT JOIN eav_attribute ON catalog_product_entity_text.attribute_id = eav_attribute.attribute_id WHERE `store_id` <> '0' ORDER BY `row_id`;
SELECT `eav_attribute`.`attribute_code`, `catalog_product_entity_varchar`.`row_id` as 'product_id', `catalog_product_entity_varchar`.`value` FROM catalog_product_entity_varchar LEFT JOIN eav_attribute ON catalog_product_entity_varchar.attribute_id = eav_attribute.attribute_id WHERE `store_id` <> '0' ORDER BY `row_id`;
/*
* Start by reviewing the comment block at the top of the product_queries.sql file above.
*
* IMPORTANT: The queries below are written for Magento Enterprise. If you're going to run them on Magento Community, you need
* to replace all instances of ".row_id" with ".entity_id". See this for context: http://magento.stackexchange.com/questions/139740/magento-2-schema-changes-for-ee-catalog-staging
*
* While Magento 2 doesn't allow you to import categories directly, it will automatically create them during product import
* if you type up the category path. If you import products and specify a value in the store_view_code column, the categories
* created during that process will also have data set at the store view-specific scope.
*/
/*
* This first set of queries just shows you what data is duplicated at specific store view scopes. The results returned by these
* queries are what are deleted by the queries further down. You can use these queries to determine if you have data you need to clean up.
*/
-- catalog_category_entity_datetime
SELECT a.*
FROM (SELECT * FROM catalog_category_entity_datetime) AS a
-- This inner join finds all store view-specific rows that exactly match the global scope value. The b.store_id = 0 conditional is key as it targets the global row
INNER JOIN (SELECT * FROM catalog_category_entity_datetime) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id;
-- catalog_category_entity_decimal
SELECT a.*
FROM (SELECT * FROM catalog_category_entity_decimal) AS a
INNER JOIN (SELECT * FROM catalog_category_entity_decimal) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id;
-- catalog_category_entity_int
SELECT a.*
FROM (SELECT * FROM catalog_category_entity_int) AS a
INNER JOIN (SELECT * FROM catalog_category_entity_int) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id;
-- catalog_category_entity_text
SELECT a.*
FROM (SELECT * FROM catalog_category_entity_text) AS a
INNER JOIN (SELECT * FROM catalog_category_entity_text) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id;
-- catalog_category_entity_varchar
SELECT a.*
FROM (SELECT * FROM catalog_category_entity_varchar) AS a
INNER JOIN (SELECT * FROM catalog_category_entity_varchar) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id;
/*
* These queries handle deleting the values duplicated at the specific store view scopes
*/
DELETE FROM `catalog_category_entity_datetime`
WHERE `value_id` IN (
SELECT a.value_id AS value_ids
FROM (SELECT * FROM catalog_category_entity_datetime) AS a
INNER JOIN (SELECT * FROM catalog_category_entity_datetime) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id
);
DELETE FROM `catalog_category_entity_decimal`
WHERE `value_id` IN (
SELECT a.value_id AS value_ids
FROM (SELECT * FROM catalog_category_entity_decimal) AS a
INNER JOIN (SELECT * FROM catalog_category_entity_decimal) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id
);
DELETE FROM `catalog_category_entity_int`
WHERE `value_id` IN (
SELECT a.value_id AS value_ids
FROM (SELECT * FROM catalog_category_entity_int) AS a
INNER JOIN (SELECT * FROM catalog_category_entity_int) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id
);
DELETE FROM `catalog_category_entity_text`
WHERE `value_id` IN (
SELECT a.value_id AS value_ids
FROM (SELECT * FROM catalog_category_entity_text) AS a
INNER JOIN (SELECT * FROM catalog_category_entity_text) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id
);
DELETE FROM `catalog_category_entity_varchar`
WHERE `value_id` IN (
SELECT a.value_id AS value_ids
FROM (SELECT * FROM catalog_category_entity_varchar) AS a
INNER JOIN (SELECT * FROM catalog_category_entity_varchar) AS b ON a.attribute_id = b.attribute_id AND a.row_id = b.row_id AND a.value = b.value AND b.store_id = 0
WHERE a.store_id <> 0
GROUP BY a.attribute_id, a.row_id, a.value, a.store_id
);
/*
* Once you've run the queries above, you can find remaining store view-specific data using these queries.
* Results returned by these queries is store view-specfic data that is unique from the global data.
* You'll want to manually review these categories in the admin to ensure nothing is awry.
*/
SELECT `catalog_category_entity_datetime`.* FROM catalog_category_entity_datetime WHERE `store_id` <> '0';
SELECT `catalog_category_entity_decimal`.* FROM catalog_category_entity_decimal WHERE `store_id` <> '0';
SELECT `catalog_category_entity_int`.* FROM catalog_category_entity_int WHERE `store_id` <> '0';
SELECT `catalog_category_entity_text`.* FROM catalog_category_entity_text WHERE `store_id` <> '0';
SELECT `catalog_category_entity_varchar`.* FROM catalog_category_entity_varchar WHERE `store_id` <> '0';
@Koc
Copy link

Koc commented Aug 6, 2018

Thank you so much for clean documented sollution.

@acampos1916
Copy link

This is very useful, I recently used it in an M2.3.1 Commerce edition. A recent CSV import "duplicated" some products even though this was a Single Store instance and the CSV only had sku, qty, and product_online. What's even weirder is that not every product imported was affected, so no way of telling what's the issue.

@iosoft
Copy link

iosoft commented May 30, 2023

Thank you very very much for this awesome fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment