Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Script to clean up the images tables in a Magento installation. Removes references to non-existing images, removes duplicate images, sets correct default image and deletes orphaned images from the filesystem.
<?php
/*
* This script deletes duplicate images and imagerows from the database of which the images are not present in the filesystem.
* It also removes images that are exact copies of another image for the same product.
* And lastly, it looks for images that are on the filesystem but not in the database (orphaned images).
*
* This script can most likely be optimized but since it'll probably only be run a few times, I can't be bothered.
*
* Place scripts in a folder named 'scripts' (or similar) in the Magento root.
*
* Note: needs 'fdupes' lib to run cleanDuplicates function.
*
*/
chdir(dirname(__FILE__));
require_once '../app/Mage.php';
Mage::app();
$resource = Mage::getSingleton('core/resource');
$db = $resource->getConnection('core_write');
$eavAttribute = new Mage_Eav_Model_Mysql4_Entity_Attribute();
$thumbnailAttrId = $eavAttribute->getIdByCode('catalog_product', 'thumbnail');
$smallImageAttrId = $eavAttribute->getIdByCode('catalog_product', 'small_image');
$imageAttrId = $eavAttribute->getIdByCode('catalog_product', 'image');
$cleanUpDuplicates = false;
$countProductWithoutImages = false;
$cleanUpOrphans = false;
$cleanUpTableRowsMediaGallery = false;
$cleanUpTableRowsVarchar = false;
$setDefaultImageForProductsWithoutDefaultImage = false;
if($countProductWithoutImages) {
$result = $db->fetchAll('SELECT * FROM `' . $resource->getTableName('catalog_product_entity_media_gallery') . '` as mediagallery RIGHT OUTER JOIN ' . $resource->getTableName('catalog_product_entity') . ' as entitytable ON entitytable.entity_id = mediagallery.entity_id WHERE mediagallery.value is NULL');
echo count($result) . ' products without images' . "\n";
}
if($cleanUpDuplicates) {
$directory = Mage::getBaseDir('media') . DS . 'catalog' . DS . 'product' . DS; //. '/catalog/product/z/o/';
$output = shell_exec('find ' . $directory . ' -type d -exec fdupes -n {} \;'); // find duplicates
$before = substr(shell_exec('find ' . $directory . ' -type f | wc -l'),0,-1); // count files for difference calculation
$total = shell_exec('du -h ' . $directory); $total = explode("\n",$total); array_pop($total); $total = array_pop($total); $total = explode("\t",$total); $total = array_shift($total);
$totalBefore = $total;
$chunks = explode("\n\n",$output);
/* Run through duplicates and replace database rows */
foreach($chunks as $chunk) {
$files = explode("\n",$chunk);
$original = array_shift($files);
foreach($files as $file) {
// update database where filename=file set filename=original
$original = DS . implode(DS,array_slice(explode(DS,$original), -3));
$file = DS . implode(DS,array_slice(explode(DS,$file), -3));
$oldFileOnServer = Mage::getBaseDir('media') . DS . 'catalog' . DS . 'product' . $file;
$newFileOnServer = Mage::getBaseDir('media') . DS . 'catalog' . DS . 'product' . $original;
if(file_exists($newFileOnServer) && file_exists($oldFileOnServer)) {
$db->beginTransaction();
$resultVarchar = $db->update('catalog_product_entity_varchar', array('value'=>$original), $db->quoteInto('value =?',$file));
$db->commit();
$db->beginTransaction();
$resultGallery = $db->update('catalog_product_entity_media_gallery', array('value'=>$original), $db->quoteInto('value =?',$file));
$db->commit();
echo 'Replaced ' . $file . ' with ' . $original . ' (' . $resultVarchar . '/' . $resultGallery . ')' . "\n";
unlink($oldFileOnServer);
if(file_exists($oldFileOnServer)) {
die('File ' . $oldFileOnServer . ' not deleted; permissions issue?');
}
} else {
if(!file_exists($oldFileOnServer)) {
echo 'File ' . $oldFileOnServer . ' does not exist.' . "\n";
}
if(!file_exists($newFileOnServer)) {
echo 'File ' . $newFileOnServer . ' does not exist.' . "\n";
}
}
}
}
$after = substr(shell_exec('find ' . $directory . ' -type f | wc -l'),0,-1); // calculate difference
$total = shell_exec('du -h ' . $directory); $total = explode("\n",$total); array_pop($total); $total = array_pop($total); $total = explode("\t",$total); $total = array_shift($total);
$totalAfter = $total;
echo 'In directory ' . $directory . ' the script has deleted ' . ($before-$after) . ' files - went from ' . $totalBefore . ' to ' . $totalAfter . "\n";
}
if($cleanUpOrphans) {
/* Clean up orphaned images */
$dir = Mage::getBaseDir('media') . DS . 'catalog' . DS . 'product';
$files = glob($dir . DS . '[A-z0-9]' . DS . '[A-z0-9]' . DS . '*');
foreach($files as $file) {
if(!is_file($file)) continue;
$filename = DS . implode(DS,array_slice(explode(DS,$file),-3));
//echo $filename."\n";
$results = $db->fetchAll("SELECT * FROM " . $resource->getTableName('catalog_product_entity_media_gallery') . " WHERE value='".$filename."'");
if(count($results)==0) {
unlink($file);
echo 'Deleting orphaned image ' . $filename . "\n";
$deleted++;
}
$total++;
}
echo 'Deleted ' . $deleted . ' of total ' . $total;
}
if($cleanUpTableRowsMediaGallery) {
/* Clean up images from media gallery tables */
$images = $db->fetchAll("SELECT value,value_id FROM " . $resource->getTableName('catalog_product_entity_media_gallery'));
foreach($images as $image) {
if(!file_exists(Mage::getBaseDir('media') . DS . 'catalog' . DS . 'product' . $image['value'])) {
echo $image['value'] . ' does not exist; deleting.' . "\n";
$db->query("DELETE FROM " . $resource->getTableName('catalog_product_entity_media_gallery') . " WHERE value_id = ?",array($image['value_id']));
$db->query("DELETE FROM " . $resource->getTableName('catalog_product_entity_media_gallery_value') . " WHERE value_id = ?",array($image['value_id']));
}
}
}
if($cleanUpTableRowsVarchar) {
/* Clean up images from varchar table */
$images = $db->fetchAll("SELECT value,value_id FROM " . $resource->getTableName('catalog_product_entity_varchar') . " WHERE attribute_id = ? OR attribute_id = ? OR attribute_id = ?",array($thumbnailAttrId,$smallImageAttrId,$imageAttrId));
foreach($images as $image) {
if(!file_exists(Mage::getBaseDir('media') . DS . 'catalog' . DS . 'product' . $image['value'])) {
echo $image['value'] . ' does not exist; deleting.' . "\n";
$db->query("DELETE FROM " . $resource->getTableName('catalog_product_entity_varchar') . " WHERE value_id = ?",array($image['value_id']));
}
}
}
if($setDefaultImageForProductsWithoutDefaultImage) {
$products = $db->fetchAll('SELECT sku,entity_id FROM catalog_product_entity');
foreach($products as $product) {
$chooseDefaultImage = false;
$images = $db->fetchAll('select * from catalog_product_entity_varchar where `entity_id` = ? AND (`attribute_id` = ? OR `attribute_id` = ? OR `attribute_id` = ?)', array($product['entity_id'], $imageAttrId,$smallImageAttrId,$thumbnailAttrId));
if(count($images) == 0) {
$chooseDefaultImage = true;
} else {
foreach($images as $image) {
if($image['value']== 'no_selection') {
$chooseDefaultImage = true;
break;
}
}
}
if($chooseDefaultImage) {
$defaultImage = $db->fetchOne('SELECT value FROM catalog_product_entity_media_gallery WHERE entity_id = ? AND attribute_id = ? LIMIT 1', array($product['entity_id'],82));
if($defaultImage) {
$db->query('INSERT INTO catalog_product_entity_varchar SET entity_type_id = ?, attribute_id = ?, store_id = ?, entity_id = ?, value = ? ON DUPLICATE KEY UPDATE value = ?', array(4,$imageAttrId,0,$product['entity_id'],$defaultImage, $defaultImage));
$db->query('INSERT INTO catalog_product_entity_varchar SET entity_type_id = ?, attribute_id = ?, store_id = ?, entity_id = ?, value = ? ON DUPLICATE KEY UPDATE value = ?', array(4,$smallImageAttrId,0,$product['entity_id'],$defaultImage, $defaultImage));
$db->query('INSERT INTO catalog_product_entity_varchar SET entity_type_id = ?, attribute_id = ?, store_id = ?, entity_id = ?, value = ? ON DUPLICATE KEY UPDATE value = ?', array(4,$thumbnailAttrId,0,$product['entity_id'],$defaultImage, $defaultImage));
echo 'New default image has been set for ' . $product['sku'] . PHP_EOL;
}
}
}
}
@thomasklosinsky

This comment has been minimized.

Copy link

@thomasklosinsky thomasklosinsky commented Sep 2, 2014

Line 37.: change medigallery.value to mediagallery.value

@peterjaap

This comment has been minimized.

Copy link
Owner Author

@peterjaap peterjaap commented Sep 8, 2014

Thanks @thomasklosinsky, fixed.

@jeroenvermeulen

This comment has been minimized.

Copy link

@jeroenvermeulen jeroenvermeulen commented Nov 26, 2014

I was missing images on the frontend on 2 shops after using $cleanUpOrphans and
$cleanUpDuplicates, so these functions do not seem to be 100% safe. These 2 shops are the only ones I tested it on. Magento 1.9.0.2 and 1.7.0.2.

@nbijl

This comment has been minimized.

Copy link

@nbijl nbijl commented Mar 12, 2015

When the cleanUpduplicates is false, the whole script failes because $total is then undined on line 105

@nbijl

This comment has been minimized.

Copy link

@nbijl nbijl commented Mar 12, 2015

Could you also add a way to only count the duplicates and orphans, without removing the files?

@Shumani

This comment has been minimized.

Copy link

@Shumani Shumani commented Dec 8, 2015

where must i copy and paste it??

@grantkemp

This comment has been minimized.

Copy link

@grantkemp grantkemp commented Jan 6, 2016

This looks great. nice one

@grantkemp

This comment has been minimized.

Copy link

@grantkemp grantkemp commented Jan 6, 2016

@peterjaap why don't you delete the product image via the product object?
Isn't manual delete dangerous?

@rwgraphics

This comment has been minimized.

Copy link

@rwgraphics rwgraphics commented Oct 17, 2016

Sorry to need to ask but how do I do I check to see if fdupes lib is installed. When I run the script nothing happens. I only have a blank browser screen and no indication anything is happening. I checked in Magento Connect manager to see if fdupes was mentioned and did not see it there.

@ghost

This comment has been minimized.

Copy link

@ghost ghost commented Nov 10, 2016

That's simple Lukin! Run your Terminal and write fdupes.
If you get command not found...

@seansan

This comment has been minimized.

Copy link

@seansan seansan commented Sep 8, 2017

Does this still work? (anno 2017)

Also just tested on test - ahum with a symlinked media folder ---- the symlink does not return true ;P maybe something to add

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment