Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 4 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save schumachera/84c0b18fc970a32beb01fedafd0d39cf to your computer and use it in GitHub Desktop.
Save schumachera/84c0b18fc970a32beb01fedafd0d39cf to your computer and use it in GitHub Desktop.
Speed up catalog_category_product full reindexing in Magento 1 EE

Speed up Magento EE 1.x catalog_category_product indexer

For refreshing the catalog_category_product index Magento EE generates SQL queries with joins over many tables to determine which product relates to what category and may show up in what category.

The queries fired by the indexer will look something like this:

INSERT IGNORE INTO `catalog_category_product_index_tmp` (`category_id`, `product_id`, `position`, `is_parent`, `store_id`, `visibility`) SELECT `cc`.`entity_id` AS `category_id`, `ccp`.`product_id`, ccp.position + 10000 AS `position`, 0 AS `is_parent`, 20 AS `store_id`, IFNULL(cpvs.value, cpvd.value) AS `visibility` FROM `catalog_category_entity` AS `cc`
 INNER JOIN `catalog_category_entity` AS `cc2` ON cc2.path LIKE CONCAT(`cc`.`path`, '/%') AND cc.entity_id NOT IN (1)
 INNER JOIN `catalog_category_product` AS `ccp` ON ccp.category_id = cc2.entity_id
 INNER JOIN `catalog_product_website` AS `cpw` ON cpw.product_id = ccp.product_id
 INNER JOIN `catalog_product_entity_int` AS `cpsd` ON cpsd.entity_id = ccp.product_id AND cpsd.store_id = 0 AND cpsd.attribute_id = 84
 LEFT JOIN `catalog_product_entity_int` AS `cpss` ON cpss.entity_id = ccp.product_id AND cpss.attribute_id = cpsd.attribute_id AND cpss.store_id = 20
 INNER JOIN `catalog_product_entity_int` AS `cpvd` ON cpvd.entity_id = ccp.product_id AND cpvd.store_id = 0 AND cpvd.attribute_id = 89
 LEFT JOIN `catalog_product_entity_int` AS `cpvs` ON cpvs.entity_id = ccp.product_id AND cpvs.attribute_id = cpvd.attribute_id AND cpvs.store_id = 20
 INNER JOIN `catalog_category_entity_int` AS `ccad` ON ccad.entity_id = cc.entity_id AND ccad.store_id = 0 AND ccad.attribute_id = 43
 LEFT JOIN `catalog_category_entity_int` AS `ccas` ON ccas.entity_id = cc.entity_id AND ccas.attribute_id = ccad.attribute_id AND ccas.store_id = 20 WHERE (cpw.website_id = '11') AND (IFNULL(cpss.value, cpsd.value) = 1) AND (IFNULL(cpvs.value, cpvd.value) IN (4, 2)) AND (IFNULL(ccas.value, ccad.value) = 1) AND (`cc`.`entity_id` >= '1') AND (`cc`.`entity_id` < 1001)

The more categories, products, root categories and store views you have the longer the execution time of that query will be. For 4 root categories with 500 categories each, more than 5000 products and 25 store views the full reindex took 3 days.

As you can see at the end of the query Magento does not select all categories at once but in steps of 1000. This step size is declared in the class constant Enterprise_Catalog_Model_Index_Action_Catalog_Category_Product_Refresh::RANGE_CATEGORY_STEP.

Try to capture one of that queries while your catalog_category_product indexer is running. Then run the select part of the query with smaller ranges, e. g. 5, 10, 50, 100, 200 and so on and write down the execution time. Multiply the time by the runs that are necessary to process all category IDs.

The step size with the least duration is the one which fits best for your Magento installation. Change the constant to that value.

For our purposes a step size of 100 was the best choice. So we were able to reduce the duration for catalog_category_product reindexing from 3 days to 40 minutes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment