Skip to content

Instantly share code, notes, and snippets.

@ecogswell
Created December 13, 2012 16:12
Show Gist options
  • Save ecogswell/4277521 to your computer and use it in GitHub Desktop.
Save ecogswell/4277521 to your computer and use it in GitHub Desktop.
Dealing With Large Collections in Magento
- Collections are a good thing
- make deailing with multiple data structure types easy: Standard and EAV
- Lazy loading keep database traffic to a minimum
- Collections have some shortcomings
- When dealing with a large number of customers, getting EAV data can be difficult
The situation:
- Database with 30,000 Customers
- We want to get EAV data for those customers. In this case we will go for first and last name
[code]
// we want to get first and last name from the eav structure
$collection = Mage::getModel("customer/customer")->getCollection();
$collection->addAttributeToSelect("*");
$collection
->joinAttribute('firstname', 'customer/firstname', 'entity_id', null, 'left')
->joinAttribute('lastname', 'customer/lastname', 'entity_id', null, 'left');
// we also want some other data for these customers
$collection->getSelect()
->joinLeft(array("clinic_link"=>$clinicLinkTable),"e.entity_id = clinic_link.customer_id",array())
->joinLeft(array("clinic"=>$clinicTable),"clinic_link.clinic_id = clinic.id",array())
->joinLeft(array("distributor_link"=>$distributorLinkTable),"e.entity_id = distributor_link.customer_id",array())
->joinLeft(array("distributor"=>$distributorTable),"distributor_link.distributor_id = distributor.id",array())
->joinLeft(array("employer_link"=>$employerLinkTable),"e.entity_id = employer_link.customer_id",array())
->joinLeft(array("employer"=>$employerTable),"employer_link.employer_id = employer.id",array())
->joinLeft(array('newsletter'=>$newsletterTable), "e.entity_id = newsletter.customer_id and newsletter.subscriber_status = 1", array());
// here we are in an admin reporting grid, so we will set the collection and return as usual
$this->setCollection($collection);
return parent::_prepareCollection();
[/code]
This approach will work fine for paginated admin report grids where
The problem arises when we attempt to export all data, without the paging limit/offset params.
Here comes the interesting part.
If you add a breakpoint before jsut before the return in the code example above, you can inspect the collection object.
[code]
$collection->getSelect()->__toString();
[/code]
will produce the following:
[code]
SELECT
`e`.*, `at_firstname`.`value` AS `firstname`,
`at_lastname`.`value` AS `lastname`,
`at_gender`.`value` AS `gender`,
`at_dob`.`value` AS `dob`,
`at_owned_dog_count`.`value` AS `owned_dog_count`,
`at_owned_cat_count`.`value` AS `owned_cat_count`,
`clinic`.`name` AS `clinic_name`,
`employer`.`name` AS `employer_name`,
`distributor`.`name` AS `distributor_name`,
IF(newsletter.subscriber_id,1,0) AS `email_opt_in`, `signup_answer`.`answer` AS `title`
FROM `customer_entity` AS `e`
LEFT JOIN `customer_entity_varchar` AS `at_firstname`
ON (`at_firstname`.`entity_id` = `e`.`entity_id`) AND (`at_firstname`.`attribute_id` = '5')
LEFT JOIN `customer_entity_varchar` AS `at_lastname`
ON (`at_lastname`.`entity_id` = `e`.`entity_id`) AND (`at_lastname`.`attribute_id` = '7')
LEFT JOIN `customer_entity_int` AS `at_gender`
ON (`at_gender`.`entity_id` = `e`.`entity_id`) AND (`at_gender`.`attribute_id` = '18')
LEFT JOIN `customer_entity_datetime` AS `at_dob`
ON (`at_dob`.`entity_id` = `e`.`entity_id`) AND (`at_dob`.`attribute_id` = '11')
LEFT JOIN `customer_entity_int` AS `at_owned_dog_count`
ON (`at_owned_dog_count`.`entity_id` = `e`.`entity_id`) AND (`at_owned_dog_count`.`attribute_id` = '175')
LEFT JOIN `customer_entity_int` AS `at_owned_cat_count`
ON (`at_owned_cat_count`.`entity_id` = `e`.`entity_id`) AND (`at_owned_cat_count`.`attribute_id` = '176')
LEFT JOIN `purinapro_groups_clinic_link` AS `clinic_link`
ON e.entity_id = clinic_link.customer_id
LEFT JOIN `purinapro_groups_clinic` AS `clinic`
ON clinic_link.clinic_id = clinic.id
LEFT JOIN `purinapro_groups_distributor_link` AS `distributor_link`
ON e.entity_id = distributor_link.customer_id
LEFT JOIN `purinapro_groups_distributor` AS `distributor`
ON distributor_link.distributor_id = distributor.id
LEFT JOIN `purinapro_groups_employer_link` AS `employer_link`
ON e.entity_id = employer_link.customer_id
LEFT JOIN `purinapro_groups_employer` AS `employer`
ON employer_link.employer_id = employer.id
LEFT JOIN `newsletter_subscriber` AS `newsletter`
ON e.entity_id = newsletter.customer_id and newsletter.subscriber_status = 1
LEFT JOIN `purinapro_registration_signup` AS `signup`
ON e.entity_id = signup.customer_id
LEFT JOIN `purinapro_registration_signup_answer` AS `signup_answer`
ON signup.id = signup_answer.signup_id and signup_answer.question_code='title'
WHERE
(`e`.`entity_type_id` = '1')
[/code]
So, when you load that grid, you would expect magento to run that query and return the results, right?
Well, that's not what happens. Instead, Magento does something else.
Magento starts by getting a list of all customer ids returned by the current where clause, then uses that list to get each of the specified EAV attributes.
You end up with something like this instead.
[code]
SELECT
`customer_entity_varchar`.`entity_id`,
`customer_entity_varchar`.`attribute_id`,
`customer_entity_varchar`.`value`
FROM
`customer_entity_varchar`
WHERE
(entity_type_id =1) AND
(entity_id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)) AND
(attribute_id IN ('16', '3', '6', '12', '4', '223', '194', '179', '8', '15'))
UNION ALL
SELECT
`customer_entity_int`.`entity_id`,
`customer_entity_int`.`attribute_id`,
`customer_entity_int`.`value`
FROM
`customer_entity_int`
WHERE
(entity_type_id =1) AND
(entity_id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)) AND
(attribute_id IN ('13', '14', '140', '141'))
UNION ALL
SELECT
`customer_entity_datetime`.`entity_id`,
`customer_entity_datetime`.`attribute_id`,
`customer_entity_datetime`.`value`
FROM
`customer_entity_datetime`
WHERE
(entity_type_id =1) AND
(entity_id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)) AND
(attribute_id IN ('180', '197', '198'))
[/code]
As mentioned before, this works fine for paginated admin grids where you only deal with 20 or so customer records at a time.
However, if you attempt to load data for all 30,000 customers using this method by removing the page value and thus removing the limit, those IN clauses will each contain all 30,000 customer IDs and will likely choke your database server or cause a timout out on the site while waiting for a return.
My solution was to make the export run the way I thought it should have to begin with. Simply execute the SQL generated by the collection rather than running separate queries for the EAV data and the rest of the standard data.
[code]
public function getCsv()
{
$csv = '';
$this->_isExport = true;
$this->_prepareGrid();
// the collection has already been prepared and set. Grab it and get the query string.
$sql = (string)$this->getCollection()->getSelect()->__toString();
$db = Mage::getResourceSingleton('core/resource')->getReadConnection();
$result = $db->query($sql);
// the rest of this code is copied from Mage_Adminhtml_Block_Widget_Grid->getCsv
// and slightly modified to deal with this specific collection
$data = array();
foreach ($this->_columns as $column) {
if (!$column->getIsSystem()) {
$data[] = '"'.$column->getExportHeader().'"';
}
}
$csv.= implode(',', $data)."\n";
foreach($result->fetchAll(PDO::FETCH_ASSOC) as $item) {
$obj = new Mage_Customer_Model_Customer();
$obj->setData($item);
$data = array();
foreach ($this->_columns as $column) {
try {
if (!$column->getIsSystem()) {
try {
$blah = $column->getRowFieldExport($obj);
$data[] = '"' . str_replace(array('"', '\\'), array('""', '\\\\'), $blah) . '"';
}
catch(Exception $e) {
echo $e->getMessage();
}
}
}
catch(Exception $e) {
echo $e->getMessage();
}
}
$csv.= implode(',', $data)."\n";
}
if ($this->getCountTotals())
{
$data = array();
foreach ($this->_columns as $column) {
if (!$column->getIsSystem()) {
$data[] = '"' . str_replace(array('"', '\\'), array('""', '\\\\'),
$column->getRowFieldExport($this->getTotals())) . '"';
}
}
$csv.= implode(',', $data)."\n";
}
return $csv;
}
[/code]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment