Skip to content

Instantly share code, notes, and snippets.

@jraddaoui
Last active January 8, 2020 22:08
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jraddaoui/17f83962edb34dbc3c1407e4d83867de to your computer and use it in GitHub Desktop.
Save jraddaoui/17f83962edb34dbc3c1407e4d83867de to your computer and use it in GitHub Desktop.
CTE queries for AtoM hierarchy management

MODELS WITH NESTED SET:

  • Actor
  • Repository (inherited)
  • Function
  • InformationObject
  • Menu
  • PhysicalObject
  • Taxonomy
  • Term
  • AclGroup

USES OF NESTED SET:

  • Full-width treeview. It's loaded dinamically with lazzy load and "pagination". Current implementation will not benefit from this changes but using a siblings order column shouldn't be a limitation.
  • Get descendant event types on IO calculateDatesAction and arCalculateDescendantDatesJob.
  • Bad? check on default moveSuccess.
  • Bad? check on term autocompleteAction.
  • Check descendants to:
    • Expand old IO _treeView and treeViewSuccess nodes.
    • Add field to IO updatePublicationStatusSuccess.
    • Expand old term _treeView and treeViewSuccess nodes.
    • Know if a record is inside a collection on QubitOaiCollectionSet.
  • Get descendants on:
    • DO imageflowComponent.
    • Term deleteAction.
    • arGenerateReportJob may need order by LFT!!.
    • QubitOaiCollectionSet.
    • digitalObjectRegenDerivativesTask.
    • exportBulkBaseTask may need order by LFT!!.
    • sfSkosPlugin indexAction.
    • deleteDraftsTask.
    • unlinkCreatorTask.
    • arElasticSearchInformationObject.
    • arElasticSearchTermPdo (only the count).
    • QubitAcl.
    • sfEadPlugin indexSuccessBody.xml.
  • Update descendants on:
    • IO editAction (using BasePeer and criteria).
  • Get ancestors on:
    • IO creatorComponent.
    • IO creatorDetailComponent.
    • IO indexAction.
    • IO treeViewComponent.
    • IO _genreAccessPoints.
    • IO inventorySuccess.
    • IO itemOrFileListSuccess.
    • Object _placeAccessPoints.
    • Object _subjectAccessPoints.
    • Term treeviewComponent.
    • Term indexSuccess.
    • QubitHelper render_show_repository.
    • arGenerateReportJob.
    • QubitValidatorDates.
    • unlinkCreatorTask.
    • arDacsPlugin indexAction.
    • arDacsPlugin indexSuccess.
    • arElasticSearchInformationObjectPdo and get collection root.
    • ApiInformationObjectsReadAction.
    • QubitAcl.
    • aclGroup indexActorAclAction.
    • aclGroup indexInformationObjectAclAction.
    • aclGroup indexRepositoryAclAction.
    • aclGroup indexTermAclAction.
    • sfDcPlugin.
    • sfDcPlugin indexSuccess.
    • sfEadPlugin.
    • sfEadPlugin indexSuccessBodyDidElement.xml.
    • sfIsadPlugin indexAction.
    • sfIsadPlugin indexSuccess.
    • sfModsPlugin indexSuccess.
    • sfRadPlugin indexSuccess.
  • Used lft to sort on:
    • arInformationObjectCsvExportJob.
    • arInformationObjectXmlExportJob.
    • arObjectMoveJob.
    • QubitCsvTransform.
    • arElasticSearchActor.
    • arElasticSearchFunction.
    • arElasticSearchInformationObject.
    • arElasticSearchTerm
  • Schema declaration (YML and SQL).
  • Table maps.
  • Base models (lib/model/om):
    • Why there is less matches on function?
    • Can the old nested set methods be removed from QubitDigitalObject?
    • addOrderByPreorder only used to order ancestors and descendants.
    • Replace several helper functions by raw CTE queries.
  • Models (lib/model).
  • QubitObjectBuilder:
    • Change uses of nestedSetLeftColumn and nestedSetRightColumn.
    • Create a new parentColumn type and use it to add new hierarchy interaction methods to replace the existing ones.
    • Create a new orderColumn type and use to add new update siblings methods.
    • It may be needed to combine the two to add some of the methods.
  • Migrations:
    • How much old migrations depend on the models lft and rgt fields?
    • Removing them may require to change old migrations (e.g.: arUpgrader110:744).
    • Or force an upgrade to 2.5 before upgrading to 2.6.
  • Descendants count on plugins/sfSkosPlugin/test/unit/importTest.
  • Full nested set update on propelBuildNestedSetTask.
  • Added to ES index mapping for IOs (only lft).
  • Crazy JOIN on arElasticSearchActorPdo similar to the one Steve fixed.
  • Steve fix could probably be improved by using CTE on single IO updates. However, the current implementation is probably better for the full indexing.
  • Generate tags based on depth on EAD XML export. Use of CTE and level in this case.

OTHER THINGS TO CHECK:

  • Use of transaction in deleteDescriptionTask, which calls $io->deleteFullHierarchy().
  • Avoid nested set update until the end on deleteDraftsTask.
  • Avoid nested set update in other hierarchy deletions like recently done for IOs.
  • Need of nested set on ...?:
    • Actor
    • Repository
    • Function
    • PhysicalObject
    • Taxonomy

TREE VS LEVELED ORDER:

TREE:

  • Top-level
  • Child 1
  • Grandchild 1.1
  • Grandchild 1.2
  • Child 2
  • Grandchild 2.1
  • Grandchild 2.2

LEVELED:

  • Top-level
  • Child 1
  • Child 2
  • Grandchild 1.1
  • Grandchild 1.2
  • Grandchild 2.1
  • Grandchild 2.2
/*
GET FULL HIERARCHY:
- Use io1.id on the where to get itself + all descendants.
- Use io1.parent_id on the where to get only descendants.
- Generate level to order alongside lft (lft can be replaced by an order column).
- The current ordering by left to get a tree representation instead of
a leveled resulset can be complicated, but hopefully it's not needed.
- Current: 0,269 sec.
- New: 0,339 sec.
- New (without level): 0,301 sec.
*/
WITH RECURSIVE cte AS
(
SELECT io1.id, io1.parent_id, io1.lft, io1.rgt, 1 as lev FROM information_object io1 WHERE io1.id=581488
UNION ALL
SELECT io2.id, io2.parent_id, io2.lft, io2.rgt, cte.lev + 1 FROM information_object io2 JOIN cte ON cte.id=io2.parent_id
)
SELECT id, parent_id, lft, rgt FROM cte ORDER BY lev, lft;
/* NON CTE IMPROVEMENT (using only lft to avoid scaning for the rgt column) */
SELECT id, parent_id, lft, rgt FROM information_object WHERE lft >= 1393334 AND lft <= 1562511 ORDER BY lft;
/* CURRENT QUERY */
SELECT id, parent_id, lft, rgt FROM information_object WHERE lft >= 1393334 AND rgt <= 1562511 ORDER BY lft;
/*
GET FULL HIERARCHY:
- Use io1.id on the where to get itself + all descendants.
- Use io1.parent_id on the where to get only descendants.
- Generate level to reverse, for example for deletion.
- Same times than the not reversed.
*/
WITH RECURSIVE cte AS
(
SELECT io1.id, io1.parent_id, io1.lft, io1.rgt, 1 as lev FROM information_object io1 WHERE io1.id=581488
UNION ALL
SELECT io2.id, io2.parent_id, io2.lft, io2.rgt, cte.lev + 1 FROM information_object io2 JOIN cte ON cte.id=io2.parent_id
)
SELECT id, parent_id, lft, rgt FROM cte ORDER BY lev DESC;
/* CURRENT QUERY */
SELECT id, parent_id, lft, rgt FROM information_object WHERE lft >= 1393334 AND rgt <= 1562511 ORDER BY rgt;
/*
GET ANCESTORS:
- Use resource's id on the first where to get ancestors + ifself.
- Use resource's parent id on the first where to get only ancestors.
- Exclude ROOT resource on the second where.
- Generate level to reverse, for example for a breadcrumb representation.
- Current: 0,267 sec.
- New: 0,00053 sec.
*/
WITH RECURSIVE cte AS
(
SELECT io1.id, io1.parent_id, io1.lft, io1.rgt, 1 as lev FROM information_object io1 WHERE io1.id=2003417
UNION ALL
SELECT io2.id, io2.parent_id, io2.lft, io2.rgt, cte.lev + 1 FROM information_object io2 JOIN cte ON cte.parent_id=io2.id WHERE io2.id!=1
)
SELECT id, parent_id, lft, rgt FROM cte ORDER BY lev DESC;
/* CURRENT QUERY */
SELECT id, parent_id, lft, rgt FROM information_object WHERE lft < 1393556 AND rgt > 1393557 AND id!=1 ORDER BY lft;
/*
GET ANCESTORS ORM:
- Similar to the one above, using a subquery to ease ORM integration.
- Current: 0,268 sec.
- New: 0,00055 sec.
*/
SELECT `object` . `class_name` ,
`object` . `created_at` ,
`object` . `updated_at` ,
`object` . `id` ,
`object` . `serial_number` ,
`information_object` . `id` ,
`information_object` . `identifier` ,
`information_object` . `oai_local_identifier` ,
`information_object` . `level_of_description_id` ,
`information_object` . `collection_type_id` ,
`information_object` . `repository_id` ,
`information_object` . `parent_id` ,
`information_object` . `description_status_id` ,
`information_object` . `description_detail_id` ,
`information_object` . `description_identifier` ,
`information_object` . `source_standard` ,
`information_object` . `display_standard_id` ,
`information_object` . `lft` ,
`information_object` . `rgt` ,
`information_object` . `source_culture`
FROM `object` , `information_object`
WHERE `information_object` . `id` IN (
WITH RECURSIVE cte AS
(
SELECT io1.id, io1.parent_id FROM information_object io1 WHERE io1.id=771558
UNION ALL
SELECT io2.id, io2.parent_id FROM information_object io2 JOIN cte ON cte.parent_id=io2.id
)
SELECT id FROM cte
)
AND `information_object` . `id` = `object` . `id`
ORDER BY `information_object` . `lft` asc;
/* CURRENT QUERY */
SELECT `object` . `class_name` ,
`object` . `created_at` ,
`object` . `updated_at` ,
`object` . `id` ,
`object` . `serial_number` ,
`information_object` . `id` ,
`information_object` . `identifier` ,
`information_object` . `oai_local_identifier` ,
`information_object` . `level_of_description_id` ,
`information_object` . `collection_type_id` ,
`information_object` . `repository_id` ,
`information_object` . `parent_id` ,
`information_object` . `description_status_id` ,
`information_object` . `description_detail_id` ,
`information_object` . `description_identifier` ,
`information_object` . `source_standard` ,
`information_object` . `display_standard_id` ,
`information_object` . `lft` ,
`information_object` . `rgt` ,
`information_object` . `source_culture`
FROM `object` , `information_object`
WHERE `information_object` . `lft` <= 1393556
AND `information_object` . `rgt` >= 1393557
AND `information_object` . `id` = `object` . `id`
ORDER BY `information_object` . `lft` asc;
/*
GET DESCENDANTS EVENT TYPES:
- Current: 0,274 sec.
- New: 0,0032 sec.
*/
WITH RECURSIVE cte AS
(
SELECT io1.id FROM information_object io1 WHERE io1.parent_id=2951985
UNION ALL
SELECT io2.id FROM information_object io2 JOIN cte ON cte.id=io2.parent_id
)
SELECT DISTINCT e.type_id FROM cte INNER JOIN event e ON cte.id=e.object_id;
/* CURRENT QUERY */
SELECT
DISTINCT e.type_id
FROM
information_object i
INNER JOIN event e ON i.id=e.object_id
WHERE
i.lft > 161310
AND i.rgt < 162249;
/*
COUNT TERM DESCENDANTS:
- Current: 0,0011 sec.
- New (CTE): 0,0050 sec.
- New (non CTE): 0,00028 sec.
*/
WITH RECURSIVE cte AS
(
SELECT t1.id FROM term t1 WHERE t1.parent_id=110
UNION ALL
SELECT t2.id FROM term t2 INNER JOIN cte ON cte.id=t2.parent_id
)
SELECT COUNT(*) FROM cte;
/* NON CTE IMPROVEMENT (not even needed if you have the rgt and lft values) */
SELECT (rgt - lft - 1) DIV 2 FROM term WHERE id=110;
/* CURRENT QUERY */
SELECT COUNT(*) FROM term WHERE lft > 1 AND rgt < 6306;
<?php
$timer = new QubitTimer;
$count = 0;
$results = QubitPdo::fetchAll('SELECT id FROM information_object ORDER BY lft DESC');
foreach ($results as $result)
{
print("\n".++$count."\n");
$io = QubitInformationObject::getById($result->id);
foreach ($io->getAncestors()->andSelf()->orderBy('lft') as $ancestor)
{
print(" ".$ancestor->getTitle(array('cultureFallback' => true))."\n");
}
}
print("Total time: ".$timer->elapsed()." seconds.\n")
<?php
$timer = new QubitTimer;
$options = array(
CURLOPT_URL => 'http://nginx/index.php/informationobject/browse?levels=200&collection=30765&onlyMedia=1&topLod=0&limit=100',
CURLOPT_RETURNTRANSFER => true
);
$count = 0;
$responseTime = 0;
while ($count < 100)
{
print(++$count."\n");
$ch = curl_init();
curl_setopt_array($ch, $options);
curl_exec($ch);
$info = curl_getinfo($ch);
$responseTime += $info['total_time'];
curl_close($ch);
}
print("Total time: ".$timer->elapsed()." seconds.\n");
print("Avg. response time: ". $responseTime / 100 ." seconds.\n");
<?php
$timer = new QubitTimer;
$count = 0;
$results = QubitPdo::fetchAll('SELECT id FROM information_object ORDER BY lft DESC');
foreach ($results as $result)
{
$io = QubitInformationObject::getById($result->id);
print(++$count." -> ".count($io->descendants)."\n");
}
print("Total time: ".$timer->elapsed()." seconds.\n");
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment