Skip to content

Instantly share code, notes, and snippets.

@tobiemh
Last active November 15, 2018 12:10
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save tobiemh/807e4604ab8f74ae8a6a to your computer and use it in GitHub Desktop.
Save tobiemh/807e4604ab8f74ae8a6a to your computer and use it in GitHub Desktop.
OrientDB Indexes

Summary

  • Issue 1: The less optimum index from the base class is used instead of the more optimum index on the inherited class, making it inefficient with many records.
  • Issue 2: #3624 The index used in the query is chosen based on the order in which it was created. In addition, it does not use multiple indexes to satisfy the query, making it inefficient with many records.
  • Issue 3: #3611 Using a fulltext index at the same time as another index does not work. Therefore you must use subqueries to limit the result, which is very inefficient when there are many records.
  • Issue 4: #3462 Doing a COUNT(*) in sql uses the index, but still reads every record, instead of just issuing the count on the index.
  • Issue 5: #3622 Doing a COUNT(fieldname) in sql does not use the index for that field, but reads all records.
  • Issue 6: #3623 Index is not used when filtering using multiple columns and OR

Setup

Simplified for example purposes...

# Create classes
CREATE CLASS OPV EXTENDS V
CREATE CLASS Account EXTENDS OPV
CREATE CLASS Person EXTENDS OPV
CREATE Class Tag EXTENDS OPV

# Create properties

CREATE PROPERTY OPV.account LINK Account
CREATE PROPERTY OPV.created_at DATETIME
CREATE PROPERTY OPV.updated_at DATETIME
CREATE PROPERTY OPV.deleted_at DATETIME
CREATE PROPERTY OPV.removed_at DATETIME

CREATE PROPERTY Person.firstname STRING
CREATE PROPERTY Person.lastname STRING

# Create indexes

CREATE INDEX OPV.account NOTUNIQUE_HASH_INDEX # No range queries
CREATE INDEX OPV.created_at NOTUNIQUE # May use range query
CREATE INDEX OPV.updated_at NOTUNIQUE METADATA { ignoreNullValues: false } # May be null and may use range query
CREATE INDEX OPV.deleted_at NOTUNIQUE METADATA { ignoreNullValues: false } # May be null and may use range query
CREATE INDEX OPV.removed_at NOTUNIQUE METADATA { ignoreNullValues: false } # May be null and may use range query
CREATE INDEX OPV.allowed ON OPV (account, deleted_at, removed_at) NOTUNIQUE METADATA { ignoreNullValues: false } # Dates may be null and may use range queries

CREATE INDEX Person.allowed ON Person (account, deleted_at, removed_at) NOTUNIQUE METADATA { ignoreNullValues: false } # Dates may be null and may use range queries
CREATE INDEX Person.firstname FULLTEXT ENGINE LUCENE
CREATE INDEX Person.lastname FULLTEXT ENGINE LUCENE

Index optimisation 1

Summary: The less optimum index from the base class is used instead of the more optimum index on the inherited class, making it inefficient with many records.

EXPLAIN SELECT * FROM Person WHERE removed_at IS NULL AND deleted_at IS NULL AND account=#15:106 LIMIT 100

This query uses the less optimum index for resolving the query. Instead of using Person.allowed index it uses the OPV.allowed index which includes both Person and Tag, and therefore must read more documents than necessary in order to detect whether the document is of the correct class. This gets very slow when there are many different classes and records extending the OPV class.

{
    "result": [
        {
            "compositeIndexUsed": 1,
            "involvedIndexes": [ "OPO.allowed" ],
            "evaluated": 100,
            "documentReads": 35387,
            "documentAnalyzedCompatibleClass": 100,
            "recordReads": 35387,
            "resultSize": 100,
        }
    ],
}

However if I remove the OPV.allowed index:

DROP INDEX OPV.allowed
EXPLAIN SELECT * FROM Person WHERE removed_at IS NULL AND deleted_at IS NULL AND account=#15:106 LIMIT 100

Then the query uses the Person.allowed index and does not have to read more records than the LIMIT, but this means that I am unable to use an index when querying the OPV class itself.

{
    "result": [
        {
            "compositeIndexUsed": 1,
            "involvedIndexes": [ "Person.allowed" ],
            "evaluated": 100,
            "documentReads": 100,
            "documentAnalyzedCompatibleClass": 100,
            "recordReads": 100,
            "resultSize": 100,
        }
    ],
}

Suggestions

  • Surely the index which is closer to the requested class (Person.allowed) should be used instead of the index on the base class (OPV.allowed)?
  • Perhaps it should be possible to include the @class column in an index, so that it does not have to read the records to determine which class each record is, and therefore only the OPV.allowed index would be necessary?

Index optimisation 2

Summary: The index used in the query is chosen based on the order in which it was created. In addition, it does not use multiple indexes to satisfy the query, making it inefficient with many records.

EXPLAIN SELECT * FROM Person WHERE removed_at IS NULL AND deleted_at IS NULL AND account=#15:106 LIMIT 100

Works correctly and uses the Person.allowed index.

{
    "result": [
        {
            "compositeIndexUsed": 1,
            "involvedIndexes": [ "Person.allowed" ],
            "evaluated": 100,
            "documentReads": 100,
            "documentAnalyzedCompatibleClass": 100,
            "recordReads": 100,
            "resultSize": 100,
        }
    ],
}

However if I change the query to search on another column:

EXPLAIN SELECT * FROM Person WHERE removed_at IS NULL AND deleted_at IS NULL AND account=#15:106 AND created_at > '2015-01-01' LIMIT 100

Then the query uses the OPV.account index, which is not the most optimum index at all.

If the OPV.account index is dropped and then recreated then the correct index is used which means that the index is being chosen based on the order in which was created which is surely wrong...?

{
    "result": [
        {
            "compositeIndexUsed": 1,
            "involvedIndexes": [ "OPV.account" ],
            "evaluated": 100,
            "documentReads": 3432,
            "documentAnalyzedCompatibleClass": 100,
            "recordReads": 3432,
            "resultSize": 100,
        }
    ],
}

However if I change the query to use brackets:

EXPLAIN SELECT * FROM Person WHERE (removed_at IS NULL AND deleted_at IS NULL AND account=#15:106) AND created_at > '2015-01-01' LIMIT 100

Then the query uses the Person.allowed index, but does not also make use of the OPV.created_at index, so therefore has to read through all records to determine which of the records are greater than 2015-01-01.

{
    "result": [
        {
            "compositeIndexUsed": 1,
            "involvedIndexes": [ "Person.allowed" ],
            "evaluated": 100,
            "documentReads": 3432,
            "documentAnalyzedCompatibleClass": 100,
            "recordReads": 3432,
            "resultSize": 100,
        }
    ],
}

Suggestions

  • Surely the query should use the most optimum index, not based on the order that the index was created?
  • Surely the query should use the most optimum index without the need for brackets?
  • Surely the query should use the Person.allowed index, in addition to the OPV.created_at index to optimise the query?

Index optimisation 3

Summary: Using a fulltext index at the same time as another index does not work. Therefore you must use subqueries to limit the result, which is very inefficient when there are many records.

EXPLAIN SELECT * FROM Person WHERE firstname LUCENE 'Tobie~ OR John~ OR Will~' LIMIT 100

Works correctly and uses the Person.firstname index.

{
    "result": [
        {
            "luceneIndex": true,
            "involvedIndexes": [ "Person.firstname" ],
            "evaluated": 100,
            "documentReads": 100,
            "documentAnalyzedCompatibleClass": 100,
            "recordReads": 100,
            "resultSize": 100,
        }
    ],
}
EXPLAIN SELECT * FROM Person WHERE (removed_at IS NULL AND deleted_at IS NULL AND account=#15:106) AND firstname LUCENE 'Tobie~ OR John~ OR Will~' LIMIT 100

Does not work and returns no records, but reads every record.

{
    "result": [
        {
            "compositeIndexUsed": 1,
            "involvedIndexes": [ "Person.allowed" ],
            "evaluated": 63967,
            "documentReads": 63967,
            "documentAnalyzedCompatibleClass": 63967,
            "recordReads": 63967,
            "resultSize": 0,
        }
    ],
}

However if I change the query to use a subquery:

EXPLAIN SELECT * FROM (
    SELECT * FROM Person WHERE firstname LUCENE 'Tobie~ OR John~ OR Will~'
) WHERE removed_at IS NULL AND deleted_at IS NULL AND account=#15:106 LIMIT 100

Then the query works and uses the Person.firstname lucene index, but because of it being a subquery, can not make use of any other indexes, forcing it to read all of the subquery records.

{
    "result": [
        {
            "luceneIndex": true,
            "evaluated": 3432,
            "documentReads": 3432,
            "recordReads": 3432,
            "resultSize": 100,
        }
    ],
}

Suggestions

  • Is it possible to filter a query with both fulltext and other filters, without using an inefficient subquery?

Index optimisation 4

Summary: Doing a COUNT in sql uses the index, but still reads every record, instead of just issuing the count on the index.

EXPLAIN SELECT COUNT(*) FROM Person WHERE removed_at IS NULL AND deleted_at IS NULL AND account=#15:106

Gives the following response...

{
    "result": [
        {
            "compositeIndexUsed": 1,
            "involvedIndexes": [ "Person.allowed" ],
            "evaluated": 35387,
            "documentReads": 35387,
            "documentAnalyzedCompatibleClass": 35387,
            "recordReads": 35387,
            "resultSize": 1,
        }
    ],
}

Suggestions

  • Does the query actually need to read all of the documents, or can it just count the total in the index?

Index optimisation 5

Summary: Doing a COUNT(fieldname) in sql does not use any index created on that field, but instead reads through every record.

EXPLAIN SELECT COUNT(name) FROM Person

Gives the following response...

{
    "result": [
        {
            "evaluated": 35387,
            "documentReads": 35387,
            "documentAnalyzedCompatibleClass": 35387,
            "recordReads": 35387,
            "resultSize": 1,
        }
    ],
}

Suggestions

  • Surely the index on the name field should be used to count the total number of different names?

Index optimisation 6

Summary: Index is not used when filtering using multiple columns and OR.

The following query correctly finds the best index to use...

EXPLAIN SELECT * FROM Person WHERE removed_at IS NULL AND deleted_at IS NULL AND account = #17:106 LIMIT 100
{
    "result": [
        {
            "compositeIndexUsed": 1,
            "involvedIndexes": [ "Person.allowed" ],
            "evaluated": 100,
            "documentReads": 100,
            "documentAnalyzedCompatibleClass": 100,
            "recordReads": 100,
            "resultSize": 100,
        }
    ],
}

But the following query does not use any index...

EXPLAIN SELECT * FROM Person WHERE removed_at IS NULL AND deleted_at IS NULL AND ( account = #17:106 OR account = #17:29 ) LIMIT 100
{
    "result": [
        {
            "evaluated": 35387,
            "documentReads": 35387,
            "documentAnalyzedCompatibleClass": 35387,
            "recordReads": 35387,
            "resultSize": 1,
        }
    ],
}

The following query uses the Person.account index which is CORRECT

EXPLAIN SELECT * FROM Person WHERE ( account = #17:106 OR account = #17:29 ) LIMIT 100

The following query uses the Person.allowed index which is CORRECT

EXPLAIN SELECT * FROM Person WHERE removed_at IS NULL AND deleted_at IS NULL AND account IN [#17:106,#17:29] LIMIT 100

Suggestions

  • Surely the correct index should be used regardless of whether an IN or OR is used?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment