Skip to content

Instantly share code, notes, and snippets.

@reinink

reinink/query.sql

Last active Apr 21, 2021
Embed
What would you like to do?
Text search across multiple tables using MySQL
select
first_name,
last_name
from
users
left join
companies on companies.id = users.company_id
where (
companies.name like 'TERM%' or
first_name like 'TERM%' or
last_name like 'TERM%'
)
Companies
+------------+-----------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-----------------+------+-----+---------+----------------+
| id | bigint unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | MUL | NULL | |
+------------+-----------------+------+-----+---------+----------------+
+-----------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| companies | 0 | PRIMARY | 1 | id | A | 10106 | NULL | NULL | | BTREE | | | YES | NULL |
| companies | 1 | companies_name_index | 1 | name | A | 8624 | NULL | NULL | | BTREE | | | YES | NULL |
+-----------+------------+----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
Users
+------------+-----------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-----------------+------+-----+---------+----------------+
| id | bigint unsigned | NO | PRI | NULL | auto_increment |
| company_id | bigint unsigned | NO | MUL | NULL | |
| first_name | varchar(255) | NO | MUL | NULL | |
| last_name | varchar(255) | NO | MUL | NULL | |
+------------+-----------------+------+-----+---------+----------------+
+-------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| users | 0 | PRIMARY | 1 | id | A | 0 | NULL | NULL | | BTREE | | | YES | NULL |
| users | 1 | users_company_id_foreign | 1 | company_id | A | 0 | NULL | NULL | | BTREE | | | YES | NULL |
| users | 1 | users_first_name_index | 1 | first_name | A | 0 | NULL | NULL | | BTREE | | | YES | NULL |
| users | 1 | users_last_name_index | 1 | last_name | A | 0 | NULL | NULL | | BTREE | | | YES | NULL |
+-------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
@reinink

This comment has been minimized.

Copy link
Owner Author

@reinink reinink commented Apr 24, 2020

So my goal here is to be able to simple text search across tables (users and companies) in the fastest way possible. However, I am running into issues with my user table indexes (users_first_name_index and users_last_name_index) not being used when I filter against the companies table (companies.name like 'TERM%').

I have a test database with 30,000 users and 10,000 companies (3 users per company). Here are the results I'm getting:

-- 20ms
select first_name, last_name
from users
left join companies on companies.id = users.company_id
where (companies.name like 'TERM%' or first_name like 'TERM%' or last_name like 'TERM%')

The reason it's slow is because no indexes are being used on the users table, as you can see from an explain:

| id | select_type | table     | partitions | type   | possible_keys                                | key     | key_len | ref                                                  | rows  | filtered | Extra       |
|----|-------------|-----------|------------|--------|----------------------------------------------|---------|---------|------------------------------------------------------|-------|----------|-------------|
|  1 | SIMPLE      | users     |            | ALL    | users_first_name_index,users_last_name_index |         |         |                                                      | 29997 |   100.00 |             |
|  1 | SIMPLE      | companies |            | eq_ref | PRIMARY                                      | PRIMARY | 8       | making-the-most-of-string-searching.users.company_id |     1 |   100.00 | Using where |

Removing the company filter speeds things up drastically:

-- <1ms
select first_name, last_name
from users
left join companies on companies.id = users.company_id
where (first_name like 'TERM%' or last_name like 'TERM%')

That makes sense, because it's now using the indexes on the users table:

| id | select_type | table     | partitions | type        | possible_keys                                | key                                          | key_len   | ref                                                  | rows | filtered | Extra                                                                       |
|----|-------------|-----------|------------|-------------|----------------------------------------------|----------------------------------------------|-----------|------------------------------------------------------|------|----------|-----------------------------------------------------------------------------|
|  1 | SIMPLE      | users     |            | index_merge | users_first_name_index,users_last_name_index | users_first_name_index,users_last_name_index | 1022,1022 |                                                      |   47 |   100.00 | Using sort_union(users_first_name_index,users_last_name_index); Using where |
|  1 | SIMPLE      | companies |            | eq_ref      | PRIMARY                                      | PRIMARY                                      | 8         | making-the-most-of-string-searching.users.company_id |    1 |   100.00 | Using index                                                                 |

However, it's interesting, because if I run just the companies query, it's very fast:

-- <1ms
select id from companies where name like 'TERM%'

Meaning, if I run this in two queries in my app, the first to get the matching company ids, and the second getting the users from those companies, I actually get the performance I want:

-- <1ms
select first_name, last_name
from users
left join companies on companies.id = users.company_id
where (company_id in (4,982) or first_name like 'TERM%' or last_name like 'TERM%')

When I do an explain now, you can see that it is using the users table indexes (including the users_company_id_foreign index):

| id | select_type | table     | partitions | type        | possible_keys                                                         | key                                                                   | key_len     | ref                                                  | rows | filtered | Extra                                                                                                |
|----|-------------|-----------|------------|-------------|-----------------------------------------------------------------------|-----------------------------------------------------------------------|-------------|------------------------------------------------------|------|----------|------------------------------------------------------------------------------------------------------|
|  1 | SIMPLE      | users     |            | index_merge | users_company_id_foreign,users_first_name_index,users_last_name_index | users_company_id_foreign,users_first_name_index,users_last_name_index | 8,1022,1022 |                                                      |   53 |   100.00 | Using sort_union(users_company_id_foreign,users_first_name_index,users_last_name_index); Using where |
|  1 | SIMPLE      | companies |            | eq_ref      | PRIMARY                                                               | PRIMARY                                                               | 8           | making-the-most-of-string-searching.users.company_id |    1 |   100.00 | Using index                                                                                          |

This all makes me wonder—do I really need to run this as two separate queries to get the performance I want (less than 1ms)? That seems really odd. I even tried doing this as a subquery:

-- 12ms
select first_name, last_name
from users
where (
	company_id in (select id from companies where name like 'Armstrong%') or
	first_name like 'Armstrong%' or
	last_name like 'Armstrong%'
)

This helps, but not because the users table indexes are used, but rather because MySQL is smart enough to just run the subquery once.

Am I missing something here that would allow me to run this as a single query?

@beryllium

This comment has been minimized.

Copy link

@beryllium beryllium commented Apr 24, 2020

Perhaps one of FORCE INDEX or USE INDEX would help? https://stackoverflow.com/questions/20797475/mysql-force-index-vs-use-index

@beryllium

This comment has been minimized.

Copy link

@beryllium beryllium commented Apr 24, 2020

Also, have you tried putting companies.name like 'TERM%' on the join instead of the where, and then using something like IS NOT NULL in the where?

select
  first_name,
  last_name
from
  users
left join
  companies on companies.id = users.company_id AND companies.name like 'TERM%' 
where (
    companies.name IS NOT NULL or
    first_name like 'TERM%' or
    last_name like 'TERM%'
)
@reinink

This comment has been minimized.

Copy link
Owner Author

@reinink reinink commented Apr 24, 2020

@beryllium Hey thanks for jumping in.

Trying the FORCE INDEX approach doesn't seem to help. 😕

And yes, I definitely tried putting the company search in the join statement. It helps a little, but not much. That's because it still doesn't use the users table index:

| id | select_type | table     | partitions | type   | possible_keys | key     | key_len | ref                                                  | rows  | filtered | Extra       |
|----|-------------|-----------|------------|--------|---------------|---------|---------|------------------------------------------------------|-------|----------|-------------|
|  1 | SIMPLE      | users     |            | ALL    |               |         |         |                                                      | 29750 |    20.99 | Using where |
|  1 | SIMPLE      | companies |            | eq_ref | PRIMARY       | PRIMARY | 8       | making-the-most-of-string-searching.users.company_id |     1 |   100.00 | Using where |
@shawnhooper

This comment has been minimized.

Copy link

@shawnhooper shawnhooper commented Apr 24, 2020

What about... (not tested)

SELECT a.first_name, a.last_name 
FROM users a
WHERE a.first_name like 'TERM%' or a.last_name like 'TERM%'
UNION
SELECT a.first_name, a.last_name
FROM users a
LEFT JOIN companies b ON a.company_id = b.id 
WHERE a.name like 'TERM%' 
@abhishekbhardwaj

This comment has been minimized.

Copy link

@abhishekbhardwaj abhishekbhardwaj commented Apr 24, 2020

@reinink (Continued from Twitter) here's what I was thinking for Union:

SELECT *
FROM users
WHERE (
    first_name LIKE 'TERM%' or
    last_name LIKE 'TERM%'
)

UNION

SELECT *
FROM companies
WHERE companies.name like 'TERM%'
@reinink

This comment has been minimized.

Copy link
Owner Author

@reinink reinink commented Apr 24, 2020

@shawnhooper @abhishekbhardwaj Hah! You both posted basically the same suggestion at the same time. VERY interesting. Let me see if I can work with this idea.

@abhishekbhardwaj

This comment has been minimized.

Copy link

@abhishekbhardwaj abhishekbhardwaj commented Apr 24, 2020

If this doesn't work, you might also want to try adding full-text indexes to the VARCHAR's and doing the matching using MATCH and AGAINST vs LIKE and see what the difference is.

@reinink

This comment has been minimized.

Copy link
Owner Author

@reinink reinink commented Apr 24, 2020

@abhishekbhardwaj My goal with this was to not use full-text and just do this with like...to see if it's possible to do fast.

Not only that, isn't full-text better suited for large pieces of text (like documents)? How well would it really work for names?

@abhishekbhardwaj

This comment has been minimized.

Copy link

@abhishekbhardwaj abhishekbhardwaj commented Apr 24, 2020

@reinink It works with any character field but I'm not sure of the performance implications for a Varchar. Just an idea to try and see if the performance really changes.

@willvincent

This comment has been minimized.

Copy link

@willvincent willvincent commented Apr 24, 2020

full text should work just fine against short strings (names, titles, etc) but it probably won't be faster than a like query.

@beryllium

This comment has been minimized.

Copy link

@beryllium beryllium commented Apr 24, 2020

I thought that full text would drop certain queries and return "no results" if the query matched a large percentage of overall records, among other quirks? Depending on the use cases, if my understanding is correct, it might be undesirable.

@gruzker

This comment has been minimized.

Copy link

@gruzker gruzker commented Apr 24, 2020

Hi, I don't know it is relevant or not , but I have a similar issue with using indexing from 2 different tables together, Let me explain with images.

before
before_query
after
after_query

I'm using index on both 'detail.id' column and 'payments.type' column but when using 'where exists' query it is not using the index of the 'detail.id' column and the query took 13.25 seconds and when using 'where in' query it is using the index and the query took 1.52ms.

@davedevelopment

This comment has been minimized.

Copy link

@davedevelopment davedevelopment commented Apr 24, 2020

Not a lot to offer. As soon as I see performance problems with a query like that, my gut tells me I'm really doing two queries and want a union, rather than smash all the data together and query in one.

One thing to note, in the query.sql snippet you use an INNER JOIN, then you use LEFT JOIN in your comment. That will make a difference to the query planner.

@reinink

This comment has been minimized.

Copy link
Owner Author

@reinink reinink commented Apr 24, 2020

@davedevelopment Totally. I think I'm going to go with two queries, since that tends to work better in Laravel. Doing unions on Eloquent models is a total nightmare.

And yes, sorry, they should all be left joins. I'll correct that.

@tomershay

This comment has been minimized.

Copy link

@tomershay tomershay commented Apr 24, 2020

MySQL is usually having a hard time optimizing such OR conditions, and in many cases it will not be able to use the correct indexes.
Therefore, I recommend modifying the query to include one condition per UNION, which will let MySQL use the relevant indexes for each of the conditions in the WHERE clause.

I tested this query and its execution duration is 1-3ms, compared to 500ms on my local MySQL.
Hope this is helpful and that I didn't miss anything.

SELECT
  users_first_name,
  users_last_name
FROM
  (
    (
      SELECT
        users.first_name AS users_first_name,
        users.last_name AS users_last_name
      FROM
        users
        INNER JOIN companies ON companies.id = users.company_id
      WHERE
        (
          users.last_name LIKE 'TERM%'
        )
    )
    UNION
      DISTINCT (
        SELECT
          users.first_name AS users_first_name,
          users.last_name AS users_last_name
        FROM
          users
          INNER JOIN companies ON companies.id = users.company_id
        WHERE
          (users.first_name LIKE 'TERM%')
      )
    UNION
      DISTINCT (
        SELECT
          users.first_name AS users_first_name,
          users.last_name AS users_last_name
        FROM
          users
          INNER JOIN companies ON companies.id = users.company_id
        WHERE
          (companies.name LIKE 'TERM%')
      )
  ) AS union1;
@reinink

This comment has been minimized.

Copy link
Owner Author

@reinink reinink commented Apr 24, 2020

@tomershay Awesome, thanks for sharing!

@reinink

This comment has been minimized.

Copy link
Owner Author

@reinink reinink commented Apr 24, 2020

So, thanks to everyone for helping with this problem. It blows my mind how hard this has been.

As many of you suggested, the UNION approach is very fast. The problem, however, with the UNION approach is:

  1. It's very difficult to convert to the Laravel Eloquent ORM (a priority for me).
  2. It only handles one term well. What I haven't told you this whole time is that I actually want to run a query like this:
    WHERE
        (first_name LIKE 'TERM%' OR user_name LIKE 'TERM%' OR company_name LIKE 'TERM%')
        AND
        (first_name LIKE 'TERM%' OR user_name LIKE 'TERM%' OR company_name LIKE 'TERM%')

I almost thought that @Cacobot had the solution, but after some testing, it has some really weird side affects where every SELECT column and ORDER BY added REALLY slowed it down.

As of right now, the best possible solution I can come up with is running a separate query to get the companies for each term, before running the main query. Here is what this looks like in Laravel:

<?php

class User
{
    public function scopeSearch($query, $terms)
    {
        Str::of($terms)->explode(' ')->filter()->each(function ($term) use ($query) {
            $query->where(function ($query) use ($term) {
                $query->where('email', $term)
                    ->orWhere('last_name', 'like', $term.'%')
                    ->orWhere('first_name', 'like', $term.'%')
                    ->orWhereIn('company_id', Company::query()
                        ->where('name', 'like', $term.'%')
                        ->pluck('id')
                    );
            });
        });
    }
}

$users = User::query()
    ->search(Request::input('search'))
    ->orderBy('last_name')
    ->orderBy('first_name')
    ->paginate();

It seems insane to introduce an N+1 intentionally, but I don't really have a better solution, and the results are actually really quite good. Here are the queries that are generated:

-- 2.09ms
select `id` from `companies` where `name` like 'Abbie%'

-- 480μs
select `id` from `companies` where `name` like 'Armstrong%'

-- 1.2ms (total for pagination)
select count(*) as aggregate from `users` where (`email` = 'Abbie' or `last_name` like 'Abbie%' or `first_name` like 'Abbie%' or 0 = 1) and (`email` = 'Armstrong' or `last_name` like 'Armstrong%' or `first_name` like 'Armstrong%' or `company_id` in (4211, 8674, 11838, 20590, 23361, 28869, 30739, 34837, 38305, 47335, 47424, 14761, 17342, 26008, 40535, 48473, 3691, 5159, 6059, 43551, 48434, 4629, 10168, 34613, 40971, 3342, 5030, 8895, 9359, 12397, 34029, 34248, 25601, 1561, 26392, 41562, 41496, 31682, 41638, 37991, 17725, 18232, 921, 32410, 11391, 46729, 40239, 11270, 21442, 46383, 24760, 40129, 22665, 39175, 33414, 18880, 11332, 11338, 6350, 42984, 2207, 46783, 26166, 687, 15758, 22790, 48446, 26709, 5266, 38082, 47310, 42517, 23894, 1552, 379, 29012, 11862, 43489, 9044, 25308, 1657, 14817, 29930, 14133, 27916, 29459, 10514, 6312, 9031, 20409, 5752, 33189, 4995, 28969, 7619, 38919, 49738, 8304, 39868, 46731, 10628, 34058, 48516, 32631, 10784))

-- 1.29ms
select * from `users` where (`email` = 'Abbie' or `last_name` like 'Abbie%' or `first_name` like 'Abbie%' or 0 = 1) and (`email` = 'Armstrong' or `last_name` like 'Armstrong%' or `first_name` like 'Armstrong%' or `company_id` in (4211, 8674, 11838, 20590, 23361, 28869, 30739, 34837, 38305, 47335, 47424, 14761, 17342, 26008, 40535, 48473, 3691, 5159, 6059, 43551, 48434, 4629, 10168, 34613, 40971, 3342, 5030, 8895, 9359, 12397, 34029, 34248, 25601, 1561, 26392, 41562, 41496, 31682, 41638, 37991, 17725, 18232, 921, 32410, 11391, 46729, 40239, 11270, 21442, 46383, 24760, 40129, 22665, 39175, 33414, 18880, 11332, 11338, 6350, 42984, 2207, 46783, 26166, 687, 15758, 22790, 48446, 26709, 5266, 38082, 47310, 42517, 23894, 1552, 379, 29012, 11862, 43489, 9044, 25308, 1657, 14817, 29930, 14133, 27916, 29459, 10514, 6312, 9031, 20409, 5752, 33189, 4995, 28969, 7619, 38919, 49738, 8304, 39868, 46731, 10628, 34058, 48516, 32631, 10784)) order by `last_name` asc, `first_name` asc limit 15 offset 0

In and around 5ms total, between all the queries. That's really fast. Much faster than any single query solution I could come up with.

The only thing that's a bummer is Postgres. This solution doesn't work quite as well there, although it's still better than most options. I'mg getting around 90ms total between all the queries with this approach in Postgres.

One last note, these numbers are not based on the original 30,000 users, 10,000 companies. I bumped that up to 200,000 users, and 50,000 companies.

@davedevelopment

This comment has been minimized.

Copy link

@davedevelopment davedevelopment commented Apr 24, 2020

@reinink when you say convert to Eloquent, do you mean to end up with Eloquent models as a result, or to use the query builder type thing? If the former, I quite often use raw queries to fetch the IDs, then the orm to fetch the models. If the latter, that's a weird priority 🙂

@reinink

This comment has been minimized.

Copy link
Owner Author

@reinink reinink commented Apr 24, 2020

@davedevelopment The first thing, exactly. I generally can convert most queries to the query builder, but it's sometimes hard to do with Eloquent if you're using a "computed" base table. Good idea on getting the IDs, and then doing a subsequent query. 👌

@mikeblas

This comment has been minimized.

Copy link

@mikeblas mikeblas commented Apr 24, 2020

Which version of MySQL are you using? Are you using MySQL MySQL, or Maria, or some other fork?

@ghost

This comment has been minimized.

Copy link

@ghost ghost commented Apr 24, 2020

What about these options:

  • In Laravel you can try caching your local query scope
return Cache::remeber('key-to-associate', 15, function() {
   //here your localscope
});
  • Using like operator with a beginning match maybe could not be a good idea if when you run it this one makes a full table scan
  • What about using a regular expressions? in this way:
SELECT * FROM masters WHERE REGEXP_LIKE(degree, '^al');
@mikeblas

This comment has been minimized.

Copy link

@mikeblas mikeblas commented Apr 25, 2020

I think this query gives you results that you expect with better performance, though it has a semantic difference. It doesn't do a table scan. I tested it with MySQL 5.7.29-0ubuntu0.18.04.1 on Ubuntu .

select
  first_name,
  last_name
From users
WHERE first_name like 'NAD%'
    or last_name like 'NAD%'
OR company_id IN (select id from companies where name LIKE 'NAD%');

I think it's important to explain why this rewritten query works better.

The original query first does a join between the users and companies tables. Of the matching tuples, it finds the rows that match the LIKE conditions in the WHERE clause. The QP must do the join first to generate candidate rows for the WHERE clause. The execution plan of the original query enumerates all the rows in the companies table. For each row enumerated, it probes the Users table to find a match on company_id. If found, it further filters to see if the LIKE clauses match.

It seems like the goal (the complaint?) is that the filters aren't applied to tables on both sides of the join first, we'd only have companies that match the WHERE clause, and couldn't JOIN users that matched their clauses in the WHERE. All of the LIKE clauses are disjunctive -- they're OR statements. If we filtered a row from the Users table, we'd still maybe want to match it because OR Companies.Name LIKE was a match.

The query I wrote has a semantic difference; it will return rows from users that don't have a matching company. I think the OUTER JOIN was meant to take care of that, but I'm not positive it will. (Because NULL is NOT LIKE anything, so ... but I'm not positive of that, and I have too much lockdown time to think particularly clearly.)

My revised query will poke at the first_name index, then the last_name index, both on the users table. Then, with a subselect, it develops a list of company_ids that match using the names index on the companies table; it will probe each resulting entry in the company_ids index on the users table. All the results are then projected back to the user.

We all want SQL statements to run as fast as possible, but getting good performance has to start with careful thought about what the statement we're executing really does, and how it really works.

@reinink

This comment has been minimized.

Copy link
Owner Author

@reinink reinink commented Apr 27, 2020

@mikeblas Thanks for sharing your thoughts. You'll notice that I did try this query, but it's still giving me 10ms+ (or worse) performance, where running two queries got me <1ms per query.

@reinink

This comment has been minimized.

Copy link
Owner Author

@reinink reinink commented May 8, 2020

So, just an update here, for anyone who is interested. 😂

I managed to get this query working in Laravel, as a SINGLE query, using the union approach, where it supports searching multiple queries. 🎊

The following is my final User model scope.

Brace yourselves. 😬

public function scopeSearch($query, string $term = null)
{
    collect(str_getcsv($term, ' ', '"'))->filter()->each(function ($term) use ($query) {
        $term = $term.'%';
        $query->whereIn('id', function ($query) use ($term) {
            $query->select('id')
                ->from(function ($query) use ($term) {
                    $query->select('users.id')
                        ->from('users')
                        ->where('users.first_name', 'like', $term)
                        ->orWhere('users.last_name', 'like', $term)
                        ->union(
                            $query->newQuery()
                                ->select('users.id')
                                ->from('users')
                                ->join('companies', 'users.company_id', '=', 'companies.id')
                                ->where('companies.name', 'like', $term)
                        );
                }, 'matches');
        });
    });
}

And the generated query (for the search "bill gates microsoft"):

SELECT
    *
FROM
    `users`
WHERE
    `id` in(
        SELECT
            `id` FROM ((
                SELECT
                    `users`.`id` FROM `users`
                WHERE
                    `users`.`first_name` LIKE 'bill%'
                    OR `users`.`last_name` LIKE 'bill%')
            UNION (
                SELECT
                    `users`.`id` FROM `users`
                    INNER JOIN `companies` ON `users`.`company_id` = `companies`.`id`
                WHERE
                    `companies`.`name` LIKE 'bill%')) AS `matches`)
    AND `id` in(
        SELECT
            `id` FROM ((
                SELECT
                    `users`.`id` FROM `users`
                WHERE
                    `users`.`first_name` LIKE 'gates%'
                    OR `users`.`last_name` LIKE 'gates%')
            UNION (
                SELECT
                    `users`.`id` FROM `users`
                    INNER JOIN `companies` ON `users`.`company_id` = `companies`.`id`
                WHERE
                    `companies`.`name` LIKE 'gates%')) AS `matches`)
    AND `id` in(
        SELECT
            `id` FROM ((
                SELECT
                    `users`.`id` FROM `users`
                WHERE
                    `users`.`first_name` LIKE 'microsoft%'
                    OR `users`.`last_name` LIKE 'microsoft%')
            UNION (
                SELECT
                    `users`.`id` FROM `users`
                    INNER JOIN `companies` ON `users`.`company_id` = `companies`.`id`
                WHERE
                    `companies`.`name` LIKE 'microsoft%')) AS `matches`)
ORDER BY
    `last_name` ASC, `first_name` ASC
LIMIT 15 OFFSET 0

Overall, the final results are amazing.

Running it for three terms (ie. "bill gates microsoft"), against 1 million users, and 100,000 companies results in query times between 3-6ms.

The trick? Err..tricks?

First, using a union (as folks have suggested) allows the query builder to run each query (the users match and companies match) independently of each other, and therefore use all the available indexes. This final query now successfully uses the users_first_name_index, users_last_name_index and companies_name_index.

Second, and this was the piece I was missing before, you must run this union as a derived table, not a normal (correlated/dependent) subquery.

Without the derived table, all of subqueries are "dependent":

image

Adding the derived table makes all the problems go away:

image

I would have thought that the sub queries would only be dependent if you actually had some type of dependency between the two. For example, inner.id = outer.id, or something like that. But I guess the MySQL query planner doesn't look at the conditions to see if these two queries are actually dependent.

The only outstanding issue I have still is running a 1-2 character search. For example, searching for "b", or even "bi" is rather slow, taking upwards of 800ms on a million rows. The previous approaches that don't use the indexes are faster in those cases. I suspect that this is because the derived tables end up becoming massive, and that just takes computation time. One simple solution is to just not perform the search until you have at least 3 characters, which seems to be the sweet spot.

And, now for a quick sales pitch. 😂

I plan to cover all this in detail in my upcoming Eloquent Performance Patterns course. If that sounds interesting to you, be sure to join my mailing list on that website. 🤟

@OliverGrimsley

This comment has been minimized.

Copy link

@OliverGrimsley OliverGrimsley commented May 15, 2020

@reinink - interested in the course - the above is very close to an issue I want to solve - so wondering if the course will cover it - querying unrelated data at the same time and returning not just the ID of the model, but the table reference it is derived from, in the return?

The purpose is to allow users to just have one search box to find any one model in the search. So say you have 4 tables - COMPANY, SERVICE, PRODUCT, and DEVICE. I want to design a query that queries for a typeahead "meta" search of the description or name (or some other column) in each table, and returns the ID and the model type (i.e. table reference) so that if someone selected that, it would take them to that particular record. Right now these are 4 different searches on my application - in other words, the person needs to know they are looking for a DEVICE, and then search the DEVICE table.

I think this can be done with Unions - just including an AS column (from an SO post):

(SELECT content, title, 'msg' as type FROM messages WHERE content LIKE '%" . 
$keyword . "%' OR title LIKE '%" . $keyword ."%') 
UNION
(SELECT content, title, 'topic' as type FROM topics WHERE content LIKE '%" . 
$keyword . "%' OR title LIKE '%" . $keyword ."%') 
UNION
(SELECT content, title, 'comment' as type FROM comments WHERE content LIKE '%" . 
$keyword . "%' OR title LIKE '%" . $keyword ."%')";

In my case the tables have different columns in each table, so instead of say content, title above, say the table has description, name in one of the tables. As per your post above, however, I suspect this is not performant at all . . . and thus have not implemented it.

In any event, I am on your email list and very interested in this course and really appreciate the above gist and all the comments on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment