Skip to content

Instantly share code, notes, and snippets.

@gonter
Last active October 28, 2019 18:29
Show Gist options
  • Save gonter/63730d96c2f7fdd3ae05f9467082b9f4 to your computer and use it in GitHub Desktop.
Save gonter/63730d96c2f7fdd3ae05f9467082b9f4 to your computer and use it in GitHub Desktop.
mysql query mystery

Question

Why does query 1 not return the same items as query 2? Effectively, it works the same as query 4.

background

MySQL Server version: 5.1.73-log Source distribution

mysql client Ver 14.14 Distrib 5.7.22, for Linux (x86_64) using EditLine wrapper

schema

mysql> desc eprint;
+---------------------------+---------------------+------+-----+---------+-------+
| Field                     | Type                | Null | Key | Default | Extra |
+---------------------------+---------------------+------+-----+---------+-------+
| eprintid                  | int(11)             | NO   | PRI | NULL    |       |
...
| abstract_nicht_anzeigen   | set('TRUE','FALSE') | YES  |     | NULL    |       |

value distribution

mysql> select count(*),abstract_nicht_anzeigen from eprint group by abstract_nicht_anzeigen;                                                             +----------+-------------------------+
| count(*) | abstract_nicht_anzeigen |
+----------+-------------------------+
|    19766 | NULL                    |
|    32177 |                         |
|      736 | TRUE                    |
+----------+-------------------------+
3 rows in set (0.23 sec)

query 1

SELECT eprintid,ac_nummer,urn,thesis_type
  FROM eprint
 WHERE eprint_status="archive" and metadata_visibility="show" and date_sperre_year is null
       and einverstaendnis=TRUE and full_text_status="public"
       and abstract_nicht_anzeigen<>TRUE and urn is not null;
...       
19544 rows in set (0.29 sec)

query 2

SELECT eprintid,ac_nummer,urn,thesis_type
  FROM eprint
 WHERE eprint_status="archive" and metadata_visibility="show" and date_sperre_year is null
       and einverstaendnis=TRUE and full_text_status="public"
       and (abstract_nicht_anzeigen is NULL or abstract_nicht_anzeigen="") and urn is not null;
...
32651 rows in set (0.02 sec)

query 3

SELECT eprintid,ac_nummer,urn,thesis_type
  FROM eprint
 WHERE eprint_status="archive" and metadata_visibility="show" and date_sperre_year is null
       and einverstaendnis=TRUE and full_text_status="public"
       and abstract_nicht_anzeigen is NULL and urn is not null;
...
13107 rows in set (0.30 sec)

query 4

SELECT eprintid,ac_nummer,urn,thesis_type
  FROM eprint
 WHERE eprint_status="archive" and metadata_visibility="show" and date_sperre_year is null
       and einverstaendnis=TRUE and full_text_status="public"
       and abstract_nicht_anzeigen="" and urn is not null;
...
19544 rows in set (0.29 sec)
@gonter
Copy link
Author

gonter commented Jul 4, 2019

Ok, I figured it out.

  1. NULL can't be compared to anything ([1]). For this reason, abstract_nicht_anzeigen<>TRUE only operrates on those rows where this column actually has a value.
  2. TRUE is 1 which happens to be the index value of the string "TRUE" in the set definition. This is a lucky conincidence. If the column was declared as set ('FALSE', 'TRUE') it would select those rows where the value is "FALSE".
  3. The guys who created this database, accidentially defined that column (and others) as set instead of enum('TRUE','FALSE') . This accident allows for rows to have the value ``"TRUE, FALSE"` which does not make sense.
  4. query 2 from above is not the right query either, it should be (quoted "TRUE" and alls possible values of abstract_nicht_anzeigen excluding "TRUE"):
SELECT eprintid,ac_nummer,urn,thesis_type
  FROM eprint
 WHERE eprint_status="archive" and metadata_visibility="show" and date_sperre_year is null
       and einverstaendnis="TRUE" and full_text_status="public"
       and (abstract_nicht_anzeigen is NULL or abstract_nicht_anzeigen<>"TRUE") and urn is not null;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment