Assuming:
table1
andtable2
both have a date partition with a column name ofd
table2
referencestable1
by the columnt1_id
table1
andtable2
are stored in HDFS as RCFile
The following correctly produces NULL
(empty) values:
WITH
t1 AS (SELECT t1.id FROM table1 t1 WHERE t1.d='2014-02-24'),
t2 AS (SELECT t2.t1_id, t2.id FROM table2 t2 WHERE t2.d='2014-02-24')
SELECT
t1.id,
t2.t1_id,
t2.id
FROM t1
LEFT OUTER JOIN t2 ON t1.id = t2.t1_id
However, the following incorrectly produces no NULL
(empty) values where there are no table2
with table2.t1_id = table1.id
:
SELECT
t1.id,
t2.t1_id,
t2.id
FROM t1
LEFT OUTER JOIN t2 ON t1.id = t2.t1_id
WHERE
t1.d = '2014-02-24' AND
t2.d = '2014-02-24'