Just some initial notes from the Hive paper.
- Complex types are implemented by having every type implement the SerDe and ObjectInspector interfaces. The same mechanism also supports legacy data formats, which seems cool.
- Insert, update, and delete are not allowed. Apart from reading, the only available operation is creating a new table.
- Only equi-joins are supported.
- Hourly loading of data into the warehouse is mentioned alongside daily loading, so it seems to be an important use case. We should not assume that daily loading is the only common case.
- Half the queries are ad hoc and the other half are for dashboards and reports. The two kinds run on separate Hive clusters, because ad hoc queries have unpredictable usage.
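
Side note on the equi-join point: a rough way to see why equality predicates suit a map-reduce style engine is that rows from both tables can be grouped by the join key, and each group can then be joined on its own. The sketch below is only an illustration in plain Python, not Hive's actual implementation. The function name `reduce_side_equi_join`, the sample tables, and the column names are all made up for the example.

```python
from collections import defaultdict


def reduce_side_equi_join(left, right, left_key, right_key):
    """Inner-join two lists of dicts where left[left_key] == right[right_key].

    Hypothetical helper for illustration; it mimics the map/shuffle/reduce
    shape of a key-based join, not any real Hive code path.
    """
    # "Map" and "shuffle": bucket rows from each side by their join key.
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[left_key]][0].append(row)
    for row in right:
        groups[row[right_key]][1].append(row)

    # "Reduce": for each key, pair every left row with every right row.
    joined = []
    for left_rows, right_rows in groups.values():
        for l_row in left_rows:
            for r_row in right_rows:
                joined.append({**l_row, **r_row})
    return joined


# Made-up sample data.
users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
clicks = [
    {"uid": 1, "page": "/home"},
    {"uid": 1, "page": "/about"},
    {"uid": 3, "page": "/x"},
]

result = reduce_side_equi_join(users, clicks, "user_id", "uid")
for row in result:
    print(row)
```

A non-equality predicate (say `a.x < b.y`) has no single key to group on, so this bucketing trick does not apply directly. That is presumably part of why the restriction exists, though that is my guess rather than something the notes above state.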