@cupdike
Last active August 19, 2020 16:05
Airflow Connection to Remote Kerberized Hive Metastore
# Let's say this is your kerberos ticket (likely from a keytab used for the remote service):
Ticket cache: FILE:/tmp/airflow_krb5_ccache
Default principal: hive/myserver.myrealm@myrealm
Valid starting       Expires              Service principal
06/14/2018 17:52:05  06/15/2018 17:49:35  krbtgt/myrealm@myrealm
        renew until 06/17/2018 05:49:33
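To populate that cache in the first place, you'd typically kinit from the service keytab into the cache Airflow expects. A setup fragment (the keytab path is an assumption; substitute your own):

```shell
# Obtain a ticket for the hive service principal from its keytab,
# writing it into the cache Airflow is configured to use.
# (Keytab path below is an assumption -- adjust for your environment.)
kinit -kt /etc/security/keytabs/hive.service.keytab \
      -c /tmp/airflow_krb5_ccache \
      hive/myserver.myrealm@myrealm

# Confirm the ticket landed in the expected cache:
klist -c /tmp/airflow_krb5_ccache
```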
# Here's what creating a connection looks like:
airflow connections --add \
--conn_id metastore_cluster1 \
--conn_type 'hive_metastore' \
--conn_host 'myserver.mydomain' \
--conn_port 9083 \
--conn_extra '{"authMechanism":"GSSAPI", "kerberos_service_name":"hive"}'
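The `--conn_extra` payload has to be valid JSON or the hook can't parse it out of the connection, so it's worth sanity-checking the blob before pasting it into the CLI. A minimal stdlib-only check of the exact string used above:

```python
import json

# The extras blob passed via --conn_extra above; must parse as a JSON object.
conn_extra = '{"authMechanism":"GSSAPI", "kerberos_service_name":"hive"}'

extra = json.loads(conn_extra)  # raises ValueError if the JSON is malformed
assert extra["authMechanism"] == "GSSAPI"
assert extra["kerberos_service_name"] == "hive"
print(extra)
```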
# But this won't work unless your airflow.cfg has this in its [core] section:
security = kerberos
# Note: I'm not sure what else that "security = kerberos" setting really does.
# I had it unset for quite a while and was still able to integrate with Kerberized Hive just fine.
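For reference, here's roughly what the relevant airflow.cfg sections look like in the Airflow 1.x series. The [kerberos] principal and keytab values below are assumptions matching the ticket above; swap in your own:

```ini
[core]
security = kerberos

[kerberos]
# Cache the renewer writes to; keep this in sync with KRB5CCNAME (see below)
ccache = /tmp/airflow_krb5_ccache
principal = hive/myserver.myrealm@myrealm
reinit_frequency = 3600
kinit_path = kinit
keytab = /etc/security/keytabs/hive.service.keytab
```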
# Test it from a CLI:
(venv) [airflow@mymachine dags]$ python
>>> from airflow.hooks import hive_hooks
>>> hm = hive_hooks.HiveMetastoreHook(metastore_conn_id='metastore_cluster1')
[2018-06-14 18:07:34,736] {base_hook.py:80} INFO - Using connection to: myserver.mydomain
>>> hm.get_databases()
['default']
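Once the smoke test above works, the same hook drops into a DAG task easily. A sketch of two helpers you might call from a PythonOperator (`metastore_cluster1` is the connection created above; the table/db arguments are whatever you query; the import is done inside the functions so the module still loads on machines without Airflow installed):

```python
def table_exists(metastore_conn_id, table, db="default"):
    """Return True if db.table exists in the remote (Kerberized) metastore."""
    # Lazy import: keeps this module importable where Airflow isn't installed.
    from airflow.hooks.hive_hooks import HiveMetastoreHook

    hook = HiveMetastoreHook(metastore_conn_id=metastore_conn_id)
    return hook.table_exists(table, db=db)


def latest_partition(metastore_conn_id, table, db="default"):
    """Return the max partition value for a partitioned table."""
    from airflow.hooks.hive_hooks import HiveMetastoreHook

    hook = HiveMetastoreHook(metastore_conn_id=metastore_conn_id)
    # Note: max_partition's signature has shifted a bit across Airflow
    # versions; the keywords below match the 1.x series.
    return hook.max_partition(schema=db, table_name=table)
```

For example, `table_exists('metastore_cluster1', 'my_table')` from a task callable.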
# Note: If you are using the `airflow kerberos` ticket renewer, you'll want something like the line below in your .bashrc.
# Otherwise, the airflow user may end up using a different ticket cache holding expired tickets.
# Also make sure airflow.cfg is using the same value for ccache.
export KRB5CCNAME=/tmp/airflow_krb5_ccache