This is the process I used to put redshift user defined functions into dbt.
- Created a subdirectory:
macros/udfs/
- Created a file for each udf, e.g.
macros/udfs/f_future_date.sql
.
{% macro f_future_date() %}
CREATE OR REPLACE FUNCTION {{target.schema}}.f_future_date()
RETURNS TIMESTAMP
IMMUTABLE AS $$
SELECT '2100-01-01'::TIMESTAMP;
$$ LANGUAGE sql
{% endmacro %}
- Created a macro called
macros/create_udfs.sql
which calls each UDF macro. Note separation with;
s.
{% macro create_udfs() %}
{{f_list_custom_keys()}};
{{f_count_custom_keys()}};
{{f_future_date()}}
{% endmacro %}
- Added a on-run-start hook to my project
on-run-start:
- '{{create_udfs()}}'
- Updated the references to UDFs in my models to use the schema-versions, e.g.
SELECT
{{target.schema}}.F_FUTURE_DATE()
The “gotcha” parts were:
- I had to grant permission to users to create python udfs:
-- as your superuser
GRANT USAGE ON LANGUAGE PLPYTHONU TO claire;
- I had to use schemas because users cannot edit each others’ UDFs (i.e. if
claire
creates the UDF, andsinter
runs theCREATE OR REPLACE FUNCTION
statement, you'll get a "must be owner of function" error). Using schemas means you are creating distinct UDFs* so won’t hit this issue. I would recommend using schemas anyway to maintain separate dev/production UDFs
* Assuming each redshift user profile used by dbt has a distinct target schema.
Hi Claire!
I wonder if you have tried creating a UDF that queries another available source in your project using an input parameter. For example, I am trying to adapt a SQL Server UDF that reads a table 'Projects' from my raw database and returns a Boolean result based on the input variable 'id'. The function is the following:
I have tried adapting this UDF following your post. I am using DBT using SparkSQL for Databricks. I created the following macros:
my_project/macros/udfs/fn_ProjectIsAvailable.sql
my_project/macros/create_udfs.sql
where:
{{ source('raw_db', 'projects') }}
is the same Project table from the SQL Server database that I already ingested into a Databricks schema 'raw_db' using DBT. It is basically a transactional catalogue of projects with their ids.and
I also added this entry to my dbt_project.yml file:
Now, I would like to use this UDF/macro in another DBT model where I intend to use this UDF in this way:
my_project/models/dwh_ProjectCurrentStatus.sql
Unfortunately, when I compile the preview on this model, I realised that the 'Id' variable is not being parsed to this query:
Please notice how the
p.Id =
receives a blank.I don't really know how to solve this and I would really appreciate any feedback. Many thanks.