Skip to content

Instantly share code, notes, and snippets.

@EliFuzz
Created June 19, 2023 06:30
Show Gist options
  • Save EliFuzz/43aa4fe25acbd7d306c2f5f6a1cea344 to your computer and use it in GitHub Desktop.
Save EliFuzz/43aa4fe25acbd7d306c2f5f6a1cea344 to your computer and use it in GitHub Desktop.
Comparison table of ETL vs ELT
Parameters ETL ELT
process data is transformed at staging server and then transferred to Datawarehouse DB data remains in the DB of the Datawarehouse
code usage compute-intensive transformations and for small amount of data high amounts of data
transformation done in server/staging area performed in the target system
time-load data first loaded into staging and later loaded into target system (time intensive) data loaded into target system only once (faster)
time-transformation needs to wait for transformation to complete. As data size grows, transformation time increases speed is never dependant on the size of the data
time- maintenance needs high maintenance for selecting data to load and transform low maintenance as data is always available
implementation complexity at an early stage, easier to implement requires deep knowledge of tools and expert skills
support for data warehouse used for on-premises, relational and structured data used in scalable cloud infrastructure which supports structured, unstructured data sources
support for lake/mart/lakehouse no yes
complexity oads only the important data, as identified at design time involves development from the output-backward and loading only relevant data
cost high costs for small and medium businesses low entry costs using online SasS Platforms
lookups both facts and dimensions need to be available in staging area all data will be available because extract and load occur in one single action
aggregations complexity increase with the additional amount of data in the dataset power of the target platform can process significant amount of data quickly
calculations overwrites existing column or need to append the dataset and push to the target platform easily add the calculated column to the existing table
maturity mature and well documented relatively new concept and complex to implement
hardware most tools have unique hardware requirements that are expensive being Saas hardware cost is not an issue
support unstructured data mostly supports relational data supports
size/type of data set best for small relational data sets with complex transformations relevant to analysis goals can process any big data, structured or unstructured
compliance/privacy better for GDPR, HIPAA, and CCPA compliance since sensitive data can be omitted before loading having more risk of exposing private data
data output tipically structured structured, semi-structured, unstructured
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment