Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save rupeshtiwari/61374b0065ca10fe285fe21be0bd2fd4 to your computer and use it in GitHub Desktop.
Save rupeshtiwari/61374b0065ca10fe285fe21be0bd2fd4 to your computer and use it in GitHub Desktop.
I Passed the AWS Certified Data Analytics – Specialty (DAS-C01) Exam!

I Passed the AWS Certified Data Analytics – Specialty (DAS-C01) Exam!

I'm happy to share that I have successfully passed the AWS Certified Data Analytics – Specialty (DAS-C01) exam. It was a challenging journey, and I'd like to share some of the key resources that helped me prepare for this achievement.

Understanding GROUP BY and HAVING Clauses in Amazon Athena

GROUP BY Clause

The GROUP BY clause is used to arrange identical data into groups. This is typically used in conjunction with aggregate functions like SUM, COUNT, AVG, MAX, or MIN to perform operations on each group of data.

Example:

SELECT product_name, SUM(sales_amount)
FROM sales_data
WHERE year = 2023
GROUP BY product_name

In this query:

  • The GROUP BY product_name clause groups the sales data by each product.
  • The SUM(sales_amount) function calculates the total sales amount for each product for the year 2023.

HAVING Clause

The HAVING clause is used to filter groups created by the GROUP BY clause. It operates similarly to the WHERE clause but is used for groups, not individual rows. It is typically used with aggregate functions.

Example:

SELECT product_name, SUM(sales_amount)
FROM sales_data
WHERE year = 2023
GROUP BY product_name
HAVING SUM(sales_amount) > 1000

In this query:

  • The GROUP BY product_name groups the sales data by product.
  • The SUM(sales_amount) function calculates the total sales amount for each product.
  • The HAVING SUM(sales_amount) > 1000 clause filters out products with total sales amount of 1000 or less.

Troubleshooting the Query

Given the data engineer's original query:

SELECT product_name, SUM(sales_amount)
FROM sales_data
WHERE year = 2023
GROUP BY product_name

This query might not return results for all products if there are products without sales in 2023 or other issues related to data inconsistencies.

Steps to Modify the Query

  1. Check for NULL values or missing data: Ensure there are no NULL values in product_name or sales_amount columns.
  2. Use a LEFT JOIN: If there's another table containing all products, perform a LEFT JOIN to ensure all products are included.
  3. Add HAVING Clause (if needed): To filter specific conditions on the grouped data.

Modified Query Example: Assuming all_products is a table containing all product names:

SELECT p.product_name, COALESCE(SUM(s.sales_amount), 0) AS total_sales
FROM all_products p
LEFT JOIN sales_data s ON p.product_name = s.product_name AND s.year = 2023
GROUP BY p.product_name
ORDER BY total_sales DESC

In this modified query:

  • LEFT JOIN ensures all products are included, even those without sales in 2023.
  • COALESCE(SUM(s.sales_amount), 0) replaces NULL sums with 0 for products with no sales.
  • ORDER BY total_sales DESC orders the results by total sales in descending order.

Summary

  • GROUP BY is used to group rows based on a specified column.
  • HAVING filters these grouped rows.
  • To troubleshoot the original query, consider potential data inconsistencies, NULL values, or missing data and use techniques like LEFT JOIN to include all products.

What is a Materialized View in a Database?

A materialized view is a database object that contains the results of a query. Unlike a regular view, which is a virtual table defined by a query and executed each time the view is queried, a materialized view stores the query result physically and periodically updates this stored result.

Key Concepts

  1. Storage of Query Results:

    • A materialized view stores the result of the query in the database. This means the data is physically saved, which can improve query performance, especially for complex queries that are expensive to compute.
  2. Periodic Refresh:

    • Materialized views can be set to refresh periodically, either on demand or automatically at specified intervals. This ensures that the data in the materialized view is up-to-date.
  3. Performance Improvement:

    • Since the results are stored, querying a materialized view is faster compared to executing the original complex query each time. This can significantly enhance performance for read-heavy operations.
  4. Use Cases:

    • Materialized views are particularly useful in data warehousing and business intelligence scenarios where complex queries need to be run frequently on large datasets.

Practical Example

Imagine a database with sales data that gets updated throughout the day. To quickly access aggregated sales data, you can create a materialized view.

Original Complex Query:

SELECT region, SUM(sales) AS total_sales
FROM sales_data
WHERE sale_date > '2023-01-01'
GROUP BY region;

Materialized View Creation:

CREATE MATERIALIZED VIEW sales_summary AS
SELECT region, SUM(sales) AS total_sales
FROM sales_data
WHERE sale_date > '2023-01-01'
GROUP BY region;

How to Refresh Materialized Views

Materialized views can be refreshed to keep the data up-to-date. The refresh can be done on demand or scheduled.

On Demand Refresh:

REFRESH MATERIALIZED VIEW sales_summary;

Scheduled Refresh: In some databases, you can set a schedule for automatic refresh. For example, in Oracle:

CREATE MATERIALIZED VIEW sales_summary
REFRESH FAST START WITH SYSDATE NEXT SYSDATE + 1/1440 -- Refresh every minute
AS
SELECT region, SUM(sales) AS total_sales
FROM sales_data
WHERE sale_date > '2023-01-01'
GROUP BY region;

Differences from Regular Views

  • Regular Views:

    • Do not store data physically.
    • Always execute the underlying query when accessed.
    • No need for refresh since the data is always current.
  • Materialized Views:

    • Store the query result physically.
    • Require periodic refresh to stay up-to-date.
    • Offer improved performance for complex and frequently accessed queries.

Sources

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment