@primaryobjects
Last active July 8, 2024 20:36
Get Started with Data Engineering on Databricks certification exam https://www.databricks.com/learn/training/getting-started-with-data-engineering
1)
Which statement describes the Databricks workspace?
** It is a solution for organizing assets within Databricks.
It is a classroom setup for running Databricks lessons and exercises.
It is a set of predefined tables and path variables within Databricks.
It is a mechanism for cleaning up lesson-specific assets created during a learning session.
Score: 10.00
Multiple choice
2)
What assets can be accessed from and organized within the Databricks workspace?
Virtual machine configurations for clusters
Machine learning models and algorithms
** Notebooks and files
Cloud storage accounts
Score: 10.00
Multiple choice
3)
Which statement describes Databricks Repos?
** A capability centered around continuous integration of assets in Databricks and external Git repositories
A tool for managing virtual environments and dependencies in Databricks
A feature for scheduling and orchestrating data pipelines within Databricks
An integrated development environment (IDE) specifically designed for Databricks notebooks
Score: 10.00
Multiple choice
4)
What is the basic compute structure of Databricks?
Data Warehouses
Databricks Instances
** Databricks Clusters
Data Nodes
Multiple choice
5)
As a Data Engineer, which of the following would you use to orchestrate data tasks?
** Workflow Jobs
Databricks AI Library
Spark MLlib
Databricks Academy
Score: 10.00
Multiple choice
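As a concrete illustration of question 5, below is a minimal sketch of creating a Workflow Job through the Jobs 2.1 REST API. The workspace host, token, notebook path, and cluster ID are all placeholders, not values from this exam.

```python
# Minimal sketch: create a Workflow Job that orchestrates a notebook task.
# DATABRICKS_HOST, TOKEN, the notebook path, and cluster ID are placeholders.
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

job_spec = {
    "name": "nightly-etl",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/etl/ingest"},
            "existing_cluster_id": "<cluster-id>",
        }
    ],
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
print(resp.json())  # returns {"job_id": ...} on success
```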
6)
How do clusters and warehouses differ in their roles?
Clusters handle machine learning tasks, while SQL warehouses focus on data processing
** Clusters provide compute resources for running notebooks, while warehouses work specifically with SQL queries
Clusters are designed for data visualization, while SQL warehouses execute SQL queries
Clusters offer storage optimization, while SQL warehouses provide data replication
Score: 10.00
Multiple choice
7)
What are the high-level configuration options available when setting up a cluster?
Data Replication, Disk Encryption, and Data Partitioning
Notebook Sharing, Version Control, and User Permissions
** Autoscaling Options, Access Mode, and Cluster Name
Data Transformation Pipelines, Machine Learning Models, and Data Visualization
Score: 10.00
Multiple choice
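The three options from question 7 map onto fields of a cluster spec roughly as sketched below. Field names follow the Clusters REST API ("access mode" surfaces as data_security_mode); the Spark version and node type are illustrative placeholders.

```python
# Sketch of a cluster spec covering cluster name, autoscaling, and access mode.
cluster_spec = {
    "cluster_name": "de-learning-cluster",               # cluster name
    "spark_version": "14.3.x-scala2.12",                 # placeholder runtime
    "node_type_id": "i3.xlarge",                         # placeholder node type
    "autoscale": {"min_workers": 1, "max_workers": 4},   # autoscaling options
    "data_security_mode": "SINGLE_USER",                 # access mode
}
# POST this payload to /api/2.0/clusters/create with the same auth headers
# as the Jobs example above.
```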
8)
What are the primary high-level configuration options available when setting up a warehouse?
Data Replication, Notebook Sharing, and Data Partitioning
** Compute Cluster Size, Auto-stop Timer, and Scaling Parameters
Query Execution Speed, Access Mode, and Visualization Mode
Data Compression, Cluster Name, and Query Optimization
Multiple choice
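Likewise, question 8's warehouse options correspond to fields of a SQL warehouse spec, sketched here with illustrative values; field names follow the SQL Warehouses REST API.

```python
# Sketch of a SQL warehouse spec: compute size, auto-stop timer, scaling.
warehouse_spec = {
    "name": "analytics-warehouse",
    "cluster_size": "Small",    # compute cluster size
    "auto_stop_mins": 15,       # auto-stop timer
    "min_num_clusters": 1,      # scaling parameters
    "max_num_clusters": 3,
}
# POST to /api/2.0/sql/warehouses to create the warehouse.
```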
9)
What are the benefits of using the available serverless compute features?
Enhanced query performance for all workloads
Fixed and predetermined billing structure
Manual adjustment of resource allocation
** Cost efficiency, scalability, and simplified management
Score: 10.00
Multiple choice
10)
What is the primary interface used by data engineers when working with Databricks?
Visual Studio Code
Data Dashboards
Command Line Interface
** Databricks Notebooks
Score: 10.00
Multiple choice
11)
What are the common use cases for data engineers when working with Notebooks?
Writing Research Papers
** Data Exploration, Reporting, and Dashboarding
Creating Mobile Apps
Playing Online Games
Score: 10.00
Multiple choice
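A typical notebook cell for question 11's use cases might look like the sketch below. The table name is hypothetical; spark is predefined in Databricks notebooks, and display() is a Databricks notebook helper that renders tables and charts.

```python
# Data exploration and a grouped view suitable for a dashboard widget.
df = spark.table("main.default.orders")   # hypothetical table
df.printSchema()                          # quick structural check
display(df.summary())                     # summary statistics
display(df.groupBy("status").count())     # aggregate view for dashboarding
```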
12)
How does Databricks store data?
Data is stored in physical servers
Data is stored in cloud-based web servers
** Data is stored in cloud object storage locations and accessed via Databricks
Data is stored on local computers
Score: 10.00
Multiple choice
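Question 12 in practice: Databricks reads data directly from cloud object storage by path. The bucket and container URIs below are placeholders.

```python
# Read a Delta dataset from an object-storage location (placeholder paths).
df = spark.read.format("delta").load("s3://my-bucket/bronze/events")
# Equivalent on Azure:
# df = spark.read.format("delta").load(
#     "abfss://container@account.dfs.core.windows.net/bronze/events")
df.show(5)
```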
13)
What are the benefits of data storage in the data lakehouse architecture across roles and Databricks services?
Faster data visualization for analysts
** Simplifies ETL processing and ensures integrity
Increased code complexity for data engineers
Enhanced security for data scientists
Score: 10.00
Multiple choice
14)
What is the optimized storage layer that serves as the foundation for data storage in a data lakehouse architecture?
Apache Spark
** Delta Lake
Apache Parquet
MongoDB
Score: 10.00
Multiple choice
15)
What is the default table type for all tables in Databricks?
** Delta tables
Temporary tables
External tables
CSV tables
Score: 10.00
Multiple choice
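Question 15 can be verified directly: writing a table without specifying a format produces a Delta table. The table and column names below are illustrative.

```python
# saveAsTable() with no explicit format creates a Delta table by default.
from pyspark.sql import Row

spark.createDataFrame([Row(id=1, name="a"), Row(id=2, name="b")]) \
     .write.mode("overwrite").saveAsTable("main.default.demo")

spark.sql("DESCRIBE DETAIL main.default.demo").select("format").show()  # -> delta
```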
16)
What does Delta Lake include to improve performance?
External data sources
Real-time streaming
** Built-in and easy optimizations
Data compression
Multiple choice
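Two of Delta Lake's built-in optimizations from question 16, run as SQL from Python against the illustrative table created above: OPTIMIZE compacts small files, and ZORDER BY co-locates related data for faster lookups.

```python
spark.sql("OPTIMIZE main.default.demo ZORDER BY (id)")
spark.sql("VACUUM main.default.demo")  # remove files no longer referenced
```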
17)
What is the purpose of Unity Catalog in Databricks?
** Centralized governance solution
Real-time data processing
Machine learning platform
Distributed storage system
Score: 10.00
Multiple choice
18)
What is the structure of the three-tier namespace?
Data, Analysis, Visualization
Source, Transform, Load
** Catalog, Schema, Table
Database, Collection, File
Multiple choice
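Question 18's three-tier namespace appears directly in queries as catalog.schema.table; the names below are illustrative.

```python
spark.sql("""
    SELECT *
    FROM main.default.demo   -- catalog.schema.table
    LIMIT 10
""").show()
```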
19)
What is the purpose of workflows?
To visualize data pipelines graphically
To create interactive notebooks for data analysis
** To automate and orchestrate data workflows
To monitor real-time data streams
Score: 10.00
Multiple choice
20)
What is the primary purpose of jobs?
Enabling complex data transformations
Collaborative data analysis and exploration
** Scheduling and automating tasks
Managing data pipelines and ETL processes
Score: 10.00
Multiple choice
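Question 20 in practice: scheduling is attached to a job spec as a Quartz cron expression. This sketch extends the hypothetical job_spec from the question 5 example.

```python
# Run the job automatically at 02:00 UTC every day (Quartz cron syntax).
job_spec["schedule"] = {
    "quartz_cron_expression": "0 0 2 * * ?",
    "timezone_id": "UTC",
}
```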
21)
Which of the following types of assets can be automated using Workflows?
BI Connectors
Partner integrations
** Notebooks, ETL pipelines, and ML model training
MLFlow
Score: 10.00
Multiple choice
22)
What solution is designed for building and running robust data pipelines?
Delta Live Streams
** Delta Live Tables
Delta Live Networks
Delta Live Systems
Score: 10.00
Multiple choice
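A minimal Delta Live Tables sketch for question 22: tables are declared as decorated functions and DLT manages the pipeline, including data-quality expectations. The source path and table names are illustrative.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw events ingested from object storage")
def bronze_events():
    return spark.read.format("json").load("s3://my-bucket/raw/events")

@dlt.table(comment="Cleaned events")
@dlt.expect_or_drop("valid_id", "id IS NOT NULL")  # built-in quality rule
def silver_events():
    return dlt.read("bronze_events").withColumn(
        "ingested_at", F.current_timestamp())
```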
23)
What is the purpose of Databricks SQL for analysts and engineers working within the Databricks ecosystem?
Providing graphic design tools
Managing social media campaigns
** Serving as a data warehousing solution
Offering fitness tracking features
Score: 10.00
Multiple choice
24)
What are common use cases for data engineers when working with Databricks SQL?
Writing machine learning algorithms
** Determining data quality
Designing mobile applications
Generating random data samples
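Question 24's use case, determining data quality, can be expressed as a simple SQL check; this sketch counts nulls and duplicate keys in the illustrative table from the earlier examples.

```python
spark.sql("""
    SELECT
        COUNT(*)                          AS total_rows,
        COUNT_IF(id IS NULL)              AS null_ids,
        COUNT(*) - COUNT(DISTINCT id)     AS duplicate_ids
    FROM main.default.demo
""").show()
```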