- Module 1: Describe Azure Databricks
In this module, you will:
- Understand the Azure Databricks platform
- Create your own Azure Databricks workspace
- Create a notebook inside your home folder in Databricks
- Understand the fundamentals of Apache Spark notebooks
- Create, or attach to, a Spark cluster
- Identify the types of tasks well suited to Apache Spark, the unified analytics engine
- Module 2: Spark architecture fundamentals
In this module, you will:
- Understand the architecture of an Azure Databricks Spark cluster
- Understand the architecture of a Spark job
- Module 3: Read and write data in Azure Databricks
In this module, you will:
- Use Azure Databricks to read multiple file types, both with and without a schema
- Combine inputs from files and data stores, such as Azure SQL Database
- Transform and store that data for advanced analytics
- Module 4: Work with DataFrames in Azure Databricks
In this module, you will:
- Use the count() method to count rows in a DataFrame
- Use the display() function to display a DataFrame in the notebook
- Cache a DataFrame for quicker operations if the data is needed a second time
- Use limit() to display a small set of rows from a larger DataFrame
- Use select() to select a subset of columns from a DataFrame
- Use distinct() and dropDuplicates() to remove duplicate data
- Use drop() to remove columns from a DataFrame
- Module 5: Work with user-defined functions
In this module, you will learn how to:
- Write user-defined functions (UDFs)
- Perform ETL operations using user-defined functions
- Module 6: Build and query a Delta Lake
In this module, you will:
- Learn about the key features and use cases of Delta Lake
- Use Delta Lake to create, append, and upsert tables
- Perform optimizations in Delta Lake
- Compare different versions of a Delta table using time travel
- Module 7: Perform machine learning with Azure Databricks
In this module, you will learn how to:
- Perform machine learning
- Train a model and create predictions
- Perform exploratory data analysis
- Describe machine learning workflows
- Build and evaluate machine learning models
- Module 8: Train a machine learning model
In this module, you will learn how to:
- Perform featurization of the dataset
- Finish featurization of the dataset
- Understand regression modeling
- Build and interpret a regression model
- Module 9: Work with MLflow in Azure Databricks
In this module, you will learn how to:
- Use MLflow to track experiments, log metrics, and compare runs
- Work with MLflow to track experiment metrics, parameters, artifacts, and models
- Module 10: Perform model selection with hyperparameter tuning
In this module, you will learn how to:
- Describe model selection and hyperparameter tuning
- Select the optimal model by tuning hyperparameters
- Module 11: Deep learning with Horovod for distributed training
In this module, you will learn how to:
- Use Horovod to train a deep learning model
- Use Petastorm to read datasets in Apache Parquet format with Horovod for distributed model training
- Work with Horovod and Petastorm for training a deep learning model
- Module 12: Work with Azure Machine Learning to deploy serving models
In this module, you will learn how to:
- Use Azure Machine Learning to deploy serving models