Perform data science with Azure Databricks

Microsoft Learn

Free Online Course

English

8-9 hours of material

Self-paced

Overview

Module 1: Describe Azure Databricks

In this module, you will:

Understand the Azure Databricks platform
Create your own Azure Databricks workspace
Create a notebook inside your home folder in Databricks
Understand the fundamentals of Apache Spark notebooks
Create, or attach to, a Spark cluster
Identify the types of tasks well suited to Apache Spark, the unified analytics engine

Module 2: Spark architecture fundamentals

In this module, you will:

Understand the architecture of an Azure Databricks Spark cluster
Understand the architecture of a Spark job

Module 3: Read and write data in Azure Databricks

In this module, you will:

Use Azure Databricks to read multiple file types, both with and without a schema.
Combine inputs from files and data stores, such as Azure SQL Database.
Transform and store that data for advanced analytics.

Module 4: Work with DataFrames in Azure Databricks

In this module, you will:

Use the count() method to count rows in a DataFrame
Use the display() function to display a DataFrame in the notebook
Cache a DataFrame for quicker operations if the data is needed a second time
Use the limit() method to display a small set of rows from a larger DataFrame
Use select() to select a subset of columns from a DataFrame
Use distinct() and dropDuplicates() to remove duplicate data
Use drop() to remove columns from a DataFrame

Module 5: Work with user-defined functions

In this module, you will learn how to:

Write user-defined functions
Perform ETL operations using user-defined functions

Module 6: Build and query a Delta Lake

In this module, you will:

Learn about the key features and use cases of Delta Lake.
Use Delta Lake to create, append, and upsert tables.
Perform optimizations in Delta Lake.
Compare different versions of a Delta table using Time Travel.

Module 7: Perform machine learning with Azure Databricks

In this module, you will learn how to:

Perform machine learning
Train a model and create predictions
Perform exploratory data analysis
Describe machine learning workflows
Build and evaluate machine learning models

Module 8: Train a machine learning model

In this module, you will learn how to:

Perform featurization of the dataset
Finish featurization of the dataset
Understand regression modeling
Build and interpret a regression model

Module 9: Work with MLflow in Azure Databricks

In this module, you will learn how to:

Use MLflow to track experiments, log metrics, and compare runs
Work with MLflow to track experiment metrics, parameters, artifacts, and models

Module 10: Perform model selection with hyperparameter tuning

In this module, you will learn how to:

Describe model selection and hyperparameter tuning
Select the optimal model by tuning hyperparameters

Module 11: Deep learning with Horovod for distributed training

In this module, you will learn how to:

Use Horovod to train a deep learning model
Use Petastorm to read datasets in Apache Parquet format with Horovod for distributed model training
Work with Horovod and Petastorm for training a deep learning model

Module 12: Work with Azure Machine Learning to deploy serving models

In this module, you will learn how to:

Use Azure Machine Learning to deploy serving models

Syllabus

Module 1: Describe Azure Databricks

Introduction
Explain Azure Databricks
Create an Azure Databricks workspace and cluster
Understand Azure Databricks Notebooks
Exercise: Work with Notebooks
Knowledge check
Summary

Module 2: Spark architecture fundamentals

Introduction
Understand the architecture of an Azure Databricks Spark cluster
Understand the architecture of a Spark job
Knowledge check
Summary

Module 3: Read and write data in Azure Databricks

Introduction
Read data in CSV format
Read data in JSON format
Read data in Parquet format
Read data stored in tables and views
Write data
Exercises: Read and write data
Knowledge check
Summary

Module 4: Work with DataFrames in Azure Databricks

Introduction
Describe a DataFrame
Use common DataFrame methods
Use the display function
Exercise: Distinct articles
Knowledge check
Summary

Module 5: Work with user-defined functions

Introduction
Write user-defined functions
Exercise: Perform Extract, Transform, Load (ETL) operations using user-defined functions
Knowledge check
Summary

Module 6: Build and query a Delta Lake

Introduction
Describe the open-source Delta Lake
Exercise: Work with basic Delta Lake functionality
Describe how Azure Databricks manages Delta Lake
Exercise: Use Delta Lake Time Travel and perform optimization
Knowledge check
Summary

Module 7: Perform machine learning with Azure Databricks

Introduction
Understand machine learning
Exercise: Train a model and create predictions
Understand data using exploratory data analysis
Exercise: Perform exploratory data analysis
Describe machine learning workflows
Exercise: Build and evaluate a baseline machine learning model
Knowledge check
Summary

Module 8: Train a machine learning model

Introduction
Perform featurization of the dataset
Exercise: Finish featurization of the dataset
Understand regression modeling
Exercise: Build and interpret a regression model
Knowledge check
Summary

Module 9: Work with MLflow in Azure Databricks

Introduction
Use MLflow to track experiments, log metrics, and compare runs
Exercise: Work with MLflow to track experiment metrics, parameters, artifacts and models
Knowledge check
Summary

Module 10: Perform model selection with hyperparameter tuning

Introduction
Describe model selection and hyperparameter tuning
Exercise: Select optimal model by tuning hyperparameters
Knowledge check
Summary

Module 11: Deep learning with Horovod for distributed training

Introduction
Use Horovod to train a deep learning model
Use Petastorm to read datasets in Apache Parquet format with Horovod for distributed model training
Exercise: Work with Horovod and Petastorm for training a deep learning model
Knowledge check
Summary

Module 12: Work with Azure Machine Learning to deploy serving models

Introduction
Use Azure Machine Learning to deploy serving models
Knowledge check
Summary