Perform data science with Azure Databricks

Microsoft Learn
Free Online Course
English
8-9 hours of material
Self-paced

Overview

  • Module 1: Describe Azure Databricks
  • In this module, you will:

    • Understand the Azure Databricks platform
    • Create your own Azure Databricks workspace
    • Create a notebook inside your home folder in Databricks
    • Understand the fundamentals of Apache Spark notebooks
    • Create, or attach to, a Spark cluster
    • Identify the types of tasks well-suited to the unified analytics engine Apache Spark
  • Module 2: Spark architecture fundamentals
  • In this module, you will:

    • Understand the architecture of an Azure Databricks Spark cluster
    • Understand the architecture of a Spark job
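
  For orientation, a minimal PySpark sketch of the driver/executor model this module describes, assuming the spark session object that a Databricks notebook provides (the numbers are arbitrary):

      # Transformations are lazy; an action triggers a job whose tasks run in
      # parallel on the cluster's executors, one task per partition.
      df = spark.range(0, 1000000, numPartitions=8)   # lazy: no job runs yet
      print(df.rdd.getNumPartitions())                # 8 partitions -> up to 8 parallel tasks
      print(df.count())                               # action: the driver schedules a job
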
  • Module 3: Read and write data in Azure Databricks
  • In this module, you will:

    • Use Azure Databricks to read multiple file types, both with and without a schema
    • Combine inputs from files and data stores, such as Azure SQL Database
    • Transform and store that data for advanced analytics
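
  As a rough illustration of these objectives, a minimal PySpark sketch of reading and writing data; the paths, column names, and connection settings below are placeholders, not taken from the course:

      from pyspark.sql.types import StructType, StructField, StringType, DoubleType

      # Read CSV with schema inference (an extra pass over the data) ...
      df_inferred = (spark.read.option("header", True)
                               .option("inferSchema", True)
                               .csv("/mnt/data/sales.csv"))

      # ... or with an explicit schema for faster, more predictable loads.
      schema = StructType([
          StructField("product", StringType(), True),
          StructField("amount", DoubleType(), True),
      ])
      df_typed = spark.read.schema(schema).option("header", True).csv("/mnt/data/sales.csv")

      # JSON and Parquet readers follow the same pattern.
      df_json = spark.read.json("/mnt/data/events.json")
      df_parquet = spark.read.parquet("/mnt/data/events.parquet")

      # Combine with a data store such as Azure SQL Database via JDBC.
      df_sql = (spark.read.format("jdbc")
                .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
                .option("dbtable", "dbo.Customers")
                .option("user", "<user>").option("password", "<password>")
                .load())

      # Persist the transformed data for advanced analytics.
      df_typed.write.mode("overwrite").parquet("/mnt/data/sales_curated")
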
  • Module 4: Work with DataFrames in Azure Databricks
  • In this module, you will:

    • Use the count() method to count rows in a DataFrame
    • Use the display() function to display a DataFrame in the Notebook
    • Cache a DataFrame for quicker operations if the data is needed a second time
    • Use the limit() function to display a small set of rows from a larger DataFrame
    • Use select() to select a subset of columns from a DataFrame
    • Use distinct() and dropDuplicates() to remove duplicate data
    • Use drop() to remove columns from a DataFrame
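
  A minimal sketch of the DataFrame methods listed above, using a small in-memory DataFrame with made-up columns (display() is a Databricks notebook function):

      df = spark.createDataFrame(
          [("alice", "engineering"), ("bob", "sales"), ("alice", "engineering")],
          ["name", "department"],
      )

      print(df.count())                # number of rows
      display(df)                      # rich table rendering in the notebook
      df.cache()                       # keep the data around for repeated use
      display(df.limit(2))             # preview only a few rows
      names = df.select("name")        # project a subset of columns
      unique = df.distinct()           # or df.dropDuplicates(["name", "department"])
      trimmed = df.drop("department")  # remove a column
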
  • Module 5: Work with user-defined functions
  • In this module, you will learn how to:

    • Write user-defined functions
    • Perform ETL operations using user-defined functions
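
  A minimal sketch of writing and applying a user-defined function (UDF) in a simple ETL step; the column and cleaning logic are invented for illustration:

      from pyspark.sql.functions import udf, col
      from pyspark.sql.types import StringType

      @udf(returnType=StringType())
      def clean_name(raw):
          # Normalize free-text names as a small transform step.
          return raw.strip().title() if raw else None

      df = spark.createDataFrame([(" alice ",), ("BOB",)], ["raw_name"])
      cleaned = df.withColumn("name", clean_name(col("raw_name")))
      cleaned.show()
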
  • Module 6: Build and query a Delta Lake
  • In this module, you will:

    • Learn about the key features and use cases of Delta Lake
    • Use Delta Lake to create, append, and upsert tables
    • Perform optimizations in Delta Lake
    • Compare different versions of a Delta table using Time Machine
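
  A minimal Delta Lake sketch covering create, append, upsert, optimization, and reading an earlier table version on Databricks; the table and column names are made up:

      from delta.tables import DeltaTable

      df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
      more = spark.createDataFrame([(3, "c")], ["id", "value"])
      updates = spark.createDataFrame([(2, "b2"), (4, "d")], ["id", "value"])

      # Create a managed Delta table, then append more rows.
      df.write.format("delta").mode("overwrite").saveAsTable("events")
      more.write.format("delta").mode("append").saveAsTable("events")

      # Upsert (MERGE) source rows into the table by key.
      target = DeltaTable.forName(spark, "events")
      (target.alias("t")
             .merge(updates.alias("u"), "t.id = u.id")
             .whenMatchedUpdateAll()
             .whenNotMatchedInsertAll()
             .execute())

      # Compact small files, then query an earlier version of the table.
      spark.sql("OPTIMIZE events")
      spark.sql("SELECT * FROM events VERSION AS OF 0").show()
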
  • Module 7: Perform machine learning with Azure Databricks
  • In this module, you will learn how to:

    • Perform machine learning
    • Train a model and create predictions
    • Perform exploratory data analysis
    • Describe machine learning workflows
    • Build and evaluate machine learning models
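
  A minimal sketch of a basic machine learning workflow with Spark ML: quick exploratory statistics, then a baseline regression model and an evaluation metric. The tiny dataset and columns are invented, and the model is scored on its own training data purely for brevity:

      from pyspark.ml.feature import VectorAssembler
      from pyspark.ml.regression import LinearRegression
      from pyspark.ml.evaluation import RegressionEvaluator

      data = spark.createDataFrame(
          [(1500.0, 3.0, 320000.0), (900.0, 2.0, 180000.0), (2100.0, 4.0, 450000.0)],
          ["sqft", "bedrooms", "price"],
      )

      # Exploratory data analysis: per-column summary statistics.
      display(data.describe())

      # Baseline model: assemble a feature vector, fit, predict, evaluate.
      assembled = VectorAssembler(inputCols=["sqft", "bedrooms"], outputCol="features").transform(data)
      model = LinearRegression(featuresCol="features", labelCol="price").fit(assembled)
      preds = model.transform(assembled)   # a real workflow would hold out a test set
      rmse = RegressionEvaluator(labelCol="price", metricName="rmse").evaluate(preds)
      print(rmse)
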
  • Module 8: Train a machine learning model
  • In this module, you will learn how to:

    • Perform featurization of the dataset
    • Finish featurization of the dataset
    • Understand regression modeling
    • Build and interpret a regression model
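
  A minimal featurization-and-regression sketch: a categorical column is indexed and one-hot encoded, the predictors are assembled into a feature vector, and the fitted coefficients are inspected. The data and columns are illustrative only:

      from pyspark.ml import Pipeline
      from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler
      from pyspark.ml.regression import LinearRegression

      data = spark.createDataFrame(
          [("suburb", 1500.0, 320000.0), ("city", 900.0, 380000.0), ("rural", 2100.0, 260000.0)],
          ["location", "sqft", "price"],
      )

      # Featurization: encode the categorical column, then assemble all
      # predictors into a single feature vector for the regression stage.
      pipeline = Pipeline(stages=[
          StringIndexer(inputCol="location", outputCol="location_idx"),
          OneHotEncoder(inputCols=["location_idx"], outputCols=["location_vec"]),
          VectorAssembler(inputCols=["location_vec", "sqft"], outputCol="features"),
          LinearRegression(featuresCol="features", labelCol="price"),
      ])

      model = pipeline.fit(data)
      lr_model = model.stages[-1]
      # Interpret the fitted model: one coefficient per feature plus an intercept.
      print(lr_model.coefficients, lr_model.intercept)
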
  • Module 9: Work with MLflow in Azure Databricks
  • In this module, you will learn how to:

    • Use MLflow to track experiments, log metrics, and compare runs
    • Work with MLflow to track experiment metrics, parameters, artifacts, and models
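
  A minimal MLflow tracking sketch; the parameter, metric, and artifact are placeholders, and on Azure Databricks the tracking server is already configured for the workspace:

      import mlflow

      with mlflow.start_run(run_name="baseline"):
          mlflow.log_param("regParam", 0.1)
          mlflow.log_metric("rmse", 12345.6)

          # Log any local file as an artifact, e.g. a small summary written on the fly.
          with open("/tmp/summary.txt", "w") as f:
              f.write("baseline linear regression\n")
          mlflow.log_artifact("/tmp/summary.txt")
          # mlflow.spark.log_model(model, "model")  # would log a fitted Spark ML model
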
  • Module 10: Perform model selection with hyperparameter tuning
  • In this module, you will learn how to:

    • Describe model selection and hyperparameter tuning
    • Select the optimal model by tuning hyperparameters
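
  A minimal hyperparameter-tuning sketch with Spark ML's ParamGridBuilder and CrossValidator; the grid values and the tiny dataset are invented for illustration:

      from pyspark.ml.feature import VectorAssembler
      from pyspark.ml.regression import LinearRegression
      from pyspark.ml.evaluation import RegressionEvaluator
      from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

      data = spark.createDataFrame(
          [(900.0, 180000.0), (1200.0, 240000.0), (1500.0, 320000.0),
           (1800.0, 350000.0), (2100.0, 450000.0), (2400.0, 480000.0)],
          ["sqft", "price"],
      )
      train_df = VectorAssembler(inputCols=["sqft"], outputCol="features").transform(data)

      lr = LinearRegression(featuresCol="features", labelCol="price")
      grid = (ParamGridBuilder()
              .addGrid(lr.regParam, [0.01, 0.1, 1.0])
              .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])
              .build())

      cv = CrossValidator(estimator=lr,
                          estimatorParamMaps=grid,
                          evaluator=RegressionEvaluator(labelCol="price", metricName="rmse"),
                          numFolds=2)

      best = cv.fit(train_df).bestModel   # model refit with the best-scoring parameters
      print(best.extractParamMap())
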
  • Module 11: Deep learning with Horovod for distributed training
  • In this module, you will learn how to:

    • Use Horovod to train a deep learning model
    • Use Petastorm to read datasets in Apache Parquet format with Horovod for distributed model training
    • Work with Horovod and Petastorm for training a deep learning model
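
  A heavily simplified sketch of the Horovod/Petastorm pattern on Databricks: Petastorm caches a Spark DataFrame as Parquet and serves it to each worker as a tf.data.Dataset, while HorovodRunner launches the training function across the cluster. The model, paths, and input DataFrame (features_df, with an array-typed features column and a label column) are placeholders, and details differ from the course exercise:

      from petastorm.spark import SparkDatasetConverter, make_spark_converter
      from sparkdl import HorovodRunner

      spark.conf.set(SparkDatasetConverter.PARENT_CACHE_DIR_URL_CONF, "file:///dbfs/tmp/petastorm")
      converter = make_spark_converter(features_df)   # features_df: numeric Spark DataFrame (not shown)

      def train():
          import tensorflow as tf
          import horovod.tensorflow.keras as hvd

          hvd.init()                                   # one training process per worker/GPU
          model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(2,))])
          # Scale the learning rate by the number of workers and wrap the optimizer.
          opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(0.001 * hvd.size()))
          model.compile(optimizer=opt, loss="mse")

          # Each worker streams its own shard of the Parquet-backed dataset.
          with converter.make_tf_dataset(batch_size=32, cur_shard=hvd.rank(),
                                         shard_count=hvd.size()) as ds:
              model.fit(ds.map(lambda x: (x.features, x.label)),
                        steps_per_epoch=10, epochs=2,
                        callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)])

      HorovodRunner(np=2).run(train)   # np=2: run the function on two cluster workers
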
  • Module 12: Work with Azure Machine Learning to deploy serving models
  • In this module, you will learn how to:

    • Use Azure Machine Learning to deploy serving models
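
  A minimal sketch of registering and deploying a trained model with the Azure Machine Learning Python SDK (v1); the workspace details, names, and files are placeholders:

      from azureml.core import Workspace, Model, Environment
      from azureml.core.model import InferenceConfig
      from azureml.core.webservice import AciWebservice

      ws = Workspace.get(name="<workspace>", subscription_id="<subscription-id>",
                         resource_group="<resource-group>")

      # Register a model file exported from the Databricks training job.
      model = Model.register(workspace=ws, model_path="model.pkl", model_name="fare-model")

      # score.py must define init() and run(); the environment lists dependencies.
      env = Environment.from_conda_specification("serving-env", "conda.yml")
      inference_config = InferenceConfig(entry_script="score.py", environment=env)
      deploy_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

      service = Model.deploy(ws, "fare-predictor", [model], inference_config, deploy_config)
      service.wait_for_deployment(show_output=True)
      print(service.scoring_uri)
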

Syllabus

  • Module 1: Describe Azure Databricks
    • Introduction
    • Explain Azure Databricks
    • Create an Azure Databricks workspace and cluster
    • Understand Azure Databricks Notebooks
    • Exercise: Work with Notebooks
    • Knowledge check
    • Summary
  • Module 2: Spark architecture fundamentals
    • Introduction
    • Understand the architecture of an Azure Databricks Spark cluster
    • Understand the architecture of a Spark job
    • Knowledge check
    • Summary
  • Module 3: Read and write data in Azure Databricks
    • Introduction
    • Read data in CSV format
    • Read data in JSON format
    • Read data in Parquet format
    • Read data stored in tables and views
    • Write data
    • Exercises: Read and write data
    • Knowledge check
    • Summary
  • Module 4: Work with DataFrames in Azure Databricks
    • Introduction
    • Describe a DataFrame
    • Use common DataFrame methods
    • Use the display function
    • Exercise: Distinct articles
    • Knowledge check
    • Summary
  • Module 5: Work with user-defined functions
    • Introduction
    • Write user-defined functions
    • Exercise: Perform Extract, Transform, Load (ETL) operations using user-defined functions
    • Knowledge check
    • Summary
  • Module 6: Build and query a Delta Lake
    • Introduction
    • Describe the open source Delta Lake
    • Exercise: Work with basic Delta Lake functionality
    • Describe how Azure Databricks manages Delta Lake
    • Exercise: Use the Delta Lake Time Machine and perform optimization
    • Knowledge check
    • Summary
  • Module 7: Perform machine learning with Azure Databricks
    • Introduction
    • Understand machine learning
    • Exercise: Train a model and create predictions
    • Understand data using exploratory data analysis
    • Exercise: Perform exploratory data analysis
    • Describe machine learning workflows
    • Exercise: Build and evaluate a baseline machine learning model
    • Knowledge check
    • Summary
  • Module 8: Train a machine learning model
    • Introduction
    • Perform featurization of the dataset
    • Exercise: Finish featurization of the dataset
    • Understand regression modeling
    • Exercise: Build and interpret a regression model
    • Knowledge check
    • Summary
  • Module 9: Work with MLflow in Azure Databricks
    • Introduction
    • Use MLflow to track experiments, log metrics, and compare runs
    • Exercise: Work with MLflow to track experiment metrics, parameters, artifacts, and models
    • Knowledge check
    • Summary
  • Module 10: Perform model selection with hyperparameter tuning
    • Introduction
    • Describe model selection and hyperparameter tuning
    • Exercise: Select optimal model by tuning hyperparameters
    • Knowledge check
    • Summary
  • Module 11: Deep learning with Horovod for distributed training
    • Introduction
    • Use Horovod to train a deep learning model
    • Use Petastorm to read datasets in Apache Parquet format with Horovod for distributed model training
    • Exercise: Work with Horovod and Petastorm for training a deep learning model
    • Knowledge check
    • Summary
  • Module 12: Work with Azure Machine Learning to deploy serving models
    • Introduction
    • Use Azure Machine Learning to deploy serving models
    • Knowledge check
    • Summary