In this capstone project course, we'll compare genome sequences of COVID-19 mutations to identify potential areas a drug therapy can look to target. The first step in drug discovery involves identifying target subsequences of theirs genome to target. We'll start by comparing the genomes of virus mutations to look for similarities. Then, we'll perform PCA to cut down our number of dimensions and identify the most common features. Next, we'll use K-means clustering in Python to find the optimal number of groups and trace the lineage of the virus. Finally, we'll predict similarity between the sequences and use this to pick a target subsequence. Throughout the course, each section will consist of a programming assignment coupled with a guide video and helpful hints. By the end, you'll be well on your way to discovering ways to combat disease with genome sequencing.