A roadmap for AI-driven protein design
Course, YouTube, 2025
I created this free 10-lecture course to introduce you to protein design.

General description
I want more people to learn how to design proteins using Artificial Intelligence (AI). However, I’ve found two problems:
- there are no comprehensive online courses on the topic
- the existing courses are often too expensive for most students in Latin America
To address these problems, I created this free 10-lecture course as an introduction to AI-driven protein design. The course has two main resources:
- The 10 lectures on YouTube
- A GitHub repository containing the following resources:
  - Tools: recommended libraries organized into more than 10 categories, such as protein sequence and structure processing, data management, machine learning, etc.
  - Learning resources: courses and blogs to learn topics such as Python, data science, bioinformatics, etc.
  - Databases: recommended sources to download protein data (e.g., sequences, structures, embeddings, (meta)genomes).
  - Tutorials: tutorials for learning how to process and analyze protein science data.
  - Selected papers: scientific articles I recommend.
Access to the slides
This course is composed of more than 800 slides, each containing image sources, citations, and recommended resources in the notes section. The slides were created in PowerPoint, so I recommend viewing them with that software. You can download the slides through either of the following two options:
These notes are intended to give you access to more information so you can explore the topics in greater depth. If you are a teacher and have adopted this material for your lectures, please let me know. I would love to hear how you improved the course and to know that more people are learning about protein science. However, if you identify someone who has fully or partially plagiarized this course AND is charging money to access it, I would appreciate it if you notified me.
Course organization

The theoretical lectures are numbered from 01 to 10 because later lectures build on concepts introduced in earlier ones. For example, understanding AlphaFold requires concepts from structural biology and deep learning. Below is a brief description of the lectures and their topics:
- Basic computing concepts: how CPUs and GPUs work, as well as the essential software for data analysis (i.e., Linux/Bash and Python).
  - How to get started in bioinformatics
  - Hardware
  - Software
- Machine learning: what AI is and its subfields, the current capabilities of algorithms, and how a model is trained.
  - The current state of AI
  - How AI learns
  - How to train a model
- Deep learning: how neural networks work, the different types of neural networks, and the software used to work with them.
  - Neural networks
  - Deep learning libraries
- Transformers and language models: how Transformers and modern language models work, as well as the software used to work with them.
  - Language models
  - Transformers
  - Performance and generalization
  - How to work with language models
- Protein structure: principles of structural biology and the organization of protein sequences and structures.
  - Structural organization
  - Classifications
  - The shape of the protein universe
- Protein function: how proteins adopt their structure, how catalysis and ligand binding occur, and how function is regulated.
  - Folding
  - Function
  - Functional regulation
- Protein evolution: how proteins are thought to have originated and diversified from simpler peptides, and how molecular evolution operates.
  - Levels of biological organization
  - Biological evolution
  - The sequence space
  - Epistasis
- AlphaFold: overview of the AlphaFold2 and AlphaFold3 architectures, their strengths and weaknesses, and their impact on science.
  - AlphaFold
  - AlphaFold2
  - AlphaFold3
- AI-driven protein design: motivations for designing proteins, characteristics of designed proteins, and how AI has modernized classical methods and enabled new approaches.
  - Protein design
  - Rational design
  - Evolutionary design
  - Representation learning
  - Generative AI
- Data and biases: relevant databases, how to process data for model training, and examples of inherent biases present in datasets.
  - Big data in omics
  - Datasets
  - Data processing
  - Generalization in biology
  - Data biases
How to support this project
Creating this course required a lot of time and work. If you found it useful and would like to support me financially, you can make a donation via PayPal. You can donate any amount; as suggestions, I propose 12, 30, or 45 USD, based on students’ economic situations and the typical cost of courses like this. Click the image below if you want to donate.
If you do not have much financial flexibility but would still like to express your gratitude, you can send me your comments by email:
- gamamiguelangel@gmail.com
Finally, I would appreciate it if you shared this course with colleagues who are interested in learning about AI-driven protein design.
About me
I am Miguel Angel Gonzalez Arias. I am a Mexican biologist, and I love proteins, microbes, and computing. For more details about me, my social media, and other contact information, please visit:
