Post 37: How did AlphaFold2 learn to model proteins? 🤔

1 minute read

Published:

 

In the video, a hydrolase from a bacterium (PDB: 7DQ9) is modeled. At the beginning, AF2 places all atoms in the same space. Later, it extends the atoms in one dimension, two dimensions, until adopting a volume. AF2 first learned to model helical regions, then beta sheets, and finally other variations of secondary structures. Once the local structure is established, the global structure is refined by checking for clashes between the side chains of the amino acids, which is quite difficult considering the geometry of all atoms in the neighborhood.

To reach this conclusion, a team fully replicated AF2’s training using practically the same algorithm, and they named their model OpenFold. Their training lasted 90,000 iterations; however, they observed that only 2,700 iterations (equivalent to ~3%) are necessary to achieve 90% of AF2’s final performance. This suggests that we can reach a decent solution for other biochemistry problems with artificial intelligence without having to use massive amounts of data like AF2 did.

Refs:

  1. OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization
  2. An OpenFold seminar