Post 49: Download thousands of proteins from the AlphaFold Database 📝
Published:
Today is Friday, protein Friday! And if you want to mass-download (relatively speaking) AlphaFold2 models from a list of UniProt IDs, here’s a tutorial I wrote in Google Colab:
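This is not the Colab notebook itself, just a minimal sketch of what a per-ID download loop could look like. It assumes the public AlphaFold Database file pattern `AF-<UniProt ID>-F1-model_v<version>.pdb` and model version 4; check the version currently served before using it. The names `model_url` and `download_models` are hypothetical helpers for illustration.

```python
from pathlib import Path
from urllib.error import HTTPError, URLError
from urllib.request import urlretrieve

# Assumed AlphaFold DB file pattern; the version number changes between releases.
AFDB_URL = "https://alphafold.ebi.ac.uk/files/AF-{uniprot_id}-F1-model_v{version}.pdb"


def model_url(uniprot_id, version=4):
    """Build the (assumed) download URL for one UniProt accession."""
    return AFDB_URL.format(uniprot_id=uniprot_id.strip().upper(), version=version)


def download_models(uniprot_ids, out_dir="afdb_models", version=4):
    """Download one PDB file per ID; return a list of (id, error) for failures."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failed = []
    for uid in uniprot_ids:
        url = model_url(uid, version)
        target = out / url.rsplit("/", 1)[-1]
        if target.exists():
            # Skip files already on disk so an interrupted run can be resumed.
            continue
        try:
            urlretrieve(url, str(target))
        except (HTTPError, URLError) as err:
            # Not every UniProt ID has a model; record the failure and move on.
            failed.append((uid, str(err)))
    return failed
```

Calling `download_models(["P69905", "P68871"])` would try to fetch two models into `afdb_models/` and return whichever IDs could not be downloaded. A sequential loop like this is simple but slow, which is why the bulk route described next makes sense for large lists.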
But if you have more than 2 TB of storage (ideally around 10 TB), it’s better to download the pre-compressed databases and then select the models you’re interested in. With Foldcomp, you can compress your proteins to save space, similar to a .rar/.zip file. In fact, the Foldcomp authors already provide pre-compressed databases.
The optimized NumPy code takes about 28 hours to download 100,000 protein structures of around 350 amino acids each.
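To put that figure in perspective, here is a quick back-of-the-envelope calculation. It assumes download time scales linearly with the number of structures, which is only a rough approximation (network speed and protein length will vary); the function names are hypothetical.

```python
def seconds_per_structure(total_hours, n_structures):
    """Average wall-clock seconds spent per downloaded structure."""
    return total_hours * 3600 / n_structures


def estimated_hours(n_structures, total_hours=28.0, reference_n=100_000):
    """Scale the reported 28 h / 100,000 structures linearly to another list size."""
    return n_structures * total_hours / reference_n


print(f"{seconds_per_structure(28, 100_000):.2f} s per structure")
print(f"{estimated_hours(5_000):.1f} h for 5,000 structures")
```

So the reported rate works out to roughly one structure per second, and a list of a few thousand IDs should finish in an hour or two under the same conditions.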