Preprint / Version 1

AlphaFold: A beginner’s guide and in-depth exploration of the revolutionary AI tool and its inner workings.

Authors

  • Shravan Shravan, Branham High School, San Jose, CA, United States

DOI:

https://doi.org/10.58445/rars.1758

Keywords:

AI, Machine Learning, AlphaFold

Abstract

AlphaFold is a well-known machine learning program built to tackle a long-standing problem: predicting the 3D structure of a protein from its amino acid sequence, which it achieves to some degree. However, the program is often misunderstood, and for a long time it was not readily accessible to many researchers or to the general public, until ColabFold made AlphaFold available to anyone in an easy, user-friendly format. Although ColabFold is accessible, it can still be difficult for many researchers to navigate. This paper therefore explains why AlphaFold was created and what problem it solves, describes how it works, takes a deep dive into its code, and discusses what it can and cannot do as well as how it could potentially be improved. The goal is to give insight into the world of AlphaFold in language that anyone can understand and interpret.

Posted

2024-10-17