DeepMind puts the entire human proteome online, as folded by AlphaFold – TechCrunch


DeepMind and several research partners have released a database containing the 3D structures of nearly every protein in the human body, as computationally determined by the breakthrough protein folding system demonstrated last year, AlphaFold. The freely available database represents an enormous advance and convenience for scientists across hundreds of disciplines and domains. It may very well form the foundation of a new phase in biology and medicine.

The AlphaFold Protein Structure Database is a collaboration between DeepMind, the European Bioinformatics Institute, and others. It consists of hundreds of thousands of protein sequences with their structures predicted by AlphaFold — and the plan is to add millions more to create a “protein almanac of the world.”

“We believe that this work represents the most significant contribution AI has made to advancing the state of scientific knowledge to date and is a great example of the kind of benefits AI can bring to society,” said DeepMind founder and CEO Demis Hassabis.

From genome to proteome

If you’re not familiar with proteomics in general (and it’s pretty natural if that’s the case), the best way to think about this is perhaps in terms of another significant effort: that of sequencing the human genome. As you may recall from the late ’90s and early ’00s, this was a colossal endeavor undertaken by a large group of scientists and organizations across the globe over many years. The finished genome has been instrumental to the diagnosis and understanding of countless conditions and to the development of drugs and treatments for them.

It was, however, just the beginning of the work in that field, like finishing all the edge pieces of a giant puzzle. One of the next significant projects everyone turned their eyes toward in those years was understanding the human proteome, which is to say all the proteins used by the human body and encoded in the genome.

The problem with the proteome is that it’s much, much more complex. Proteins, like DNA, are sequences of known molecules; in DNA, these are the four familiar bases (adenine, guanine, cytosine and thymine), but in proteins, they are the 20 amino acids (each of which is coded by a group of three bases in genes). This in itself creates a great deal more complexity, but it’s only the start. The sequences aren’t simply “code” but actually twist and fold into tiny molecular origami machines that accomplish all kinds of tasks within our bodies. It’s like going from binary code to a complex language that manifests objects in the real world.
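To make the step from bases to amino acids concrete, here is a minimal Python sketch of how a stretch of DNA reads out as a chain of amino acids. It uses the standard genetic code, in which each group of three bases (a codon) maps to one amino acid or to a stop signal. The names `CODON_TABLE` and `translate` are made up for this illustration and have nothing to do with AlphaFold itself, which tackles the far harder problem of predicting how the resulting chain folds.

```python
# Illustrative only: translate a DNA coding sequence into one-letter
# amino acid codes using the standard genetic code.

BASES = "TCAG"
# One letter per codon, in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Build all 64 three-base codons and pair each with its amino acid.
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str) -> str:
    """Read the sequence three bases at a time until a stop codon."""
    dna = dna.upper()
    protein = []
    # Any trailing bases that do not form a full codon are ignored.
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # ATG (methionine), GCC (alanine), then the stop codon TGA.
    print(translate("ATGGCCTGA"))
```

The output is only a flat string of letters. What the database adds is the part that a lookup table like this cannot supply: the 3D shape that string folds into.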