The 2024 Nobel Prize in Chemistry rewards the prediction of the 3D structure of proteins


Mention sequence in a biological molecule and we immediately think of DNA and its cousins, such as RNA (subject of the Nobel Prize in Medicine awarded on Monday), made up of a succession of elementary “bricks”, the nucleotides, designated by letters (A, G, C…). These chains are organized in three dimensions according to motifs, the best known of which is the DNA double helix.

Proteins are another class of molecules made up of a succession of units, or residues: the amino acids (a succession that is precisely encoded in a DNA sequence). Their name refers to a shared chemical motif, an acid group (–COOH) and an amine group (–NH2).

There are around twenty of them, and they form a far more varied bestiary than the nucleotides: these amino acids differ in size, chemical properties… From this diversity of “Lego bricks” comes the diversity of the proteins that can be built: enzymes, antibodies, hair keratin, hormones such as insulin, transcription factors, not to mention a number of drugs.

A protein is essentially a string of pearls, each pearl being an amino acid.

© Johan Jarnestad/The Royal Swedish Academy of Sciences

While it is “easy” to derive the amino acid sequence from the sequence of a gene, a crucial step remains: determining the three-dimensional arrangement of the protein chain. And that is where this year's laureates come in, because the road from one to the other is long.
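To see why going from gene to amino acid sequence counts as the “easy” part, here is a minimal Python sketch that reads a DNA string three letters at a time using the standard genetic code. The names `translate`, `BASES`, `AMINO_ACIDS` and `CODON_TABLE` are illustrative, and the sketch assumes a coding strand read from its first letter, with no introns: a simplification, not a description of what happens in a real cell.

```python
from itertools import product

# Standard genetic code, written compactly: codons are enumerated in
# T, C, A, G order (TTT, TTC, TTA, TTG, TCT, ...) and each one maps to a
# one-letter amino acid code; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA string into one-letter amino acid codes.

    Assumes the reading frame starts at the first letter; translation
    stops at the first stop codon, and trailing letters that do not
    form a full codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGAAAGAATGA"))  # MKE
```

A dictionary lookup is all it takes. Nothing in this output says how the chain M-K-E… will fold in space, which is the hard problem.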

3D structure determines function

In fact, the linear sequence of amino acids folds through several levels of organization until it adopts the configuration specific to the protein's function: its 3D structure.


The four levels of structure of a chain of amino acids to make up a protein.

To determine it, researchers long relied on crystallography. Starting from crystallized proteins (a long process that does not work for every protein), they studied the diffraction patterns of X-rays passing through the sample to work out the structure. The first was announced in the late 1950s by John Kendrew and Max Perutz of the University of Cambridge in Great Britain. Methods improved, and by the 1980s some 5,000 new structures were being published per year. That is good, but with some 200 million amino acid sequences known today, far more than the experimentally determined structures gathered in the open-access Protein Data Bank, another approach was needed.

The Protein Olympics

Recipient of half the prize, David Baker, of the University of Washington in Seattle, was one of the first to explore the computational route, at the end of the 1990s. With his team he designed Rosetta, the first program predicting the structure of a protein from its amino acid sequence. In 1998 the algorithm took part with distinction in the Critical Assessment of Structure Prediction (CASP), a protein structure prediction competition held since 1994. At that time, the biochemist had the idea of reversing the process: giving Rosetta a protein structure and asking it for a plausible amino acid sequence, drawing on databases. “Protein design” was born, and Top7, which has no equivalent in nature, was the first molecule produced this way.

Since then, thanks to Rosetta, a protein that detects fentanyl (an opioid) was produced in 2017, a molecular rotor in 2022, and in 2024 a protein structure that changes shape depending on external parameters…

Back to CASP. During the fourteenth edition, in November 2020, one competitor rose far above the rest: the AlphaFold2 program, developed by DeepMind, the branch of Google dedicated to artificial intelligence. A first version, AlphaFold, had already won CASP13 in 2018, but its successor greatly surpassed it. We owe this feat to John Jumper and Demis Hassabis, both researchers at DeepMind in London, which is why they share the other half of the 2024 Nobel Prize in Chemistry. Their founding idea? Using artificial intelligence, which is definitely at the heart of numerous works rewarded by the Nobel committee, in particular those which enabled its advent, such as the work of John Hopfield and Geoffrey Hinton, pioneers of neural networks.

Make way for transformers

Starting from an amino acid sequence, AlphaFold2 first searches databases for similar (but not identical!) sequences, for example in other species. The program then compares them and tracks down amino acids that tend to change together from one species to the next: even if far apart in the primary sequence, such residues are likely to be close in the 3D structure. From this, a map of the estimated distances between each pair of amino acids is drawn up. Finally, through an iterative process carried out by a neural network, the map is refined and a possible structure of the protein is proposed.
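The comparison step can be illustrated with a toy calculation. When two positions in a set of related sequences mutate together, it hints that the corresponding amino acids touch in the folded protein. The Python sketch below scores every pair of positions with mutual information, a classic co-variation statistic. It is not AlphaFold2's actual method, which learns these relationships with a neural network, and the four-sequence alignment and the function names are invented for the illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    score = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        score += p_ab * log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return score


def covariation_map(alignment):
    """Symmetric matrix of co-variation scores for every pair of positions."""
    columns = list(zip(*alignment))
    length = len(columns)
    scores = [[0.0] * length for _ in range(length)]
    for i, j in combinations(range(length), 2):
        scores[i][j] = scores[j][i] = mutual_information(columns[i], columns[j])
    return scores


def strongest_pair(scores):
    """Pair of positions (0-based) with the highest co-variation score."""
    return max(combinations(range(len(scores)), 2),
               key=lambda pair: scores[pair[0]][pair[1]])


# Hypothetical alignment: the same protein in four imaginary species.
# Positions 1 and 5 swap K/E together; position 2 varies on its own.
toy_alignment = ["MKVLAE", "MKILAE", "MEVLAK", "MEILAK"]

scores = covariation_map(toy_alignment)
print(strongest_pair(scores))  # (1, 5)
```

Positions 1 and 5 are four residues apart in the chain, yet they stand out as the likely contact, while the independently varying position 2 scores zero with both. A real pipeline works from thousands of sequences and turns such signals into estimated distances rather than a single best pair.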

The neural network used is distinctive in that it is a “transformer”, an architecture specially designed to pick out recurring patterns, as when detecting frequent sequences of words in language.
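The core operation of a transformer, known as self-attention, can be sketched in a few lines of Python. Each element of a sequence is compared with every other element, whatever the distance between them, and the most similar ones receive the most weight. This is a deliberately stripped-down illustration: real transformers, AlphaFold2's included, apply learned weight matrices before the comparison, and the vectors and function names here are made up.

```python
from math import exp, sqrt


def softmax(values):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(values)
    exps = [exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention without learned weights.

    Queries, keys and values are all the input vectors themselves,
    a simplification made for this sketch.
    Returns (outputs, weights): weights[i][j] is how much element i
    attends to element j.
    """
    dim = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / sqrt(dim)
                  for key in vectors]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * value[d] for w, value in zip(row, vectors))
                        for d in range(dim)])
    return outputs, weights


# Three toy elements: the first and third are identical, the second differs.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(toy_vectors)

# The first element attends as much to the distant third as to itself,
# and less to its immediate neighbour.
print(weights[0][2] > weights[0][1])  # True
```

Because every element looks at every other in a single step, distance along the sequence is no obstacle, which suits amino acids that are far apart in the chain but neighbours in the folded protein.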

Thus equipped, AlphaFold2 was able to tackle one of the largest known protein structures, that of a nuclear pore, which sits at the interface between the interior of the cell nucleus and the cytoplasm.


A model of the human nuclear pore, built using AlphaFold2.

© Agnieszka Obarska-Kosinska

The performance of transformers is so impressive that David Baker incorporated one into his own program, and Rosetta became TrRosetta in the early 2020s.

To know how a protein works, and to be able to imagine possible drugs that inhibit it, you need to know what it looks like! Thanks to David Baker, Demis Hassabis and John Jumper, that has become much easier!
