Post 9: The sequence space 🚀

4 minute read

Life has found in biodiversity a good strategy to keep moving forward.

About 4,000 million years ago, many protein families emerged on a very busy Earth. In a universe of proteins, each set of species (an evolutionary lineage) can be seen as a galaxy. Just as galaxies in the physical universe move away from each other in space-time, the proteins of different species evolve in an analogous space called the “adaptive landscape”. This gives rise to the concept of “sequence space”: the set of all possible combinations of amino acids in a protein. For a typical protein of 300 amino acids, there are more possible combinations than atoms in the universe (~10^82) (F1).
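To get a feel for that number, here is a quick back-of-the-envelope sketch. It assumes the 20 standard amino acids and takes the ~10^82 atom count quoted above at face value; the variable names are only illustrative.

```python
import math

AMINO_ACIDS = 20        # assumption: the 20 standard amino acids
LENGTH = 300            # protein length used in the text
ATOMS_EXPONENT = 82     # ~10^82 atoms, the figure quoted in the text

# Python integers have arbitrary precision, so this is exact.
n_sequences = AMINO_ACIDS ** LENGTH

# Order of magnitude: log10(20^300) = 300 * log10(20)
exponent = LENGTH * math.log10(AMINO_ACIDS)

# Shortest protein whose sequence space already exceeds the atom count.
min_length = 1
while AMINO_ACIDS ** min_length <= 10 ** ATOMS_EXPONENT:
    min_length += 1

print(f"20^{LENGTH} is roughly 10^{exponent:.0f}")
print(f"Sequence space exceeds 10^{ATOMS_EXPONENT} from length {min_length} onwards")
```

Under these assumptions, the sequence space of a 300-residue protein is around 10^390, and even a peptide of 64 residues already has more possible sequences than the quoted number of atoms.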

Using this galaxy approach, Inna and Fyodor found that even the proteins present in the last universal common ancestor (LUCA) continue to diverge, which indicates a linear expansion of sequence space, in contrast to the exponential expansion of the physical universe. Even more interesting, at any given time ~98% of the amino acids in a protein cannot mutate on their own: they need other prior (or compensatory) mutations, and only under those specific combinations of mutations can they change. This implies that conserved amino acids are conserved because not enough time has passed for the specific combinations of mutations that would allow them to change to appear (F1).

img

We still do not know what causes the expansion of the physical universe, but in the case of proteins we know of several “evolutionary forces” based on natural selection, which together drive the feedback dynamics of genetic information systems. These forces work from the atomic to the ecosystem scale, and over both short and long periods of time (F2). To find out how many proteins and families currently exist, Luis’s team conducted a massive study of prokaryotic proteins. They found a total of ~302 million proteins representing ~32 million families, in which evolutionary forces operate differently. In fact, the vast majority of families are rare and of low abundance, and the largest family contains only ~74 thousand genes. Of the 14 habitats they analyzed, some, like the gut, seem to have almost no new proteins left to discover, while marine and sediment habitats are the source of many genes and still hold many new proteins, as their discovery curve has not yet “flattened”.
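The idea of a curve that "flattens" can be illustrated with a toy accumulation curve: count how many distinct gene families have been seen after each new sample. This is only a sketch with made-up numbers, not the method or data of the study; `simulate_habitat`, the pool sizes and the sample sizes are all hypothetical.

```python
import random

def accumulation_curve(samples):
    """Cumulative number of distinct families seen after each sample."""
    seen = set()
    curve = []
    for sample in samples:
        seen.update(sample)
        curve.append(len(seen))
    return curve

def new_per_sample(curve):
    """How many previously unseen families each sample contributed."""
    return [curve[0]] + [b - a for a, b in zip(curve, curve[1:])]

def simulate_habitat(pool_size, n_samples, sample_size, seed):
    """Hypothetical habitat: each sample draws families uniformly from a fixed pool."""
    rng = random.Random(seed)
    pool = range(pool_size)
    return [set(rng.sample(pool, sample_size)) for _ in range(n_samples)]

# Made-up numbers: a small pool saturates quickly, a large one keeps growing.
gut_like = accumulation_curve(simulate_habitat(200, 30, 50, seed=1))
marine_like = accumulation_curve(simulate_habitat(20000, 30, 50, seed=1))

print("gut-like, new families in last 5 samples:   ", new_per_sample(gut_like)[-5:])
print("marine-like, new families in last 5 samples:", new_per_sample(marine_like)[-5:])
```

In the small-pool habitat the late samples add almost nothing new (the curve has flattened), while in the large-pool habitat nearly every sample still brings in unseen families. Real gene families are far from uniformly abundant, so real curves behave less neatly than this.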

img

Unfortunately, due to climate change, many species and their proteins have gone extinct, and we will never manage to record them. Genetic variation influences both human health and ecosystems in at least 18 major ways. To try to restore ecosystems, it is sometimes possible to use key native species, such as animals or certain plants, but in more delicate or complex cases like coral reefs, it is necessary to consider details down to the level of the introduced microorganisms and the proteins they carry. For this, one must understand how introduced genetic variation works. Currently, the first steps are being taken to manipulate microorganisms within their habitat (in situ) with CRISPR/Cas, but when performing these types of procedures it is necessary to consider the evolutionary capacity of the entire system and its components, in what is currently known as “evolutionary engineering”.

img

I would like to say that we will need many biologists in the future, but I would almost bet that we are going to be screwed because of our consumption habits, and the future will be (it already is) very cyberpunk… F.

Refs:

  1. Natural Selection and the Concept of a Protein Space
  2. Sequence space and the ongoing expansion of the protein universe
  3. The millions of proteins of the Luis team; Towards the biogeography of prokaryotic genes
  4. Evolutionary forces; Evolution in the light of fitness landscape
  5. The importance of genetic variation in biodiversity; The importance of genomic variation for biodiversity, ecosystems and people
  6. Microbial manipulation with CRISPR-Cas; Species- and site-specific genome editing in complex bacterial communities
  7. Evolutionary engineering; Towards an engineering theory of evolution