Over nearly 22 years, the sequencing of the human genome had already been announced at least five times: in 2000, 2001, 2003, 2013 and 2019. This Friday's issue of the journal Science (April 1) now announces the sixth, and most complete, version.
The feat was once again the work of a multinational consortium. A hundred researchers from 54 universities, companies and institutes sign the article in the journal, which in 2001 published the pioneering sequence from the company Celera, in parallel with that of the international Human Genome Project (HGP) in its competitor Nature.
The genes of a human being are spread over 46 chromosomes, very complex structures in the nucleus of cells. They contain regions that are difficult to decipher, such as the tips (telomeres and adjacent sequences) and the “knot” (centromere) that joins the two arms of each chromosome, in addition to many repetitions.
Concentrated efforts in recent decades had failed to decipher the base pairs (“letters” of the minimalist chemical-genetic alphabet, A, T, C, and G) in these regions. The resulting gaps took up 8% of the genome, despite the “completion” announcements.
The work of the T2T consortium (from the English expression “telomere to telomere”) closed these gaps with 238 million added letters. This is no small feat in a text, the genome, composed of 3,054,832,041 characters; if it were a book, it would have over 1.5 million pages.
It was as if this 200,000-year-old encyclopedia (as old as the human species) had 120,000 moldy and unreadable pages. Using recent techniques, the T2T legion came up with a way to remove the stains without erasing what was underneath.
When the contents were revealed, they turned out to include 1,956 previously unidentified genes, alongside the approximately 20,000 already known to make up the genome. Of these discoveries, 99 are protein-coding genes, that is, they contain the list of ingredients for making each of the biochemical bricks with which an organism is built.
The other 1,857 genes are probably sequences involved in regulating the cellular machinery and the chromosomes, or traces of genomic alterations that occurred in the species’ past. The disproportion gives an idea of how complex the genome is, with so many sequences that in the past were dismissed as junk DNA.
Such complexity provides good reason to take with a grain of salt the conclusion that this is, after all, the definitive sequence of the human genome. It certainly represents the most complete reference that the enormous biotechnological advance has managed to produce in the seven decades since the discovery of the DNA double helix.
A more accurate reference genome will make it easier to compare the DNA of people with diseases against it, in search of variations that can help explain and treat those diseases. The genetic diversity from individual to individual, however, is reason to be wary of the notion that there is “one” human genome, as metaphors such as the “Book of Life” and the like suggested.
This is to say nothing of population variation, as bioinformatics specialist Sandro José de Souza, from the Federal University of Rio Grande do Norte, emphasizes: “Much more important than having a complete genome is having many genomes sequenced.”
“It is necessary to have more ethnic groups represented in these databases,” says Souza, who praises the new study for making association studies between features of the genome and health problems easier and cheaper, alongside initiatives such as the UK government's plan to sequence 500,000 individuals.
Deanna Church, from the company Inscripta, makes the same point in a commentary in the same issue of Science: “Although there is more complexity in the population than represented in the T2T-CHM13 assembly, having a haplotype [half of a chromosome] assembled correctly facilitates the analysis of these biomedically important regions in other humans and non-human primates.”
“We need to continue to develop assembly models that allow for the representation of the rich diversity of sequences found in the human population. Understanding the diversity will help dissect disease risks seen in different populations.”
The work in Science is an outstanding milestone in the billion-dollar enterprise started 32 years ago with the HGP. One more milestone. Many others will have to follow before one can properly speak of the genetic code of the whole species and of medical benefits accessible to all.