Researchers from the Andalusian Center for Developmental Biology (CABD) and the Institute of Evolutionary Biology of Barcelona (IBE) have used advanced artificial intelligence techniques for protein analysis.
As SINC reports, the team has shown that it is possible to identify and describe the functions of proteins in detail, even without prior information. This work enables the broad application of these methods to understand proteins in less-studied organisms, identify new gene functions and explore which proteins may be of biomedical and biotechnological interest, with greater precision than conventional methods.
In nature, the information contained in DNA is converted into proteins that perform functions in cells. In this project, led by CABD researchers Ildefonso Cases and Ana M. Rojas together with Rosa Fernández from the IBE, two deep learning-based methods were used to analyze proteins in different model organisms, including yeast, mice and fruit flies.
The study found that language models (transformers) are more effective than folding networks and provide more accurate and informative data about the proteins in the species studied. In addition, language models can retrieve functional information from RNA data (RNA is the molecule that carries the DNA instructions for protein synthesis in cells).
“We are at a critical moment because the enormous number of sequencing projects on unknown organisms is producing millions of sequences whose functions we cannot predict using conventional methods,” explains Rojas. This work opens up new research possibilities for more precise analysis and classification of protein functions.
This new study, published in *NAR Genomics and Bioinformatics*, lays the foundation for the use of AI in other applications.
Computational Biology
“These deep learning tools will allow us to tackle new problems in computational biology. We are working on applying these techniques to other targets, such as customized promoters, annotation of cell groups in single-cell analysis or protein engineering,” says the IBE researcher.
For her part, Rosa Fernández emphasizes that this research is crucial in the field of biodiversity: new protein sequences with unknown functions are published daily, and these methods help address the challenge of annotating the dark proteome.
“To this end, we are using these tools on thousands of transcriptomes from the animal kingdom, a project that is currently under review. The more information we have on the functions of new sequences, the faster we can decipher the molecular mechanisms underlying biological processes in biodiversity and regeneration, with potential biotechnological (food industry) and biomedical (pharmaceutical industry) applications,” concludes the researcher.