The prediction of protein structures has become easier with a method developed by KAUST researchers that outperforms state-of-the-art techniques. The resulting insight into proteins could help researchers identify drug targets and develop therapeutics.
Proteins have multiple levels of structure, the most basic of which is a string of building blocks called amino acids that are linked in a protein-specific sequence. The complete amino acid chain folds into a 3-D structure that is essential for protein function. A protein's 3-D structure tells us about its role in the cell, but determining this structure is remarkably difficult.
“Searching for protein 3-D structure from scratch is difficult due to the huge search space,” explained Xin Gao from the KAUST Computational Bioscience Research Center. “The most promising way to predict protein structures is by homology modeling, which is based on the observation that homologous proteins have similar structures.”
Homology modeling looks for similarities between the query protein and thousands of proteins with known 3-D structures. Current systems either compare amino acid sequences, a process called sequence alignment, or assess how well the query sequence maps to known 3-D structures, a process called threading. However, these systems do not account for the entire protein sequence space (all known sequences) and the entire structure space (all known 3-D structures). Gao and colleagues developed a cross-modal method called CMsearch that integrates both kinds of information.
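To make the idea of sequence alignment concrete, here is a minimal Python sketch of a textbook global alignment score (the Needleman-Wunsch algorithm) used to rank known proteins against a query. It is an illustration only and not CMsearch's method: the scoring values (+1 match, -1 mismatch, -1 gap), the function names and the toy sequences are all assumptions, and real tools use substitution matrices and more refined gap penalties.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the Needleman-Wunsch global alignment score of two sequences.

    The scoring scheme is a simplifying assumption for illustration.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            # Either align the two residues, or open a gap in one sequence.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]


def rank_by_similarity(query: str, known: dict[str, str]) -> list[str]:
    """Rank hypothetical database entries from most to least similar."""
    return sorted(known,
                  key=lambda name: global_alignment_score(query, known[name]),
                  reverse=True)


# Toy sequences in one-letter amino acid code (invented for this example).
known_proteins = {"p1": "ACDE", "p2": "WWWW", "p3": "ACE"}
ranking = rank_by_similarity("ACDE", known_proteins)
```

In this toy case the identical sequence ranks first and the unrelated one last. Remote homologs are exactly the cases in which such a score is too weak to be conclusive on its own.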
“CMsearch systematically and simultaneously incorporates sequence and structure features and sequence space and structure space information,” Gao said. “Its ability to consider more information in a systematic way means that CMsearch can successfully detect some remote homologs with relatively weak sequence alignment or threading scores that existing methods cannot detect.”
The researchers tested their system on a database of over 8,000 proteins. They selected some proteins to be query proteins for which only the basic sequences were known. They then asked CMsearch to predict the 3-D structures of these proteins by comparing the query protein sequences to proteins in the rest of the database.
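A test of this kind can be sketched in a few lines of Python. The version below is a simplified stand-in, not the researchers' benchmark: it holds out every entry in turn (rather than a selected subset), scores similarity with the standard library's `difflib.SequenceMatcher` instead of a real alignment or threading tool, and uses an invented four-protein database with made-up family labels.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude sequence similarity in [0, 1]; a placeholder for a real scorer."""
    return SequenceMatcher(None, a, b).ratio()


def top1_accuracy(database: dict[str, tuple[str, str]]) -> float:
    """Fraction of queries whose best match belongs to the same family.

    `database` maps an entry name to (sequence, family label).
    """
    if len(database) < 2:
        raise ValueError("need at least two entries to hold one out")
    correct = 0
    for query_id, (query_seq, query_family) in database.items():
        # Search only the rest of the database, never the query itself.
        rest = {k: v for k, v in database.items() if k != query_id}
        best_id = max(rest, key=lambda k: similarity(query_seq, rest[k][0]))
        if rest[best_id][1] == query_family:
            correct += 1
    return correct / len(database)


# Invented toy data: two families of two sequences each.
TOY_DATABASE = {
    "a1": ("ACDEFGHIK", "famA"),
    "a2": ("ACDEFGHIR", "famA"),
    "b1": ("WWYYPPLLM", "famB"),
    "b2": ("WWYYPPLLQ", "famB"),
}
accuracy = top1_accuracy(TOY_DATABASE)
```

Comparing methods then amounts to swapping in a different `similarity` function and checking which one retrieves true homologs more often.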
CMsearch identified homologs for the query proteins more accurately than other currently used prediction software, demonstrating the benefit of analyzing sequence and structure space information in parallel.
The improvement that CMsearch yields in structural predictions provides a basis for greater functional insights, with the potential for considerable biomedical benefits.
“The function of a protein is determined by its 3-D structure,” said Gao. “Therefore, determining protein structures can help us to understand how proteins work and lead to therapeutic solutions for various diseases.”