Computer Science

Algorithm scours datasets to diagnose medical mysteries

A new tool uses genetic and clinical information to find the root cause of unexplained illnesses.

Imane Boudellioua and Robert Hoehndorf developed a tool that may help find the genetic cause of some "mystery" illnesses.

© 2017 KAUST

An algorithm developed by KAUST scientists has the potential to help patients with mysterious ailments find genetic causes for their undiagnosed diseases [1].

It works by first identifying presumed harmful variants in a patient’s genome. The algorithm then cross-references the various mutations against large databases linking genes and symptoms and determines the likelihood of any given gene variant being implicated in the patient’s disease.
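The two steps described above can be illustrated with a toy sketch. This is not the PVP implementation: the `Variant` record, the 0-to-1 pathogenicity score, the gene names, the phenotype labels and the simple set-overlap (Jaccard) similarity are all assumptions made for illustration. The sketch filters out variants that are not presumed harmful, then ranks the remainder by how well the phenotypes linked to each gene match the patient's symptoms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one variant found in a patient's genome."""
    variant_id: str
    gene: str
    pathogenicity: float  # assumed 0..1 score of how harmful the variant looks


def jaccard(a, b):
    """Overlap between two sets of phenotype labels (0 = none, 1 = identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_variants(variants, patient_phenotypes, gene_phenotypes,
                  min_pathogenicity=0.5):
    """Return (score, variant) pairs, best candidate first.

    Step 1: keep only variants presumed harmful.
    Step 2: weight each one by how closely the phenotypes linked to its
            gene match the patient's phenotypes.
    """
    scored = []
    for v in variants:
        if v.pathogenicity < min_pathogenicity:
            continue  # not presumed harmful, so skip it
        linked = gene_phenotypes.get(v.gene, set())
        similarity = jaccard(patient_phenotypes, linked)
        scored.append((v.pathogenicity * similarity, v))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Made-up example data, for illustration only.
variants = [
    Variant("var1", "GENE_A", 0.90),
    Variant("var2", "GENE_B", 0.95),
    Variant("var3", "GENE_C", 0.20),
]
gene_phenotypes = {
    "GENE_A": {"goiter", "growth delay", "low thyroid hormone"},
    "GENE_B": {"hearing loss"},
}
patient = {"growth delay", "low thyroid hormone"}

ranking = rank_variants(variants, patient, gene_phenotypes)
for score, variant in ranking:
    print(f"{variant.variant_id} ({variant.gene}): {score:.2f}")
```

In this made-up example, the variant with the highest pathogenicity score ("var2") ranks below "var1" because its gene has no phenotypes in common with the patient. That reflects the distinction between a variant that is merely deleterious and one that may actually explain the patient's symptoms.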

Other tools available to scour the genome for harmful mutations tend to rely solely on DNA sequence data. Meanwhile, the KAUST team’s new PhenomeNET Variant Predictor (PVP) system includes clinical information from a patient’s medical record. It also incorporates reams of phenotype data from systematic evaluations of mice and zebrafish that match DNA changes to disease features.

“Ours uses more information than other tools, and we look for potential causative variants, not just a deleterious variant,” explained Professor Robert Hoehndorf, who led the study, along with his Ph.D. student Imane Boudellioua.

In their new paper, the researchers used a retrospective dataset from the UK and the Supercomputing Laboratory at KAUST to show that PVP accurately identified the causative gene variants responsible for congenital hypothyroidism. Mutations in a number of different genes are known to cause the disease, in which the thyroid gland in the neck underproduces the iodine-containing hormone needed for normal growth and development. As reported, PVP pinpointed the gene variants responsible for congenital hypothyroidism in individual patients, both in sequence datasets that spanned the entire genome and in those that included only the protein-coding portion.

Hoehndorf envisions the tool becoming a part of the clinical geneticist’s diagnosis routine. For a patient with a suspected genetic disease, doctors could sequence that person’s genome, give a full clinical workup and then run the algorithm. “PVP should be able to identify the variant or variants causing the patient’s phenotypes (symptoms) directly in most cases,” he said.

Still, there’s room for improvement. Hoehndorf explained that PVP can find pathogenic DNA variants in genes that have already been implicated in disease, either in people or in lab organisms; however, around two-thirds of the protein-coding genes in mice still await full characterization. While more genes have been characterized in zebrafish, the evolutionary distance between fish and humans (and differences in experimental protocols) makes this kind of cross-species comparison more challenging.

“We desperately need more high-quality phenotype data from model organisms, in particular the mouse, to improve our system,” Hoehndorf said. 


  1. Boudellioua, I., Mahamad Razali, R.B., Kulmanov, M., Hashish, Y., Bajic, V.B., Goncalves-Serra, E., Schoenmakers, N., Gkoutos, G.V., Schofield, P.N. & Hoehndorf, R. Semantic prioritization of novel causative genomic variants. PLoS Computational Biology 13, e1005500 (2017).
