

Barcoding long DNA quantifies CRISPR effects

A sequencing approach can home in on a rare mutation within a large number of cells, revealing implications for CRISPR genome editing and early cancer detection.

Assistant Professor Mo Li works on sequencing library preparation. © 2020 KAUST Jinna Xu.

Current sequencing techniques lack the sensitivity to detect rare gene mutations in a pool of cells, a capability that is particularly important for applications such as early cancer detection. Now, scientists at KAUST have developed an approach, called targeted individual DNA molecule sequencing (IDMseq), that can accurately detect a single mutation in a pool of 10,000 cells.

Importantly, the team successfully used IDMseq to determine the number and frequency of mutations caused by the gene-editing tool CRISPR/Cas9 in human embryonic stem cells. Clinical trials are underway to test the safety of CRISPR as a treatment for some genetic diseases. “Our study revealed potential risks associated with CRISPR/Cas9 editing and provides tools to better study genome editing outcomes,” says KAUST bioscientist Mo Li, who led the study.

IDMseq is a sequencing technique that involves attaching a unique barcode to every DNA molecule in a sample of cells and then making a large number of copies of each molecule using a polymerase chain reaction (PCR). Copied molecules carry the same barcode as the original ones.

The sequencing setup for the study: an Oxford Nanopore sequencer and a laptop computer. The screen in the background shows the DNA strand being fed through the sequencer.

© 2020 KAUST Mo Li. 

A bioinformatics toolkit, called variant analysis with a unique molecular identifier for long-read technology (VAULT), then decodes the barcodes and places reads carrying the same barcode into their own “bins,” with every bin representing one of the original DNA molecules. VAULT uses a combination of algorithms to detect mutations in the bins. The process works especially well with third-generation long-read sequencing technologies and helps scientists detect, and determine the frequency of, all types of mutations in the original DNA molecules, from changes in single DNA letters to large deletions and insertions.
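
The binning idea can be illustrated with a toy sketch. The Python below is not VAULT and does not reproduce its algorithms; the function names, the three-letter barcodes, and the assumption that all reads are already aligned and of equal length are hypothetical simplifications chosen for illustration. It groups reads by barcode, takes a majority vote within each bin, and reports the share of original molecules whose consensus differs from the reference.

```python
from collections import Counter, defaultdict


def bin_reads_by_barcode(reads):
    """Group (barcode, sequence) pairs so each bin holds the copies of one original molecule."""
    bins = defaultdict(list)
    for barcode, sequence in reads:
        bins[barcode].append(sequence)
    return dict(bins)


def consensus(sequences):
    """Majority vote at each position. Assumes equal-length, pre-aligned sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


def variant_frequency(reads, reference):
    """Fraction of bins (original molecules) whose consensus differs from the reference."""
    bins = bin_reads_by_barcode(reads)
    mutant_bins = sum(1 for seqs in bins.values() if consensus(seqs) != reference)
    return mutant_bins / len(bins)


# Made-up example data: three original molecules, unevenly amplified.
reference = "ACGT"
reads = (
    [("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT")]  # one read has a sequencing error
    + [("GGC", "ACGT")] * 2                               # wild-type molecule
    + [("TTA", "AGGT")] * 5                               # true mutant, over-amplified
)

# Counting raw reads is skewed by amplification and by sequencing errors.
raw_read_fraction = sum(seq != reference for _, seq in reads) / len(reads)

# Counting bins treats every original molecule once.
molecule_fraction = variant_frequency(reads, reference)
```

In this made-up data, six of the ten reads differ from the reference, but only one of the three bins does: the single error in bin AAT is outvoted, and the five copies in bin TTA count as one molecule.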

The approach successfully detected a deliberately introduced gene mutation in cells that had been mixed with wild-type cells at ratios of 1:100, 1:1,000 and 1:10,000. It also correctly reported the frequency of the mutation in each mixture.
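
A back-of-the-envelope calculation shows why such ratios are demanding. Assuming, as a simplification not taken from the study, that each sequenced original molecule is an independent draw from the pool, the chance of capturing at least one mutant molecule present at fraction f among N sequenced molecules is 1 - (1 - f)^N. The function names below are illustrative.

```python
import math


def detection_probability(variant_fraction, molecules_sequenced):
    """Chance that at least one mutant molecule is among the sequenced molecules.

    Assumes independent sampling of original molecules (a simplification).
    """
    return 1.0 - (1.0 - variant_fraction) ** molecules_sequenced


def molecules_needed(variant_fraction, confidence):
    """Smallest number of molecules that reaches the requested detection probability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - variant_fraction))


# The three mixing ratios mentioned in the article, treated as variant fractions.
for fraction in (1e-2, 1e-3, 1e-4):
    needed = molecules_needed(fraction, 0.95)
    print(f"fraction {fraction:g}: about {needed} molecules for a 95% chance of capture")
```

Under this assumption, sequencing 10,000 molecules from a 1:10,000 mixture would capture a mutant molecule only about 63 percent of the time, so reliable detection requires sampling several times more molecules than the inverse of the variant fraction.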

The researchers also used IDMseq to look for mutations caused by CRISPR/Cas9 genome editing. “Several recent studies have reported that Cas9 introduces large and unexpected DNA deletions around the edited genes, leading to safety concerns. These deletions are difficult to detect and quantitate using current DNA sequencing strategies. But our approach, in combination with various sequencing platforms, can analyze these large DNA mutations with high accuracy and sensitivity,” says Ph.D. student Chongwei Bi.

The tests found that large deletions accounted for 2.8 to 5.4 percent of Cas9 editing outcomes. The researchers also discovered a three-fold rise in single-base DNA variants in the edited region. “This shows that there is a lot that we need to learn about CRISPR/Cas9 before it can be safely used in the clinic,” says Yanyi Huang of Peking University, who is an international collaborator co-funded by KAUST.

IDMseq can currently sequence only one DNA strand, but work to enable double-strand sequencing could further improve performance, say the researchers.


  1. Bi, C., Wang, L., Yuan, B., Zhou, X., Li, Y., Wang, S., Pang, Y., Gao, X., Huang, Y. & Li, M. Long-read individual-molecule sequencing reveals CRISPR-induced genetic heterogeneity in human ESCs. Genome Biology 21, 213 (2020).