DOGEMS — Deconvolution of Genomes after En Masse Sequencing
Dan Russell
Feb 1, 2024

What is DOGEMS?

Deconvolution Of Genomes after En Masse Sequencing, or DOGEMS (DAH-jums), is an approach to sequencing phages where individually isolated phage DNA samples are deliberately mixed before preparing libraries. The main reasons to do so are to reduce library preparation costs and/or screen the diversity of a set of phages.

How it Works

Put simply: an equal amount (by weight) of DNA* from multiple phages is mixed, then a single sequencing library is prepared from that pool and sequenced. (This differs from barcoding, which requires a separate library prep for each sample.) After sequencing, the resulting reads are assembled, and often both complete and partial phage genomes are identified. Using that sequence information, we move on to the "Deconvolution" part: working out which of the initial samples that went into the pool matches each assembled genome. Deconvolution is most commonly done by designing PCR primers and running them on the initial samples, but other methods can be used, such as restriction digest patterns, homoimmunity patterns, or similarity to known phages of a particular host.
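As a rough illustration of the equal-by-weight mixing step, the sketch below works out how much of each DNA prep to pipette so that every phage contributes the same mass to the pool. The sample names, concentrations, and 100 ng target are made up for the example, and `pooling_volumes` is a hypothetical helper, not part of any established tool.

```python
def pooling_volumes(conc_ng_per_ul, ng_per_sample):
    """Return the volume (in uL) of each DNA prep that contains ng_per_sample of DNA."""
    volumes = {}
    for name, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{name}: concentration must be positive")
        # mass = concentration * volume, so volume = mass / concentration
        volumes[name] = ng_per_sample / conc
    return volumes


# Hypothetical measured concentrations in ng/uL (illustrative values only).
preps = {"phage_A": 50.0, "phage_B": 25.0, "phage_C": 80.0}
volumes = pooling_volumes(preps, ng_per_sample=100.0)
for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} uL")
```

More dilute preps need proportionally larger volumes, so in practice the most dilute sample tends to set the upper limit on how much DNA each phage can contribute.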

The deconvolution step is what distinguishes DOGEMS from metagenomics; in our case, we already have individual samples for each member in the mix, and therefore can match a genomic sequence with a specific biological sample.

Two (or Three, or Eight) for One

A simple DOGEMS example would be if we had isolated two phages, one on a Mycobacterium host and one on an Arthrobacter host, and wanted to sequence both. Instead of sequencing them separately, we mix their DNA, prepare one library, and sequence it. When we assemble, we get two distinct circularized genome sequences. A quick BLAST tells us the first is from Cluster B (all Mycobacterium) and the other is from Cluster AK (all Arthrobacter). We can readily conclude that the Cluster B genome goes with our Mycobacterium phage, and the Cluster AK genome belongs to our Arthrobacter phage. We've now sequenced two phages in a single library prep, saving time and money.
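The host-based matching in this example can be sketched in a few lines. This is only an illustration under the assumption that each cluster maps to a single host genus; the sample and contig names are hypothetical, and the cluster assignments are taken as already known from BLAST.

```python
def deconvolve_by_host(sample_hosts, genome_clusters, cluster_hosts):
    """Match each assembled genome to the one sample isolated on its cluster's host.

    Returns {genome: sample or None}. None means zero or several samples fit,
    so another method (for example PCR) would be needed for that genome.
    """
    matches = {}
    for genome, cluster in genome_clusters.items():
        host = cluster_hosts.get(cluster)
        candidates = [s for s, h in sample_hosts.items() if h == host]
        matches[genome] = candidates[0] if len(candidates) == 1 else None
    return matches


# Hypothetical sample and contig names; clusters as reported by a BLAST search.
sample_hosts = {"sample_1": "Mycobacterium", "sample_2": "Arthrobacter"}
genome_clusters = {"contig_1": "B", "contig_2": "AK"}
cluster_hosts = {"B": "Mycobacterium", "AK": "Arthrobacter"}

print(deconvolve_by_host(sample_hosts, genome_clusters, cluster_hosts))
# {'contig_1': 'sample_1', 'contig_2': 'sample_2'}
```

If two samples in the pool shared a host genus, the function would return None for the matching genome rather than guess, which is the point where PCR or another method takes over.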

A common use case among SEA-PHAGES schools, whose students are isolating lots of novel phages, is to combine all DNAs from a lab section (often a handful to more than a dozen), then sequence that pool. This means that instead of choosing just one or two students' phages to receive the "honor" of being sequenced, the entire class can get at least some data. There is a tradeoff, though: because similar (but not identical) phage genomes are unlikely to assemble completely when sequenced en masse, adding more phages increases the chance that some won't be completable from the DOGEMS data alone. At the same time, unique phage genomes will often assemble completely, as they don't have conflicting assembly information.

In 2017, we at Pitt made a series of successive pools from 576 previously unsequenced lysates in our frozen lysate inventory. We did a single DNA extraction and a single library prep, then sequenced and assembled the resulting reads. This led to the discovery of a new Singleton genome, which after a few rounds of PCR on various pools we were able to match to the phage Kumao, isolated by Lehigh University students in 2015. Kumao is still a Singleton, and we wouldn't have found it or known it was unique without the huge DOGEMS run.
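The exact pooling layout isn't spelled out above, so the sketch below is just one plausible way that a few rounds of PCR on pools could narrow 576 samples down to one. It assumes exactly one sample carries the target genome and that PCR on a pool is perfectly reliable; `find_sample`, the lysate names, and the choice of 24 subpools per round are all hypothetical.

```python
def find_sample(samples, pcr_positive, n_subpools=24):
    """Narrow a set of samples down to the single PCR-positive one.

    pcr_positive(pool) stands in for running PCR on a physical pool and
    returns True if the target genome is detected in it.
    Returns (sample, number_of_pcr_rounds).
    """
    if n_subpools < 2:
        raise ValueError("need at least two subpools per round")
    candidates = list(samples)
    rounds = 0
    while len(candidates) > 1:
        size = -(-len(candidates) // n_subpools)  # ceiling division
        subpools = [candidates[i:i + size] for i in range(0, len(candidates), size)]
        rounds += 1
        hits = [pool for pool in subpools if pcr_positive(pool)]
        if len(hits) != 1:
            raise ValueError("expected exactly one positive subpool")
        candidates = hits[0]
    return candidates[0], rounds


# Simulated inventory: 576 hypothetical lysates, one of which holds the target.
lysates = [f"lysate_{i}" for i in range(576)]
target = "lysate_300"
found, rounds = find_sample(lysates, lambda pool: target in pool)
print(found, rounds)  # lysate_300 2
```

With 24 subpools per round, 576 samples resolve in two rounds (24 pools of 24, then 24 individual samples), which is far fewer reactions than testing every lysate individually.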

Coverage and Caveats

Of course, saving money on library prep doesn't really matter if you don't get enough reads for decent depth of coverage on each of the samples included. But given how small our phage genomes are, and the modern landscape of low-cost sequencing, coverage is generally not a problem. If we're sequencing a DOGEMS sample with just two mixed DNAs, we don't alter our normal protocols at all, which have been to run 48 barcoded samples on a single MiSeq run. The DOGEMS sample would be just one of those 48. If we have a DOGEMS sample with more genomes, say 6 to 12, we're likely to boost that sample's relative abundance in our pool of 48, loading 2x to 3x what we do for the other samples. This approach generally provides enough reads to fully sample (and often complete) the individual genomes in the mix.
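To get a feel for the numbers, here is a back-of-the-envelope coverage estimate for a DOGEMS sample sharing a run with other barcoded samples. It assumes reads are split in proportion to how much of each sample is loaded and that the pooled phages contribute equal DNA mass. The run size, read length, and genome lengths below are placeholder values chosen for illustration, not instrument specifications or real data.

```python
def dogems_coverage(run_reads, read_length, n_samples, dogems_weight, genome_lengths):
    """Estimate per-genome depth of coverage for one DOGEMS sample on a shared run.

    The other n_samples - 1 samples are each loaded at weight 1; the DOGEMS
    sample is loaded at dogems_weight (for example 2 or 3).
    """
    total_weight = (n_samples - 1) + dogems_weight
    dogems_bases = run_reads * read_length * dogems_weight / total_weight
    # Equal mass per phage means an equal share of sequenced bases per phage.
    bases_per_phage = dogems_bases / len(genome_lengths)
    return {name: bases_per_phage / length for name, length in genome_lengths.items()}


# Placeholder inputs: 8 pooled phages with hypothetical genome lengths (bp).
genomes = {f"phage_{i}": 40_000 + 10_000 * i for i in range(8)}
estimate = dogems_coverage(
    run_reads=20_000_000,  # assumed total reads for the run
    read_length=150,       # assumed read length in bases
    n_samples=48,
    dogems_weight=3,       # DOGEMS sample loaded at 3x
    genome_lengths=genomes,
)
for name, depth in estimate.items():
    print(f"{name}: ~{depth:.0f}x")
```

Under these assumptions, shorter genomes end up with deeper coverage than longer ones, since equal mass gives each phage the same number of sequenced bases spread over a different genome length.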

How you think about coverage, the number of samples to mix, and which samples to mix depends on the end goal. For example, a DOGEMS sample consisting of two phages from the same cluster is actually worse than sequencing each of those individually, because they'll interfere with one another's assemblies. But a mix of four phages, each from a host in a different genus, is very likely to yield four complete genomes, given enough coverage.


The Deconvolution Of Genomes after En Masse Sequencing (DOGEMS) approach can lower library prep costs, sample more of a SEA-PHAGES classroom's phages, or screen a large pool for unique members. It consists of deliberately mixing phage DNAs, sequencing and assembling the pool, then deconvolving by matching genome sequences to distinct biological samples.

* Phage lysates may also be mixed, then DNA extracted from that pool and used for sequencing. But in this case it's harder to ensure balanced representation in the final library.