This service allowed users to search for the presence and abundance of any sequence within publicly available raw next-generation sequencing datasets (and genomes) of SARS-CoV-2.
The service has been discontinued (as of 2021), and no alternative exists. If you are interested in performing sequence searches, please feel free to contact us!
Search
Input a DNA sequence to search:
Examples
Query sequence | Origin | Remarks
TCAAATTGGATGACAAAGATCCAAATTTCAA | NC_045512v2:29283-29313 | Just a chunk of the SARS-CoV-2 genome, illustrating that most datasets indeed have it at high abundances
Sequence searches are performed as follows: the query sequence is broken down into all of its overlapping 21-mers, and if any of those 21-mers is absent from a dataset, the whole sequence is reported as absent in that dataset.
Otherwise, it is considered present, and for raw sequencing datasets we report the median abundance across all 21-mers of the query in the dataset.
For genomes, we only report the presence/absence of the query.
Thus, sequence searches are exact in the sense that they allow for no mutations between a query sequence and matching sequences in datasets.
However, this is not the same as doing a 'grep': a query is essentially seen as an unordered, de-duplicated set of 21-mers.
For example, for the long polyA sample query, since all the constituent 21-mers are identical, the query is performed as if it were a single 21-mer (disregarding the original query length).
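The search rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the REINDEER implementation: the `kmer_counts` dictionary is a hypothetical stand-in for one dataset's index, the function names are invented for this example, and strand handling (reverse complements) is not addressed.

```python
from statistics import median

K = 21  # k-mer size used by the service


def kmers(seq, k=K):
    """Return the unordered, de-duplicated set of overlapping k-mers of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def query_dataset(query, kmer_counts, k=K):
    """Search one dataset for a query.

    kmer_counts is a hypothetical dict mapping each k-mer of the dataset
    to its abundance. Returns (present, median_abundance); the median is
    None when the query is absent or shorter than k.
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        return (False, None)  # query shorter than k: nothing to look up
    # A single missing k-mer makes the whole query absent.
    if any(km not in kmer_counts for km in query_kmers):
        return (False, None)
    # Otherwise report the median abundance over the distinct query k-mers.
    return (True, median(kmer_counts[km] for km in query_kmers))
```

Because the query is reduced to a set, a polyA query of any length at least 21 collapses to the single 21-mer made of 21 A's, which matches the behaviour described above. For genomes, only the first element of the returned pair would be reported.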
Misc
Get the k-mer centered at a given position in the SARS-CoV-2 genome (k=21, NC_045512v2)
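A minimal sketch of how such a lookup could work, assuming 1-based genome coordinates and an odd k so that the k-mer has a well-defined middle base. The function name `centered_kmer` is hypothetical, and the original tool's handling of positions near the genome ends is not documented here; this sketch simply returns None in that case.

```python
def centered_kmer(genome, pos, k=21):
    """Return the k-mer whose middle base is at 1-based position pos.

    Returns None when the window would extend past either end of the
    genome (an assumption of this sketch, not documented behaviour).
    """
    half = k // 2
    start = pos - 1 - half  # convert to a 0-based start index
    end = start + k
    if start < 0 or end > len(genome):
        return None
    return genome[start:end]
```

With the full NC_045512v2 sequence loaded as a string, `centered_kmer(genome, 29298)` would cover positions 29288 to 29308, which lies inside the example query region 29283-29313.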
Contact
The technology behind this service is REINDEER (pre-print).
Contact for this website: rayan.chikhi@pasteur.fr
Department of Computational Biology, Institut Pasteur.
Website hosted by Information Systems at Institut Pasteur.
Project funded by ANR Transipedia (University Paris-Orsay, INSERM Montpellier, CNRS, Institut Pasteur)
and INCEPTION (PIA/ANR-16-CONV-0005).
Part of the PANGAIA H2020-MSCA-RISE-2019 network.