Search sequences

Search nucleotide sequences in plasmids. You can upload a sample represented by multiple sequences (e.g. reads or contigs) for containment or comparison analysis, or search for shorter sequences (e.g. genes).

Sequences to be searched in plasmids
How to search in the plasmids
Maximal p-value to report, between 0 and 1, default is 0.1
Maximal distance to report, between 0 and 1, default is 0.1
Minimal identity, between 0 and 1, default is 0.99
Hashes found in multiple queries are removed from all but the query with the highest identity. This reduces redundancy in the output.
Process each sequence individually rather than processing them together as a whole, i.e. distances are estimated per sequence and not for the complete FASTA.
With this feature activated, using FASTA files with more than 500 sequences is not advisable because it can exceed the runtime limit.
Minimal percentage identity, default is 60
Minimal query coverage per HSP, default is 90

Search strategy:
  • mash screen: Which plasmids are contained in the input sample?
  • mash dist: Is there any plasmid similar to the input sample?
  • blastn: Which plasmids contain given nucleotide sequences?
  • tblastn: Which plasmids contain given protein sequences?
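To make the relationship between the four strategies and the options above more concrete, here is a minimal Python sketch that assembles a command line for each strategy. It is an illustration only and not necessarily how this service runs its searches. The function name `build_command`, its parameter names, and the pairing of options with strategies are assumptions. The flags are the standard Mash options (`-v`, `-d`, `-i`, `-w`) and BLAST+ options (`-perc_identity`, `-qcov_hsp_perc`).

```python
def build_command(strategy, query, db, *, max_pvalue=0.1, max_distance=0.1,
                  min_identity=0.99, winner_takes_all=True, individual=False,
                  min_perc_identity=60, min_qcov_hsp=90):
    """Return an argument list for the chosen search strategy (hypothetical helper)."""
    if strategy == "mash_screen":
        # Containment: which plasmids are contained in the sample?
        cmd = ["mash", "screen", "-v", str(max_pvalue), "-i", str(min_identity)]
        if winner_takes_all:
            cmd.append("-w")  # remove redundant hashes between queries
        return cmd + [db, query]
    if strategy == "mash_dist":
        # Similarity: is any plasmid similar to the sample?
        cmd = ["mash", "dist", "-v", str(max_pvalue), "-d", str(max_distance)]
        if individual:
            cmd.append("-i")  # treat each sequence separately, not the whole FASTA
        return cmd + [db, query]
    if strategy == "blastn":
        return ["blastn", "-query", query, "-db", db,
                "-perc_identity", str(min_perc_identity),
                "-qcov_hsp_perc", str(min_qcov_hsp)]
    if strategy == "tblastn":
        return ["tblastn", "-query", query, "-db", db,
                "-qcov_hsp_perc", str(min_qcov_hsp)]
    raise ValueError(f"unknown strategy: {strategy}")
```

For example, `build_command("mash_dist", "sample.fasta", "plasmids.msh", individual=True)` yields a `mash dist` call that estimates distances per sequence. The file names here are placeholders.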

Input requirements
Mash search: maximal file size of 500.0 MB
BLASTn/tBLASTn search: sequence length of 100 - 5000 bp, max. 10 sequences, unique and non-empty IDs, sequence alphabet IUPACAmbiguousDNA or ExtendedIUPACProtein
FASTA file is checked upon submission.
For BLASTn search, the uploaded FASTA is filtered to remove any sequence record not fulfilling the above requirements.
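If you want to check a nucleotide FASTA locally before uploading, the requirements above can be approximated with a short Python sketch. It is not the service's actual validation code. The names `parse_fasta` and `filter_records` are hypothetical, and two behaviours are assumptions: a repeated ID is dropped after its first occurrence, and only the first 10 valid records are kept.

```python
# Letters of the IUPAC ambiguous DNA alphabet.
IUPAC_AMBIGUOUS_DNA = set("GATCRYWSMKHBVDN")

def parse_fasta(text):
    """Parse FASTA text into a list of (record_id, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            parts = line[1:].split()
            header = parts[0] if parts else ""  # ID is the first token
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def filter_records(records, min_len=100, max_len=5000, max_records=10,
                   alphabet=IUPAC_AMBIGUOUS_DNA):
    """Keep records with a unique non-empty ID, valid length, and valid letters."""
    seen, kept = set(), []
    for record_id, seq in records:
        if not record_id or record_id in seen:
            continue  # assumption: repeated IDs are dropped after the first
        seen.add(record_id)
        if not (min_len <= len(seq) <= max_len):
            continue
        if not set(seq.upper()) <= alphabet:
            continue
        kept.append((record_id, seq))
    return kept[:max_records]  # assumption: extra records are truncated
```

Calling `filter_records(parse_fasta(open("genes.fasta").read()))` returns the records that would pass these checks. The file name `genes.fasta` is a placeholder. For protein queries, pass a protein alphabet via the `alphabet` argument instead.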