news and updates
05.07.16 | more informative messaging and some maintenance
21.07.13 | minor sanity improvements
18.10.11 | potential bug with label length fixed; the label length limit is 40 characters (we cut the rest)
21.06.11 | service is up and running
Soon on GitHub.
background and instructions
MASiVE is a fine-tuned product of the BAT cave that focuses on the detection of full-length Sirevirus LTR retrotransposons in plant genomic sequences.
It works by identifying, in a stepwise manner, some of the highly conserved motifs of the Sirevirus genome,
thus eventually building intact elements with very high accuracy and considerable sensitivity.
input format and options
You can provide a FASTA-formatted nucleotide sequence (up to 10Mb; use our SubSplit to split longer sequences) and let MASiVE run the full version of the algorithm by default,
or choose (by ticking the appropriate option) to search only for the multiple PPT signature, the most conserved motif, which is present unequivocally in all Sireviruses across the plant kingdom and lies at the core of the algorithm.
The latter is recommended i) for short sequences, as these are less likely to contain full-length Sireviruses, which are usually more than 10,000bp in length (in contrast, the length of the signature does not normally exceed 1,000bp),
or ii) after an unsuccessful search for full-length elements, since identification of the multiple PPT signature is a very strong indication that the query
sequence contains an incomplete (partially deleted) Sirevirus.
As there are several steps in the algorithm between identifying a multiple PPT signature and eventually building an intact element, please get in touch if you would like us to run a more informative and sensitive offline analysis on your sequence of interest.
Finally, you can choose the alternative set of oligomers making up the multiple PPT signature. It is less widespread, but still potentially useful for some plant species; for example, it works better in tomato.
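For sequences longer than the 10Mb limit mentioned above, the input has to be split before submission. The following is only a minimal sketch of that idea and is not the SubSplit tool itself: the function names, the header suffix with coordinates, and the reading of "10Mb" as 10 million bases are all assumptions made for illustration.

```python
from typing import Iterator, Tuple

# Assumption: the "10Mb" limit is read here as 10 million bases per sequence.
MAX_BASES = 10_000_000


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_record(header: str, seq: str,
                 max_bases: int = MAX_BASES) -> Iterator[Tuple[str, str]]:
    """Cut one sequence into consecutive, non-overlapping pieces.

    The 1-based inclusive coordinates of each piece are appended to the
    header (a hypothetical naming scheme) so hits can be mapped back.
    """
    for start in range(0, len(seq), max_bases):
        piece = seq[start:start + max_bases]
        yield f"{header}_{start + 1}-{start + len(piece)}", piece


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format one record as FASTA text with fixed-width sequence lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"
```

Note that a non-overlapping split like this one can cut an element in two at a piece boundary, so an element spanning a boundary would not be reported as full-length.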
After submission, MASiVE reports on the progress of its stepwise algorithms and filters and on their results, and finishes with a short summary of:
number and length of your sequence(s),
PPT oligomers found and multiple PPT signatures put together,
full-length Sireviruses.
The final output, if any, is primarily a GFF file (see the example further below) with the coordinates and structure of the Sireviruses and/or their multiple PPT signatures. If intact Sireviruses are found, the GFF file includes their coordinate-based name (and D or P if the element is found in the sense or palindromic strand, respectively), the total and LTR lengths, and an estimate of the insertion age based on the divergence of the two LTRs.
The sequence of each oligomer is printed, alongside its similarity to the default 8mer (AGGGGGAG) and 12mer (AGGGGGAGATTG); we only accept up to one mismatch and no indels. The alternative oligomers are AGGGGGAA and AGGGGGAAATTG for the 8mer and 12mer, respectively.
We also provide the FASTA-formatted sequences of the full-length elements. To aid visualization of the structure of the Sirevirus genome, you can activate GenomeView (through Java Web Start), which depicts the PPTs and internal LTR boundaries and can be used as a starting point to annotate the element further (e.g. the retrotransposon genes).
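The oligomer rule described above (at most one mismatch, no indels) can be illustrated with a simple sliding-window comparison. This is a minimal sketch and not MASiVE's own code: the function name is hypothetical, it scans only the strand it is given, and it assumes that the reported similarity is the percentage of identical positions. That assumption is at least consistent with the arithmetic, since one mismatch in an 8mer gives 7/8 = 87.50%.

```python
from typing import List, Tuple

# Oligomer sequences as given in the text above.
DEFAULT_OLIGOMERS = {"8mer": "AGGGGGAG", "12mer": "AGGGGGAGATTG"}
ALTERNATIVE_OLIGOMERS = {"8mer": "AGGGGGAA", "12mer": "AGGGGGAAATTG"}


def find_oligomer(seq: str, oligo: str,
                  max_mismatches: int = 1) -> List[Tuple[int, int, float, str]]:
    """Find ungapped matches of `oligo` in `seq`.

    Returns (start, end, similarity_percent, matched_sequence) tuples with
    1-based inclusive coordinates. Only substitutions are tolerated, so
    indels are excluded by construction.
    """
    seq = seq.upper()
    k = len(oligo)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Hamming distance between the window and the reference oligomer.
        mismatches = sum(a != b for a, b in zip(window, oligo))
        if mismatches <= max_mismatches:
            similarity = round(100.0 * (k - mismatches) / k, 2)
            hits.append((i + 1, i + k, similarity, window))
    return hits


if __name__ == "__main__":
    # Toy sequence made up for illustration only.
    toy = "TTTAGGGGAAGCCCAGGGGGAGATTGAA"
    for name, oligo in DEFAULT_OLIGOMERS.items():
        for hit in find_oligomer(toy, oligo):
            print(name, *hit)
```

Searching the opposite strand would require running the same scan on the reverse complement of the sequence and converting the coordinates back.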
##gff-version 3
##sequence-region example 1 11245
# example
example MASiVE D12mer 8909 8920 100.00 + . AGGGGGAGATTG
example MASiVE D8mer 8256 8263 100.00 + . AGGGGGAG
example MASiVE D8mer 8513 8520 100.00 + . AGGGGGAG
example MASiVE D8mer 8671 8678 87.50 + . AGGGGAAG
example MASiVE Sirevirus 1126 10272 . + . ID=example-D-1126;length=9147;age=1.1654
example MASiVE long_terminal_repeat 1126 2479 . + . ID=example-D-1126-5prime;length=1354
example MASiVE long_terminal_repeat 8919 10272 . + . ID=example-D-1126-3prime;length=1354
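For downstream work, the example output above can be read with a few lines of code. The sketch below is one possible way to do it, not an official parser: the function and key names are hypothetical, and it splits on any whitespace, which is an assumption that suits the example as shown (GFF files are normally tab-delimited).

```python
from typing import Dict, List

# The feature lines of the example output shown above.
EXAMPLE_GFF = """
##gff-version 3
##sequence-region example 1 11245
# example
example MASiVE D12mer 8909 8920 100.00 + . AGGGGGAGATTG
example MASiVE D8mer 8256 8263 100.00 + . AGGGGGAG
example MASiVE D8mer 8513 8520 100.00 + . AGGGGGAG
example MASiVE D8mer 8671 8678 87.50 + . AGGGGAAG
example MASiVE Sirevirus 1126 10272 . + . ID=example-D-1126;length=9147;age=1.1654
example MASiVE long_terminal_repeat 1126 2479 . + . ID=example-D-1126-5prime;length=1354
example MASiVE long_terminal_repeat 8919 10272 . + . ID=example-D-1126-3prime;length=1354
"""


def parse_masive_gff(text: str) -> List[Dict]:
    """Parse feature lines into dictionaries, skipping comments and pragmas."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # assumption: whitespace-delimited columns
        if len(fields) < 9:
            continue
        seqid, source, ftype, start, end, score, strand, phase = fields[:8]
        last = " ".join(fields[8:])
        # Element rows carry key=value pairs; oligomer rows carry a bare sequence.
        attributes = dict(p.split("=", 1) for p in last.split(";") if "=" in p)
        records.append({
            "seqid": seqid,
            "source": source,
            "type": ftype,
            "start": int(start),
            "end": int(end),
            "score": None if score == "." else float(score),
            "strand": strand,
            "phase": phase,
            "attributes": attributes,
            "raw": last,
        })
    return records


if __name__ == "__main__":
    for rec in parse_masive_gff(EXAMPLE_GFF):
        span = rec["end"] - rec["start"] + 1
        print(rec["type"], rec["start"], rec["end"], span, rec["raw"])
```

In the example, the reported length values agree with the coordinates: the Sirevirus spans 10272 - 1126 + 1 = 9147 positions, and each LTR spans 1354.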
hosted at the Bioinformatics Analysis Team / BAT