background and instructions

| intro
MASiVE is a fine-tuned product of the BAT cave that focuses on detecting full-length Sirevirus LTR retrotransposons in plant genomic sequences. It identifies, step by step, some of the highly conserved motifs of the Sirevirus genome and uses them to build intact elements with very high accuracy and considerable sensitivity.

| input format and options
You can provide a FASTA-formatted nucleotide sequence of up to 10 Mb (use our SubSplit to split longer sequences). By default, MASiVE runs the full version of the algorithm. Alternatively, you can tick the appropriate box to search only for the multiple PPT signature, the most conserved motif, which is present unequivocally in all Sireviruses across the plant kingdom and lies at the core of the algorithm. The signature-only search is recommended (i) for short sequences, which are less likely to contain full-length Sireviruses (these are usually more than 10,000 bp long, whereas the signature does not normally exceed 1,000 bp), or (ii) after an unsuccessful search for full-length elements, because finding the multiple PPT signature is a very strong indication that the query sequence contains an incomplete (partially deleted) Sirevirus. As there are several steps in the algorithm between identifying a multiple PPT signature and eventually building an intact element, please get in touch if you wish to run a highly informative and sensitive offline analysis on your sequence of interest.

Finally, you can choose the alternative set of oligomers that make up the multiple PPT signature. This set is less widespread but may still be useful for some plant species; for example, it works better in tomato.
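
If you prefer to prepare a long sequence yourself, the splitting step can be sketched as below. This is not SubSplit; the function name `split_sequence` and the 20,000 bp overlap between neighbouring chunks are assumptions made for illustration. The overlap is there so that an element spanning a chunk boundary still appears whole in at least one chunk.

```python
def split_sequence(seq, max_len=10_000_000, overlap=20_000):
    """Yield (start, chunk) pairs, each chunk at most max_len long.

    start is the 1-based position of the chunk in the original sequence.
    The overlap value is an illustrative assumption, not a MASiVE setting.
    """
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    step = max_len - overlap
    start = 0
    while True:
        yield start + 1, seq[start:start + max_len]
        if start + max_len >= len(seq):
            break  # the last chunk reached the end of the sequence
        start += step


# Toy example with tiny numbers so the behaviour is easy to follow.
chunks = list(split_sequence("ACGTACGTAC", max_len=4, overlap=1))
```

Keeping the 1-based start of each chunk makes it possible to translate coordinates reported for a chunk back to the original sequence.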

| results
After submission, MASiVE reports the progress and results of its stepwise algorithms and filters, and ends with a short summary of the:
    number and length of your sequence(s),
    PPT oligomers found and multiple PPT signatures put together,
    full-length Sireviruses.

The final output, if any, is primarily a GFF file (see the example below) with the coordinates and structure of the Sireviruses and/or their multiple PPT signatures. If intact Sireviruses are found, the GFF file includes their coordinate-based name (with D or P if the element is found in the sense or palindromic strand, respectively), the total and LTR lengths, and an estimate of the insertion age based on the divergence of the two LTRs. The sequence of each oligomer is printed alongside its similarity to the default 8mer (AGGGGGAG) and 12mer (AGGGGGAGATTG); we accept at most one mismatch and no indels. The alternative oligomers are AGGGGGAA and AGGGGGAAATTG for the 8mer and 12mer, respectively. We also provide the FASTA-formatted sequences of the full-length elements.
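
The oligomer matching rule (at most one mismatch, no indels) and the similarity percentage can be illustrated with a minimal sketch. This is not MASiVE's own code: the function name `find_oligomers` is hypothetical, only the sense strand is scanned, and the similarity is assumed to be the percentage of identical positions, which is consistent with the 87.50 shown for a single mismatch in an 8mer.

```python
DEFAULT_8MER = "AGGGGGAG"
DEFAULT_12MER = "AGGGGGAGATTG"


def find_oligomers(seq, motif, max_mismatches=1):
    """Return (start, end, similarity, matched_sequence) for each hit.

    Coordinates are 1-based and inclusive. Only substitutions are
    allowed, so every window has exactly the length of the motif.
    """
    seq = seq.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            similarity = round(100.0 * (k - mismatches) / k, 2)
            hits.append((i + 1, i + k, similarity, window))
    return hits


# A short made-up sequence containing one imperfect copy of the 8mer.
hits = find_oligomers("TTAGGGGAAGCC", DEFAULT_8MER)
```

To search the opposite strand with the same function, one could pass the reverse complement of the query sequence and convert the coordinates back.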

To aid visualization of the structure of the Sirevirus genome, you can launch GenomeView (through Java Web Start), which depicts the PPTs and the internal LTR boundaries and can be used as a starting point for annotating the element further (e.g. the retrotransposon genes).

    ##gff-version 3
    ##sequence-region example 1 11245
    # example
    example	MASiVE	D12mer	8909	8920	100.00	+	.	AGGGGGAGATTG
    example	MASiVE	D8mer	8256	8263	100.00	+	.	AGGGGGAG
    example	MASiVE	D8mer	8513	8520	100.00	+	.	AGGGGGAG
    example	MASiVE	D8mer	8671	8678	87.50	+	.	AGGGGAAG
    example	MASiVE	Sirevirus	1126	10272	.	+	.	ID=example-D-1126;length=9147;age=1.1654
    example	MASiVE	long_terminal_repeat	1126	2479	.	+	.	ID=example-D-1126-5prime;length=1354
    example	MASiVE	long_terminal_repeat	8919	10272	.	+	.	ID=example-D-1126-3prime;length=1354
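
For downstream processing, output like the example above can be read with a few lines of Python. This is a minimal sketch that assumes the layout shown in the example: nine tab-separated columns, with the last column holding either `key=value` pairs (element and LTR lines) or a bare oligomer sequence (8mer and 12mer lines). The function name `parse_masive_gff` is hypothetical.

```python
def parse_masive_gff(text):
    """Parse MASiVE-style GFF text into a list of dictionaries."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line in the assumed layout
        seqid, _source, ftype, start, end, score, strand, _phase, last = cols
        record = {
            "seqid": seqid,
            "type": ftype,
            "start": int(start),
            "end": int(end),
            "score": None if score == "." else float(score),
            "strand": strand,
        }
        if "=" in last:
            # Element and LTR lines: ID=...;length=...;age=...
            record["attributes"] = dict(
                pair.split("=", 1) for pair in last.split(";") if pair
            )
        else:
            # Oligomer lines: the last column is the matched sequence.
            record["sequence"] = last
        records.append(record)
    return records


example = (
    "##gff-version 3\n"
    "example\tMASiVE\tD8mer\t8671\t8678\t87.50\t+\t.\tAGGGGAAG\n"
    "example\tMASiVE\tSirevirus\t1126\t10272\t.\t+\t.\t"
    "ID=example-D-1126;length=9147;age=1.1654\n"
)
records = parse_masive_gff(example)
```

Each record keeps the coordinates as integers, so checks such as `end - start + 1` against the reported `length` are straightforward.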

