Motivation: High-throughput sequencing technologies have recently made deep interrogation of expressed transcript sequences practical, both economically and temporally. Identification of intron/exon boundaries is an essential part of genome annotation, yet remains a challenge. Here, we present supersplat, a method for unbiased splice-junction discovery through empirical RNA-seq data.
Results: Using a genomic reference and RNA-seq high-throughput sequencing datasets, supersplat empirically identifies potential splice junctions at a rate of ∼11.4 million reads per hour. We further benchmark the performance of the algorithm by mapping Illumina RNA-seq reads to identify introns in the genome of the reference dicot plant Arabidopsis thaliana and we demonstrate the utility of supersplat for de novo empirical annotation of splice junctions using the reference monocot plant Brachypodium distachyon.
Availability: Implemented in C++, supersplat source code and binaries are freely available on the web at http://mocklerlab-tools.cgrb.oregonstate.edu/