Overview of junctions extract command

The junctions extract command extracts exon-exon junctions from an RNAseq BAM file. The output is a BED file in the BED12 format. We have tested this command with alignments from HISAT2, TopHat2, STAR, kallisto, and minimap2, and have checked the results by comparing the extracted exon-exon junctions with the junctions.bed file produced by TopHat.

Usage

regtools junctions extract [options] indexed_alignments.bam

Input

Input Description
indexed_alignments.bam Aligned RNAseq BAM/CRAM file that has been indexed, for example with samtools index. We have tested this command with alignments from TopHat.

Options

Option Description
-a Minimum anchor length. 8bp by default. Junctions with at least this much overlap on both ends are reported. Note: the required overlap can be observed across separate reads. For example, one read might have sufficient left overlap and another read might have sufficient right overlap; this is enough for the junction to be reported. No mismatches are allowed in the anchor regions.
-m Minimum intron size. 70bp by default. The intron size is junction.end - junction.start. (These are not the chromStart and chromEnd of the output described below; to get the junction coordinates from those, the corresponding blockSizes must be added or subtracted.)
-M Maximum intron size. 500,000bp by default. The intron size is junction.end - junction.start. (These are not the chromStart and chromEnd of the output described below; to get the junction coordinates from those, the corresponding blockSizes must be added or subtracted.)
-o File to write output to. STDOUT by default.
-r Region to extract junctions in. This is specified in the format "chr:start-end". If not specified, junctions are extracted from the entire BAM file.
-h Display help message for this command.
-s Strand specificity of the RNA library preparation. The options are XS (use the XS tags provided by the aligner), RF (first-strand), and FR (second-strand). This option is required. If your alignments contain XS tags, these will be used in the "unstranded" mode. If you are unsure, we have created this table to help.
-b The file containing the barcodes of interest for single cell data.
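
The anchor and intron-size rules described for -a, -m, and -M can be illustrated with a small Python sketch. This is not RegTools code: the function names, the (left, right) overhang tuples, and the choice of inclusive bounds for the intron size are assumptions made for illustration, and the "no mismatches in the anchor" rule is not modelled.

```python
from typing import Iterable, Tuple


def anchors_satisfied(overhangs: Iterable[Tuple[int, int]], min_anchor: int = 8) -> bool:
    """Return True if the junction meets the minimum anchor length on both sides.

    `overhangs` holds one hypothetical (left_overlap, right_overlap) pair per
    supporting read. As described for -a, the two sides may be satisfied by
    different reads, so only the best value seen on each side matters.
    """
    best_left = 0
    best_right = 0
    for left, right in overhangs:
        best_left = max(best_left, left)
        best_right = max(best_right, right)
    return best_left >= min_anchor and best_right >= min_anchor


def intron_size_ok(junction_start: int, junction_end: int,
                   min_intron: int = 70, max_intron: int = 500_000) -> bool:
    """Apply the -m / -M limits to junction.end - junction.start.

    Inclusive bounds are an assumption of this sketch.
    """
    size = junction_end - junction_start
    return min_intron <= size <= max_intron


# Read 1 has enough left overlap, read 2 has enough right overlap.
reads = [(12, 3), (4, 10)]
print(anchors_satisfied(reads))
print(intron_size_ok(120, 370))
```

In the example, neither read alone has 8bp on both sides, but together they cover both anchors, which matches the behaviour described for -a.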

Note: Both junctions extract and cis-splice-effects identify have an intron-motif method that can be used to determine the strandedness of junctions extracted from alignment files. Using this method supersedes any strandedness information that might be encoded in the alignment file. To use this method with junctions extract, add a fasta file at the end of your junctions extract command; this tells RegTools that the intron-motif method should take priority when assigning strandedness. For example:

regtools junctions extract [options] indexed_alignments.bam fasta.fa
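
To give a feel for what an intron-motif approach does, here is a minimal Python sketch based on the common splice-site convention: an intron that reads GT...AG on the reference is assigned to the '+' strand, and one that reads CT...AC (the reverse complement) is assigned to the '-' strand. Whether RegTools uses exactly these motifs, and how it handles non-canonical introns, is not stated here, so treat the function name, the 0-based half-open coordinates, and the '?' fallback as assumptions.

```python
def strand_from_intron_motif(reference: str, intron_start: int, intron_end: int) -> str:
    """Guess the strand of a junction from the intron's terminal dinucleotides.

    `reference` is the chromosome sequence and [intron_start, intron_end) is
    the intron in 0-based, half-open coordinates (an assumption of this sketch).
    """
    intron = reference[intron_start:intron_end].upper()
    if len(intron) < 4:
        return "?"
    first_two, last_two = intron[:2], intron[-2:]
    if first_two == "GT" and last_two == "AG":
        return "+"  # canonical donor/acceptor on the forward strand
    if first_two == "CT" and last_two == "AC":
        return "-"  # reverse complement of GT...AG
    return "?"      # motif not recognised by this simplified sketch


# Toy sequences: exon (AAAA), intron, exon (TTTT).
forward = "AAAA" + "GTCCCCAG" + "TTTT"
reverse = "AAAA" + "CTCCCCAC" + "TTTT"
print(strand_from_intron_motif(forward, 4, 12))
print(strand_from_intron_motif(reverse, 4, 12))
```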

Output

The output is in the BED12 format which is described in detail here. Each line is an exon-exon junction as explained below.

Column-name Description
chrom The name of the chromosome.
chromStart The starting position of the junction-anchor. This includes the maximum overhang for the junction on the left. For the exact junction start, add blockSizes[0].
chromEnd The ending position of the junction-anchor. This includes the maximum overhang for the junction on the right. For the exact junction end, subtract blockSizes[1].
name The name of the junction. Junctions are simply numbered JUNC1 to JUNCn.
score The number of reads supporting the junction.
strand Defines the strand - either '+' or '-'. This is calculated using the XS tag in the BAM file.
thickStart Same as chromStart.
thickEnd Same as chromEnd.
itemRgb RGB value - "255,0,0" by default.
blockCount The number of blocks, 2 by default.
blockSizes A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount.
blockStarts A comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount.
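
The relationship between chromStart, chromEnd, and blockSizes can be sketched in Python as follows. The parser and the example line are hypothetical (the values are made up for illustration, not taken from real output); the arithmetic follows the column descriptions above: junction start = chromStart + blockSizes[0], junction end = chromEnd - blockSizes[1].

```python
from typing import NamedTuple


class Junction(NamedTuple):
    chrom: str
    start: int   # chromStart + blockSizes[0]
    end: int     # chromEnd - blockSizes[1]
    name: str
    score: int   # number of supporting reads
    strand: str


def parse_junction(bed12_line: str) -> Junction:
    """Convert one tab-separated BED12 junction line into exact junction coordinates."""
    fields = bed12_line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 BED12 columns, got {len(fields)}")
    chrom_start = int(fields[1])
    chrom_end = int(fields[2])
    # blockSizes may carry a trailing comma in BED files, so skip empty items.
    block_sizes = [int(x) for x in fields[10].split(",") if x]
    if len(block_sizes) != 2:
        raise ValueError("a junction line is expected to have exactly 2 blocks")
    return Junction(
        chrom=fields[0],
        start=chrom_start + block_sizes[0],
        end=chrom_end - block_sizes[1],
        name=fields[3],
        score=int(fields[4]),
        strand=fields[5],
    )


# Made-up example: anchors of 20bp (left) and 30bp (right).
example = "\t".join([
    "chr1", "100", "400", "JUNC1", "5", "+",
    "100", "400", "255,0,0", "2", "20,30", "0,270",
])
junction = parse_junction(example)
print(junction.start, junction.end, junction.end - junction.start)
```

For this line the junction spans 120 to 370, so the intron size compared against -m and -M would be 250, not the 300bp distance between chromStart and chromEnd.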