Parses the junctions imported by read_junction_table() into a STAR-compatible format (SJ.out) for more convenient use in downstream analyses. The columns strand, intron_motif and annotated are always 0 (undefined); for canonical motifs, values can be derived by extracting the dinucleotide motifs at the given reference coordinates. This function is an R implementation of the Megadepth helper script; further details of the column definitions can be found at https://github.com/ChristopherWilks/megadepth#junctions.

process_junction_table(all_jxs)

Arguments

all_jxs

A tibble::tibble() containing junction data ("all_jxs.tsv") generated by bam_to_junctions(all_junctions = TRUE) and imported through megadepth::read_junction_table().

Value

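A tibble::tibble() of processed junctions in a STAR-compatible format, with the columns chr, start, end, strand, intron_motif, annotated, uniquely_mapping_reads and multimapping_reads.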

Examples


## Install if necessary
install_megadepth()
#> The latest megadepth version is 1.2.0
#> This is not an interactive session, therefore megadepth has been installed temporarily to 
#> /tmp/RtmpK16vxZ/megadepth

## Find the example BAM file
example_bam <- system.file("tests", "test.bam",
    package = "megadepth", mustWork = TRUE
)

## Run bam_to_junctions()
example_jxs <- bam_to_junctions(example_bam, overwrite = TRUE)

## Read the junctions in as a tibble
all_jxs <- read_junction_table(example_jxs[["all_jxs.tsv"]])

## Process junctions into a STAR-compatible format
processed_jxs <- process_junction_table(all_jxs)

processed_jxs
#> # A tibble: 5 × 8
#>   chr     start     end strand intron_motif annotated uniquely_mapping_reads
#>   <chr>   <dbl>   <dbl>  <dbl>        <dbl>     <dbl>                  <int>
#> 1 chr10 4358579 4581019      0            0         0                      0
#> 2 chr10 8458623 8778558      0            0         0                      0
#> 3 chr10 8722315 8848720      0            0         0                      1
#> 4 chr10 8722508 8870679      0            0         0                      1
#> 5 chr10 8756762 8780518      0            0         0                     20
#> # ℹ 1 more variable: multimapping_reads <int>
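
The description notes that strand and intron_motif are left at 0 but can be derived from the dinucleotide motifs at the junction coordinates. The following is a minimal base-R sketch of that idea, not part of megadepth. It uses the intron motif and strand codes documented for STAR's SJ.out.tab (odd motif codes on the + strand, even on the - strand). The helper classify_intron(), the lookup table star_motif_codes and the toy sequence are hypothetical names introduced for illustration, and the sketch assumes that start and end are the 1-based first and last bases of the intron. Fetching the sequence from a real reference genome is left out.

```r
## Intron motif codes as documented for STAR's SJ.out.tab
## (0 = non-canonical is handled separately below)
star_motif_codes <- c(
    "GT/AG" = 1L, "CT/AC" = 2L,
    "GC/AG" = 3L, "CT/GC" = 4L,
    "AT/AC" = 5L, "GT/AT" = 6L
)

## Hypothetical helper: `chr_seq` is one chromosome sequence as a single
## string; `start` and `end` are assumed to be the 1-based first and last
## intron bases (vectors of equal length)
classify_intron <- function(chr_seq, start, end) {
    ## substring() is vectorised over the coordinates
    first_two <- toupper(substring(chr_seq, start, start + 1L))
    last_two <- toupper(substring(chr_seq, end - 1L, end))

    ## Look up the motif code; anything not in the table is non-canonical (0)
    motif <- unname(star_motif_codes[paste(first_two, last_two, sep = "/")])
    motif[is.na(motif)] <- 0L

    ## Odd codes imply the + strand (1), even codes the - strand (2),
    ## and non-canonical motifs leave the strand undefined (0)
    strand <- ifelse(motif == 0L, 0L, ifelse(motif %% 2L == 1L, 1L, 2L))

    data.frame(start = start, end = end, strand = strand, intron_motif = motif)
}

## Toy sequence with three made-up introns: GT...AG, CT...AC and TT...TT
toy_seq <- "AAAAGTCCCCAGAAAACTGGGGACAAAATTCCCCTT"
res <- classify_intron(toy_seq, start = c(5L, 17L, 29L), end = c(12L, 24L, 36L))
res
```

With a real data set, the resulting strand and intron_motif values could replace the zero-filled columns of processed_jxs, provided the coordinate convention assumed here matches the one used in the table.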