MetaSRA (Bernstein, Doan, and Dewey, 2017) contains “normalized metadata for the Sequence Read Archive”, which is constructed using the SRA Run Info tables. The MetaSRA authors provide a website where you can query the samples by term, such as brain, which leads to metasra.biostat.wisc.edu/?and=UBERON:0000955. As of April 15th, 2019 they list 17,890 brain samples from 342 studies.

1 Data setup

We can download the data with the following command:

## April 15, 2019
wget 'http://metasra.biostat.wisc.edu/api/v01/samples.csv?and=UBERON:0000955'
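
As an alternative to wget, the download can be sketched from within R. This is a minimal sketch, not the procedure used for this report: the helper names `metasra_csv_url()` and `fetch_metasra()` and the destination file name are assumptions made for illustration, and the MetaSRA API may have changed since April 2019.

```r
## Hypothetical helper: build the MetaSRA CSV query URL for an ontology term.
## The base URL is the one used in the wget command of this report.
metasra_csv_url <- function(term) {
    paste0('http://metasra.biostat.wisc.edu/api/v01/samples.csv?and=', term)
}

## Hypothetical helper: download the CSV once to an explicit file name
## (an assumed name, chosen to avoid '?' and ':' in the file name) and read it.
fetch_metasra <- function(term, destfile = 'metasra_brain_samples.csv') {
    if (!file.exists(destfile)) {
        download.file(metasra_csv_url(term), destfile = destfile)
    }
    read.csv(destfile)
}

brain_url <- metasra_csv_url('UBERON:0000955')
print(brain_url)

## Not run here because it needs network access:
## metasra <- fetch_metasra('UBERON:0000955')
```

Saving to an explicit file name is only a convenience; the rest of this report reads the file under the name that wget produces.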

Next, we load the required R packages.

library('recount')
library('tidyverse')

Now we can get all the required data.

## Read the MetaSRA data
metasra <- read.csv('samples.csv?and=UBERON:0000955')
head(metasra)
##    study_id                                                               study_title  sample_id sample_name
## 1 SRP052546 Single Cell Analysis Program-Transcriptomics (SCAP-T) (UC San Diego site) SRS1341305            
## 2 SRP052546 Single Cell Analysis Program-Transcriptomics (SCAP-T) (UC San Diego site) SRS1341312            
## 3 SRP052546 Single Cell Analysis Program-Transcriptomics (SCAP-T) (UC San Diego site) SRS1341782            
## 4 SRP052546 Single Cell Analysis Program-Transcriptomics (SCAP-T) (UC San Diego site) SRS1341315            
## 5 SRP052546 Single Cell Analysis Program-Transcriptomics (SCAP-T) (UC San Diego site) SRS1341318            
## 6 SRP052546 Single Cell Analysis Program-Transcriptomics (SCAP-T) (UC San Diego site) SRS1341322            
##     sample_type sample_type_confidence                                     mapped_ontology_ids
## 1 primary cells              0.8799537 CL:0000540, EFO:0003534, UBERON:0003100, UBERON:0013541
## 2 primary cells              0.8799537 CL:0000540, EFO:0003534, UBERON:0003100, UBERON:0013541
## 3 primary cells              0.8799537 CL:0000540, EFO:0003534, UBERON:0003100, UBERON:0013541
## 4 primary cells              0.8799537 CL:0000540, EFO:0003534, UBERON:0003100, UBERON:0013541
## 5 primary cells              0.8799537 CL:0000540, EFO:0003534, UBERON:0003100, UBERON:0013541
## 6 primary cells              0.8799537 CL:0000540, EFO:0003534, UBERON:0003100, UBERON:0013541
##                                                    mapped_ontology_terms
## 1 neuron, dorsal telencephalon, female organism, Brodmann (1909) area 10
## 2 neuron, dorsal telencephalon, female organism, Brodmann (1909) area 10
## 3 neuron, dorsal telencephalon, female organism, Brodmann (1909) area 10
## 4 neuron, dorsal telencephalon, female organism, Brodmann (1909) area 10
## 5 neuron, dorsal telencephalon, female organism, Brodmann (1909) area 10
## 6 neuron, dorsal telencephalon, female organism, Brodmann (1909) area 10
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                 raw_SRA_metadata
## 1 analyte type: RNA; biospecimen repository: SCAP-T; body site: Left Brodmann's Area 10 (Prefrontal Cortex); gap_accession: phs000834; gap_consent_code: 1; gap_consent_short_name: GRU; gap_parent_phs: phs000833; histological type: Brain cell (Neuron); is technical control: Yes; is tumor: No; molecular data type: RNA Seq (NGS); sex: female; study design: Case Set; study name: Single Cell Analysis Program - Transcriptome (SCAP-T) (UCSD); submitter handle: SCAP-T
## 2 analyte type: RNA; biospecimen repository: SCAP-T; body site: Left Brodmann's Area 10 (Prefrontal Cortex); gap_accession: phs000834; gap_consent_code: 1; gap_consent_short_name: GRU; gap_parent_phs: phs000833; histological type: Brain cell (Neuron); is technical control: Yes; is tumor: No; molecular data type: RNA Seq (NGS); sex: female; study design: Case Set; study name: Single Cell Analysis Program - Transcriptome (SCAP-T) (UCSD); submitter handle: SCAP-T
## 3 analyte type: RNA; biospecimen repository: SCAP-T; body site: Left Brodmann's Area 10 (Prefrontal Cortex); gap_accession: phs000834; gap_consent_code: 1; gap_consent_short_name: GRU; gap_parent_phs: phs000833; histological type: Brain cell (Neuron); is technical control: Yes; is tumor: No; molecular data type: RNA Seq (NGS); sex: female; study design: Case Set; study name: Single Cell Analysis Program - Transcriptome (SCAP-T) (UCSD); submitter handle: SCAP-T
## 4 analyte type: RNA; biospecimen repository: SCAP-T; body site: Left Brodmann's Area 10 (Prefrontal Cortex); gap_accession: phs000834; gap_consent_code: 1; gap_consent_short_name: GRU; gap_parent_phs: phs000833; histological type: Brain cell (Neuron); is technical control: Yes; is tumor: No; molecular data type: RNA Seq (NGS); sex: female; study design: Case Set; study name: Single Cell Analysis Program - Transcriptome (SCAP-T) (UCSD); submitter handle: SCAP-T
## 5 analyte type: RNA; biospecimen repository: SCAP-T; body site: Left Brodmann's Area 10 (Prefrontal Cortex); gap_accession: phs000834; gap_consent_code: 1; gap_consent_short_name: GRU; gap_parent_phs: phs000833; histological type: Brain cell (Neuron); is technical control: Yes; is tumor: No; molecular data type: RNA Seq (NGS); sex: female; study design: Case Set; study name: Single Cell Analysis Program - Transcriptome (SCAP-T) (UCSD); submitter handle: SCAP-T
## 6 analyte type: RNA; biospecimen repository: SCAP-T; body site: Left Brodmann's Area 10 (Prefrontal Cortex); gap_accession: phs000834; gap_consent_code: 1; gap_consent_short_name: GRU; gap_parent_phs: phs000833; histological type: Brain cell (Neuron); is technical control: Yes; is tumor: No; molecular data type: RNA Seq (NGS); sex: female; study design: Case Set; study name: Single Cell Analysis Program - Transcriptome (SCAP-T) (UCSD); submitter handle: SCAP-T
## Get the unique 342 studies
metasra_study <- unique(metasra$study_id)
stopifnot(length(metasra_study) == 342)

## Get the recount2 metadata
meta <- all_metadata()
## 2020-11-13 16:25:45 downloading the metadata to /tmp/RtmpK9pZcs/metadata_clean_sra.Rdata
## Load the predictions
PredictedPhenotypes <- add_predictions(version = '0.0.03')
## 2020-11-13 16:25:47 downloading the predictions to /tmp/RtmpK9pZcs/PredictedPhenotypes_v0.0.03.rda
## Loading objects:
##   PredictedPhenotypes
PredictedPhenotypes_latest <- add_predictions(version = '0.0.06')
## 2020-11-13 16:25:48 downloading the predictions to /tmp/RtmpK9pZcs/PredictedPhenotypes_v0.0.06.rda
## Loading objects:
##   PredictedPhenotypes
## Get recount-brain using the recount Bioconductor package
recount_brain <- add_metadata(source = 'recount_brain_v2')
## 2020-11-13 16:25:48 downloading the recount_brain metadata to /tmp/RtmpK9pZcs/recount_brain_v2.Rdata
## Loading objects:
##   recount_brain

2 General comparison

2.1 MetaSRA to recount_brain

First, we can check how many studies with at least one brain sample as detected with MetaSRA are in either recount2 or recount_brain.

## using tolower() doesn't change any of these numbers
addmargins(table(
    'In recount2' = metasra_study %in% recount_abstract$project,
    'In recount-brain' = metasra_study %in% unique(recount_brain$sra_study_s)
))
##            In recount-brain
## In recount2 FALSE TRUE Sum
##       FALSE   195    0 195
##       TRUE    100   47 147
##       Sum     295   47 342
## In percent
addmargins(table(
    'In recount2' = metasra_study %in% recount_abstract$project,
    'In recount-brain' = metasra_study %in% unique(recount_brain$sra_study_s)
)) / length(metasra_study) * 100
##            In recount-brain
## In recount2     FALSE      TRUE       Sum
##       FALSE  57.01754   0.00000  57.01754
##       TRUE   29.23977  13.74269  42.98246
##       Sum    86.25731  13.74269 100.00000
## Studies in MetaSRA and recount2 but not in recount_brain
studies_to_check <- metasra_study[
    metasra_study %in% recount_abstract$project &
    !metasra_study %in% unique(recount_brain$sra_study_s)
]

As a check, anything in recount_brain has to be in recount2 by construction. We’ll later take a deeper look at the 100 studies present in MetaSRA and recount2 yet absent from recount_brain (excluding TCGA).
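
To make the cross-tabulation logic concrete, here is a minimal sketch on made-up study IDs. All IDs and object names ending in `_toy` are hypothetical and unrelated to the real data; only `table()`, `addmargins()` and `%in%` are taken from the code above.

```r
## Toy study IDs (hypothetical, for illustration only)
metasra_toy <- c('SRP001', 'SRP002', 'SRP003', 'SRP004')
recount2_toy <- c('SRP002', 'SRP003', 'SRP999')
brain_toy <- c('SRP003')

## By construction, every recount_brain study is also in recount2
stopifnot(all(brain_toy %in% recount2_toy))

## Cross-tabulate membership of each MetaSRA study in the two resources
toy_tab <- table(
    'In recount2' = metasra_toy %in% recount2_toy,
    'In recount-brain' = metasra_toy %in% brain_toy
)
print(addmargins(toy_tab))

## Same table in percent of the MetaSRA studies
print(addmargins(toy_tab) / length(metasra_toy) * 100)

## Studies in MetaSRA and recount2 but not in recount_brain
toy_to_check <- metasra_toy[
    metasra_toy %in% recount2_toy &
    !metasra_toy %in% brain_toy
]
print(toy_to_check)
```

Because recount_brain is a subset of recount2, the cell for "not in recount2 but in recount-brain" has to be zero at the study level, which is what the real table shows.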

At the sample level, we find samples that are present in recount_brain but absent from recount2, which is not unexpected (recount2 was built to be only human RNA-seq samples). All the samples present in MetaSRA and recount2 yet absent from recount_brain come from the studies we wanted to check.

## using tolower() doesn't change any of these numbers
addmargins(table(
    'In recount2' = metasra$sample_id %in% meta$sample,
    'In recount-brain' = metasra$sample_id %in% recount_brain$sra_sample_s
))
##            In recount-brain
## In recount2 FALSE  TRUE   Sum
##       FALSE 13291  1411 14702
##       TRUE   2026  1162  3188
##       Sum   15317  2573 17890
## in percent
addmargins(table(
    'In recount2' = metasra$sample_id %in% meta$sample,
    'In recount-brain' = metasra$sample_id %in% recount_brain$sra_sample_s
)) / nrow(metasra) * 100
##            In recount-brain
## In recount2      FALSE       TRUE        Sum
##       FALSE  74.292901   7.887088  82.179989
##       TRUE   11.324762   6.495249  17.820011
##       Sum    85.617663  14.382337 100.000000
## Samples in MetaSRA and recount2 but not in recount_brain
samples_to_check <- metasra$sample_id[
    metasra$sample_id %in% meta$sample &
    !metasra$sample_id %in% recount_brain$sra_sample_s
]

## All of them are from the studies we need to check
table(unique(meta$project[meta$sample %in% samples_to_check]) %in%
    studies_to_check)
## 
## TRUE 
##   97

Note that these results exclude TCGA since the TCGA samples don’t have SRA sample IDs.

table('Has SRA sample id' = !is.na(recount_brain$sra_sample_s), recount_brain$Dataset)
##                  
## Has SRA sample id GTEX recount_brain_v1 TCGA
##             FALSE    0                0  707
##             TRUE  1409             4431    0
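
The reason missing SRA sample IDs drop out of the comparisons can be sketched with base R alone. The IDs below are hypothetical; `NA` stands in for a sample without an SRA sample ID.

```r
## Hypothetical sample IDs; NA mimics a sample without an SRA sample ID
brain_ids <- c('SRS0001', NA, 'SRS0003')
metasra_ids <- c('SRS0001', 'SRS0002')

## %in% never returns NA: a missing ID is simply reported as not found
## (as long as the lookup vector itself contains no NA)
in_metasra <- brain_ids %in% metasra_ids
print(in_metasra)

## By contrast, == propagates the NA, which is why a table built from
## such a comparison needs useNA = 'ifany' to show those samples
is_first <- brain_ids == 'SRS0001'
print(table(is_first, useNA = 'ifany'))
```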

2.2 recount_brain to MetaSRA

We can also do the reverse check and ask which studies or samples present in recount_brain are present in MetaSRA.

## At the study level
addmargins(table(
    'In MetaSRA (project)' = unique(recount_brain$sra_study_s) %in%
    metasra_study
))
## In MetaSRA (project)
## FALSE  TRUE   Sum 
##    17    47    64
## in percent
addmargins(table(
    'In MetaSRA (project)' = unique(recount_brain$sra_study_s) %in% metasra_study
)) / length(unique(recount_brain$sra_study_s)) * 100
## In MetaSRA (project)
##    FALSE     TRUE      Sum 
##  26.5625  73.4375 100.0000
## At the sample level
## Check whether it's all the large study SRP025982
addmargins(table(
    'In MetaSRA (sample)' = recount_brain$sra_sample_s %in% metasra$sample_id,
    'SRP025982' = recount_brain$sra_study_s == 'SRP025982',
    'Dataset' = recount_brain$Dataset, useNA = 'ifany'
))
## , , Dataset = GTEX
## 
##                    SRP025982
## In MetaSRA (sample) FALSE TRUE <NA>  Sum
##               FALSE     0    0    0    0
##               TRUE   1409    0    0 1409
##               Sum    1409    0    0 1409
## 
## , , Dataset = recount_brain_v1
## 
##                    SRP025982
## In MetaSRA (sample) FALSE TRUE <NA>  Sum
##               FALSE   659 2475    0 3134
##               TRUE    874  423    0 1297
##               Sum    1533 2898    0 4431
## 
## , , Dataset = TCGA
## 
##                    SRP025982
## In MetaSRA (sample) FALSE TRUE <NA>  Sum
##               FALSE     0    0  707  707
##               TRUE      0    0    0    0
##               Sum       0    0  707  707
## 
## , , Dataset = Sum
## 
##                    SRP025982
## In MetaSRA (sample) FALSE TRUE <NA>  Sum
##               FALSE   659 2475  707 3841
##               TRUE   2283  423    0 2706
##               Sum    2942 2898  707 6547
## Ok, it's not all SRP025982 so we can drop that comparison
## and show the table in percent
addmargins(table(
    'In MetaSRA (sample)' = recount_brain$sra_sample_s %in% metasra$sample_id,
    'Dataset' = recount_brain$Dataset, useNA = 'ifany'
)) / nrow(recount_brain) * 100
##                    Dataset
## In MetaSRA (sample)      GTEX recount_brain_v1      TCGA       Sum
##               FALSE   0.00000         47.86925  10.79884  58.66809
##               TRUE   21.52131         19.81060   0.00000  41.33191
##               Sum    21.52131         67.67985  10.79884 100.00000

From these checks, we can see that 26.6% of the recount_brain studies and 58.7% of the recount_brain samples are missing from MetaSRA.
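
These two percentages can be recomputed directly from the counts printed in the tables of this section. The variable names below are introduced only for this sketch.

```r
## Counts copied from the study-level and sample-level tables above
studies_missing <- 17
studies_total <- 64
samples_missing <- 3841
samples_total <- 6547

## Percent of recount_brain studies and samples absent from MetaSRA
pct_studies <- studies_missing / studies_total * 100
pct_samples <- samples_missing / samples_total * 100
print(round(c(studies = pct_studies, samples = pct_samples), 1))
```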

3 Studies to check

Let's take a deeper look at the 100 studies present in MetaSRA and recount2 yet absent from recount_brain. The recount package already has the study abstract and the number of samples for each study. We can then construct the URL needed to manually explore these discrepant studies. Next, we can look at the phenopredict (Ellis, Collado-Torres, Jaffe, and Leek, 2018) predictions we used (version 0.0.03) for selecting the studies as well as the latest (0.0.06) predictions. The prediction table also includes a reported_tissue column. Along with the predictions and the reported tissue, we can also look at MetaSRA to identify the number of brain samples according to each source and the percent of brain samples per study. We can then evaluate whether each study passes our selection criteria of at least 4 samples with more than 70 percent of the study samples coming from the brain.
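
The selection rule can be written as a small function. This is a minimal sketch: `passes_criteria()` is a hypothetical helper that mirrors the `number_samples >= 4` and `> 70` percent checks used in the code of this section, and the counts are those reported later in this report for two of the studies.

```r
## Hypothetical helper mirroring the selection criteria:
## at least 4 samples in the study and more than 70% of them from brain
passes_criteria <- function(number_samples, brain_n) {
    brain_percent <- brain_n / number_samples * 100
    number_samples >= 4 & brain_percent > 70
}

## Counts for two studies as reported later in this report
toy <- data.frame(
    project = c('SRP032165', 'SRP028738'),
    number_samples = c(24, 16),
    brain_n_sharq = c(24, 12),
    brain_n_metasra = c(6, 2),
    stringsAsFactors = FALSE
)
toy$criteria_sharq <- passes_criteria(toy$number_samples, toy$brain_n_sharq)
toy$criteria_metasra <- passes_criteria(toy$number_samples, toy$brain_n_metasra)
print(toy)
```

The same study can therefore pass or fail depending only on which source supplies the brain sample count.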

## Let's get the study-level information already present in the recount package
discrepant <- subset(recount_abstract, project %in% studies_to_check)

## Does the abstract mention the word brain?
discrepant$mentions_brain <- grepl('brain', tolower(discrepant$abstract))

## Next, the url
discrepant$url <- paste0(
    'https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=',
    discrepant$project
)

## Order by decreasing number of samples
discrepant <- discrepant[order(discrepant$number_samples, decreasing = TRUE), ]

## Get information at the sample level for each project
discrepant_studies_samples <- map(discrepant$project, function(x) {
    y <- meta$run[meta$project == x]
    m <- match(y, PredictedPhenotypes$sample_id)
    m2 <- match(y, PredictedPhenotypes_latest$sample_id)
    
    data.frame(
        prediction_original = PredictedPhenotypes$predicted_tissue[m],
        prediction_latest = PredictedPhenotypes_latest$predicted_tissue[m2],
        sharq = PredictedPhenotypes_latest$reported_tissue[m2],
        project = x,
        sample_id = y,
        stringsAsFactors = FALSE
    )
})

## Summarize the information found for each study
discrepant <- cbind(discrepant, map_dfr(discrepant_studies_samples, function(x) {
    
    data.frame(
        brain_n_original = sum(x$prediction_original == 'Brain', na.rm = TRUE),
        brain_n_latest = sum(x$prediction_latest == 'Brain', na.rm = TRUE),
        brain_n_sharq = sum(x$sharq == 'Brain', na.rm = TRUE),
        brain_n_metasra = sum(metasra$study_id == unique(x$project)),
        brain_percent_original = sum(x$prediction_original == 'Brain',
            na.rm = TRUE) / nrow(x) * 100,
        brain_percent_latest = sum(x$prediction_latest == 'Brain',
            na.rm = TRUE) / nrow(x) * 100,
        brain_percent_sharq = sum(x$sharq == 'Brain',
            na.rm = TRUE) / nrow(x) * 100,
        brain_percent_metasra = sum(metasra$study_id == unique(x$project)) /
            nrow(x) * 100,
        stringsAsFactors = FALSE
    )
    
}))

## Does it match the original criteria of at least 4 samples and greater than
## 70% brain samples in the study?
discrepant$criteria_original <- discrepant$number_samples >= 4 &
    discrepant$brain_percent_original > 70
discrepant$criteria_latest <- discrepant$number_samples >= 4 &
    discrepant$brain_percent_latest > 70
discrepant$criteria_sharq <- discrepant$number_samples >= 4 &
    discrepant$brain_percent_sharq > 70
discrepant$criteria_metasra <- discrepant$number_samples >= 4 &
    discrepant$brain_percent_metasra > 70

## Check that the original criteria are all FALSE since they are absent from recount_brain
stopifnot(all(!discrepant$criteria_original))
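
The per-study summaries above depend on how `match()` and `sum(..., na.rm = TRUE)` treat runs that have no prediction. A minimal sketch with a made-up prediction table (all IDs and tissues are hypothetical):

```r
## Toy prediction table (hypothetical run IDs and tissues)
predictions_toy <- data.frame(
    sample_id = c('SRR01', 'SRR02', 'SRR03'),
    predicted_tissue = c('Brain', 'Blood', 'Brain'),
    stringsAsFactors = FALSE
)

## Runs of one toy study; SRR04 has no prediction
runs_toy <- c('SRR03', 'SRR04', 'SRR01', 'SRR02')

## match() returns NA for runs missing from the prediction table
m <- match(runs_toy, predictions_toy$sample_id)
tissue <- predictions_toy$predicted_tissue[m]

## na.rm = TRUE skips the missing prediction in the count, while the
## denominator still includes every run of the study
brain_n <- sum(tissue == 'Brain', na.rm = TRUE)
brain_percent <- brain_n / length(runs_toy) * 100
print(c(brain_n = brain_n, brain_percent = brain_percent))
```

A run without a prediction therefore lowers the brain percentage of its study, which can keep a study below the 70 percent threshold.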

Now that we have our detailed table for these 100 studies, we can look into them in more detail.

addmargins(with(discrepant,
    table(criteria_latest, criteria_sharq, criteria_metasra)))
## , , criteria_metasra = FALSE
## 
##                criteria_sharq
## criteria_latest FALSE TRUE Sum
##           FALSE    68    4  72
##           TRUE      0    0   0
##           Sum      68    4  72
## 
## , , criteria_metasra = TRUE
## 
##                criteria_sharq
## criteria_latest FALSE TRUE Sum
##           FALSE    19    4  23
##           TRUE      2    3   5
##           Sum      21    7  28
## 
## , , criteria_metasra = Sum
## 
##                criteria_sharq
## criteria_latest FALSE TRUE Sum
##           FALSE    87    8  95
##           TRUE      2    3   5
##           Sum      89   11 100

From the above output we can see that 28 of the 100 studies would match our study criteria had we used MetaSRA, which includes 5 studies that now also match our criteria using the version 0.0.06 predictions.

## all
ggplot(discrepant,
    aes(x = brain_percent_original, y = brain_percent_latest,
        color = criteria_latest, size = number_samples,
        shape = criteria_metasra)) +
    geom_point() +
    facet_grid( ~ criteria_sharq) +
    geom_abline(linetype = 3, color = 'purple') +
    labs(caption = 'Panels by criteria_sharq')

## just those with some TRUE criteria
ggplot(subset(discrepant,
    criteria_sharq | criteria_metasra | criteria_latest),
    aes(x = brain_percent_original, y = brain_percent_latest,
        color = criteria_latest, size = number_samples,
        shape = criteria_metasra)) +
    geom_point() +
    facet_grid( ~ criteria_sharq) +
    geom_abline(linetype = 3, color = 'purple') +
    labs(caption = 'Panels by criteria_sharq')

There are 4 studies that only pass the selection criteria based on the reported_tissue information present in the predictions table. The reported_tissue was extracted from the SHARQ prototype as described in the phenopredict manuscript (Ellis, Collado-Torres, Jaffe, and Leek, 2018).

subset(discrepant, criteria_sharq & !(criteria_metasra | criteria_latest))
##     number_samples species
## 847             24   human
## 789             16   human
## 268              4   human
## 341              4   human
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             abstract
## 847                                                                                                                                                                                                                                              Purpose: The purpose of this experiment is to identify a C9-ALS/FTD specific genomic profile in fibroblast lines that is distinct from sporadic ALS without C9orf72 expansion and non-neurologic control cells. The study will then evaluate the effect on this identified profile of ASO treatment targeting the sense strand RNA transcript of the C9orf72 gene. Methods: Expression profiling was performed on RNAs from fibroblasts of four C9orf72 patients, four control individuals and four sporadic ALS patients using Multiplex Analysis of PolyA-linked Sequences method. Results: Hierarchical clustering of expression values for all genes showed that the four C9orf72 patient lines had an expression profile distinct from control and sporadic ALS lines. Statistical comparison of expression values between the four C9orf72 lines and the four control lines revealed that 122 genes were upregulated (defined by a False Discovery Rate FDR<0.05) and 34 genes were downregulated (defined by a False Discovery Rate FDR <0.05) in C9orf72 patient fibroblasts. Conclusions: A genome wide RNA signature can be defined in fibroblasts with C9orf72 expansion. ASO-mediated reduction of C9orf72 RNA levels in fibroblasts with the hexanucleotide expansion efficiently reduced accumulation of GGGGCC RNA foci. This did not, however, generate a reversal of the C9orf72 RNA profile. Overall design: Use of Multiplex Analysis of PolyA-linked Sequences to identify expression changes in fibroblasts from amyotrophic lateral sclerosis and frontotemporal dementia patients harboring an hexanucleotide expansion in the C9orf72 gene.
## 789                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          MiRNAs are important negative regulators of protein coding gene expression, and have been studied intensively over the last few years. Several measurement platforms, designed to determine their relative RNA abundance levels in biological samples, have been developed. In this study, we systematically compared 12 commercially available miRNA expression platforms by measuring an identical set of 20 standardized positive and negative control samples, including human universal reference RNA, human brain RNA and titrations thereof, human serum samples, and synthetic spikes from miRNA family members with varying homology. We developed novel and robust quality metrics to objectively assess platform performance of very different technologies such as small RNA sequencing, RT-qPCR and (microarray) hybridization. We assessed reproducibility, sensitivity, accuracy, specificity, and concordance of differential expression. The results indicate that each method has its strengths and weaknesses, which helps to guide informed selection of a quantitative miRNA gene expression platform in function of particular study goals. Overall design: Sequencing of 20 miRQC samples on Illumina Genome Analyzer IIx System
## 268                                                                                                      Establishing the functional roles of genetic variants remains a significant challenge in the post-genomic era. Here, we present a method, allele-specific alternative mRNA processing (ASARP), to identify genetically influenced mRNA processing events using transcriptome sequencing (RNA-Seq) data. The method examines RNA-Seq data at both single nucleotide and whole-gene/isoform levels to identify allele-specific expression (ASE) and existence of allele-specific regulation of mRNA processing. We applied the methods to data obtained from the human glioblastoma cell line U87MG and primary breast cancer tissues and found that 26–45% of all genes with sufficient read coverage demonstrated ASE, with significant overlap between the two cell types. Our methods predicted potential mechanisms underlying ASE due to regulations affecting either whole-gene-level expression or alternative mRNA processing, including alternative splicing, alternative polyadenylation and alternative transcriptional initiation. Allele-specific alternative splicing and alternative polyadenylation may explain ASE in hundreds of genes in each cell type. Reporter studies following these predictions identified the causal single nucleotide variants (SNVs) for several allele-specific alternative splicing events. Finally, many genes identified in our study were also reported as disease/phenotype-associated genes in genome-wide association studies. Future applications of our approach may provide ample insights for a better understanding of the genetic basis of gene regulation underlying phenotypic diversity and disease mechanisms. Overall design: Examine allele-specific gene expression and alternative RNA processing in U87MG cell line
## 341 RNA editing enhances the diversity of gene products at the post-transcriptional level. Approaches for genome-wide identification of RNA editing face two main challenges: separating true editing sites from false discoveries and accurate estimation of editing levels. We developed an approach to analyze transcriptome sequencing data (RNA-Seq) for global identification of RNA editing in cells for which whole-genome sequencing data are available. We applied the method to analyze RNA-Seq data of a human glioblastoma cell line, U87MG. Around 10,000 DNA-RNA differences were identified, the majority being putative A-to-I editing sites. These predicted A-to-I events were associated with a low false discovery rate (~5%). Moreover, the estimated editing levels from RNA-Seq correlated well with those based on traditional clonal sequencing. Our results further facilitated unbiased characterization of the sequence and evolutionary features flanking predicted A-to-I editing sites and discovery of a conserved RNA structural motif that may be functionally relevant to editing. Genes with predicted A-to-I editing were significantly enriched with those known to be involved in cancer, supporting the potential importance of cancer-specific RNA editing. A similar profile of DNA-RNA differences as in U87MG was predicted for another RNA-Seq data set obtained from primary breast cancer samples. Remarkably, significant overlap exists between the putative editing sites of the two transcriptomes despite their difference in cell type, cancer type and genomic backgrounds. Our approach enabled de novo identification of the RNA editome, which sets the stage for further mechanistic studies of this important step of post-transcriptional regulation. Overall design: Examine mRNA expression in U87MG cells following ADAR1 or control siRNA knockdown
##       project mentions_brain                                                        url brain_n_original brain_n_latest
## 847 SRP032165          FALSE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP032165                5              3
## 789 SRP028738           TRUE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP028738                2              3
## 268 SRP006970          FALSE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP006970                0              0
## 341 SRP009659          FALSE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP009659                0              0
##     brain_n_sharq brain_n_metasra brain_percent_original brain_percent_latest brain_percent_sharq brain_percent_metasra
## 847            24               6               20.83333                12.50                 100                  25.0
## 789            12               2               12.50000                18.75                  75                  12.5
## 268             4               1                0.00000                 0.00                 100                  25.0
## 341             4               2                0.00000                 0.00                 100                  50.0
##     criteria_original criteria_latest criteria_sharq criteria_metasra
## 847             FALSE           FALSE           TRUE            FALSE
## 789             FALSE           FALSE           TRUE            FALSE
## 268             FALSE           FALSE           TRUE            FALSE
## 341             FALSE           FALSE           TRUE            FALSE

These are the 5 studies that would pass the selection criteria with the latest predictions; all of them would also pass with the MetaSRA data.

subset(discrepant, criteria_latest)
##      number_samples species
## 1669            466   human
## 438              24   human
## 665              20   human
## 1862              7   human
## 567               5   human
## abstract
## 1669 We used single cell RNA sequencing on 466 cells to capture the cellular complexity of the adult and fetal human brain at a whole transcriptome level. Healthy adult temporal lobe tissue was obtained from epileptic patients during temporal lobectomy for medically refractory seizures. We were able to classify individual cells into all of the major neuronal, glial, and vascular cell types in the brain. Overall design: Examination of cell types in healthy human brain samples.
## 438 The expansion of the neocortex during mammalian brain evolution results primarily from an increase in neural progenitor cell divisions in its two principal germinal zones during development, the ventricular zone (VZ) and the subventricular zone (SVZ). Using mRNA sequencing, we analyzed the transcriptomes of fetal human and embryonic mouse VZ, SVZ and cortical plate (CP). We describe sets of genes that are up- or down-regulated in each germinal zone. These data suggest that cell adhesion and cell-extracellular matrix (ECM) interactions promote the proliferation and self-renewal of neural progenitors in the developing human neocortex. Notably, relevant ECM-associated genes include distinct sets of collagens, laminins, proteoglycans and integrins, along with specific sets of growth factors and morphogens. Our data establish a basis for identifying novel cell-type markers and open up avenues to unravel the molecular basis of neocortex expansion during evolution. Overall design: Total RNA was isolated from the VZ, inner SVZ (ISVZ), outer SVZ (OSVZ) and CP of six 13-16 weeks post-conception (w.p.c.) human fetuses and from the VZ, SVZ and CP of five E14.5 mouse embryos using laser capture microdissection of Nissl-stained cryosections of dorsolateral telencephalon. Poly A+ RNA was used as template for the preparation of cDNA which were then subjected to single-end 76-bp RNA-Seq.
## 665 MicroRNAs (miRNAs) are small (20-22 nucleotides) regulatory non-coding RNAs that strongly influence gene expression. Most prior studies addressing the role of miRNAs in neurodegenerative diseases (NDs) have focused on individual controls (n = 2), AD (n = 5), dementia with Lewy bodies (n = 4), hippocampal sclerosis of aging (n = 4), and frontotemporal lobar dementia (FTLD) (n = 5) cases, together accounting for the most prevalent ND subtypes. All cases had short postmortem intervals, relatively high-quality RNA, and state-of-the-art neuropathological diagnoses. The resulting data (over 113 million reads in total, averaging 5.6 million reads per sample) and secondary expression analyses constitute an unprecedented look into the human cerebral cortical miRNome at single nucleotide resolution. While we find no apparent changes in isomiR or miRNA editing patterns in correlation with ND pathology, our results validate and extend previous miRNA profiling studies with regard to quantitative changes in NDs. In agreement with this idea, we provide independent cohort validation for changes in miR-132 expression levels in AD (n = 8) and FTLD (n = 14) cases when compared to controls (n = 8). The identification of common and ND-specific putative novel brain miRNAs and/or short-hairpin molecules is also presented. The challenge now is to better understand the impact of these and other alterations on neuronal gene expression networks and neuropathologies. Overall design: Using RNA deep sequencing, we sought to analyze in detail the small RNAs (including miRNAs) in the temporal neocortex gray matter from non-demented controls (n = 2), AD (n = 5), dementia with Lewy bodies (n = 4), hippocampal sclerosis of aging (n = 4), and frontotemporal lobar dementia (FTLD) (n = 5) cases, together accounting for the most prevalent ND subtypes.
## 1862 Neuronal migration defects (NMDs) are among the most common and severe brain abnormalities in humans. Lack of disease models in mice or in human cells has hampered the identification of underlying mechanisms. From patients with severe NMDs we generated iPSCs then differentiated neural progenitor cells (NPCs). On artificial extracellular matrix, patient-derived neuronal cells showed defective migration and impaired neurite outgrowth. From a cohort of 107 families with NMDs, sequencing identified two homozygous C-terminal truncating mutations in CTNNA2, encoding aN-catenin, one of three paralogues of the a-catenin family, involved in epithelial integrity and cell polarity. Patient-derived or CRISPR-targeted CTNNA2- mutant neuronal cells showed defective migration and neurite stability. Recombinant aN-catenin was sufficient to bundle purified actin and to suppress the actin-branching activity of ARP2/3. Small molecule inhibitors of ARP2/3 rescued the CTNNA2 neurite defect. Thus, disease modeling in human cells could be used to understand NMD pathogenesis and develop treatments for associated disorders. Overall design: 2 biological replicates per individual (2 iPSC clone differentiations), excluding 1263A, which has one sample
## 567 TAF15, an RNA binding protein was recently implicated in Amyotrophic Lateral Sclerosis (ALS). ALS is a fatal neurodegenerative disease. We report the identification of  the conserved neuronal RNA targets of TAF15 and the assessment of the impact of TAF15 depletion on the neuronal transcriptome. Our study uncovers regulation of splicing of sets of neuronal RNAs encoding proteins with essential roles in synaptic activities including glutamergic receptors such as zeta-1 subunit of the glutamate N-methyl-D-aspartate (NMDA) receptor (Grin1). Overall design: Identification of TAF15 neuronal targets using normal human brain samples and mouse neurons.  Mouse background: E14Tg2a.4 wildtype cells derived from 129P2/OlaHsd.
##        project mentions_brain                                                        url brain_n_original
## 1669 SRP057196           TRUE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP057196              201
## 438  SRP013825           TRUE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP013825               14
## 665  SRP021130           TRUE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP021130               13
## 1862 SRP063669           TRUE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP063669                0
## 567  SRP017777           TRUE https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP017777                3
##      brain_n_latest brain_n_sharq brain_n_metasra brain_percent_original brain_percent_latest brain_percent_sharq
## 1669            350             0             466               43.13305             75.10730             0.00000
## 438              23            23              24               58.33333             95.83333            95.83333
## 665              18            19              20               65.00000             90.00000            95.00000
## 1862              6             0               7                0.00000             85.71429             0.00000
## 567               4             5               5               60.00000             80.00000           100.00000
##      brain_percent_metasra criteria_original criteria_latest criteria_sharq criteria_metasra
## 1669                   100             FALSE            TRUE          FALSE             TRUE
## 438                    100             FALSE            TRUE           TRUE             TRUE
## 665                    100             FALSE            TRUE           TRUE             TRUE
## 1862                   100             FALSE            TRUE          FALSE             TRUE
## 567                    100             FALSE            TRUE           TRUE             TRUE
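In the five rows printed above, each brain_percent_* value equals the matching brain_n_* count divided by brain_n_metasra, times 100. The sketch below rebuilds those rows by hand and recomputes the percentages. The `shown` data frame and the choice of brain_n_metasra as denominator are assumptions inferred from the printed output, not necessarily the code that produced the table.

```r
## Rebuild the five printed rows by hand (values copied from the output above)
shown <- data.frame(
    project = c('SRP057196', 'SRP013825', 'SRP021130', 'SRP063669', 'SRP017777'),
    brain_n_original = c(201, 14, 13, 0, 3),
    brain_n_latest = c(350, 23, 18, 6, 4),
    brain_n_sharq = c(0, 23, 19, 0, 5),
    brain_n_metasra = c(466, 24, 20, 7, 5),
    row.names = c('1669', '438', '665', '1862', '567')
)

## Assumed denominator: brain_n_metasra (it reproduces the printed values)
n_cols <- c('brain_n_original', 'brain_n_latest', 'brain_n_sharq')
recomputed <- 100 * shown[, n_cols] / shown$brain_n_metasra
colnames(recomputed) <- sub('brain_n_', 'brain_percent_', n_cols)

## Compare against the brain_percent_* columns printed above
round(recomputed, 5)
```

If the real denominator were a different per-study total, the recomputed values would diverge for studies where MetaSRA does not label every sample as brain.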

To explore the table in more detail, write it to discrepant_studies.csv and open that file.

write.csv(discrepant, file = 'discrepant_studies.csv')
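By default write.csv() stores the row names as an unnamed first column, so reading the file back with row.names = 1 restores the original row indices. The sketch below shows the round trip, using a hypothetical two-row stand-in for `discrepant` (`discrepant_demo`) and a temporary file instead of discrepant_studies.csv:

```r
## Hypothetical stand-in for `discrepant` (values copied from the printed rows)
discrepant_demo <- data.frame(
    project = c('SRP057196', 'SRP013825'),
    criteria_original = c(FALSE, FALSE),
    criteria_latest = c(TRUE, TRUE),
    row.names = c('1669', '438'),
    stringsAsFactors = FALSE
)

## Write to a temporary file so the real CSV is not overwritten
csv_file <- tempfile(fileext = '.csv')
write.csv(discrepant_demo, file = csv_file)

## row.names = 1 turns the first column back into row names
reloaded <- read.csv(csv_file, row.names = 1, stringsAsFactors = FALSE)

## Studies that pass the latest criteria but not the original ones
reloaded[reloaded$criteria_latest & !reloaded$criteria_original, ]
```

Passing row.names = FALSE to write.csv() would drop the indices instead, which may be preferable when they carry no meaning outside the R session.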

4 Reproducibility

This document was made possible thanks to MetaSRA (Bernstein, Doan, and Dewey, 2017) and the software listed in the Bibliography section.

Code for creating this document

## Create the vignette
library('rmarkdown')
system.time(render('metasra_comp.Rmd', 'BiocStyle::html_document'))

Reproducibility information for this document.

## Reproducibility info
proc.time()
##    user  system elapsed 
##  75.865   5.957 135.840
message(Sys.time())
## 2020-11-13 16:25:52
options(width = 120)
library('sessioninfo')
session_info()
## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                                      
##  version  R version 4.0.2 Patched (2020-06-24 r78746)
##  os       CentOS Linux 7 (Core)                      
##  system   x86_64, linux-gnu                          
##  ui       X11                                        
##  language (EN)                                       
##  collate  en_US.UTF-8                                
##  ctype    en_US.UTF-8                                
##  tz       US/Eastern                                 
##  date     2020-11-13                                 
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package              * version  date       lib source        
##  AnnotationDbi          1.50.3   2020-07-25 [2] Bioconductor  
##  askpass                1.1      2019-01-13 [2] CRAN (R 4.0.0)
##  assertthat             0.2.1    2019-03-21 [2] CRAN (R 4.0.0)
##  backports              1.2.0    2020-11-02 [1] CRAN (R 4.0.2)
##  base64enc              0.1-3    2015-07-28 [2] CRAN (R 4.0.0)
##  bibtex                 0.4.2.3  2020-09-19 [2] CRAN (R 4.0.2)
##  Biobase              * 2.48.0   2020-04-27 [2] Bioconductor  
##  BiocFileCache          1.12.1   2020-08-04 [2] Bioconductor  
##  BiocGenerics         * 0.34.0   2020-04-27 [2] Bioconductor  
##  BiocManager            1.30.10  2019-11-16 [2] CRAN (R 4.0.0)
##  BiocParallel           1.22.0   2020-04-27 [2] Bioconductor  
##  BiocStyle            * 2.16.1   2020-09-25 [1] Bioconductor  
##  biomaRt                2.44.4   2020-10-13 [2] Bioconductor  
##  Biostrings             2.56.0   2020-04-27 [2] Bioconductor  
##  bit                    4.0.4    2020-08-04 [2] CRAN (R 4.0.2)
##  bit64                  4.0.5    2020-08-30 [2] CRAN (R 4.0.2)
##  bitops                 1.0-6    2013-08-17 [2] CRAN (R 4.0.0)
##  blob                   1.2.1    2020-01-20 [2] CRAN (R 4.0.0)
##  bookdown               0.21     2020-10-13 [1] CRAN (R 4.0.2)
##  broom                  0.7.2    2020-10-20 [2] CRAN (R 4.0.2)
##  BSgenome               1.56.0   2020-04-27 [2] Bioconductor  
##  bumphunter             1.30.0   2020-04-27 [2] Bioconductor  
##  callr                  3.5.1    2020-10-13 [2] CRAN (R 4.0.2)
##  cellranger             1.1.0    2016-07-27 [2] CRAN (R 4.0.0)
##  checkmate              2.0.0    2020-02-06 [2] CRAN (R 4.0.0)
##  cli                    2.1.0    2020-10-12 [2] CRAN (R 4.0.2)
##  cluster                2.1.0    2019-06-19 [3] CRAN (R 4.0.2)
##  codetools              0.2-16   2018-12-24 [3] CRAN (R 4.0.2)
##  colorspace             1.4-1    2019-03-18 [2] CRAN (R 4.0.0)
##  crayon                 1.3.4    2017-09-16 [2] CRAN (R 4.0.0)
##  curl                   4.3      2019-12-02 [2] CRAN (R 4.0.0)
##  data.table             1.13.2   2020-10-19 [2] CRAN (R 4.0.2)
##  DBI                    1.1.0    2019-12-15 [2] CRAN (R 4.0.0)
##  dbplyr                 2.0.0    2020-11-03 [1] CRAN (R 4.0.2)
##  DelayedArray         * 0.14.1   2020-07-14 [2] Bioconductor  
##  derfinder              1.22.0   2020-04-27 [2] Bioconductor  
##  derfinderHelper        1.22.0   2020-04-27 [2] Bioconductor  
##  desc                   1.2.0    2018-05-01 [2] CRAN (R 4.0.0)
##  devtools             * 2.3.2    2020-09-18 [2] CRAN (R 4.0.2)
##  digest                 0.6.27   2020-10-24 [1] CRAN (R 4.0.2)
##  doRNG                  1.8.2    2020-01-27 [2] CRAN (R 4.0.0)
##  downloader             0.4      2015-07-09 [2] CRAN (R 4.0.0)
##  dplyr                * 1.0.2    2020-08-18 [2] CRAN (R 4.0.2)
##  ellipsis               0.3.1    2020-05-15 [2] CRAN (R 4.0.0)
##  evaluate               0.14     2019-05-28 [2] CRAN (R 4.0.0)
##  fansi                  0.4.1    2020-01-08 [2] CRAN (R 4.0.0)
##  farver                 2.0.3    2020-01-16 [2] CRAN (R 4.0.0)
##  forcats              * 0.5.0    2020-03-01 [2] CRAN (R 4.0.0)
##  foreach                1.5.1    2020-10-15 [2] CRAN (R 4.0.2)
##  foreign                0.8-80   2020-05-24 [3] CRAN (R 4.0.2)
##  Formula                1.2-4    2020-10-16 [2] CRAN (R 4.0.2)
##  fs                     1.5.0    2020-07-31 [1] CRAN (R 4.0.2)
##  generics               0.1.0    2020-10-31 [1] CRAN (R 4.0.2)
##  GenomeInfoDb         * 1.24.2   2020-06-15 [2] Bioconductor  
##  GenomeInfoDbData       1.2.3    2020-05-18 [2] Bioconductor  
##  GenomicAlignments      1.24.0   2020-04-27 [2] Bioconductor  
##  GenomicFeatures        1.40.1   2020-07-08 [2] Bioconductor  
##  GenomicFiles           1.24.0   2020-04-27 [2] Bioconductor  
##  GenomicRanges        * 1.40.0   2020-04-27 [2] Bioconductor  
##  GEOquery               2.56.0   2020-04-27 [2] Bioconductor  
##  ggplot2              * 3.3.2    2020-06-19 [2] CRAN (R 4.0.2)
##  glue                   1.4.2    2020-08-27 [1] CRAN (R 4.0.2)
##  gridExtra              2.3      2017-09-09 [2] CRAN (R 4.0.0)
##  gtable                 0.3.0    2019-03-25 [2] CRAN (R 4.0.0)
##  haven                  2.3.1    2020-06-01 [2] CRAN (R 4.0.2)
##  Hmisc                  4.4-1    2020-08-10 [2] CRAN (R 4.0.2)
##  hms                    0.5.3    2020-01-08 [2] CRAN (R 4.0.0)
##  htmlTable              2.1.0    2020-09-16 [2] CRAN (R 4.0.2)
##  htmltools              0.5.0    2020-06-16 [2] CRAN (R 4.0.2)
##  htmlwidgets            1.5.2    2020-10-03 [2] CRAN (R 4.0.2)
##  httr                   1.4.2    2020-07-20 [2] CRAN (R 4.0.2)
##  IRanges              * 2.22.2   2020-05-21 [2] Bioconductor  
##  iterators              1.0.13   2020-10-15 [2] CRAN (R 4.0.2)
##  jpeg                   0.1-8.1  2019-10-24 [2] CRAN (R 4.0.0)
##  jsonlite               1.7.1    2020-09-07 [2] CRAN (R 4.0.2)
##  knitcitations        * 1.0.10   2019-09-15 [1] CRAN (R 4.0.2)
##  knitr                  1.30     2020-09-22 [1] CRAN (R 4.0.2)
##  labeling               0.4.2    2020-10-20 [2] CRAN (R 4.0.2)
##  lattice                0.20-41  2020-04-02 [3] CRAN (R 4.0.2)
##  latticeExtra           0.6-29   2019-12-19 [2] CRAN (R 4.0.0)
##  lifecycle              0.2.0    2020-03-06 [2] CRAN (R 4.0.0)
##  limma                  3.44.3   2020-06-12 [2] Bioconductor  
##  locfit                 1.5-9.4  2020-03-25 [2] CRAN (R 4.0.0)
##  lubridate              1.7.9    2020-06-08 [1] CRAN (R 4.0.0)
##  magick                 2.5.2    2020-11-10 [1] CRAN (R 4.0.2)
##  magrittr               1.5      2014-11-22 [2] CRAN (R 4.0.0)
##  Matrix                 1.2-18   2019-11-27 [3] CRAN (R 4.0.2)
##  matrixStats          * 0.57.0   2020-09-25 [2] CRAN (R 4.0.2)
##  memoise                1.1.0    2017-04-21 [2] CRAN (R 4.0.0)
##  modelr                 0.1.8    2020-05-19 [1] CRAN (R 4.0.0)
##  munsell                0.5.0    2018-06-12 [2] CRAN (R 4.0.0)
##  nnet                   7.3-14   2020-04-26 [3] CRAN (R 4.0.2)
##  openssl                1.4.3    2020-09-18 [2] CRAN (R 4.0.2)
##  pillar                 1.4.6    2020-07-10 [2] CRAN (R 4.0.2)
##  pkgbuild               1.1.0    2020-07-13 [2] CRAN (R 4.0.2)
##  pkgconfig              2.0.3    2019-09-22 [2] CRAN (R 4.0.0)
##  pkgload                1.1.0    2020-05-29 [2] CRAN (R 4.0.2)
##  plyr                   1.8.6    2020-03-03 [2] CRAN (R 4.0.0)
##  png                    0.1-7    2013-12-03 [2] CRAN (R 4.0.0)
##  prettyunits            1.1.1    2020-01-24 [2] CRAN (R 4.0.0)
##  processx               3.4.4    2020-09-03 [2] CRAN (R 4.0.2)
##  progress               1.2.2    2019-05-16 [2] CRAN (R 4.0.0)
##  ps                     1.4.0    2020-10-07 [2] CRAN (R 4.0.2)
##  purrr                * 0.3.4    2020-04-17 [2] CRAN (R 4.0.0)
##  qvalue                 2.20.0   2020-04-27 [2] Bioconductor  
##  R6                     2.5.0    2020-10-28 [1] CRAN (R 4.0.2)
##  rappdirs               0.3.1    2016-03-28 [2] CRAN (R 4.0.0)
##  RColorBrewer           1.1-2    2014-12-07 [2] CRAN (R 4.0.0)
##  Rcpp                   1.0.5    2020-07-06 [2] CRAN (R 4.0.2)
##  RCurl                  1.98-1.2 2020-04-18 [2] CRAN (R 4.0.0)
##  readr                * 1.4.0    2020-10-05 [2] CRAN (R 4.0.2)
##  readxl                 1.3.1    2019-03-13 [2] CRAN (R 4.0.0)
##  recount              * 1.14.0   2020-04-27 [2] Bioconductor  
##  RefManageR             1.2.12   2019-04-03 [1] CRAN (R 4.0.2)
##  remotes                2.2.0    2020-07-21 [2] CRAN (R 4.0.2)
##  rentrez                1.2.2    2019-05-02 [2] CRAN (R 4.0.0)
##  reprex                 0.3.0    2019-05-16 [1] CRAN (R 4.0.0)
##  reshape2               1.4.4    2020-04-09 [2] CRAN (R 4.0.0)
##  rlang                  0.4.8    2020-10-08 [1] CRAN (R 4.0.2)
##  rmarkdown            * 2.5      2020-10-21 [1] CRAN (R 4.0.2)
##  rngtools               1.5      2020-01-23 [2] CRAN (R 4.0.0)
##  rpart                  4.1-15   2019-04-12 [3] CRAN (R 4.0.2)
##  rprojroot              1.3-2    2018-01-03 [2] CRAN (R 4.0.0)
##  Rsamtools              2.4.0    2020-04-27 [2] Bioconductor  
##  RSQLite                2.2.1    2020-09-30 [2] CRAN (R 4.0.2)
##  rstudioapi             0.11     2020-02-07 [2] CRAN (R 4.0.0)
##  rtracklayer            1.48.0   2020-04-27 [2] Bioconductor  
##  rvest                  0.3.6    2020-07-25 [2] CRAN (R 4.0.2)
##  S4Vectors            * 0.26.1   2020-05-16 [2] Bioconductor  
##  scales                 1.1.1    2020-05-11 [2] CRAN (R 4.0.0)
##  sessioninfo          * 1.1.1    2018-11-05 [2] CRAN (R 4.0.0)
##  stringi                1.5.3    2020-09-09 [2] CRAN (R 4.0.2)
##  stringr              * 1.4.0    2019-02-10 [2] CRAN (R 4.0.0)
##  SummarizedExperiment * 1.18.2   2020-07-09 [2] Bioconductor  
##  survival               3.2-3    2020-06-13 [3] CRAN (R 4.0.2)
##  testthat               3.0.0    2020-10-31 [1] CRAN (R 4.0.2)
##  tibble               * 3.0.4    2020-10-12 [2] CRAN (R 4.0.2)
##  tidyr                * 1.1.2    2020-08-27 [2] CRAN (R 4.0.2)
##  tidyselect             1.1.0    2020-05-11 [2] CRAN (R 4.0.0)
##  tidyverse            * 1.3.0    2019-11-21 [1] CRAN (R 4.0.0)
##  usethis              * 1.6.3    2020-09-17 [2] CRAN (R 4.0.2)
##  VariantAnnotation      1.34.0   2020-04-27 [2] Bioconductor  
##  vctrs                  0.3.4    2020-08-29 [1] CRAN (R 4.0.2)
##  withr                  2.3.0    2020-09-22 [2] CRAN (R 4.0.2)
##  xfun                   0.19     2020-10-30 [1] CRAN (R 4.0.2)
##  XML                    3.99-0.5 2020-07-23 [2] CRAN (R 4.0.2)
##  xml2                   1.3.2    2020-04-23 [2] CRAN (R 4.0.0)
##  XVector                0.28.0   2020-04-27 [2] Bioconductor  
##  yaml                   2.2.1    2020-02-01 [2] CRAN (R 4.0.0)
##  zlibbioc               1.34.0   2020-04-27 [2] Bioconductor  
## 
## [1] /users/neagles/R/4.0
## [2] /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-4.0/R/4.0/lib64/R/site-library
## [3] /jhpce/shared/jhpce/core/conda/miniconda3-4.6.14/envs/svnR-4.0/R/4.0/lib64/R/library

5 Bibliography

This document was generated using BiocStyle (Oleś, Morgan, and Huber, 2020) with knitr (Xie, 2014) and rmarkdown (Allaire, Xie, McPherson, Luraschi, et al., 2020) running behind the scenes.

Citations were made with knitcitations (Boettiger, 2019); the bibliography file is available here.

Bibliography file

[1] J. Allaire, Y. Xie, J. McPherson, J. Luraschi, et al. rmarkdown: Dynamic Documents for R. R package version 2.5. 2020. <URL: https://github.com/rstudio/rmarkdown>.

[2] M. N. Bernstein, A. Doan, and C. N. Dewey. “MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive”. In: Bioinformatics 33.18 (May. 2017). Ed. by J. Wren, pp. 2914-2923. DOI: 10.1093/bioinformatics/btx334. <URL: https://doi.org/10.1093/bioinformatics/btx334>.

[3] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.10. 2019. <URL: https://CRAN.R-project.org/package=knitcitations>.

[4] G. Csárdi, R. core, H. Wickham, W. Chang, et al. sessioninfo: R Session Information. R package version 1.1.1. 2018. <URL: https://CRAN.R-project.org/package=sessioninfo>.

[5] S. E. Ellis, L. Collado-Torres, A. E. Jaffe, and J. T. Leek. “Improving the value of public RNA-seq expression data by phenotype prediction”. In: Nucl. Acids Res. (2018). DOI: 10.1093/nar/gky102. <URL: https://doi.org/10.1093/nar/gky102>.

[6] A. Oleś, M. Morgan, and W. Huber. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.16.1. 2020. <URL: https://github.com/Bioconductor/BiocStyle>.

[7] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2020. <URL: https://www.R-project.org/>.

[8] H. Wickham, M. Averick, J. Bryan, W. Chang, et al. “Welcome to the tidyverse”. In: Journal of Open Source Software 4.43 (2019), p. 1686. DOI: 10.21105/joss.01686.

[9] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. <URL: http://www.crcpress.com/product/isbn/9781466561595>.