RNA-seq Data Analysis Capability in KBase

Introduction

KBase offers a powerful suite of expression analysis tools. Starting with short reads, you can use the tool suite to assemble and quantify full-length transcripts, identify differentially expressed genes, cluster those genes, and analyze the clusters as functionally enriched modules. When studying metabolic models in KBase, you can also compare expression data with reaction flux to identify pathways where expression and flux agree or conflict.

Narrative Tutorials

You can copy these tutorials and re-run any of the steps (perhaps changing parameters or using your own data) in your KBase account.

  1. Arabidopsis RNA-seq Analysis using New Tuxedo Package
  2. Arabidopsis RNA-seq Analysis using Original Tuxedo Package
  3. E. coli RNA-seq Analysis using New Tuxedo Package

Why Run RNA-seq and Expression Analysis in KBase

Whether you are a beginner or advanced user of RNA-seq, you will find that KBase’s expression analysis tool suite offers a number of advantages, including some unique to KBase:

  1. Modular (Plug-n-Play): We offer you the flexibility to pick and choose from an array of available Apps for a given step of the pipeline, while the pipeline works seamlessly end-to-end.
  2. Easy and powerful interface: The Apps are designed to be easy for beginners to use. Most advanced options are hidden by default but remain available to advanced users.
  3. Extensible: Because it is built on standard data types, the tool suite is easy to extend by third-party developers who want to add new tools.
  4. Easy upload and download: You can easily upload reads, genomes and expression matrices from local files and remote public sites. You can also download BAM files, expression matrices and tool outputs for a number of intermediate steps.
  5. Unrestricted data and compute: You can store and analyze an unrestricted amount of RNA-seq data in a reasonable time and free of charge.
  6. Well documented: You can refer to a variety of documentation for apps, helpful Narrative tutorials and broader KBase documentation.
  7. Active support: You can contact us through the KBase Help Board to ask questions or share feedback.

Prerequisites for RNA-seq Analysis

We support the popular Tuxedo suite of tools (original and new) for RNA-seq analysis. As a result, KBase requires a reference genome to guide the analysis of short reads. The prerequisites are:

  1. Import Genome: Use the Public tab in the Data Panel to choose the reference genome from KBase’s public data. If it is not available there, use the Import tab in the Data Panel to import the genome of interest into your Narrative.
  2. Import Short Reads: Use the Import tab or any of the reads uploader apps from the Apps Panel to import the short reads from your experiment into your Narrative. Example reads are also available from the Public tab. The reads must be a set of single-end, paired-end or interleaved paired-end reads in FASTA, FASTQ or SRA format.
  3. Create Sample Set: Run the Create RNA-seq Sample Set app to group together your reads into an RNA-seq sample set with associated experimental metadata, so that you can easily and efficiently run the RNA-seq Apps in batch mode wherever appropriate.
  4. QC Sample Set: Run FastQC to assess the quality of the reads in the sample set from the previous step and, if needed, run Trimmomatic, Cutadapt or PRINSEQ to trim or filter the reads before starting RNA-seq analysis.
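
To give a feel for what the QC step in item 4 measures, here is a minimal Python sketch that computes the mean base quality of each read in FASTQ text and flags low-quality reads. It is not how FastQC or the trimming Apps work internally. The function names, the example reads and the threshold of 20 are illustrative assumptions, and Phred+33 quality encoding is assumed.

```python
from statistics import mean


def parse_fastq(text):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    lines = [ln for ln in text.splitlines() if ln]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@' of the header


def mean_phred(qual, offset=33):
    """Mean Phred score of one read, assuming Phred+33 encoding."""
    return mean(ord(ch) - offset for ch in qual)


def low_quality_reads(text, min_mean_quality=20):
    """Return IDs of reads whose mean quality is below the (assumed) threshold."""
    return [rid for rid, _seq, qual in parse_fastq(text)
            if mean_phred(qual) < min_mean_quality]


# Two made-up reads: 'I' encodes Phred 40, '#' encodes Phred 2.
EXAMPLE_FASTQ = """@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTACGT
+
########
"""

flagged = low_quality_reads(EXAMPLE_FASTQ)  # only read2 falls below 20
```

In practice you would rely on the FastQC report in KBase rather than a hand-written check like this one; the sketch only shows the kind of per-read quality signal that motivates trimming or filtering.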

RNA-seq Analysis

The RNA-seq pipeline in KBase is modular and consists of three steps. For each step, you can pick any of the available Apps, depending on your preference or the characteristics of each App.

  1. Read Alignment: Run the Bowtie app or the splice-aware TopHat2, HISAT2 or STAR apps to map short reads to the reference genome. The output is a set of BAM alignments.
  2. Transcriptome Assembly and Quantification: Run the Cufflinks or StringTie app on the read alignments from the previous step to assemble full-length transcripts and quantify transcripts and genes as appropriate.
  3. Differential Gene Expression: Run the Cuffdiff, Ballgown or DESeq2 app to generate gene-level or transcript-level differential expression based on the quantification from the previous step.
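
The quantification and differential expression steps above ultimately produce normalized abundances and fold changes. The Python sketch below shows only the basic arithmetic behind two common abundance units, FPKM and TPM, and a log2 fold change. It is not the statistical model used by Cufflinks, StringTie, Cuffdiff, Ballgown or DESeq2. The gene names, counts, lengths and the pseudocount of 1 are made-up assumptions for illustration.

```python
import math


def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rate = {g: counts[g] / lengths_bp[g] for g in counts}
    scale = sum(rate.values())
    return {g: rate[g] * 1e6 / scale for g in rate}


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of two abundances; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical fragment counts and transcript lengths (in base pairs).
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths = {"geneA": 1000, "geneB": 1500, "geneC": 3000}

fpkm_values = fpkm(counts, lengths)
tpm_values = tpm(counts, lengths)   # values sum to one million
lfc = log2_fold_change(7.0, 1.0)    # log2(8 / 2) = 2.0
```

Note that geneB and geneC receive the same normalized abundance even though geneC has twice as many fragments, because geneC is also twice as long.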

Downstream Expression Analysis

KBase offers a number of Apps to filter, cluster and visualize feature sets derived from RNA-seq differential expression, and to test them for functional enrichment. The expression data from RNA-seq can also be integrated into metabolic models to identify pathways where expression and flux agree or conflict.

  1. Filtering: You can create a filtered expression matrix and associated feature set based on fold-change or adjusted p-value. You can also filter an expression matrix based on log-odds ratio (LOR) or ANOVA.
  2. Clustering: Depending on your preference, run the Hierarchical, K-Means or WGCNA clustering App to group features into clusters based on gene expression. You can also visualize the clusters as an interactive heatmap.
  3. Functional Enrichment: Assess the functional enrichment for a set of features using associated GO terms.
  4. Integration into Metabolic Models: Assimilate the expression data from RNA-seq into the metabolic models to compare reaction fluxes with gene expression and thus identify pathways where expression and flux agree or conflict.
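
As a rough illustration of the filtering and functional enrichment steps in the list above, the Python sketch below adjusts p-values with the Benjamini-Hochberg procedure, keeps features that pass fold-change and adjusted p-value cutoffs, and scores one annotation term with a hypergeometric test. These are standard techniques chosen for illustration and are not necessarily what the KBase Apps implement. The cutoffs, feature IDs, term membership and universe size are all made-up assumptions.

```python
from math import comb


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_features(results, min_abs_log2fc=1.0, max_padj=0.05):
    """results maps feature ID -> (log2 fold change, raw p-value)."""
    ids = list(results)
    padj = benjamini_hochberg([results[f][1] for f in ids])
    return [f for f, q in zip(ids, padj)
            if abs(results[f][0]) >= min_abs_log2fc and q <= max_padj]


def enrichment_pvalue(selected, annotated, universe_size):
    """Hypergeometric P(overlap >= observed) for one annotation term.

    Assumes both sets are drawn from the same universe of features.
    """
    n, big_k = len(set(selected)), len(set(annotated))
    k = len(set(selected) & set(annotated))
    tail = sum(comb(big_k, i) * comb(universe_size - big_k, n - i)
               for i in range(k, min(big_k, n) + 1))
    return tail / comb(universe_size, n)


# Hypothetical differential expression results for five features.
results = {
    "g1": (2.5, 0.001),
    "g2": (-1.8, 0.004),
    "g3": (0.2, 0.03),
    "g4": (1.2, 0.60),
    "g5": (-3.0, 0.01),
}
feature_set = filter_features(results)  # g1, g2 and g5 pass both cutoffs

# Hypothetical term annotating 3 of 20 features in the genome.
term_members = {"g1", "g2", "g7"}
p_enrich = enrichment_pvalue(feature_set, term_members, universe_size=20)
```

Here g3 is dropped for its small fold change and g4 for its large adjusted p-value, and the resulting feature set shares two members with the example term.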

The RNA-seq analysis suite is still in development. Please report any issues or suggestions to us.