Obtain rRNA information

rRNA (ribosomal RNA)

What is rRNA?

rRNA (ribosomal RNA) is the primary component of ribosomes, the molecular machines responsible for protein synthesis in cells. rRNA interacts with mRNA and tRNA to catalyze peptide bond formation, playing a crucial role in translation. rRNA can be classified into the following types based on function and location:

  • 5S rRNA: found in the large ribosomal subunit, primarily involved in maintaining ribosome structure.

  • 16S rRNA (prokaryotes) / 18S rRNA (eukaryotes): found in the small ribosomal subunit, responsible for mRNA recognition and translation initiation.

  • 23S rRNA (prokaryotes) / 28S rRNA (eukaryotes): found in the large ribosomal subunit, involved in peptide bond formation and translation elongation.

  • 5.8S rRNA (eukaryotes): found in the large ribosomal subunit, working with 28S and 5S rRNA to maintain ribosome function.

Presence in RNA sequencing experiments

In RNA sequencing (RNA-seq) experiments, rRNA is present mainly for the following reasons:

  • High abundance of rRNA: rRNA constitutes 80%-90% of total cellular RNA, making it the most abundant RNA type.

  • Non-specific capture in experimental steps: during RNA extraction and library preparation, rRNA may be non-specifically captured and included in the sequencing library.

  • No or incomplete rRNA removal: if no specific rRNA removal kit is used, rRNA stays in the library; even when such a kit is used, some rRNA may still remain.

rRNA is the most abundant RNA type in cells and is inevitably present in RNA-seq experiments, so its sequences occupy a significant portion of the sequencing data. Using rRNA removal kits during these experiments reduces the sequencing depth required, thereby lowering costs.
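As a rough illustration of the depth argument, the sketch below estimates how many total reads are needed to obtain a target number of non-rRNA reads. It assumes that rRNA reads make up a fixed fraction of the library and that every other read is usable. The function name, the 20 million read target, and the 5% residual rRNA fraction are hypothetical values chosen for illustration.

```python
def total_reads_needed(target_informative_reads: int, rrna_fraction: float) -> float:
    """Total reads required so that the non-rRNA share reaches the target."""
    if not 0.0 <= rrna_fraction < 1.0:
        raise ValueError("rrna_fraction must be in [0, 1)")
    return target_informative_reads / (1.0 - rrna_fraction)


# Hypothetical example: 20 million non-rRNA reads are wanted.
without_depletion = total_reads_needed(20_000_000, 0.85)  # rRNA at 85% of reads
with_depletion = total_reads_needed(20_000_000, 0.05)     # assumed 5% residual rRNA
print(f"{without_depletion / 1e6:.0f}M vs {with_depletion / 1e6:.0f}M total reads")
```

Under these assumptions, a library with 85% rRNA needs roughly 133 million reads to reach the same informative depth that a depleted library reaches with about 21 million.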

rRNA sequences do not contain information about target gene expression and may interfere with the quantification of target gene expression and differential expression analysis. To enhance the effective utilization of sequencing data and improve the accuracy of data analysis, it is necessary to remove rRNA during both experimental and computational steps.

RNAcentral is a comprehensive non-coding RNA (ncRNA) database developed by the European Bioinformatics Institute (EBI). It integrates ncRNA data from multiple expert databases (e.g., Ensembl, GENCODE, miRBase, Rfam) to provide a unified reference platform for ncRNA research.

Search rRNA information

The following three search methods are provided on the homepage:

  • "Text search" searches the RNA sequences based on the provided keywords.

  • "Sequence search" aligns an unknown input sequence against the databases to retrieve matching RNA records.

  • "Genome browser" lets analysts select a species, specify a chromosome location, and view the distribution of genes and sequences within a target interval.

"Text search" is recommended for rRNA information. If you know details about the target rRNA, such as its name, species, tissue type, sequence length, or RNA type (5S, 18S, etc.), type them into the search window. Choose the qualifiers that match your analysis requirements.

When searching for 18S rRNA of Homo sapiens (human), several rRNA records are displayed. The source database of each RNA is indicated below its search record. Download the rRNA sequences you need in FASTA format.
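After downloading, it can help to check that a FASTA file contains only the records you intended to keep. The following is a minimal sketch that keeps records whose header line contains every given keyword. The helper names and the example records are made up for illustration; real header text depends on what the database exports.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(lines: Iterable[str], keywords: List[str]) -> List[Tuple[str, str]]:
    """Keep records whose header contains every keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [(h, s) for h, s in read_fasta(lines)
            if all(k in h.lower() for k in wanted)]


# Made-up records for illustration only; real headers come from the download.
example = [
    ">ID_A Homo sapiens 18S ribosomal RNA",
    "ACGTACGT",
    "ACGT",
    ">ID_B Mus musculus 18S ribosomal RNA",
    "TTGGCCAA",
]
hits = select_records(example, ["Homo sapiens", "18S"])
print(hits)
```

To apply this to a downloaded file, pass an open file handle instead of the example list, since a text file handle is also an iterable of lines.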

Figure: Search in RNAcentral

Compiled rRNA index files

For convenience, the STOmics R&D team has compiled common rRNA information for Homo sapiens (human) and Mus musculus (mouse). You can download the STAR and Bowtie2 index files, which already include the rRNA information, directly from our datasets.

File size: 28.03GB md5sum: 6fa47b14dc26321d1cab691baee4fb2f

File size: 31.47GB md5sum: a86ceda324fa300d18f48b77502e5274
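Because these downloads are large, it is worth verifying the checksum before using them. The sketch below computes an MD5 digest in chunks so the whole file does not have to fit in memory, then compares it with the published value. The function names and the file name in the usage comment are hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path: str, expected: str) -> bool:
    """Compare a file's MD5 digest with the expected hex string."""
    return md5_of_file(path) == expected.strip().lower()


# Usage (hypothetical file name, checksum taken from the list above):
# matches_md5("downloaded_index_file", "6fa47b14dc26321d1cab691baee4fb2f")
```

On Linux, running `md5sum` on the file from the command line gives the same digest.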

Remove rRNA

Add rRNA information to reference

Use --rRNA-fasta to specify the rRNA FASTA file separately from the genome. After redundancy removal, its sequences are added to the reference given by --fasta.

Key steps of the processing:

Step 1: because the rRNA fragments in --rRNA-fasta are short and highly repetitive, the pipeline first removes redundant sequences.

Step 2: add the rRNA sequences to the --fasta file(s), with the suffix '_rRNA' on the chromosome name (for example, '1_rRNA') to distinguish rRNA entries from the base genome.

Step 3: build index files using the genome integrated with de-duplicated rRNA information.
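The first two steps can be pictured with the short sketch below. It is not the pipeline's implementation: it assumes that redundancy removal means dropping exact duplicate sequences, and that rRNA entries are renamed with a running number plus the '_rRNA' suffix. All function names are hypothetical, and Step 3 (index building) is left to the pipeline.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (sequence name, sequence)


def dedupe_rrna(records: Iterable[Record]) -> List[Record]:
    """Step 1 (assumed): drop exact duplicate sequences, ignoring case."""
    seen = set()
    unique = []
    for name, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((name, seq))
    return unique


def tag_rrna(records: Iterable[Record]) -> List[Record]:
    """Step 2 (assumed naming): rename records to '1_rRNA', '2_rRNA', ..."""
    return [(f"{i}_rRNA", seq) for i, (_, seq) in enumerate(records, start=1)]


def merge_reference(genome: Iterable[Record], rrna: Iterable[Record]) -> List[Record]:
    """Append tagged, de-duplicated rRNA records after the genome records."""
    return list(genome) + tag_rrna(dedupe_rrna(rrna))


def to_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA text with fixed-width sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy data for illustration only.
genome = [("1", "ACGTACGTAC"), ("2", "GGGTTTAAAC")]
rrna = [("r1", "ACGT"), ("r2", "acgt"), ("r3", "TTGA")]
merged = merge_reference(genome, rrna)
print(to_fasta(merged))
```

In this toy example the second rRNA record duplicates the first, so only two rRNA entries are appended, and the '_rRNA' suffix keeps them distinguishable from chromosomes '1' and '2'.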

Run count analysis

Let's take a simple analysis of FFPE data as an example:
