# Obtain rRNA information

## rRNA (**ribosomal RNA**)

### **What is rRNA?**

rRNA (ribosomal RNA) is the primary component of ribosomes, the molecular machines responsible for cellular protein synthesis. rRNA interacts with mRNA and tRNA to catalyze peptide bond formation, playing a crucial role in translation.

rRNA can be classified into the following types based on function and location:

* **5S rRNA**: found in the large ribosomal subunit, primarily involved in maintaining ribosome structure.
* **16S rRNA (prokaryotes) / 18S rRNA (eukaryotes)**: found in the small ribosomal subunit, responsible for mRNA recognition and translation initiation.
* **23S rRNA (prokaryotes) / 28S rRNA (eukaryotes)**: found in the large ribosomal subunit, involved in peptide bond formation and translation elongation.
* **5.8S rRNA (eukaryotes)**: found in the large ribosomal subunit, working with 28S and 5S rRNA to maintain ribosome function.

### **Presence in RNA sequencing experiments**

In RNA sequencing (RNA-seq) experiments, rRNA is present mainly for the following reasons:

* **High abundance of rRNA**: rRNA constitutes 80%-90% of total cellular RNA, making it the most abundant RNA type.
* **Non-specific capture in experimental steps**: during RNA extraction and library preparation, rRNA may be non-specifically captured and included in the sequencing library.
* **No/Incomplete rRNA removal**: without specific rRNA removal kits, or even when such kits are used, some rRNA may still remain.

rRNA is the most abundant RNA type in cells and is inevitably present in RNA-seq experiments, so its sequences occupy a significant portion of the sequencing data. Using rRNA removal kits during these experiments can reduce sequencing depth requirements, thereby lowering costs. rRNA sequences do not contain information about target gene expression and may interfere with the quantification of target gene expression and differential expression analysis.
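The effect of rRNA content on sequencing depth can be made concrete with a little arithmetic. The sketch below assumes that reads are distributed in proportion to RNA abundance, which is a simplification; the function name `required_reads` and the example fractions are illustrative and not part of SAW.

```sh
# Estimate the total number of reads needed to obtain a target number of
# non-rRNA reads, given the fraction of reads expected to come from rRNA.
# Usage: required_reads <target_non_rrna_reads> <rrna_fraction>
required_reads() {
    awk -v target="$1" -v frac="$2" 'BEGIN {
        # A fraction of 1 would mean no informative reads at all.
        if (frac < 0 || frac >= 1) {
            print "rRNA fraction must be in [0, 1)" > "/dev/stderr"
            exit 1
        }
        # total = informative reads / share of reads that are informative
        printf "%.0f\n", target / (1 - frac)
    }'
}
```

For instance, `required_reads 20000000 0.85` gives roughly 133 million total reads for 20 million informative reads, whereas a hypothetical residual rRNA fraction of 0.10 after depletion (`required_reads 20000000 0.10`) needs only about 22 million.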
To enhance the effective utilization of sequencing data and improve the accuracy of data analysis, it is necessary to remove rRNA during both experimental and computational steps.

## Obtain from [RNAcentral](https://rnacentral.org/)

[RNAcentral](https://rnacentral.org/) is a comprehensive non-coding RNA (ncRNA) database developed by the European Bioinformatics Institute (EBI). It integrates ncRNA data from multiple expert databases (e.g., Ensembl, GENCODE, miRBase, Rfam) to provide a unified reference platform for ncRNA research.

### **Search rRNA information**

{% hint style="info" %}
The following three search methods are provided on the homepage:

* "Text search" searches RNA records based on the provided keywords.
* "Sequence search" aligns an unknown input fragment against the databases to retrieve specific RNA information.
* "Genome browser" lets analysts select a species, specify a chromosome location, and view the distribution of genes and sequences within a target interval.
{% endhint %}

"Text search" is recommended for rRNA information. When you know details of the target rRNA, such as its name, species, tissue type, sequence length, or RNA type (5S, 18S, etc.), type them into the search window and select the appropriate qualifiers based on your analysis requirements.

For example, searching for 18S rRNA of Homo sapiens (human) displays several rRNA records. The source database of each RNA is indicated below its search record. Download the needed rRNA information in FASTA format.
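A downloaded FASTA file can be decompressed and sanity-checked before it is passed to any tool. The following is a minimal sketch; the function name `unpack_fasta` is illustrative, and it assumes the download is a standard gzip file whose name ends in `.gz`.

```sh
# Decompress a gzip-compressed FASTA file (keeping the original .gz) and
# print how many sequence records the decompressed file contains.
# Usage: unpack_fasta <file.fasta.gz>
unpack_fasta() {
    gz="$1"
    fasta="${gz%.gz}"                      # strip the .gz suffix for the output name
    gunzip -c "$gz" > "$fasta" || return 1
    # Each FASTA record starts with a '>' header line.
    grep -c '^>' "$fasta"
}
```

A count of zero suggests that the download is empty or is not a FASTA file.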

{% hint style="warning" %}
A downloaded rRNA FASTA file is compressed as `*.fasta.gz`. Remember to decompress it with `gunzip` first.
{% endhint %}

### Compiled rRNA index files

For ease of use, the STOmics R\&D team has compiled common rRNA information for Homo sapiens (human) and Mus musculus (mouse). You can directly download the STAR and Bowtie2 index files, which include rRNA information, from [our datasets](/saw-user-manual-v8.2/download-center.md#reference-download).


**reference-data-mouse-rRNA.tar.gz**

File size: 28.03 GB, md5sum: `6fa47b14dc26321d1cab691baee4fb2f`

{% tabs %}
{% tab title="wget" %}
{% code overflow="wrap" %}
```sh
wget -c https://demo.stomicsdb.tech/STOmics_Reference_Released/Transcriptome/reference-data-mouse-rRNA.tar.gz
```
{% endcode %}
{% endtab %}

{% tab title="curl" %}
{% code overflow="wrap" %}
```shellscript
curl -C - -O https://demo.stomicsdb.tech/STOmics_Reference_Released/Transcriptome/reference-data-mouse-rRNA.tar.gz
```
{% endcode %}
{% endtab %}
{% endtabs %}
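Because these archives are large, it is worth confirming that a download is intact by comparing its checksum with the published md5sum. The sketch below assumes the GNU coreutils `md5sum` command is available (on macOS the equivalent is `md5`); the function name `verify_md5` is illustrative.

```sh
# Compare the MD5 checksum of a file with an expected value.
# Usage: verify_md5 <file> <expected_md5>
verify_md5() {
    # md5sum prints "<checksum>  <filename>"; keep only the checksum.
    actual=$(md5sum "$1" | awk '{print $1}')
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (got $actual)" >&2
        return 1
    fi
}
```

For the mouse archive, this would be called as `verify_md5 reference-data-mouse-rRNA.tar.gz 6fa47b14dc26321d1cab691baee4fb2f`. If the check fails, re-running the `wget -c` or `curl -C -` command resumes the download.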


**reference-data-human-rRNA.tar.gz**

File size: 31.47 GB, md5sum: `a86ceda324fa300d18f48b77502e5274`

{% tabs %}
{% tab title="wget" %}
{% code overflow="wrap" %}
```sh
wget -c https://demo.stomicsdb.tech/STOmics_Reference_Released/Transcriptome/reference-data-human-rRNA.tar.gz
```
{% endcode %}
{% endtab %}

{% tab title="curl" %}
{% code overflow="wrap" %}
```shellscript
curl -C - -O https://demo.stomicsdb.tech/STOmics_Reference_Released/Transcriptome/reference-data-human-rRNA.tar.gz
```
{% endcode %}
{% endtab %}
{% endtabs %}

## Remove rRNA

{% hint style="success" %}
If you plan to remove rRNA fragments during SAW analysis, make sure that:

* specific rRNA information has been added to the transcriptomic reference.
* the `--rRNA-remove` parameter is used when starting the `SAW count` analysis.
{% endhint %}

### Add rRNA information to reference

Use `--rRNA-fasta` to specify the input rRNA information, which will be added to `--fasta` after redundancy removal.

{% hint style="info" %}
Key steps of the processing:

**Step 1:** because the rRNA fragments given by `--rRNA-fasta` are short and highly repetitive, the pipeline removes their redundancy first.

**Step 2:** add rRNA information to the `--fasta` file(s), with the suffix '\_rRNA' on the chromosome, like '1\_rRNA', to distinguish rRNA entries from the basic genome.

**Step 3:** build index files using the genome integrated with de-duplicated rRNA information.
{% endhint %}

```sh
cd /saw/datasets/reference

saw makeRef \
    --mode=STAR \
    --fasta=/path/to/FASTA \
    --rRNA-fasta=/path/to/rRNA/FASTA \
    --gtf=/path/to/GTF/or/GFF \
    --genome=./transcriptome_with_rRNA
```
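To illustrate the idea behind the redundancy removal and '\_rRNA' naming steps, here is a rough sketch that is not SAW's implementation. It assumes that "redundancy" means exact duplicate sequences and that the suffix is appended to each record ID; the actual criteria used by `saw makeRef` are not described on this page, and the function name `dedup_rrna_fasta` is hypothetical. There is no need to run anything like this before `saw makeRef`, since the pipeline performs these steps itself.

```sh
# Illustration only: collapse exact-duplicate sequences in a FASTA file and
# append "_rRNA" to the ID of each retained record.
# Usage: dedup_rrna_fasta <in.fasta> > <out.fasta>
dedup_rrna_fasta() {
    awk '
        # Print the buffered record unless its sequence was already seen.
        function emit() {
            if (name != "" && !(seq in seen)) {
                seen[seq] = 1
                print ">" name "_rRNA"
                print seq
            }
        }
        /^>/ {
            emit()
            split(substr($0, 2), parts, " ")   # keep the ID before the first space
            name = parts[1]
            seq = ""
            next
        }
        { seq = seq toupper($0) }              # join wrapped lines, ignore case
        END { emit() }
    ' "$1"
}
```

Only the first record of each distinct sequence is kept, so the output order follows the input order.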

### Run count analysis

Let's take a simple analysis of FFPE data as an example:

```sh
cd /saw/runs

saw count \
    --id=rRNA_removal \
    --sn=<SN> \
    --omics=transcriptomics \
    --kit-version="Stereo-seq N FFPE V1.1" \
    --sequencing-type="PE75_50+100" \
    --chip-mask=/path/to/chip/mask \
    --fastqs=/path/to/fastq/folders \
    --image-tar=/path/to/image/tar \
    --reference=/path/to/reference/transcriptome_with_rRNA \
    --rRNA-remove
```
