# BAM

A BAM file is a binary file that stores sequence alignment and gene annotation data. The BAM file produced by `SAW count` adds custom tags to the BAM optional fields to record read coordinates, CID, and MID information. Annotation information is stored in these tag fields as well.

## Tags

The custom tags are described in the table below.

<table><thead><tr><th width="141">Tag</th><th>Description</th></tr></thead><tbody><tr><td>Cx:i</td><td>x coordinate of the Coordinate ID (CID).</td></tr><tr><td>Cy:i</td><td>y coordinate of the Coordinate ID (CID).</td></tr><tr><td>UR:Z</td><td>Hexadecimal representation of the uncorrected binary-encoded MID.</td></tr><tr><td>XF:i</td><td>Mapping region on the reference genome. Valid values: 0=EXONIC, 1=INTRONIC, 2=INTERGENIC, 3=rRNA.</td></tr><tr><td>GI:Z</td><td>Annotated gene ID.</td></tr><tr><td>GE:Z</td><td>Annotated gene name.</td></tr><tr><td>GS:Z</td><td>‘+’ or ‘-’, indicating the forward or reverse strand, respectively.</td></tr><tr><td>UB:Z</td><td>Hexadecimal representation of the corrected binary-encoded MID.</td></tr></tbody></table>

Example of the raw BAM:

```text
E100026571L1C009R00301275185    16      1       3000095 255     26M121066N74M   *       0       0       GGCTTTTTTTTTTTTTTTTTTTTTTTTTTTCTAAATATTGGGTTTTATTAGCACCATGATAACTGTATATTAATTTGCACTGACTGTCATAACAAAATAC      G+:GFFGGFGFFGFFGFGGFFGFFFFFCFGFCFGGGFGGFGFFFFGGFGGFGFFFGGFFGFFFGFGFGFFGFFGFGFFFFGFFFFFFFFGGFFGGFFGEF    NH:i:1  HI:i:1    AS:i:88 nM:i:0  Cx:i:4826       Cy:i:11598      UR:Z:6FA29
```

Example of the annotated BAM:

```text
E100026571L1C002R00703943265    1040    1       3082766 255     11M132671N89M   *       0       0       CTGCTGCAGCTTTTTTTTCTTTGAGATTTATTTTTATGCTATGTGTATGGGTATTTTGCCTGCATATATGTCTATGCACCATGTGTGTGCAGTGCTTGAG    FFFFFECGFDCFGDGDFEE@EEGIBFGGCGFFGACGFCGFFDGDGFFFFFFEGCDFCGFFGG@FFF=EFFDGGGGGFDGFFFGGGFGFFGGGFFGGGDFG    NH:i:1  HI:i:1  AS:i:88 nM:i:0  Cx:i:7767       Cy:i:18052      UR:Z:7AE49      XF:i:0  GI:Z:ENSMUSG00000051951 GE:Z:Xkr4       GS:Z:-  UB:Z:79E49
```
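
To make the tag layout concrete, here is a minimal Python sketch that reads the optional fields of one SAM-format text record (for example, a line as printed by a BAM viewer) into a dictionary. The `parse_tags` helper and the `XF_REGIONS` lookup are illustrative names and not part of SAW. The record reuses the tag values from the annotated example above, with SEQ and QUAL replaced by `*` to keep it short.

```python
# Illustrative sketch only: parse_tags and XF_REGIONS are hypothetical helpers, not SAW APIs.
# Region codes are taken from the XF:i row of the tag table.
XF_REGIONS = {0: "EXONIC", 1: "INTRONIC", 2: "INTERGENIC", 3: "rRNA"}


def parse_tags(sam_line):
    """Return {tag: value} for the optional fields of one tab-separated SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 SAM columns are mandatory
        tag, type_code, value = field.split(":", 2)
        # Type 'i' is an integer; other types (such as 'Z') are kept as strings.
        tags[tag] = int(value) if type_code == "i" else value
    return tags


record = "\t".join([
    "E100026571L1C002R00703943265", "1040", "1", "3082766", "255",
    "11M132671N89M", "*", "0", "0",
    "*", "*",  # SEQ and QUAL omitted here for brevity
    "NH:i:1", "HI:i:1", "AS:i:88", "nM:i:0",
    "Cx:i:7767", "Cy:i:18052", "UR:Z:7AE49",
    "XF:i:0", "GI:Z:ENSMUSG00000051951", "GE:Z:Xkr4", "GS:Z:-", "UB:Z:79E49",
])

tags = parse_tags(record)
coordinate = (tags["Cx"], tags["Cy"])
region = XF_REGIONS[tags["XF"]]
mid_was_corrected = tags["UR"] != tags["UB"]  # UR is the raw MID, UB the corrected one
```

In this record the `UR` and `UB` values differ, so comparing the two tags is one simple way to spot reads whose MID was changed by correction.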

## Statistics for alignment

After the FASTQ reads are aligned, a statistics file recording details and output information is saved to `/STEREO_ANALYSIS_WORKFLOW/ALIGNMENT/<lane>.CIDMap.stat`.

<table><thead><tr><th width="298">Metric</th><th>Description</th></tr></thead><tbody><tr><td>Number of CID in chip mask</td><td>Number of CIDs in the chip mask file</td></tr><tr><td>Number of unique CID in FASTQ</td><td>Number of unique CIDs in FASTQs</td></tr><tr><td>Number of total reads</td><td>Number of total reads in FASTQs</td></tr><tr><td>Q10 in CID %</td><td>Ratio of Q10 CID bases</td></tr><tr><td>Q20 in CID %</td><td>Ratio of Q20 CID bases</td></tr><tr><td>Q30 in CID %</td><td>Ratio of Q30 CID bases</td></tr><tr><td>Number of mapped CID</td><td>Number of reads mapped to CID</td></tr><tr><td>% of mapped CID</td><td>Ratio of reads mapped to CID</td></tr><tr><td>Number of exactly mapped CID</td><td>Number of reads exactly mapped to CID</td></tr><tr><td>% of exactly mapped CID</td><td>Ratio of reads exactly mapped to CID</td></tr><tr><td>Number of CID with mismatch</td><td>Number of reads mapped to CID with mismatch</td></tr><tr><td>% of CID with mismatch</td><td>Ratio of reads mapped to CID with mismatch</td></tr><tr><td>Q10 in RNA %</td><td>Ratio of Q10 RNA bases</td></tr><tr><td>Q20 in RNA %</td><td>Ratio of Q20 RNA bases</td></tr><tr><td>Q30 in RNA %</td><td>Ratio of Q30 RNA bases</td></tr><tr><td>Number of reads with polyA</td><td>Number of reads with polyA sequence</td></tr><tr><td>% of reads with polyA</td><td>Ratio of reads with polyA sequence</td></tr><tr><td>Number of short reads (trim polyA)</td><td>Number of short reads after trimming polyA sequence</td></tr><tr><td>% of short reads (trim polyA)</td><td>Ratio of short reads after trimming polyA sequence</td></tr><tr><td>Number of reads with adapter</td><td>Number of reads with adapter sequence</td></tr><tr><td>% of reads with adapter</td><td>Ratio of reads with adapter sequence</td></tr><tr><td>Number of short reads (trim adapter)</td><td>Number of short reads after trimming adapter sequence</td></tr><tr><td>% of short reads (trim adapter)</td><td>Ratio of short reads after trimming adapter sequence</td></tr><tr><td>Number of reads filtered with DNB</td><td>Number of reads with DNB sequence</td></tr><tr><td>% of reads filtered with DNB</td><td>Ratio of reads with DNB sequence</td></tr><tr><td>Q10 in clean RNA %</td><td>Ratio of Q10 RNA bases after filtering</td></tr><tr><td>Q20 in clean RNA %</td><td>Ratio of Q20 RNA bases after filtering</td></tr><tr><td>Q30 in clean RNA %</td><td>Ratio of Q30 RNA bases after filtering</td></tr><tr><td>Q10 in MID %</td><td>Ratio of Q10 MID bases</td></tr><tr><td>Q20 in MID %</td><td>Ratio of Q20 MID bases</td></tr><tr><td>Q30 in MID %</td><td>Ratio of Q30 MID bases</td></tr><tr><td>Number of low quality MID</td><td>Number of MID with low quality bases</td></tr><tr><td>% of low quality MID</td><td>Ratio of MID with low quality bases</td></tr><tr><td>Number of MID with N</td><td>Number of MID with N base</td></tr><tr><td>% of MID with N</td><td>Ratio of MID with N base</td></tr><tr><td>Number of MID in specific sequence</td><td>Number of MID mapped to specific sequences</td></tr><tr><td>% of MID with specific sequence</td><td>Ratio of MID mapped to specific sequences</td></tr><tr><td>Q10 in clean MID %</td><td>Ratio of Q10 MID bases after filtering</td></tr><tr><td>Q20 in clean MID %</td><td>Ratio of Q20 MID bases after filtering</td></tr><tr><td>Q30 in clean MID %</td><td>Ratio of Q30 MID bases after filtering</td></tr><tr><td>Number of exact MID</td><td>Number of reads exactly mapped to MID</td></tr><tr><td>% of exact MID</td><td>Ratio of reads exactly mapped to MID</td></tr><tr><td>Number of inexact MID</td><td>Number of reads inexactly mapped to MID</td></tr><tr><td>% of inexact MID</td><td>Ratio of reads inexactly mapped to MID</td></tr></tbody></table>
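
The Q10/Q20/Q30 rows in the table above follow the usual convention: the share of bases whose Phred quality score is at least 10, 20, or 30. The Python sketch below illustrates that calculation on a toy input. It assumes Phred+33 ASCII-encoded quality strings, and the `q_ratio` helper is a hypothetical name; how SAW itself computes or rounds these values is not specified here.

```python
# Illustrative sketch only: q_ratio is a hypothetical helper, not a SAW function.
def q_ratio(quality_strings, threshold, offset=33):
    """Fraction of bases whose Phred score is >= threshold (assumes Phred+33 encoding)."""
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= threshold:
                passing += 1
    return passing / total if total else 0.0


# Toy quality strings; decoded Phred scores are 40, 40, 20, 10 and 40, 30, 0, 2.
quals = ["II5+", "I?!#"]

q10 = q_ratio(quals, 10)
q20 = q_ratio(quals, 20)
q30 = q_ratio(quals, 30)
```

Passing only the CID, MID, or RNA portion of each read's quality string would give the per-segment ratios, under the same assumptions.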

## Statistics for annotation

After the reads are annotated, a statistics file recording details and output information is saved to `/STEREO_ANALYSIS_WORKFLOW/ANNOTATION/*.bam.summary.stat`.

<table><thead><tr><th width="302">Metric</th><th>Description</th></tr></thead><tbody><tr><td>Number of total reads</td><td>Number of total reads aligned to genome</td></tr><tr><td>Number of reads to be annotated</td><td>Number of reads that will be annotated with GTF/GFF annotation database</td></tr><tr><td>% of reads to be annotated</td><td>% of reads that will be annotated with GTF/GFF annotation database</td></tr><tr><td>Number of uniquely mapped reads to be annotated</td><td>Number of reads to be annotated which are uniquely mapped to genome</td></tr><tr><td>% of uniquely mapped reads to be annotated</td><td>Ratio of reads to be annotated which are uniquely mapped to genome</td></tr><tr><td>Number of multi-mapped reads to be annotated</td><td>Number of reads to be annotated which are multi-mapped to genome</td></tr><tr><td>% of multi-mapped reads to be annotated</td><td>Ratio of reads to be annotated which are multi-mapped to genome</td></tr><tr><td>Number of multi-mapped reads</td><td>Number of reads multi-mapped to genome</td></tr><tr><td>Number of reads mapped to transcriptome</td><td>Number of reads mapped to transcriptome, including exonic and intronic regions.</td></tr><tr><td>% of reads mapped to transcriptome</td><td>% of reads mapped to transcriptome, including exonic and intronic regions.</td></tr><tr><td>Number of unique captures (on CID, gene and MID)</td><td>Number of unique captures for reads, based on CID, gene and MID information</td></tr><tr><td>% of unique captures (on CID, gene and MID)</td><td>% of unique captures for reads, based on CID, gene and MID information</td></tr><tr><td>Number of duplicated reads</td><td>Number of duplicated captures for reads, based on CID, gene and MID information</td></tr><tr><td>% of duplicated reads</td><td>% of duplicated captures for reads, based on CID, gene and MID information</td></tr><tr><td>Number of reads to be annotated</td><td>Number of reads that will be annotated with GTF/GFF annotation database</td></tr><tr><td>Number of reads mapped to exonic regions</td><td>Number of reads mapped to exonic regions</td></tr><tr><td>% of reads mapped to exonic regions</td><td>% of reads mapped to exonic regions</td></tr><tr><td>Number of reads mapped to intronic regions</td><td>Number of reads mapped to intronic regions</td></tr><tr><td>% of reads mapped to intronic regions</td><td>% of reads mapped to intronic regions</td></tr><tr><td>Number of reads mapped to intergenic regions</td><td>Number of reads mapped to intergenic regions</td></tr><tr><td>% of reads mapped to intergenic regions</td><td>% of reads mapped to intergenic regions</td></tr><tr><td>Number of reads mapped antisense to gene</td><td>Number of reads mapped antisense to gene</td></tr><tr><td>% of reads mapped antisense to gene</td><td>% of reads mapped antisense to gene</td></tr><tr><td>Number of reads mapped to rRNA</td><td>Number of reads mapped to rRNA regions</td></tr><tr><td>Number of rRNA reads in uniquely mapped</td><td>Number of uniquely mapped reads mapped to rRNA regions</td></tr><tr><td>% of rRNA reads in uniquely mapped</td><td>% of uniquely mapped reads mapped to rRNA regions</td></tr><tr><td>Number of rRNA reads in multi-mapped</td><td>Number of multi-mapped reads mapped to rRNA regions</td></tr><tr><td>% of rRNA reads in multi-mapped reads</td><td>% of multi-mapped reads mapped to rRNA regions</td></tr></tbody></table>
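
The "unique captures" and "duplicated reads" rows in the table above are defined on the combination of CID, gene, and MID. The Python sketch below shows one way such counts could be derived, assuming that the CID is identified by its `(Cx, Cy)` pair, the gene by the `GI` tag, and the MID by the `UB` tag, and that every read beyond the first for a given combination counts as a duplicate. These are illustrative assumptions, and `count_captures` is a hypothetical helper rather than SAW's actual implementation.

```python
from collections import Counter


# Illustrative sketch only: count_captures is a hypothetical helper, not a SAW function.
def count_captures(reads):
    """reads: iterable of (cx, cy, gene_id, mid) tuples. Returns (unique, duplicated)."""
    counts = Counter(reads)
    unique = len(counts)                         # distinct CID + gene + MID combinations
    duplicated = sum(counts.values()) - unique   # reads beyond the first per combination
    return unique, duplicated


reads = [
    (7767, 18052, "ENSMUSG00000051951", "79E49"),
    (7767, 18052, "ENSMUSG00000051951", "79E49"),  # same CID, gene and MID: duplicate
    (7767, 18052, "ENSMUSG00000051951", "6FA29"),  # different MID: new capture
    (4826, 11598, "ENSMUSG00000051951", "79E49"),  # different CID: new capture
]

unique, duplicated = count_captures(reads)
duplication_rate = duplicated / len(reads)
```

With these four toy reads, three combinations are distinct and one read is a duplicate.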
