BAM

BAM is a binary format for storing sequence alignment data; here it also carries gene annotation data. SAW count adds custom tags to the optional fields of each BAM record to store the read coordinates, CID, and MID information. Annotation information is stored in the same way, as additional tags.

Tags

The custom tags are described in the following table (BAM custom tags).

Tag
Description

Cx:i

x coordinate of the Coordinate ID (CID).

Cy:i

y coordinate of the Coordinate ID (CID).

UR:Z

The hexadecimal representation of uncorrected binary-encoded MID.

XF:i

Mapping region on the reference genome. Valid values: 0=EXONIC, 1=INTRONIC, 2=INTERGENIC, 3=rRNA.

GI:Z

Annotated gene ID.

GE:Z

Annotated gene name.

GS:Z

‘+’ or ‘-’, indicating the forward or reverse strand, respectively.

UB:Z

The hexadecimal representation of count corrected binary-encoded MID.
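The relationship between UR (uncorrected) and UB (corrected) can be checked by comparing the two hexadecimal values. The following is a minimal sketch, not SAW's own implementation. It assumes that each base of the MID occupies 2 bits of the encoded integer; that width is an assumption made for illustration, and the function name `mid_base_mismatches` is hypothetical.

```python
def mid_base_mismatches(hex_a: str, hex_b: str, bits_per_base: int = 2) -> int:
    """Count the base positions at which two hex-encoded MIDs differ.

    Assumption (not confirmed by the SAW documentation): every base is
    packed into `bits_per_base` bits of the encoded integer.
    """
    if len(hex_a) != len(hex_b):
        raise ValueError("MIDs must have the same encoded length")
    # XOR leaves a 1 bit wherever the two encodings disagree.
    diff = int(hex_a, 16) ^ int(hex_b, 16)
    mask = (1 << bits_per_base) - 1
    mismatches = 0
    while diff:
        if diff & mask:  # any differing bit in this group means a different base
            mismatches += 1
        diff >>= bits_per_base
    return mismatches


# UR and UB values taken from the annotated BAM example in this section.
print(mid_base_mismatches("7AE49", "79E49"))  # prints 1
```

Under the 2-bit assumption, the UR and UB values of the annotated example differ at a single base position, which is what a one-base MID correction would look like.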

Example of the raw BAM:

E100026571L1C009R00301275185    16      1       3000095 255     26M121066N74M   *       0       0       GGCTTTTTTTTTTTTTTTTTTTTTTTTTTTCTAAATATTGGGTTTTATTAGCACCATGATAACTGTATATTAATTTGCACTGACTGTCATAACAAAATAC      G+:GFFGGFGFFGFFGFGGFFGFFFFFCFGFCFGGGFGGFGFFFFGGFGGFGFFFGGFFGFFFGFGFGFFGFFGFGFFFFGFFFFFFFFGGFFGGFFGEF    NH:i:1  HI:i:1    AS:i:88 nM:i:0  Cx:i:4826       Cy:i:11598      UR:Z:6FA29

Example of the annotated BAM:

E100026571L1C002R00703943265    1040    1       3082766 255     11M132671N89M   *       0       0       CTGCTGCAGCTTTTTTTTCTTTGAGATTTATTTTTATGCTATGTGTATGGGTATTTTGCCTGCATATATGTCTATGCACCATGTGTGTGCAGTGCTTGAG    FFFFFECGFDCFGDGDFEE@EEGIBFGGCGFFGACGFCGFFDGDGFFFFFFEGCDFCGFFGG@FFF=EFFDGGGGGFDGFFFGGGFGFFGGGFFGGGDFG    NH:i:1  HI:i:1  AS:i:88 nM:i:0  Cx:i:7767       Cy:i:18052      UR:Z:7AE49      XF:i:0  GI:Z:ENSMUSG00000051951 GE:Z:Xkr4       GS:Z:-  UB:Z:79E49
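The tags in the records above can be read with a few lines of code once the BAM has been converted to SAM text (for example with `samtools view`). The sketch below is illustrative only: it relies on the standard SAM layout (11 mandatory tab-separated fields followed by `TAG:TYPE:VALUE` optional fields), the helper name `parse_optional_tags` is hypothetical, and SEQ and QUAL are replaced by `*` to keep the sample record short.

```python
# XF codes as listed in the tag table of this section.
XF_REGIONS = {0: "EXONIC", 1: "INTRONIC", 2: "INTERGENIC", 3: "rRNA"}


def parse_optional_tags(sam_line: str) -> dict:
    """Return the optional fields of one SAM record as {tag: value}."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 fields are mandatory
        name, value_type, value = field.split(":", 2)
        # Only integer (i) and string (Z) types appear in the examples here.
        tags[name] = int(value) if value_type == "i" else value
    return tags


# Annotated record from this section; SEQ and QUAL shortened to "*".
record = "\t".join([
    "E100026571L1C002R00703943265", "1040", "1", "3082766", "255",
    "11M132671N89M", "*", "0", "0", "*", "*",
    "NH:i:1", "HI:i:1", "AS:i:88", "nM:i:0",
    "Cx:i:7767", "Cy:i:18052", "UR:Z:7AE49",
    "XF:i:0", "GI:Z:ENSMUSG00000051951", "GE:Z:Xkr4", "GS:Z:-", "UB:Z:79E49",
])

tags = parse_optional_tags(record)
print(tags["Cx"], tags["Cy"], tags["GE"], XF_REGIONS[tags["XF"]])
# prints: 7767 18052 Xkr4 EXONIC
```

A raw (unannotated) record has no XF, GI, GE, GS, or UB tags, so code that handles both kinds of BAM should look them up with `tags.get(...)`.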

Statistics for alignment

After the FASTQ reads are aligned, a statistics file recording details and output information is saved to /STEREO_ANALYSIS_WORKFLOW/ALIGNMENT/<lane>.CIDMap.stat.

Metric
Description

Number of CID in chip mask

Number of CIDs in the chip mask file

Number of unique CID in FASTQ

Number of unique CIDs in FASTQs

Number of total reads

Number of total reads in FASTQs

Q10 in CID %

Ratio of Q10 CID bases

Q20 in CID %

Ratio of Q20 CID bases

Q30 in CID %

Ratio of Q30 CID bases

Number of mapped CID

Number of reads mapped to CID

% of mapped CID

Ratio of reads mapped to CID

Number of exactly mapped CID

Number of reads exactly mapped to CID

% of exactly mapped CID

Ratio of reads exactly mapped to CID

Number of CID with mismatch

Number of reads mapped to CID with mismatch

% of CID with mismatch

Ratio of reads mapped to CID with mismatch

Q10 in RNA %

Ratio of Q10 RNA bases

Q20 in RNA %

Ratio of Q20 RNA bases

Q30 in RNA %

Ratio of Q30 RNA bases

Number of reads with polyA

Number of reads with polyA sequence

% of reads with polyA

Ratio of reads with polyA sequence

Number of short reads (trim polyA)

Number of short reads after trimming polyA sequence

% of short reads (trim polyA)

Ratio of short reads after trimming polyA sequence

Number of reads with adapter

Number of reads with adapter sequence

% of reads with adapter

Ratio of reads with adapter sequence

Number of short reads (trim adapter)

Number of short reads after trimming adapter sequence

% of short reads (trim adapter)

Ratio of short reads after trimming adapter sequence

Number of reads filtered with DNB

Number of reads with DNB sequence

% of reads filtered with DNB

Ratio of reads with DNB sequence

Q10 in clean RNA %

Ratio of Q10 RNA bases after filtering

Q20 in clean RNA %

Ratio of Q20 RNA bases after filtering

Q30 in clean RNA %

Ratio of Q30 RNA bases after filtering

Q10 in MID %

Ratio of Q10 MID bases

Q20 in MID %

Ratio of Q20 MID bases

Q30 in MID %

Ratio of Q30 MID bases

Number of low quality MID

Number of MID with low quality bases

% of low quality MID

Ratio of MID with low quality bases

Number of MID with N

Number of MID with N base

% of MID with N

Ratio of MID with N base

Number of MID in specific sequence

Number of MID mapped to specific sequences

% of MID with specific sequence

Ratio of MID mapped to specific sequences

Q10 in clean MID %

Ratio of Q10 MID bases after filtering

Q20 in clean MID %

Ratio of Q20 MID bases after filtering

Q30 in clean MID %

Ratio of Q30 MID bases after filtering

Number of exact MID

Number of reads exactly mapped to MID

% of exact MID

Ratio of reads exactly mapped to MID

Number of inexact MID

Number of reads inexactly mapped to MID

% of inexact MID

Ratio of reads inexactly mapped to MID
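The Q10, Q20, and Q30 metrics in the table above are ratios of bases. As a rough illustration of how such a ratio is usually computed, the sketch below counts the bases whose Phred quality score meets a threshold. It assumes the common Phred+33 quality encoding and the usual reading of "Qn" as "quality score of at least n"; both are assumptions rather than statements about SAW internals, and `q_fraction` is a hypothetical helper name.

```python
def q_fraction(quality: str, threshold: int, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is >= threshold.

    `quality` is a FASTQ/SAM quality string; `offset` is the ASCII offset
    of the encoding (33 for Phred+33, assumed here).
    """
    if not quality:
        raise ValueError("empty quality string")
    passing = sum(1 for ch in quality if ord(ch) - offset >= threshold)
    return passing / len(quality)


# Toy quality string with Phred scores 40, 20, 10 and 0.
toy = "I5+!"
for q in (10, 20, 30):
    print(f"Q{q}: {q_fraction(toy, q):.2f}")
# prints:
# Q10: 0.75
# Q20: 0.50
# Q30: 0.25
```

The same function can be applied to the CID, MID, or RNA portion of a read by slicing the quality string to the corresponding positions.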

Statistics for annotation

After the reads are annotated, a statistics file recording details and output information is saved to /STEREO_ANALYSIS_WORKFLOW/ANNOTATION/*.bam.summary.stat.

Metric
Description

Number of total reads

Number of total reads aligned to the genome

Number of reads to be annotated

Number of reads that will be annotated with GTF/GFF annotation database

% of reads to be annotated

% of reads that will be annotated with GTF/GFF annotation database

Number of uniquely mapped reads to be annotated

Number of reads to be annotated which are uniquely mapped to genome

% of uniquely mapped reads to be annotated

Ratio of reads to be annotated which are uniquely mapped to genome

Number of multi-mapped reads to be annotated

Number of reads to be annotated which are multi-mapped to genome

% of multi-mapped reads to be annotated

Ratio of reads to be annotated which are multi-mapped to genome

Number of multi-mapped reads

Number of reads multi-mapped to genome

Number of reads mapped to transcriptome

Number of reads mapped to the transcriptome, including exonic and intronic regions.

% of reads mapped to transcriptome

% of reads mapped to transcriptome, including exonic and intronic regions.

Number of unique captures (on CID, gene and MID)

Number of unique captures for reads, based on CID, gene and MID information

% of unique captures (on CID, gene and MID)

% of unique captures for reads, based on CID, gene and MID information

Number of duplicated reads

Number of duplicated captures for reads, based on CID, gene and MID information

% of duplicated reads

% of duplicated captures for reads, based on CID, gene and MID information

Number of reads to be annotated

Number of reads that will be annotated with GTF/GFF annotation database

Number of reads mapped to exonic regions

Number of reads mapped to exonic regions

% of reads mapped to exonic regions

% of reads mapped to exonic regions

Number of reads mapped to intronic regions

Number of reads mapped to intronic regions

% of reads mapped to intronic regions

% of reads mapped to intronic regions

Number of reads mapped to intergenic regions

Number of reads mapped to intergenic regions

% of reads mapped to intergenic regions

% of reads mapped to intergenic regions

Number of reads mapped antisense to gene

Number of reads mapped antisense to gene

% of reads mapped antisense to gene

% of reads mapped antisense to gene

Number of reads mapped to rRNA

Number of reads mapped to rRNA regions

Number of rRNA reads in uniquely mapped

Number of uniquely mapped reads mapped to rRNA regions

% of rRNA reads in uniquely mapped

% of uniquely mapped reads mapped to rRNA regions

Number of rRNA reads in multi-mapped

Number of multi-mapped reads mapped to rRNA regions

% of rRNA reads in multi-mapped reads

% of multi-mapped reads mapped to rRNA regions
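The "unique captures" and "duplicated reads" metrics above are defined on the combination of CID, gene, and MID. The sketch below shows one way such counts could be derived from annotated reads. It is an illustration under stated assumptions, not the SAW implementation: the CID is represented by its (Cx, Cy) pair, the gene by the GI tag, the MID by the corrected UB tag, and `capture_stats` is a hypothetical helper name.

```python
from collections import Counter


def capture_stats(reads):
    """Summarize captures for an iterable of (cx, cy, gene_id, mid) tuples.

    Reads sharing the same CID, gene and MID are treated as one capture;
    every additional read with that key counts as a duplicate.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    unique = len(counts)
    return {"total": total, "unique": unique, "duplicated": total - unique}


# Toy input: the first two reads share CID, gene and MID.
reads = [
    (7767, 18052, "ENSMUSG00000051951", "79E49"),
    (7767, 18052, "ENSMUSG00000051951", "79E49"),
    (7767, 18052, "ENSMUSG00000051951", "6FA29"),
    (4826, 11598, "ENSMUSG00000051951", "79E49"),
]
stats = capture_stats(reads)
print(stats)  # prints {'total': 4, 'unique': 3, 'duplicated': 1}
```

The denominator used for the corresponding percentage metrics is not stated in this section, so the sketch reports counts only.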
