realign outputs

Overview of output structure

The SAW realign pipeline runs in a directory named by --id (or by --sn if --id is not given). Output files are organized into several folders under the outs/ directory.

The exact output files generated from the analysis depend on:

  • the version of SAW used

  • which pipeline was used, SAW count or SAW realign

  • whether microscope image(s) were provided as input

  • the specific parameters added to the analysis

Spatial Gene Expression

After running SAW realign on data from Stereo-seq T FF and Stereo-seq N FFPE kits, the following files can be found in the outs/ directory:

Directory/File Name
Description

bam/

Files in BAM format.

annotated_bam/

BAM file after alignment and annotation.

<SN>.*.bam

Indexed BAM file containing position-sorted reads mapped to CIDs, aligned to the genome, and annotated with GTF/GFF.

<SN>.*.bam.csi

Index for <SN>.*.bam.

image/

Images generated by the automatic or manual image processing workflows.

<SN>_<stainType>_regist.tif

The panoramic image aligned with the <SN>.raw.gef matrix.

<SN>_<stainType>_tissue_cut.tif

The tissue segmentation image, based on the aligned panoramic image.

<SN>_<stainType>_mask.tif

The cell segmentation image, based on the aligned panoramic image.

<SN>_<stainType>_mask_edm_dis_<distance>.tif

The adjusted cell segmentation image with expanded cell borders, derived from the cell segmentation image.

feature_expression/

Feature expression matrices in HDF5 format at different spatial resolutions (bin sizes and cell bins).

<SN>.raw.gef

Feature expression matrix covering the complete chip region. It contains only bin1 expression counts.

<SN>.tissue.gef

Feature expression matrix restricted to the tissue-covered region. It also serves as the visualization GEF and includes expression counts for bin1, 5, 10, 20, 50, 100, 150 and 200.

<SN>.cellbin.gef

Cellbin feature expression matrix that records information for each individual cell, including the centroid coordinates, boundary coordinates, gene expression and cell area.

<SN>.adjusted.cellbin.gef

Cellbin expression matrix with expanded cell borders, based on <SN>_<stainType>_mask_edm_dis_<distance>.tif.

<SN>.merge.barcodeReadsCount.txt

A list of mapped CIDs with the read count for each CID, in three columns (x, y, count).

<SN>_raw_barcode_gene_exp.txt

An annotated list recording coordinates, gene, MID and read count, prepared as the sampling input for sequencing saturation analysis.

analysis/

Secondary analysis files.

<SN>.bin200_1.0.h5ad

An AnnData H5AD file recording the results of preprocessing, filtering, normalization, dimensionality reduction, clustering and differential expression analysis, based on <SN>.tissue.gef.

The file name follows the format <SN>.bin<N>_<leiden_res>.h5ad, where <SN> is the Stereo-seq chip serial number, <N> is the bin size, and <leiden_res> is the resolution used for Leiden clustering.

bin200_marker_features.csv

Format-integrated differential expression analysis results, computed on bin200 of <SN>.tissue.gef.

<SN>.cellbin_1.0.h5ad

An AnnData H5AD file recording the results of preprocessing, filtering, normalization, dimensionality reduction, clustering and differential expression analysis, based on <SN>.cellbin.gef.

cellbin_marker_features.csv

Format-integrated differential expression analysis results, using <SN>.cellbin.gef.

<SN>.cellbin_1.0.adjusted.h5ad

An AnnData H5AD file recording the results of preprocessing, filtering, normalization, dimensionality reduction, clustering and differential expression analysis, based on <SN>.adjusted.cellbin.gef.

cellbin_adjusted_marker_features.csv

Format-integrated differential expression analysis results, using <SN>.adjusted.cellbin.gef.

<SN>.report.tar.gz

Compressed analysis summary report, with metrics and plots in HTML format.

report.html

The HTML report file contained in <SN>.report.tar.gz.

visualization.tar.gz

Compressed visualization file for presentation and manual processing in StereoMap.

<SN>.stereo

A JSON manifest file that includes experiment and pipeline information, basic analysis statistics, and references to the image and spatial matrix files in the SAW output visualization folder.


Expression-related data are taken from the previous SAW count output directory, specified with the --count-data parameter.
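The text files in feature_expression/ and the H5AD naming scheme in analysis/ are easy to handle with small scripts. The Python sketch below uses only the standard library and is illustrative, not part of SAW. It assumes whitespace-separated integer columns (x, y, count) for <SN>.merge.barcodeReadsCount.txt. Because the exact column layout of <SN>_raw_barcode_gene_exp.txt is not specified here, it takes already-parsed (x, y, gene, MID, reads) tuples. It estimates sequencing saturation with the commonly used definition 1 - (unique molecules / total reads), which SAW may compute differently. All function names and toy values are hypothetical.

```python
import random
import re


def parse_barcode_reads(lines):
    """Parse (x, y, count) rows from <SN>.merge.barcodeReadsCount.txt.

    Assumption: whitespace-separated integer columns in the order x, y, count.
    Lines that do not parse as three integers (e.g. a header) are skipped.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue
        try:
            x, y, count = (int(v) for v in fields)
        except ValueError:
            continue  # non-numeric row, such as a header
        records.append((x, y, count))
    return records


def sequencing_saturation(records):
    """Return 1 - unique molecules / total reads.

    records: (x, y, gene, mid, reads) tuples. This column order is an
    assumption; one molecule = one unique (x, y, gene, mid) combination.
    """
    molecules = set()
    total_reads = 0
    for x, y, gene, mid, reads in records:
        molecules.add((x, y, gene, mid))
        total_reads += reads
    return 1.0 - len(molecules) / total_reads if total_reads else 0.0


def subsample_reads(records, fraction, seed=0):
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)
    sampled = []
    for x, y, gene, mid, reads in records:
        kept = sum(rng.random() < fraction for _ in range(reads))
        if kept:  # drop molecules that lost all their reads
            sampled.append((x, y, gene, mid, kept))
    return sampled


# Pattern for <SN>.bin<N>_<leiden_res>.h5ad; assumes the SN contains no dots.
H5AD_NAME = re.compile(
    r"^(?P<sn>[^.]+)\.bin(?P<bin_size>\d+)_(?P<leiden_res>\d+(?:\.\d+)?)\.h5ad$"
)


def parse_h5ad_name(filename):
    """Split a bin-level H5AD file name into (SN, bin size, Leiden resolution)."""
    match = H5AD_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return match["sn"], int(match["bin_size"]), float(match["leiden_res"])


# Hypothetical toy inputs, not real SAW output.
print(parse_barcode_reads(["x y count", "100 200 7", "101 200 3"]))
# [(100, 200, 7), (101, 200, 3)]
toy = [(1, 1, "GeneA", "AAAA", 3), (1, 1, "GeneA", "CCCC", 1), (2, 5, "GeneB", "AAAA", 4)]
print(sequencing_saturation(toy))  # 0.625
print(sequencing_saturation(subsample_reads(toy, 0.5)))
print(parse_h5ad_name("EXAMPLE01.bin200_1.0.h5ad"))  # ('EXAMPLE01', 200, 1.0)
```

Computing the saturation of several subsamples (for example at fractions 0.1, 0.2, ..., 1.0) sketches a saturation curve. Cell-level names such as <SN>.cellbin_1.0.h5ad do not follow the bin<N> pattern, so parse_h5ad_name rejects them.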

visualization.tar.gz

The compressed visualization TAR file bundles all the output results that StereoMap needs for visualization. The contents of the unpacked archive are listed below:

.stereo

.stereo is a manifest file in JSON format that records

  • information about the task in SAW pipelines,

  • information about the tissue sample,

  • basic analysis statistics,

  • records of image files and expression data for StereoMap exploration.
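Because the .stereo manifest is plain JSON, it can be inspected without StereoMap. A minimal Python sketch using only the standard library, assuming the archive contains exactly one regular file whose name ends in .stereo. The manifest's key names are not documented on this page, so the demo builds an in-memory archive with a hypothetical placeholder manifest rather than real fields; read_stereo_manifest and build_demo_archive are hypothetical helper names.

```python
import io
import json
import tarfile


def read_stereo_manifest(source):
    """Load the JSON .stereo manifest from a visualization.tar.gz archive.

    `source` is a path or a binary file object. Assumes the archive holds
    exactly one regular file whose name ends with ".stereo".
    """
    opener = {"fileobj": source} if hasattr(source, "read") else {"name": source}
    with tarfile.open(mode="r:gz", **opener) as tar:
        manifests = [m for m in tar.getmembers() if m.isfile() and m.name.endswith(".stereo")]
        if len(manifests) != 1:
            raise ValueError(f"expected one .stereo manifest, found {len(manifests)}")
        with tar.extractfile(manifests[0]) as handle:
            return json.load(handle)


def build_demo_archive():
    """Create an in-memory archive holding a hypothetical manifest."""
    payload = json.dumps({"example_key": "example_value"}).encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        info = tarfile.TarInfo("visualization/EXAMPLE01.stereo")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)  # rewind so the archive can be read back
    return buffer


manifest = read_stereo_manifest(build_demo_archive())
print(sorted(manifest))  # ['example_key']
```

For a real run, pass the archive path instead, for example read_stereo_manifest("outs/visualization.tar.gz"), and inspect the returned keys.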

More details about these files can be found in other sections of Outputs.
