realign outputs
Overview of output structure
The SAW realign pipeline runs in a directory named by --id (or by --sn if --id is not given). Output files are organized into several folders under the outs/ directory.
The exact output files generated from the analysis depend on:
the version of SAW used,
which pipeline was used (SAW count or SAW realign),
whether microscope image(s) were provided as input,
the specific parameters added to the analysis.
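Because the set of files varies from run to run, it can help to list what a given run actually produced. The following is a minimal sketch, not part of SAW: the function name `summarize_outs` is hypothetical, and the only assumption is that the outputs sit under an outs/ directory as described above.

```python
from collections import defaultdict
from pathlib import Path


def summarize_outs(outs_dir):
    """Group every file under outs_dir by its top-level folder.

    Files that sit directly in outs_dir are grouped under ".".
    """
    outs = Path(outs_dir)
    groups = defaultdict(list)
    for path in sorted(outs.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(outs)
        top = rel.parts[0] if len(rel.parts) > 1 else "."
        groups[top].append(rel.as_posix())
    return dict(groups)
```

Calling `summarize_outs("<run_dir>/outs")` returns a dictionary keyed by folder name, which can be compared against the file lists in the sections that follow.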
Spatial Gene Expression
After running SAW realign on data from Stereo-seq T FF and Stereo-seq N FFPE kits, the following files can be found under the outs/ directory:
bam/
Files in BAM format.
annotated_bam/
BAM file after alignment and annotation.
<SN>.*.bam
Indexed BAM file containing position-sorted reads mapped to CIDs, aligned to the genome, and annotated with GTF/GFF.
<SN>.*.bam.csi
Index for <SN>.*.bam.
feature_expression/
Feature expression matrices in HDF5 format at different dimensions.
<SN>.raw.gef
Feature expression matrix covering the complete chip region. It contains bin1 expression counts only.
<SN>.tissue.gef
Feature expression matrix restricted to the tissue coverage region. It also serves as the visualization GEF, with expression counts for bin1, 5, 10, 20, 50, 100, 150, and 200.
<SN>.cellbin.gef
Cellbin feature expression matrix that records information for each cell individually, including the centroid coordinate, boundary coordinates, gene expression, and cell area.
<SN>.adjusted.cellbin.gef
Cellbin expression matrix with expanded cell borders, based on <SN>_<stain_type>_mask_edm_dis_<distance>.tif.
<SN>.merge.barcodeReadsCount.txt
A mapped CID list file with read counts for each CID, including three columns (x, y, count).
<SN>_raw_barcode_gene_exp.txt
An annotated list file with coordinate, gene, MID, and read count information, prepared as the sampling file for the sequencing saturation calculation.
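As an illustration of the three-column layout (x, y, count) described for <SN>.merge.barcodeReadsCount.txt, the sketch below reads such a file with the Python standard library. It assumes whitespace-separated integer columns, which is not confirmed by this page, and the function names are hypothetical.

```python
def read_cid_counts(path):
    """Yield (x, y, count) integer tuples from a three-column text file.

    Assumption: columns are separated by whitespace (tabs or spaces).
    """
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            try:
                x, y, count = (int(field) for field in fields)
            except ValueError:
                continue  # skip a possible header line
            yield x, y, count


def total_reads(path):
    """Sum the read counts over all CIDs in the file."""
    return sum(count for _, _, count in read_cid_counts(path))
```

If the real file uses a different delimiter, only the `line.split()` call needs to change.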
analysis/
Secondary analysis files.
<SN>.bin20_1.0.h5ad & <SN>.bin50_1.0.h5ad
An AnnData H5AD file that records the results of preprocessing, filtering, normalization, dimensionality reduction, clustering, and differential expression analysis, based on <SN>.tissue.gef.
This output H5AD is named in the format of <SN>.<binN>_<leiden_res>.h5ad. In the file name, <SN> stands for the Stereo-seq chip serial number, <N> for bin size, and <leiden_res> for the resolution of Leiden clustering.
<SN>.bin20_1.0.marker_features.csv & <SN>.bin50_1.0.marker_features.csv
Format-integrated differential expression analysis results, using <SN>.tissue.gef of bin20 and bin50.
<SN>.cellbin_1.0.h5ad
An AnnData H5AD file that records the results of preprocessing, filtering, normalization, dimensionality reduction, clustering, and differential expression analysis, using <SN>.cellbin.gef.
<SN>.cellbin_1.0.marker_features.csv
Format-integrated differential expression analysis results, using <SN>.cellbin.gef.
<SN>.cellbin_1.0.adjusted.h5ad
An AnnData H5AD file that records the results of preprocessing, filtering, normalization, dimensionality reduction, clustering, and differential expression analysis, using <SN>.adjusted.cellbin.gef.
<SN>.cellbin_1.0.adjusted.marker_features.csv
Format-integrated differential expression analysis results, using <SN>.adjusted.cellbin.gef.
Expression-related data is taken from the last SAW count output directory, supplied through the --count-data parameter.
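The H5AD naming convention described above, <SN>.<binN>_<leiden_res>.h5ad, can be parsed mechanically. The sketch below is one possible approach: the regular expression and the function name `parse_h5ad_name` are hypothetical, it assumes the serial number contains no dots, and it also accepts the cellbin and .adjusted variants listed in this section.

```python
import re

# <SN>.<binN or cellbin>_<leiden_res>[.adjusted].h5ad
# Assumption: <SN> itself contains no "." characters.
H5AD_NAME = re.compile(
    r"^(?P<sn>[^.]+)\."
    r"(?P<unit>bin(?P<bin_size>\d+)|cellbin)_"
    r"(?P<leiden_res>\d+(?:\.\d+)?)"
    r"(?P<adjusted>\.adjusted)?\.h5ad$"
)


def parse_h5ad_name(filename):
    """Return the parts of an analysis H5AD file name, or None if it does not match."""
    match = H5AD_NAME.match(filename)
    if match is None:
        return None
    bin_size = match.group("bin_size")
    return {
        "sn": match.group("sn"),
        "unit": match.group("unit"),
        "bin_size": int(bin_size) if bin_size is not None else None,
        "leiden_res": float(match.group("leiden_res")),
        "adjusted": match.group("adjusted") is not None,
    }
```

For example, with a made-up serial number, `parse_h5ad_name("DEMO123.bin20_1.0.h5ad")` yields a bin size of 20 and a Leiden resolution of 1.0.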
Spatial Protein Expression
feature_expression/
Feature expression matrices in HDF5 format at different dimensions.
<SN>.protein.raw.gef
Feature expression matrix covering the complete chip region. It contains bin1 expression counts only.
<SN>.protein.tissue.gef
Feature expression matrix restricted to the tissue coverage region. It also serves as the visualization GEF, with expression counts for bin1, 5, 10, 20, 50, 100, 150, and 200.
<SN>.protein.cellbin.gef
Cellbin feature expression matrix that records information for each cell individually, including the centroid coordinate, boundary coordinates, expression of genes, and cell area.
<SN>.protein.adjusted.cellbin.gef
Cellbin expression matrix with expanded cell borders, based on <SN>_<stain_type>_mask_edm_dis_<distance>.tif.
<SN>.protein.tissue.rmbg.gem.gz
Feature expression matrix from automatic protein background removal. It shows bin1 expression counts.
<SN>_cid_pid_mid_reads.tsv
A list file with coordinate, PID, MID, and read count information, prepared as the sampling file for the sequencing saturation calculation across all proteins.
<SN>_valid_cid_reads.tsv
A mapped CID list file from all ADT FASTQs, with read counts for each CID, including three columns (x, y, count).
analysis/
Secondary analysis files.
<SN>.protein.bin20_0.1.h5ad & <SN>.protein.bin50_0.1.h5ad
An AnnData H5AD file that records the results of preprocessing, filtering, normalization, dimensionality reduction, clustering, and differential expression analysis, based on <SN>.protein.tissue.gef.
This output H5AD is named in the format of <SN>.protein.<binN>_<leiden_res>.h5ad. In the file name, <SN> stands for the Stereo-seq chip serial number, <N> for bin size, and <leiden_res> for the resolution of Leiden clustering.
<SN>.protein.cellbin_0.1.h5ad
An AnnData H5AD file that records the results of preprocessing, filtering, normalization, dimensionality reduction, clustering, and differential expression analysis, using <SN>.protein.cellbin.gef.
<SN>.protein.cellbin_0.1.adjusted.h5ad
An AnnData H5AD file that records the results of preprocessing, filtering, normalization, dimensionality reduction, clustering, and differential expression analysis, using <SN>.protein.adjusted.cellbin.gef.
Image
image/
Images are generated from automatic or manual workflows.
<SN>_<stainType>_regist.tif
The panoramic image after registration with the <SN>.raw.gef matrix.
<SN>_<stainType>_tissue_cut.tif
The tissue segmentation image, based on the aligned panoramic image.
<SN>_<stainType>_mask.tif
The cell segmentation image, based on the aligned panoramic image.
<SN>_<stainType>_mask_edm_dis_<distance>.tif
The adjusted image, based on the cell segmentation image.
Report and Visualization
<SN>.report.html
Analysis summary report of metrics and plots in HTML format.
visualization.tar.gz
StereoMap visualization file for presentation and manual processing.
<SN>.stereo
A manifest file in JSON format that includes experiment and pipeline information, basic analysis statistics, and references to the image and spatial matrix files in the SAW output visualization folder.
visualization.tar.gz
The compressed visualization TAR file integrates all the output results needed by StereoMap for visualization. The contents of an unpacked archive are listed:
The compressed visualization TAR file from Stereo-CITE analysis:
.stereo
.stereo is a manifest file in JSON format that records
information about the task in SAW pipelines,
information of the tissue sample,
basic analysis statistics,
records of image files and expression data for StereoMap exploration.
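Since the manifest is plain JSON inside a gzip-compressed TAR archive, it can be inspected without unpacking the whole file. The sketch below uses only the Python standard library; the function name `load_stereo_manifest` is hypothetical, and no particular key names are assumed because the manifest schema is not listed on this page.

```python
import json
import tarfile


def load_stereo_manifest(archive_path):
    """Return (member_name, manifest) for the first .stereo file in the archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(".stereo"):
                # extractfile returns a binary file object; json.load accepts it.
                with archive.extractfile(member) as handle:
                    return member.name, json.load(handle)
    raise FileNotFoundError("no .stereo manifest found in " + str(archive_path))
```

A typical use is `name, manifest = load_stereo_manifest("visualization.tar.gz")`, followed by `sorted(manifest)` to see which top-level sections the manifest contains.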
*More details about these files can be found in other parts of Outputs.