Secondary analysis

In this tutorial, you will learn how to run the complementary pipeline SAW reanalyze for the following tasks:

cluster           Clustering
lasso             Extract feature expression matrices of regions of interest.
diffExp           Differential expression analysis
multiomics        Proteome & Transcriptome joint analysis
midFilter         Manually filter spatial expression matrices by MID range.
removeBackground  Automatic protein background removal.

Clustering

In downstream bioinformatic analysis, clustering is a fundamental method that groups spatial expression points with similar characteristics, helping to uncover the underlying structure and patterns in the expression data. Its versatility makes it central to finding gene expression patterns, investigating cell types, and studying disease subtypes.

Choose an appropriate bin size for your dataset. For example, a mammalian cell is about 10 µm in diameter, and adjacent DNBs are spaced 500 nm apart, so bin20 (20 × 20 DNBs, about 10 µm × 10 µm) is a suitable starting point.
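The arithmetic behind this estimate can be sketched in Python. The only input taken from the text is the 500 nm DNB spacing; the function names are illustrative and not part of SAW.

```python
import math

# DNB center-to-center spacing stated in the text (500 nm).
DNB_PITCH_NM = 500


def bin_side_um(bin_size, pitch_nm=DNB_PITCH_NM):
    """Side length, in micrometres, of a square bin of bin_size x bin_size DNBs."""
    return bin_size * pitch_nm / 1000


def smallest_bin_for(diameter_um, pitch_nm=DNB_PITCH_NM):
    """Smallest bin size whose side covers an object of the given diameter."""
    return math.ceil(diameter_um * 1000 / pitch_nm)


print(bin_side_um(20))       # 10.0 (µm)
print(smallest_bin_for(10))  # 20
```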

Clustering is performed with the Leiden algorithm. --Leiden-resolution (default: 1.0) controls the coarseness of the clustering: higher values lead to more clusters.
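To see why a higher resolution yields more clusters, the Python sketch below scores three candidate partitions of a toy graph with modularity plus a resolution parameter, a quality function commonly maximized by Leiden implementations. It does not implement the Leiden search itself; whether SAW uses exactly this objective is an assumption, and the toy graph is hypothetical.

```python
# Hypothetical toy graph: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]


def quality(edges, partition, resolution):
    """Modularity with a resolution parameter:
    Q = sum over communities c of (L_c / m - resolution * (d_c / 2m) ** 2),
    where m is the edge count, L_c the edges inside c, d_c the total degree of c."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    community = {node: cid for cid, nodes in enumerate(partition) for node in nodes}
    q = 0.0
    for cid, nodes in enumerate(partition):
        inside = sum(1 for u, v in edges if community[u] == cid and community[v] == cid)
        total_degree = sum(degree[node] for node in nodes)
        q += inside / m - resolution * (total_degree / (2 * m)) ** 2
    return q


CANDIDATES = {
    "one cluster": [{0, 1, 2, 3, 4, 5}],
    "two triangles": [{0, 1, 2}, {3, 4, 5}],
    "singletons": [{node} for node in range(6)],
}


def best_partition(resolution):
    """Name of the candidate partition with the highest quality at this resolution."""
    return max(CANDIDATES, key=lambda name: quality(EDGES, CANDIDATES[name], resolution))


for r in (0.1, 1.0, 5.0):
    print(r, best_partition(r))
```

With these values the best-scoring candidate moves from one cluster (resolution 0.1) to the two triangles (1.0) to six singletons (5.0), because the resolution term penalizes large communities more heavily as it grows.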

Add --marker to also run differential expression analysis on the clusters identified by the Leiden algorithm.

You can perform clustering with a bin GEF using the command below:

saw reanalyze cluster \
    --gef=/path/to/input/GEF \
    --bin-size=20 \
    --Leiden-resolution=1.0 \
    --marker \
    --output=/path/to/output/clustering
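Because the number of clusters depends on --Leiden-resolution, it can help to compare several values. The Python sketch below builds one saw reanalyze cluster command per resolution, using only the flags shown above, and writes each run to its own folder, since find_marker_genes.csv in the output listing below carries no resolution in its name. The clustering_res<r> folder naming is hypothetical, and dry_run=True only prints the commands.

```python
import subprocess


def cluster_command(gef, output, bin_size=20, resolution=1.0, marker=True):
    """Argument list for `saw reanalyze cluster`, using only flags documented above."""
    cmd = [
        "saw", "reanalyze", "cluster",
        f"--gef={gef}",
        f"--bin-size={bin_size}",
        f"--Leiden-resolution={resolution}",
    ]
    if marker:
        cmd.append("--marker")
    cmd.append(f"--output={output}")
    return cmd


def sweep_resolutions(gef, out_root, resolutions, bin_size=20, dry_run=True):
    """One clustering run per resolution, each in its own (hypothetically named) folder."""
    commands = [
        cluster_command(gef, f"{out_root}/clustering_res{r}", bin_size, r)
        for r in resolutions
    ]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # show what would run
        else:
            subprocess.run(cmd, check=True)  # requires saw on PATH
    return commands


sweep_resolutions("/path/to/input/GEF", "/path/to/output", [0.5, 1.0, 1.5])
```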

Clustering outputs based on bin GEF are listed:

clustering
├── <SN>.bin20_1.0.h5ad  ##<SN>.<bin_size>_<resolution>.h5ad, containing analysis results
├── find_marker_genes.csv  ##original output CSV
└── bin20_marker_features.csv  ##formatted CSV for visualization in StereoMap

If --marker is turned on, you will also get the differential expression analysis results, namely find_marker_genes.csv and <bin_size>_marker_features.csv.

find_marker_genes.csv is the original output file.
<bin_size>_marker_features.csv is a formatted CSV that records the mean MID counts, log2 fold change (L2FC), adjusted p-value, and expression ratio of the marker features for each cluster.
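The columns of the marker table can be illustrated on toy data. The sketch below assumes raw MID counts, a pseudocount of 1 for the fold change, a non-zero count as the definition of "expressed", and Benjamini-Hochberg correction for the adjusted p-value. SAW's exact definitions are not stated here, and the function names are hypothetical.

```python
import math


def marker_stats(in_cluster, rest, pseudocount=1.0):
    """Mean MID count, log2 fold change and expression ratio of one feature,
    comparing the spots of one cluster with all remaining spots."""
    mean_in = sum(in_cluster) / len(in_cluster)
    mean_rest = sum(rest) / len(rest)
    l2fc = math.log2((mean_in + pseudocount) / (mean_rest + pseudocount))
    ratio = sum(1 for c in in_cluster if c > 0) / len(in_cluster)
    return {"mean": mean_in, "l2fc": l2fc, "ratio": ratio}


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


print(marker_stats([3, 0, 5, 4], [0, 1, 0, 1]))
print(bh_adjust([0.01, 0.04, 0.03, 0.2]))
```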

Or begin with a cellbin GEF:

saw reanalyze cluster \
    --cellbin-gef=/path/to/input/cellbin/GEF \
    --Leiden-resolution=1.0 \
    --marker \
    --output=/path/to/output/clustering

Clustering outputs based on cellbin GEF are listed:

clustering
├── <SN>.cellbin.gef  ##a copy of input cellbin GEF but with new clustering information
├── <SN>.cellbin_1.0.h5ad  ##<SN>.cellbin_<resolution>.h5ad, containing analysis results
├── find_marker_genes.csv  ##original output CSV
└── cellbin_marker_features.csv  ##formatted CSV for visualization in StereoMap

Lasso

The interactive tool in StereoMap lets you manually draw closed regions of interest. To extract the feature expression matrices of these regions, export the lasso GeoJSON from StereoMap and pass it to SAW reanalyze lasso.
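Selecting the spots inside a lasso region amounts to a point-in-polygon test. The Python sketch below applies the standard ray-casting rule to the outer ring of each GeoJSON Polygon feature. It assumes the GeoJSON coordinates share the coordinate system of the expression matrix, keys regions by feature index because the label property used by StereoMap is not given here, and ignores holes and MultiPolygons.

```python
import json


def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the closed ring [[x0, y0], [x1, y1], ...]?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Edge j -> i straddles the horizontal line through y.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def spots_in_lasso(geojson_text, spots):
    """Map each Polygon feature (by index) to the spots inside its outer ring."""
    collection = json.loads(geojson_text)
    selected = {}
    for k, feature in enumerate(collection["features"]):
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            continue  # MultiPolygons are skipped in this sketch
        outer = geometry["coordinates"][0]
        selected[k] = [s for s in spots if point_in_ring(s[0], s[1], outer)]
    return selected


region = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]],
        },
    }],
})
print(spots_in_lasso(region, [(5, 5), (15, 5), (9, 1)]))  # {0: [(5, 5), (9, 1)]}
```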

Run lasso with the command below:

saw reanalyze lasso \
    --gef=/path/to/input/GEF \
    --lasso-geojson=/path/to/lasso/GeoJSON \
    --bin-size=1,20,50 \
    --output=/path/to/output/lasso

The --bin-size parameter accepts a comma-separated list of integers, so expression matrices at multiple bin sizes are generated in one run.
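A minimal sketch of aggregating bin1 records into several bin sizes in one pass, for example when reading the lasso GEM outputs. It assumes a tab-separated GEM with geneID, x, y and MIDCount columns (MIDCounts is also accepted), skips lines starting with #, and places each DNB in bin (x // bin_size, y // bin_size); SAW's own bin origin and coordinate offsets may differ.

```python
import csv
import gzip
import io
from collections import defaultdict


def read_gem(stream):
    """Yield (gene, x, y, mid_count) from a tab-separated GEM text stream."""
    rows = (line for line in stream if not line.startswith("#"))  # skip metadata
    for rec in csv.DictReader(rows, delimiter="\t"):
        count = rec.get("MIDCount") or rec.get("MIDCounts")
        yield rec["geneID"], int(rec["x"]), int(rec["y"]), int(count)


def open_gem(path):
    """Open a .gem.gz file, such as the lasso outputs, as text."""
    return gzip.open(path, "rt")


def bin_counts(records, bin_sizes):
    """Sum MID counts per (bin_x, bin_y, gene) for every requested bin size."""
    out = {b: defaultdict(int) for b in bin_sizes}
    for gene, x, y, count in records:
        for b in bin_sizes:
            out[b][(x // b, y // b, gene)] += count
    return {b: dict(table) for b, table in out.items()}


gem_text = (
    "#example metadata line\n"
    "geneID\tx\ty\tMIDCount\n"
    "GeneA\t0\t0\t2\n"
    "GeneA\t19\t19\t1\n"
    "GeneA\t20\t0\t4\n"
    "GeneB\t5\t5\t1\n"
)
binned = bin_counts(read_gem(io.StringIO(gem_text)), [1, 20])
print(binned[20])  # {(0, 0, 'GeneA'): 3, (1, 0, 'GeneA'): 4, (0, 0, 'GeneB'): 1}
```

For a real file, pass open_gem(path) to read_gem in place of the in-memory stream.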

Lasso outputs based on bin GEF are listed:

lasso
├── <label1>
│       ├── SN.<label1>.label.gef  ##lasso GEF of bin1
│       └── segmentation
│              ├── SN.lasso.<bin_size_list[0]>.<label1>.gem.gz  ##GEM of lasso area of different bin sizes
│              ...
│              ├── SN.lasso.<bin_size_list[n]>.<label1>.gem.gz
│              └── SN.lasso.<label1>.mask.tif  ##mask image of lasso area
└── <label2>
       ├── ...
       └── ...

Or begin with a cellbin GEF:

saw reanalyze lasso \
    --cellbin-gef=/path/to/input/cellbin/GEF \
    --lasso-geojson=/path/to/lasso/GeoJSON \
    --output=/path/to/output/lasso

Lasso outputs based on cellbin GEF are listed:

lasso
├── <label1>
│       └── SN.<label1>.label.cellbin.gef  ##cellbin GEF of lasso area
└── <label2>
        └── ...

Differential expression analysis

SAW reanalyze can perform differential expression analysis on both clusters and lasso regions, using the diffexp GeoJSON file exported from StereoMap.

Selected clusters and lasso regions are recorded in the diffexp GeoJSON.

Run the analysis with:

saw reanalyze diffExp \
    --count-data=/path/to/previous/SAW/count/result/folder/id \
    --diffexp-geojson=/path/to/StereoMap/diffexp/GeoJSON \
    --output=/path/to/output/differential_expression

--count-data accepts the output directory of a previous SAW count run, from which SAW reanalyze detects all files needed for differential expression analysis. The related information is recorded in the *.diffexp.geojson.
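The text does not say which statistical test SAW applies. As an illustration only, the sketch below compares one feature between two selected groups (for example, two lasso regions) with a two-sided Wilcoxon rank-sum test, a test commonly used for marker detection, using the normal approximation without tie correction.

```python
import math


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test using the normal
    approximation without tie correction. Returns (U of group_a, p-value)."""
    pooled = sorted(list(group_a) + list(group_b))
    # Average rank (1-based) for each distinct value, so ties share a rank.
    avg_rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        avg_rank[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(group_a), len(group_b)
    u1 = sum(avg_rank[v] for v in group_a) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))


print(rank_sum_test([1, 2, 3], [1, 2, 3]))  # (4.5, 1.0)
print(rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))
```

Omitting the tie correction makes the p-value conservative when many values are tied, as with the zero counts typical of sparse spatial data.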

Differential expression analysis outputs are listed:

differential_expression
├── <SN>.<bin_size>_1.0.h5ad  ##H5ad containing analysis results
├── find_marker_genes.csv  ##original output CSV
└── <bin_size>_marker_features.csv  ##formatted CSV for visualization in StereoMap

Or, for cellbin data:

differential_expression
├── <SN>.cellbin_1.0.h5ad  ##H5ad for cellbin containing analysis results
├── find_marker_genes.csv  ##original output CSV
└── cellbin_marker_features.csv  ##formatted CSV for visualization in StereoMap

Proteome & Transcriptome joint analysis

SAW reanalyze multiomics integrates RNA and protein data and computes a joint latent space using Total Variational Inference (totalVI). It then clusters the latent space and performs one-vs-all differential expression analysis to find marker genes and proteins.
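The one-vs-all design can be sketched as follows: each cluster is compared with all remaining spots, and gene and protein features are handled the same way. The sketch only forms the groups and per-group means on hypothetical values; the statistics SAW writes to the differential_expression.csv are not specified here.

```python
from collections import defaultdict


def one_vs_all(labels):
    """Map each cluster label to (indices in the cluster, indices of all other spots)."""
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)
    return {
        label: (idx, [i for i in range(len(labels)) if labels[i] != label])
        for label, idx in members.items()
    }


def one_vs_all_means(features, labels):
    """features maps a gene or protein name to one value per spot.
    Returns {(cluster, feature): (mean in cluster, mean in the rest)}."""
    result = {}
    for cluster, (inside, rest) in one_vs_all(labels).items():
        for name, values in features.items():
            mean_in = sum(values[i] for i in inside) / len(inside)
            mean_rest = sum(values[i] for i in rest) / len(rest) if rest else float("nan")
            result[(cluster, name)] = (mean_in, mean_rest)
    return result


labels = ["0", "0", "1", "1", "2"]  # hypothetical cluster labels
features = {"GeneA": [4, 6, 0, 0, 1], "ProteinX": [10, 10, 2, 2, 2]}
print(one_vs_all(labels)["2"])  # ([4], [0, 1, 2, 3])
print(one_vs_all_means(features, labels)[("0", "GeneA")])
```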

You can perform joint analysis with gene and protein bin GEFs using the command below:

saw reanalyze multiomics \
    --gef=/path/to/input/gene/GEF,/path/to/input/protein/GEF \
    --protein-panel=/path/to/ProteinPanel.list \
    --bin-size=50 \
    --output=/path/to/output/joint_analysis

Or begin with gene and protein cellbin GEFs:

saw reanalyze multiomics \
    --cellbin-gef=/path/to/input/gene/cellbin/GEF,/path/to/input/protein/cellbin/GEF \
    --protein-panel=/path/to/ProteinPanel.list \
    --output=/path/to/output/joint_analysis

Use --gpu-id <NUM> to accelerate computation on a GPU.

Use the same protein panel that was used in SAW count. Alternatively, you can use --ref-libraries <CSV> instead of --protein-panel <PANEL>.

Joint analysis outputs are listed:

joint_analysis
├── <SN>.<bin_size>.differential_expression.csv ##original output CSV containing differential expression results
└── <SN>.<bin_size>.h5mu ##multimodal data containing clustering results

Or, for cellbin data:

joint_analysis
├── <SN>.cellbin.differential_expression.csv ##original output CSV containing differential expression results
└── <SN>.cellbin.h5mu ##multimodal data containing clustering results

MID filtering

The interactive tool in StereoMap lets you set an MID range manually. Pass the resulting MID filtering JSON to SAW reanalyze midFilter:

saw reanalyze midFilter \
    --gef=/path/to/input/GEF \
    --mid-json=/path/to/MID/filtering/JSON \
    --output=/path/to/output/mid_filtering
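A minimal sketch of range filtering, assuming the MID range applies to the total MID count of each spot or bin and that both bounds are inclusive. The contents of the MID filtering JSON are not described here, so the bounds are passed directly; the function names are hypothetical.

```python
from collections import defaultdict


def spot_totals(records):
    """Total MID count per (x, y) from (gene, x, y, mid_count) records."""
    totals = defaultdict(int)
    for _gene, x, y, count in records:
        totals[(x, y)] += count
    return dict(totals)


def filter_by_mid_range(totals, mid_min, mid_max):
    """Keep positions whose total MID count lies in [mid_min, mid_max]."""
    return {pos: n for pos, n in totals.items() if mid_min <= n <= mid_max}


records = [("GeneA", 0, 0, 2), ("GeneB", 0, 0, 3), ("GeneA", 1, 0, 40), ("GeneA", 2, 0, 1)]
print(filter_by_mid_range(spot_totals(records), 2, 30))  # {(0, 0): 5}
```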

MID filtering outputs are listed:

mid_filtering
└── <SN>.filter.gef ##common GEF filtered by MID range

Or, for a protein GEF:

mid_filtering
└── <SN>.protein.filter.gef ##protein GEF filtered by MID range

Automatic protein background removal

removeBackground automatically removes protein signals caused by non-specific binding. Find more algorithm details in Proteome background removal.

saw reanalyze removeBackground \
    --gef=/path/to/input/protein/GEF \
    --bin-size=50 \
    --protein-panel=/path/to/ProteinPanel.list \
    --output=/path/to/output/removeBackground

Use the same protein panel that was used in SAW count. Alternatively, you can use --ref-libraries <CSV> instead of --protein-panel <PANEL>.

removeBackground outputs based on protein bin GEF are listed:

removeBackground
└── <SN>.protein.tissue.rmbg.gem.gz ##protein expression matrix after background removal
