Format conversion

This tutorial shows how to perform basic format conversions using the complementary pipeline SAW convert. To keep this utility straightforward and concise, several sub-pipelines have been created under SAW convert. A sub-pipeline is usually named "A2B", which signifies conversion from format A to format B (or from dimension A to dimension B).

Select the one you need for format conversion.

gef2gem

Conversion from a bin GEF to a bin GEM.

Bin GEM is a type of text file that primarily contains gene information, spatial coordinates, and MID counts. The feature expression recorded in a GEM file has only one type of bin size, so you have to set --bin-size for the conversion.

saw convert gef2gem \
    --gef=/path/to/input/bin/GEF \
    --bin-size=1 \
    --gem=/path/to/output/bin/GEM
    
##example test
saw convert gef2gem \
    --gef=./C04144D5.tissue.gef \
    --bin-size=20 \
    --gem=./C04144D5.bin20.tissue.gem
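
Since a bin GEM is plain text, it can be inspected with a few lines of Python. The following is a minimal sketch, not part of SAW: the sample content, the "#" header lines, and the column names geneID, x, y, and MIDCount are assumptions made for illustration, so check them against your own file before relying on them.

```python
import csv
import io

# Hypothetical GEM content; header lines and column names are assumptions.
SAMPLE_GEM = """#FileFormat=GEMv0.1
#BinSize=20
geneID\tx\ty\tMIDCount
GeneA\t0\t0\t3
GeneB\t0\t20\t1
GeneA\t20\t0\t2
"""


def read_gem(handle):
    """Yield (gene, x, y, count) tuples, skipping lines that start with '#'."""
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    for row in reader:
        yield row["geneID"], int(row["x"]), int(row["y"]), int(row["MIDCount"])


records = list(read_gem(io.StringIO(SAMPLE_GEM)))
total_mid = sum(count for _, _, _, count in records)
print(records[0], total_mid)
```

To read a real file, pass an open file handle to read_gem instead of the in-memory sample.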

Conversion from a cellbin GEF to a cellbin GEM.

gem2gef

Conversion from a bin GEM to a bin GEF.

If your input GEM is of bin1, the output GEF will be a visualization GEF that includes expression counts for the bin sizes [1, 5, 10, 20, 50, 100, 150, 200].

If your input GEM is not of bin1, the output GEF will contain the expression counts of that specific bin size.

Conversion from a cellbin GEM to a cellbin GEF.

bin2tissue

Extract tissue-coverage expression information from a raw bin GEF.

The tissue segmentation mask is essential for defining the tissue boundaries of a sample, enabling the generation of an expression matrix at the tissue dimension.

If microscope images were not captured during the experiment, this sub-module can still be applied directly to extract tissue segmentation results based on the transcriptomic expression matrix.

The output directory contains the tissue segmentation image bin1_img_tissue_cut.tif and the tissue GEF <SN>.tissue.gef.
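
Conceptually, restricting an expression matrix to the tissue region means keeping only the records that land on a tissue pixel of the mask. The sketch below illustrates that idea on toy data. It is not how SAW implements bin2tissue; the record layout (gene, x, y, MID count) and the mask indexed as mask[y][x] are assumptions.

```python
# Hypothetical bin1 records: (gene, x, y, MID count).
records = [
    ("GeneA", 0, 0, 3),
    ("GeneB", 1, 0, 1),
    ("GeneA", 2, 1, 2),
    ("GeneC", 3, 1, 5),
]

# Hypothetical binary tissue mask indexed as mask[y][x]; 1 marks tissue.
tissue_mask = [
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]


def within_tissue(records, mask):
    """Keep only records whose (x, y) falls on a tissue pixel of the mask."""
    height, width = len(mask), len(mask[0])
    return [
        (gene, x, y, count)
        for gene, x, y, count in records
        if 0 <= y < height and 0 <= x < width and mask[y][x]
    ]


tissue_records = within_tissue(records, tissue_mask)
print(tissue_records)
```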

bin2cell

Extract cellbin expression information from a raw bin GEF.

A cell segmentation mask is used to delineate the boundaries of individual cells, which is then utilized to generate an expression matrix at the cell dimension.
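
The step from a cell mask to a cell-level matrix can be pictured as grouping bin1 records by the cell they fall in. The sketch below assumes a labeled mask in which each cell has its own positive integer ID and 0 is background. That labeling, the record layout, and the function name are assumptions for illustration, not the SAW implementation.

```python
from collections import defaultdict

# Hypothetical bin1 records: (gene, x, y, MID count).
records = [
    ("GeneA", 0, 0, 3),
    ("GeneA", 1, 0, 1),
    ("GeneB", 1, 0, 2),
    ("GeneA", 3, 1, 4),
    ("GeneB", 2, 0, 7),
]

# Hypothetical labeled cell mask indexed as labels[y][x]; 0 is background.
cell_labels = [
    [1, 1, 0, 0],
    [0, 0, 2, 2],
]


def cell_expression(records, labels):
    """Sum MID counts per (cell ID, gene), ignoring background pixels."""
    matrix = defaultdict(lambda: defaultdict(int))
    for gene, x, y, count in records:
        cell_id = labels[y][x]
        if cell_id:
            matrix[cell_id][gene] += count
    return {cell: dict(genes) for cell, genes in matrix.items()}


cell_matrix = cell_expression(records, cell_labels)
print(cell_matrix)
```

Records that fall on background pixels, such as the GeneB record at (2, 0) here, do not contribute to any cell.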

visualization

Conversion from a raw GEF to a visualization GEF.

A raw GEF typically records the spatial expression matrix of bin1 in a sparse matrix format to reduce file size. Because the Stereo-seq chip has a high resolution of 500 nm, the amount of expression matrix data generated is very large. Unless necessary, the software only retains the bin1 dimension data in a shuffle matrix. For a GEF to be visualized in StereoMap, it must contain expression information at several bin sizes, usually with a bin list of [1, 5, 10, 20, 50, 100, 150, 200].
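
To make the relationship between bin1 data and the bin list concrete, the sketch below aggregates a handful of bin1 records at every bin size in the list. It assumes that a binN coordinate is the bin1 coordinate rounded down to a multiple of N, and the record layout is hypothetical. It illustrates the idea only and does not reproduce the GEF layout or the SAW implementation.

```python
from collections import Counter

BIN_LIST = [1, 5, 10, 20, 50, 100, 150, 200]

# Hypothetical bin1 records: (gene, x, y, MID count).
records = [
    ("GeneA", 3, 4, 2),
    ("GeneA", 4, 1, 1),
    ("GeneB", 12, 7, 5),
    ("GeneA", 103, 250, 1),
]


def aggregate(records, bin_size):
    """Sum counts per (gene, bin x, bin y); coordinates snap down to the bin origin."""
    binned = Counter()
    for gene, x, y, count in records:
        key = (gene, x // bin_size * bin_size, y // bin_size * bin_size)
        binned[key] += count
    return dict(binned)


# One aggregated matrix per bin size, all derived from the same bin1 records.
pyramid = {bin_size: aggregate(records, bin_size) for bin_size in BIN_LIST}
for bin_size in BIN_LIST:
    print(bin_size, len(pyramid[bin_size]))
```

Larger bins merge neighbouring records, so the number of entries shrinks while the total MID count stays the same.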

gef2h5ad

Conversion from a bin GEF to an AnnData H5AD.

AnnData H5AD is a widely used data format for downstream analysis. It requires AnnData package version >= 0.8.0.

Conversion from a cellbin GEF to an AnnData H5AD.

gem2h5ad

Conversion from a bin GEM to an AnnData H5AD.

Conversion from a cellbin GEM to an AnnData H5AD.

gef2rds

Conversion from a bin GEF to an RDS file.

The RDS file format is a serialized data structure that saves and loads Seurat objects in R.

Conversion from a cellbin GEF to an RDS file, for analysis in Seurat.

gem2rds

Conversion from a bin GEM to an RDS file.

Conversion from a cellbin GEM to an RDS file, for analysis in Seurat.

h5ad2rds

Conversion from an AnnData H5AD to an RDS file, for analysis in Seurat.

gef2img

Plot a heatmap of a bin GEF.

This sub-pipeline uses the feature expression matrix to generate a grayscale heatmap image of the spatial expression.
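
The idea of turning an expression matrix into a grayscale image can be sketched as follows: sum the MID counts at each position, then rescale them to the 0-255 range. The linear scaling against the maximum and the record layout are assumptions chosen for simplicity; gef2img may scale intensities differently.

```python
def to_grayscale(records, width, height):
    """Sum MID counts per pixel and scale them linearly to 0-255."""
    totals = [[0] * width for _ in range(height)]
    for _, x, y, count in records:
        totals[y][x] += count
    peak = max(max(row) for row in totals)
    if peak == 0:
        return totals  # no expression at all: the image stays black
    return [[255 * value // peak for value in row] for row in totals]


# Hypothetical records: (gene, x, y, MID count).
records = [
    ("GeneA", 0, 0, 2),
    ("GeneB", 0, 0, 2),
    ("GeneA", 1, 0, 1),
    ("GeneC", 2, 1, 2),
]

image = to_grayscale(records, width=3, height=2)
print(image)
```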

tar2img

Extract TIFF images from an image .tar.gz file. The extracted images usually include a microscope image aligned with the matrix, a tissue segmentation image, and a cell segmentation image, provided that the required algorithmic or manual processing results are recorded in the image .tar.gz file.

img2rpi

Conversion from TIFF images to an RPI file, used in StereoMap.

Layer names can be chosen freely, but should follow the format <stain_type>/<image_type>, such as DAPI/TissueMask. For the cell segmentation image, we recommend setting a layer name with the prefix "CellMask", so that StereoMap displays cell borders directly.

merge

Merge images (up to three) into one image.

Note that the order of the image input represents its color channel, R-G-B.
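
The effect of the input order can be illustrated with a small sketch that stacks up to three single-channel images into RGB pixels, with the first image becoming red, the second green, and the third blue. The function name and the toy images are hypothetical, and missing channels are filled with zeros here, which is an assumption rather than documented behavior.

```python
def merge_channels(*images):
    """Stack one to three equally sized grayscale images into RGB pixel tuples."""
    if not 1 <= len(images) <= 3:
        raise ValueError("expected between one and three images")
    height, width = len(images[0]), len(images[0][0])
    blank = [[0] * width for _ in range(height)]
    # Input order decides the channel: first is R, second is G, third is B.
    channels = list(images) + [blank] * (3 - len(images))
    return [
        [tuple(channel[y][x] for channel in channels) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 2x2 images: a registered microscopy image and a segmentation mask.
regist = [[10, 200], [30, 40]]
mask = [[0, 255], [255, 0]]

merged = merge_channels(regist, mask)
print(merged[0][1])
```

With the microscopy image in red and the mask in green, pixels covered by the mask appear with a strong green component, which makes it easy to see where the mask and the image disagree.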

Merged image of microscopy image SS200000135TL_D1_ssDNA_regist.tif and tissue segmentation mask file SS200000135TL_D1_ssDNA_tissue_cut.tif to evaluate the performance of tissue segmentation.

Part of the merged image of the microscopy image SS200000135TL_D1_ssDNA_regist.tif and cell segmentation mask file SS200000135TL_D1_ssDNA_mask.tif to evaluate the performance of cell segmentation.

overlay

Stack the template points onto the image, to check whether the image template crosspoints derived by image QC are accurate.

The matrix template file, <stain_type>_matrix_template.txt, can be found in visualization.tar.gz.

Stack the matrix template onto SS200000135TL_D1_ssDNA_regist.tif image to verify the registration outcome.
