Tutorial 2: MCGAE's spatial clustering of colorectal cancer liver metastasis data elucidated the phenomenon of tumor invasion

Load library

import os
import pandas as pd
import scanpy as sc
import torch
import torch.optim as optim
import torch.nn as nn
import warnings
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.metrics.cluster import normalized_mutual_info_score as nmi_score
from sklearn.metrics import adjusted_rand_score as ari_score
from sklearn.cluster import KMeans
import itertools
from MCGAE.model import MCGAE
from MCGAE.utils import load_dataset, norm_and_filter, compute_adata_components, search_res, refine_label, set_seed, \
    Moran_I

# Suppressing runtime warnings
warnings.filterwarnings("ignore", category=RuntimeWarning)

Prepare file path

"""
BASE_DIR: Project directory
data_dir: Data directory
result_dir: Result directory
file_path: File path
"""
file_name = "ST-colon1"

BASE_DIR = r"D:\Work\MCGAE project\MCGAE-master"
file_path = os.path.join(BASE_DIR, "benchmark", "Colorectal Cancer", f"{file_name}")
dir_path = os.path.join(file_path, "raw_data")

Extract image information

adata = load_dataset(dir_path, use_image=True)
sc.pp.filter_genes(adata, min_cells=1)
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", n_top_genes=3000)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
adata = adata[:, adata.var["highly_variable"]]
n_clusters = 20
print(n_clusters)

Tiling image: 100%|███████████████████████████████████████████████████████████████████████████████ [ time left: 00:00 ]
Extract feature: 100%|████████████████████████████████████████████████████████████████████████████ [ time left: 00:00 ]


The morphology feature is added to adata.obsm['X_morphology']!
20
Graph constructed!


adata.obsm["graph_orig"] = interaction

Compute multiple view components

set_seed(1234)
compute_adata_components(adata, n_components=100)
save_obj_z = pd.DataFrame()

For denoise data training: Epoch [0/100], Loss: 0.0041
For denoise data training: Epoch [10/100], Loss: 0.0006
For denoise data training: Epoch [20/100], Loss: 0.0006
For denoise data training: Epoch [30/100], Loss: 0.0006
For denoise data training: Epoch [40/100], Loss: 0.0006
For denoise data training: Epoch [50/100], Loss: 0.0006
For denoise data training: Epoch [60/100], Loss: 0.0006
For denoise data training: Epoch [70/100], Loss: 0.0006
For denoise data training: Epoch [80/100], Loss: 0.0006
For denoise data training: Epoch [90/100], Loss: 0.0006
Graph_hat constructed!

Train model

model = MCGAE(
    adata,
    n_latent=50,
    n_components=100,
    use_pca=True,
    # fusion_mode="holistic",
    fusion_mode="fractional",
    # fusion_mode="vanilla",
    use_emb_x_rec=True,
    use_emb_g_rec=True,
    dropout=0.01,  # 0.01
    random_seed=12,
    w_morph=1.9,
)

model.train(
    weight_decay=5e-4,
    # weight_decay=0.0,
    w_recon_x=0.05,
    w_recon_g=0.1,
    w_contrast=0.1,
    w_cluster=0.1,
    n_clusters=n_clusters,
    cl_start_epoch=100,
    compute_g_loss="cross_entropy",
)

Now the cycle is: 12
Searching resolution...

training:   2%|█▎                                                                      | 7/400 [00:00<00:17, 22.04it/s]

Epoch: 0, Loss: 2.7255704402923584


training:   8%|█████▌                                                                 | 31/400 [00:00<00:07, 50.33it/s]

Epoch: 20, Loss: 1.2939739227294922


training:  12%|████████▉                                                              | 50/400 [00:01<00:06, 55.35it/s]

Epoch: 40, Loss: 0.9415194988250732


training:  17%|████████████▏                                                          | 69/400 [00:01<00:05, 56.74it/s]

Epoch: 60, Loss: 0.7995103001594543


training:  22%|███████████████▍                                                       | 87/400 [00:01<00:05, 57.74it/s]

Epoch: 80, Loss: 0.7482431530952454


training:  28%|███████████████████▌                                                  | 112/400 [00:02<00:04, 58.68it/s]

Epoch: 100, Loss: 0.7114622592926025


training:  32%|██████████████████████▊                                               | 130/400 [00:02<00:04, 58.09it/s]

Epoch: 120, Loss: 0.6843234300613403


training:  37%|██████████████████████████                                            | 149/400 [00:02<00:04, 58.12it/s]

Epoch: 140, Loss: 0.6560841202735901


training:  43%|██████████████████████████████▎                                       | 173/400 [00:03<00:03, 57.56it/s]

Epoch: 160, Loss: 0.6310422420501709


training:  48%|█████████████████████████████████▍                                    | 191/400 [00:03<00:03, 57.97it/s]

Epoch: 180, Loss: 0.610443115234375


training:  52%|████████████████████████████████████▌                                 | 209/400 [00:03<00:03, 58.08it/s]

Epoch: 200, Loss: 0.593916654586792


training:  57%|███████████████████████████████████████▋                              | 227/400 [00:04<00:02, 57.91it/s]

Epoch: 220, Loss: 0.5803815722465515


training:  63%|███████████████████████████████████████████▉                          | 251/400 [00:04<00:02, 58.28it/s]

Epoch: 240, Loss: 0.5691173076629639


training:  67%|███████████████████████████████████████████████                       | 269/400 [00:04<00:02, 58.02it/s]

Epoch: 260, Loss: 0.560197114944458


training:  73%|███████████████████████████████████████████████████▎                  | 293/400 [00:05<00:01, 57.74it/s]

Epoch: 280, Loss: 0.5517188310623169


training:  78%|██████████████████████████████████████████████████████▍               | 311/400 [00:05<00:01, 58.08it/s]

Epoch: 300, Loss: 0.5455691814422607


training:  82%|█████████████████████████████████████████████████████████▊            | 330/400 [00:05<00:01, 58.34it/s]

Epoch: 320, Loss: 0.5398231744766235


training:  87%|████████████████████████████████████████████████████████████▉         | 348/400 [00:06<00:00, 57.37it/s]

Epoch: 340, Loss: 0.5346991419792175


training:  93%|█████████████████████████████████████████████████████████████████     | 372/400 [00:06<00:00, 58.27it/s]

Epoch: 360, Loss: 0.5296562314033508


training:  98%|████████████████████████████████████████████████████████████████████▎ | 390/400 [00:07<00:00, 58.23it/s]

Epoch: 380, Loss: 0.5251146554946899


training: 100%|██████████████████████████████████████████████████████████████████████| 400/400 [00:07<00:00, 55.49it/s]

Epoch: 399, Loss: 0.5210220813751221

Analysis of model calculation results

temp = model.get_model_output()
emb, y_pred, emb_rec = temp["emb"], temp["y_pred"], temp["x_rec"]
adata.obsm["z"] = emb
adata.obs["pred"] = y_pred
res = search_res(adata, n_clusters, rep="z", start=0.3, end=3, increment=0.02)
sc.pp.neighbors(adata, use_rep="z", n_neighbors=10, random_state=1234)
sc.tl.leiden(adata, key_added="leiden", resolution=res, random_state=1234)
new_type = refine_label(adata, key='leiden', radius=30)
adata.obs['leiden'] = new_type
sc.pl.spatial(adata, img_key="hires", color='leiden', title="MCGAE")
plt.tight_layout()

Searching resolution...

jpg

<Figure size 1920x1440 with 0 Axes>

Data expression after noise reduction

sc.pp.scale(adata, zero_center=False, max_value=1)
adata2 = adata.copy()
adata2.X = emb_rec
sc.pl.spatial(adata, img_key="hires", color="TF", title="TF_raw_Exp", color_map="RdYlGn_r")
sc.pl.spatial(adata2, img_key="hires", color="TF", title="TF_denoise", color_map="RdYlGn_r")
sc.pl.spatial(adata, img_key="hires", color="ALB", title="ALB_raw_Exp", color_map="RdYlGn_r")
sc.pl.spatial(adata2, img_key="hires", color="ALB", title="ALB_denoise", color_map="RdYlGn_r")

jpg