Transcriptogram analyses for the epithelial-mesenchymal transition
Epithelial-Mesenchymal Transition (EMT); Single-Cell RNA-seq; MCF10A cells; TGF-β1 treatment; Batch effect correction; Gene expression normalization; Transcriptogram analysis; Protein-protein interactions; Principal Component Analysis (PCA); Differential gene expression; Co-expression clustering; Temporal gene expression patterns
Epithelial-Mesenchymal Transition (EMT) is a biological process essential for embryonic development, tissue healing, and tumor progression. This study investigates transcriptional changes during EMT using single-cell RNA-seq data from MCF10A cells treated with TGF-β1 for different durations. The data were obtained from a study published in PNAS in 2021 (DOI: 10.1073/pnas.2102050118). Samples were collected in two experimental batches to capture temporal variation: Batch 1 covers days 0, 4, and 8 of treatment, and Batch 2 covers days 0, 1, 2, and 3. Because the data come from two batches, batch effect correction was required; it was performed by normalizing gene expression between the day 0 (control) samples of both batches. Expression values were first divided by the total number of reads per cell and then rescaled by the ratio of the mean expression values of the control samples. The analysis was conducted using the Transcriptogramer package, which projects gene expression profiles onto a list of proteins ordered according to protein-protein interactions. This method enables the identification of continuous transcriptional patterns and the clustering of co-expressed genes. Principal Component Analysis (PCA) was applied without scaling (scale=FALSE) to visualize gene expression variation throughout EMT while preserving the original biological variation. Transcriptograms were generated to represent EMT dynamics and to identify differentially expressed genes and potential partial stages of the process. The results obtained so far suggest distinct gene expression patterns between batches, reinforcing the importance of batch effect correction. The reconstructed transcriptograms revealed functional clusters, highlighting key genes involved in EMT. Future steps include reconstructing transcriptograms for selected samples, identifying the genes that vary most over time, and clustering genes by co-expression to define a list of genes that best describes EMT in this context.
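
The two-step normalization described above can be illustrated with a minimal sketch. It is not the study's actual code: the function names, the toy count values, the choice of Batch 1 as the reference, and the use of a per-gene (rather than a single global) ratio of day 0 means are all assumptions made for illustration.

```python
from statistics import fmean

def normalize_by_library_size(cells):
    """Divide each cell's read counts by that cell's total number of reads."""
    normalized = []
    for cell in cells:
        total = sum(cell)
        if total > 0:
            normalized.append([count / total for count in cell])
        else:
            # A cell with no reads carries no information; keep it as zeros.
            normalized.append([0.0] * len(cell))
    return normalized

def gene_means(cells):
    """Mean expression of each gene across cells (cells are rows, genes are columns)."""
    return [fmean(gene_values) for gene_values in zip(*cells)]

def control_ratio(reference_day0, other_day0):
    """Per-gene ratio of day 0 means: reference batch over the batch to correct.

    Assumption: the ratio is taken gene by gene. Genes with zero mean in the
    batch being corrected are left unchanged (factor 1.0).
    """
    ref = gene_means(reference_day0)
    other = gene_means(other_day0)
    return [r / o if o > 0 else 1.0 for r, o in zip(ref, other)]

def apply_ratio(cells, ratio):
    """Rescale every cell of a batch by the per-gene correction factors."""
    return [[value * factor for value, factor in zip(cell, ratio)] for cell in cells]

# Hypothetical toy counts: 2 cells x 3 genes for each batch's day 0 sample.
batch1_day0 = normalize_by_library_size([[10, 30, 60], [20, 20, 60]])
batch2_day0 = normalize_by_library_size([[30, 30, 40], [10, 50, 40]])

# Factors that map Batch 2 onto Batch 1 (Batch 1 as reference is an assumption).
ratio = control_ratio(batch1_day0, batch2_day0)

# The same factors would then be applied to every Batch 2 time point.
batch2_day0_corrected = apply_ratio(batch2_day0, ratio)
```

After this correction the day 0 gene means of the two batches coincide by construction, so the remaining differences between later time points are not driven by the offset between controls.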