Apr 16, 2025 | 9 min read
See our Platform in Action as we Conduct this Case Study
Find our demo here: https://revilico.bio/demos
To continue our EGFR case study series, this edition focuses on another foundational layer of computational drug discovery: characterizing disease-relevant targets using our Target Analytics platform. We analyze EGFR using our single-cell RNA sequencing (scRNA-Seq) and clustering tools.
Understanding EGFR in the Context of Cancer Biology
EGFR is a transmembrane receptor tyrosine kinase involved in cellular processes such as proliferation, survival, and differentiation. In many cancers, including non-small cell lung cancer (NSCLC) and certain subtypes of breast cancer, EGFR is either overexpressed or mutated, leading to constitutive activation of downstream signaling pathways. These factors make EGFR a critical node in the oncogenic network and a powerful therapeutic target.
Running scRNA-Seq for Wild-Type vs Mutant EGFR
We begin with the Gene Expression module in our scRNA-Seq Analysis tab, where users can upload both control and experimental .h5ad files for side-by-side evaluation of their wild-type and mutated profiles. The platform provides key outputs, including volcano plots of differentially expressed genes, edge-per-node graphs of interaction network topology, functional annotation networks, and enrichment score distributions. Here, you will see the scRNA-Seq profiles of the MDA-MB-436 human breast cancer cell line, a model for triple-negative breast cancer (TNBC), with a strong emphasis on IL6 as a potential disease target. This illustrates the platform's general capability for identifying diverse disease targets, but for this article we focus on EGFR to stay in line with our case study.
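To make the volcano plot output concrete, here is a minimal sketch of how one point on such a plot could be computed for a single gene. It is not Revilico's implementation: the function name `volcano_point`, the pseudocount, the normal approximation to Welch's t-test, and the toy expression values are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist, mean, variance


def volcano_point(control, treated, pseudocount=1.0):
    """Return (log2 fold change, -log10 p-value) for one gene.

    Hypothetical helper: the p-value uses a normal approximation to
    Welch's t statistic, which is only reasonable for larger samples.
    """
    # x-axis: log2 ratio of mean expression, with a pseudocount to avoid log(0)
    log2_fc = math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))

    # Welch's t statistic (unequal variances)
    se = math.sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    if se == 0:
        # No variance in either group: significance is undefined, report 0
        return log2_fc, 0.0
    t_stat = (mean(treated) - mean(control)) / se

    # y-axis: two-sided p-value, clamped so that -log10 stays finite
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    p_value = max(p_value, 1e-300)
    return log2_fc, -math.log10(p_value)


# Toy per-cell expression values (invented for this example, not real data)
expression = {
    "EGFR": ([50, 55, 60, 52, 58], [10, 12, 9, 11, 14]),
    "ACTB": ([100, 98, 102, 101, 99], [99, 101, 100, 102, 98]),
}

points = {gene: volcano_point(ctrl, trt) for gene, (ctrl, trt) in expression.items()}
for gene, (lfc, significance) in points.items():
    print(f"{gene}: log2FC={lfc:.2f}, -log10(p)={significance:.1f}")
```

Genes far from zero on the x-axis and high on the y-axis are the ones a volcano plot highlights. A production pipeline would also correct p-values for multiple testing across all genes.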
Performing Clustering Analysis and UMAP Visualization
To spatially interpret our gene expression profiles, users can visualize gene-level heterogeneity across cell populations using both our 2D and 3D UMAP plots. Using EGFR as our focal gene of interest and our dynamic filtering tools, we refined the visualizations by expression intensity and cluster proximity, allowing us to observe how EGFR expression varies across the identified cell clusters. As shown below, the differences between control disease cell lines and paclitaxel-treated cells can be clearly visualized. The control group (Red Cluster) retains high EGFR expression, while the treated group (Blue Cluster) shows significant downregulation. This enables automated, gene-by-gene analysis across diverse conditions and drug treatments. Revilico also maintains a repository of over 300 million single-cell profiles from both murine and human models, available upon request.

Figure 1. A visualization of EGFR expression in the control file (left) vs the experimental file (right) using Revilico's Clustering Analysis. The expression level threshold is set at 70, so cells expressing EGFR at or above this threshold appear redder.
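The threshold-based comparison in Figure 1 can be summarized numerically as the fraction of cells in each cluster at or above the cutoff. The sketch below assumes each cell is a (cluster label, EGFR expression) pair. The function name, the cluster labels, and the toy values are hypothetical, and the units of the threshold of 70 are taken as-is from the figure.

```python
from collections import defaultdict


def fraction_above_threshold(cells, threshold):
    """Return {cluster: fraction of cells with expression >= threshold}.

    `cells` is an iterable of (cluster_label, expression_value) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for cluster, value in cells:
        totals[cluster] += 1
        if value >= threshold:
            hits[cluster] += 1
    return {cluster: hits[cluster] / totals[cluster] for cluster in totals}


# Toy cells (invented for illustration, not taken from the MDA-MB-436 data)
cells = [
    ("control", 82.0), ("control", 75.5), ("control", 64.0), ("control", 91.2),
    ("treated", 12.0), ("treated", 71.0), ("treated", 8.5), ("treated", 30.1),
]

fractions = fraction_above_threshold(cells, threshold=70)
print(fractions)
```

A large gap between the two fractions corresponds to the visual contrast between the red and blue clusters.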
Extracting Amino Acid Sequences for Downstream Modeling
With transcriptional patterns established, we now turn to protein-level analysis. Revilico’s Target Data Extraction tools enable users to identify genes most closely linked to a given disease through the Gene Relevancy Engine. This engine scans the entire PubMed database across all known human genes, quantifying gene-disease associations based on frequency of mention in the literature.
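As a rough illustration of frequency-of-mention scoring, the sketch below counts how many abstracts mention both a disease term and a gene symbol, then ranks genes by that count. It is a simplified stand-in rather than the Gene Relevancy Engine itself: the function name, the matching rules, and the toy abstracts are assumptions, and a real system would need to handle gene aliases and far larger corpora.

```python
import re
from collections import Counter


def rank_genes_by_mentions(abstracts, gene_symbols, disease_term):
    """Rank genes by the number of abstracts co-mentioning gene and disease."""
    disease_pattern = re.compile(re.escape(disease_term), re.IGNORECASE)
    # Whole-word, case-sensitive match so short symbols do not hit inside words
    gene_patterns = {
        gene: re.compile(r"\b" + re.escape(gene) + r"\b") for gene in gene_symbols
    }

    counts = Counter()
    for text in abstracts:
        if not disease_pattern.search(text):
            continue  # Skip abstracts that never mention the disease
        for gene, pattern in gene_patterns.items():
            if pattern.search(text):
                counts[gene] += 1  # Count each abstract at most once per gene
    return counts.most_common()


# Toy abstracts written for this example (not real PubMed records)
abstracts = [
    "EGFR overexpression is frequently reported in breast cancer.",
    "BRCA1 and EGFR were both examined in a breast cancer cohort.",
    "TP53 mutations were profiled in lung cancer samples.",
    "EGFR inhibitors were evaluated in lung cancer.",
]

ranking = rank_genes_by_mentions(abstracts, ["EGFR", "BRCA1", "TP53"], "breast cancer")
print(ranking)
```

Genes with no co-mentions are simply absent from the ranking. Raw counts favor heavily studied genes, so a practical score would likely normalize by each gene's overall mention frequency.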


Figure 2. Top genes associated with breast cancer (top) and lung cancer (bottom).
For both breast cancer and lung cancer, EGFR ranks among the top disease-associated genes. From there, users can extract the amino acid sequence directly via UniProt integration by selecting the protein ID corresponding to human EGFR.
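Outside the platform, the same sequence can be retrieved from UniProt's public REST endpoint, where human EGFR has the accession P00533. The sketch below shows one way to download and parse a FASTA record. The helper names are hypothetical and this is not a description of how Revilico's integration works internally.

```python
from urllib.request import urlopen

# UniProt REST endpoint that returns a single entry in FASTA format
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_fasta(text):
    """Return (header, sequence) for the first record in a FASTA string."""
    header = None
    sequence_lines = []
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                break  # Stop at the second record; only the first is returned
            header = line[1:]
        elif line and header is not None:
            sequence_lines.append(line)
    if header is None:
        raise ValueError("no FASTA header line found")
    return header, "".join(sequence_lines)


def fetch_sequence(accession, timeout=30):
    """Download one UniProt entry and return (header, sequence).

    Requires network access; it is defined here but not called.
    """
    url = UNIPROT_FASTA_URL.format(accession=accession)
    with urlopen(url, timeout=timeout) as response:
        return parse_fasta(response.read().decode("utf-8"))


# Example call (needs network access):
#   header, sequence = fetch_sequence("P00533")
```

Keeping parsing separate from downloading means the parser can be checked offline against a saved FASTA file.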
Structural Analysis with AlphaFold
With the amino acid sequence now extracted, users can transition to the Target Analysis Dashboard, where EGFR’s structure can be rendered, manipulated, and interrogated.

Figure 3. 3D structure of EGFR using Revilico’s AlphaFold feature.
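One simple way to interrogate an AlphaFold model programmatically is to read its per-residue confidence. AlphaFold writes the pLDDT score into the B-factor column of its PDB files, so a short parser can recover it. The sketch below assumes a standard fixed-width PDB file; the function names and the choice to read only alpha-carbon atoms are illustrative assumptions, not features of the dashboard.

```python
def ca_plddt(pdb_text):
    """Return {residue_number: pLDDT} using the alpha-carbon (CA) atom lines.

    Relies on fixed PDB columns: atom name in 13-16, residue number in
    23-26, and B-factor (pLDDT in AlphaFold models) in 61-66.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residue_number = int(line[22:26])
            scores[residue_number] = float(line[60:66])
    return scores


def confident_fraction(scores, cutoff=70.0):
    """Fraction of residues with pLDDT at or above the cutoff.

    70 is the conventional lower bound for a confident prediction.
    """
    if not scores:
        return 0.0
    return sum(score >= cutoff for score in scores.values()) / len(scores)


# Example usage with a downloaded model file (the path is hypothetical):
#   with open("egfr_alphafold_model.pdb") as handle:
#       scores = ca_plddt(handle.read())
#   print(f"{confident_fraction(scores):.0%} of residues are confident")
```

Low-confidence stretches often correspond to flexible or disordered regions, which is worth knowing before using a predicted structure for downstream modeling.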
Summary
Revilico’s Target Analytics Platform enables users to perform scRNA-Seq comparisons between wild-type and mutant forms, visualize gene clustering with dimensionality reduction techniques, extract protein sequences for structure-guided discovery, and integrate biological and structural data into a cohesive drug discovery workflow.
Stay tuned for the next part of our EGFR case study series, where we'll use our target analysis tools to more deeply understand EGFR's structural and thermodynamic territories.