Virtual Screening

Aug 6, 2025


The Engine

The Virtual Screening suite is a comprehensive platform for predicting and analyzing biological activity across targets, designed for target engagement, safety assessment, and hit identification. It integrates binding affinity prediction, post-screening analytics, toxicity and selectivity profiling, and CYP450 risk assessment into a single workflow so teams can prioritize compounds with the best balance of potency and safety.

Users upload SMILES libraries and protein sequences to run high-throughput predictions, then explore interactive dashboards for hit isolation, clustering, correlation analysis, and metabolic risk evaluation, with results tracked and downloadable in the Command Center.

The Algorithm

The suite unifies several coordinated engines that map chemical structure to activity, selectivity, and risk:

Binding Affinity Analysis
- Two modes: single-protein screening to prioritize compounds by IC50, Kd, Ki, or EC50, and multi-protein screening for selectivity and toxicity profiling.
- Inputs: protein sequence and CSV compound library with a smilesString header, or curated target panels from SIP dashboards, DrugBank, HPA druggable proteomes, and UniProt disease targets.
- Models combine molecular fingerprints and learned embeddings with sequence or structure encodings, producing pKd or pIC50-style scores with confidence estimates.
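The pKd and pIC50-style scores above are negative base-10 logarithms of a molar concentration. The sketch below shows that standard conversion for values given in nanomolar; the function names are illustrative and are not part of the suite.

```python
import math


def to_p_scale(value_nm: float) -> float:
    """Convert a concentration in nM (IC50, Kd, Ki, or EC50) to its p-scale value."""
    if value_nm <= 0:
        raise ValueError("concentration must be positive")
    # p-value = -log10(molar concentration), and 1 nM = 1e-9 M
    return 9.0 - math.log10(value_nm)


def from_p_scale(p_value: float) -> float:
    """Antilog transform: convert a p-scale value back to a concentration in nM."""
    return 10.0 ** (9.0 - p_value)
```

On this scale a higher score means a more potent compound: an IC50 of 1000 nM (1 µM) corresponds to a pIC50 of 6, and each additional log unit is a 10-fold gain in potency.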
Activity Hit IO
- Five modes: data visualization, hit isolation by percentage, data processing for trackable runs, clustering and pharmacophores with UMAP chemical space, and embedded hit isolation via SMILES similarity to reference molecules.
- Outputs include distribution plots, correlation matrices, hit tables, clustering summaries, and similarity rankings, exportable as CSV or to RevilicoGPT for AI analysis.
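Hit isolation by percentage can be pictured as a simple rank-and-cut step. The following is a minimal sketch that assumes higher predicted scores are better and that rounding goes up to a whole compound; `isolate_top_hits` is a hypothetical helper, not the suite's implementation.

```python
import math


def isolate_top_hits(scores: dict[str, float], percent: float) -> list[tuple[str, float]]:
    """Return the top `percent` of compounds, ranked by predicted score (higher is better)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Sort (SMILES, score) pairs from best to worst predicted score.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    # Round up so that a non-empty library always yields at least one hit.
    n_hits = max(1, math.ceil(len(ranked) * percent / 100))
    return ranked[:n_hits]


# Illustrative scores keyed by SMILES; the values are made up for the example.
library = {"CCO": 5.1, "c1ccccc1": 6.3, "CC(=O)O": 4.2, "CCN": 7.0, "CCC": 5.5}
top_hits = isolate_top_hits(library, percent=40)
```

With five compounds and a 40 percent cutoff, `top_hits` holds the two best-scoring entries.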
Drug Toxicity & Selectivity Dashboard
- Consumes multi-protein affinity outputs to quantify selectivity windows, flag off-target risks, and visualize chemical space with adjustable activity and toxicity thresholds.
- Supports curated enzyme and target panels, forward selection, antilog transforms, and custom filtering for rapid triage.
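One common way to express a selectivity window is the fold difference in affinity between the intended target and each off-target, obtained by taking the antilog of the pKd difference. The sketch below assumes pKd-style inputs and a 100-fold minimum window; the function names, protein labels, values, and threshold are illustrative assumptions rather than the dashboard's actual logic.

```python
def selectivity_windows(affinities: dict[str, float], target: str) -> dict[str, float]:
    """Fold selectivity of `target` over every other protein, from predicted pKd values."""
    target_p = affinities[target]
    # A difference of 1 log unit corresponds to a 10-fold window (antilog transform).
    return {
        name: 10.0 ** (target_p - p_value)
        for name, p_value in affinities.items()
        if name != target
    }


def flag_off_targets(windows: dict[str, float], min_fold: float = 100.0) -> list[str]:
    """Names of off-targets whose selectivity window falls below `min_fold`."""
    return sorted(name for name, fold in windows.items() if fold < min_fold)


# Made-up predictions for one compound across a three-protein panel.
panel = {"EGFR": 8.5, "hERG": 5.5, "CYP3A4": 7.0}
windows = selectivity_windows(panel, target="EGFR")
risky = flag_off_targets(windows)
```

Here the compound is about 1000-fold selective over hERG but only about 32-fold over CYP3A4, so only CYP3A4 is flagged at the assumed 100-fold threshold.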
CYP450 Dashboard
- Predicts CYP inhibition scores on a 0 to 1 scale per isoform with user-set thresholds for risk tolerance, accepting raw SMILES or CSV uploads.
- Summaries highlight likely non-inhibitors, moderate risks, and strong inhibitors to reduce metabolic liability early.
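The three summary categories can be illustrated with a two-threshold rule applied to each isoform's 0 to 1 score. This is a sketch only: the default cutoffs of 0.3 and 0.7, the function name, and the example scores are assumptions chosen for illustration, standing in for the user-set thresholds described above.

```python
def classify_cyp_risk(score: float, moderate: float = 0.3, strong: float = 0.7) -> str:
    """Bucket a 0-1 CYP inhibition score into one of three risk categories."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if not moderate < strong:
        raise ValueError("moderate threshold must be below strong threshold")
    if score >= strong:
        return "strong inhibitor"
    if score >= moderate:
        return "moderate risk"
    return "likely non-inhibitor"


# Made-up per-isoform scores for a single compound.
profile = {"CYP3A4": 0.82, "CYP2D6": 0.41, "CYP2C9": 0.05}
risk_by_isoform = {isoform: classify_cyp_risk(s) for isoform, s in profile.items()}
```

Tightening `strong` (for example to 0.5) makes the filter more conservative, which trades a smaller candidate pool for lower metabolic risk.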

All engines include automated validation of file formats, SMILES parsing, and sequence checks, with pipeline naming, progress monitoring, and reproducible parameter capture.
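The kind of input checks described above can be approximated with the standard library alone. The sketch below verifies that a CSV has the `smilesString` header and that a protein sequence uses only the 20 standard amino acid letters; the helper names are hypothetical, and chemical validity of each SMILES is not checked here, since that would need a cheminformatics toolkit such as RDKit.

```python
import csv
import io

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def read_smiles_column(csv_text: str, column: str = "smilesString") -> list[str]:
    """Return the non-empty values of the SMILES column, or raise if the header is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"missing required column: {column}")
    # Skip rows where the SMILES cell is absent or blank.
    smiles = [row[column].strip() for row in reader if row[column] and row[column].strip()]
    if not smiles:
        raise ValueError("no SMILES rows found")
    return smiles


def invalid_residues(sequence: str) -> set[str]:
    """Characters in a protein sequence that are not standard amino acid codes."""
    return set(sequence.upper()) - AMINO_ACIDS
```

For example, `invalid_residues("MKTXZ")` reports `X` and `Z`, which a pipeline could reject outright or surface to the user as a warning before any prediction is run.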

Algorithm Validation

Activity models are benchmarked against public bioactivity resources and internal holdout sets for classification and regression accuracy across diverse target families. Binding affinity predictions show consistent rank ordering on retrospective screens, with the top 5 to 10 percent of ranked compounds enriched for known actives. Multi-protein runs reproduce expected selectivity patterns on reference panels. CYP predictions separate known inhibitors from non-inhibitors at commonly used thresholds, supporting early risk filtering. Visualization modules mirror standard cheminformatics diagnostics, so that distributions, correlations, and clusters reflect underlying data quality.
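Enrichment in the top-ranked slice of a retrospective screen is commonly summarized with the enrichment factor: the active rate in the top slice divided by the active rate in the whole library. The sketch below computes that standard metric; it is offered as an illustration and is not confirmed to be the exact statistic used in the suite's validation.

```python
import math


def enrichment_factor(ranked_labels: list[bool], top_percent: float) -> float:
    """Enrichment factor for the top `top_percent` of a ranked list.

    `ranked_labels` holds known-active flags, ordered from best to worst predicted score.
    """
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    n_top = max(1, math.ceil(n_total * top_percent / 100))
    hits_in_top = sum(ranked_labels[:n_top])
    # Ratio of the active rate in the top slice to the active rate overall.
    return (hits_in_top / n_top) / (n_actives / n_total)


# Toy screen: 20 compounds, 4 known actives, 2 of them ranked in the top 10 percent.
labels = [True, True] + [False] * 8 + [True, True] + [False] * 8
ef_top10 = enrichment_factor(labels, top_percent=10)
```

In this toy screen the top 10 percent is five times richer in actives than the library as a whole; a value of 1 would mean the ranking performs no better than random selection.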

Scientific Impact

The Virtual Screening suite enables data-driven decisions from first pass screening through preclinical triage:

- Prioritize potent compounds via single-target ranking while preserving scaffold diversity.
- Quantify selectivity across protein panels to de-risk off-target activity.
- Map chemical space with UMAP to reveal series, clusters, and pharmacophore trends.
- Integrate hit isolation, similarity search, and enrichment metrics for rapid hypothesis testing.
- Reduce metabolic liabilities by filtering CYP inhibitors and visualizing isoform-specific risk profiles.

By connecting potency, selectivity, and toxicity readouts in one place, teams can move from large libraries to focused, testable hit lists with clear rationale.

Business Impact

The suite accelerates hit finding and reduces costly dead ends by:

- Scaling high-throughput activity prediction and post-screen analytics for libraries up to tens of thousands of compounds.
- Improving decision quality with unified dashboards for potency, selectivity, and CYP risk.
- Lowering experimental spend by filtering poor candidates before assays.
- Streamlining collaboration through standardized inputs, reproducible pipelines, and Command Center tracking.

The result is faster, clearer progression from virtual screens to experimentally ready hit sets, with transparent parameters and exports that support downstream docking, ADMET, and lead optimization.