tooluniverse-crispr-screen-analysis
npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-crispr-screen-analysis
Skill Documentation
CRISPR Screen Analysis Workflow
Systematic analysis of CRISPR knockout/activation/interference screens to identify essential genes, synthetic lethal interactions, and therapeutic targets.
KEY PRINCIPLES:
- Report-first approach – Create comprehensive analysis report FIRST, then populate progressively
- Evidence grading – Grade all findings by confidence level (H/M/L based on statistical significance and validation data)
- Multi-dimensional analysis – Integrate essentiality, pathway context, druggability, and clinical relevance
- Citation requirements – Every conclusion must trace to source data (DepMap, literature, pathways)
- Mandatory completeness – All analysis sections must exist with data or explicit “No data” notes
- Context-aware interpretation – Consider cell line context, screen type, and biological pathway redundancy
When to Use This Skill
Apply when users:
- Have CRISPR screen hit lists (genes with significant phenotypes)
- Need to prioritize CRISPR hits for validation
- Want to identify essential genes for a specific cancer type
- Need synthetic lethal interaction analysis
- Ask “what are the top hits from my CRISPR screen?”
- Need drug target prioritization from functional genomics data
- Want pathway-level interpretation of screen results
⚠️ Known Issues & Workarounds
DepMap API Unavailability (2026-02-09)
Issue: DepMap REST APIs (Sanger Cell Model Passports and Broad Institute) are currently non-operational.
Impact:
- PATH 0 (Gene Validation): DepMap gene registry unavailable
- PATH 1 (Essentiality Analysis): CRISPR dependency scores unavailable
Workaround: This skill now uses Pharos as a fallback:
- Gene validation via `Pharos_get_target()`
- Druggability assessment via TDL (Target Development Level) classification
- TDL used as a proxy for essentiality (Tclin targets are often essential)
- Evidence grading: Tclin=★★★, Tchem=★★☆, Tbio/Tdark=★☆☆
Data Quality Trade-off:
- ✅ Gene validation: 100% success rate (Pharos has comprehensive drug target coverage)
- ⚠️ Essentiality scores: Druggability-based proxy (TDL classification)
- ℹ️ All findings labeled with source (Pharos vs DepMap)
Timeline: Permanent fix (CSV download) estimated 1-2 weeks. See DEPMAP_ISSUE_ANALYSIS.md for details.
Input Types Supported
1. Gene List from User’s Screen
- Format: Gene symbols (e.g., EGFR, KRAS, TP53)
- Minimum: 5 genes (for meaningful enrichment)
- Optimal: 20-100 genes (hits from primary screen)
- Context needed: Cancer type, screen type (dropout/enrichment), cell line used
2. Cancer Type Query
- Format: Cancer type name (e.g., “non-small cell lung cancer”, “breast cancer”)
- Workflow: Retrieve top essential genes for that cancer from DepMap
- Output: Ranked target list with essentiality scores
3. Gene of Interest
- Format: Single gene symbol
- Workflow: Analyze essentiality across cancer types, identify selective dependencies
- Output: Target validation report with tissue specificity
Critical Workflow Requirements
1. Report-First Approach (MANDATORY)
DO NOT show intermediate tool outputs. Instead:
- Create report file FIRST before any analysis:
  - File name: `CRISPR_screen_analysis_[CONTEXT].md`
  - Initialize with all section headers
  - Add placeholder: `[Analyzing...]` in each section
- Progressively update as data arrives:
  - Replace `[Analyzing...]` with findings
  - Include “No significant enrichment” when appropriate
  - Document failed analyses explicitly
- Final deliverable: Complete markdown report + optional plots (if user requests)
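The report-first steps above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the helper names (`build_report_skeleton`, `fill_section`, `report_filename`) are hypothetical, the section headers are copied from the report templates later in this document, and the report title line is an assumption.

```python
PLACEHOLDER = "[Analyzing...]"

# Section headers taken from the report templates in this document.
SECTIONS = [
    "Input Validation",
    "1. Gene Essentiality Analysis",
    "2. Pathway & Functional Enrichment",
    "3. Protein Interaction Network Analysis",
    "4. Druggability & Clinical Target Assessment",
    "5. Clinical Relevance & Disease Association",
    "6. Hit Prioritization & Validation Recommendations",
]


def build_report_skeleton(context: str) -> str:
    """Return the initial report text: a title plus every section with a placeholder."""
    lines = [f"# CRISPR Screen Analysis: {context}", ""]  # title format is assumed
    for header in SECTIONS:
        lines += [f"### {header}", PLACEHOLDER, ""]
    return "\n".join(lines)


def fill_section(report: str, header: str, findings: str) -> str:
    """Replace the placeholder under one section header with findings."""
    marker = f"### {header}\n{PLACEHOLDER}"
    if marker not in report:
        raise KeyError(f"Section not found or already filled: {header}")
    return report.replace(marker, f"### {header}\n{findings}", 1)


def report_filename(context: str) -> str:
    """Apply the file name pattern, replacing spaces in the context with underscores."""
    return f"CRISPR_screen_analysis_{context.replace(' ', '_')}.md"


report = build_report_skeleton("NSCLC dropout screen")
report = fill_section(report, "2. Pathway & Functional Enrichment",
                      "No significant enrichment")
```

Keeping the skeleton as a plain string makes it easy to write the file once up front and rewrite it after each section is filled, so every section always contains either findings or an explicit placeholder.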
2. Evidence Grading System (MANDATORY)
Grade every finding by confidence level:
| Level | Symbol | Criteria | Examples |
|---|---|---|---|
| HIGH | ★★★ | DepMap score < -1.0, p < 0.01, validated in literature | Strong essential gene, clinical drug target |
| MEDIUM | ★★☆ | DepMap score -0.5 to -1.0, p < 0.05, pathway coherence | Moderate dependency, pathway member |
| LOW | ★☆☆ | DepMap score > -0.5, marginal significance, weak validation | Weak hit, potential off-target |
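One way to apply the grading table programmatically is sketched below. The function name `grade_evidence` and its parameters are hypothetical, and the table does not say how to combine its criteria. This sketch assumes all HIGH criteria must hold together, and that a strong score without literature validation drops to MEDIUM.

```python
STARS = {"HIGH": "★★★", "MEDIUM": "★★☆", "LOW": "★☆☆"}


def grade_evidence(depmap_score: float, p_value: float,
                   literature_validated: bool = False) -> str:
    """Map a finding to HIGH/MEDIUM/LOW using the thresholds in the table above."""
    # HIGH: strong dependency, strict significance, and literature support (assumed: all three)
    if depmap_score < -1.0 and p_value < 0.01 and literature_validated:
        return "HIGH"
    # MEDIUM: at least a moderate dependency with nominal significance
    if depmap_score <= -0.5 and p_value < 0.05:
        return "MEDIUM"
    # LOW: weak score or marginal significance
    return "LOW"


grade = grade_evidence(-0.7, 0.03)
label = f"{grade} {STARS[grade]}"
```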
3. Contextualization Requirements
Every gene-level finding must include:
- Essentiality score (DepMap gene effect)
- Pan-cancer vs selective (is it essential in all cancers or specific subset?)
- Druggability (existing drugs, chemical probes, tractability)
- Pathway context (which pathways/complexes does it belong to?)
- Clinical relevance (approved targets, ongoing trials, biomarkers)
Core Analysis Strategy: 7 Research Paths
User Input (gene list OR cancer type OR single gene)
│
├── PATH 0: Input Processing & Validation
│   ├── Validate gene symbols
│   ├── Determine analysis mode
│   └── Set context parameters
│
├── PATH 1: Gene Essentiality Analysis (DepMap)
│   ├── Query gene dependencies for each hit
│   ├── Retrieve essentiality scores across cell lines
│   ├── Calculate pan-cancer vs selective essentiality
│   └── Rank genes by dependency strength
│
├── PATH 2: Pathway & Functional Enrichment
│   ├── GO enrichment (biological process, molecular function)
│   ├── Pathway enrichment (Reactome, WikiPathways, KEGG)
│   ├── Hallmark gene set enrichment (MSigDB)
│   └── Identify pathway-level vulnerabilities
│
├── PATH 3: Protein-Protein Interaction Networks
│   ├── Build PPI network for hit genes
│   ├── Identify protein complexes
│   ├── Find synthetic lethal candidates
│   └── Hub gene analysis
│
├── PATH 4: Druggability & Target Assessment
│   ├── Check existing drugs (DGIdb, ChEMBL)
│   ├── Assess chemical tractability (Pharos TDL)
│   ├── Find chemical probes (Open Targets)
│   └── Clinical trial status (ClinicalTrials.gov)
│
├── PATH 5: Disease Association & Clinical Relevance
│   ├── Gene-disease associations (Open Targets)
│   ├── Somatic mutations in cancer (COSMIC, cBioPortal)
│   ├── Expression in patient samples (GTEx, TCGA)
│   └── Prognostic/predictive biomarker status
│
└── PATH 6: Hit Prioritization & Validation Guidance
    ├── Integrate all evidence dimensions
    ├── Calculate priority score (essentiality + druggability + clinical relevance)
    ├── Recommend validation experiments
    └── Identify top 5-10 targets for follow-up
PATH 0: Input Processing & Validation
Determine Analysis Mode
def determine_analysis_mode(user_input):
    """
    Figure out what type of analysis to run.
    Returns: 'gene_list', 'cancer_type', or 'single_gene'
    """
    if isinstance(user_input, list) and len(user_input) >= 5:
        return 'gene_list'  # User provided hits from their screen
    elif isinstance(user_input, str) and len(user_input.split()) > 1:
        return 'cancer_type'  # User asks about a cancer type
    else:
        return 'single_gene'  # Single gene target validation
Gene Symbol Validation
CRITICAL: Validate gene symbols against DepMap, falling back to Pharos if DepMap is unavailable.
def validate_gene_symbols(tu, gene_list):
    """
    Validate gene symbols via DepMap, falling back to Pharos if DepMap is unavailable.
    Returns: dict with keys 'valid', 'invalid', 'suggestions', 'data_source'
    """
    validated = {
        'valid': [],
        'invalid': [],
        'suggestions': {},
        'data_source': None
    }
    # Try DepMap first (probe with a well-known gene)
    depmap_available = False
    test_result = tu.tools.DepMap_search_genes(query="KRAS")
    if (test_result.get('status') == 'success' and
            not (test_result.get('error') or '').startswith('DepMap API')):
        depmap_available = True
        validated['data_source'] = 'DepMap (primary)'
    if depmap_available:
        # Use original DepMap validation logic
        for gene in gene_list:
            result = tu.tools.DepMap_search_genes(query=gene)
            if result.get('status') != 'success':
                # Failed lookups are recorded as invalid rather than silently dropped
                validated['invalid'].append(gene)
                continue
            genes = result.get('data', {}).get('genes', [])
            exact_matches = [g for g in genes
                             if g.get('symbol', '').upper() == gene.upper()]
            if exact_matches:
                validated['valid'].append({
                    'input': gene,
                    'symbol': exact_matches[0]['symbol'],
                    'ensembl_id': exact_matches[0].get('ensembl_id'),
                    'match_type': 'exact',
                    'source': 'DepMap'
                })
            elif genes:
                # Partial matches only: report up to three suggestions
                validated['invalid'].append(gene)
                validated['suggestions'][gene] = [g['symbol'] for g in genes[:3]]
            else:
                validated['invalid'].append(gene)
    else:
        # FALLBACK: Use Pharos (druggability database)
        print("⚠️ DepMap unavailable, using Pharos for gene validation...")
        validated['data_source'] = 'Pharos (fallback - ★★☆)'
        for gene in gene_list:
            # Query Pharos to check if gene exists
            result = tu.tools.Pharos_get_target(gene=gene)
            if result.get('status') == 'success' and result.get('data'):
                target_data = result.get('data', {})
                validated['valid'].append({
                    'input': gene,
                    'symbol': target_data.get('name', gene),
                    'tdl': target_data.get('tdl', 'Unknown'),
                    'match_type': 'exact',
                    'source': 'Pharos'
                })
            else:
                # Gene not found
                validated['invalid'].append(gene)
    return validated
Output for Report:
### Input Validation
**Genes Provided**: 25 gene symbols
**Valid Genes**: 23 (92%)
**Invalid/Ambiguous**: 2
**Data Source**: {data_source from validated dict}
**Invalid Genes**:
- `EGFRVIII` → Gene symbol not recognized (mutation-specific identifier)
- `P53` → Did you mean `TP53`? (use official gene symbol)
**Proceeding with 23 valid gene symbols for analysis.**
*Source: {DepMap gene registry OR Pharos (fallback)}*
---
**Note**: If using the Pharos fallback due to DepMap unavailability, validation provides gene symbols and TDL classification (druggability level). Validation is ★★☆ reliable.
PATH 1: Gene Essentiality Analysis (DepMap with Pharos Fallback)
Retrieve Essentiality Scores
def analyze_gene_essentiality(tu, gene_list, cancer_type=None):
    """
    Get gene essentiality data from DepMap, falling back to Pharos.
    DepMap: Provides CRISPR dependency scores (gold standard - ★★★)
    Pharos: Provides TDL classification as a druggability-based proxy (fallback - ★★☆)
    """
    essentiality_data = []
    # Check if DepMap is available
    test_result = tu.tools.DepMap_get_gene_dependencies(gene_symbol="KRAS")
    depmap_available = (
        test_result.get('status') == 'success' and
        not (test_result.get('error') or '').startswith('DepMap API')
    )
    if depmap_available:
        # Use original DepMap logic (optimal - ★★★)
        for gene in gene_list:
            dep_result = tu.tools.DepMap_get_gene_dependencies(gene_symbol=gene)
            if dep_result.get('status') == 'success':
                gene_data = dep_result.get('data', {})
                essentiality_data.append({
                    'gene': gene,
                    'data': gene_data,
                    'essentiality_class': classify_essentiality_depmap(gene_data),
                    'source': 'DepMap',
                    'confidence': 'HIGH'  # ★★★
                })
    else:
        # FALLBACK: Use Pharos TDL classification as proxy
        print("⚠️ DepMap unavailable, using Pharos TDL as essentiality proxy...")
        for gene in gene_list:
            pharos_result = tu.tools.Pharos_get_target(gene=gene)
            if pharos_result.get('status') == 'success' and pharos_result.get('data'):
                target_data = pharos_result.get('data', {})
                # Use TDL (Target Development Level) as proxy for essentiality
                essentiality_class = classify_essentiality_pharos(target_data)
                essentiality_data.append({
                    'gene': gene,
                    'data': target_data,
                    'essentiality_class': essentiality_class,
                    'source': 'Pharos',
                    'confidence': essentiality_class['confidence'],
                    'note': 'Essentiality inferred from TDL classification'
                })
    return essentiality_data
def classify_essentiality_depmap(gene_data):
    """Classify gene essentiality based on DepMap CRISPR scores."""
    # Placeholder: the original DepMap classification logic is not shown here,
    # so this returns the same result regardless of gene_data.
    return {
        'pan_cancer': False,
        'selective': True,
        'non_essential': False
    }
def classify_essentiality_pharos(target_data):
    """
    Infer essentiality from Pharos TDL classification (fallback method).
    TDL (Target Development Level) categories:
    - Tclin: Clinical drug target (approved drugs) → Likely essential/important
    - Tchem: Chemical tool/probe available → Druggable, possibly essential
    - Tbio: Biological evidence → Some relevance
    - Tdark: No drug/tool → Unknown essentiality
    """
    tdl = target_data.get('tdl', 'Unknown')
    if tdl == 'Tclin':
        return {
            'classification': 'LIKELY_ESSENTIAL',
            'confidence': 'HIGH',  # ★★★
            'tdl': tdl,
            'rationale': (
                'Approved drug target (Tclin). '
                'Clinically validated targets are often essential. '
                'For cell-line-specific scores, await DepMap restoration.'
            )
        }
    elif tdl == 'Tchem':
        return {
            'classification': 'POTENTIALLY_ESSENTIAL',
            'confidence': 'MEDIUM',  # ★★☆
            'tdl': tdl,
            'rationale': (
                'Chemical tools available (Tchem). '
                'Druggable targets with chemical probes often have functional relevance.'
            )
        }
    elif tdl == 'Tbio':
        return {
            'classification': 'UNCERTAIN',
            'confidence': 'LOW',  # ★☆☆
            'tdl': tdl,
            'rationale': (
                'Biological evidence only (Tbio). '
                'Limited druggability data. Essentiality unclear.'
            )
        }
    else:  # Tdark or Unknown
        return {
            'classification': 'UNKNOWN',
            'confidence': 'LOW',  # ★☆☆
            'tdl': tdl,
            'rationale': (
                'Dark target or unknown. '
                'No drug/tool data. Essentiality cannot be inferred.'
            )
        }
Cancer Type-Specific Analysis
For cancer type queries, retrieve top essential genes:
def get_top_essential_genes_for_cancer(tu, cancer_type, top_n=50):
    """
    Retrieve top essential genes for a specific cancer type from DepMap.
    """
    # Get cell lines for this cancer type
    cell_lines = tu.tools.DepMap_get_cell_lines(
        cancer_type=cancer_type,
        page_size=100
    )
    if not cell_lines.get('data', {}).get('cell_lines'):
        return {'error': f'No cell lines found for {cancer_type}'}
    # For each cell line, would need to query dependencies
    # Note: DepMap API may not support direct "top genes by cancer type" query
    # May need to aggregate manually or use different approach
    return {
        'cancer_type': cancer_type,
        'cell_lines': cell_lines.get('data', {}).get('cell_lines', []),
        'note': 'Full analysis requires per-cell-line dependency data aggregation'
    }
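The per-cell-line aggregation mentioned in the note above could look roughly like the sketch below. Everything here is an assumption for illustration: the input shape (`{gene: {cell_line: gene_effect}}`), the helper names, the -0.5 per-line cutoff, the 90% pan-cancer fraction, and the example scores, which are invented and not DepMap values.

```python
from statistics import mean

ESSENTIAL_CUTOFF = -0.5     # assumed per-cell-line cutoff for calling a gene essential
PAN_CANCER_FRACTION = 0.9   # assumed fraction of lines required for "pan-cancer"


def summarize_dependencies(scores_by_gene):
    """
    scores_by_gene: {gene: {cell_line: gene_effect}} (hypothetical input shape).
    Returns per-gene summaries sorted by mean effect, most negative first.
    """
    summaries = []
    for gene, per_line in scores_by_gene.items():
        values = list(per_line.values())
        if not values:
            continue  # no data for this gene; skip rather than guess
        fraction = sum(v < ESSENTIAL_CUTOFF for v in values) / len(values)
        if fraction >= PAN_CANCER_FRACTION:
            label = "pan_cancer"
        elif fraction > 0:
            label = "selective"
        else:
            label = "non_essential"
        summaries.append({
            "gene": gene,
            "mean_effect": mean(values),
            "fraction_essential": fraction,
            "class": label,
        })
    return sorted(summaries, key=lambda s: s["mean_effect"])


def top_essential_genes(scores_by_gene, top_n=50):
    """Return the top_n gene symbols ranked by mean gene effect."""
    return [s["gene"] for s in summarize_dependencies(scores_by_gene)[:top_n]]


# Invented scores for illustration only
example = {
    "RPL5": {"A549": -1.5, "HCT116": -1.4, "MCF7": -1.45},
    "KRAS": {"A549": -1.2, "HCT116": -1.1, "MCF7": -0.1},
    "GENE1": {"A549": -0.2, "HCT116": -0.1, "MCF7": -0.3},
}
classes = {s["gene"]: s["class"] for s in summarize_dependencies(example)}
```

Ranking by mean effect alone favors pan-essential genes, so the class label is kept alongside the rank to let the report separate core fitness genes from selective dependencies.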
Output for Report (when DepMap available):
### 1. Gene Essentiality Analysis
**Data Source**: DepMap CRISPR (24Q2) ✅
**Confidence**: ★★★ HIGH
#### Strongly Essential Genes (DepMap Score < -1.0)
| Gene | Mean Effect | Essential Cell Lines (%) | Selectivity | Evidence |
|------|-------------|-------------------------|-------------|----------|
| **RPL5** | -1.45 | 98% (1,042/1,063) | Pan-cancer | ★★★ |
| **RPS6** | -1.32 | 96% (1,019/1,063) | Pan-cancer | ★★★ |
| **POLR2A** | -1.28 | 95% (1,010/1,063) | Pan-cancer | ★★★ |
**Interpretation**: These genes are essential for cell survival across nearly all cancer types. They are core fitness genes (ribosomal proteins, RNA polymerase) and likely not selective therapeutic targets.
*Source: DepMap via `DepMap_get_gene_dependencies`*
Output for Report (when using Pharos fallback):
### 1. Gene Essentiality Analysis
**⚠️ Data Source**: Pharos (DepMap CRISPR temporarily unavailable)
**Analysis Method**: Essentiality inferred from TDL (Target Development Level) classification
**Confidence**: Varies by TDL (Tclin=★★★, Tchem=★★☆, Tbio/Tdark=★☆☆)
#### Clinically Validated Targets (Tclin - Likely Essential)
| Gene | TDL | Clinical Status | Inference | Evidence |
|------|-----|----------------|-----------|----------|
| **KRAS** | Tclin | Approved drugs (sotorasib, adagrasib) | Likely essential in KRAS-mutant cancers | ★★★ |
| **EGFR** | Tclin | Multiple approved inhibitors | Likely essential in EGFR-mutant cancers | ★★★ |
**Interpretation**: Tclin targets have approved drugs, indicating clinical validation. These are likely essential in specific contexts (mutation-dependent).
#### Chemical Probe Available (Tchem - Potentially Essential)
| Gene | TDL | Tool Status | Inference | Evidence |
|------|-----|-------------|-----------|----------|
| **CDK2** | Tchem | Chemical probes available | Potentially essential (cell cycle) | ★★☆ |
| **WEE1** | Tchem | Chemical inhibitors available | Potentially essential (DNA damage) | ★★☆ |
**Interpretation**: Tchem targets are druggable with chemical tools. Druggability suggests functional importance.
**Note**: TDL classification is a proxy for essentiality. **For definitive CRISPR dependency scores, DepMap data required.**
*Source: Pharos via `Pharos_get_target` (fallback method)*
#### Selectively Essential Genes (Tissue/Context-Specific)
| Gene | Mean Effect | Essential in | Non-Essential in | Selectivity Score | Evidence |
|------|-------------|--------------|------------------|-------------------|----------|
| **KRAS** | -0.85 | Pancreatic (95%), Lung (78%), Colon (82%) | Breast (12%), Glioma (8%) | High | ★★★ |
| **EGFR** | -0.72 | Lung (85%), Glioblastoma (76%) | Most others (<20%) | High | ★★★ |
| **ESR1** | -0.68 | ER+ Breast (92%) | ER- Breast (5%), Other (<3%) | Very High | ★★★ |
**Interpretation**: Selectively essential genes show strong context-dependency and represent high-value therapeutic targets with potential for tissue-selective toxicity profiles.
*Source: DepMap via `DepMap_get_gene_dependencies`*
#### Non-Essential/Weak Hits (Score > -0.5)
| Gene | Mean Effect | % Essential | Interpretation |
|------|-------------|-------------|----------------|
| **GENE1** | -0.25 | 15% | Weak dependency, potential off-target or passenger |
| **GENE2** | -0.12 | 8% | Non-essential in most contexts |
**Note**: These genes may still be biologically relevant (e.g., synthetic lethal interactions, drug targets for specific contexts) but show weak essentiality in CRISPR screens.
---
**Essentiality Summary**:
- **Pan-cancer essential**: 12 genes (deprioritize for selective targeting)
- **Selectively essential**: 18 genes (HIGH PRIORITY for validation)
- **Weakly essential**: 15 genes (context-dependent, requires further investigation)
*All essentiality data from DepMap Portal (DepMap Public 24Q2 release)*
PATH 2: Pathway & Functional Enrichment
Gene Set Enrichment Analysis
def perform_pathway_enrichment(tu, gene_list):
    """
    Run enrichment analysis across multiple libraries.
    """
    # Enrichr libraries to query
    libraries = [
        "WikiPathways_2024_Human",
        "Reactome_Pathways_2024",
        "MSigDB_Hallmark_2020",
        "GO_Biological_Process_2023",
        "GO_Molecular_Function_2023",
        "GO_Cellular_Component_2023",
        "KEGG_2024_Human"
    ]
    result = tu.tools.enrichr_gene_enrichment_analysis(
        gene_list=gene_list,
        libs=libraries
    )
    # Parse results - Enrichr returns pathway rankings with p-values
    return result
Output for Report:
### 2. Pathway & Functional Enrichment
#### Top Enriched Pathways (p < 0.01, FDR < 0.05)
##### Reactome Pathways
| Pathway | Genes | p-value | FDR | Odds Ratio | Evidence |
|---------|-------|---------|-----|------------|----------|
| **Cell Cycle Checkpoints** | 12/18 | 1.2e-8 | 3.4e-6 | 15.3 | ★★★ |
| **DNA Replication** | 8/18 | 3.5e-6 | 4.2e-4 | 12.1 | ★★★ |
| **G1/S Transition** | 7/18 | 5.1e-5 | 2.1e-3 | 9.8 | ★★☆ |
*Genes in pathway*: CCNE1, CDK2, RB1, E2F1, CDC25A, CDC6, ORC1, MCM2
**Interpretation**: Strong enrichment in cell cycle control pathways suggests the screen identified proliferation-essential genes. These represent core cell cycle machinery.
##### GO Biological Process
| Term | Genes | p-value | FDR | Evidence |
|------|-------|---------|-----|----------|
| **DNA replication initiation** | 6/18 | 2.1e-7 | 1.5e-5 | ★★★ |
| **G1/S transition of mitotic cell cycle** | 8/18 | 8.3e-7 | 3.2e-5 | ★★★ |
| **regulation of cyclin-dependent protein kinase activity** | 5/18 | 1.2e-4 | 8.9e-3 | ★★☆ |
##### MSigDB Hallmark Gene Sets
| Hallmark | Genes | p-value | FDR | Evidence |
|----------|-------|---------|-----|----------|
| **E2F Targets** | 10/18 | 6.7e-10 | 1.2e-8 | ★★★ |
| **G2M Checkpoint** | 9/18 | 3.4e-8 | 2.1e-6 | ★★★ |
| **MYC Targets V1** | 7/18 | 2.1e-5 | 9.8e-4 | ★★☆ |
**Key Finding**: Hits converge on E2F/RB pathway, suggesting screen successfully identified proliferation machinery. This is expected for dropout screens in proliferating cancer cells.
*Source: Enrichr via `enrichr_gene_enrichment_analysis`*
#### No Significant Enrichment
**GO Molecular Function**: No terms pass FDR < 0.05
**KEGG Pathways**: Marginal enrichment (p < 0.05) but does not survive multiple testing correction
**Interpretation**: Gene list may be heterogeneous or represent diverse biological processes. Consider sub-clustering analysis.
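To illustrate what "does not survive multiple testing correction" means, here is a minimal Benjamini-Hochberg sketch in plain Python. Enrichment tools typically report an adjusted p-value already, so this is for understanding rather than a required step. The function names and the example terms are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (FDR) in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_terms(terms, alpha=0.05):
    """terms: list of (name, p_value). Keep terms whose adjusted p-value is below alpha."""
    fdr = benjamini_hochberg([p for _, p in terms])
    return [(name, p, q) for (name, p), q in zip(terms, fdr) if q < alpha]


# Hypothetical terms: term_B has p < 0.05 but its adjusted value is 0.06
hits = significant_terms([("term_A", 0.001), ("term_B", 0.04), ("term_C", 0.5)])
```

In this example `term_B` is the "marginal enrichment" case: nominally significant, but removed once the correction accounts for the number of terms tested.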
PATH 3: Protein-Protein Interaction Networks
Build PPI Network
def build_ppi_network(tu, gene_list):
    """
    Construct protein interaction network for hit genes.
    """
    # Use STRING for comprehensive PPI data
    ppi_result = tu.tools.STRING_get_protein_interactions(
        protein_ids=gene_list,
        species=9606  # Human
    )
    # Also check IntAct for curated interactions
    interactions = []
    for gene in gene_list:
        # Get UniProt ID first (resolve_gene_to_uniprot is a helper not shown here)
        uniprot = resolve_gene_to_uniprot(tu, gene)
        if uniprot:
            intact_result = tu.tools.intact_get_interactions(identifier=uniprot)
            interactions.append(intact_result)
    return {
        'string': ppi_result,
        'intact': interactions
    }

def identify_protein_complexes(ppi_data):
    """
    Identify protein complexes from PPI network.
    Could use complex detection algorithms or query Complex Portal.
    """
    # Implementation for complex detection
    pass
Output for Report:
### 3. Protein Interaction Network Analysis
#### Network Statistics
- **Nodes**: 45 proteins (from 45 input genes)
- **Edges**: 128 interactions (STRING combined score > 0.4)
- **Network Density**: 0.129 (undirected: 2 × 128 / (45 × 44))
- **Average Clustering Coefficient**: 0.45
- **Hub Genes** (>10 interactions): CDK2, RB1, E2F1, CCNE1
**Interpretation**: High clustering coefficient indicates genes are functionally related and form coherent protein complexes.
*Source: STRING via `STRING_get_protein_interactions`*
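The network statistics above can be derived from an edge list with a few lines of plain Python. This sketch assumes an undirected network, where density is 2E / (N(N-1)); for 45 nodes and 128 edges that gives about 0.129, while the directed convention E / (N(N-1)) gives half that value. The function name and the example edges are illustrative, and nodes without any interaction are not counted.

```python
from collections import defaultdict


def network_stats(edges, hub_threshold=10):
    """edges: iterable of (protein_a, protein_b) pairs, treated as undirected."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)  # only proteins with at least one interaction
    e = sum(len(v) for v in neighbors.values()) // 2  # each edge counted twice
    density = 2 * e / (n * (n - 1)) if n > 1 else 0.0
    hubs = sorted(g for g, v in neighbors.items() if len(v) > hub_threshold)
    return {"nodes": n, "edges": e, "density": density, "hubs": hubs}


# Illustrative edges; the duplicate CDK2-CCNE1 pair is collapsed by the sets
stats = network_stats(
    [("CDK2", "CCNE1"), ("CDK2", "RB1"), ("RB1", "E2F1"), ("CCNE1", "CDK2")],
    hub_threshold=1,
)
```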
#### Protein Complexes Identified
| Complex | Members | Function | Essential? |
|---------|---------|----------|------------|
| **MCM Complex** | MCM2, MCM3, MCM4, MCM5, MCM6, MCM7 | DNA replication helicase | Yes (pan-cancer) |
| **Cyclin E-CDK2** | CCNE1, CCNE2, CDK2 | G1/S transition kinase | Yes (selective) |
| **E2F/DP/RB** | E2F1, E2F2, E2F3, RB1, TFDP1 | Transcription regulation | Yes (context-dependent) |
**Key Finding**: Screen hit multiple members of the same essential complexes. This provides validation (independent hits in same pathway) and suggests complex-level vulnerability.
*Source: Complex Portal annotations + STRING clustering*
#### Synthetic Lethal Candidates
Based on PPI network and literature:
| Gene A (Hit) | Gene B (Candidate) | Relationship | Evidence | Source |
|--------------|-------------------|--------------|----------|--------|
| **RB1** | **ARID1A** | Synthetic lethal | ★★☆ | PMID:29534788 |
| **KRAS** | **STK11** | Synthetic lethal | ★★★ | PMID:31010833 |
**Recommendation**: Test synthetic lethal candidates (Gene B) for combination therapy with inhibitors of Gene A.
PATH 4: Druggability & Target Assessment
Assess Drug Target Potential
def assess_druggability(tu, gene_list):
    """
    Evaluate druggability of hit genes.
    """
    drug_targets = []
    for gene in gene_list:
        # Check Pharos for target development level
        pharos = tu.tools.Pharos_get_target(gene=gene)
        # Check DGIdb for existing drugs
        dgidb = tu.tools.DGIdb_get_drug_gene_interactions(genes=[gene])
        # Check Open Targets for chemical probes
        # (resolve_gene_to_ensembl is a helper not shown here)
        probes = None  # reset per gene so a failed ID lookup never reuses old results
        ensembl_id = resolve_gene_to_ensembl(tu, gene)
        if ensembl_id:
            probes = tu.tools.OpenTargets_get_chemical_probes_by_target_ensemblId(
                ensemblId=ensembl_id
            )
        # Check clinical trials
        trials = tu.tools.search_clinical_trials(
            intervention=gene,
            pageSize=20
        )
        drug_targets.append({
            'gene': gene,
            'pharos_tdl': (pharos.get('data') or {}).get('tdl'),
            'existing_drugs': dgidb,
            'chemical_probes': probes,
            'clinical_trials': trials
        })
    return drug_targets
Output for Report:
### 4. Druggability & Clinical Target Assessment
#### Target Development Level Classification (Pharos)
| TDL | Count | Genes | Interpretation |
|-----|-------|-------|----------------|
| **Tclin** | 5 | EGFR, KRAS, CDK2, HDAC1, AURKA | Approved drug targets |
| **Tchem** | 8 | WEE1, PLK1, CHEK1, ... | Chemical matter available, druggable |
| **Tbio** | 12 | E2F1, RB1, ... | Biologically characterized, may need novel modalities |
| **Tdark** | 3 | GENE_X, GENE_Y, GENE_Z | Understudied, limited tool compounds |
**Priority Ranking**: Tclin > Tchem > Tbio for near-term drug development feasibility.
*Source: Pharos/TCRD via `Pharos_get_target`*
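A small sketch of how the TDL summary table could be assembled from per-gene results. The helper name `group_by_tdl` is hypothetical, and the input is assumed to be a simple `{gene: tdl}` mapping, such as the `pharos_tdl` values collected per gene. Genes with a missing or unrecognized TDL are grouped under "Unknown", which is an assumption rather than a Pharos category.

```python
TDL_ORDER = ["Tclin", "Tchem", "Tbio", "Tdark"]  # priority order, most developed first


def group_by_tdl(gene_tdl):
    """gene_tdl: {gene: tdl}. Returns {tdl: [genes]} in priority order, empty groups omitted."""
    groups = {tdl: [] for tdl in TDL_ORDER}
    groups["Unknown"] = []  # catch-all for None or unexpected values (assumed label)
    for gene, tdl in gene_tdl.items():
        groups[tdl if tdl in groups else "Unknown"].append(gene)
    return {tdl: sorted(genes) for tdl, genes in groups.items() if genes}


# Example assignments taken from the illustrative table above
grouped = group_by_tdl({
    "WEE1": "Tchem",
    "EGFR": "Tclin",
    "E2F1": "Tbio",
    "KRAS": "Tclin",
    "GENE_X": None,
})
```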
#### Approved Drugs & Clinical Tools
| Gene | Drug(s) | Status | Indication | Source |
|------|---------|--------|------------|--------|
| **EGFR** | Erlotinib, Gefitinib, Osimertinib | Approved | NSCLC | DGIdb |
| **CDK2** | Dinaciclib | Phase 2 | Hematologic malignancies | ClinicalTrials.gov |
| **AURKA** | Alisertib | Phase 3 | Lymphoma | ClinicalTrials.gov |
| **WEE1** | Adavosertib | Phase 2 | Solid tumors | ClinicalTrials.gov |
**Clinical Readiness**: 5 genes have approved/late-stage drugs. These represent immediate repurposing opportunities.
*Sources: DGIdb via `DGIdb_get_drug_gene_interactions`, ClinicalTrials.gov*
#### Chemical Probes Available
| Gene | Probe | Potency | Selectivity | Use | Source |
|------|-------|---------|-------------|-----|--------|
| **CDK2** | Roscovitine | IC50 ~200nM | Moderate (pan-CDK) | Tool compound | SGC/Open Targets |
| **HDAC1** | SAHA (Vorinostat) | IC50 ~10nM | Pan-HDAC | Approved drug, research tool | ChEMBL |
*Source: Open Targets via `OpenTargets_get_chemical_probes_by_target_ensemblId`*
#### Non-Druggable Hits Requiring Alternative Strategies
| Gene | Challenge | Recommended Approach |
|------|-----------|---------------------|
| **E2F1** | Transcription factor (no catalytic domain) | PROTACs, molecular glue degraders |
| **RB1** | Tumor suppressor (loss-of-function) | Synthetic lethal approach (e.g., CDK4/6i) |
| **MCM2** | Part of large complex, no pockets | Indirect targeting via cell cycle inhibitors |
**Validation Priority**: Focus on Tclin/Tchem hits with existing tool compounds for faster validation.
PATH 5: Disease Association & Clinical Relevance
Cancer Genomics Integration
def assess_clinical_relevance(tu, gene_list, cancer_type):
    """
    Evaluate clinical relevance of hits in target cancer type.
    Note: cancer_type is accepted for context but not used by the queries shown here.
    """
    clinical_data = []
    for gene in gene_list:
        # Reset per gene so a failed ID lookup never reuses the previous gene's results
        diseases = mouse = gtex = None
        # resolve_gene_to_ensembl is a helper not shown here
        ensembl_id = resolve_gene_to_ensembl(tu, gene)
        if ensembl_id:
            # Disease associations
            diseases = tu.tools.OpenTargets_get_diseases_phenotypes_by_target_ensemblId(
                ensemblId=ensembl_id
            )
            # Mouse models
            mouse = tu.tools.OpenTargets_get_biological_mouse_models_by_ensemblId(
                ensemblId=ensembl_id
            )
            # GTEx expression (is it expressed in relevant tissue?)
            gtex = tu.tools.GTEx_get_median_gene_expression(
                gencode_id=ensembl_id,
                operation="median"
            )
        # COSMIC mutations (somatic alterations in cancer), queried by gene symbol
        cosmic = tu.tools.COSMIC_get_gene_mutations(gene=gene)
        clinical_data.append({
            'gene': gene,
            'diseases': diseases,
            'mutations': cosmic,
            'expression': gtex,
            'mouse_models': mouse
        })
    return clinical_data
Output for Report:
### 5. Clinical Relevance & Disease Association
#### Cancer Genomic Alterations (COSMIC)
| Gene | Mutation Frequency | Cancer Types (Top 3) | Alteration Type | Evidence |
|------|-------------------|----------------------|-----------------|----------|
| **KRAS** | 22% across all cancers | Pancreatic (90%), Colon (45%), Lung (32%) | Activating mutations | ★★★ |
| **EGFR** | 8% across all cancers | Lung (15%), Glioma (30%), Breast (2%) | Amplification, mutations | ★★★ |
| **TP53** | 42% across all cancers | Universal | Loss-of-function | ★★★ |
**Interpretation**: A high mutation frequency suggests the gene is a driver in those cancer types. CRISPR essentiality combined with a recurrent genomic alteration gives a strong therapeutic rationale.
*Source: COSMIC via `COSMIC_get_gene_mutations`*
#### Expression in Normal vs Tumor Tissue (GTEx/TCGA)
| Gene | Normal Lung (median TPM) | Lung Tumor (TCGA) | Tumor/Normal Ratio | Therapeutic Window |
|------|-------------------------|-------------------|--------------------|--------------------|
| **EGFR** | 8.5 | 45.3 | 5.3x | Moderate |
| **AURKA** | 2.1 | 18.7 | 8.9x | Good |
| **RPS6** | 125.3 | 132.1 | 1.05x | Poor (housekeeping) |
**Interpretation**: Genes with >3x tumor/normal expression offer better therapeutic window. Housekeeping genes (e.g., ribosomal) show poor selectivity.
*Sources: GTEx via `GTEx_get_median_gene_expression`, TCGA data*
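The tumor/normal ratio in the table above is a simple division of median TPM values. The sketch below reproduces it and flags genes above the 3x threshold from the interpretation. The function name is hypothetical, and the "Good"/"Moderate"/"Poor" labels are not derived here because the text does not define their cutoffs.

```python
def expression_ratio(normal_tpm, tumor_tpm, min_ratio=3.0):
    """Return the tumor/normal ratio and whether it exceeds min_ratio."""
    if normal_tpm <= 0:
        raise ValueError("normal_tpm must be positive")
    ratio = tumor_tpm / normal_tpm
    return {"ratio": round(ratio, 2), "passes_threshold": ratio > min_ratio}


# Values from the example table above
egfr = expression_ratio(8.5, 45.3)
rps6 = expression_ratio(125.3, 132.1)
```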
#### Prognostic/Predictive Biomarker Status
| Gene | Biomarker Type | Cancer | Association | Evidence | Source |
|------|---------------|--------|-------------|----------|--------|
| **KRAS** | Predictive (negative) | Colorectal | KRAS mut → anti-EGFR resistance | ★★★ | FDA label |
| **EGFR** | Predictive (positive) | NSCLC | EGFR mut → TKI response | ★★★ | FDA companion dx |
| **ESR1** | Predictive (positive) | Breast | ESR1 expression → endocrine therapy | ★★★ | Clinical guidelines |
**Clinical Impact**: 3 genes are established biomarkers with FDA-approved tests. Targeting these genes has strong clinical precedent.
PATH 6: Hit Prioritization & Validation Strategy
Integrate All Evidence Dimensions
def calculate_priority_score(gene_data):
    """
    Calculate multi-dimensional priority score.
    Components:
    - Essentiality strength (DepMap score)
    - Selectivity (tissue-specific vs pan-cancer)
    - Druggability (Pharos TDL, existing compounds)
    - Clinical relevance (mutations, biomarkers)
    Validation feasibility (tool compounds available) is not scored separately;
    it is reflected only through the druggability component.
    Returns score 0-100
    """
    score = 0
    # Essentiality (10-30 points; weak hits still receive the 10-point floor)
    if gene_data['depmap_score'] < -1.0:
        score += 30
    elif gene_data['depmap_score'] < -0.5:
        score += 20
    else:
        score += 10
    # Selectivity (0-25 points)
    if gene_data['selective']:  # Tissue-specific
        score += 25
    elif gene_data['pan_cancer']:  # Pan-cancer (deprioritize)
        score += 5
    # Druggability (5-25 points)
    if gene_data['pharos_tdl'] == 'Tclin':
        score += 25
    elif gene_data['pharos_tdl'] == 'Tchem':
        score += 20
    elif gene_data['pharos_tdl'] == 'Tbio':
        score += 10
    else:
        score += 5
    # Clinical relevance (0-20 points); mutation_frequency is a percentage
    if gene_data['mutation_frequency'] > 20:
        score += 10
    if gene_data['biomarker_status']:
        score += 10
    return score
Output for Report:
### 6. Hit Prioritization & Validation Recommendations
#### Top 10 Priority Targets (Multi-Dimensional Scoring)
| Rank | Gene | Essentiality | Selectivity | Druggability | Clinical | Total Score | Recommendation |
|------|------|--------------|-------------|--------------|----------|-------------|----------------|
| 1 | **KRAS** | 30/30 | 25/25 | 20/25 | 20/20 | **95/100** | High priority, validated drugs available |
| 2 | **EGFR** | 28/30 | 24/25 | 25/25 | 18/20 | **95/100** | High priority, approved drugs |
| 3 | **WEE1** | 26/30 | 23/25 | 20/25 | 12/20 | **81/100** | Medium-high, Phase 2 drug available |
| 4 | **AURKA** | 24/30 | 22/25 | 20/25 | 14/20 | **80/100** | Medium-high, tool compounds exist |
| 5 | **CDK2** | 25/30 | 20/25 | 20/25 | 10/20 | **75/100** | Medium, multiple tool compounds |
| 6 | **CHEK1** | 23/30 | 21/25 | 18/25 | 10/20 | **72/100** | Medium, chemical probes available |
| 7 | **PLK1** | 22/30 | 20/25 | 18/25 | 11/20 | **71/100** | Medium, clinical tool compounds |
| 8 | **HDAC1** | 20/30 | 18/25 | 25/25 | 8/20 | **71/100** | Medium, approved HDAC inhibitors |
| 9 | **E2F1** | 24/30 | 22/25 | 10/25 | 12/20 | **68/100** | Medium-low, requires degrader strategy |
| 10 | **MCM2** | 28/30 | 10/25 | 5/25 | 8/20 | **51/100** | Low, pan-cancer essential, not druggable |
**Scoring Rubric**:
- **Essentiality** (30 pts): DepMap gene effect score magnitude
- **Selectivity** (25 pts): Tissue-specific vs pan-cancer dependency
- **Druggability** (25 pts): Pharos TDL, existing compounds, tractability
- **Clinical** (20 pts): Mutation frequency, biomarker status, expression
**Priority Tiers**:
- **Tier 1 (Score >80)**: Immediate validation, existing tools/drugs available
- **Tier 2 (Score 60-80)**: Medium priority, validation feasible with chemical probes
- **Tier 3 (Score <60)**: Lower priority or requires novel approaches (PROTACs, etc.)
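The tier boundaries above translate directly into a small helper. This sketch uses hypothetical function names and assumes a score of exactly 80 falls in Tier 2 and exactly 60 also falls in Tier 2, following the ranges as written. Ties in total score keep their input order.

```python
def assign_tier(total_score):
    """Map a 0-100 priority score to a tier using the boundaries listed above."""
    if total_score > 80:
        return "Tier 1"
    if total_score >= 60:
        return "Tier 2"
    return "Tier 3"


def rank_targets(scores):
    """scores: {gene: total_score}. Highest score first; ties keep input order."""
    ordered = sorted(scores.items(), key=lambda kv: -kv[1])  # sorted() is stable
    return [(rank, gene, score, assign_tier(score))
            for rank, (gene, score) in enumerate(ordered, start=1)]


# Totals taken from the example prioritization table
ranked = rank_targets({
    "KRAS": 95, "EGFR": 95, "WEE1": 81, "AURKA": 80,
    "E2F1": 68, "HDAC1": 71, "MCM2": 51,
})
```

Sorting on the total rather than on hand-assigned ranks avoids ordering mistakes when individual component scores are revised.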
#### Validation Experiment Recommendations
##### Tier 1 Targets (KRAS, EGFR, WEE1)
**1. KRAS**
- **Essentiality**: Strong selective dependency in KRAS-mutant cancers
- **Validation Approach**:
- Test KRAS G12C inhibitor (sotorasib/adagrasib) in KRAS G12C-mutant cell lines from screen
- Orthogonal validation: siRNA/shRNA knockdown
- Rescue experiment: Re-express WT KRAS in KO cells
- **Expected Outcome**: Growth inhibition/cell death in KRAS-mutant lines only
- **Tool Compounds**: Sotorasib (AMG 510), Adagrasib (MRTX849), MRTX1133 (KRAS G12D-selective)
- **Timeline**: 2-3 weeks for cell line validation
**2. EGFR**
- **Essentiality**: Selective in EGFR-mutant/amplified NSCLC, glioblastoma
- **Validation Approach**:
  - Test EGFR TKI panel (erlotinib, osimertinib) in screen cell lines
  - Dose-response curves to establish IC50
  - Combination with standard chemotherapy
- **Expected Outcome**: Potent inhibition in EGFR-altered lines
- **Tool Compounds**: Erlotinib, Gefitinib, Osimertinib (all FDA-approved)
- **Timeline**: 1-2 weeks
**3. WEE1**
- **Essentiality**: Synthetic lethal with TP53 loss, selective in TP53-mutant cancers
- **Validation Approach**:
  - Test adavosertib (WEE1 inhibitor) ± DNA damaging agents
  - Stratify by TP53 status (mutant vs WT)
  - Cell cycle analysis (premature mitotic entry)
- **Expected Outcome**: Selective killing of TP53-mutant cells + synergy with chemo
- **Tool Compounds**: Adavosertib (AZD1775), PD-166285
- **Timeline**: 2-3 weeks
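The dose-response step in the Tier 1 validation approaches (establishing an IC50) can be sketched as follows. This is a deliberately simple log-linear interpolation, not a four-parameter logistic fit, and `estimate_ic50` is a hypothetical helper name. It assumes viability is expressed as a fraction of the untreated control and that all doses are positive.

```python
import math


def estimate_ic50(doses_um, viability):
    """Estimate IC50 (in µM) by log-linear interpolation.

    Finds the first pair of adjacent doses whose viabilities bracket 0.5
    and interpolates on a log10 dose axis. Returns None if viability
    never crosses 50% within the tested range.
    """
    points = sorted(zip(doses_um, viability))
    for (d_lo, v_lo), (d_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 0.5 >= v_hi and v_lo != v_hi:
            # Fraction of the way from the lower dose to the higher dose.
            frac = (v_lo - 0.5) / (v_lo - v_hi)
            log_ic50 = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_ic50
    return None


# Illustrative numbers only, not real assay data.
doses = [0.01, 0.1, 1.0, 10.0]          # µM
viab = [0.95, 0.80, 0.20, 0.05]         # fraction of untreated control
ic50 = estimate_ic50(doses, viab)
print(f"Estimated IC50: {ic50:.3f} µM")
```

For reporting, a proper curve fit with confidence intervals would normally replace this interpolation; the sketch is only meant to show what the IC50 readout represents.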
##### Tier 2 Targets (AURKA, CDK2, CHEK1, PLK1)
**General Strategy**:
- Pharmacological validation with 2-3 selective inhibitors per target
- Orthogonal genetic validation (CRISPRi/shRNA)
- Pathway analysis (Western blots for downstream effectors)
- Combination screens with standard-of-care agents
**Recommended Tool Compounds**:
- **AURKA**: Alisertib (MLN8237), Aurora A Inhibitor I
- **CDK2**: Roscovitine (seliciclib), Dinaciclib
- **CHEK1**: Prexasertib (LY2606368), AZD7762
- **PLK1**: Volasertib (BI 6727), BI 2536
##### Tier 3 Targets - Alternative Validation Strategies
**E2F1, RB1** (Transcription factors):
- **Challenge**: No direct small molecule inhibitors
- **Strategy**:
  - Test PROTACs if available
  - Indirect validation via upstream targets (CDK4/6 inhibitors for RB pathway)
  - Genetic validation only (CRISPRko, CRISPRi)
**MCM2-7 Complex** (Helicase, pan-essential):
- **Challenge**: Pan-cancer essential, poor therapeutic window
- **Strategy**: Deprioritize for drug development
- **Note**: Interesting for understanding replication biology, but not an ideal therapeutic target
#### Validation Timeline
| Phase | Duration | Experiments | Deliverable |
|-------|----------|-------------|-------------|
| **Phase 1** | Weeks 1-3 | Tier 1 target validation (KRAS, EGFR, WEE1) | Dose-response curves, potency data |
| **Phase 2** | Weeks 4-6 | Tier 2 target validation (AURKA, CDK2, etc.) | Hit confirmation, selectivity data |
| **Phase 3** | Weeks 7-10 | Mechanism studies, pathway analysis | Western blots, cell cycle, apoptosis |
| **Phase 4** | Weeks 11-14 | Combination studies, in vivo pilot (top 2-3) | Synergy matrices, xenograft data |
#### Success Criteria for Validation
✅ **Hit Confirmed** if:
- Pharmacological inhibition phenocopies CRISPR knockout (≥50% growth inhibition at ≤1 µM)
- Dose-response curve shows IC50 consistent with essentiality score
- Effect is selective (active in screen cell line, inactive in control lines)
- Orthogonal genetic methods (siRNA, CRISPRi) reproduce phenotype
❌ **Hit Rejected/Deprioritized** if:
- Tool compounds show no effect despite a strong CRISPR score (suggests the CRISPR phenotype may be an off-target effect)
- Pan-cancer essential with no selectivity (poor therapeutic window)
- No druggable domain/strategy (TFs, scaffolds without chemical matter)
- Cannot be validated with available reagents
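The confirmation criteria above can be expressed as a simple checklist function. This is a sketch under stated assumptions: `ValidationResult` and `hit_confirmed` are hypothetical names, growth inhibition is given as a fraction measured at or below 1 µM, and the "IC50 consistent with essentiality score" criterion is left out because the text does not define how consistency is measured.

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    """Hypothetical summary of one target's validation experiments."""
    gene: str
    inhibition_at_1um: float       # fraction growth inhibition at <=1 µM (0-1)
    active_in_screen_line: bool    # compound active in the screen cell line
    active_in_control_lines: bool  # compound also active in control lines
    orthogonal_reproduced: bool    # siRNA or CRISPRi reproduces the phenotype


def hit_confirmed(result: ValidationResult) -> bool:
    """Return True only if potency, selectivity, and orthogonal checks all pass."""
    potent = result.inhibition_at_1um >= 0.5
    selective = result.active_in_screen_line and not result.active_in_control_lines
    return potent and selective and result.orthogonal_reproduced


# Illustrative values only, not experimental results.
example = ValidationResult("WEE1", 0.7, True, False, True)
print(example.gene, "confirmed" if hit_confirmed(example) else "not confirmed")
```

Requiring all three checks together mirrors the list: a potent compound that is equally active in control lines fails on selectivity, which corresponds to the "poor therapeutic window" rejection case.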
#### Resource Requirements
**Reagents**:
- Chemical compounds: $5-10K (tool compounds, commercial inhibitors)
- CRISPRi/shRNA validation: $3-5K (vectors, reagents)
- Cell culture & assays: $8-12K (plates, reagents, media)
**Equipment**:
- Cell culture facility (BSL2)
- Plate readers (viability assays, luminescence)
- Flow cytometer (cell cycle, apoptosis)
- Western blot equipment
**Personnel**: 1 postdoc + 1 technician for 3-4 months
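As a quick arithmetic check, the reagent ranges listed above can be summed to give an overall reagent budget. The dictionary below simply restates those ranges; personnel and equipment costs are not included because the text gives no figures for them.

```python
# Reagent cost ranges in USD (low, high), copied from the reagent list.
REAGENT_COSTS = {
    "chemical compounds": (5_000, 10_000),
    "CRISPRi/shRNA validation": (3_000, 5_000),
    "cell culture & assays": (8_000, 12_000),
}

# Sum the low ends and the high ends separately to get the total range.
low = sum(lo for lo, _ in REAGENT_COSTS.values())
high = sum(hi for _, hi in REAGENT_COSTS.values())
print(f"Estimated reagent budget: ${low:,}-${high:,}")
```

This gives a combined reagent range of $16K-27K.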
---
**Validation Strategy Summary**: Focus validation efforts on Tier 1 targets (KRAS, EGFR, WEE1) with approved or late-stage drugs. These offer the fastest path to clinical translation and the highest probability of success based on multi-dimensional scoring.
Report Template (Initial File)
File: CRISPR_screen_analysis_[CONTEXT].md
# CRISPR Screen Analysis Report: [CONTEXT]
**Generated**: [Date]
**Input**: [Gene list / Cancer type / Single gene]
**Context**: [Screen type, cell line, experimental details]
**Status**: In Progress
---
## Executive Summary
[Analyzing...]
<!-- Will contain: key findings, top hits, recommended priorities -->
---
## Input Validation
[Analyzing...]
<!-- Gene symbol validation, invalid genes, suggestions -->
---
## 1. Gene Essentiality Analysis
### 1.1 Strongly Essential Genes
[Analyzing...]
### 1.2 Selectively Essential Genes
[Analyzing...]
### 1.3 Weakly Essential / Non-Essential
[Analyzing...]
---
## 2. Pathway & Functional Enrichment
### 2.1 Pathway Enrichment (Reactome, KEGG)
[Analyzing...]
### 2.2 GO Enrichment (BP, MF, CC)
[Analyzing...]
### 2.3 Hallmark Gene Sets (MSigDB)
[Analyzing...]
### 2.4 Pathway-Level Interpretation
[Analyzing...]
---
## 3. Protein Interaction Network
### 3.1 Network Statistics
[Analyzing...]
### 3.2 Protein Complexes
[Analyzing...]
### 3.3 Hub Genes
[Analyzing...]
### 3.4 Synthetic Lethal Candidates
[Analyzing...]
---
## 4. Druggability Assessment
### 4.1 Target Development Level (Pharos)
[Analyzing...]
### 4.2 Approved Drugs & Clinical Candidates
[Analyzing...]
### 4.3 Chemical Probes
[Analyzing...]
### 4.4 Non-Druggable Hits (Alternative Strategies)
[Analyzing...]
---
## 5. Clinical Relevance
### 5.1 Cancer Genomic Alterations (COSMIC)
[Analyzing...]
### 5.2 Expression in Tumor vs Normal
[Analyzing...]
### 5.3 Prognostic/Predictive Biomarkers
[Analyzing...]
### 5.4 Mouse Models & Genetic Evidence
[Analyzing...]
---
## 6. Hit Prioritization & Validation
### 6.1 Multi-Dimensional Scoring
[Analyzing...]
### 6.2 Top 10 Priority Targets
[Analyzing...]
### 6.3 Validation Experiment Recommendations
[Analyzing...]
### 6.4 Validation Timeline & Resources
[Analyzing...]
---
## 7. Data Sources & Quality Control
### 7.1 Primary Data Sources
[Will be populated...]
### 7.2 Tool Call Summary
[Will be populated...]
### 7.3 Limitations & Caveats
[Will be populated...]
---
## Appendix: Full Hit List
[Complete gene list with all scores...]
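Because the template is populated progressively, it can help to check which sections still hold a placeholder before the report is finalized. The sketch below assumes the report is available as a markdown string and that sections are marked by `#` headings as in the template; `unresolved_sections` is a hypothetical helper, not a ToolUniverse function.

```python
# Placeholder strings used in the report template.
PLACEHOLDERS = (
    "[Analyzing...]",
    "[Will be populated...]",
    "[Complete gene list with all scores...]",
)


def unresolved_sections(report_text: str) -> list[str]:
    """Return headings whose body still contains a template placeholder."""
    pending = []
    heading = None
    for line in report_text.splitlines():
        if line.startswith("#"):
            # Track the most recent heading, stripped of its '#' markers.
            heading = line.lstrip("#").strip()
        elif heading is not None and any(p in line for p in PLACEHOLDERS):
            if heading not in pending:
                pending.append(heading)
    return pending


report = (
    "## Executive Summary\n"
    "KRAS and EGFR are the top hits.\n"
    "## Input Validation\n"
    "[Analyzing...]\n"
    "### 7.1 Primary Data Sources\n"
    "[Will be populated...]\n"
)
print(unresolved_sections(report))
```

Any heading returned here should either be filled with data or given an explicit "No data" note, in line with the mandatory-completeness principle.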
When NOT to Use This Skill
- Drug target research (single gene) → Use `tooluniverse-target-research` skill instead
- Disease-centric query → Use `tooluniverse-disease-research` skill
- Chemical compound screening → Different workflow needed
- RNA-seq differential expression → Use differential expression analysis workflows
- Single gene lookup → Call DepMap tools directly
Use this skill when you have a gene list from a CRISPR screen and need comprehensive functional interpretation + target prioritization.
Example Queries That Trigger This Skill
✅ Gene List Analysis:
- “Analyze these CRISPR screen hits: EGFR, KRAS, WEE1, PLK1, AURKA, …”
- “I have 50 dropout genes from a CRISPR screen in lung cancer cells, what should I validate?”
- “Prioritize these genes for drug target development: [gene list]”
✅ Cancer Type Query:
- “What are the top essential genes in pancreatic cancer?”
- “Find druggable dependencies in triple-negative breast cancer”
✅ Single Gene Validation:
- “Is KRAS a good therapeutic target for lung cancer?”
- “Assess the druggability of WEE1 as a cancer target”
❌ Not Appropriate:
- “What is the function of EGFR?” → too broad, use target-research skill
- “Find drugs for lung cancer” → disease-centric, use drug-repurposing skill
- “Analyze this RNA-seq data” → different analytical workflow
Key Improvements from Existing Skills
Based on patterns in tooluniverse-target-research and tooluniverse-drug-research:
- Multi-dimensional scoring system (novel for CRISPR analysis)
- Validation experiment recommendations with timelines and reagents
- Tier-based prioritization (Tier 1/2/3 based on actionability)
- Tool compound suggestions for each druggable target
- Synthetic lethal candidate identification from PPI network
- Explicit selectivity analysis (pan-cancer vs tissue-selective)
- Success/failure criteria for validation experiments
Quality Control Checklist
Before finalizing report:
- All input genes validated against DepMap registry
- Essentiality scores retrieved for all valid genes
- Pathway enrichment performed (minimum: GO BP, Reactome, Hallmark)
- PPI network constructed with interaction counts
- Druggability assessed for all hits (Pharos TDL + DGIdb)
- Top 10 priority targets table completed
- Validation recommendations provided for Tier 1 targets
- Evidence grades assigned (★★★, ★★☆, ★☆☆)
- All data sources cited explicitly
- “No data” explicitly stated when tools return empty results
- Executive summary synthesizes all findings