Currently v0.3.1. Engine updates surface here as they ship.
What's changed in the engine's user-visible output, in reverse chronological order. Pre-1.0 versioning convention:
- 0.x.0 — feature additions, output-shape changes
- 0.x.Y — quality fixes, prompt refinements, copy updates
- 1.0.0 — reserved for the first paid Tier 1 customer signing a contract

DCFN-Bio is the newest member of the DCFN portfolio (announced 2026-04-29). It applies the same engine substrate that powers DCFN-Patents and DCFN-Research to bio-literature: PubMed corpus, KEGG / Reactome pathway data, structural gap detection in biomarker / mechanism research. The engine pipeline is in active development; the current state is the Tier 0 try-me scaffolding, with payment + delivery infrastructure live and the substantive engine pipeline pending.
## v0.3.1

Polish pass on top of v0.3.0's first end-to-end engine pipeline run. Four targeted fixes; no engine-substrate changes.
- Jargon scrub — a v0.3.0 memo leaked an engine-internal event svw_001 reference into customer prose. The translation-pass block in decision_memo_synth.py now carries an absolute-prohibition list (no svw_NNN, entropy_node_id, h_001, paperId, pmid:NNNNN, kegg:hsaNNNNN, raw composite=/severity=/confidence= parentheticals, or verbatim use of "entropy node," "golden trajectory," "SVW event," "apriori rule," "bridge node"). Mirrors DCFN-Research v0.5.1's discipline; the orphan-year rule was already in place.
- Pathway fetch parallelized — fetch_pathways_for_terms() now fans out across terms with a ThreadPoolExecutor (max_workers=6, one per term in the typical fan-out). Per-term soft-fail preserved. The v0.3.0 smoke run spent ~167s of its 207s total on pathway latency; expected ~30-40s post-fix.
- ELink retry — _fetch_links() now retries 3× with 1s/2s/4s backoff on transient NCBI errors ("Response ended prematurely," connection resets, 5xx). After 3 attempts the batch returns empty and the main pipeline continues (the citation graph degrades gracefully; CTE/SVW handle empty edge sets).
- Version bump — attribution.py and app.py → 0.3.1. Footer + engine_version surface the new build.

## v0.3.0 — run_l1_pipeline() is real, not a stub

Bio crosses from "substrate-with-stub" to "demoable engine." /analyze now invokes the full DCFN engine substrate against the user's biomarker / mechanism / pathway query, mirroring the proven DCFN-Research orchestration: PubMed corpus + KEGG/Reactome pathway anchoring → QEB encoding → concept graph construction → CTE five-operation traversal → apriori + SVW convergence detection → hypothesis generation → Article + Tech Report + Methodology Audit memo.
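The pipeline chain above can be sketched as a stage-driven orchestrator. Everything here is hypothetical — the real step functions, signatures, and update_session() live in app.py; this only illustrates how a stage field drives the report-page poller:

```python
# Hypothetical sketch: stage names are the ones the poller sees; the
# orchestrator, step registry, and update_session signature are assumptions.
STAGES = [
    "ingestion_pubmed", "ingestion_pathways", "qeb_encoding", "concept_graph",
    "cte_traversal", "apriori", "svw", "hypotheses", "render_reports",
    "methodology_audit_memo", "done",
]

def run_l1_pipeline(session, steps, update_session):
    """Run each step in order, surfacing the stage name before it starts so
    the report page can render progress while the pipeline works."""
    for stage in STAGES[:-1]:
        update_session(session, stage=stage)
        step = steps.get(stage)
        if step is not None:
            step(session)  # each step extends the session's artifacts in place
    update_session(session, stage="done")
    return session
```

A poller only ever needs to read the latest stage value back out of the session to show a progress bar.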
Engine substrate mirrored from DCFN-Research (proven code, no need to reinvent — every module is domain-agnostic except for the config.py parameters, which Bio adopts as-is for v0.3.0 since the embedding / graph / traversal thresholds are calibrated for static literature corpus analysis):
- config.py — copied from Research; QEB / CTE / Apriori / SVW thresholds, embedding dim, temporal-decay constants, golden-token pathfinding weights. PubMed env vars already shared. Bio-specific overrides land in v0.3.x as we measure where bio corpora diverge from the cross-domain corpora Research runs against.
- qeb.py — QECO Module 1: Sentence-BERT (or TF-IDF fallback) encoding with three-signal adaptive confidence and epistemic grounding scoring.
- concept_graph.py — CTE typed concept graph: 16 node types, 14 edge types, temporal decay, domain tagging, OBI canonical-vs-stale resolution, hidden-citation detection, tooling-OBI signals.
- cte_operations.py — CTE five operations (Backward / Forward / Entropy / Branch / Golden Token) plus generate_hypotheses() for the DeepMind Generation gap.
- apriori.py — QECO Module 2: Apriori co-occurrence pattern discovery on semantic attributes.
- svw.py — QECO Module 3 / DeepMind Social Cognition gap: convergence clusters + convergence anchors.
- report_generator.py — Article + Technical Report renderers (markdown + .docx).

New Bio-specific module:
- decision_memo_synth.py — Methodology Audit + Hypothesis Validation Protocol + Lead Compound Recommendation memo synthesizer (Charter §19 prescriptive artifact). Persona: drug-discovery program lead at biotech / pharma R&D ($5M-$50M lead-compound bets). Mirrors DCFN-Research's decision_memo_synth.py structure (PersonaSpec dataclass, formatter helpers, .docx renderer with shared brand palette, soft-fail on Claude unavailability) with an eight-section structure: Executive Finding · Methodology Audit · Hypothesis Validation Protocol · Lead Compound Recommendations · Emerging Consensus · Contested Claims · Recommended Actions Ranked · What the Engine Could Not See. Same prompt-discipline scaffolding as Research v0.5.1 (paired-year-with-paper-title rule, translation-pass for engine jargon, action-led prose paragraphs).

app.py orchestration (run_l1_pipeline()):
- Progress surfacing — update_session() calls carry a stage field so the report-page poller can show progress (ingestion_pubmed → ingestion_pathways → qeb_encoding → concept_graph → cte_traversal → apriori → svw → hypotheses → render_reports → methodology_audit_memo → done).
- Corpus bound — DCFN_BIO_MAX_ARTICLES (default 150), bounded for sampler-tier latency. Citation-graph enrichment via PubMed ELink runs in-line.
- Soft-fails — missing ANTHROPIC_API_KEY, a Claude rate-limit, or a render error: the Article + Tech Report are still produced, the session is marked ready, and memo_skipped_reason carries the explanation.
- Artifacts — pubmed_articles.json, pathways.json, dcfn_report.json, Article (.md + .docx + .txt), Tech Report (.md), and the Methodology Audit memo (.docx + .md sidecar) land in data/sessions/{session_id}/.

Pathway adapter:
- ingest_pathways.py — added fetch_pathways_for_terms(terms, max_per_term=3) — heuristic-term fan-out (full query + bigrams + standalone tokens) with dedupe by pathway_id. Used by the orchestrator to anchor mechanism claims against pathway topology without forcing the caller to do its own NER.

Version: app.py and attribution.py VERSION 0.2.1 → 0.3.0. Engine version exposed in dcfn_report.json meta as 0.3.0-prototype.
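A minimal sketch of the fan-out idea, under stated assumptions: the signature here takes the raw query rather than pre-expanded terms, fetch_one is an injected per-term fetcher, and the ThreadPoolExecutor fan-out with per-term soft-fail is the v0.3.1 refinement described above — none of this is the shipped code:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def expand_terms(query: str) -> list[str]:
    """Heuristic fan-out: full query + bigrams + standalone tokens, deduped."""
    tokens = query.lower().split()
    bigrams = [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]
    seen, terms = set(), []
    for term in [query.lower(), *bigrams, *tokens]:
        if term not in seen:
            seen.add(term)
            terms.append(term)
    return terms

def fetch_pathways_for_terms(query, fetch_one, max_per_term=3, max_workers=6):
    """Fetch pathways per expanded term in parallel, soft-failing any single
    term and deduping results by pathway_id."""
    results, seen_ids = [], set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_one, t) for t in expand_terms(query)]
        for fut in as_completed(futures):
            try:
                hits = fut.result()[:max_per_term]
            except Exception:
                continue  # per-term soft-fail: one bad term never sinks the batch
            for pathway in hits:
                if pathway["pathway_id"] not in seen_ids:
                    seen_ids.add(pathway["pathway_id"])
                    results.append(pathway)
    return results
```

Dedupe by pathway_id matters precisely because the bigram and token terms overlap heavily and tend to resolve to the same pathways.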
End-to-end smoke test (alpha-synuclein neurodegeneration, 60 articles): 207s wall-clock on local CPU (32s PubMed, 167s pathways, 2.5s QEB, sub-second graph + CTE + apriori + SVW, 100s methodology audit Claude call). Generated 5 hypotheses, 1 convergence event, 30 convergence anchors, 58 entropy nodes, 7 pathways, 2500-word memo. No tracebacks; soft-fails behaved correctly when ANTHROPIC_API_KEY was the gate.
## v0.2.1 — tier_manager.py removed (dead code on Cloud Run)

tier_manager.py deleted. The module polled Render's API to swap the service plan from "starter" to "pro" mid-payment. Cloud Run autoscales per service revision config (CPU/memory/GPU all defined statically), so per-plan swapping is N/A.

app.py refactored:

- tier_ready is now always True (no Render API to consult).
- /run Stripe-confirm handler — dropped record_payment + upgrade_to_active; setting the verified-Stripe cookie is now the only persistence layer.
- /tier-status endpoint — short-circuits to {"plan":"pro","ready":true}, mirroring Patents v0.7.4's pattern.
- attribution.VERSION 0.2.0 → 0.2.1.

Mirrors the DCFN-Patents (v0.8.0 → v0.9.0 → v0.6.0) and DCFN-Research (v0.4.0) playbook. Bio's Render URL was never shared publicly, so this is a one-shot cutover (no redirect-branch parallel run). Lays the substrate for Cloud Run deploy + GCS-backed sessions; the engine pipeline (PubMed → KEGG/Reactome → concept graph → CTE → Methodology Audit) is still pending per CHANGELOG v0.1.0.
## v0.2.0

- Dockerfile base swap — python:3.11-slim → pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime. SentenceTransformer auto-detects torch.cuda.is_available() and uses the GPU when present (Cloud Run with --gpu=1 --gpu-type=nvidia-l4 gets ~50-100x speedup on SBERT-heavy steps); it falls back to CPU transparently when no GPU is attached. The v0.2.0 Cloud Run deploy runs CPU-only (the engine pipeline doesn't yet make SBERT load-bearing); GPU attachment follows when concept-graph + CTE traversal land. COPY . . replaces selective copies so new modules land automatically.
- session_storage.py — LocalSessionStorage / GCSSessionStorage adapter behind a single session_store singleton. Backend selected via DCFN_SESSION_BACKEND={local,gcs}; gcs requires DCFN_SESSION_BUCKET. Defaults to local (no behavior change for local dev). Cloud Run sets DCFN_SESSION_BACKEND=gcs DCFN_SESSION_BUCKET=lef-ai-dcfn-bio-sessions.
- tier_config.py — per-tier engine knob overrides (Tier 0 / 1 / 2). The Tier 0 baseline preserves existing scaffolding behavior (pubmed_max_results=200 matches ingest_pubmed.DEFAULT_MAX_RESULTS). Tier 1 / 2 cap values are PLACEHOLDERS until the engine pipeline lands and we know which knobs are actually load-bearing for tier differentiation. Resolution: DCFN_TIER env → session_state["tier"] → dcfn_tier cookie → fallback tier_0.
- attribution.py — single source of truth for footer attribution: BUILD_NAME, VERSION, LLC_NAME, LLC_DBA, NV_BUSINESS_LICENSE, PATENT_ATTRIBUTIONS, plus render functions. The patent attribution list matches the v0.1.1 site footer copy (CTE / QECO / Consolidated Supplemental / Tesseract Composition).
- compute_portfolio_topic.py — tiny Claude call at the end of L1 that infers a 1-3 word biotech topic label (e.g. "CRISPR Off-Target", "p53 Pathway") from pubmed_articles.json + pathways.json for use in Drive filenames. Mirrors Patents' compute_portfolio_domain.py.
- requirements.txt — added google-cloud-storage>=2.18,<3.0 for GCSSessionStorage (lazy import; LocalSessionStorage doesn't depend on it).
- .github/workflows/docker-publish.yml — WIF-authenticated GitHub Actions workflow that builds the Bio image and pushes to BOTH GHCR (ghcr.io/syntaricodex/lef-dcfn-bio) and Google Artifact Registry (us-central1-docker.pkg.dev/lef-ai/dcfn-bio/dcfn-bio) on every push to main + every v*.*.* tag.

Bio doesn't yet ship a prescriptive deliverable: the engine pipeline that would generate one (Methodology Audit synth, "Hypothesis Validation Protocol," or "Lead Compound Recommendation Memo") isn't built. Bio cannot be marketed at customer-facing volume until that artifact exists per Charter §19. v0.2.0 closes the infra gap; the artifact gap is the next blocker.
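tier_config.py's resolution chain could be sketched as follows — the function name, argument shapes, and VALID_TIERS set are assumptions for illustration, not the shipped module:

```python
import os

VALID_TIERS = {"tier_0", "tier_1", "tier_2"}  # assumed tier identifiers

def resolve_tier(session_state=None, cookies=None, environ=None):
    """Resolution order described above:
    DCFN_TIER env -> session_state['tier'] -> dcfn_tier cookie -> tier_0."""
    environ = os.environ if environ is None else environ
    candidates = (
        environ.get("DCFN_TIER"),
        (session_state or {}).get("tier"),
        (cookies or {}).get("dcfn_tier"),
    )
    for tier in candidates:
        if tier in VALID_TIERS:  # unknown values fall through to the next source
            return tier
    return "tier_0"
```

The env var winning over session and cookie lets an operator pin a deployment's tier regardless of what individual sessions carry.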
Deferred wiring: attribution.py, tier_config.py, session_storage.py, and compute_portfolio_topic.py exist, but app.py still uses local data/sessions/ directly. Wiring follows when the engine pipeline lands. accounts.py / auth_session.py / usage_tracker.py are Phase 5+ work, gated on the engine pipeline shipping.

## v0.1.1

The v0.1.0 ship had no patent attribution in the footer — a gap caught during a portfolio-wide audit. Bio rides the same substrate as Research and Patents (CTE + QECO + Consolidated Supplemental). The footer now matches the unified attribution shape used by the other DCFN-deployed sites:
"Built on the LEF Ai engine — patented CTE cognitive traversal (App. No. 64/002,205), QECO optimization (App. No. 63/993,979), and structural-discovery substrate (Consolidated Supplemental, App. No. 64/043,294) · 8 U.S. Patents Pending"
Same correction applied to the Firebase brand site's DCFN-Bio card.
## v0.1.0

Payments — an $85 / 30-day Pro tier access window unlocks unlimited Methodology Audit runs during the window. Stripe checkout via verify-on-demand callback (no webhooks). Render API integration auto-upgrades to Pro tier on payment, with a tier-status polling banner during the 2-5 min container rebuild.
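Verify-on-demand means the app re-fetches the Checkout Session from Stripe at request time and decides from its fields, rather than trusting a webhook delivery. A sketch of that decision — the predicate name is hypothetical; the field names and values are the ones Stripe documents on the Checkout Session object (fetched via stripe.checkout.Session.retrieve):

```python
def is_paid_checkout(checkout_session: dict) -> bool:
    """Decide whether to grant the Pro window, given a Checkout Session
    payload re-fetched on demand from Stripe's API."""
    # A completed, paid Checkout Session carries status == "complete" and
    # payment_status == "paid" per Stripe's Checkout Session object.
    return (
        checkout_session.get("status") == "complete"
        and checkout_session.get("payment_status") == "paid"
    )
```

Checking both fields guards against sessions that completed without payment (e.g. payment_status == "no_payment_required" flows, which this sketch deliberately rejects).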
Domain-Wide Delegation Drive publishing — generated artifacts upload to a dedicated Drive folder under the configured Workspace user's storage. No service-account quota burden.
PubMed E-utilities adapter (ingest_pubmed.py) — per-session PubMed ingest with citation graph enrichment. Pulls papers + abstracts + MeSH terms; uses ELink to populate references and citations PMID lists for each paper, supporting load-bearing-foundation detection in the Methodology Audit. Schema matches DCFN-Research's adapter so any shared downstream module reads it unchanged.
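The ELink half of the enrichment could be sketched as a request-builder plus a payload-flattener. The endpoint and the two linknames are from NCBI's E-utilities documentation; the helper names, and the retmode=json payload shape (linksets → linksetdbs → links) as I recall it from those docs, are assumptions to verify against a live response:

```python
ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

def elink_params(pmid: str, linkname: str) -> dict:
    """Build ELink query params for one PMID.
    linkname "pubmed_pubmed_refs" -> PMIDs this paper references;
    linkname "pubmed_pubmed_citedin" -> PMIDs citing this paper."""
    return {"dbfrom": "pubmed", "db": "pubmed", "id": pmid,
            "linkname": linkname, "retmode": "json"}

def parse_elink_pmids(payload: dict) -> list[str]:
    """Flatten an ELink JSON payload into a list of linked PMIDs."""
    pmids: list[str] = []
    for linkset in payload.get("linksets", []):
        for linkdb in linkset.get("linksetdbs", []):
            pmids.extend(str(p) for p in linkdb.get("links", []))
    return pmids
```

One PMID per request keeps the reference/citation lists attributable to their source paper; ELink merges links when multiple IDs are batched in a single id parameter.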
KEGG + Reactome pathway adapter (ingest_pathways.py) — pulls validated pathway topology (gene/protein members, pathway labels, source URLs) from both sources. Anchors mechanism evidence quality grading to actual pathway structure rather than prose claims.
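The KEGG side of the adapter might parse the REST find output like this — the tab-separated "id<TAB>label" line format is per the KEGG REST docs; the URL template, "path:" prefix handling, and source_url shape are illustrative assumptions:

```python
KEGG_FIND_URL = "https://rest.kegg.jp/find/pathway/{query}"

def parse_kegg_find(body: str) -> list[dict]:
    """Parse KEGG REST /find/pathway output: one tab-separated
    '<entry>\\t<label>' line per hit."""
    pathways = []
    for line in body.strip().splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        entry, label = line.split("\t", 1)
        pid = entry.removeprefix("path:")  # tolerate prefixed or bare IDs
        pathways.append({
            "pathway_id": pid,
            "label": label,
            "source_url": f"https://www.kegg.jp/pathway/{pid}",
        })
    return pathways
```

Keeping the parser separate from the HTTP call makes the adapter testable offline and lets the Reactome side reuse the same record shape.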
/analyze returns a placeholder generating page; the substantive engine pipeline (concept graph → CTE traversal → Methodology Audit synthesis in biotech-regulatory vocabulary) is in active development. Phase 0 fruit-test target: 2026-05-11 against the p53 response pathway.

Not a clinical tool. Public + research-grade data only. No identified patient samples. No diagnostic or prognostic claims. The engine's job is gap-surfacing for labs, funders, and regulators — not direct care.
Sampler scope by design. Each Methodology Audit produces a structural read of the user's specific biomarker / mechanism query — not an exhaustive landscape. The cap is a product choice, not a capacity limit.