A technical deep dive · Applied R&D, 2024–2025

Insight Engine.

A production-oriented KG-RAG prototype for messy organizational and community data. Hybrid Graph + RAG retrieval over Neo4j and pgvector, with citation-grounded answer synthesis and an evidence chain back to source for every claim it makes.

By Sam Fath · Vancouver, BC · ~18 min read

I · Origin

I built this because I lived the problem.

During a major organizational transition at UBC, I became the “Human RAG System.” I spent months manually retrieving answers from 500+ scattered documents to onboard 350 staff. I realized that institutional memory is fragile, and when it walks out the door, the cost is catastrophic.

From the field notes · UBC, 2023

Generic AI defaults to the wisdom of crowds. Useful for general knowledge, problematic for organizations whose competitive advantage is their specific, hard-won expertise. Ask a generic chatbot how your team handles X and it will tell you what the internet thinks, not what your organization actually knows.

Traditional search isn’t an alternative either. It requires knowing what to look for. When institutional knowledge is fragmented across hundreds of documents, even experienced staff struggle to find context quickly, and new hires have no starting point at all. The Insight Engine is the architectural answer to the problem I watched go unanswered: how do you build AI that preserves what an organization actually knows, traceable to source for every claim?

II · The architecture

KG-RAG with Schema-on-Edge.

Vector RAG retrieves chunks that look similar to a query. That works for paraphrase. It fails for questions like “how does X depend on Y?”, where the answer requires walking a chain of relationships, not pattern-matching on tokens.

The Insight Engine combines vector retrieval (pgvector, 768-dim Gemini embeddings) with graph traversal (Neo4j) over an automatically extracted knowledge graph. But the architectural choice that matters most is Schema-on-Edge: preserving rich evidence at the relationship level rather than collapsing it into a generic edge type.

Figure i · An edge in the knowledge graph
{
  source: "concept_uuid",
  target: "concept_uuid",
  label: "REQUESTS",
  properties: {
    verbatim: "Users have been asking for...",
    emotion: "frustration",
    confidence: 0.87,
    source_chunk_id: "chunk_uuid",
    intelligence_type: "DEMAND_SIGNAL"
  }
}

The difference between “X is related to Y” and “X CAUSES Y, per chunk_837, at confidence 0.87, verbatim ‘directly led to’” is the difference between guessing and verifying. Every answer the system gives traces back to a specific person or document making a specific claim.
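To make the “guessing versus verifying” contrast concrete, here is a minimal Python sketch of the Figure i edge as typed data, with a helper that renders it as a checkable claim. The class names and the cite() helper are hypothetical illustrations; only the field names and the sample values are taken from Figure i.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeEvidence:
    """Evidence stored on the edge itself (field names follow Figure i)."""
    verbatim: str
    emotion: str
    confidence: float
    source_chunk_id: str
    intelligence_type: str


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    label: str
    properties: EdgeEvidence


def cite(edge: Edge) -> str:
    """Render an edge as a verifiable claim rather than a bare association."""
    p = edge.properties
    return (f"{edge.source} {edge.label} {edge.target} per {p.source_chunk_id} "
            f"at confidence {p.confidence:.2f}, verbatim “{p.verbatim}”")


edge = Edge("Users", "Feature X", "REQUESTS",
            EdgeEvidence("Users have been asking for...", "frustration",
                         0.87, "chunk_uuid", "DEMAND_SIGNAL"))
print(cite(edge))
# Users REQUESTS Feature X per chunk_uuid at confidence 0.87, verbatim “Users have been asking for...”
```

Because the evidence travels with the edge, any traversal that touches this edge can emit the citation without a second lookup.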

The bet behind the whole architecture: solve KG-RAG once, then re-aim it at different corpora. The same architecture later powered Headwater’s community intelligence pipeline. Same ingestion, same Schema-on-Edge graph, same citation discipline, applied to a different corpus.

III · The methodological pattern

The pancake method.

Layered knowledge synthesis: top-down inference at volume, stacked with bottom-up knowledge-graph construction. Each direction catches what the other misses.

Bottom-up alone produces strong citations but loses pattern legibility. Top-down alone produces patterns ungrounded in the source data.

Stacked, they enable citation-backed synthesis with both pattern coherence and source-level traceability. A quantitative scaffold is computed first, and it constrains the narrative layer so that it stays grounded in the data.

The stack
Top, bucketed inference
Large-scale pattern extraction at volume.
↓ informs · ↑ grounds
Quantitative scaffold
Computed metrics constrain the narrative layer.
↑ feeds · ↓ verifies
Bottom, knowledge graph
Entity-and-relationship extraction with citation paths preserved.
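One way to picture the middle layer of the stack is a Python sketch in which metrics computed from the bottom-layer graph gate the claims made by the top-down layer. This is an illustration under stated assumptions, not the system's implementation: the claim format (type, share, cites), the 5% tolerance, and the PAIN_POINT type are all hypothetical.

```python
from collections import Counter


def build_scaffold(edges: list[dict]) -> dict[str, float]:
    """Quantitative scaffold: share of graph edges per intelligence type."""
    counts = Counter(e["intelligence_type"] for e in edges)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}


def validate_claim(claim: dict, scaffold: dict[str, float], edges: list[dict],
                   tolerance: float = 0.05) -> bool:
    """Accept a top-down narrative claim only if (a) the scaffold supports the
    share it asserts and (b) it cites at least one chunk behind an edge of that type."""
    share_ok = abs(scaffold.get(claim["type"], 0.0) - claim["share"]) <= tolerance
    supporting = {e["source_chunk_id"] for e in edges
                  if e["intelligence_type"] == claim["type"]}
    return share_ok and bool(supporting & set(claim["cites"]))


edges = [
    {"intelligence_type": "DEMAND_SIGNAL", "source_chunk_id": "c1"},
    {"intelligence_type": "DEMAND_SIGNAL", "source_chunk_id": "c2"},
    {"intelligence_type": "DEMAND_SIGNAL", "source_chunk_id": "c3"},
    {"intelligence_type": "PAIN_POINT", "source_chunk_id": "c4"},  # hypothetical type
]
scaffold = build_scaffold(edges)
print(validate_claim({"type": "DEMAND_SIGNAL", "share": 0.75, "cites": ["c2"]}, scaffold, edges))  # True
print(validate_claim({"type": "DEMAND_SIGNAL", "share": 0.5, "cites": ["c2"]}, scaffold, edges))   # False
```

The second claim fails because its stated share disagrees with the computed scaffold, even though it cites a real chunk.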
IV · The pipeline

From document to answer, with citations all the way down.

Documents enter the system and flow through a queue-based pipeline of ten worker services. Each stage is a separate process; failures don’t cascade; intermediate state is observable. The output is a unified knowledge layer that powers Q&A, exploration, and analytics.
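The failure-isolation property can be sketched in a single Python process, assuming per-stage queues and a dead-letter list. Here collections.deque stands in for the Redis-backed queues, and the stage functions, job shape, and run_pipeline name are hypothetical. The point is only that one bad job is parked without halting other jobs, while per-job progress stays inspectable.

```python
from collections import deque
from typing import Callable

Stage = tuple[str, Callable[[dict], dict]]


def run_pipeline(jobs: list[dict], stages: list[Stage]):
    """Move jobs through named stages via per-stage queues.

    A failing job goes to a dead-letter list instead of stopping the run,
    and progress records the last stage each job completed."""
    queues = {name: deque() for name, _ in stages}
    queues[stages[0][0]].extend(jobs)
    progress: dict[str, str] = {}
    dead_letter: list[tuple[str, str, str]] = []
    finished: list[dict] = []
    for i, (name, fn) in enumerate(stages):
        while queues[name]:
            job = queues[name].popleft()
            try:
                job = fn(job)
            except Exception as exc:
                dead_letter.append((job["id"], name, repr(exc)))
                continue
            progress[job["id"]] = name
            if i + 1 < len(stages):
                queues[stages[i + 1][0]].append(job)
            else:
                finished.append(job)
    return finished, progress, dead_letter


def chunk(job: dict) -> dict:
    if not job["text"]:
        raise ValueError("empty document")
    return {**job, "chunks": job["text"].split(". ")}


def classify(job: dict) -> dict:
    return {**job, "labels": ["CONCEPT"] * len(job["chunks"])}


finished, progress, dead_letter = run_pipeline(
    [{"id": "a", "text": "One. Two"}, {"id": "b", "text": ""}],
    [("chunk", chunk), ("classify", classify)],
)
print(progress, dead_letter)
```

In the real system each stage is its own worker process, so the same isolation also holds for crashes, not just exceptions.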

Figure ii · The Information Factory: how raw documents become a Schema-on-Edge knowledge graph

Raw input: PDFs, comments, transcripts, CSV / JSON (unsorted, unstructured).
Stage 1 · Chunk: sentence-aware splits via tiktoken cl100k_base, plus token counts and a chunk hash.
Stage 2 · Classify: parallel processors covering 9 entity types (Concept, Actor …), 12+ relationship types, a 100+ emotion taxonomy, 6 moral foundations (MFT), sentiment / valence, and linguistic biomarkers, giving rich classification per chunk.
Stage 3 · Resolve: canonical forms, ontology rules, and duplicate merging, giving canonical IDs and deduplication.
Stage 4 · Connect: cross-document bridges and transitive validation, giving inferred edges with confidence scores.
Result · Schema-on-Edge graph: every edge carries the evidence that justified it, so every traversal is auditable. Example: “Users” (ACTOR) REQUESTS “Feature X” (CONCEPT), with edge metadata attached at extraction: verbatim “Users have been asking for…”, emotion frustration, confidence 0.87, source chunk_uuid_837, type DEMAND_SIGNAL.
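Stage 1 can be sketched as greedy sentence-aware chunking in Python. To keep the example dependency-free, count_tokens is a whitespace stand-in. With the tiktoken package installed, the real count would be len(tiktoken.get_encoding("cl100k_base").encode(text)). The max_tokens budget and the regex sentence splitter are assumptions.

```python
import hashlib
import re


def count_tokens(text: str) -> int:
    # Stand-in for tiktoken's cl100k_base token count.
    return len(text.split())


def chunk_document(text: str, max_tokens: int = 200) -> list[dict]:
    """Greedy sentence-aware chunking: sentences are never split, so a single
    sentence longer than max_tokens becomes its own oversized chunk."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return [
        {
            "text": c,
            "token_count": count_tokens(c),
            # Content hash: identical chunks get identical IDs, which helps dedup.
            "chunk_hash": hashlib.sha256(c.encode("utf-8")).hexdigest(),
        }
        for c in chunks
    ]


out = chunk_document("Alpha beta gamma. Delta epsilon. Zeta eta theta iota.", max_tokens=5)
print([c["token_count"] for c in out])  # [5, 4]
```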
V · The interfaces

Four ways to interrogate the corpus.

Real screens from the system. Each addresses a different reading mode: structural exploration, direct Q&A, deliberative synthesis, and high-level cluster topography.

Figure iii · The Knowledge Graph Explorer with a GitLab-onboarding corpus loaded. Central node selected, 60 direct neighbours visualised, thematic clusters listed at left.

Interface 1 of 4 · Structural exploration

Knowledge Graph Explorer.

Interactive force-directed graph of concepts, documents, and inferred bridge edges. Filter by node type, relationship type, source document, confidence threshold, or minimum degree. Click a node to see its AI-generated summary and direct neighbours.
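The explorer's filter panel maps onto a small amount of logic. The sketch below is Python, while the real filtering runs in JavaScript inside a Web Worker. The dictionary shapes, including the hypothetical doc field on edges, are assumptions. Degree is computed once over the filtered edge set rather than iterated to a fixed point.

```python
from collections import Counter


def filter_graph(nodes, edges, *, node_types=None, rel_types=None,
                 source_docs=None, min_confidence=0.0, min_degree=0):
    """Apply explorer-style filters; None means no filter on that field."""
    allowed = {n["id"] for n in nodes if node_types is None or n["type"] in node_types}
    kept = [
        e for e in edges
        if e["confidence"] >= min_confidence
        and (rel_types is None or e["label"] in rel_types)
        and (source_docs is None or e["doc"] in source_docs)
        and e["source"] in allowed and e["target"] in allowed
    ]
    # Degree is measured on the filtered edge set (single pass, not iterated).
    degree = Counter()
    for e in kept:
        degree[e["source"]] += 1
        degree[e["target"]] += 1
    visible = {n for n in allowed if degree[n] >= min_degree}
    kept = [e for e in kept if e["source"] in visible and e["target"] in visible]
    return visible, kept


nodes = [{"id": "a", "type": "CONCEPT"}, {"id": "b", "type": "CONCEPT"},
         {"id": "c", "type": "ACTOR"}, {"id": "d", "type": "CONCEPT"}]
edges = [
    {"source": "a", "target": "b", "label": "REQUESTS", "doc": "doc1", "confidence": 0.9},
    {"source": "a", "target": "c", "label": "REQUESTS", "doc": "doc1", "confidence": 0.8},
    {"source": "b", "target": "d", "label": "CAUSES", "doc": "doc2", "confidence": 0.3},
]
visible, kept = filter_graph(nodes, edges, node_types={"CONCEPT"},
                             min_confidence=0.5, min_degree=1)
print(sorted(visible), len(kept))  # ['a', 'b'] 1
```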

Heavy filtering computation runs in a Web Worker so the canvas stays responsive even at 100K+ nodes. List virtualization (react-window) keeps search results across the full node set fast to render. Thematic clusters in the sidebar come from UMAP projection plus agglomerative clustering, with cluster labels synthesized by the unification worker.

Figure iv · The Q&A interface in Hybrid mode. Numbered citations point to specific source chunks; per-query cost shown explicitly.

Interface 2 of 4 · Direct Q&A

Hybrid Graph + RAG Q&A.

Server-Sent Events stream tokens, citations, and cost data as the answer is built. The hybrid mode runs vector retrieval over pgvector and graph traversal over Neo4j in parallel, then synthesizes a citation-grounded response.
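Here is a minimal Python sketch of the parallel hybrid step, followed by SSE framing of the resulting citations. The two search functions are hypothetical stubs standing in for the pgvector query and the Neo4j traversal. Fusing by maximum score is an assumed choice, and reciprocal rank fusion would be a common alternative. The sse_frame output follows the standard Server-Sent Events wire format: an event line, a data line, then a blank line.

```python
import asyncio
import json


async def vector_search(query: str) -> list[dict]:
    """Hypothetical stand-in for a pgvector similarity query."""
    await asyncio.sleep(0)
    return [{"chunk_id": "c1", "score": 0.91}, {"chunk_id": "c2", "score": 0.84}]


async def graph_search(query: str) -> list[dict]:
    """Hypothetical stand-in for a Neo4j traversal that returns evidence chunks."""
    await asyncio.sleep(0)
    return [{"chunk_id": "c2", "score": 0.88}, {"chunk_id": "c7", "score": 0.80}]


async def hybrid_retrieve(query: str, k: int = 4) -> list[dict]:
    """Run both retrieval paths concurrently, merge by chunk_id, number citations."""
    vec_hits, graph_hits = await asyncio.gather(vector_search(query), graph_search(query))
    merged: dict[str, dict] = {}
    for path, hits in (("vector", vec_hits), ("graph", graph_hits)):
        for hit in hits:
            entry = merged.setdefault(
                hit["chunk_id"], {"chunk_id": hit["chunk_id"], "score": 0.0, "paths": set()}
            )
            entry["score"] = max(entry["score"], hit["score"])  # max-score fusion (assumption)
            entry["paths"].add(path)
    ranked = sorted(merged.values(), key=lambda h: h["score"], reverse=True)[:k]
    for number, hit in enumerate(ranked, start=1):
        hit["citation"] = number
    return ranked


def sse_frame(event: str, data: dict) -> str:
    """Format one Server-Sent Events message with a JSON payload."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


results = asyncio.run(hybrid_retrieve("What access steps do I have?"))
for r in results:
    print(sse_frame("citation", {"n": r["citation"], "chunk_id": r["chunk_id"]}), end="")
```

Chunk c2 is found by both paths and keeps its best score. Recording which path surfaced each chunk makes the retrieval itself auditable.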

A worked example: “If I’m joining the @gl-database group, what access steps and responsibilities do I have, and where are they documented?” The system pulls four sources across two retrieval paths, synthesizes a stepwise answer, and tags each claim with a numbered citation pointing back to source chunks.

Every answer is auditable. No claim appears that doesn’t trace to a chunk_id you can open and read.
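The “no untraceable claim” rule can be expressed as a post-generation check. The Python sketch below assumes citations appear as [n] markers and that the answer arrives with a map from citation number to chunk_id. Both are hypothetical conventions, and the sample answer text is invented. The check flags sentences with no marker and markers that resolve to no chunk.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def audit_answer(answer: str, citations: dict[int, str]) -> list[str]:
    """Flag sentences with no [n] marker and markers that resolve to no chunk_id."""
    problems = []
    for sentence in filter(None, (s.strip() for s in SENTENCE_END.split(answer))):
        numbers = [int(n) for n in CITATION.findall(sentence)]
        if not numbers:
            problems.append(f"uncited: {sentence}")
        problems.extend(f"dangling [{n}]: {sentence}" for n in numbers if n not in citations)
    return problems


citations = {1: "chunk_a", 2: "chunk_b"}  # hypothetical citation -> chunk_id map
answer = "Request database access [1]. Complete the listed training [2][3]. Ask your manager."
print(audit_answer(answer, citations))
```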

Figure v · The Multi-Agent Roundtable. Role-bounded agents on different base models deliberate; the Compiler produces a final synthesis paired with an explicit uncertainty map.

Interface 3 of 4 · Deliberative synthesis (research mode)

Multi-Agent Strategic Roundtable.

A research interface that runs role-bounded agents (Skeptic, Synthesiser, Compiler) on different base models (Claude, GPT-4o, Gemini 2.5 Pro) against a single question. Configurable iterations, creativity, role-bounding strength, and compiler length.

Output: a final synthesis paired with an explicit uncertainty map listing the disagreements between agents. The point isn’t agent consensus. It’s making the disagreement legible. Useful for technical and product trade-off analysis where the question deserves more than a single model’s first answer.
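As one way to picture the uncertainty map, the Python sketch below has each agent report a stance per claim and keeps only the claims where stances diverge. The stance representation and the claim labels are hypothetical. The agent names come from the text.

```python
from collections import defaultdict


def uncertainty_map(positions: dict[str, dict[str, str]]) -> dict[str, dict[str, list[str]]]:
    """positions: agent -> {claim: stance}. Return only contested claims,
    grouped as stance -> agents holding that stance."""
    by_claim: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for agent, stances in positions.items():
        for claim, stance in stances.items():
            by_claim[claim][stance].append(agent)
    return {
        claim: {stance: sorted(agents) for stance, agents in stances.items()}
        for claim, stances in by_claim.items()
        if len(stances) > 1  # agreement is dropped; disagreement is the output
    }


positions = {
    "Skeptic": {"claim_a": "disagree", "claim_b": "agree"},
    "Synthesiser": {"claim_a": "agree", "claim_b": "agree"},
}
print(uncertainty_map(positions))
# {'claim_a': {'disagree': ['Skeptic'], 'agree': ['Synthesiser']}}
```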

Figure vi · The Conceptual Manifold. Thematic clusters projected into 2D, bridge nodes highlighted, the shape of the corpus made legible.

Interface 4 of 4 · High-level topography

Conceptual Manifold.

Embedding-space projection of the entire corpus into 2D, with concepts coloured by thematic cluster and bridge nodes highlighted. Built from SuperVectors (768-dim semantic + 128-dim Node2Vec structural), reduced via UMAP with adaptive parameters per corpus size.
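A SuperVector concatenates the two representations. The Python sketch below normalizes each part before concatenating so the 768-dim block does not dominate distances by sheer size. That normalization and the structural_weight knob are assumptions, not documented behaviour. The resulting 896-dim matrix would then go to a 2D projection, for example umap.UMAP(n_components=2).fit_transform(X) from the umap-learn package.

```python
import math

SEMANTIC_DIM, STRUCTURAL_DIM = 768, 128


def l2_normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def super_vector(semantic: list[float], structural: list[float],
                 structural_weight: float = 1.0) -> list[float]:
    """Concatenate a 768-dim semantic embedding with a 128-dim Node2Vec embedding,
    L2-normalizing each part first (per-part normalization is an assumption)."""
    if len(semantic) != SEMANTIC_DIM or len(structural) != STRUCTURAL_DIM:
        raise ValueError("expected 768 semantic dims and 128 structural dims")
    return l2_normalize(semantic) + [structural_weight * x for x in l2_normalize(structural)]


sv = super_vector([0.5] * 768, [2.0] * 128)
print(len(sv))  # 896
```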

The manifold answers a different question than the graph explorer. The graph shows you specific connections; the manifold shows you the shape of the corpus: which themes are dense, which are sparse, where the bridges are, what’s truly orthogonal.

VI · The discovery layer

From search engine to research partner.

After ingestion, a dedicated discovery-worker runs cross-document inference: finding semantic bridges between concepts that appear in unrelated documents, calculating similarity, identifying merge candidates, and building a corpus-level connection graph. A separate validation-worker then runs LLM verification on inferred edges and assigns confidence scores.
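The discovery pass can be sketched in Python as pairwise cosine similarity with two thresholds. Near-identical concepts become merge candidates. Similar concepts from different documents become bridge candidates, which the validation-worker would then check. Both threshold values are hypothetical. The all-pairs loop is O(n²) and only illustrative; at corpus scale an approximate-nearest-neighbour query (for example through pgvector) would replace it.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def discover(concepts, bridge_threshold=0.75, merge_threshold=0.95):
    """concepts: list of (concept_id, doc_id, embedding).
    Near-identical pairs become merge candidates; similar pairs from *different*
    documents become bridge candidates for later LLM validation."""
    bridges, merges = [], []
    for (id_a, doc_a, vec_a), (id_b, doc_b, vec_b) in combinations(concepts, 2):
        sim = round(cosine(vec_a, vec_b), 3)
        if sim >= merge_threshold:
            merges.append((id_a, id_b, sim))
        elif doc_a != doc_b and sim >= bridge_threshold:
            bridges.append((id_a, id_b, sim))
    return bridges, merges


concepts = [("a", "doc1", [1.0, 0.0]), ("b", "doc2", [1.0, 0.0]),
            ("c", "doc3", [0.8, 0.6]), ("d", "doc1", [0.0, 1.0])]
bridges, merges = discover(concepts)
print(bridges, merges)
```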

At query time, the system traverses these inferred bridges automatically. A question that touches a concept in Document A pulls in context from Documents B and C if the discovery layer found relevant connections, even when the user wouldn’t have known to look there.

A search engine returns what you asked for. A research partner finds the things you should have asked about.

VII · Scale & stack

Designed to stay interactive at 100K+ nodes.

The “hairball problem” (large graphs collapsing into unreadable visual noise) is solved at three layers: rendering, computation, and clustering.

Layer 1 · Rendering

Fluid interactivity.

Heavy filtering computation runs in a dedicated Web Worker, off the main thread. Sets are serialized to arrays before posting (a non-obvious gotcha that breaks naive implementations). The canvas stays responsive even when filters update on every keystroke.

Layer 2 · Computation

Adaptive ML parameters.

Node2Vec walk length, walk count, and the clustering algorithm all switch based on graph size. Below 1,500 nodes, the pipeline uses agglomerative clustering with Ward linkage; above that threshold, it uses MiniBatchKMeans. Above 5,000 nodes, the dimensionality-reduction parameters are also scaled down to keep memory bounded.
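Here is a Python sketch of size-based parameter selection. Only the 1,500 and 5,000 thresholds come from the text. Every numeric value is a hypothetical placeholder, and so is lowering the walk settings above 5,000. The two clustering labels would map onto scikit-learn's AgglomerativeClustering(linkage="ward") and MiniBatchKMeans in sklearn.cluster.

```python
def adaptive_params(n_nodes: int) -> dict:
    """Choose clustering and embedding settings from graph size.
    Thresholds follow the text; all numeric values are placeholders."""
    small = n_nodes < 1500
    params = {
        "clustering": "agglomerative_ward" if small else "minibatch_kmeans",
        "walk_length": 80 if small else 40,   # hypothetical Node2Vec settings
        "num_walks": 10 if small else 5,
        "umap_n_neighbors": 15,
    }
    if n_nodes > 5000:
        # Shrink walk and dimensionality-reduction settings to bound memory.
        params.update(walk_length=20, num_walks=3, umap_n_neighbors=10)
    return params


for n in (800, 3000, 20000):
    p = adaptive_params(n)
    print(n, p["clustering"], p["walk_length"], p["umap_n_neighbors"])
```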

Layer 3 · Clustering

Tiny cluster merging.

UMAP plus naive clustering produces a long tail of micro-clusters that add noise without insight. The pipeline merges clusters below a size threshold into their nearest neighbour by centroid distance. The result is a smaller, readable set of meaningful themes rather than a fragmented mess.
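The merge step described above can be sketched in plain Python: compute each cluster's centroid, then reassign every cluster below the size threshold to the nearest surviving cluster by centroid distance. The min_size value and the 2D toy points are illustrative assumptions.

```python
import math
from collections import defaultdict


def merge_tiny_clusters(points, labels, min_size=5):
    """Reassign members of clusters smaller than min_size to the nearest
    cluster that meets the threshold, measured centroid to centroid."""
    members = defaultdict(list)
    for point, label in zip(points, labels):
        members[label].append(point)
    centroids = {l: [sum(axis) / len(ps) for axis in zip(*ps)] for l, ps in members.items()}
    survivors = [l for l, ps in members.items() if len(ps) >= min_size]
    if not survivors:
        return list(labels)  # nothing large enough to merge into
    remap = {}
    for label, ps in members.items():
        if len(ps) >= min_size:
            remap[label] = label
        else:
            remap[label] = min(survivors, key=lambda s: math.dist(centroids[label], centroids[s]))
    return [remap[l] for l in labels]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (9, 9)]
labels = [0, 0, 0, 1, 1, 1, 2]
print(merge_tiny_clusters(points, labels, min_size=3))  # [0, 0, 0, 1, 1, 1, 1]
```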

Polyglot by design. Node.js for API and orchestration where async I/O dominates. Python for the ML pipeline where the libraries live. Three databases because each does one thing well: pgvector for semantic similarity, Neo4j for graph traversal, Redis for queues and caching. Ten worker services coordinated through queue-based orchestration so failures isolate and intermediate state stays observable.

An honest scope note: I’m an architect, not a software engineer. I build using AI-assisted development: making the architectural decisions, debugging integrations, and understanding the systems while using AI to accelerate the code. The Insight Engine is a working prototype that demonstrates the architectural bet is sound. What remains is engineering execution: hardening, multi-tenancy, observability, deployment automation, edge-case handling under adversarial input. Those are real items, but they’re a resource question rather than a conceptual one.

VIII · Architectural decisions

Why this, not that.

Four architectural choices that mattered, and the reasoning behind each. The reasoning is more important than the conclusion. For a different problem, a different choice would be right.

i.

Hybrid Graph + RAG, not pure vector RAG.

Pure vector retrieval is excellent at paraphrase and semantic similarity. It struggles with relational questions (how does X depend on Y) because those answers live in connections, not in any single chunk. Graph traversal handles the connections; vector retrieval handles the surface phrasing. The Insight Engine runs both in parallel and synthesizes. Most production RAG systems pick one and pay the cost of what they miss.

ii.

Tiered LLM routing, not a single frontier model.

Most extraction work (entity tagging, relationship typing, basic structure) is bounded and repetitive. Using a frontier model for those tasks is overkill. The router sends routine extraction to Gemini Flash and reserves Pro for transitive validation, complex reasoning, and high-stakes synthesis. Result: roughly 60% reduction in API spend with no measurable accuracy regression on the eval harness. The pattern matters more than the specific models: the point is the right tool for each subtask, not the cheapest model everywhere.
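At its simplest, tiered routing is a lookup from task type to tier. The Python sketch below uses hypothetical task names and placeholder model IDs, since the text names only Gemini Flash and Pro. Falling back to the stronger tier for unknown task types is an assumed policy: an unrecognised task costs more rather than failing cheaply.

```python
from collections import Counter

# Hypothetical task names; the text names only the Flash and Pro tiers.
ROUTES = {
    "entity_tagging": "flash",
    "relationship_typing": "flash",
    "structure_extraction": "flash",
    "transitive_validation": "pro",
    "complex_reasoning": "pro",
    "synthesis": "pro",
}
MODEL_FOR_TIER = {"flash": "gemini-flash", "pro": "gemini-pro"}  # placeholder IDs


def route(task_type: str) -> str:
    """Pick a model for a task; unknown task types default to the stronger tier."""
    return MODEL_FOR_TIER[ROUTES.get(task_type, "pro")]


batch = ["entity_tagging"] * 8 + ["relationship_typing"] * 3 + ["synthesis"]
print(Counter(route(t) for t in batch))
# Counter({'gemini-flash': 11, 'gemini-pro': 1})
```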

iii.

Schema-on-Edge, not collapsed metadata.

Most knowledge graphs simplify edges to RELATED_TO with the rich metadata stripped or stored in a sidecar. The Insight Engine preserves verbatim quote, emotional context, confidence, source chunk ID, and intelligence type at the edge itself. The cost is a fatter graph. The benefit is that every traversal carries its own evidence: any answer the system gives can be traced to a specific person or document making a specific claim. The metadata is the proof of work.

iv.

Polyglot Node + Python, not a monolithic stack.

Each language earns its place. Node.js owns the API surface, queue orchestration, and worker coordination: areas where async I/O dominates and the JavaScript ecosystem is mature. Python owns the ML pipeline (Node2Vec, UMAP, agglomerative clustering, NetworkX) where the libraries are uniquely strong. The two communicate via subprocess execution coordinated by BullMQ, which is uglier than a single-language stack but lets each side use its native idioms. The wrong choice is forcing one language to do both jobs badly.
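The Python half of a subprocess bridge can stay very small. The sketch below assumes a hypothetical one-JSON-job-in, one-JSON-reply-out protocol over stdin and stdout, with a toy "degree" task standing in for real handlers such as Node2Vec or UMAP. Errors come back as structured replies, so a Node-side BullMQ worker that spawns the script could fail the queue job cleanly instead of parsing a traceback. A script would call main() at its entry point.

```python
import json
import sys


def handle(job: dict) -> dict:
    """Dispatch one job; 'degree' is a hypothetical task used for illustration."""
    if job.get("task") == "degree":
        degree: dict[str, int] = {}
        for a, b in job["edges"]:
            degree[a] = degree.get(a, 0) + 1
            degree[b] = degree.get(b, 0) + 1
        return {"ok": True, "result": degree}
    return {"ok": False, "error": f"unknown task: {job.get('task')}"}


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read one JSON job from stdin and write one JSON reply to stdout."""
    try:
        reply = handle(json.load(stdin))
    except Exception as exc:  # malformed input or handler failure
        reply = {"ok": False, "error": repr(exc)}
    json.dump(reply, stdout)
    stdout.write("\n")
```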

IX · Evaluation

How I know it works.

Evaluation has two layers. First, established public multi-hop QA benchmarks test retrieval against comparable baselines, checking whether the system handles questions that span sources rather than living inside any one chunk. Second, a 30-query custom harness stress-tests the failure modes that standard benchmarks under-test: citation faithfulness (does the answer actually reflect the cited chunks?), out-of-corpus refusal behaviour (does it correctly say “I don’t know” instead of confabulating?), and multi-hop reasoning chains (does it traverse two and three hops correctly?).

The custom harness is small by design. Its purpose isn’t statistical power; it’s targeted regression-testing on the specific shapes of failure I cared about during development. Each query is paired with a known-good answer, expected citation set, and expected refusal behaviour where applicable.
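A harness case of this shape can be scored mechanically for two of the three failure modes. The Python sketch below assumes a hypothetical EvalCase structure and scores citation precision and recall plus refusal correctness. Comparison against the known-good answer is left to a reviewer or judge, and hop-count checks are not covered. The queries and chunk IDs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    query: str
    expected_answer: str = ""  # compared separately, e.g. by a reviewer or judge
    expected_citations: set[str] = field(default_factory=set)
    should_refuse: bool = False


def score_case(case: EvalCase, cited: set[str], refused: bool) -> dict:
    """Score citation faithfulness and refusal behaviour for one harness query."""
    if case.should_refuse or refused:
        # Out-of-corpus queries must refuse; answerable ones must not.
        return {"refusal_correct": refused == case.should_refuse}
    hits = cited & case.expected_citations
    return {
        "refusal_correct": True,
        "citation_precision": len(hits) / len(cited) if cited else 0.0,
        "citation_recall": len(hits) / len(case.expected_citations) if case.expected_citations else 1.0,
    }


answerable = EvalCase("hypothetical in-corpus query", expected_citations={"c1", "c2"})
out_of_corpus = EvalCase("hypothetical out-of-corpus query", should_refuse=True)
print(score_case(answerable, cited={"c1", "c3"}, refused=False))
print(score_case(out_of_corpus, cited={"c9"}, refused=False))
```

The second case shows confabulation being caught: the system answered (and cited something) where the expected behaviour was “I don’t know”.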

X · Use cases & what remains

Validated across three corpora.

Tested against three corpora with very different shapes: financial due diligence packets, academic research synthesis, and property management regulatory compliance documentation. The architecture is the same; only the corpus changes.

Accelerated onboarding.

Use case · 01

Reduces time-to-competency by giving new hires instant, citation-backed answers from the entire knowledge base. Reduces dependency on senior staff availability.

“What’s our process for X?” answered in seconds, with sources, instead of by tracking down the right person.

Auditable AI assistants.

Use case · 02

The KG acts as structured ground truth. Precise citations enable verification. Constrained retrieval reduces hallucination. Suitable for environments where every claim must be defensible: compliance, regulated workflows, anywhere “wrong but confident” is unacceptable.

Every claim traces to a specific source. Audit trails come for free.

Living knowledge management.

Use case · 03

Static repositories become living knowledge bases that evolve as documents are added. Visual analysis reveals knowledge gaps. The KG re-canonicalizes as new content arrives.

Organizational memory that survives staff turnover instead of leaving with the people who held it.

The core architectural problems have been de-risked: automated knowledge extraction, entity resolution, cross-document synthesis, graph traversal at 100K+ nodes, citation-grounded answer synthesis, adaptive ML parameters, and cost control.

What remains is engineering execution. For organizations evaluating this approach, the prototype demonstrates that the architectural bet is sound. For evaluators of my work, it demonstrates that the substantive design decisions (what the system is, not just whether the buttons work) are mine.

Working together

Three paths, depending on what you need.

Currently leading at ZKXP Innovation and running Headwater. Most project work flows through either ZKXP or Headwater, depending on whether the need is organizational AI adoption or audience/community intelligence. Here is where to go for what.

For general conversations, role inquiries, or ambiguous projects, write me directly at samcfath@gmail.com.

i.

In-house roles

For roles where AI adoption, knowledge systems, and adult-learning practice need to live in the same person.

LinkedIn →
ii.

Engagement inquiries

Project work routes through ZKXP Innovation, where the engineering, data, and AI capacity I draw on actually lives.

zkxp.xyz →
iii.

Intelligence engagements

Audience and community analysis routes through Headwater. A Creator Program for educators and online creators; Brands & Studios for enterprise.

headwater.cc →
Vancouver, BC