
URL: https://dev.to/john_mahoney_41e9c2589ceb/how-we-built-real-time-deposition-analysis-with-claudes-streaming-api-4f4h

How we built real-time deposition analysis with Claude's streaming API


Medical-malpractice plaintiff attorneys spend 3+ hours in expert depositions hunting for two things: admissions they can use at trial, and inconsistencies they can impeach. Both windows close in seconds. If you don't catch them live, you're reading the transcript a week later wishing you had.

We built a live-feed analyzer that watches the deposition stream, runs Claude against every 30-second window, and surfaces real-time signals to the attorney's laptop while they question the witness.

Architecture

Three hops:

  1. Deepgram transcribes the live audio over WebSocket
  2. Our Node WS server buffers transcript into 30-second segments
  3. Claude (Haiku 4.5, streaming) analyzes each segment and returns a 12-key JSON
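Hop 2 can be sketched roughly as follows. This is a minimal illustration, not the production server: `SegmentBuffer`, `push`, `flush`, and the `{ text, start }` chunk shape are hypothetical names, and it assumes each finalized transcript chunk arrives with a start offset in seconds.

```javascript
// Hypothetical sketch: group finalized transcript chunks into fixed windows.
// Assumes each chunk is { text, start } with `start` in seconds from the
// beginning of the deposition; the real transcript payload may differ.
class SegmentBuffer {
  constructor(windowSeconds = 30, onSegment = () => {}) {
    this.windowSeconds = windowSeconds;
    this.onSegment = onSegment;
    this.chunks = [];
    this.windowStart = null;
  }

  push(chunk) {
    if (this.windowStart === null) this.windowStart = chunk.start;
    // Close the current window once a chunk starts past its end.
    if (chunk.start - this.windowStart >= this.windowSeconds) this.flush();
    // A flush resets the window, so the new chunk opens the next one.
    if (this.windowStart === null) this.windowStart = chunk.start;
    this.chunks.push(chunk);
  }

  flush() {
    if (this.chunks.length === 0) return;
    const segment = {
      start: this.windowStart,
      text: this.chunks.map((c) => c.text).join(' '),
    };
    this.chunks = [];
    this.windowStart = null;
    this.onSegment(segment);
  }
}
```

In this sketch the `onSegment` callback is where the per-segment Claude call would be triggered, and `flush()` would be called once more when the deposition ends so the last partial window is not lost.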

The JSON is the heart of the system. Every segment returns:

{
  "medical": { accuracyScore, inaccuracies, accurateStatements, confidence, summary },
  "daubert": { vulnerabilityScore, vulnerabilities, strengths, overallRisk },
  "priorTestimony": { inconsistencies, impeachmentOpportunities, summary },
  "crossExam": { questions, keyWeaknesses, recommendedApproach },
  "elements": { duty, breach, causation, damages }, // each { advanced, quote }
  "admission": { isAdmission, quote, significance, whyMatters },
  "evasion": { isEvasive, pattern, escalationScript },
  "coverage": { topicsCovered, notes },
  "foundation": { triggers }, // FRE 613, 803(18), 803(6), 702; FRCP 30(b)(6)
  "chartContradiction": { contradicted, witnessClaim, chartEvidence, severity },
  "literatureHits": [{ witnessClaim, pubmedQuery, foundationScript, pubmedUrl }]
}
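Because each UI panel reads one specific key, a cheap structural check before rendering can catch a partial response early. A minimal sketch, assuming the top-level keys are the ones listed in the schema above; `EXPECTED_KEYS` and `missingKeys` are hypothetical names, not part of the original code.

```javascript
// Top-level keys as listed in the schema above (illustrative constant).
const EXPECTED_KEYS = [
  'medical', 'daubert', 'priorTestimony', 'crossExam', 'elements',
  'admission', 'evasion', 'coverage', 'foundation',
  'chartContradiction', 'literatureHits',
];

// Returns the expected top-level keys that are absent from a parsed analysis.
function missingKeys(analysis) {
  if (analysis === null || typeof analysis !== 'object') return [...EXPECTED_KEYS];
  return EXPECTED_KEYS.filter((key) => !(key in analysis));
}
```

A non-empty result could be logged or shown as a "partial analysis" state instead of letting an individual panel fail on its own.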

The per-segment loop

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Per segment: a single Claude call carrying the whole expert-witness prompt.
// buildPrompt, extractJson, and sanitizeResult are app-level helpers (not shown).
const msg = await anthropic.messages.create({
  model: 'claude-haiku-4-5-20251001',
  max_tokens: 8192, // critical: 4096 truncates crossExam mid-stream
  messages: [{ role: 'user', content: buildPrompt(segment, caseContext, chart) }],
});
const analysis = sanitizeResult(JSON.parse(extractJson(msg.content[0].text)));
ws.send(JSON.stringify({ type: 'analysis', analysis, segment }));
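The `extractJson` helper is not shown here, so the following is only one possible stand-in. It is a hedged sketch that also checks the Messages API `stop_reason` field, which is `'max_tokens'` when the output was cut off, so truncation surfaces as an explicit error rather than a JSON parse failure. `parseAnalysis` is a hypothetical name.

```javascript
// Hypothetical helper: turn a Messages API response into a parsed object,
// failing loudly when the output was cut off by max_tokens.
function parseAnalysis(msg) {
  if (msg.stop_reason === 'max_tokens') {
    throw new Error('Analysis truncated: raise max_tokens or shrink the prompt');
  }
  // Join every text block instead of assuming content[0] is the only one.
  const text = msg.content
    .filter((block) => block.type === 'text')
    .map((block) => block.text)
    .join('');
  // Take the outermost {...} span so surrounding prose or fences are ignored.
  const first = text.indexOf('{');
  const last = text.lastIndexOf('}');
  if (first === -1 || last <= first) {
    throw new Error('No JSON object found in response');
  }
  return JSON.parse(text.slice(first, last + 1));
}
```

The brace-slicing approach is deliberately naive: it assumes the response contains exactly one top-level JSON object, which matches the single-object schema described above but would not handle multiple objects in one reply.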

Three things we got wrong the first time:

  1. max_tokens=4096 was too small. The 12-key output needs roughly 6-8K tokens on dense segments. When crossExam is written near the end of the stream, it gets truncated and the UI shows "Cross-examination analysis failed." We bumped it to 8192.
  2. Chart context wasn't propagating. chartContradiction can't fire without the chart data in the prompt. The client now sends a setChartContext WS message before analysis begins, and the server stashes the payload as ws._sessionChartContext.
  3. Cloudflare killed idle WebSockets after 100s. Claude's longer analyses took 45-90s, and during dense segments the WS went silent. We added a 25s keepalive ping from the client.
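Fixes 2 and 3 can be sketched together. This is an illustration under assumptions rather than the shipped code: browsers do not expose protocol-level ping frames, so the sketch sends a small JSON message instead; the `ping` message type, the `chart` payload field, and the function names `startKeepalive` and `handleControlMessage` are all hypothetical. Only `setChartContext` and `ws._sessionChartContext` come from the description above.

```javascript
// Client side: send a small application-level message on a fixed interval so
// the connection never looks idle to the proxy. The `timers` parameter exists
// only so the interval can be swapped out in tests.
function startKeepalive(socket, intervalMs = 25000, timers = globalThis) {
  const id = timers.setInterval(() => {
    if (socket.readyState === 1) { // 1 === WebSocket.OPEN
      socket.send(JSON.stringify({ type: 'ping' }));
    }
  }, intervalMs);
  return () => timers.clearInterval(id); // call this when the socket closes
}

// Server side: stash chart context on the connection before analysis starts,
// and swallow keepalive pings. Returns true when the message was handled here.
function handleControlMessage(ws, raw) {
  const message = JSON.parse(raw);
  if (message.type === 'setChartContext') {
    ws._sessionChartContext = message.chart; // `chart` field name is assumed
    return true;
  }
  return message.type === 'ping';
}
```

A 25s interval leaves a wide margin under a 100s idle timeout, so a couple of delayed or dropped pings still would not let the connection go quiet for long enough to be closed.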

What we skipped (for now)

  • PDF.js text-layer positioning for the chart contradiction pin (today it's at the file-list row, not the page)
  • Firm-scoped vector index over historical transcripts (cross-case expert inconsistency)
  • Live PubMed API calls (today we generate the search query and the attorney clicks through)
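The click-through in the last item needs only a link built from the generated query. A minimal sketch, assuming the link targets the public PubMed search page and its `term` query parameter; `buildPubmedUrl` is a hypothetical name, and how the real `pubmedUrl` field is produced is not stated in this post.

```javascript
// Hypothetical helper: turn a generated search query into a PubMed search link.
function buildPubmedUrl(pubmedQuery) {
  const url = new URL('https://pubmed.ncbi.nlm.nih.gov/');
  // searchParams handles encoding of spaces, quotes, and boolean operators.
  url.searchParams.set('term', pubmedQuery.trim());
  return url.toString();
}
```

Building the link with `URL` rather than string concatenation avoids broken links when the query contains quotes, brackets, or ampersands.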

Full writeup (including the chart-cross-reference and co-counsel channel) is on our blog at medicalai.law/blog/how-ai-analyzes-deposition-real-time.

Questions welcome.