
URL: https://n8n.io/workflows/12471-generate-consensus-based-answers-using-claude-gpt-grok-and-gemini/

Generate consensus-based answers using Claude, GPT, Grok and Gemini | n8n workflow template





Last update 4 months ago



The original LLM Council concept was introduced by Andrej Karpathy and published as an open-source repository demonstrating multi-model consensus and ranking.
This workflow is my adaptation of that idea, reimplemented and structured as a production-ready n8n template. Original repository: https://github.com/karpathy/llm-council

This n8n template implements the LLM Council pattern: a single user question is processed in parallel by multiple large language models, independently evaluated by peer models, and then synthesized into one high-quality, consensus-driven final answer.
It is designed for use cases where answer quality, balance, and reduced single-model bias are critical.

📌 Section 1: Trigger & Input

⚡ When Chat Message Received (Chat Trigger)
Purpose:
Receives a user’s message and initiates the entire workflow.

How it works:

A user sends a chat message

The message is stored as the Original Question

The same input is forwarded simultaneously to multiple LLM pipelines

Why it matters:
Provides a clean, unified entry point for all downstream multi-model logic.

📌 Section 2 (Stage 1): Parallel LLM Responses

🤖 Basic LLM Chains (x4)
Models used:

Anthropic Claude

OpenAI GPT

xAI Grok

Google Gemini

Purpose:
Each model independently generates its own response to the same question.

Key characteristics:

Identical prompt structure for all models

Independent reasoning paths

No shared context between models

Why it matters:
Produces diverse perspectives, reasoning styles, and solution approaches.
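Outside n8n, the same fan-out can be sketched in plain JavaScript. This is a minimal illustration, not the template's implementation: `askModel` and `buildPrompt` are hypothetical placeholders for real provider calls and the template's actual prompt, and the sketch says nothing about how n8n itself schedules the four branches. It only demonstrates the two properties listed above: every model receives the identical prompt, and no call sees another call's output.

```javascript
// Sketch of the Stage 1 fan-out in plain JavaScript (not n8n code).
// askModel is a HYPOTHETICAL placeholder for a real provider API call.
async function askModel(modelName, prompt) {
  // A real version would send `prompt` to the named model and return its reply.
  return `[${modelName}] reply to: ${prompt}`;
}

// HYPOTHETICAL prompt template: every model receives exactly the same text.
function buildPrompt(question) {
  return `Answer the question below as accurately and completely as you can.\n\nQuestion: ${question}`;
}

async function collectResponses(question, models) {
  const prompt = buildPrompt(question); // built once, shared by all models
  // Each request is started independently; no model sees another's output.
  // Promise.all resolves once all requests finish and keeps the input order.
  const replies = await Promise.all(models.map((m) => askModel(m, prompt)));
  return models.map((model, i) => ({ model, answer: replies[i] }));
}

const MODELS = ["claude", "gpt", "grok", "gemini"];

collectResponses("What is a B-tree?", MODELS).then((results) => {
  for (const { model, answer } of results) {
    console.log(`${model}: ${answer.length} characters`);
  }
});
```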

📌 Section 3 (Stage 2): Response Anonymization

🧾 Set Nodes (Response A / B / C / D)
Purpose:
Stores model outputs in an anonymized format:

Response A

Response B

Response C

Response D

Why it matters:
Prevents evaluator models from knowing which LLM authored which response, reducing bias during evaluation.
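A minimal sketch of the anonymization step, assuming each Stage 1 result has the hypothetical shape `{ model, answer }`. Labels follow input order here. Keeping a separate label-to-model map (`authorOf`) is an assumption of this sketch; the template description does not say whether that mapping is retained.

```javascript
// Sketch of Stage 2: map each model's answer to a neutral label.
// The input shape { model, answer } is an ASSUMPTION made for this example.
function anonymize(results) {
  const letters = ["A", "B", "C", "D"];
  if (results.length > letters.length) {
    throw new Error("This sketch supports at most four responses");
  }
  const labeled = {};  // label -> answer text (what evaluators see)
  const authorOf = {}; // label -> model name (kept out of evaluator prompts)
  results.forEach((r, i) => {
    const label = `Response ${letters[i]}`;
    labeled[label] = r.answer;
    authorOf[label] = r.model;
  });
  return { labeled, authorOf };
}

// Builds the text block an evaluator would receive: labels and answers only.
function formatForEvaluators(labeled) {
  return Object.entries(labeled)
    .map(([label, answer]) => `${label}:\n${answer}`)
    .join("\n\n");
}

const { labeled, authorOf } = anonymize([
  { model: "claude", answer: "Answer one" },
  { model: "gpt", answer: "Answer two" },
  { model: "grok", answer: "Answer three" },
  { model: "gemini", answer: "Answer four" },
]);
console.log(formatForEvaluators(labeled));
```

Because labels follow input order, a given label always maps to the same model in this sketch; shuffling the order on each run is one possible variation, not something the template describes.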

📌 Section 4 (Stage 3): Peer Evaluation & Ranking

📊 Evaluation Chains (Claude / GPT / Grok / Gemini)
Purpose:
Each model acts as a reviewer and:

Analyzes all four anonymized responses

Describes strengths and weaknesses of each

Produces a strict FINAL RANKING from best to worst

Ranking format (strict):

FINAL RANKING:

  1. Response B
  2. Response A
  3. Response D
  4. Response C

Why it matters:
Creates multiple independent quality assessments from different model perspectives.
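The strict format is what makes each evaluator's ranking machine-readable. A minimal parser, assuming evaluators follow the format exactly, might look like this; `parseFinalRanking` is a hypothetical name, and the template's own Code node may parse the text differently.

```javascript
// Sketch: extract the ordered labels from one evaluator's output.
// Only text after the LAST "FINAL RANKING:" marker is read, so labels
// mentioned in the evaluator's earlier commentary are ignored.
function parseFinalRanking(text) {
  const marker = "FINAL RANKING:";
  const start = text.lastIndexOf(marker);
  if (start === -1) return []; // evaluator did not follow the strict format
  const ranking = [];
  for (const line of text.slice(start + marker.length).split("\n")) {
    // Matches lines like "1. Response B" (leading spaces allowed).
    // Labels are returned in the order the lines appear; the numbers are not used.
    const match = line.match(/^\s*\d+\.\s*(Response [A-D])\b/);
    if (match && !ranking.includes(match[1])) ranking.push(match[1]);
  }
  return ranking;
}

const evaluatorOutput = `Response A is thorough but long. Response C misses an edge case.

FINAL RANKING:
1. Response B
2. Response A
3. Response D
4. Response C`;

console.log(parseFinalRanking(evaluatorOutput));
// → [ 'Response B', 'Response A', 'Response D', 'Response C' ]
```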

📌 Section 5 (Stage 4): Ranking Aggregation

🧮 Code Node (JavaScript)
Purpose:
Aggregates all peer rankings by:

Parsing ranking positions

Calculating average position per response

Counting how many evaluations included each response

Sorting responses by average position, best (lowest) first

Output includes:

Aggregated rankings

Best response label

Best average score

Why it matters:
Transforms subjective rankings into a structured, quantitative consensus.
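The aggregation arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the template's Code node: `aggregateRankings` is a hypothetical name, its input is one array of labels per evaluator (best first), and the alphabetical tie-break is an added assumption used to keep the order deterministic. Positions are 1-based, so a lower average is better; for example, a response placed 1st, 2nd, 1st and 1st averages 5 / 4 = 1.25.

```javascript
// Sketch of Stage 4: combine several evaluators' rankings into one order.
// Input: one array of labels per evaluator, best first.
function aggregateRankings(rankings) {
  const stats = new Map(); // label -> { total, count }
  for (const ranking of rankings) {
    ranking.forEach((label, index) => {
      const entry = stats.get(label) ?? { total: 0, count: 0 };
      entry.total += index + 1; // 1-based position in this evaluator's ranking
      entry.count += 1;         // number of evaluations that ranked this label
      stats.set(label, entry);
    });
  }

  const aggregated = [...stats.entries()]
    .map(([label, { total, count }]) => ({
      label,
      averagePosition: total / count,
      evaluations: count,
    }))
    // Lowest average first; ties broken alphabetically (ASSUMED tie-break).
    .sort((a, b) => a.averagePosition - b.averagePosition || a.label.localeCompare(b.label));

  return {
    aggregated,
    bestLabel: aggregated.length ? aggregated[0].label : null,
    bestAverage: aggregated.length ? aggregated[0].averagePosition : null,
  };
}

const rankings = [
  ["Response B", "Response A", "Response D", "Response C"],
  ["Response A", "Response B", "Response C", "Response D"],
  ["Response B", "Response C", "Response A", "Response D"],
  ["Response B", "Response A", "Response D", "Response C"],
];

const result = aggregateRankings(rankings);
for (const r of result.aggregated) {
  console.log(`${r.label}: average ${r.averagePosition} over ${r.evaluations} evaluations`);
}
console.log(`Best: ${result.bestLabel} (${result.bestAverage})`);
// Response B: average 1.25 over 4 evaluations
// Response A: average 2 over 4 evaluations
// Response C: average 3.25 over 4 evaluations
// Response D: average 3.5 over 4 evaluations
// Best: Response B (1.25)
```

Averaging over `evaluations` rather than a fixed divisor means a response that one evaluator omitted is still scored on the rankings it did receive.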

📌 Section 6 (Stage 5): Final Consensus Answer

🧠 Chairman LLM Chain
Purpose:
One model acts as the Council Chairman and:

Reviews all original responses

Considers peer rankings and aggregated scores

Identifies consensus patterns and disagreements

Produces a single, clear, high-quality final answer

Why it matters:
Delivers a refined response that reflects collective model intelligence rather than a simple average.
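One way the Chairman's input might be assembled, as a hypothetical sketch: the prompt wording and the `buildChairmanPrompt` helper are illustrative assumptions, not the template's actual prompt, and each aggregated entry is assumed to carry `label`, `averagePosition` and `evaluations` fields.

```javascript
// Sketch: assemble the Chairman's prompt from the outputs of earlier stages.
// The wording below is HYPOTHETICAL, not the template's real prompt.
function buildChairmanPrompt(question, labeledResponses, aggregated) {
  const responses = Object.entries(labeledResponses)
    .map(([label, text]) => `${label}:\n${text}`)
    .join("\n\n");
  const ranking = aggregated
    .map((r, i) => `${i + 1}. ${r.label} (average position ${r.averagePosition.toFixed(2)}, ${r.evaluations} evaluations)`)
    .join("\n");
  return [
    "You are the Chairman of an LLM council.",
    `Original question:\n${question}`,
    `Council responses:\n${responses}`,
    `Aggregated peer ranking (lower average position is better):\n${ranking}`,
    "Identify where the responses agree and disagree, then write one clear final answer.",
  ].join("\n\n");
}

const prompt = buildChairmanPrompt(
  "What is a B-tree?",
  { "Response A": "A balanced search tree...", "Response B": "A self-balancing tree..." },
  [
    { label: "Response B", averagePosition: 1.25, evaluations: 4 },
    { label: "Response A", averagePosition: 1.75, evaluations: 4 },
  ],
);
console.log(prompt);
```

Passing both the full responses and the aggregated ranking lets the Chairman weigh consensus without being forced to copy the top-ranked answer verbatim.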

📊 Workflow Overview

Section | Node / Logic | Purpose
1 | Chat Trigger | Receive user question
2 | LLM Chains | Generate independent responses
3 | Set Nodes | Anonymize outputs
4 | Evaluation Chains | Peer review & ranking
5 | Code Node | Aggregate rankings
6 | Chairman LLM | Final synthesized answer

🎯 Key Benefits

🧠 Multi-model intelligence: avoids reliance on a single LLM
⚖️ Reduced bias: anonymized peer evaluation
📊 Quality-driven selection: ranking-based consensus
🔁 Modular architecture: easy to add or replace models
🌐 Language-flexible: input and output languages are configurable
🧩 Production-ready logic: clear stages and deterministic ranking aggregation

🚀 Ideal Use Cases

High-stakes decision support

Complex technical or architectural questions

Strategy and research synthesis

AI assistants requiring higher trust and reliability

Comparing and selecting the best LLM-generated answers