VOOZH about

URL: https://dev.to/sendotltd/a-math-expression-parser-in-250-lines-of-javascript-recursive-descent-right-associative-and-ebb

⇱ A Math-Expression Parser in 250 Lines of JavaScript β€” Recursive Descent, Right-Associative ^, and -2^2 = -4 - DEV Community


Every "math evaluator in N lines" article either uses shunting-yard (Dijkstra's stack-based RPN converter) or hand-waves over the hard cases (-2^2, right associativity, error positions). This one uses recursive descent because the grammar and the code map 1-to-1, which makes the whole thing easier to read and write about. ~250 lines of JS, 36 unit tests, supports functions, variables, constants, parentheses, and reports errors with a column number and a caret.

πŸ‘ expression-eval UI: math input

🌐 Demo: https://sen.ltd/portfolio/expression-eval/
πŸ“¦ GitHub: https://github.com/sen-ltd/expression-eval

Why recursive descent (not shunting-yard)

Both work. Shunting-yard is more compact (one for loop, two stacks). But for explaining what's happening, recursive descent wins:

  • Grammar ↔ code is 1-to-1. You can read the BNF and the implementation in parallel and they look the same.
  • Stack traces show the AST shape. When you're debugging "why is -2^2 returning 4 instead of -4", the call stack literally is the parse tree.
  • Error positions are natural. The "current token" is always explicit, so reporting "unexpected ) at column 6" falls out for free.

The grammar:

expression := additive
additive := multiplicative (('+' | '-') multiplicative)*
multiplicative := unary (('*' | '/' | '%') unary)*
unary := ('-' | '+')? power
power := atom ('^' unary)?
atom := number | identifier | identifier '(' args? ')' | '(' expression ')'
args := expression (',' expression)*

Operator precedence is encoded in the nesting depth. That's the whole magic. The deeper a rule is, the tighter it binds.

Trap 1: making ^ right-associative

2^3^2 should be 2^(3^2) = 2^9 = 512, not (2^3)^2 = 64.

For left-associative operators (+ - * / %) the parser builds a left-leaning tree with a while loop:

parseAdditive() {
 let left = this.parseMultiplicative();
 while (current is '+' or '-') {
 const right = this.parseMultiplicative();
 left = { type: "binop", op, left, right }; // accumulate on the LEFT
 }
 return left;
}

For the right-associative ^, switch from loop to recursion:

parsePower() {
 const base = this.parseAtom();
 if (current is '^') {
 consume();
 const exponent = this.parseUnary(); // RECURSE
 return { type: "binop", op: "^", left: base, right: exponent };
 }
 return base;
}

Recursion (not iteration) means the AST for 2^3^2 looks like:

binop ^
 number 2
 binop ^
 number 3
 number 2

DFS evaluation walks the inner 3^2 = 9 first, then 2^9 = 512.

Pin both sides:

test("^ is right-associative", () => {
 assert.equal(evaluateExpression("2^3^2"), 512); // not 64
});

test("parse produces right-associative AST for '2^3^2'", () => {
 const ast = parse("2^3^2");
 assert.equal(ast.op, "^");
 assert.equal(ast.left.value, 2);
 assert.equal(ast.right.op, "^");
 assert.equal(ast.right.left.value, 3);
 assert.equal(ast.right.right.value, 2);
});

Trap 2: -2^2 = -4 (unary minus binds LOOSER than ^)

This is where calculator implementations disagree:

Tool -2^2 Convention
WolframAlpha -4 ^ > unary -
Python (**) -4 ** > unary -
Mathematica -4 ^ > unary -
Excel 4 unary - > ^ ⚠
Google Sheets 4 unary - > ^ ⚠
Mathematical convention -4 ^ > unary -

The math convention is -2^2 = -(2^2) = -4. Excel is the outlier (probably for backward compatibility with VisiCalc-era spreadsheets).

To get this right, ^ must bind tighter than unary -. That's achieved by the grammar nesting:

unary := ('-' | '+')? power ← unary sees '-' first, then drops down to power
power := atom ('^' unary)? ← but power's right-hand side can recurse to unary

Walking -2^2 through the parser:

  1. parseUnary() sees -, consumes it, calls parsePower()
  2. parsePower() calls parseAtom(), reads 2
  3. Next token is ^, consume, recurse to parseUnary() for the exponent
  4. The recursion reads 2
  5. parsePower() returns { ^, 2, 2 }
  6. parseUnary() wraps: { unary -, { ^, 2, 2 } } β†’ evaluates to -(2^2) = -4 βœ“

The reason the right-hand side of ^ recurses to unary (instead of parsePower() or parseAtom()) is to also allow negative exponents: 2^-3 = 1/8. Both rules β€” -2^2 = -4 and 2^-3 = 0.125 β€” fall out of the same single grammar choice.

test("^ binds tighter than * but unary minus binds looser than ^", () => {
 assert.equal(evaluateExpression("-2^2"), -4); // -(2^2)
 assert.equal(evaluateExpression("(-2)^2"), 4); // explicit parens win
 assert.equal(evaluateExpression("2 * 3^2"), 18); // 2 * (3^2)
});

test("^ accepts unary minus on the exponent side", () => {
 assert.equal(evaluateExpression("2^-3"), 0.125); // 2^(-3)
});

If a user wants 4, they write (-2)^2. The parens are non-optional and that's the right design.

Trap 3: numeric literal tokenisation order matters

Numbers seem easy, but the edge cases bite:

  • 42 β€” integer
  • 3.14 β€” float
  • .5 β€” leading-dot float
  • 1e3 β€” scientific (= 1000)
  • 1.5e-2 β€” negative exponent (= 0.015)
  • 1e β€” error: exponent has no digits
  • 1e+ β€” error: sign with no digits

The naΓ―ve "scan while character matches [0-9.eE+-]" approach breaks on 1+2 because the + between numbers gets pulled into the first literal.

The fix is to scan state by state in fixed order:

// 1. integer part (optional if dot follows)
while (i < length && isDigit(source[i])) i++;

// 2. fractional part (optional)
if (source[i] === ".") {
 i++;
 while (i < length && isDigit(source[i])) i++;
}

// 3. exponent (optional)
if (source[i] === "e" || source[i] === "E") {
 i++;
 if (source[i] === "+" || source[i] === "-") i++;
 const expStart = i;
 while (i < length && isDigit(source[i])) i++;
 if (i === expStart) throw new ExpressionError("exponent has no digits", start);
}

The exponent can have a sign, but after the sign you need at least one digit β€” 1e and 1e+ are invalid literals, not "1 followed by an identifier e". Once you've committed to scanning a number, partial exponents are syntax errors:

test("tokenize rejects exponent without digits", () => {
 assert.throws(() => tokenize("1e"), ExpressionError);
 assert.throws(() => tokenize("1e+"), ExpressionError);
});

Trap 4: error positions with a caret

"Syntax error" is hostile. Give the column and a visual pointer:

export class ExpressionError extends Error {
 constructor(message, position) {
 super(message);
 this.name = "ExpressionError";
 this.position = position; // 0-indexed column
 }
}

Inside parseAtom(), throw with the token's position:

throw new ExpressionError(`unexpected "${t.value}"`, t.position);

In the UI, render a caret:

const caret = "".repeat(Math.max(0, position)) + "^";
els.error.textContent = `${msg}\n${source}\n${caret}`;

The user sees:

parse: unexpected ")"
1 + + )
 ^

Pin the position in tests:

test("error positions are reported", () => {
 try {
 parse("1 + + )");
 } catch (e) {
 assert.ok(e instanceof ExpressionError);
 assert.equal(e.position, 6); // the ")" column
 return;
 }
 assert.fail("expected ExpressionError");
});

Function calls as part of atom

When the parser sees an identifier, it peeks at the next token. If it's (, it's a function call:

parseAtom() {
 // ...
 if (t.type === "ident") {
 this.next();
 if (peek is "(") {
 this.next();
 const args = [];
 if (peek is not ")") {
 args.push(this.parseAdditive());
 while (peek is ",") {
 this.next();
 args.push(this.parseAdditive());
 }
 }
 consume(")");
 return { type: "call", name: t.value, args };
 }
 return { type: "ident", name: t.value };
 }
 // ...
}

Variadic functions (min, max, hypot) fall out for free β€” the args loop already accepts any count. The evaluator just spreads:

const FUNCTIONS = {
 min: (...xs) => Math.min(...xs),
 max: (...xs) => Math.max(...xs),
 hypot: (...xs) => Math.hypot(...xs),
 // ...
};

Variables override constants

CONSTANTS = { pi, e, tau, ... } is hard-coded, but user-supplied variables win:

case "ident": {
 if (Object.prototype.hasOwnProperty.call(variables, node.name)) {
 return Number(variables[node.name]); // variables first
 }
 if (Object.prototype.hasOwnProperty.call(CONSTANTS, node.name)) {
 return CONSTANTS[node.name];
 }
 throw new ExpressionError(`undefined identifier "${node.name}"`, 0);
}

Two design choices worth calling out:

  1. hasOwnProperty.call(obj, key) instead of key in obj to avoid prototype pollution β€” without hasOwnProperty, a user variable named __proto__ would be intercepted by the prototype chain.
  2. User variables override constants because the user's intent is more specific. e = 10 makes sense as an override (e.g., "the elasticity coefficient is named e in my problem").
test("variables override constants if same name", () => {
 assert.equal(evaluateExpression("e + 1", { e: 10 }), 11);
});

Edge cases: division by zero, sqrt of negative

I let JavaScript's defaults through:

  • 1 / 0 β†’ Infinity (no throw)
  • sqrt(-1) β†’ NaN
  • min(3, 1, 2) β†’ 1

The display layer formats them:

export function formatResult(value) {
 if (Number.isNaN(value)) return "NaN";
 if (!Number.isFinite(value)) return value > 0 ? "∞" : "βˆ’βˆž";
 if (Number.isInteger(value) && Math.abs(value) < 1e15) return String(value);
 return String(Number(value.toPrecision(12)));
}

Number(value.toPrecision(12)) is the trick to make 0.1 + 0.2 = 0.3 display cleanly. toPrecision(12) rounds floats to 12 significant digits then Number() strips trailing zeros β€” so 0.30000000000000004 shows as 0.3.

TL;DR

  • Recursive descent beats shunting-yard for readability. Grammar maps 1-to-1 to code.
  • Right-associative ^ is recursion (not loop) on the operator's RHS.
  • -2^2 = -4 is the math convention; it falls out naturally if ^ (power) sits a level below unary minus in the grammar.
  • Numeric literals need state-machine scanning (integer β†’ fractional β†’ exponent), not a regex.
  • Errors carry a column and the UI renders a caret. Pin positions in tests.
  • Function calls are atom-level; variadic functions work for free.
  • Variables override constants with hasOwnProperty.call to avoid proto pollution.

Source: https://github.com/sen-ltd/expression-eval β€” MIT, ~250 lines of JS + 36 unit tests, zero dependencies, no build step.


πŸ›  Built by SEN LLC as part of an ongoing series of small, focused developer tools. Browse the full portfolio for more.