VOOZH about

URL: https://dev.to/eyanpen/how-falkordb-stores-edges-why-neighbor-lookup-is-odegree-3013

⇱ How FalkorDB Stores Edges: Why Neighbor Lookup Is O(degree) - DEV Community


Many people have a question when they first see FalkorDB's architecture:

It doesn't use traditional adjacency lists but maintains edges with sparse matrices — how does it efficiently find all edges of a given node?

And a follow-up question:

If neighbor data is already stored contiguously, why is the query complexity still O(degree) instead of O(1)?


1. How Traditional Graph Databases Store Edges

Traditional graph databases (like Neo4j) typically use:

Adjacency List

For example:

A -> B
A -> C
A -> D

Internally it looks more like:

A:
 edge1 -> edge2 -> edge3

That is:

  • Each node maintains its own edge linked list
  • To find all edges of a node:

    • Simply traverse the linked list

Therefore the complexity is:

O(degree)

Where:

degree = number of edges

For example:

  • out_degree
    Number of outgoing edges

  • in_degree
    Number of incoming edges


2. FalkorDB Is Completely Different: Sparse Matrix

FalkorDB's core design is not an adjacency list.

It is based on:

  • Sparse Matrix
  • GraphBLAS

to maintain the entire graph.

For example:

A(id=0) -> B(id=1)

Internal representation:

M[0,1] = edge_id

Meaning:

source=0
target=1

An edge exists.


3. One Matrix Per Edge Type

For example:

(:User)-[:FRIEND]->(:User)
(:User)-[:LIKES]->(:Post)

FalkorDB maintains:

FRIEND matrix
LIKES matrix

This way during traversal:

No need to scan the entire graph.


4. How Multi-edges Are Maintained

FalkorDB supports:

A -[:CALL]-> B
A -[:CALL]-> B
A -[:CALL]-> B

Therefore a matrix cell cannot simply be:

M[0,1] = 1

It's more like:

M[0,1] = [3,8,15]

That is:

edge ids

Essentially similar to:

  • sparse tensor
  • compressed adjacency structure

5. How to Efficiently Find Edges?

Many people mistakenly think:

0 0 0 1 0 0 1 1 0

Means:

You must scan the entire row to find the 1s.

That's completely wrong.

Because:

Sparse Matrix Doesn't Store Zeros At All


6. What Does a Sparse Matrix Actually Store?

For example:

[0,0,0,1,0,0,1,1,0]

The actual storage looks more like:

[3,6,7]

Meaning:

index 3 has an edge
index 6 has an edge
index 7 has an edge

Zeros don't exist at all.

Therefore:

Finding neighbors of node A:

neighbors(A) = [3,6,7]

Return directly.


7. CSR / CSC: Industrial-Grade Sparse Matrix Structures

Real implementations typically use:

  • CSR (Compressed Sparse Row)
  • CSC (Compressed Sparse Column)

For example:

Matrix:

A: 0 0 0 1 0 0 1 1
B: 1 0 0 0 0 0 0 0
C: 0 1 0 0 1 0 0 0

CSR might store it as:

indices = [3,6,7,0,1,4]
row_ptr = [0,3,4,6]

Explanation:

  • A's data is at indices[0:3]
  • B's data is at indices[3:4]
  • C's data is at indices[4:6]

So:

Finding all edges of A:

indices[0:3]

Gives us:

[3,6,7]

8. Why Is the Complexity Still O(degree)?

This is the most commonly misunderstood point.

Many people ask:

Since [3,6,7] is already in contiguous memory,
isn't a direct memcpy just O(1)?

The answer:

Locating the array is O(1)

But:

Traversing the array is still O(k)

Where:

k = degree

9. What Does Algorithmic Complexity Actually Measure?

For example:

MATCH(a)-[e]->()
RETURN e

The database doesn't just return:

an array pointer

It must:

  • Traverse each edge
  • Decode the edge object
  • Construct the result set
  • Return to the client

Therefore:

for edge in neighbors:
 emit(edge)

Must execute:

degree times

So the overall complexity is:

O(degree)

10. Output-sensitive Complexity

This is a classic concept:

The size of the output itself counts toward complexity

For example:

If:

A has 1 million edges

Even if:

finding the array start

Only takes:

O(1)

But:

Returning 1 million edges:

Cannot be:

O(1)

Because:

You must at least "look at" each element.


11. Why Is FalkorDB Still Fast?

Because:

[3,6,7]

Is:

  • Contiguous memory
  • Cache-friendly
  • SIMD-friendly

The CPU can:

  • Prefetch
  • Vector load
  • Branch prediction

While traditional adjacency lists:

edge1 -> edge2 -> edge3

Involve:

Pointer chasing

Which causes:

  • Cache misses
  • Memory stalls
  • Branch mispredictions

Therefore:

FalkorDB has clear advantages in:

  • High fan-out traversal
  • Multi-hop pattern matching
  • Graph analytics
  • GraphRAG

scenarios.


12. Neo4j vs FalkorDB: The Essential Difference

Neo4j is more like:

nodes + edge linked lists

Suited for:

  • OLTP
  • Single-hop queries
  • High-frequency edge updates

FalkorDB is more like:

a graph computation engine

Suited for:

  • Multi-hop traversal
  • Pattern matching
  • Graph analytics
  • Vectorized computation

For example:

(A)-[:F]->(B)-[:F]->(C)

Neo4j:

pointer traversal

FalkorDB:

matrix multiply

That is:

F × F

This is its biggest architectural difference.


13. Final Summary

FalkorDB's core philosophy:

Don't store "empty"
Only store "existing edges"

Therefore:

0 0 0 1 0 0 1

Actually becomes:

[3,6]

Querying all edges of a node:

  • Locating adjacency data:

    • O(1)
  • Returning all edges:

    • O(degree)

Where:

degree = number of edges for the current node

Not:

total number of edges in the entire graph

This is the core performance model of a Sparse Matrix graph database.


14. Does Splitting Edges Into Multiple Types vs. a Single Type Affect Query Speed?

A common question:

Since locating edges is O(1) and returning edges is O(degree),
does categorizing edges into one type vs. multiple types affect query speed?

The answer: It depends on whether the query specifies an edge type.


When the Query Specifies an Edge Type

For example:

MATCH(a)-[:FRIEND]->(b) RETURN b

FalkorDB only scans the FRIEND matrix.

If all edges are categorized as a single type (e.g., :REL), the matrix contains all edges, making the degree larger.

Multiple types = smaller matrices = less traversal = faster.


When the Query Does Not Specify an Edge Type

For example:

MATCH(a)-[]->(b) RETURN b

FalkorDB needs to merge results from multiple matrices.

In this case:

  • Total traversal volume is the same (total degree)
  • Multiple types have slight merge overhead
  • Single type traverses one matrix directly

The difference is minimal, approximately no impact.


Summary

Scenario Single Type vs. Multiple Types Impact
Query specifies edge type Multiple types faster Only scans the corresponding matrix, smaller degree
Query does not specify edge type Nearly no difference Same total degree, slight merge overhead with multiple types

Practical modeling recommendation:

Splitting into multiple types is the better practice.
Most real-world queries specify a relationship type, and splitting types significantly reduces the number of edges that need to be traversed.