Graph Databases Are Overkill — Until Suddenly They're Not

The moment your relational schema starts fighting you is the moment you should have already learned Neo4j

Postgres is probably the right answer. For most applications, most of the time, a well-indexed relational database will take you further than you think and cause fewer problems than the alternatives.

This post is not about those applications.

This post is about the specific moment when your JOIN chain is five levels deep, your query planner is sweating, and you realize the data you're modeling is not a table — it's a network. And you've been forcing a network into a spreadsheet this whole time.

The Problem With Modeling Relationships Relationally

Relational databases are excellent at storing entities. They're awkward at storing relationships between entities — especially when those relationships are the point.

Consider a simple example: fraud detection in financial transactions.

You want to answer questions like:

Which accounts have transacted with this flagged account, directly or indirectly?

Are there any shared devices or IP addresses connecting these two users?

What's the shortest path between this vendor and a known fraudulent entity?

In Postgres, answering "indirectly connected accounts up to 3 hops away" looks something like:

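```sql
WITH RECURSIVE connected (account, depth) AS (
    SELECT to_account, 1
    FROM transactions
    WHERE from_account = 'A'
  UNION
    SELECT t.to_account, c.depth + 1
    FROM transactions t
    INNER JOIN connected c ON t.from_account = c.account
    WHERE c.depth < 3   -- stop after 3 hops
)
SELECT account, MIN(depth) AS hops
FROM connected
WHERE account <> 'A'
GROUP BY account
ORDER BY hops;
```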

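This works. It also slows down sharply as depth and data volume increase: every extra hop is another join against the transactions table, and the intermediate result can multiply by roughly the number of counterparties per account at each level. Relational databases were designed to filter and join rows, not to walk long chains of relationships.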
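A recursive CTE only stops at 3 hops if it carries an explicit depth counter; without one it keeps walking until it runs out of new rows. The depth-limited form can be tried without a Postgres server using Python's built-in sqlite3 module, whose recursive CTE syntax is nearly identical. The table contents below are hypothetical toy data: a chain A to E plus one cycle back to A.

```python
import sqlite3

# Hypothetical toy data: A -> B -> C -> D -> E, plus C -> A to create a cycle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (from_account TEXT, to_account TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("C", "A")],
)

# The depth column caps the recursion; MIN(depth) keeps the fewest hops per account.
QUERY = """
WITH RECURSIVE connected(account, depth) AS (
    SELECT to_account, 1 FROM transactions WHERE from_account = ?
    UNION
    SELECT t.to_account, c.depth + 1
    FROM transactions t
    JOIN connected c ON t.from_account = c.account
    WHERE c.depth < ?
)
SELECT account, MIN(depth) AS hops
FROM connected
WHERE account <> ?
GROUP BY account
ORDER BY hops, account
"""


def accounts_within(start, max_hops):
    """Return (account, fewest hops) pairs reachable from start within max_hops."""
    return conn.execute(QUERY, (start, max_hops, start)).fetchall()


print(accounts_within("A", 3))  # [('B', 1), ('C', 2), ('D', 3)]
```

E is four hops away, so it only appears once the limit is raised to 4.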

What a Graph Database Actually Does Differently

A graph database stores data as nodes and edges natively.

A node is an entity — a user, a transaction, a product

An edge is a relationship — PURCHASED, CONNECTED_TO, REPORTED_BY

Both nodes and edges can carry properties

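The key difference: in a relational database, relationships are reconstructed at query time by JOINs that match foreign keys, typically with an index lookup per hop. In a graph database, relationships are stored as first-class data. Neo4j, for example, keeps direct references from each node to its relationships (it calls this index-free adjacency), so traversal means following stored pointers rather than recomputing matches.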
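A toy in-memory property graph shows the shape of the idea: each node owns its list of typed, property-carrying edges, so finding neighbors is a read of that list. This is a hypothetical sketch for intuition only; it is not how Neo4j lays out records on disk, and the class and method names are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str  # e.g. "User", "Product"
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    type: str  # e.g. "PURCHASED"
    target: Node
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: every node keeps its own list of outgoing edges."""

    def __init__(self):
        self.nodes = {}
        self.out_edges = defaultdict(list)  # node id -> list[Edge]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = Node(node_id, label, props)
        return self.nodes[node_id]

    def add_edge(self, src, rel_type, dst, **props):
        # The edge lives with its source node and points at the target node object.
        self.out_edges[src].append(Edge(rel_type, self.nodes[dst], props))

    def neighbors(self, node_id, rel_type):
        # Traversal reads the node's stored edge list; there is no join
        # against a separate relationships table.
        return [e.target.id for e in self.out_edges[node_id] if e.type == rel_type]


g = PropertyGraph()
g.add_node("u1", "User", name="Asha")
g.add_node("p1", "Product", name="Kettle")
g.add_edge("u1", "PURCHASED", "p1", quantity=2)
print(g.neighbors("u1", "PURCHASED"))  # ['p1']
```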

The same fraud query in Cypher (Neo4j's query language):

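```cypher
MATCH path = (a:Account {id: 'A'})-[:TRANSACTED_WITH*1..3]->(connected:Account)
WHERE connected <> a
RETURN connected, min(length(path)) AS hops
ORDER BY hops
```

The arrow makes the traversal follow transactions in one direction, like the SQL version; dropping the > matches connections in either direction. Aggregating with min() collapses the many possible paths to each account into its fewest-hop distance.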

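Cleaner to write, and typically faster on deep traversals over large graphs, because each hop follows stored relationships instead of running another join. And, importantly, easier to reason about, because the query shape mirrors the problem shape.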
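To see what a hop-limited traversal like the one above does conceptually, here is a breadth-first search in Python over a plain adjacency dict. The account IDs are hypothetical (a small chain with one cycle back to A), and this is a sketch of the idea, not how Neo4j's engine is implemented.

```python
from collections import deque


def within_hops(adjacency, start, max_hops):
    """Breadth-first search returning {node: fewest hops from start}, up to max_hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # don't expand past the hop limit
        for nxt in adjacency.get(node, []):
            if nxt not in hops:  # first visit is along a fewest-hop path
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    del hops[start]  # exclude the starting account itself
    return hops


# Hypothetical TRANSACTED_WITH edges, stored as outgoing neighbor lists.
transacted_with = {"A": ["B"], "B": ["C"], "C": ["D", "A"], "D": ["E"]}
print(within_hops(transacted_with, "A", 3))  # {'B': 1, 'C': 2, 'D': 3}
```

Because breadth-first search visits nodes in order of distance, the first time it reaches an account is along a fewest-hop path, so it never needs to revisit a node, even with the cycle through A.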

Where Graph Databases Actually Win

Not every problem needs a graph. But some problems are almost impossible to model cleanly without one.

🔗 Relationship-heavy domains

| Use case | Why graph wins |
|---|---|
| Fraud detection | Multi-hop connection traversal at speed |
| Recommendation engines | "Users like you also liked..." is a graph query |
| Knowledge graphs | Entities and their semantic relationships |
| Access control / permissions | Role hierarchies, inherited permissions |
| Supply chain analysis | Dependency chains, risk propagation |
| Social networks | Follows, mutual connections, influence mapping |

The common thread: the relationships between entities carry meaning, and you need to query across those relationships at depth.
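Taking the access-control row as a concrete case: inherited permissions are a reachability question over a role hierarchy, and the hierarchy can be any depth. A minimal Python sketch, with hypothetical role names, permission names, and dict names (INHERITS_FROM, GRANTS), walks the inheritance links with a depth-first stack:

```python
# Hypothetical role hierarchy: each role inherits from the roles it lists.
INHERITS_FROM = {
    "admin": ["editor"],
    "editor": ["viewer"],
    "billing": ["viewer"],
    "viewer": [],
}
# Hypothetical permissions granted directly to each role.
GRANTS = {
    "admin": {"delete_user"},
    "editor": {"edit_doc"},
    "billing": {"view_invoice"},
    "viewer": {"read_doc"},
}


def effective_permissions(role):
    """Collect permissions granted to a role or to any role it inherits from, at any depth."""
    seen, stack, perms = set(), [role], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in a misconfigured hierarchy
        seen.add(current)
        perms |= GRANTS.get(current, set())
        stack.extend(INHERITS_FROM.get(current, []))
    return perms


print(sorted(effective_permissions("admin")))  # ['delete_user', 'edit_doc', 'read_doc']
```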

A Real Example — GST Reconciliation

GST reconciliation in India sounds like a spreadsheet problem. On the surface it is: match invoices, find discrepancies, flag mismatches.

But the interesting questions are relational:

Which vendors consistently file late and how does that propagate risk to their buyers?

Are there vendor clusters where a single bad actor affects a network of downstream filers?

What's the trust score of a vendor based on their transaction history and their counterparties' histories?

These are graph questions. Forcing them into SQL means recursive CTEs, self-joins, and query plans that hurt to look at.

Modeling the same data in Neo4j:

(:Vendor)-[:FILED_INVOICE]->(:Invoice)-[:BILLED_TO]->(:Buyer)

(:Vendor)-[:CONNECTED_TO {shared_pan: true}]->(:Vendor)
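To make the first pattern concrete, here is a small Python sketch of the first risk question (which buyers are exposed to late filers) as a two-hop traversal: vendor to invoices, invoices to buyers. All IDs and the LATE_FILERS flag are hypothetical placeholders; a real flag might be derived from filing dates.

```python
# Hypothetical records shaped like
# (:Vendor)-[:FILED_INVOICE]->(:Invoice)-[:BILLED_TO]->(:Buyer)
FILED_INVOICE = {"V1": ["INV1", "INV2"], "V2": ["INV3"], "V3": ["INV4"]}
BILLED_TO = {"INV1": "B1", "INV2": "B2", "INV3": "B2", "INV4": "B3"}
LATE_FILERS = {"V1", "V2"}  # hypothetical flag


def buyers_exposed_to(vendors):
    """Two-hop traversal: vendor -> invoices -> buyers, returning buyer -> set of vendors."""
    exposure = {}
    for vendor in vendors:
        for invoice in FILED_INVOICE.get(vendor, []):
            buyer = BILLED_TO[invoice]
            exposure.setdefault(buyer, set()).add(vendor)
    return exposure


exposure = buyers_exposed_to(LATE_FILERS)
print(sorted((b, sorted(v)) for b, v in exposure.items()))
# [('B1', ['V1']), ('B2', ['V1', 'V2'])]
```

B2 shows up with two late filers behind it, which is the kind of propagated risk the questions above are after.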

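Suddenly, vendor risk scoring becomes a traversal. Cluster detection becomes a call to a library algorithm such as weakly connected components or Louvain community detection (Neo4j ships these in its Graph Data Science library). The queries read like the problem statement.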
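Cluster detection over the CONNECTED_TO edges can be sketched as weakly connected components: ignore edge direction and group every vendor reachable from another. This plain-Python version is a stand-in to show the idea, not the API of any Neo4j library; the vendor IDs and edges are hypothetical.

```python
from collections import defaultdict

# Hypothetical CONNECTED_TO {shared_pan: true} edges between vendors.
CONNECTED_TO = [("V1", "V2"), ("V2", "V3"), ("V4", "V5")]


def vendor_clusters(edges, vendors):
    """Weakly connected components: treat edges as undirected and group reachable vendors."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, clusters = set(), []
    for start in vendors:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            component.add(v)
            stack.extend(adjacency[v] - seen)
        clusters.append(component)
    return clusters


vendors = ["V1", "V2", "V3", "V4", "V5", "V6"]
print([sorted(c) for c in vendor_clusters(CONNECTED_TO, vendors)])
# [['V1', 'V2', 'V3'], ['V4', 'V5'], ['V6']]
```

If V1 were a known bad actor, its cluster ({V1, V2, V3}) is the set of vendors worth a closer look.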

The Learning Curve Is Smaller Than You Think

The two things that trip people up:

  1. Thinking in graphs instead of tables

This is the real shift. You stop asking "what tables do I need?" and start asking "what entities exist, and how do they relate?"

A useful exercise: take any feature you're building and draw it on a whiteboard as nodes and arrows. If that drawing describes your data more naturally than a set of tables does, and your key questions are about following the arrows, a graph database probably fits.

  2. Cypher syntax

Cypher is Neo4j's query language, and it's genuinely readable once it clicks:

```cypher
// Find all products purchased by users who also purchased product X
MATCH (u:User)-[:PURCHASED]->(x:Product {id: 'X'})
MATCH (u)-[:PURCHASED]->(other:Product)
WHERE other.id <> 'X'
// count(DISTINCT u) so repeat purchases don't inflate the count
RETURN other.name, count(DISTINCT u) AS co_purchasers
ORDER BY co_purchasers DESC
```

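The arrows in the query, like -[:PURCHASED]->, literally represent edges in the graph: parentheses are nodes, square brackets are relationships, and the arrowhead gives the direction. Once you internalize that, the rest follows naturally.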
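The co-purchase query translates to a few lines of Python over hypothetical (user, product) pairs. Deduplicating the pairs before counting plays the role of counting distinct users, so a user who bought Y twice still counts once.

```python
from collections import Counter

# Hypothetical PURCHASED edges as (user, product) pairs; u1 bought Y twice.
PURCHASED = [
    ("u1", "X"), ("u1", "Y"), ("u1", "Y"),
    ("u2", "X"), ("u2", "Y"), ("u2", "Z"),
    ("u3", "Z"),
]


def co_purchases(purchases, product_id):
    """Count distinct users who bought product_id and also bought each other product."""
    buyers = {u for u, p in purchases if p == product_id}
    # A set of (user, product) pairs: each user counts at most once per product.
    pairs = {(u, p) for u, p in purchases if u in buyers and p != product_id}
    return Counter(p for _, p in pairs).most_common()


print(co_purchases(PURCHASED, "X"))  # [('Y', 2), ('Z', 1)]
```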

When to Stick With Postgres

Graph databases are not a default upgrade. Reach for them when you need them, not before.

Stick with Postgres if:

Your data is primarily tabular and your queries are primarily filters and aggregations

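You rely on complex multi-table write transactions, rich constraints, and mature SQL tooling (Neo4j is ACID-compliant as well, so the case here is maturity and ecosystem, not transactional capability)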

Your team knows SQL and the onboarding cost of a new query language isn't worth it

Relationships exist but aren't frequently traversed at depth

Consider a graph if:

You're writing recursive CTEs to answer basic product questions

The word "network," "graph," or "connections" appears in your product spec

Relationship depth and directionality carry semantic meaning

You need built-in graph algorithms — shortest path, centrality, community detection
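Shortest path is the easiest of these algorithms to see in miniature, and it answers the earlier fraud question of how close a vendor sits to a known fraudulent entity. For an unweighted graph, breadth-first search finds a path with the fewest hops. The node names below are hypothetical.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Unweighted shortest path by breadth-first search; returns the node list or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # target not reachable


# Hypothetical undirected links, listed in both directions.
links = {
    "vendor": ["shell_co", "bank"],
    "shell_co": ["vendor", "fraudster"],
    "bank": ["vendor", "device_9"],
    "device_9": ["bank", "fraudster"],
    "fraudster": ["shell_co", "device_9"],
}
print(shortest_path(links, "vendor", "fraudster"))  # ['vendor', 'shell_co', 'fraudster']
```

In Neo4j you would not hand-roll this: Cypher has a shortestPath() function for patterns like this one.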

The Bottom Line

Graph databases are a specialized tool. The mistake isn't using them — it's either reaching for them too early, or never reaching for them at all when the problem clearly calls for it.

The signal is simple:

If the interesting questions in your product are about how things connect, not just what things exist — you're modeling a graph. You might as well store one.

Neo4j has a free tier, excellent documentation, and a sandbox environment you can spin up in minutes. The next time you catch yourself writing a four-level JOIN or a recursive CTE to answer what feels like a simple question — it's worth 30 minutes to model the same problem as a graph and see what happens.

You might be surprised how much cleaner the answer gets.
