![]() |
VOOZH | about |
Designing nested documents correctly is critical when building a blogging application with MongoDB. In this guide, you will learn how to model posts, comments, and authors using embedded documents while keeping performance and scalability in mind.
Core Concepts Covered:
Before designing nested documents, you need to identify the core entities your app will work with.
Common entities in a blogging application include:
Here is what these entities look like as plain JavaScript objects in MongoDB on our app:
// Users (authors and readers)
const userEntity = {
_id: ObjectId(),
username: 'jane_doe',
email: 'jane@example.com',
displayName: 'Jane Doe',
createdAt: new Date(),
updatedAt: new Date(),
};
----
// Blog posts
const postEntity = {
_id: ObjectId(),
title: 'My First Post',
slug: 'my-first-post',
content: '...',
authorId: ObjectId(), // or embedded author snapshot
tags: ['mongodb', 'blogging'],
status: 'published',
createdAt: new Date(),
updatedAt: new Date(),
};
---
// Comments
const commentEntity = {
_id: ObjectId(),
postId: ObjectId(),
authorId: ObjectId(),
parentCommentId: null, // for threaded replies
text: 'Great post!',
createdAt: new Date(),
updatedAt: new Date(),
};
The key insight here is that identifying entities is only the first step. In MongoDB, access patterns matter more than normalization. The question is not "how do I eliminate redundancy?" but "how does my app read and write this data?"
For Example: If your app always loads the post author's name alongside every post, embedding a small author snapshot inside the post avoids a second database query on every page load.
This is the one of the most common questions developers ask when starting with MongoDB schema design. MongoDB's official data modeling guide also covers how to choose between embedding and referencing.
The table below summarizes when to choose each approach:
| Criteria | Embed | Reference |
|---|---|---|
| Read together | Always | Rarely |
| Array growth | Bounded | Unbounded |
| Data lifecycle | Same as parent | Independent |
| Update frequency | Low | High |
| Data size | Small | Large |
Choosing the right approach upfront saves you from painful schema migrations later.
Start with a single posts collection. Each document represents one blog post.
Here is a well-structured blog post document:
const post = {
_id: new ObjectId(),
title: 'How to Design Nested Documents',
content: 'Designing nested documents correctly is critical...',
author: {
_id: new ObjectId(),
displayName: 'Jane Doe',
slug: 'jane-doe',
},
tags: ['mongodb', 'schema', 'blogging'],
createdAt: new Date(),
updatedAt: new Date(),
};
Here is why each field is designed this way:
Note that the author object here is a snapshot, not the full user document. It contains only the fields needed for displaying a post, displayName and slug. The full user profile (email, bio, preferences) lives in the users collection and is referenced by author._id.
The simplest approach is to embed comments directly inside the post document as an array:
const postWithEmbeddedComments = {
_id: new ObjectId(),
title: 'Post with Embedded Comments',
content: '...',
author: { _id: new ObjectId(), displayName: 'Jane', slug: 'jane' },
tags: ['example'],
comments: [
{ _id: new ObjectId(), author: 'Alice', text: 'Nice!', createdAt: new Date() },
{ _id: new ObjectId(), author: 'Bob', text: 'Thanks.', createdAt: new Date() },
],
createdAt: new Date(),
updatedAt: new Date(),
};
This works well for a prototype or for a blog where each post will never receive many comments. Fetching the post and its comments is a single read operation.
But this approach has serious limitations:
Adding a comment looks like this:
await coll.updateOne(
{ _id: postId },
{
$push: {
comments: {
_id: new ObjectId(),
author: 'Charlie',
text: 'Third comment.',
createdAt: new Date(),
},
},
$set: { updatedAt: new Date() },
}
);
This is fine when the array is small. When the comments array contains thousands of entries, this operation becomes expensive on every single comment submission.
Once you accept that comments can grow without a predictable ceiling, you have two options.
Move comments out of the post document entirely. Each comment becomes its own document in a dedicated comments collection, linked to its post by a postId field.
// Post document — no embedded comments
await postsColl.insertOne({
_id: postId,
title: 'Post with Separate Comments',
content: '...',
author: { _id: new ObjectId(), displayName: 'Jane', slug: 'jane' },
tags: [],
createdAt: new Date(),
updatedAt: new Date(),
});
// Comment documents — stored separately
await commentsColl.insertOne({
_id: new ObjectId(),
postId, // reference to the parent post
authorId: new ObjectId(),
authorDisplayName: 'Alice',
text: 'Great post!',
createdAt: new Date(),
updatedAt: new Date(),
});
This structure scales without limit. Adding a comment is a cheap single-document insert rather than a full post rewrite.
Indexing is critical here. Create a compound index on postId and _id to support cursor-based pagination:
await commentsColl.createIndex({ postId: 1, _id: 1 });
Use cursor-based pagination instead of skip/limit. With skip, MongoDB must count through all preceding documents on every request, this gets slower as the dataset grows. With cursor-based pagination, you pass the _id of the last document you received, and the next query picks up exactly from that point.
Fetch the first page:
const firstPage = await commentsColl
.find({ postId })
.sort({ _id: 1 })
.limit(20)
.toArray();
const lastId = firstPage.at(-1)?._id;
Fetch the next page by passing lastId as the cursor:
const nextPage = await commentsColl
.find({ postId, _id: { $gt: lastId } })
.sort({ _id: 1 })
.limit(20)
.toArray();
Each query goes straight to the right position in the index. Performance stays constant regardless of how deep into the comment history the reader is.
The hybrid approach gives you the best of both worlds: fast post reads and scalable comment storage.
The idea is simple: embed only the most recent comments directly inside the post document, and store all comments (including those recent ones) in the separate comments collection.
const RECENT_COMMENTS_LIMIT = 3;
// Keep the 3 most recent comments embedded in the post
await postsColl.updateOne(
{ _id: postId },
{ $set: { recentComments: recentThree, updatedAt: new Date() } }
);
When a user loads a post, the recentComments array is already in the document, no extra query needed. When a user wants to see older comments, your app queries the comments collection with pagination.
When to choose the hybrid approach:
For most applications, Option 1 (fully separate collection) is simpler and easier to maintain.
Threaded comments, replies to comments, introduce another layer of complexity.
The first approach many people think of is to embed replies inside each comment:
// BAD: recursive embedding
{
_id: ObjectId(),
text: 'First comment',
replies: [
{
text: 'Reply to first',
replies: [
{ text: 'Reply to reply', replies: [ /* ... */ ] }
]
}
]
}
This approach breaks down quickly:
The recommended approach: use a parentCommentId field.
Store all comments, top-level and replies, as flat documents in the comments collection. Each comment stores a reference to its parent comment (or null if it is top-level). This is essentially the parent references pattern described in MongoDB's tree modeling documentation.
await commentsColl.insertMany([
{
_id: comment1Id,
postId,
parentCommentId: null, // top-level comment
authorDisplayName: 'Alice',
text: 'First comment',
createdAt: new Date(),
},
{
_id: new ObjectId(),
postId,
parentCommentId: comment1Id, // reply to Alice's comment
authorDisplayName: 'Bob',
text: 'Reply to Alice',
createdAt: new Date(),
},
]);
Create two indexes to support the most common queries:
await commentsColl.createIndex({ postId: 1, _id: 1 });
await commentsColl.createIndex({ parentCommentId: 1 });
Fetching top-level comments and their replies becomes straightforward:
// All top-level comments for a post
const topLevel = await commentsColl
.find({ postId, parentCommentId: null })
.sort({ createdAt: 1 })
.toArray();
// Replies to a specific comment
const replies = await commentsColl
.find({ parentCommentId: comment1Id })
.toArray();
This flat structure is flexible, performant, and avoids all the pitfalls of recursive embedding.
Once your schema is in place, a few additional practices keep it performant at scale.
Create indexes that match your most frequent queries. MongoDB's indexes documentation explains how different index types affect read and write performance:
// Posts: list by date, filter by author, filter by tag
await posts.createIndex({ createdAt: -1 });
await posts.createIndex({ 'author.slug': 1 });
await posts.createIndex({ tags: 1 });
// Comments: cursor-based pagination per post, replies lookup
await comments.createIndex({ postId: 1, _id: 1 });
await comments.createIndex({ parentCommentId: 1 });
Every query against a collection without a matching index triggers a full collection scan. At scale, this is unacceptable.
When building a list of posts (e.g., a homepage), you do not need the full content field.
Use projections to fetch only what you need:
const listView = await postsColl
.find({})
.project({ title: 1, 'author.displayName': 1, createdAt: 1, tags: 1 })
.sort({ createdAt: -1 })
.limit(10)
.toArray();
This reduces the amount of data transferred from the database to your application on every request.
Sort by _id, take the _id of the last document on the current page, and pass it as the starting point for the next request:
// First page
const firstPage = await commentsColl
.find({ postId })
.sort({ _id: 1 })
.limit(20)
.project({ text: 1, authorDisplayName: 1, createdAt: 1 })
.toArray();
const lastId = firstPage.at(-1)?._id;
// Next page — no skip, no counting
const nextPage = await commentsColl
.find({ postId, _id: { $gt: lastId } })
.sort({ _id: 1 })
.limit(20)
.project({ text: 1, authorDisplayName: 1, createdAt: 1 })
.toArray();
This works efficiently at any depth because the (postId, _id) compound index lets MongoDB jump directly to the cursor position.
MongoDB has supported multi-document ACID transactions since version 4.0. They let you update multiple documents or collections atomically, either all changes succeed, or none of them are applied. You can read more in the MongoDB transactions documentation.
For a blogging application, you might use a transaction when a feature must update several collections while preserving a strict invariant. Examples include incrementing a user's postsCount while inserting a new post, applying monetization or billing logic tied to publishing, or updating denormalized counters across posts and users together.
However, most operations in a well-designed schema only touch a single document at a time, for example, inserting a comment, editing a post, or updating a user's profile. Single-document writes are already atomic in MongoDB and avoid the overhead of full transactions. Design your schema so that transactions are reserved for rare cross-document invariants instead of everyday writes.
Best practices summary:
Avoid these pitfalls when building your schema.
It is tempting to put all related data inside one document. But embedding a full user profile, all their posts, and all comments creates a document that is enormous and expensive to maintain.
// BAD: embedding full user profile and all comments in every post
const overEmbeddedPost = {
_id: 1,
title: '...',
author: { _id: 1, name: '...', email: '...', bio: '...', avatarUrl: '...', preferences: {} },
comments: [
{ _id: 1, author: { fullProfile: '...' }, text: '...', replies: [/* unbounded */] },
// ... thousands of comments
],
};
MongoDB documents cannot exceed 16MB. An unbounded comments array will eventually hit this limit, or cause performance issues long before it does.
Querying by createdAt or postId without a matching index causes a full collection scan. Always create indexes before your collections grow large.
MongoDB is not a relational database. Normalizing everything into small collections and joining them in your application code misses what makes MongoDB powerful.
Starting with embedded comments is fine for a prototype. But if you do not plan for the transition to a separate collection before your app gets real traffic, you face a painful migration later.
Pushing to a large comments array on every new comment rewrites the entire post document. When the array is large, this becomes a significant write bottleneck.
You have now walked through every design decision, from identifying entities to handling unbounded comment growth and threading replies. Let's assemble everything into one cohesive schema you can use as a starting point for your own blogging application.
Each document represents one blog post. The author is an embedded snapshot.
{
_id: ObjectId(),
title: 'How to Design Nested Documents',
slug: 'how-to-design-nested-documents',
content: 'Full article content...',
author: {
_id: ObjectId(), // reference to users collection
displayName: 'Jane Doe',
slug: 'jane-doe',
},
tags: ['mongodb', 'schema', 'blogging'],
createdAt: ISODate(),
updatedAt: ISODate(),
}
Indexes: createdAt descending (list by date), author.slug (posts by author), tags (filter by tag).
Each document is a full user profile. Posts embed a snapshot; the full record lives here.
{
_id: ObjectId(),
username: 'jane_doe',
email: 'jane@example.com',
displayName: 'Jane Doe',
slug: 'jane-doe',
createdAt: ISODate(),
updatedAt: ISODate(),
}
Indexes: slug unique (profile URLs), email unique (authentication).
Each document is one comment or reply. Top-level comments have parentCommentId: null. Replies reference their parent.
{
_id: ObjectId(),
postId: ObjectId(), // reference to posts collection
authorId: ObjectId(), // reference to users collection
authorDisplayName: 'Alice', // embedded snapshot for display
parentCommentId: null, // null = top-level; ObjectId = reply
text: 'Great article!',
createdAt: ISODate(),
updatedAt: ISODate(),
}
Indexes: (postId, _id) ascending (cursor-based pagination per post), parentCommentId (fetch replies).
This design makes every common operation efficient with a single targeted query.
Create all indexes when your application starts or during a setup script:
async function createIndexes(db) {
const posts = db.collection('posts');
const users = db.collection('users');
const comments = db.collection('comments');
await posts.createIndex({ createdAt: -1 });
await posts.createIndex({ 'author.slug': 1 });
await users.createIndex({ slug: 1 }, { unique: true });
await posts.createIndex({ tags: 1 });
await users.createIndex({ slug: 1 }, { unique: true });
await users.createIndex({ email: 1 }, { unique: true });
await comments.createIndex({ postId: 1, _id: -1 });
await comments.createIndex({ parentCommentId: 1 });
}
MongoDB gives you the freedom to model data in a way that fits how your application actually works.
The temptation is to either embed everything (too much data in one document) or reference everything (too many queries). The right answer is almost always in between, embed what is small, bounded, and always read together; reference what grows, changes independently, or is large.
For a blogging application specifically: embed a small author snapshot in each post, but store comments in a separate collection. Use parentCommentId for threaded replies instead of recursive nesting. Index aggressively, paginate comments, and use projections to avoid over-fetching.
These decisions compound over time. A well-designed schema from the start means fewer migrations, lower infrastructure costs, and faster queries as your application grows.
displayName, slug) in posts; keep larger or frequently updated data in a separate users collection._id) for better performance.