URL: https://www.geeksforgeeks.org/mongodb/design-nested-documents-for-blogging-app-mongodb/

How to Design Nested Documents for a Blogging App in MongoDB?

Last Updated : 2 Apr, 2026

Designing nested documents correctly is critical when building a blogging application with MongoDB. In this guide, you will learn how to model posts, comments, and authors using embedded documents while keeping performance and scalability in mind.

Core Concepts Covered:

  • How to decide between embedding and referencing in a blogging app
  • How to design nested documents for posts and comments
  • How to avoid common schema design mistakes
  • How to optimize nested structures for performance
  • How to future-proof your schema for scale

Understanding Blogging Schema Design in MongoDB

  • Blogging platforms seem simple at first.
  • You have posts. Posts have authors. Posts have comments. Comments may have replies.
  • But schema design decisions made early can heavily impact performance, scalability, and flexibility later. In a relational database, you would normalize everything into separate tables and join them at query time.
  • MongoDB works differently: you model data around how your application reads and writes it, not around eliminating redundancy.
  • In this tutorial, you will design a blogging app schema step by step using MongoDB document modeling best practices.

What Entities Exist in a Blogging Application?

Before designing nested documents, you need to identify the core entities your app will work with.

Common entities in a blogging application include:

  • Users: Authors who write posts and readers who comment
  • Blog Posts: The main content unit
  • Comments: Responses to posts, possibly threaded
  • Tags: Labels that categorize posts
  • Reactions: Likes or shares on posts or comments

Here is what these entities look like as plain JavaScript objects, ready to be stored as MongoDB documents:

// Assumes the official Node.js driver: const { ObjectId } = require('mongodb');

// Users (authors and readers)
const userEntity = {
  _id: new ObjectId(),
  username: 'jane_doe',
  email: 'jane@example.com',
  displayName: 'Jane Doe',
  createdAt: new Date(),
  updatedAt: new Date(),
};

// Blog posts
const postEntity = {
  _id: new ObjectId(),
  title: 'My First Post',
  slug: 'my-first-post',
  content: '...',
  authorId: new ObjectId(), // or embedded author snapshot
  tags: ['mongodb', 'blogging'],
  status: 'published',
  createdAt: new Date(),
  updatedAt: new Date(),
};

// Comments
const commentEntity = {
  _id: new ObjectId(),
  postId: new ObjectId(),
  authorId: new ObjectId(),
  parentCommentId: null, // for threaded replies
  text: 'Great post!',
  createdAt: new Date(),
  updatedAt: new Date(),
};

The key insight here is that identifying entities is only the first step. In MongoDB, access patterns matter more than normalization. The question is not "how do I eliminate redundancy?" but "how does my app read and write this data?"

For example, if your app always loads the post author's name alongside every post, embedding a small author snapshot inside the post avoids a second database query on every page load.

When should you Embed Documents in MongoDB?

This is one of the most common questions developers ask when starting with MongoDB schema design. MongoDB's official data modeling guide also covers how to choose between embedding and referencing.

Embed Documents When:

  • The data is always read together (e.g., a post and its author's display name)
  • The relationship is one-to-few (a post has a handful of tags)
  • The array is small and bounded (it will not grow indefinitely)
  • The embedded data shares the same lifecycle as the parent document

Use References Instead When:

  • The related data grows without bounds (comments on a popular post)
  • The related data is updated independently (a user updates their profile)
  • The related data is large and not always needed (a full user biography)
  • Multiple documents reference the same data (many posts share one author)

The table below summarizes when to choose each approach:

Criteria         | Embed          | Reference
Read together    | Always         | Rarely
Array growth     | Bounded        | Unbounded
Data lifecycle   | Same as parent | Independent
Update frequency | Low            | High
Data size        | Small          | Large

Choosing the right approach upfront saves you from painful schema migrations later.
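
The criteria above can be written down as a rough rule of thumb. The sketch below is illustrative only: suggestModel and its flag names are hypothetical, and a real decision would also weigh query patterns that a handful of booleans cannot capture.

```javascript
// Hypothetical helper: encodes the embed-vs-reference criteria as a rough heuristic.
function suggestModel({ readTogether, bounded, sharedLifecycle, frequentlyUpdated, large }) {
  // Unbounded growth, frequent independent updates, or large size point to a reference.
  if (!bounded || frequentlyUpdated || large) return 'reference';
  // Small, bounded data that is read with the parent and shares its lifecycle can be embedded.
  if (readTogether && sharedLifecycle) return 'embed';
  // Mixed signals: fall back to the more flexible option.
  return 'reference';
}

// Tags on a post: small, bounded, read with the post.
console.log(suggestModel({
  readTogether: true, bounded: true, sharedLifecycle: true, frequentlyUpdated: false, large: false,
})); // embed

// Comments on a popular post: the array can grow without bounds.
console.log(suggestModel({
  readTogether: true, bounded: false, sharedLifecycle: true, frequentlyUpdated: false, large: false,
})); // reference
```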

How should a Blog Post Document be Structured?

Start with a single posts collection. Each document represents one blog post.

Here is a well-structured blog post document:

const post = {
  _id: new ObjectId(),
  title: 'How to Design Nested Documents',
  content: 'Designing nested documents correctly is critical...',
  author: {
    _id: new ObjectId(),
    displayName: 'Jane Doe',
    slug: 'jane-doe',
  },
  tags: ['mongodb', 'schema', 'blogging'],
  createdAt: new Date(),
  updatedAt: new Date(),
};

Here is why each field is designed this way:

  • Author is Embedded: The author's display name and slug are small, read with every post, and rarely change. Embedding them avoids a second query to the users collection on every post load.
  • Tags is an Array: Tags are bounded (a post rarely has more than ten), filtered and sorted together, and a good candidate for a multikey index.
  • createdAt and updatedAt: Timestamps enable sorting posts by newest first, range queries for archives, and future archival logic.

Note that the author object here is a snapshot, not the full user document. It contains only the fields needed to display a post: displayName and slug. The full user profile (email, bio, preferences) lives in the users collection and is referenced by author._id.
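
A minimal sketch of how such a snapshot might be produced and later refreshed. Both helper names are hypothetical, and plain strings stand in for ObjectId values so the example runs without the driver. The second helper only builds the filter and update you could pass to updateMany on the posts collection when a user changes their display name or slug.

```javascript
// Hypothetical helper: copy only the display fields a post needs from a full user document.
function toAuthorSnapshot(user) {
  return { _id: user._id, displayName: user.displayName, slug: user.slug };
}

// Hypothetical helper: build the filter and update that refresh every embedded
// snapshot of this user, e.g. posts.updateMany(refresh.filter, refresh.update).
function buildSnapshotRefresh(user) {
  return {
    filter: { 'author._id': user._id },
    update: { $set: { 'author.displayName': user.displayName, 'author.slug': user.slug } },
  };
}

// Strings stand in for ObjectId values in this sketch.
const sampleUser = {
  _id: 'u1',
  username: 'jane_doe',
  email: 'jane@example.com',
  displayName: 'Jane Doe',
  slug: 'jane-doe',
};

console.log(toAuthorSnapshot(sampleUser));
console.log(buildSnapshotRefresh(sampleUser));
```

The snapshot deliberately leaves out email and other private or bulky fields, which keeps each post small and avoids leaking profile data into the posts collection.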

Should Comments be Embedded Inside Blog Posts?

The simplest approach is to embed comments directly inside the post document as an array:

const postWithEmbeddedComments = {
  _id: new ObjectId(),
  title: 'Post with Embedded Comments',
  content: '...',
  author: { _id: new ObjectId(), displayName: 'Jane', slug: 'jane' },
  tags: ['example'],
  comments: [
    { _id: new ObjectId(), author: 'Alice', text: 'Nice!', createdAt: new Date() },
    { _id: new ObjectId(), author: 'Bob', text: 'Thanks.', createdAt: new Date() },
  ],
  createdAt: new Date(),
  updatedAt: new Date(),
};

This works well for a prototype or for a blog where each post will never receive many comments. Fetching the post and its comments is a single read operation.

But this approach has serious limitations:

  • Comments can grow unbounded. A popular post can receive thousands of comments over time.
  • MongoDB documents are limited to 16MB. An array of thousands of comment objects can approach or exceed this limit.
  • Large arrays hurt write performance. Every time a new comment is added with $push, MongoDB may need to rewrite the entire document on disk.
  • Updating a nested array rewrites the whole document. Even a small change, such as editing one comment's text, triggers a full document write.
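
To get a feel for the size limit, the arithmetic can be sketched as follows. This assumes the limit is 16 × 1024 × 1024 bytes and uses JSON length as a crude stand-in for BSON size, so treat the numbers as ballpark figures only. The helper names and the 1 KB average are assumptions made for illustration.

```javascript
// Assumed limit: 16 MB expressed as 16 * 1024 * 1024 bytes.
const MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Rough estimate of how many embedded comments fit before the limit is reached.
// avgCommentBytes and baseDocumentBytes are assumed averages, not measured values.
function estimateMaxEmbeddedComments(avgCommentBytes, baseDocumentBytes = 0) {
  return Math.floor((MAX_BSON_DOCUMENT_BYTES - baseDocumentBytes) / avgCommentBytes);
}

// JSON length is only an approximation; the real BSON encoding differs.
function approximateBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), 'utf8');
}

console.log(approximateBytes({ author: 'Alice', text: 'Nice!' }));
console.log(estimateMaxEmbeddedComments(1024)); // 16384
```

Under these assumptions, a post whose comments average 1 KB each runs out of room after roughly sixteen thousand comments, and write costs become noticeable well before that point.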

Adding a comment looks like this:

await coll.updateOne(
  { _id: postId },
  {
    $push: {
      comments: {
        _id: new ObjectId(),
        author: 'Charlie',
        text: 'Third comment.',
        createdAt: new Date(),
      },
    },
    $set: { updatedAt: new Date() },
  }
);

This is fine when the array is small. When the comments array contains thousands of entries, this operation becomes expensive on every single comment submission.

How do you Handle Unbounded Comment Growth?

Once you accept that comments can grow without a predictable ceiling, you have two options.

Option 1: Store Comments in a Separate Collection

Move comments out of the post document entirely. Each comment becomes its own document in a dedicated comments collection, linked to its post by a postId field.

// Post document: no embedded comments
await postsColl.insertOne({
  _id: postId,
  title: 'Post with Separate Comments',
  content: '...',
  author: { _id: new ObjectId(), displayName: 'Jane', slug: 'jane' },
  tags: [],
  createdAt: new Date(),
  updatedAt: new Date(),
});

// Comment documents: stored separately
await commentsColl.insertOne({
  _id: new ObjectId(),
  postId, // reference to the parent post
  authorId: new ObjectId(),
  authorDisplayName: 'Alice',
  text: 'Great post!',
  createdAt: new Date(),
  updatedAt: new Date(),
});

This structure keeps every document small no matter how many comments a post receives. Adding a comment is a cheap single-document insert rather than a full post rewrite.

Indexing is critical here. Create a compound index on postId and _id to support cursor-based pagination:

await commentsColl.createIndex({ postId: 1, _id: 1 });

Use cursor-based pagination instead of skip/limit. With skip, MongoDB must walk past all preceding documents on every request, which gets slower as the dataset grows. With cursor-based pagination, you pass the _id of the last document you received, and the next query picks up exactly from that point.

Fetch the first page:

const firstPage = await commentsColl
  .find({ postId })
  .sort({ _id: 1 })
  .limit(20)
  .toArray();

const lastId = firstPage.at(-1)?._id;

Fetch the next page by passing lastId as the cursor:

const nextPage = await commentsColl
  .find({ postId, _id: { $gt: lastId } })
  .sort({ _id: 1 })
  .limit(20)
  .toArray();

Each query goes straight to the right position in the index. Performance stays constant regardless of how deep into the comment history the reader is.
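
The two queries differ only in their filter, so one small helper can serve both. The sketch below is a hypothetical convenience, not part of the driver. It also avoids building a $gt condition from an undefined value when there is no previous page.

```javascript
// Hypothetical helper: build the find() filter for one page of comments.
// lastId is the _id of the last comment already shown, or undefined/null for the first page.
function buildCommentPageFilter(postId, lastId) {
  const filter = { postId };
  if (lastId !== undefined && lastId !== null) {
    // Only comments that sort after the cursor position.
    filter._id = { $gt: lastId };
  }
  return filter;
}

// Strings stand in for ObjectId values in this sketch.
console.log(buildCommentPageFilter('post1'));
console.log(buildCommentPageFilter('post1', 'c20'));

// Intended use with a real collection (not executed here):
// commentsColl.find(buildCommentPageFilter(postId, lastId)).sort({ _id: 1 }).limit(20)
```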

Option 2: Use a Hybrid Approach

The hybrid approach gives you the best of both worlds: fast post reads and scalable comment storage.

The idea is simple: embed only the most recent comments directly inside the post document, and store all comments (including those recent ones) in the separate comments collection.

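const RECENT_COMMENTS_LIMIT = 3;
// Read the newest comments from the comments collection (newest first)
const recentComments = await commentsColl
  .find({ postId })
  .sort({ _id: -1 })
  .limit(RECENT_COMMENTS_LIMIT)
  .toArray();
// Keep only those most recent comments embedded in the post
await postsColl.updateOne(
  { _id: postId },
  { $set: { recentComments, updatedAt: new Date() } }
);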

When a user loads a post, the recentComments array is already in the document, so no extra query is needed. When a user wants to see older comments, your app queries the comments collection with pagination.
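
One way to keep recentComments capped without re-reading the comments collection is to combine $push with the $each, $sort, and $slice modifiers, so the server trims the array in the same update. The helper below is a hypothetical sketch that only builds the update document; it assumes each embedded comment carries a createdAt field.

```javascript
const RECENT_LIMIT = 3; // assumed preview size

// Hypothetical helper: build an update that adds one comment to recentComments
// and keeps only the newest `limit` entries.
function buildRecentCommentsUpdate(comment, limit = RECENT_LIMIT) {
  return {
    $push: {
      recentComments: {
        $each: [comment],          // append the new comment
        $sort: { createdAt: -1 },  // newest first
        $slice: limit,             // keep the first `limit` elements after sorting
      },
    },
    $set: { updatedAt: new Date() },
  };
}

const update = buildRecentCommentsUpdate({
  _id: 'c4', // a string stands in for an ObjectId in this sketch
  authorDisplayName: 'Dana',
  text: 'Fourth comment.',
  createdAt: new Date(),
});
console.log(JSON.stringify(update, null, 2));

// Intended use with a real collection (not executed here):
// await postsColl.updateOne({ _id: postId }, update);
```

The full comment should still be inserted into the comments collection; the embedded array is only a preview.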

When to choose the hybrid approach:

  • Your app prominently displays a comment preview on the post page
  • You want to minimize queries for the common case (loading a post)
  • You accept the added complexity of keeping recentComments in sync

For most applications, Option 1 (fully separate collection) is simpler and easier to maintain.

How do Nested Replies Affect Schema Design?

Threaded comments (replies to comments) introduce another layer of complexity.

The first approach many people think of is to embed replies inside each comment:

// BAD: recursive embedding
{
  _id: ObjectId(),
  text: 'First comment',
  replies: [
    {
      text: 'Reply to first',
      replies: [
        { text: 'Reply to reply', replies: [ /* ... */ ] }
      ]
    }
  ]
}

This approach breaks down quickly:

  • Deeply nested arrays are hard to query and update. MongoDB's update operators become complex when targeting elements several levels deep.
  • There is no natural depth limit. A discussion thread can grow arbitrarily deep.
  • Fetching only top-level comments requires unwinding the entire structure.

The recommended approach: use a parentCommentId field.

Store all comments, top-level and replies, as flat documents in the comments collection. Each comment stores a reference to its parent comment (or null if it is top-level). This is essentially the parent references pattern described in MongoDB's tree modeling documentation.

await commentsColl.insertMany([
  {
    _id: comment1Id,
    postId,
    parentCommentId: null, // top-level comment
    authorDisplayName: 'Alice',
    text: 'First comment',
    createdAt: new Date(),
  },
  {
    _id: new ObjectId(),
    postId,
    parentCommentId: comment1Id, // reply to Alice's comment
    authorDisplayName: 'Bob',
    text: 'Reply to Alice',
    createdAt: new Date(),
  },
]);

Create two indexes to support the most common queries:

await commentsColl.createIndex({ postId: 1, _id: 1 });

await commentsColl.createIndex({ parentCommentId: 1 });

Fetching top-level comments and their replies becomes straightforward:

// All top-level comments for a post
const topLevel = await commentsColl
  .find({ postId, parentCommentId: null })
  .sort({ createdAt: 1 })
  .toArray();

// Replies to a specific comment
const replies = await commentsColl
  .find({ parentCommentId: comment1Id })
  .toArray();

This flat structure is flexible, performant, and avoids all the pitfalls of recursive embedding.
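
If the UI needs a nested thread, the tree can be rebuilt in application code from the flat documents. The sketch below is one possible approach: buildCommentTree is a hypothetical helper, and ids are compared via String() so that it works with ObjectId values as well as the plain strings used here.

```javascript
// Hypothetical helper: turn flat comment documents into a nested reply tree.
function buildCommentTree(comments) {
  const byId = new Map();
  for (const c of comments) {
    // Copy each comment and give it an empty replies array.
    byId.set(String(c._id), { ...c, replies: [] });
  }
  const roots = [];
  for (const node of byId.values()) {
    const parent =
      node.parentCommentId == null ? undefined : byId.get(String(node.parentCommentId));
    if (parent) {
      parent.replies.push(node);
    } else {
      // Top-level comment, or a reply whose parent is not in this batch.
      roots.push(node);
    }
  }
  return roots;
}

const tree = buildCommentTree([
  { _id: 'c1', parentCommentId: null, text: 'First comment' },
  { _id: 'c2', parentCommentId: 'c1', text: 'Reply to Alice' },
  { _id: 'c3', parentCommentId: 'c2', text: 'Reply to reply' },
]);
console.log(JSON.stringify(tree, null, 2));
```

Nesting happens only at render time, so the stored documents stay flat and easy to index.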

How do you Optimize Nested Documents for Performance?

Once your schema is in place, a few additional practices keep it performant at scale.

Index Strategically

Create indexes that match your most frequent queries. MongoDB's indexes documentation explains how different index types affect read and write performance:

// Posts: list by date, filter by author, filter by tag
await posts.createIndex({ createdAt: -1 });
await posts.createIndex({ 'author.slug': 1 });
await posts.createIndex({ tags: 1 });


// Comments: cursor-based pagination per post, replies lookup
await comments.createIndex({ postId: 1, _id: 1 });
await comments.createIndex({ parentCommentId: 1 });

Every query against a collection without a matching index triggers a full collection scan. At scale, this is unacceptable.
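
A quick way to check whether a query falls back to a collection scan is to inspect its explain() output for a COLLSCAN stage. The helper below is a hypothetical sketch that walks a winning plan. The exact shape of explain output varies between server versions, so the fields it reads (stage, inputStage, inputStages, queryPlan) are assumptions about the common layout.

```javascript
// Hypothetical helper: report whether any stage in an explain() plan is a collection scan.
function usesCollectionScan(plan) {
  if (!plan || typeof plan !== 'object') return false;
  if (plan.stage === 'COLLSCAN') return true;
  // Child stages may appear under different keys depending on the plan type.
  const children = [plan.queryPlan, plan.inputStage, ...(plan.inputStages || [])];
  return children.some((child) => usesCollectionScan(child));
}

// Hand-written sample plans, not real server output.
const indexedPlan = { stage: 'FETCH', inputStage: { stage: 'IXSCAN', indexName: 'tags_1' } };
const scanPlan = { stage: 'SORT', inputStage: { stage: 'COLLSCAN' } };

console.log(usesCollectionScan(indexedPlan)); // false
console.log(usesCollectionScan(scanPlan)); // true

// Intended use with a real collection (not executed here):
// const result = await posts.find({ tags: 'mongodb' }).explain('queryPlanner');
// usesCollectionScan(result.queryPlanner.winningPlan);
```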

Use Projections to Limit Returned Fields

When building a list of posts (e.g., a homepage), you do not need the full content field.

Use projections to fetch only what you need:

const listView = await postsColl
  .find({})
  .project({ title: 1, 'author.displayName': 1, createdAt: 1, tags: 1 })
  .sort({ createdAt: -1 })
  .limit(10)
  .toArray();

This reduces the amount of data transferred from the database to your application on every request.

Paginate Comments with a Cursor

  • Never load all comments for a post in a single query, and avoid skip/limit for deep pagination. skip forces MongoDB to scan and discard all preceding documents on each request, so later pages become progressively slower as the comment count grows.
  • Use cursor-based pagination on the _id field instead. A default ObjectId begins with a 4-byte creation timestamp, so sorting by _id returns comments in approximately the order they were created (see the ObjectId section in the BSON types documentation for details).

Sort by _id, take the _id of the last document on the current page, and pass it as the starting point for the next request:

// First page
const firstPage = await commentsColl
  .find({ postId })
  .sort({ _id: 1 })
  .limit(20)
  .project({ text: 1, authorDisplayName: 1, createdAt: 1 })
  .toArray();

const lastId = firstPage.at(-1)?._id;

// Next page: no skip, no counting
const nextPage = await commentsColl
  .find({ postId, _id: { $gt: lastId } })
  .sort({ _id: 1 })
  .limit(20)
  .project({ text: 1, authorDisplayName: 1, createdAt: 1 })
  .toArray();

This works efficiently at any depth because the (postId, _id) compound index lets MongoDB jump directly to the cursor position.
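
A common companion trick is to fetch one document more than the page size, which tells the client whether another page exists without a separate count query. The helper below is a hypothetical sketch of that idea; it assumes the caller queried with limit(pageSize + 1) and sorted by _id ascending.

```javascript
// Hypothetical helper: turn up to pageSize + 1 fetched comments into a page plus a cursor.
function toPage(docs, pageSize) {
  const hasMore = docs.length > pageSize;
  // Drop the extra look-ahead document if it is present.
  const items = hasMore ? docs.slice(0, pageSize) : docs;
  // The cursor is the _id of the last comment actually returned.
  const nextCursor = hasMore ? items[items.length - 1]._id : null;
  return { items, nextCursor, hasMore };
}

// Strings stand in for ObjectId values; three documents were fetched for a page size of two.
const page = toPage([{ _id: 'c1' }, { _id: 'c2' }, { _id: 'c3' }], 2);
console.log(page.items.length, page.nextCursor, page.hasMore); // 2 c2 true
```

The client sends nextCursor back with its next request, and the server uses it as the $gt bound on _id.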

When do you Need Multi-Document Transactions?

MongoDB has supported multi-document ACID transactions on replica sets since version 4.0 and on sharded clusters since version 4.2. They let you update multiple documents or collections atomically: either all changes succeed, or none of them are applied. You can read more in the MongoDB transactions documentation.

For a blogging application, you might use a transaction when a feature must update several collections while preserving a strict invariant. Examples include incrementing a user's postsCount while inserting a new post, applying monetization or billing logic tied to publishing, or updating denormalized counters across posts and users together.

However, most operations in a well-designed schema touch only a single document at a time: inserting a comment, editing a post, or updating a user's profile. Single-document writes are already atomic in MongoDB and avoid the overhead of full transactions. Design your schema so that transactions are reserved for rare cross-document invariants instead of everyday writes.
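
For the rare case that does need a transaction, such as inserting a post while incrementing the author's postsCount, a minimal sketch with the Node.js driver's session API might look like this. The function name, the dbName parameter, and the collection names are assumptions, and the client is expected to be an already connected MongoClient on a deployment that supports transactions.

```javascript
// Hypothetical helper: insert a post and bump the author's postsCount atomically.
// `client` is assumed to be a connected MongoClient.
async function createPostAndCount(client, dbName, post) {
  const db = client.db(dbName);
  const session = client.startSession();
  try {
    await session.withTransaction(async () => {
      // Both writes carry the session, so they commit or abort together.
      await db.collection('posts').insertOne(post, { session });
      await db.collection('users').updateOne(
        { _id: post.author._id },
        { $inc: { postsCount: 1 } },
        { session }
      );
    });
  } finally {
    // Always release the session, even if the transaction failed.
    await session.endSession();
  }
}
```

If either write throws, the callback rejects and neither change is committed.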

Best practices summary:

  • Keep documents focused on one entity
  • Avoid unlimited array growth inside documents
  • Model based on access patterns, not relational habits
  • Design for your most common queries first
  • Add indexes before your data grows large

What are Common Mistakes when Designing Nested Schemas?

Avoid these pitfalls when building your schema.

1. Over-Embedding Everything

It is tempting to put all related data inside one document. But embedding a full user profile, all their posts, and all comments creates a document that is enormous and expensive to maintain.

// BAD: embedding full user profile and all comments in every post
const overEmbeddedPost = {
  _id: 1,
  title: '...',
  author: { _id: 1, name: '...', email: '...', bio: '...', avatarUrl: '...', preferences: {} },
  comments: [
    { _id: 1, author: { fullProfile: '...' }, text: '...', replies: [/* unbounded */] },
    // ... thousands of comments
  ],
};

2. Ignoring Document Size Limits

MongoDB documents cannot exceed 16MB. An unbounded comments array will eventually hit this limit, or cause performance issues long before it does.

3. Forgetting Index Strategy

Querying by createdAt or postId without a matching index causes a full collection scan. Always create indexes before your collections grow large.

4. Modeling based on Relational Habits

MongoDB is not a relational database. Normalizing everything into small collections and joining them in your application code misses what makes MongoDB powerful.

5. Not Planning for Comment Growth

Starting with embedded comments is fine for a prototype. But if you do not plan for the transition to a separate collection before your app gets real traffic, you face a painful migration later.
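
When that transition comes, the core of the migration is reshaping each embedded comment into a standalone document. The helper below is a hypothetical sketch of that reshaping step only. It assumes the embedded form shown earlier, where author is a plain display name, and it leaves authorId out because that form does not store one.

```javascript
// Hypothetical helper: convert a post's embedded comments into documents
// suitable for a separate comments collection.
function extractCommentDocs(post) {
  return (post.comments || []).map((c) => ({
    _id: c._id,
    postId: post._id,            // link back to the parent post
    parentCommentId: null,       // embedded comments were not threaded
    authorDisplayName: c.author, // the embedded form stored a plain name
    text: c.text,
    createdAt: c.createdAt,
    updatedAt: c.createdAt,
  }));
}

// Strings stand in for ObjectId values in this sketch.
const legacyPost = {
  _id: 'p1',
  title: 'Post with Embedded Comments',
  comments: [
    { _id: 'c1', author: 'Alice', text: 'Nice!', createdAt: new Date(0) },
    { _id: 'c2', author: 'Bob', text: 'Thanks.', createdAt: new Date(1000) },
  ],
};
console.log(extractCommentDocs(legacyPost));

// Intended follow-up with real collections (not executed here):
// await commentsColl.insertMany(extractCommentDocs(post));
// await postsColl.updateOne({ _id: post._id }, { $unset: { comments: '' } });
```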

6. Updating large Documents too Frequently

Pushing to a large comments array on every new comment rewrites the entire post document. When the array is large, this becomes a significant write bottleneck.

Putting it all Together

You have now walked through every design decision, from identifying entities to handling unbounded comment growth and threading replies. Let's assemble everything into one cohesive schema you can use as a starting point for your own blogging application.

The Posts Collection

Each document represents one blog post. The author is an embedded snapshot.

{
  _id: ObjectId(),
  title: 'How to Design Nested Documents',
  slug: 'how-to-design-nested-documents',
  content: 'Full article content...',
  author: {
    _id: ObjectId(), // reference to users collection
    displayName: 'Jane Doe',
    slug: 'jane-doe',
  },
  tags: ['mongodb', 'schema', 'blogging'],
  createdAt: ISODate(),
  updatedAt: ISODate(),
}

Indexes: createdAt descending (list by date), author.slug (posts by author), tags (filter by tag).

The Users Collection

Each document is a full user profile. Posts embed a snapshot; the full record lives here.

{
  _id: ObjectId(),
  username: 'jane_doe',
  email: 'jane@example.com',
  displayName: 'Jane Doe',
  slug: 'jane-doe',
  createdAt: ISODate(),
  updatedAt: ISODate(),
}

Indexes: slug unique (profile URLs), email unique (authentication).

The Comments Collection

Each document is one comment or reply. Top-level comments have parentCommentId: null. Replies reference their parent.

{
  _id: ObjectId(),
  postId: ObjectId(), // reference to posts collection
  authorId: ObjectId(), // reference to users collection
  authorDisplayName: 'Alice', // embedded snapshot for display
  parentCommentId: null, // null = top-level; ObjectId = reply
  text: 'Great article!',
  createdAt: ISODate(),
  updatedAt: ISODate(),
}

Indexes: (postId, _id) ascending (cursor-based pagination per post), parentCommentId (fetch replies).

How do these Collections Interact?

  1. When a post is created, copy the author's displayName and slug from the users collection into the embedded author snapshot.
  2. When a reader adds a comment, insert a new document into the comments collection with the relevant postId.
  3. When a reader replies to a comment, insert a new comments document with the parentCommentId set to the parent comment's _id.
  4. When rendering a post list, use a projection to fetch only title, author.displayName, createdAt, and tags, and skip the full content.
  5. When rendering a post detail page, fetch the post in one query and comments in a second paginated query.

This design serves every common operation with one or two targeted, index-backed queries.

Setting up Indexes

Create all indexes when your application starts or during a setup script:

async function createIndexes(db) {
  const posts = db.collection('posts');
  const users = db.collection('users');
  const comments = db.collection('comments');

  // Posts: list by date, filter by author, filter by tag
  await posts.createIndex({ createdAt: -1 });
  await posts.createIndex({ 'author.slug': 1 });
  await posts.createIndex({ tags: 1 });

  // Users: unique profile URLs and unique login emails
  await users.createIndex({ slug: 1 }, { unique: true });
  await users.createIndex({ email: 1 }, { unique: true });

  // Comments: cursor-based pagination per post, replies lookup
  await comments.createIndex({ postId: 1, _id: 1 });
  await comments.createIndex({ parentCommentId: 1 });
}

MongoDB gives you the freedom to model data in a way that fits how your application actually works.

The temptation is to either embed everything (too much data in one document) or reference everything (too many queries). The right answer is almost always in between: embed what is small, bounded, and always read together; reference what grows, changes independently, or is large.

For a blogging application specifically: embed a small author snapshot in each post, but store comments in a separate collection. Use parentCommentId for threaded replies instead of recursive nesting. Index aggressively, paginate comments, and use projections to avoid over-fetching.

These decisions compound over time. A well-designed schema from the start means fewer migrations, lower infrastructure costs, and faster queries as your application grows.

Key Takeaways

  • Embed data that is read together and stays bounded in size
  • Reference data that grows unbounded or has an independent lifecycle
  • Always model around access patterns, not relational normalization habits
  • Avoid deep nesting and uncontrolled array growth inside documents
  • Create indexes on every field you filter, sort, or paginate by
  • Design for long-term growth, because comment counts do not stay small

Practical Considerations for MongoDB Schema Design

  • Embed only small, frequently used author fields (like displayName, slug) in posts; keep larger or frequently updated data in a separate users collection.
  • MongoDB documents have a 16MB limit, so avoid embedding unbounded data like large comment arrays.
  • Arrays are efficient but should remain limited in size; large, growing arrays can hurt performance.
  • Store comments in a separate collection and use indexed, cursor-based pagination (_id) for better performance.
  • MongoDB allows flexible schema changes, but major structural updates still require data migration, so plan your design carefully upfront.