URL: https://thenewstack.io/how-we-built-a-vectordb-powered-cloud-service-in-6-months/

2024-03-01 08:39:31
How We Built a VectorDB-Powered Cloud Service in 6 Months
Cloud Services / Data / Open Source

Learn from architectural design decisions made when bringing open source vector database technology to the cloud.
Mar 1st, 2024 8:39am by James Luan
Featured image by stefan moertl on Unsplash.
Zilliz sponsored this post.

In May 2022, version 2.0 of Milvus, our open source vector database, was stabilizing following several significant iterations. At the same time, our users were asking for a stable, commercially hosted version of the platform. For Zilliz, the company behind Milvus, the timing was right: we had a seasoned team of engineers, a product on the cusp of maturity and a user base eager for a hosted solution. Fueled by this momentum, we set an ambitious objective: to launch our cloud service, Zilliz Cloud, in just six months.

Amid a landscape propelled by the exponential growth of large language models (LLMs), we began creating a comprehensive cloud service from the ground up. We gained numerous insights during the 18-month journey of building Zilliz Cloud, a fully managed vector-search service driven by the open source Milvus database. I’ll discuss the design decisions and invaluable lessons learned on this journey in this two-part series.

The Milvus architecture

Step 1: Evaluate Existing Capabilities

When we started the project, we conducted an assessment of existing Milvus capabilities and our overarching objectives.

Evaluate Core Technologies

Our base technology, Milvus, is a cloud native, open source vector database built with storage-computing disaggregation and a microservices framework. This design ensured a seamless integration into Kubernetes clusters, facilitating rapid adaptation to diverse cloud production environments.

Assess Deployment Flexibility

We were confident that by leveraging the Kubernetes Operator, Milvus had excellent service deployment capabilities across major public cloud platforms such as Amazon Web Services (AWS), Google Cloud Platform (GCP) and Microsoft Azure. This versatility was corroborated by numerous team members who had successfully implemented production services on these platforms at previous companies, underscoring the Milvus platform’s scalability and compatibility.

Enhance Observability

While Milvus had basic observability features such as monitoring and logging, we recognized we needed to bolster alerting functionalities tailored for production environments. This enhancement was necessary to supply our internal teams and users with real-time insights and proactive measures to ensure uninterrupted service delivery.

Address Service Completeness

Despite significant strides with Milvus, Zilliz Cloud was still nascent, lacking critical components essential in a managed service. These components include user login authentication, metering and billing systems, payment mechanisms, networking infrastructure, security protocols, a comprehensive web console, user-facing API support, resource scheduling capabilities and workflow management tools. Addressing these was central to fortifying our service’s efficacy and appeal to a broader user base.

Step 2: Establish Design Principles

We definitely had our work cut out for us! Now that we had determined what was required for a minimum viable product (MVP), the next step was to maximize our team’s efficiency in developing the Zilliz Cloud MVP within a six-month timeframe. Through some introspection and analysis, we distilled a set of foundational design principles to guide our development efforts.

Use Mature Third-Party Products Whenever Possible

To prepare for market entry, we relied on established cloud and third-party services. AWS’s core offerings, including Elastic Kubernetes Service (EKS), Elastic Compute Cloud (EC2), Simple Storage Service (S3), Elastic Block Store (EBS) and Application Load Balancer (ALB), alongside AWS-managed Kafka and Relational Database Service (RDS), formed the basis of our infrastructure. This approach met our immediate needs, avoided “reinventing the wheel” and paved a cost-effective path for potential adaptation to a multicloud environment, expediting our innovation pace.

Unfortunately, we ran into compatibility challenges between GCP/Azure messaging queues and managed Kafka services, which led us to develop a distributed log system using Apache BookKeeper. The absence of reliable, open source, cloud native distributed logging solutions spurred this initiative, and we are considering open sourcing this solution to assist others building cloud services.

Third-party Software-as-a-Service (SaaS) providers also played a pivotal role in accelerating our platform’s development. For instance, we adopted Stripe for payment processing, addressing metering and taxation requirements. To streamline connections with multicloud marketplaces, we integrated Suger.io. Additionally, we assessed billing-service platforms like Orb and Metronome to optimize our billing operations. Auth0 served as our preferred choice for account management and login functionality, with expanded support for Google login. Establishing our operational alerting system on PagerDuty, chosen for its seamless integration with existing monitoring tools and customizable notification rules, further aided our operational efficiency.
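As a rough illustration of the metering step that feeds a billing provider, the sketch below sums usage records into a per-account charge. It does not call Stripe or any other vendor API. The `HOURLY_RATE` table, the record layout and the `bill` function are hypothetical names invented for this example, and the prices are not Zilliz Cloud's real pricing.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical hourly prices per cluster size (assumption for illustration only).
HOURLY_RATE = {"small": Decimal("0.10"), "large": Decimal("0.40")}


def bill(usage_records):
    """Sum (account, cluster_size, hours) records into a charge per account."""
    totals = defaultdict(Decimal)  # Decimal() starts each account at 0
    for account, size, hours in usage_records:
        # Decimal(str(...)) avoids binary floating-point rounding in money math.
        totals[account] += HOURLY_RATE[size] * Decimal(str(hours))
    return dict(totals)


if __name__ == "__main__":
    records = [("acme", "small", 10), ("acme", "large", 5), ("beta", "small", 1)]
    for account, amount in sorted(bill(records).items()):
        print(account, amount)
```

In a setup like the one described above, totals of this kind would be handed to the payment provider rather than charged directly.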

Avoid Multiplying Entities Unnecessarily

We embraced a minimalist design ethos that permeated various facets of our product:

  • Architecture simplicity: Initially, our design had over 60 microservices, which posed significant challenges in development and testing. To simplify our architecture, we pared the list down to fewer than 10 core microservices, including user billing, resources, metadata and scheduling. This reduction clarified dependencies and alleviated the testing burden.
  • Functional simplicity: The initial iteration of Zilliz Cloud prioritized core user functionalities such as registration, cluster deployment and billing. Less urgent features like scaling and backups were deliberately deferred to lighten the workload. We did commit to establishing a robust feedback loop, initially through email-based feedback, and later augmenting it with Zendesk integration so prompt and high-quality feedback could guide further improvements.
  • Design simplicity: Our cloud-service design prioritized efficient communication and user engagement potential, necessitating a disciplined and focused approach. Leveraging rapid A/B testing enabled us to swiftly validate features and adapt based on user engagement metrics.

Anticipate Day 2 Challenges from Day 1

For cloud services, we had to evolve swiftly without compromising the reliability of user interfaces and services. Easier said than done! This is like “swapping out jet engines in midair.” Externally, the service appears seamless, while internally a vigorous cycle of innovation and enhancement is unfolding. We learned quickly that embracing an end-in-mind development approach is key in navigating this complex terrain.

Step 3: Develop the Architecture

Driven by our design principles, we successfully reached the milestone of launching our commercial vector-search product within the six-month timeframe, simultaneously securing our initial group of seed customers. Here is the architectural diagram illustrating the framework of our inaugural release.

Vector search architecture

We were able to build quite a robust solution with the following capabilities.

Multicloud support: While we initially centered on AWS, our commitment to cloud agnosticism led us to evaluate compatibility across public cloud providers including GCP and Alibaba Cloud. By leveraging customizations to the open source Crossplane project, we developed a cloud adapter layer to streamline multicloud support and reduce associated costs. This approach facilitated rapid integration with GCP within just one month and paved the way for seamless integration with other public cloud providers.
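The idea of a cloud adapter layer can be sketched as a provider-neutral interface with one implementation per cloud. This is a minimal illustration of the general pattern only. It is not Crossplane's API and not the Zilliz Cloud implementation. `ClusterSpec`, `CloudAdapter`, `AwsAdapter`, `GcpAdapter` and `provision` are hypothetical names, and the adapters return placeholder identifiers instead of calling real provider APIs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Provider-neutral description of a cluster (hypothetical fields)."""
    name: str
    region: str
    node_count: int


class CloudAdapter(ABC):
    """Interface the control plane codes against; one subclass per provider."""

    @abstractmethod
    def create_cluster(self, spec: ClusterSpec) -> str:
        """Provision a cluster and return a provider-specific identifier."""


class AwsAdapter(CloudAdapter):
    def create_cluster(self, spec: ClusterSpec) -> str:
        # A real adapter would call the provider's API here.
        return f"aws:eks:{spec.region}:{spec.name}"


class GcpAdapter(CloudAdapter):
    def create_cluster(self, spec: ClusterSpec) -> str:
        return f"gcp:gke:{spec.region}:{spec.name}"


# Adding a provider means registering one more adapter; callers do not change.
ADAPTERS = {"aws": AwsAdapter(), "gcp": GcpAdapter()}


def provision(provider: str, spec: ClusterSpec) -> str:
    """Dispatch a provider-neutral request to the matching adapter."""
    if provider not in ADAPTERS:
        raise ValueError(f"unsupported provider: {provider}")
    return ADAPTERS[provider].create_cluster(spec)


if __name__ == "__main__":
    print(provision("gcp", ClusterSpec("demo", "us-west1", 3)))
```

The design choice here is that everything above the adapter layer works only with `ClusterSpec`, so supporting another cloud is confined to one new class.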

Security: Zilliz Cloud places a very high importance on data security. We adhere strictly to cloud identity and access management (IAM) standards, control data access permissions and encrypt all data, whether in transit or at rest. Emphasizing network isolation for optimal performance, we opted for AWS’s EKS network add-ons for their efficiency and user-friendliness. By delineating interaction boundaries between the data and control layers, we’ve realized significant cost savings during the rollout of our “bring your own cloud” (BYOC) product.

Resource pooling: Zilliz Cloud adheres to the “law of cloud commutativity,” prioritizing elastic scalability through resource pooling. By decoupling storage and computation and employing dynamic load balancing, we use cloud resources efficiently. This approach lets us reserve resources only when necessary, significantly enhancing the utilization of Spot Instances and Lambda functions while driving down costs.
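One simple form of dynamic load balancing over a pool of interchangeable compute nodes is least-loaded placement, sketched below. It assumes compute nodes hold no durable state (the data lives in separate storage), so any node can take any task and nodes can join the pool at any time. `ComputePool` and its methods are hypothetical names for this example. The actual Zilliz Cloud scheduler is not described in this article and may work differently.

```python
import heapq


class ComputePool:
    """Pool of stateless compute nodes with least-loaded task placement."""

    def __init__(self, node_names):
        # Min-heap of (current_load, node_name); ties break by node name.
        self._heap = [(0, name) for name in node_names]
        heapq.heapify(self._heap)

    def assign(self, cost):
        """Place a task of the given cost on the least-loaded node."""
        load, name = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (load + cost, name))
        return name

    def add_node(self, name):
        """Scale out: a new node joins the pool with zero load."""
        heapq.heappush(self._heap, (0, name))

    def loads(self):
        """Return the current load of every node."""
        return {name: load for load, name in self._heap}


if __name__ == "__main__":
    pool = ComputePool(["node-a", "node-b"])
    for cost in (5, 3, 1):
        print(cost, "->", pool.assign(cost))
    pool.add_node("node-c")  # capacity added only when needed
    print(2, "->", pool.assign(2))
    print(pool.loads())
```

Because placement always picks the node with the smallest load, a freshly added node absorbs new work first, which is what makes adding capacity on demand effective.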

Operations friendliness: Zilliz Cloud is designed with developers and operational staff in mind. Featuring a comprehensive graphical user interface (GUI) and advanced monitoring capabilities, the platform offers triple availability-zone disaster recovery and adheres to strict service-level agreements (SLAs), helping to ensure stability and reliability for production environments.

Lessons Learned

I am proud of this architecture and all the work we did as a team. However, despite our success in getting to market in the six-month period we committed to, there were a number of things we didn’t anticipate. I will review the lessons learned in the second part of this series.

Zilliz is a leading vector database company, offering high-performing and scalable solutions. We’re powered by Milvus, the popular open source vector database that helps companies of any scale build AI-powered search solutions.
James Luan is the vice president of engineering at Zilliz. With a master's degree in computer engineering from Cornell University, he has extensive experience as a database engineer at Oracle, Hedvig and Alibaba Cloud. James played a crucial role in...