Lambda Architecture is widely used at many large tech companies because it combines real-time and batch processing: fresh data can be queried through the real-time processing path, and historical data through the batch processing path.
Lambda architecture is well suited to handling massive volumes of real-time data and to building fault-tolerant, scalable systems.
Lambda (λ) architecture is one of three big data architecture patterns. In addition to batch and stream processing, it includes a data serving layer that responds to user queries.
There are two approaches to Lambda Architecture:
Lambda Architecture has three main layers for processing big data: the batch layer, the stream (speed) layer, and the serving layer.
[Figure: Layers of Lambda Architecture]
The batch layer operates on the complete dataset, which lets the system produce the most accurate results. That accuracy comes at the cost of high latency, because the computation takes a long time.
The batch layer stores the raw data as it arrives and computes the batch views for consumption. Batch jobs run at intervals and are long-running, and the data they cover can span anything from minutes to years.
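As a rough illustration of the batch layer's role, the sketch below keeps raw events in an append-only list and recomputes a batch view from all of them on every run. The event shape `(timestamp, page)` and the names `master_dataset` and `compute_batch_view` are assumptions made for this example, not part of any specific framework.

```python
from collections import Counter

# Hypothetical raw events, stored exactly as they arrived: (timestamp, page).
master_dataset = [
    (1, "home"),
    (2, "about"),
    (3, "home"),
]


def compute_batch_view(events):
    """Recompute page-view counts from the complete raw dataset."""
    # The whole dataset is scanned on every run, which is why the batch
    # layer is accurate but slow on large inputs.
    return dict(Counter(page for _timestamp, page in events))


batch_view = compute_batch_view(master_dataset)
print(batch_view)
```

In a real deployment this recomputation would typically run as a scheduled job on a distributed engine; the in-memory list here only stands in for that storage.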
The stream layer (also called the speed layer) operates on real-time data to complement the batch views. It receives arriving data from various clients, performs incremental updates on top of the batch layer results, and stores them in the Processed Data DB.
This layer generates results in a low-latency, near-real-time fashion. Implementing incremental algorithms (like insertion sort) at the stream layer can significantly reduce the computation cost. The batch views may be processed with more complex or expensive rules and take more time, but they have better data quality and less skew, while the real-time views, computed directly from incoming traffic, give access to the latest available data.
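To make the idea of an incremental update concrete, here is a minimal sketch of a real-time view that adjusts a single counter per arriving event instead of rescanning earlier data. The class name `RealtimeView` and the `(timestamp, page)` event shape are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class RealtimeView:
    """Hypothetical stream-layer view: page-view counts updated per event."""

    def __init__(self):
        self.counts = defaultdict(int)

    def update(self, event):
        # Incremental step: touch only the counter for the arriving event,
        # so the cost per event does not grow with the amount of past data.
        _timestamp, page = event
        self.counts[page] += 1


view = RealtimeView()
for event in [(4, "home"), (5, "pricing"), (6, "home")]:
    view.update(event)

print(dict(view.counts))
```

The trade-off is the one described above: each update is cheap and immediately visible, but the view only reflects the simple rule applied per event.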
The serving layer is a server or a set of servers that answers queries from different modules (such as an analytics module or a notification module) using the results sent from the batch and stream layers.
The outputs of the batch layer (batch views) and of the stream layer (near-real-time views) are stored in the Processed Data DB and are also sent to the serving layer. The serving layer uses the output it receives directly to answer ad-hoc queries, and uses the database to answer predefined queries.
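One way the serving layer might combine the two kinds of views is sketched below. The function `query_page_views` and the dictionary-shaped views are assumptions for this example; the sketch also assumes the real-time view covers only events that arrived after the last batch run, so the two counts do not overlap.

```python
def query_page_views(page, batch_view, realtime_view):
    """Answer a query by combining the batch view with the real-time view."""
    # Assumption: the two views cover disjoint time ranges, so their
    # counts can simply be added.
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


# Hypothetical views: one produced by the last batch run, one built from
# events received since then.
batch_view = {"home": 120, "about": 45}
realtime_view = {"home": 3, "pricing": 1}

total_home = query_page_views("home", batch_view, realtime_view)
total_pricing = query_page_views("pricing", batch_view, realtime_view)
print(total_home, total_pricing)
```

If the views did overlap in time, the merge step would need to deduplicate or discard the real-time portion already covered by the batch view.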
Lambda architecture is flexible and powerful. Many tech companies use it to process the data behind their most critical decisions and initiatives.
Although this post described the architecture and its tradeoffs at a high level, a real production environment involves many more considerations, such as scalability, consistency, fault tolerance, and operational requirements.