
URL: https://deepwiki.com/hypervel/bus/9-data-storage-and-persistence


Data Storage and Persistence

Purpose and Scope

This document introduces the data storage and persistence system for batch processing in hypervel/bus. It explains why batches require persistent storage, describes the repository pattern abstraction used to separate storage concerns from business logic, and provides an overview of the contract/implementation architecture.

The batch system tracks long-running operations that span multiple job executions across distributed workers. This requires durable state storage that survives process restarts and enables coordination between workers. This page covers the architectural foundations of the persistence layer.

For detailed documentation of the repository interface methods, see Batch Repository Interface. For implementation details of the database storage backend, see Database Implementation. For cleanup and maintenance operations, see Pruning Old Batch Data.


Why Batches Need Persistent Storage

Batches coordinate the execution of multiple jobs that may run on different workers, at different times, potentially across server restarts. The batch system must maintain state that includes:

  • Job Counts: Total jobs, pending jobs, and failed jobs
  • Execution Status: Whether the batch is running, finished, or cancelled
  • Failure Tracking: IDs of failed jobs for debugging and retry logic
  • Lifecycle Callbacks: Serialized closures for then/catch/finally hooks
  • Metadata: Batch name, creation time, completion time, cancellation time

Without persistent storage, this state would be lost when a worker process terminates, making it impossible to:

  • Track batch completion across multiple job executions
  • Execute lifecycle callbacks when all jobs finish
  • Allow jobs to query their parent batch status
  • Provide monitoring and debugging capabilities
  • Implement batch cancellation that affects all jobs

Sources: src/DatabaseBatchRepository.php:68-86, src/Contracts/BatchRepository.php:12-70


Repository Pattern Architecture

The persistence layer uses the repository pattern to provide a clean abstraction between batch business logic and storage implementation. This separation enables:

  1. Storage Backend Flexibility: Alternative implementations (for example, Redis or file-system stores) could replace the database backend without changing batch logic
  2. Testability: Mock the repository in tests without requiring actual storage
  3. Contract-First Design: Interface defines capabilities independent of implementation details
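As a minimal sketch of the testability and flexibility points above, the Python below (an illustration, not the library's PHP API) defines a trimmed-down repository contract as a structural Protocol, plus a dictionary-backed fake that satisfies it. The names Batch, InMemoryBatchRepository, and should_skip_job are hypothetical. Only find and cancel mirror methods listed for BatchRepository.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Batch:
    id: str
    name: str
    cancelled: bool = False


class BatchRepository(Protocol):
    """Hypothetical, trimmed-down contract: two of the real operations only."""

    def find(self, batch_id: str) -> Optional[Batch]: ...

    def cancel(self, batch_id: str) -> None: ...


def should_skip_job(repository: BatchRepository, batch_id: str) -> bool:
    """Business logic sees only the contract, never a storage engine."""
    batch = repository.find(batch_id)
    return batch is None or batch.cancelled


class InMemoryBatchRepository:
    """Test double: satisfies the Protocol structurally using a plain dict."""

    def __init__(self) -> None:
        self.batches: dict[str, Batch] = {}

    def find(self, batch_id: str) -> Optional[Batch]:
        return self.batches.get(batch_id)

    def cancel(self, batch_id: str) -> None:
        self.batches[batch_id].cancelled = True


# Usage: exercise the business rule without any database.
repo = InMemoryBatchRepository()
repo.batches["b1"] = Batch(id="b1", name="import")
before = should_skip_job(repo, "b1")
repo.cancel("b1")
after = should_skip_job(repo, "b1")
```

Because should_skip_job depends only on the contract, a database-backed implementation could be substituted without touching it.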

Contract Hierarchy


The BatchRepository interface (src/Contracts/BatchRepository.php:12-70) defines the core operations for storing and managing batch state. The PrunableBatchRepository interface extends it with cleanup methods for removing old batch records. The DatabaseBatchRepository class (src/DatabaseBatchRepository.php:17-335) provides a concrete implementation backed by a database table.

Sources: src/Contracts/BatchRepository.php:1-71, src/DatabaseBatchRepository.php:1-20
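The contract hierarchy can be sketched with Python abstract base classes. This is an illustration, not the hypervel/bus interfaces: the method set is trimmed, the snake_case names are adaptations, and the pruning rules (finished batches by finish time, unfinished and cancelled batches by creation time) are assumptions modeled on Laravel's equivalent repository, which hypervel/bus appears to mirror.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BatchRecord:
    id: str
    created_at: int                      # Unix seconds (assumed storage format)
    finished_at: Optional[int] = None
    cancelled_at: Optional[int] = None


class BatchRepository(ABC):
    """Core contract, trimmed to two methods for brevity."""

    @abstractmethod
    def find(self, batch_id: str) -> Optional[BatchRecord]: ...

    @abstractmethod
    def store(self, record: BatchRecord) -> None: ...


class PrunableBatchRepository(BatchRepository):
    """Extends the core contract with cleanup operations."""

    @abstractmethod
    def prune(self, before: int) -> int: ...

    @abstractmethod
    def prune_unfinished(self, before: int) -> int: ...

    @abstractmethod
    def prune_cancelled(self, before: int) -> int: ...


class InMemoryRepository(PrunableBatchRepository):
    """Hypothetical concrete implementation; each prune returns rows deleted."""

    def __init__(self) -> None:
        self.rows: dict[str, BatchRecord] = {}

    def find(self, batch_id: str) -> Optional[BatchRecord]:
        return self.rows.get(batch_id)

    def store(self, record: BatchRecord) -> None:
        self.rows[record.id] = record

    def _delete_where(self, predicate: Callable[[BatchRecord], bool]) -> int:
        doomed = [key for key, row in self.rows.items() if predicate(row)]
        for key in doomed:
            del self.rows[key]
        return len(doomed)

    def prune(self, before: int) -> int:
        # Assumed rule: finished batches whose finish time is before the cutoff.
        return self._delete_where(
            lambda r: r.finished_at is not None and r.finished_at < before)

    def prune_unfinished(self, before: int) -> int:
        # Assumed rule: never-finished batches created before the cutoff.
        return self._delete_where(
            lambda r: r.finished_at is None and r.created_at < before)

    def prune_cancelled(self, before: int) -> int:
        # Assumed rule: cancelled batches created before the cutoff.
        return self._delete_where(
            lambda r: r.cancelled_at is not None and r.created_at < before)
```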


Storage Architecture

The persistence system consists of three layers:

1. Contract Layer

Defines behavioral contracts that any batch repository must implement:

| Contract | Purpose | Methods |
|---|---|---|
| BatchRepository | Core CRUD and state management | get, find, store, increment*, decrement*, markAsFinished, cancel, delete, transaction, rollBack |
| PrunableBatchRepository | Cleanup operations for old data | prune, pruneUnfinished, pruneCancelled |

2. Implementation Layer

Provides concrete storage backends:

  • DatabaseBatchRepository: Stores batch state in a database table with atomic updates using row-level locking

3. Factory Layer

The BatchFactory instantiates Batch objects from stored data, decoupling the persistence format from the domain objects. The factory is injected into the repository (src/DatabaseBatchRepository.php:27-33) and used to convert raw database records into Batch instances (src/DatabaseBatchRepository.php:301-316).


Sources: src/DatabaseBatchRepository.php:27-33, src/DatabaseBatchRepository.php:301-316
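A minimal sketch of the factory step follows. It assumes a raw row arrives as a dictionary keyed by column name, that timestamps are stored as Unix seconds, and that options were serialized with Python's pickle as a stand-in for PHP's serialize(). BatchFactory.make and the Batch dataclass here are hypothetical, not the library's signatures.

```python
import json
import pickle
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class Batch:
    id: str
    name: str
    total_jobs: int
    pending_jobs: int
    failed_jobs: int
    failed_job_ids: list
    options: dict
    created_at: datetime
    cancelled_at: Optional[datetime]
    finished_at: Optional[datetime]


def _to_datetime(value: Optional[int]) -> Optional[datetime]:
    """Convert a nullable Unix timestamp into an aware UTC datetime."""
    return None if value is None else datetime.fromtimestamp(value, tz=timezone.utc)


class BatchFactory:
    """Hypothetical factory: raw storage row in, Batch domain object out."""

    def make(self, row: dict[str, Any]) -> Batch:
        return Batch(
            id=row["id"],
            name=row["name"],
            total_jobs=int(row["total_jobs"]),
            pending_jobs=int(row["pending_jobs"]),
            failed_jobs=int(row["failed_jobs"]),
            # failed_job_ids is stored as a JSON array string.
            failed_job_ids=json.loads(row["failed_job_ids"] or "[]"),
            # pickle stands in for PHP unserialize(); only safe for trusted data.
            options=pickle.loads(row["options"]),
            created_at=datetime.fromtimestamp(row["created_at"], tz=timezone.utc),
            cancelled_at=_to_datetime(row["cancelled_at"]),
            finished_at=_to_datetime(row["finished_at"]),
        )
```

Keeping the decoding in one place means the repository's query code never needs to know how a Batch is constructed.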


Batch Data Model

Each stored batch record contains the following fields:

| Field | Type | Purpose |
|---|---|---|
| id | string | Unique identifier (ordered UUID) |
| name | string | Human-readable batch name |
| total_jobs | int | Total number of jobs in the batch |
| pending_jobs | int | Number of jobs not yet completed |
| failed_jobs | int | Number of jobs that failed |
| failed_job_ids | JSON array | List of failed job IDs for debugging |
| options | serialized | Batch configuration and callbacks |
| created_at | timestamp | When the batch was created |
| cancelled_at | timestamp (nullable) | When the batch was cancelled |
| finished_at | timestamp (nullable) | When the batch completed |

When a PendingBatch is stored (src/DatabaseBatchRepository.php:68-86), the new record receives an ordered UUID as its id and starts with all job counts at zero. As jobs are added and executed, the repository atomically updates the job counts (src/DatabaseBatchRepository.php:103-136) to track progress.

Sources: src/DatabaseBatchRepository.php:68-86
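The store step described above can be sketched as follows. The ordered_uuid helper is a hypothetical approximation of an ordered UUID (48 bits of millisecond timestamp followed by 80 random bits), not the library's generator, and a plain dictionary stands in for the job_batches table.

```python
import os
import time
import uuid
from dataclasses import dataclass
from typing import Optional


def ordered_uuid() -> uuid.UUID:
    """Time-prefixed 128-bit id: ids from later milliseconds sort after earlier ones."""
    millis = time.time_ns() // 1_000_000           # fits in 48 bits for millennia
    rand = int.from_bytes(os.urandom(10), "big")    # 80 random bits
    return uuid.UUID(int=(millis << 80) | rand)


@dataclass
class StoredBatch:
    id: str
    name: str
    total_jobs: int
    pending_jobs: int
    failed_jobs: int
    failed_job_ids: str                             # JSON array text
    options: dict
    created_at: int                                 # Unix seconds (assumed)
    cancelled_at: Optional[int] = None
    finished_at: Optional[int] = None


def store(table: dict, name: str, options: dict) -> StoredBatch:
    """Insert a new batch row; counts start at zero and grow as jobs are added."""
    record = StoredBatch(
        id=str(ordered_uuid()),
        name=name,
        total_jobs=0,
        pending_jobs=0,
        failed_jobs=0,
        failed_job_ids="[]",
        options=options,
        created_at=int(time.time()),
    )
    table[record.id] = record
    return record
```

A time-ordered id keeps newly inserted rows near the end of a primary-key index, which is the usual motivation for ordered UUIDs over fully random ones.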


Persistence Operations Overview

The repository supports five categories of operations:

Retrieval Operations

get and find read stored batch records; find looks up a single batch by its ID.



Storage Operations

store persists a PendingBatch as a new batch record.

State Update Operations

The increment* and decrement* methods adjust the job counts as jobs are added, complete, or fail.


These operations use atomic updates with row-level locking to ensure correctness when multiple workers update the same batch concurrently.

All job count updates use the updateAtomicValues helper (src/DatabaseBatchRepository.php:141-152), which wraps each update in a database transaction with lockForUpdate() to prevent race conditions.
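The locking pattern can be sketched with Python's built-in sqlite3 module. SQLite has no SELECT ... FOR UPDATE, so BEGIN IMMEDIATE serves as the closest analogue: it takes a (coarser, whole-database) write lock before the row is read, which serializes concurrent read-modify-write cycles much as lockForUpdate() does for one row. The schema is trimmed, the function names are hypothetical, and the rule that a failed job stays pending while a succeeding job is removed from failed_job_ids is an assumption borrowed from Laravel's repository.

```python
import json
import os
import sqlite3
import tempfile
import threading
from typing import Callable, Optional

SCHEMA = """CREATE TABLE job_batches (
    id TEXT PRIMARY KEY, total_jobs INTEGER NOT NULL, pending_jobs INTEGER NOT NULL,
    failed_jobs INTEGER NOT NULL, failed_job_ids TEXT NOT NULL)"""


def update_atomic_values(conn: sqlite3.Connection, batch_id: str,
                         callback: Callable[[dict], dict]) -> Optional[dict]:
    """Read, modify, and write one batch row while holding the write lock."""
    conn.execute("BEGIN IMMEDIATE")  # lock before reading; other writers wait
    try:
        row = conn.execute(
            "SELECT pending_jobs, failed_jobs, failed_job_ids FROM job_batches WHERE id = ?",
            (batch_id,)).fetchone()
        if row is None:
            conn.execute("ROLLBACK")
            return None
        values = callback({"pending_jobs": row[0], "failed_jobs": row[1],
                           "failed_job_ids": json.loads(row[2])})
        conn.execute(
            "UPDATE job_batches SET pending_jobs = ?, failed_jobs = ?, failed_job_ids = ? "
            "WHERE id = ?",
            (values["pending_jobs"], values["failed_jobs"],
             json.dumps(values["failed_job_ids"]), batch_id))
        conn.execute("COMMIT")
        return values
    except BaseException:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


def decrement_pending_jobs(conn, batch_id, job_id):
    """A job succeeded: one fewer pending job; forget any earlier failure of it."""
    return update_atomic_values(conn, batch_id, lambda v: {
        "pending_jobs": v["pending_jobs"] - 1,
        "failed_jobs": v["failed_jobs"],
        "failed_job_ids": [j for j in v["failed_job_ids"] if j != job_id]})


def increment_failed_jobs(conn, batch_id, job_id):
    """A job failed: count it and record its id (it remains pending)."""
    return update_atomic_values(conn, batch_id, lambda v: {
        "pending_jobs": v["pending_jobs"],
        "failed_jobs": v["failed_jobs"] + 1,
        "failed_job_ids": v["failed_job_ids"] + [job_id]})


def concurrent_demo(workers: int = 4, jobs_per_worker: int = 25) -> int:
    """Several threads, each with its own connection, decrement the same row."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    total = workers * jobs_per_worker
    setup = sqlite3.connect(path)
    setup.execute(SCHEMA)
    setup.execute("INSERT INTO job_batches VALUES (?, ?, ?, 0, '[]')", ("b1", total, total))
    setup.commit()
    setup.close()

    def worker(offset: int) -> None:
        # isolation_level=None: this code issues BEGIN/COMMIT itself.
        conn = sqlite3.connect(path, timeout=10, isolation_level=None)
        for i in range(offset, offset + jobs_per_worker):
            decrement_pending_jobs(conn, "b1", f"job-{i}")
        conn.close()

    threads = [threading.Thread(target=worker, args=(n * jobs_per_worker,))
               for n in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    check = sqlite3.connect(path)
    (pending,) = check.execute("SELECT pending_jobs FROM job_batches WHERE id = 'b1'").fetchone()
    check.close()
    os.remove(path)
    return pending
```

Without the lock, two workers could read the same pending_jobs value and both write back the same decremented number, losing an update.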

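Lifecycle Operations

markAsFinished records completion, cancel marks the batch as cancelled, and delete removes the batch record.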

Transaction Control

The transaction and rollBack methods delegate to the underlying database connection's transaction management, so multi-step operations can execute atomically.

Sources: src/DatabaseBatchRepository.php:40-257, src/Contracts/BatchRepository.php:14-70
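A minimal sketch of transaction delegation, using sqlite3 as a stand-in connection: the hypothetical SqliteBatchRepository adds no transaction logic of its own and simply forwards to the connection, which commits when the callback succeeds and rolls back when it raises.

```python
import sqlite3
from typing import Callable, TypeVar

T = TypeVar("T")


class SqliteBatchRepository:
    """Hypothetical repository whose transaction helpers delegate to the connection."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def transaction(self, callback: Callable[[sqlite3.Connection], T]) -> T:
        # `with conn:` commits on normal exit and rolls back if the callback raises.
        with self.conn:
            return callback(self.conn)

    def roll_back(self) -> None:
        # Discard whatever the connection's current transaction has done.
        self.conn.rollback()
```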


Integration with Batch System

The repository integrates with the broader batch processing system at several key points:


  1. Batch Creation: A PendingBatch is persisted via store() before its jobs are dispatched
  2. Job Count Management: Counts are updated atomically as jobs are added and executed
  3. Status Queries: Jobs call find() to check whether their batch has been cancelled
  4. Lifecycle Completion: When the pending job count reaches zero, the batch is marked as finished
  5. Cleanup: Old batches are removed via the pruning methods

Sources: src/DatabaseBatchRepository.php:68-181
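The five integration points can be walked through in a single process with a dictionary-backed repository. Everything here is hypothetical and simplified: jobs run in a loop rather than on separate workers, and skipping the work of a job whose batch was cancelled (while still counting it as done) is an assumption about how cancellation interacts with the pending count.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BatchState:
    total_jobs: int = 0
    pending_jobs: int = 0
    finished: bool = False
    cancelled: bool = False


class InMemoryBatchRepository:
    """Hypothetical stand-in for the real repository, keyed by batch id."""

    def __init__(self) -> None:
        self.batches: dict[str, BatchState] = {}

    def store(self, batch_id: str) -> None:
        self.batches[batch_id] = BatchState()

    def find(self, batch_id: str) -> Optional[BatchState]:
        return self.batches.get(batch_id)

    def increment_total_jobs(self, batch_id: str, amount: int) -> None:
        state = self.batches[batch_id]
        state.total_jobs += amount
        state.pending_jobs += amount

    def decrement_pending_jobs(self, batch_id: str) -> int:
        state = self.batches[batch_id]
        state.pending_jobs -= 1
        return state.pending_jobs

    def mark_as_finished(self, batch_id: str) -> None:
        self.batches[batch_id].finished = True

    def cancel(self, batch_id: str) -> None:
        self.batches[batch_id].cancelled = True

    def delete(self, batch_id: str) -> None:
        self.batches.pop(batch_id, None)


def run_batch(repo: InMemoryBatchRepository, batch_id: str,
              jobs: list[Callable[[], None]],
              on_finished: Callable[[str], None]) -> None:
    repo.store(batch_id)                                # 1. persist before dispatching
    repo.increment_total_jobs(batch_id, len(jobs))      # 2. record how many jobs exist
    for job in jobs:                                    # real workers would run these
        state = repo.find(batch_id)                     # 3. each job checks its batch
        if state is not None and not state.cancelled:
            job()
        if repo.decrement_pending_jobs(batch_id) == 0:  # 4. last job closes the batch
            repo.mark_as_finished(batch_id)
            on_finished(batch_id)
```

Step 5 (cleanup) corresponds to calling delete, or a pruning method, once the batch is no longer needed.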


Configuration Dependencies

The DatabaseBatchRepository requires several dependencies, injected via its constructor (src/DatabaseBatchRepository.php:27-33):

| Dependency | Type | Purpose |
|---|---|---|
| $factory | BatchFactory | Creates Batch domain objects from stored data |
| $resolver | ConnectionResolverInterface | Resolves database connections by name |
| $table | string | Database table name (default: job_batches) |
| $connection | string (optional) | Database connection name to use |

The connection can be changed at runtime via setConnection(string) (src/DatabaseBatchRepository.php:329-334), which enables multi-tenant scenarios where different batches are stored in different databases.

Sources: src/DatabaseBatchRepository.php:27-33, src/DatabaseBatchRepository.php:321-334
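The effect of setConnection can be sketched with a hypothetical resolver that maps connection names to separate in-memory SQLite databases. The class and method names are illustrative, not the library's API; the BatchFactory dependency is omitted for brevity, and the table name is interpolated into SQL only because it comes from trusted configuration.

```python
import sqlite3
from typing import Optional


class ConnectionResolver:
    """Hypothetical stand-in for ConnectionResolverInterface: name -> connection."""

    def __init__(self, connections: dict[str, sqlite3.Connection], default: str) -> None:
        self.connections = connections
        self.default = default

    def connection(self, name: Optional[str] = None) -> sqlite3.Connection:
        return self.connections[name or self.default]


class DatabaseBatchRepository:
    def __init__(self, resolver: ConnectionResolver, table: str = "job_batches",
                 connection: Optional[str] = None) -> None:
        self.resolver = resolver
        self.table = table                 # trusted configuration, not user input
        self.connection_name = connection  # None means "use the resolver default"

    def set_connection(self, name: str) -> None:
        # Later queries go to a different database; stored rows are untouched.
        self.connection_name = name

    def _conn(self) -> sqlite3.Connection:
        return self.resolver.connection(self.connection_name)

    def store(self, batch_id: str, name: str) -> None:
        with self._conn() as conn:
            conn.execute(f"INSERT INTO {self.table} (id, name) VALUES (?, ?)",
                         (batch_id, name))

    def find(self, batch_id: str) -> Optional[tuple]:
        return self._conn().execute(
            f"SELECT id, name FROM {self.table} WHERE id = ?", (batch_id,)).fetchone()


def make_batch_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE job_batches (id TEXT PRIMARY KEY, name TEXT)")
    return conn
```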


Summary

The data storage and persistence system provides:

  • Durable State: Batch state survives process restarts and spans distributed workers
  • Clean Abstraction: Repository pattern decouples storage from business logic
  • Atomic Updates: Row-level locking ensures correctness under concurrent access
  • Flexible Backend: Contract-based design allows alternative storage implementations
  • Lifecycle Support: Tracks batch state from creation through completion or cancellation

The following sections detail the repository interface methods (Batch Repository Interface), the database implementation specifics (Database Implementation), and the pruning system for cleanup (Pruning Old Batch Data).

Sources: src/Contracts/BatchRepository.php:1-71, src/DatabaseBatchRepository.php:1-335