Last indexed: 7 February 2026 (9db8dd)

Data Storage and Persistence

Purpose and Scope

This document introduces the data storage and persistence system for batch processing in hypervel/bus. It explains why batches require persistent storage, describes the repository pattern abstraction used to separate storage concerns from business logic, and provides an overview of the contract/implementation architecture.

The batch system tracks long-running operations that span multiple job executions across distributed workers. This requires durable state storage that survives process restarts and enables coordination between workers. This page covers the architectural foundations of the persistence layer.

For detailed documentation of the repository interface methods, see Batch Repository Interface. For implementation details of the database storage backend, see Database Implementation. For cleanup and maintenance operations, see Pruning Old Batch Data.

Why Batches Need Persistent Storage

Batches coordinate the execution of multiple jobs that may run on different workers, at different times, potentially across server restarts. The batch system must maintain state that includes:

Job Counts: Total jobs, pending jobs, and failed jobs
Execution Status: Whether the batch is running, finished, or cancelled
Failure Tracking: IDs of failed jobs for debugging and retry logic
Lifecycle Callbacks: Serialized closures for then/catch/finally hooks
Metadata: Batch name, creation time, completion time, cancellation time

Without persistent storage, this state would be lost when a worker process terminates, making it impossible to:

Track batch completion across multiple job executions
Execute lifecycle callbacks when all jobs finish
Allow jobs to query their parent batch status
Provide monitoring and debugging capabilities
Implement batch cancellation that affects all jobs

Sources: src/DatabaseBatchRepository.php:68-86, src/Contracts/BatchRepository.php:12-70

Repository Pattern Architecture

The persistence layer uses the repository pattern to provide a clean abstraction between batch business logic and storage implementation. This separation enables:

Storage Backend Flexibility: Swap implementations (database, Redis, file system) without changing batch logic
Testability: Mock the repository in tests without requiring actual storage
Contract-First Design: Interface defines capabilities independent of implementation details

Contract Hierarchy

The BatchRepository interface (src/Contracts/BatchRepository.php:12-70) defines the core operations for storing and managing batch state. The PrunableBatchRepository interface extends it with cleanup methods for removing old batch records. The DatabaseBatchRepository class (src/DatabaseBatchRepository.php:17-335) provides a concrete implementation backed by database tables.
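The contract hierarchy can be illustrated with a small language-neutral sketch, written here in Python rather than the library's PHP. The `Batch` fields, the `InMemoryBatchRepository` test double, and the exact semantics of `prune` are assumptions made for illustration; only three of the documented operations are shown, and `store` takes a finished `Batch` instead of a PendingBatch to keep the example short.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Batch:
    """Hypothetical, simplified stand-in for the Batch domain object."""
    id: str
    name: str
    finished_at: Optional[float] = None


class BatchRepository(ABC):
    """Core contract (only three of the documented operations are shown)."""

    @abstractmethod
    def find(self, batch_id: str) -> Optional[Batch]: ...

    @abstractmethod
    def store(self, batch: Batch) -> Batch: ...

    @abstractmethod
    def delete(self, batch_id: str) -> None: ...


class PrunableBatchRepository(BatchRepository):
    """Extends the core contract with a cleanup operation."""

    @abstractmethod
    def prune(self, before: float) -> int: ...


class InMemoryBatchRepository(PrunableBatchRepository):
    """Hypothetical test double; not part of hypervel/bus."""

    def __init__(self) -> None:
        self._batches: Dict[str, Batch] = {}

    def find(self, batch_id: str) -> Optional[Batch]:
        return self._batches.get(batch_id)

    def store(self, batch: Batch) -> Batch:
        self._batches[batch.id] = batch
        return batch

    def delete(self, batch_id: str) -> None:
        self._batches.pop(batch_id, None)

    def prune(self, before: float) -> int:
        # Assumed semantics: remove batches that finished before the cutoff
        # and report how many were removed.
        stale = [b.id for b in self._batches.values()
                 if b.finished_at is not None and b.finished_at < before]
        for batch_id in stale:
            del self._batches[batch_id]
        return len(stale)
```

Code written against `BatchRepository` works unchanged with the in-memory double, which is the testability benefit described above.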

Sources: src/Contracts/BatchRepository.php:1-71, src/DatabaseBatchRepository.php:1-20

Storage Architecture

The persistence system consists of three layers:

1. Contract Layer

Defines behavioral contracts that any batch repository must implement:

| Contract | Purpose | Methods |
|---|---|---|
| `BatchRepository` | Core CRUD and state management | `get`, `find`, `store`, `incrementTotalJobs`, `decrementPendingJobs`, `incrementFailedJobs`, `markAsFinished`, `cancel`, `delete`, `transaction`, `rollBack` |
| `PrunableBatchRepository` | Cleanup operations for old data | `prune`, `pruneUnfinished`, `pruneCancelled` |

2. Implementation Layer

Provides concrete storage backends:

DatabaseBatchRepository: Stores batch state in a database table with atomic updates using row-level locking

3. Factory Layer

The BatchFactory instantiates Batch objects from stored data, decoupling the persistence format from domain objects. The factory is injected into the repository (src/DatabaseBatchRepository.php:27-33) and used to convert raw database records into Batch instances (src/DatabaseBatchRepository.php:301-316).
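A minimal sketch of the factory idea, in Python and under stated assumptions: the row is a plain mapping of column names to raw values, `failed_job_ids` is stored as a JSON string, and the `options` column is left out. The `make` method name and the `Batch` dataclass are hypothetical and do not mirror the library's actual signatures.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Mapping, Optional


@dataclass(frozen=True)
class Batch:
    """Hypothetical domain object built from one stored record."""
    id: str
    name: str
    total_jobs: int
    pending_jobs: int
    failed_jobs: int
    failed_job_ids: List[str]
    created_at: int
    cancelled_at: Optional[int]
    finished_at: Optional[int]


class BatchFactory:
    """Converts raw storage rows into Batch objects."""

    def make(self, row: Mapping[str, Any]) -> Batch:
        return Batch(
            id=str(row["id"]),
            name=str(row["name"]),
            # Drivers may return numeric columns as strings, so cast here.
            total_jobs=int(row["total_jobs"]),
            pending_jobs=int(row["pending_jobs"]),
            failed_jobs=int(row["failed_jobs"]),
            # The stored JSON array becomes a native list.
            failed_job_ids=json.loads(row["failed_job_ids"]),
            created_at=int(row["created_at"]),
            cancelled_at=row["cancelled_at"],
            finished_at=row["finished_at"],
        )
```

Keeping the conversion in one place means the repository can change how a column is encoded without touching code that consumes `Batch` objects.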

Sources: src/DatabaseBatchRepository.php:27-33, src/DatabaseBatchRepository.php:301-316

Batch Data Model

Each stored batch record contains the following fields:

| Field | Type | Purpose |
|---|---|---|
| `id` | string | Unique identifier (ordered UUID) |
| `name` | string | Human-readable batch name |
| `total_jobs` | int | Total number of jobs in the batch |
| `pending_jobs` | int | Number of jobs not yet completed |
| `failed_jobs` | int | Number of jobs that failed |
| `failed_job_ids` | JSON array | List of failed job IDs for debugging |
| `options` | serialized | Batch configuration and callbacks |
| `created_at` | timestamp | When the batch was created |
| `cancelled_at` | timestamp (nullable) | When the batch was cancelled |
| `finished_at` | timestamp (nullable) | When the batch completed |

When a PendingBatch is stored (src/DatabaseBatchRepository.php:68-86), the new record is initialized with zero job counts and an ordered UUID is generated as its ID. As jobs execute, the repository atomically updates the job counts (src/DatabaseBatchRepository.php:103-136) to track progress.
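The store step can be sketched against SQLite as follows. This is an illustration in Python, not the library's code: `ordered_id` is a hypothetical time-prefixed identifier standing in for an ordered UUID, the column types are guesses based on the field table, and JSON stands in for whatever serialization the `options` column actually uses.

```python
import json
import os
import sqlite3
import time
import uuid
from typing import Any, Dict, Optional

SCHEMA = """
CREATE TABLE job_batches (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    total_jobs INTEGER NOT NULL,
    pending_jobs INTEGER NOT NULL,
    failed_jobs INTEGER NOT NULL,
    failed_job_ids TEXT NOT NULL,
    options TEXT,
    created_at INTEGER NOT NULL,
    cancelled_at INTEGER,
    finished_at INTEGER
)
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def ordered_id(now_ms: Optional[int] = None) -> str:
    """Hypothetical ordered ID: 48-bit millisecond timestamp, then 80 random bits."""
    ms = int(time.time() * 1000) if now_ms is None else now_ms
    value = ((ms & ((1 << 48) - 1)) << 80) | int.from_bytes(os.urandom(10), "big")
    # Fixed-width hex with the timestamp first, so string order follows time.
    return str(uuid.UUID(int=value))


def store(conn: sqlite3.Connection, name: str, options: Dict[str, Any]) -> str:
    """Insert a new batch with zeroed counters and return its ID."""
    batch_id = ordered_id()
    conn.execute(
        "INSERT INTO job_batches VALUES (?, ?, 0, 0, 0, '[]', ?, ?, NULL, NULL)",
        (batch_id, name, json.dumps(options), int(time.time())),
    )
    conn.commit()
    return batch_id
```

Because the timestamp occupies the leading bits, sorting by `id` approximates sorting by creation time, which is what makes ID-based pagination practical.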

Sources: src/DatabaseBatchRepository.php:68-86

Persistence Operations Overview

The repository supports five categories of operations:

Retrieval Operations

find(batchId): Retrieve a single batch by ID (src/DatabaseBatchRepository.php:55-63)
get(limit, before): Retrieve a paginated list of batches (src/DatabaseBatchRepository.php:40-50)
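One plausible reading of `get(limit, before)` is keyset pagination over ordered IDs: return the newest batches first, and treat `before` as the last ID of the previous page. The Python sketch below assumes that reading, which this page does not confirm, and uses a reduced two-column table for brevity.

```python
import sqlite3
from typing import Any, List, Optional, Tuple


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE job_batches (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
    return conn


def find(conn: sqlite3.Connection, batch_id: str) -> Optional[Tuple[str, str]]:
    """Return the (id, name) row for one batch, or None if it does not exist."""
    return conn.execute(
        "SELECT id, name FROM job_batches WHERE id = ?", (batch_id,)
    ).fetchone()


def get(conn: sqlite3.Connection, limit: int = 50,
        before: Optional[str] = None) -> List[str]:
    """Return up to `limit` batch IDs, newest first, older than `before`."""
    sql = "SELECT id FROM job_batches"
    params: List[Any] = []
    if before is not None:
        # Assumed semantics: `before` is an ID cursor, not a timestamp.
        sql += " WHERE id < ?"
        params.append(before)
    sql += " ORDER BY id DESC LIMIT ?"
    params.append(limit)
    return [row[0] for row in conn.execute(sql, params)]
```

A caller would pass the last ID of one page as `before` to fetch the next page, avoiding the cost of large offsets.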

Storage Operations

store(PendingBatch): Persist a new batch to storage (src/DatabaseBatchRepository.php:68-86)

State Update Operations

These operations use atomic updates with row-level locking to ensure correctness when multiple workers update the same batch concurrently:

incrementTotalJobs(batchId, amount): Add jobs to an existing batch (src/DatabaseBatchRepository.php:91-98)
decrementPendingJobs(batchId, jobId): Mark a job as complete (src/DatabaseBatchRepository.php:103-117)
incrementFailedJobs(batchId, jobId): Mark a job as failed (src/DatabaseBatchRepository.php:122-136)

All job count updates use the updateAtomicValues helper (src/DatabaseBatchRepository.php:141-152), which wraps the read and the write in a database transaction with lockForUpdate() to prevent race conditions.
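The read-modify-write pattern behind these updates can be sketched in Python with SQLite. SQLite has no `SELECT ... FOR UPDATE`, so `BEGIN IMMEDIATE` (which takes the database write lock up front) stands in for the row lock; the function names, the reduced table, and the shape of the counts dictionary are assumptions for illustration only.

```python
import json
import sqlite3
from typing import Any, Callable, Dict

Counts = Dict[str, Any]


def connect() -> sqlite3.Connection:
    # isolation_level=None lets the code issue BEGIN/COMMIT explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE job_batches (id TEXT PRIMARY KEY, pending_jobs INTEGER, "
        "failed_jobs INTEGER, failed_job_ids TEXT)"
    )
    return conn


def update_atomic_values(conn: sqlite3.Connection, batch_id: str,
                         compute: Callable[[Counts], Counts]) -> Counts:
    """Read, recompute, and write the counters inside one locked transaction."""
    conn.execute("BEGIN IMMEDIATE")  # stand-in for a row-level lock
    try:
        row = conn.execute(
            "SELECT pending_jobs, failed_jobs, failed_job_ids "
            "FROM job_batches WHERE id = ?", (batch_id,)
        ).fetchone()
        if row is None:
            raise KeyError(batch_id)
        new = compute({"pending_jobs": row[0], "failed_jobs": row[1],
                       "failed_job_ids": json.loads(row[2])})
        conn.execute(
            "UPDATE job_batches SET pending_jobs = ?, failed_jobs = ?, "
            "failed_job_ids = ? WHERE id = ?",
            (new["pending_jobs"], new["failed_jobs"],
             json.dumps(new["failed_job_ids"]), batch_id),
        )
        conn.execute("COMMIT")
        return new
    except Exception:
        conn.execute("ROLLBACK")
        raise


def decrement_pending_jobs(conn: sqlite3.Connection, batch_id: str, job_id: str) -> Counts:
    # job_id mirrors the documented signature; this sketch does not use it.
    return update_atomic_values(
        conn, batch_id, lambda c: {**c, "pending_jobs": c["pending_jobs"] - 1})


def increment_failed_jobs(conn: sqlite3.Connection, batch_id: str, job_id: str) -> Counts:
    return update_atomic_values(
        conn, batch_id,
        lambda c: {**c, "failed_jobs": c["failed_jobs"] + 1,
                   "failed_job_ids": c["failed_job_ids"] + [job_id]})
```

Holding the lock across both the read and the write is what prevents two workers from reading the same `pending_jobs` value and each writing back a count that is off by one.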

Lifecycle Operations

markAsFinished(batchId): Set the finished timestamp (src/DatabaseBatchRepository.php:157-162)
cancel(batchId): Set both cancelled and finished timestamps (src/DatabaseBatchRepository.php:167-173)
delete(batchId): Remove batch record entirely (src/DatabaseBatchRepository.php:178-181)
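The three lifecycle operations map to simple statements. The Python sketch below uses a reduced table and an optional `now` parameter (an assumption added to make the example deterministic) to show that cancelling writes both timestamps while finishing writes only one.

```python
import sqlite3
import time
from typing import Optional


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_batches (id TEXT PRIMARY KEY, "
        "cancelled_at INTEGER, finished_at INTEGER)"
    )
    return conn


def mark_as_finished(conn: sqlite3.Connection, batch_id: str,
                     now: Optional[int] = None) -> None:
    now = int(time.time()) if now is None else now
    conn.execute("UPDATE job_batches SET finished_at = ? WHERE id = ?",
                 (now, batch_id))
    conn.commit()


def cancel(conn: sqlite3.Connection, batch_id: str,
           now: Optional[int] = None) -> None:
    now = int(time.time()) if now is None else now
    # A cancelled batch is also considered finished, so both columns are set.
    conn.execute(
        "UPDATE job_batches SET cancelled_at = ?, finished_at = ? WHERE id = ?",
        (now, now, batch_id))
    conn.commit()


def delete(conn: sqlite3.Connection, batch_id: str) -> None:
    conn.execute("DELETE FROM job_batches WHERE id = ?", (batch_id,))
    conn.commit()
```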

Transaction Control

transaction(Closure): Execute operations within a database transaction (src/DatabaseBatchRepository.php:246-249)
rollBack(): Roll back the current transaction (src/DatabaseBatchRepository.php:254-257)

These methods delegate to the underlying database connection's transaction management, enabling complex multi-step operations to execute atomically.
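As a rough illustration of what atomic multi-step execution means here, the Python sketch below wraps a callback in a SQLite transaction and rolls back if the callback raises. The `transaction` helper folds the rollback into the wrapper rather than exposing a separate `rollBack()` method, and the table layout is a simplification.

```python
import sqlite3
from typing import Callable, TypeVar

T = TypeVar("T")


def connect() -> sqlite3.Connection:
    # isolation_level=None lets the code issue BEGIN/COMMIT explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE job_batches (id TEXT PRIMARY KEY, "
        "total_jobs INTEGER NOT NULL, pending_jobs INTEGER NOT NULL)"
    )
    return conn


def transaction(conn: sqlite3.Connection,
                callback: Callable[[sqlite3.Connection], T]) -> T:
    """Run callback inside a transaction; commit on success, roll back on error."""
    conn.execute("BEGIN")
    try:
        result = callback(conn)
    except Exception:
        # Undo every statement the callback executed before failing.
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")
    return result
```

If a callback updates `total_jobs` and then fails before updating `pending_jobs`, the rollback leaves both counters at their previous values, so readers never observe a half-applied change.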

Sources: src/DatabaseBatchRepository.php:40-257, src/Contracts/BatchRepository.php:14-70

Integration with Batch System

The repository integrates with the broader batch processing system at several key points:

Batch Creation: PendingBatch stores itself via store() before dispatching jobs
Job Count Management: As jobs are added and executed, counts are atomically updated
Status Queries: Jobs use find() to check if their batch is cancelled
Lifecycle Completion: When the pending job count reaches zero, the batch marks itself finished
Cleanup: Old batches are removed via pruning methods
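The integration points above can be tied together in a simplified worker loop. This Python sketch uses a hypothetical in-memory repository and a hypothetical `run_job` function; the real control flow lives in the batch and job classes, and details such as whether a failed job also decrements the pending count are not modelled here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class BatchState:
    total_jobs: int
    pending_jobs: int
    failed_job_ids: List[str] = field(default_factory=list)
    cancelled: bool = False
    finished: bool = False


class InMemoryRepository:
    """Hypothetical stand-in for a batch repository."""

    def __init__(self) -> None:
        self.batches: Dict[str, BatchState] = {}

    def find(self, batch_id: str) -> Optional[BatchState]:
        return self.batches.get(batch_id)

    def decrement_pending_jobs(self, batch_id: str, job_id: str) -> int:
        state = self.batches[batch_id]
        state.pending_jobs -= 1
        return state.pending_jobs

    def increment_failed_jobs(self, batch_id: str, job_id: str) -> None:
        self.batches[batch_id].failed_job_ids.append(job_id)

    def mark_as_finished(self, batch_id: str) -> None:
        self.batches[batch_id].finished = True


def run_job(repo: InMemoryRepository, batch_id: str, job_id: str,
            work: Callable[[], None]) -> str:
    batch = repo.find(batch_id)
    if batch is None or batch.cancelled:
        return "skipped"  # status query: do nothing for cancelled batches
    try:
        work()
    except Exception:
        repo.increment_failed_jobs(batch_id, job_id)
        return "failed"
    # Lifecycle completion: the last successful job finishes the batch.
    if repo.decrement_pending_jobs(batch_id, job_id) == 0:
        repo.mark_as_finished(batch_id)
    return "done"
```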

Sources: src/DatabaseBatchRepository.php:68-181

Configuration Dependencies

The DatabaseBatchRepository requires several dependencies injected via its constructor (src/DatabaseBatchRepository.php:27-33):

| Dependency | Type | Purpose |
|---|---|---|
| `$factory` | `BatchFactory` | Creates `Batch` domain objects from stored data |
| `$resolver` | `ConnectionResolverInterface` | Resolves database connections by name |
| `$table` | `string` | Database table name (default: `job_batches`) |
| `$connection` | `string` (optional) | Database connection name to use |

The connection can be changed at runtime via setConnection(string) (src/DatabaseBatchRepository.php:329-334), enabling multi-tenant scenarios where different batches use different databases.
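To make the name-based connection switching concrete, here is a small Python sketch. The `ConnectionResolver` class, the `count` method, and the tenant names are hypothetical, and the factory dependency is left out to keep the example short; the sketch only shows how resolving the connection by name on each call lets one repository object target different databases.

```python
import sqlite3
from typing import Dict, Optional


class ConnectionResolver:
    """Hypothetical resolver that maps connection names to connections."""

    def __init__(self, connections: Dict[str, sqlite3.Connection], default: str) -> None:
        self._connections = connections
        self._default = default

    def connection(self, name: Optional[str] = None) -> sqlite3.Connection:
        return self._connections[name or self._default]


class DatabaseBatchRepository:
    """Sketch of a repository that resolves its connection lazily by name."""

    def __init__(self, resolver: ConnectionResolver,
                 table: str = "job_batches",
                 connection: Optional[str] = None) -> None:
        self._resolver = resolver
        self._table = table
        self._connection = connection

    def set_connection(self, name: str) -> None:
        self._connection = name

    def count(self) -> int:
        # Resolved on every call, so a later set_connection() takes effect.
        conn = self._resolver.connection(self._connection)
        return conn.execute(f'SELECT COUNT(*) FROM "{self._table}"').fetchone()[0]


def make_tenant_db(batch_ids) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE job_batches (id TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO job_batches VALUES (?)", [(b,) for b in batch_ids])
    return conn
```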

Sources: src/DatabaseBatchRepository.php:27-33, src/DatabaseBatchRepository.php:321-334

Summary

The data storage and persistence system provides:

Durable State: Batch state survives process restarts and spans distributed workers
Clean Abstraction: Repository pattern decouples storage from business logic
Atomic Updates: Row-level locking ensures correctness under concurrent access
Flexible Backend: Contract-based design allows alternative storage implementations
Lifecycle Support: Tracks batch state from creation through completion or cancellation

The following sections detail the repository interface methods (Batch Repository Interface), the database implementation specifics (Database Implementation), and the pruning system for cleanup (Pruning Old Batch Data).

Sources: src/Contracts/BatchRepository.php:1-71, src/DatabaseBatchRepository.php:1-335

URL: https://deepwiki.com/hypervel/bus/9-data-storage-and-persistence