URL: https://deepwiki.com/hypervel/bus/9.3-pruning-old-batch-data


Pruning Old Batch Data

Purpose and Scope

This document covers the pruning system for cleaning up old batch records from persistent storage. As batches complete, fail, or are cancelled over time, batch metadata accumulates in the database. The pruning system provides methods to remove stale batch records based on age and state.

For information about how batches are stored and persisted, see Database Implementation. For the complete batch repository interface, see Batch Repository Interface.


Overview

The batch processing system stores metadata for every batch in the database, including job counts, state information, timestamps, and serialized options. Without maintenance, this data accumulates indefinitely. The pruning system provides three targeted methods for removing old batch records based on their completion state and age.

Sources: src/Contracts/PrunableBatchRepository.php:1-15, src/DatabaseBatchRepository.php:1-335


The PrunableBatchRepository Contract

The PrunableBatchRepository interface extends BatchRepository and adds pruning capabilities. The base contract defines a single method:
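As a rough sketch of that contract's shape, assuming the single method is prune() and that it takes a DateTimeInterface cutoff and returns the deleted-row count (both inferred from the method descriptions on this page, not copied from the source file):

```php
<?php
declare(strict_types=1);

// Stand-in for the real BatchRepository contract, declared empty here only so
// the sketch is self-contained; the actual interface defines more methods.
interface BatchRepository
{
}

// Assumed shape of the pruning contract: one method, one cutoff argument.
// The real interface lives in a namespace, which is omitted in this sketch.
interface PrunableBatchRepository extends BatchRepository
{
    /**
     * Delete finished batches older than $before and return how many
     * records were removed.
     */
    public function prune(DateTimeInterface $before): int;
}
```

Type-hinting against this interface lets calling code prune any repository that supports it without depending on the database implementation.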


This interface is implemented by DatabaseBatchRepository, which provides three concrete pruning methods for different batch states.

Sources: src/Contracts/PrunableBatchRepository.php:9-15


Pruning Methods

Prune Finished Batches

The prune() method removes batch records that have completed (either successfully or with failures) and are older than a specified date.


Query Criteria:

  • finished_at IS NOT NULL - Only batches that have completed
  • finished_at < $before->getTimestamp() - Completed before the cutoff date

Implementation: src/DatabaseBatchRepository.php:186-201

The method uses the finished_at timestamp to identify completed batches. This timestamp is set when:

  • All jobs in the batch complete successfully
  • The batch is cancelled via cancel()

Sources: src/DatabaseBatchRepository.php:186-201


Prune Unfinished Batches

The pruneUnfinished() method removes batch records that never completed and are older than a specified date based on their creation time.


Query Criteria:

  • finished_at IS NULL - Only batches that have not completed
  • created_at < $before->getTimestamp() - Created before the cutoff date

Implementation: src/DatabaseBatchRepository.php:206-221

This method is useful for cleaning up abandoned or stuck batches that may have failed to complete due to:

  • Application crashes
  • Lost queue workers
  • Indefinitely pending jobs

Sources: src/DatabaseBatchRepository.php:206-221


Prune Cancelled Batches

The pruneCancelled() method removes batch records that were explicitly cancelled and are older than a specified date based on their creation time.


Query Criteria:

  • cancelled_at IS NOT NULL - Only batches that were cancelled
  • created_at < $before->getTimestamp() - Created before the cutoff date

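Implementation: src/DatabaseBatchRepository.php:226-241

When cancel() is called, a batch has both cancelled_at and finished_at set. pruneCancelled() selects on the cancelled_at marker only, so it targets cancelled batches specifically. Because finished_at is also set, a cancelled batch matches the prune() criteria as well once its finished_at falls before that method's cutoff.

Sources: src/DatabaseBatchRepository.php:226-241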


Implementation Details

Chunking Strategy

All three pruning methods use an identical chunking strategy to avoid memory exhaustion and long-running database locks:


Each iteration deletes a maximum of 1000 rows. The loop continues until no rows are deleted, indicating all matching records have been removed.

Benefits of chunking:

  • Prevents memory exhaustion from large result sets
  • Reduces database lock duration
  • Allows other queries to interleave during pruning
  • Provides incremental progress for long-running prune operations

Implementation Pattern:
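A minimal sketch of the delete-until-empty loop described above. The pruneInChunks() helper and its $deleteChunk callback are hypothetical names for illustration; the callback stands in for a limited DELETE query that returns its affected-row count, and the simulated table replaces a real database:

```php
<?php
declare(strict_types=1);

/**
 * Call $deleteChunk($limit) repeatedly until it reports zero deleted rows,
 * and return the cumulative number of rows removed.
 */
function pruneInChunks(callable $deleteChunk, int $limit = 1000): int
{
    $totalDeleted = 0;

    do {
        $deleted = $deleteChunk($limit);
        $totalDeleted += $deleted;
    } while ($deleted !== 0);

    return $totalDeleted;
}

// Simulate a table holding 2,500 rows that match the pruning criteria.
$remaining = 2500;
$calls = 0;

$total = pruneInChunks(function (int $limit) use (&$remaining, &$calls): int {
    $calls++;
    $deleted = min($limit, $remaining);
    $remaining -= $deleted;

    return $deleted;
});
```

In this simulation the callback runs four times: three deletes that remove rows and one final call that removes nothing and ends the loop. That trailing empty query is the cost of using "zero rows deleted" as the stop condition.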


Sources: src/DatabaseBatchRepository.php:194-198, src/DatabaseBatchRepository.php:214-218, src/DatabaseBatchRepository.php:234-238


Query Comparison Table

Method            | State Filter             | Timestamp Field | Timestamp Comparison
------------------|--------------------------|-----------------|----------------------------
prune()           | finished_at IS NOT NULL  | finished_at     | < $before->getTimestamp()
pruneUnfinished() | finished_at IS NULL      | created_at      | < $before->getTimestamp()
pruneCancelled()  | cancelled_at IS NOT NULL | created_at      | < $before->getTimestamp()

Note: prune() measures age from finished_at, while pruneUnfinished() and pruneCancelled() measure it from created_at. Unfinished batches have no finished_at value, so their creation timestamp is the only available age marker; pruneCancelled() applies the same created_at comparison to cancelled batches.
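To make the three filters concrete, the sketch below restates them as in-memory predicates over plain arrays. The function names and sample rows are hypothetical, and the timestamps are assumed to be integer Unix times, as the comparison against $before->getTimestamp() suggests:

```php
<?php
declare(strict_types=1);

// Each row is an array with integer Unix timestamps (or null) under
// created_at, finished_at, and cancelled_at.

function matchesPrune(array $row, DateTimeInterface $before): bool
{
    return $row['finished_at'] !== null
        && $row['finished_at'] < $before->getTimestamp();
}

function matchesPruneUnfinished(array $row, DateTimeInterface $before): bool
{
    return $row['finished_at'] === null
        && $row['created_at'] < $before->getTimestamp();
}

function matchesPruneCancelled(array $row, DateTimeInterface $before): bool
{
    return $row['cancelled_at'] !== null
        && $row['created_at'] < $before->getTimestamp();
}

$cutoff = new DateTimeImmutable('@1700000000');

$finishedOld = ['created_at' => 1699000000, 'finished_at' => 1699000500, 'cancelled_at' => null];
$stuck       = ['created_at' => 1699000000, 'finished_at' => null,       'cancelled_at' => null];
$cancelled   = ['created_at' => 1699000000, 'finished_at' => 1699000100, 'cancelled_at' => 1699000100];
$recent      = ['created_at' => 1700000100, 'finished_at' => 1700000200, 'cancelled_at' => null];
```

With these rows, $cancelled satisfies both matchesPruneCancelled() and matchesPrune(), because cancel() sets finished_at as well. $recent satisfies none of the predicates.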

Sources: src/DatabaseBatchRepository.php:186-241


Database Schema Requirements

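The pruning methods rely on three timestamp columns in the batch table. The comparisons against $before->getTimestamp() imply that created_at and finished_at hold Unix timestamps.

Timestamp Fields:

  • created_at - When the batch record was created; the age marker for pruneUnfinished() and pruneCancelled()
  • finished_at - Set when the batch completes or is cancelled, NULL while it is unfinished; the age marker for prune()
  • cancelled_at - Set when cancel() is called, NULL otherwise; the state filter for pruneCancelled()

Sources: src/DatabaseBatchRepository.php:72-83, src/DatabaseBatchRepository.php:157-173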


Maintenance Strategies

Scheduled Pruning

The typical approach is to schedule regular pruning via cron jobs or scheduled tasks:

Daily Finished Batch Cleanup:


Weekly Unfinished Batch Cleanup:


Monthly Cancelled Batch Cleanup:
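The original snippets for these three schedules are not reproduced here. As a hedged sketch of the part they share, the code below computes the $before cutoff for each method. The cutoffFor() helper and the retention windows (7, 30, and 30 days) are illustrative assumptions, and wiring the calls into a scheduler is left out:

```php
<?php
declare(strict_types=1);

/**
 * Compute the cutoff to pass as $before for a retention window in days.
 */
function cutoffFor(DateTimeImmutable $now, int $retentionDays): DateTimeImmutable
{
    return $now->sub(new DateInterval("P{$retentionDays}D"));
}

// Hypothetical retention windows, keyed by the pruning method they feed.
$retentionDays = [
    'prune' => 7,
    'pruneUnfinished' => 30,
    'pruneCancelled' => 30,
];

// A fixed "now" keeps the example deterministic; a real task would use the
// current time instead.
$now = new DateTimeImmutable('2024-06-30 00:00:00', new DateTimeZone('UTC'));

$cutoffs = [];
foreach ($retentionDays as $method => $days) {
    $cutoffs[$method] = cutoffFor($now, $days);
}

// Each scheduled task would then hand its cutoff to the matching method,
// for example: $repository->prune($cutoffs['prune']);
```

Computing the cutoff from an immutable "now" keeps the three schedules independent: each run derives its own $before value and does not share mutable date state with the others.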


Retention Policy Recommendations

Batch State           | Recommended Retention | Rationale
----------------------|-----------------------|---------------------------------------------
Finished (successful) | 7-30 days             | Historical data for debugging recent issues
Finished (failed)     | 30-90 days            | Longer retention for failure analysis
Unfinished            | 30-60 days            | Allow time to investigate stuck batches
Cancelled             | 30-90 days            | Archive of intentionally stopped operations

Selective Pruning Strategy

For fine-grained control, combine multiple pruning methods:
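One way to combine them is sketched below, under the assumption that only prune() is guaranteed by the contract while pruneUnfinished() and pruneCancelled() exist on the database implementation. FakeDatabaseBatchRepository and pruneSelectively() are hypothetical stand-ins that return fixed counts, so the example runs without a database:

```php
<?php
declare(strict_types=1);

// Minimal stand-in for the pruning contract.
interface PrunableBatchRepository
{
    public function prune(DateTimeInterface $before): int;
}

// Stub playing the role of DatabaseBatchRepository; counts are made up.
final class FakeDatabaseBatchRepository implements PrunableBatchRepository
{
    public function prune(DateTimeInterface $before): int
    {
        return 12;
    }

    public function pruneUnfinished(DateTimeInterface $before): int
    {
        return 3;
    }

    public function pruneCancelled(DateTimeInterface $before): int
    {
        return 1;
    }
}

/**
 * Run every pruning method the repository supports, each with its own cutoff.
 *
 * @return array{finished: int, unfinished: int, cancelled: int}
 */
function pruneSelectively(
    object $repository,
    DateTimeInterface $finishedBefore,
    DateTimeInterface $unfinishedBefore,
    DateTimeInterface $cancelledBefore
): array {
    $deleted = ['finished' => 0, 'unfinished' => 0, 'cancelled' => 0];

    if ($repository instanceof PrunableBatchRepository) {
        $deleted['finished'] = $repository->prune($finishedBefore);
    }

    // These two methods are not part of the contract, so check before calling.
    if (method_exists($repository, 'pruneUnfinished')) {
        $deleted['unfinished'] = $repository->pruneUnfinished($unfinishedBefore);
    }

    if (method_exists($repository, 'pruneCancelled')) {
        $deleted['cancelled'] = $repository->pruneCancelled($cancelledBefore);
    }

    return $deleted;
}

$now = new DateTimeImmutable('2024-06-30', new DateTimeZone('UTC'));

$counts = pruneSelectively(
    new FakeDatabaseBatchRepository(),
    $now->modify('-7 days'),
    $now->modify('-30 days'),
    $now->modify('-30 days')
);
```

Passing a separate cutoff per state lets each category follow its own retention window. Note that a short cutoff for prune() also removes cancelled batches, since they carry a finished_at value.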


Sources: src/DatabaseBatchRepository.php:186-241


Return Values

All pruning methods return the total number of batch records deleted:


The return value represents the cumulative count across all chunked delete operations, useful for:

  • Monitoring pruning effectiveness
  • Alerting on unexpected accumulation
  • Tracking storage reclamation
  • Auditing data retention compliance
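As an illustration of the monitoring use, the sketch below turns per-method counts into a log message and a simple alert flag. The summarizePruneRun() helper, the sample counts, and the $alertThreshold value are all hypothetical:

```php
<?php
declare(strict_types=1);

/**
 * Summarize one pruning run and flag unusually large deletions.
 *
 * @param array<string, int> $deletedByMethod counts returned by each method
 * @return array{total: int, alert: bool, message: string}
 */
function summarizePruneRun(array $deletedByMethod, int $alertThreshold): array
{
    $total = array_sum($deletedByMethod);

    $parts = [];
    foreach ($deletedByMethod as $method => $count) {
        $parts[] = "{$method}={$count}";
    }

    return [
        'total' => $total,
        // A large total may indicate that records accumulated unexpectedly.
        'alert' => $total > $alertThreshold,
        'message' => 'Pruned ' . $total . ' batch records (' . implode(', ', $parts) . ')',
    ];
}

$summary = summarizePruneRun(
    ['prune' => 1200, 'pruneUnfinished' => 40, 'pruneCancelled' => 5],
    1000
);
```

The message string can go to whatever logger the application uses, and the alert flag can gate a notification.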

Sources: src/DatabaseBatchRepository.php:186, src/DatabaseBatchRepository.php:206, src/DatabaseBatchRepository.php:226


Integration with DatabaseBatchRepository


The DatabaseBatchRepository implements the PrunableBatchRepository interface, providing concrete pruning implementations that query the configured database table. The repository uses the injected ConnectionResolverInterface to obtain database connections for executing pruning queries.

Sources: src/Contracts/PrunableBatchRepository.php:9-15, src/DatabaseBatchRepository.php:17-335