VOOZH about

URL: https://dev.to/ahmet_gedik778845/building-a-content-cleanup-system-for-old-videos-52i2

⇱ Building a Content Cleanup System for Old Videos - DEV Community


Introduction

Content accumulates. After months of fetching 25 trending videos from 7 regions every few hours, you end up with thousands of video records. Many become stale, irrelevant, or simply take up database space. Here's the cleanup system I built for ViralVidVault.

The Cleanup Pipeline

Our cleanup runs as the final step of each cron cycle:

<?php

class ContentCleaner
{
 private \PDO $db;
 private int $maxAgeDays;
 private int $minViews;

 public function __construct(
 \PDO $db,
 int $maxAgeDays = 90,
 int $minViews = 100,
 ) {
 $this->db = $db;
 $this->maxAgeDays = $maxAgeDays;
 $this->minViews = $minViews;
 }

 public function run(): CleanupReport
 {
 $report = new CleanupReport();

 // Step 1: Remove videos older than maxAge days
 $report->expired = $this->removeExpired();

 // Step 2: Remove low-performing videos older than 30 days
 $report->lowPerforming = $this->removeLowPerforming();

 // Step 3: Remove already-flagged stale videos older than 7 days
 $report->stale = $this->removeStale();

 // Step 4: Remove duplicate entries (same video, different fetches)
 $report->duplicates = $this->removeDuplicates();

 // Step 5: Vacuum the database
 $this->vacuum();

 return $report;
 }

 private function removeExpired(): int
 {
 $stmt = $this->db->prepare('
 DELETE FROM videos
 WHERE fetched_at < datetime("now", :days)
 AND is_active = 1
 ');
 $stmt->execute([':days' => "-{$this->maxAgeDays} days"]);
 return $stmt->rowCount();
 }

 private function removeLowPerforming(): int
 {
 $stmt = $this->db->prepare('
 DELETE FROM videos
 WHERE views < :minViews
 AND fetched_at < datetime("now", "-30 days")
 ');
 $stmt->execute([':minViews' => $this->minViews]);
 return $stmt->rowCount();
 }

 private function removeStale(): int
 {
 $result = $this->db->exec('
 DELETE FROM videos
 WHERE is_active = 0
 AND stale_at < datetime("now", "-7 days")
 ');
 return $result;
 }

 private function removeDuplicates(): int
 {
 $result = $this->db->exec('
 DELETE FROM video_regions
 WHERE rowid NOT IN (
 SELECT MIN(rowid)
 FROM video_regions
 GROUP BY video_id, region
 )
 ');
 return $result;
 }

 private function vacuum(): void
 {
 $this->db->exec('VACUUM');
 }
}

The Report Object

<?php

class CleanupReport
{
 public int $expired = 0;
 public int $lowPerforming = 0;
 public int $stale = 0;
 public int $duplicates = 0;

 public function total(): int
 {
 return $this->expired + $this->lowPerforming + $this->stale + $this->duplicates;
 }

 public function __toString(): string
 {
 return sprintf(
 "Cleanup: %d removed (expired=%d, low=%d, stale=%d, dupes=%d)",
 $this->total(), $this->expired, $this->lowPerforming,
 $this->stale, $this->duplicates
 );
 }
}

Integration with Cron

<?php
// At the end of fetch_videos.php

// Step 5: Cleanup old content
$cleaner = new ContentCleaner($db->getPdo(), maxAgeDays: 90, minViews: 100);
$report = $cleaner->run();
echo "Step 5: {$report}\n";

Typical output on ViralVidVault:

Step 5: Cleanup: 47 removed (expired=12, low=18, stale=15, dupes=2)

Database Size Management

Without cleanup, the SQLite database at viralvidvault.com grew to 85MB after 3 months. With the cleanup system running every cron cycle, it stays around 25MB. The VACUUM command reclaims the freed space.

Key Takeaways

  1. Run cleanup as the last step of your content pipeline
  2. Use age-based expiry for the bulk of removals
  3. Remove low-performing content to keep quality high
  4. Clean up stale records after a grace period
  5. VACUUM SQLite after bulk deletes to reclaim space

Part of the "Building ViralVidVault" series.