If you play games on PC, there's a good chance you've got a folder somewhere filled with clips. Match highlights, funny moments, the occasional accidental recording of your desktop because you pressed the wrong hotkey. For me, that folder had grown to hundreds of files, all with generic names like Replay 2026-04-10 00-10-30.mkv, and I'd been putting off organising them for months.
The problem isn't just volume, it's variety. I play a mix of Counter-Strike 2, Valorant, Deadlock, and a few others, and short of opening every single file to check what game it's from, there's no quick way to sort them. I wanted to organise them into a folder structure like Counter Strike 2/2026/April/, and I wanted to do it without spending an entire afternoon scrubbing through footage.
I actually tried to solve this before with a more traditional machine learning method. I went through ten clips each of Counter-Strike 2 and Valorant, labelled them, and tried to train a basic classifier to tell the two apart. Even with just two games and manual feedback, it didn't really work. The model struggled to reliably pick the right one, and scaling that to more games with more clips didn't work and ultimately felt like a dead end.
So I took a different route. I wrote a Python script that uses Gemma 4 31b, running locally on my PC, to identify the game in each clip and sort the files automatically. It actually works, and it's one of the most practically useful things I've done with a local model.
The approach is surprisingly simple
Three screenshots and a vote
Here's how it works: for each video file, the script uses ffmpeg to extract three screenshots from evenly-spaced points throughout the clip. It ends up being roughly a quarter of the way through, halfway, and three-quarters of the way through. This avoids grabbing a loading screen or a black frame at the start, which would throw off the classification. The frames also get downscaled to 512 pixels wide before they're sent to the model, which keeps the token count manageable and stops prompt processing from grinding to a halt on larger vision models.
Each screenshot gets sent to Gemma 4 31b as a base64-encoded image through an OpenAI-compatible API endpoint. I'm running the model through llama.cpp, but it'll work with anything that exposes the same API format. The prompt is deliberately constrained: it tells the model to look at the screenshot and pick from a fixed list of games. Counter-Strike 2, Valorant, R.E.P.O, Deadlock, Minecraft, or Other.
Whichever game gets two or more votes out of three wins. It's a majority vote system, and it handles edge cases better than you'd expect. If one frame happens to catch a menu screen that's ambiguous, the other two frames from actual gameplay nearly always outvote it.
I built two versions of the script, with one primarily being for testing, and one that I eventually let loose on the actual folder filled with clips. The first script takes a single video file and runs it through the whole pipeline. You point it at a clip, it extracts the frames, sends them to the model, and tells you what game it thinks it is. This was massively helpful for testing, because I could quickly check whether the model was actually getting things right.
The output shows you each frame's vote and the final result, so you can see exactly where the model is confident and where it's guessing. In my testing, it got the game right on all three frames pretty much all of the time. Counter-Strike 2 and Valorant were the easiest to distinguish, which makes sense given how visually distinct they are. R.E.P.O was trickier on occasion, but the majority vote system caught most of the misclassifications. Deadlock was funny, though, as sometimes the reasoning traces from the model showed that it didn't know what Deadlock was, whereas others showed that the model had been trained on information somewhere that it was an unreleased Valve game.
The second script is the batch version. Point it at your Videos folder and it processes every video file it finds, classifies each one, parses the date from the filename, and moves it into the right folder structure. A clip from April 2026 that's identified as Counter-Strike 2 ends up in Sorted/Counter Strike 2/2026/April/.
I added a dry-run flag so that I could preview every move the script would make without actually touching my files, and I ran this first to spot-check the results. I also added a --month flag so that I could go month by month, which will also make sorting as I go a lot easier in the future.
Gemma 4 handles this better than I expected
It's a great family of models
I'll be honest, I wasn't sure this would work well when I started. Identifying a game from a single screenshot sounds easy for a human, but I wasn't confident a 31 billion parameter model running on consumer hardware would nail it consistently, especially given that my previous machine learning attempts had failed. Sure, I probably could have improved it with even more samples and some smarter choices, but I'd be accounting for edge cases constantly. I even tried to segment the UI from the processed image, and it was beginning to take more effort than it would have to just process all of the clips manually.
Gemma 4 supports variable image resolution and can process images with different token budgets, but for something like this, even the default settings are more than enough. Game screenshots have distinct HUDs and art styles that make classification pretty easy for a vision model. It's not like I'm asking it to identify a specific weapon skin or read tiny text in the corner of the screen, I'm asking it to tell the difference between Valorant and Counter-Strike 2, and that's a task it handles with ease.
On top of that, Gemma 4 has multiple models that run comfortably on consumer GPUs with enough VRAM, and quantised versions bring the requirements down even further. Even Gemma 4 E4B did an alright job when I tested with just Minecraft, Valorant, and Counter Strike 2, which makes this possible to even run on a laptop. Is it the "easy" way out in terms of computer vision? Sure, but it gets the job done, and it's the exact kind of thing a local LLM is perfect at.
