
URL: https://towardsdatascience.com/solving-a-video-sequencing-puzzle-e4ad29020b7f/


Solving a Video sequencing puzzle

Doing Data Science from Scratch: Video clip processing


Doing Data Science from scratch task by task

Video clip processing

Photo by Halacious on Unsplash

This is the next article in my series Doing Data Science from Scratch. We are doing a project to count the number of passing vehicles on the road where I live. The question to be answered is fundamental: when is the best time to go for a walk without being mowed down by speeding cars, trucks, or agricultural vehicles? My last article focused on recovering event data from log files, and that was a lot of fun. Now I write to explain a bit of a mystery, how I solved it, and what solving it means. My six (6) motion detectors create video clips of activity, and one of my scripts makes a film of the whole day. When I viewed today’s film, events and activities appeared all out of sequence. A person left my front door before they arrived. It was crazy, and I thought I was going mad. Thankfully not, readers, thankfully not.


The mystery

For several days now, I have been running my equipment. There are six (6) individual motion detection cameras monitoring passing motorists. After several hours I generally stop the cameras, which allows space for video processing, review, calibration and another run. All very scientific, if not terribly efficient. I am in no hurry, and I value the journey rather than getting to the finish line.

Having watched eight (8) short videos from those six (6) cameras, I had this uncomfortable feeling. First, the Cat seemed to just appear from nowhere. Now I thought that Cat was up to something. Aren’t Cats always up to something? Then a delivery driver appeared to be leaving the house but never arrived, and the postal service was also seen leaving but never arriving. People just seemed to appear and disappear in no particular order. Watching an example today, I noticed that the timestamp on the clips jumped from 10am to 6pm and back to 10am. Hello, that couldn’t be right! We must have a bug! My original script for combining clips from all those cameras started life on a different project. Uh oh! After dinner, I resolved to visit that code again and figure some stuff out.


So what was going on?

The symptom, and my only lead, was that the timestamp on the video frames was jumping around erratically instead of being sequential with time. One moment a frame would appear from 10am with the next from 6pm and then back to 10am. Weird and slightly confusing for me. Some sort of timeline flux or violation of the temporal directive?

Let us examine the original code so I can explain. Lines 28–33 make a list of all the clips available from each camera. Lines 35–40 sort each list in ascending order. Line 45 creates an empty list which will hold all the VideoFileClip objects. Looking at the code, it seemed ripe for a complete refactoring. Each camera’s contribution to the movie is in sequence, but the film as a whole will appear to shift in time, since all Camera 1 shots are shown before any Camera 2 events. So we have at least a partial explanation for the mystery. I added an image of the code below in case you are on ≤13" of screen real estate and need context.

Image by Author – Dec 7th. Prefer not to embed
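To make the time shifting concrete, here is a minimal sketch rather than the original script. It assumes two cameras instead of six and made-up file names of the form `<event>-<YYYYMMDDhhmmss>.mkv`; the names `cam1`, `cam2`, `playlist` and `hour_of` are hypothetical.

```python
# Hypothetical clip names for two cameras; not the real data.
cam1 = ["cam1/02-20201207180105.mkv", "cam1/01-20201207100210.mkv"]
cam2 = ["cam2/01-20201207100455.mkv", "cam2/02-20201207181530.mkv"]

# Sort each camera's list on its own, then place one list after the other,
# which is roughly what the original script is described as doing.
playlist = sorted(cam1) + sorted(cam2)


def hour_of(path):
    """Return the hour (hh) embedded in a hypothetical clip name."""
    stamp = path.split("-")[1]  # e.g. '20201207180105.mkv'
    return int(stamp[8:10])


hours = [hour_of(p) for p in playlist]
print(hours)  # [10, 18, 10, 18]
```

Each camera is in order on its own, yet the combined film runs 10am, 6pm, then back to 10am, which matches the symptom described above.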

So the script was never designed to make a sequenced film. Hmm, that is what, I believe, we refer to as Technical Debt. There was another problem though!

vids = sorted(vids, reverse=False)

The Python variable ‘vids’ contains a list of file names, and that specific line of code sorts the list as text in ascending order. Hmm, that isn’t good either. I pushed the variable contents up to GitHub so you can examine them if you wish. What follows is a marked-up version of the data. Notice the area I highlighted with the rectangle. The sort command is sorting the text, but the resulting order is not chronological. We are jumping from 9am to 11am and back. Now that makes sense as well. Hmm, perhaps that was a bit sloppy then!

Image: marked-up contents of ‘vids’, with the out-of-sequence area highlighted by a rectangle.
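One way a plain text sort can produce this kind of jump is sketched below. This is an assumption, not a reading of the real data: it supposes file names that start with an event counter, and that the counter restarted at 01 after a camera restart. The names and the `hour_of` helper are hypothetical.

```python
# Hypothetical names: '<event counter>-<YYYYMMDDhhmmss>.mkv'.
# Assumption: the counter restarted at 01 when the camera was restarted.
names = [
    "01-20201207091500.mkv",
    "02-20201207094235.mkv",
    "01-20201207110502.mkv",
    "02-20201207113010.mkv",
]


def hour_of(name):
    """Hour (hh) taken from the timestamp part of the name."""
    return int(name.split("-")[1][8:10])


# A plain text sort compares the leading counter first, so the timestamp
# only breaks ties between names that share the same counter.
by_text = [hour_of(n) for n in sorted(names)]
print(by_text)  # [9, 11, 9, 11]
```

Whatever the real cause in the data, the general point holds: sorting whole file names as text only gives chronological order if the timestamp is the first thing that differs between names.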

So what was going on? Well, I hope you agree that we had an underdeveloped script, aka lousy code, that needed a bit of Tender Loving Care (TLC). Ok, we better do some more programming then! Solved!


The Fix

Having solved the mystery, I am happy that I hadn’t lost my sanity. Still, I am convinced that Cat is up to something. That Cat is always up to something. The Cat is a stray, and it arrived at the end of my last project. In my defence, that is why I have half-baked code. I was waiting for the Cat to stray off! Be kind to the animals; they have the same right to the planet as we do, even if they are up to something!

Image taken by the Author. The stray Cat – does that Cat look innocent to you?

Enough of the cute stray Cat! Back to Towards Data Science. So I went ahead and fixed the code. You can check out the new improved version here and I left the data for you as well. Here is the main highlight.

def date_key(elem):
    comps = elem.split('/')
    name = comps[6].strip().replace('.mkv', '').split('-')[1]
    return name

It might sound a bit rich, since I do not document my code, but you really should. Add some comments for yourself and others. Here I introduce a new function, ‘date_key(elem)’. I pass in each element of the list of file names, and it returns the variable name. Do give your variables meaningful names, again, because I don’t and you need to.

Let us examine the variable name with an illustration. I highlighted the value that name gets for each line: 20201207 is December 7th, 2020, and 094235 is 09:42:35 GMT. The function returns the date and time from the file name.
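As a worked example, the sketch below runs date_key on a single made-up path. The path is hypothetical and was chosen only so that the file name is the seventh '/'-separated component, which is what comps[6] expects; the strptime step is an addition for illustration and is not part of the script.

```python
from datetime import datetime


def date_key(elem):
    comps = elem.split('/')
    name = comps[6].strip().replace('.mkv', '').split('-')[1]
    return name


# Hypothetical path: the leading '/' makes comps[0] an empty string,
# so the file name lands at index 6.
path = "/home/pi/motion/cam1/clips/01-20201207094235.mkv"

key = date_key(path)
print(key)  # 20201207094235

# Optional check that the key really is a date and time.
when = datetime.strptime(key, "%Y%m%d%H%M%S")
print(when)  # 2020-12-07 09:42:35
```

Because the key is a fixed-width, zero-padded string, comparing keys as text gives the same order as comparing the dates and times they represent, so the function can return a string without converting it.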

Image by the author. Highlighting the content of the variable name for each file name.

videos = [vids, vids2, vids3, vids4, vids5, vids6]
unprocess = []
for v in videos:
    for z in v:
        unprocess.append(z)
vids = sorted(unprocess, reverse=False, key=date_key)

Now, as we discussed, all the videos need to be sorted by the date & time key we just looked at. Perhaps the code still isn’t perfect, but now videos is a list of lists, and after some iteration unprocess is a single flat list of all the videos.

vids = sorted(unprocess, reverse=False, key=date_key)

Finally, vids is a list of video clips, now sorted in ascending order by date_key. Rather than sorting on the whole file name as text, we sort on a key taken from it. With the introduction of the sort key, the problem is solved.
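For readers who want a more compact variant, here is a sketch under stated assumptions and not the script itself. It flattens the lists with itertools.chain.from_iterable and builds the key from os.path.basename instead of comps[6], so it does not depend on how deep the folder is. The paths are hypothetical and only two cameras are shown.

```python
import os
from itertools import chain


def date_key(path):
    """Sort key: the timestamp part of '<event>-<YYYYMMDDhhmmss>.mkv'."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    return stem.split("-")[1]


# Hypothetical per-camera lists (two cameras instead of six, for brevity).
vids = ["/clips/cam1/01-20201207100210.mkv", "/clips/cam1/02-20201207180105.mkv"]
vids2 = ["/clips/cam2/01-20201207094235.mkv", "/clips/cam2/02-20201207110502.mkv"]

# Flatten the list of lists, then sort every clip by its timestamp.
ordered = sorted(chain.from_iterable([vids, vids2]), key=date_key)

for path in ordered:
    print(path)
```

With this sample data the clips come out as 09:42, 10:02, 11:05 and 18:01, with the two cameras interleaved by time instead of grouped by camera.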


The Mystery solved, the problem fixed, now what

I wanted to share this adventure, as part of the series, because Doing Data Science from Scratch really does mean from Scratch. Now that I have performed some review and made some calibration changes, it is time for another run. This is Science, and I love it. I hope you do too!

Once I do the next run, I will again review, and see where I am. Certainly, another step closer to the day when I will begin to count passing traffic and answer my research question. Towards Data Science and beyond.


Written By

David Moore
