Summary

  • Generative AI, like Stable Video Diffusion, has the potential to create highly realistic and customizable images and videos based on provided prompts.
  • The use of AI-generated images and videos for disinformation and privacy violations is becoming more common, challenging the reliability of visual evidence on the internet.
  • Stable Video Diffusion is just the beginning, and as AI technology continues to advance, the future of online media is threatened by potential misuse and ethical concerns.

The AI revolution has been one of the biggest and most important advancements of 2023. With OpenAI taking the world by storm with ChatGPT, and others like Bing Chat and Google Bard following, generative AI has proven to be a pretty powerful technology. Where it gets worrisome is AI image generation: tools that create custom-made images from the prompts provided to them. Now, with Stable Video Diffusion, things are about to get even worse.

I'm far from fearful when it comes to technology, and I think that generative AI has a lot of uses in both accessibility and fun contexts, but there's no doubt that the technology can be used for evil, too. Disinformation is becoming more and more frequent, and AI-generated fake images have already been shown to trick users in many different contexts. Remember that photo of Pope Francis that was going around where he was wearing a long white puffer jacket? That image wasn't real, but many people thought it was. Images are no longer the silver bullet of proof that people once expected them to be.

Given that it's already impossible nowadays to rely on images as sole proof of something, with videos being next on the chopping block, it's going to be harder than ever to rely on anything you see on the Internet as being real.

Stability AI's Stable Video Diffusion is scarily good

It's only in testing now, though

Stable Video Diffusion follows on from Stable Diffusion, an "open weights" model released last year that arguably kickstarted the wave of AI image generators, or at the very least played a significant part in it. The video version of the model is just as accessible and can be run by anyone who has one of the best Nvidia GPUs.

How this particular model works is pretty interesting, though at the moment it is quite limited in what it can really do. As Stability AI puts it, "While we eagerly update our models with the latest advancements and work to incorporate your feedback, this model is not intended for real-world or commercial applications at this stage. Your insights and feedback on safety and quality are important to refining this model for its eventual release."

There are currently two models available: SVD and SVD-XT. They generate 14 and 25 frames, respectively, at frame rates customizable between 3 and 30 FPS. With this kind of AI capable of doing so much, it's only a matter of time before people can homebrew deepfakes of anyone at home.
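To put those numbers in perspective, here is a minimal sketch of the arithmetic, assuming only the frame counts and FPS range quoted above. The function and constant names are my own, made up for illustration, and are not part of Stability AI's software.

```python
# Frame counts per model and the customizable FPS range, as quoted in the
# article. All names here are hypothetical and chosen for illustration only.
FRAMES_PER_MODEL = {"SVD": 14, "SVD-XT": 25}
MIN_FPS, MAX_FPS = 3, 30


def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Return how long a clip of num_frames lasts when played back at fps."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


def duration_range(model: str) -> tuple[float, float]:
    """Return the (shortest, longest) clip length in seconds for a model."""
    frames = FRAMES_PER_MODEL[model]
    # The highest frame rate gives the shortest clip, the lowest the longest.
    return (
        clip_duration_seconds(frames, MAX_FPS),
        clip_duration_seconds(frames, MIN_FPS),
    )


if __name__ == "__main__":
    for model in FRAMES_PER_MODEL:
        shortest, longest = duration_range(model)
        print(f"{model}: {shortest:.2f}s to {longest:.2f}s")
```

By this arithmetic, SVD-XT tops out at a little over eight seconds of footage at 3 FPS and lasts under a second at 30 FPS, so for now these are very short clips.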

Stable Video Diffusion will likely be easy to set up

That's not necessarily a good thing

When Stable Diffusion first took off, a friend of mine trained a model on his friend's face in order to add said friend to the Metal Gear Solid universe in a ridiculously silly custom-made gallery. It was a pretty cool gift and a lot of fun to work on and mess with (the friend gave full consent to have a model trained on his face), but thinking back on it now, I'm completely horrified.

With the hundreds of images of us that are out there, it's already been possible for people to train models on the faces of people who haven't given their consent, which means pretty much anyone who has publicly viewable photos of themselves. Now imagine being able to generate an image of somebody and then animate that image using Stable Video Diffusion.

There are many implications of this, ranging from privacy violations to the borderline illegal. I have already heard from women in the content creator space who have told me about fans using AI to generate pornography of them and sending it back to them, almost as if those "fans" were proud of the fact that they had violated another human being's privacy. This has been going on for over a year, and it's just one example that I'm familiar with. By no means is it the only privacy implication of tools like these, and in fact, it's likely only going to get worse.

Examples of Stable Video Diffusion are already available

Scary but incredible

The above video, released by Stability AI, shows the power of Stable Video Diffusion. Others have also gone on to show the power of the technology, demonstrating how it can make practically anything move and be animated in a small, few-second window. It takes a lot of computational power, but there are plenty of services like Hugging Face and Replicate from which people can essentially rent processing time. I ran it locally, using the image below (distributed with the Stable Video Diffusion software) to test how good it was.

The above image is one I suspect is AI generated, as I cannot find exact matches to it online. Nevertheless, it's a perfect candidate for testing. I ran the Stable Video Diffusion model locally with this image, and in just under an hour, had the following four-second clip.

This is shockingly good. While it's at a low frame rate now, as already mentioned, this is an in-development model that is not meant for general usage yet. I then tried it with my own photo of a train arriving in the mist.

Sadly, the result wasn't as good, though it was a more challenging photo for an AI to work with thanks to the fog.

Impressively, it still seemed to understand that the train was, well, a train. It just ended up moving over to the other train track. Still, this is beta software, and the results are impressive.

The future of online media is being threatened

Stable Video Diffusion is just the start

Regardless of what you may think of how impressive this tech is, it's only the beginning. This is one of the first openly available video models, and people will undoubtedly take it apart, improve upon it, and possibly use it with no care for ethics. The future of online media is in danger, largely thanks to AI video and images, and as they get better and better, there are far-reaching implications that will open more than one Pandora's box over the coming months and years.

As a computer scientist, I find the technology so impressive that it boggles the mind, and the rapid growth of the generative AI landscape is just as remarkable. As a person, however, I'm terrified by it.