Summary

  • Sora, the world's first true text-to-video algorithm, has massive implications for misinformation spread but safeguards are in place.
  • While Sora will be heavily safeguarded upon release, future text-to-video algorithms may present challenges due to potential exploitation risks.
  • Lack of regulation in the AI industry poses risks of harm, especially with the rise of deepfake videos and fake images circulating online.

Sora is the world's first true text-to-video algorithm, in that it can generate footage that is indistinguishable, at a passing glance, from real-life footage. It has massive ramifications when it comes to the spread of misinformation, but none of that is going to be a problem... yet.

Sora, developed by OpenAI, is currently being "red teamed" ahead of its release, meaning that people who weren't involved in building it simulate attacks on the model and try to get around its safeguards. If ChatGPT jailbreaks are anything to go by, though, it's almost impossible for them to catch every exploit. Humans are creative, and software is always vulnerable to exploitation.

When Sora releases, it'll be heavily safeguarded

What comes next probably won't be


When Sora eventually releases, it's very likely to be expensive to operate, costing end users money to generate videos, and will probably be actively monitored by staff at OpenAI. Because of this, Sora presents a high enough barrier to entry and presumably sufficient moderation to deter bad actors from using it for illegal activities or spreading disinformation. However, Sora is not my primary concern.

When ChatGPT was first released, companies around the world rushed to develop their own competitors that could match its capabilities. It took some time, but Google's Bard and Microsoft's Bing Chat eventually emerged, offering alternatives to OpenAI's ChatGPT in an effort to claw back some of the attention it had captured. All three platforms implemented built-in guardrails to uphold societal values such as diversity and inclusion. While some critics might argue that these guardrails hamper innovation, for many of these companies they were preferable to risking another incident like Microsoft's Tay chatbot (TayTweets).

Then Mixtral 8x7B emerged, and part of its appeal to many was its lack of enforced guardrails right out of the box. It was a truly uninhibited AI that could be asked anything within its training set, no matter how illegal or non-inclusive it might be. Even innocuous questions that might have triggered a seemingly misplaced guardrail in ChatGPT, Gemini, or Copilot would work just fine here. There's a massive appeal to that, but it also comes with a more sinister implication.

There's only so much damage one can do with text, whereas video is far more persuasive and, until now, has been a significantly harder medium to fake. Sora makes it easier, and if we encounter another situation like Mixtral, where a text-to-video algorithm comes along that's totally open and free, that's when we need to worry. It will still likely require a significant amount of computation and VRAM (for example, Stable Video Diffusion with its six-second videos requires a ton of both), but it then opens the door to bad actors with GPU farms. Imagine a global botnet of computers being used to generate videos. That may genuinely be feasible, particularly as we've already seen malware spread widely to infect computers and mine Bitcoin.

Regulation risks stifling innovation

But no regulation risks harming people

It's becoming increasingly clear that the AI industry requires regulation, but defining the specifics of such regulation is challenging. The European Union has made efforts to regulate AI, though these regulations are primarily focused on preventing its use in government sectors and for profiling individuals. However, the generation of deepfake images, such as those of Taylor Swift, underscores the necessity of broader regulatory measures.

Without regulation, the potential for harm in this emerging industry is undeniable. For instance, the authenticity of video evidence for crimes will become more difficult to establish as the capability to create convincing fake videos grows. The internet has already seen its share of humorous but misleading fake images, like those depicting the Pope in a fashionable white coat. Moreover, videos falsely showing the Eiffel Tower on fire circulated widely on platforms like TikTok and X (formerly Twitter), and many people thought that they were real.

Sora isn't what's going to cause the downfall of society, and hyperbole aside, video generation probably won't either. However, it will sow further distrust in established forms of communication and in video as proof that an event happened, which can have significant societal implications if left unchecked. OpenAI will almost certainly be careful around Sora and what it generates, but whatever comes next likely won't be so careful.
