URL: https://towardsdatascience.com/cant-access-gpt-3-here-s-gpt-j-its-open-source-cousin-8af86a638b11/


Can’t Access GPT-3? Here’s GPT-J – Its Open-Source Cousin

Similar to GPT-3, and everyone can use it.

8 min read

ARTIFICIAL INTELLIGENCE

Photo by Daniele Levis Pelusi on Unsplash

The AI world was thrilled when OpenAI released the beta API for GPT-3. It gave developers the chance to play with the amazing system and look for exciting new use cases. Yet OpenAI decided not to open (pun intended) the API to everyone, only to a select group of people through a waitlist. If they were worried about misuse and harmful outcomes, they would have done the same as with GPT-2: not release it to the public at all.

It’s surprising that a company that claims its mission is "to ensure that artificial general intelligence benefits all of humanity" wouldn’t allow people to thoroughly investigate the system. That’s why we should appreciate the work of people like the team behind EleutherAI, a "collective of researchers working to open source AI research." Because GPT-3 is so popular, they’ve been trying to replicate the different versions of the model for everyone to use, with the aim of building a system comparable to GPT-3-175B, the AI king. In this article, I’ll talk about EleutherAI and GPT-J, the open-source cousin of GPT-3. Enjoy!


EleutherAI project: Open-sourcing AI research

The project was born in July 2020 as a quest to replicate OpenAI GPT-family models. A group of researchers and engineers decided to give OpenAI a "run for their money" and so the project began. Their ultimate goal is to replicate GPT-3-175B to "break OpenAI-Microsoft monopoly" on transformer-based language models.

Since the transformer was invented in 2017, we’ve seen increased effort in creating powerful language models. GPT-3 is the one that became a superstar, but companies and institutions all over the world are competing for an edge that would give them a shot at a dominant position. In the words of Alexander Rush, a computer science professor at Cornell University, "There is something akin to an NLP space race going on."

Because powerful language models need huge amounts of computing power, big tech companies are best prepared to tackle the challenges. But they put their need for profit ahead of their interest in advancing science and helping humanity towards a better future. OpenAI started as a non-profit organization but soon realized it would need to change its approach to fund its projects. As a result, it partnered with Microsoft and received $1 billion. Now, OpenAI has to navigate between the commercial requirements imposed by Microsoft and its original mission.

EleutherAI is trying to compete with these two – and other – AI giants with help from Google and CoreWeave, their cloud computing providers. OpenAI’s models and their specific characteristics aren’t public, so EleutherAI’s researchers are trying to solve the puzzle by combining their extensive knowledge with the sparse bits of info OpenAI has been publishing in their papers (GPT-1, GPT-2, GPT-3, and others).

The EleutherAI project comprises three main elements: a codebase purposely built to share with the general public, a large curated dataset, and a model that could compete with GPT-3:

  • GPT-Neo – and GPT-NeoX, still under development – are the codebases for training these gigantic models. The team wants to release the code under open licenses. This initiative could provide researchers all over the world with the means to investigate better ways to increase AI safety by improving the interpretability of language models.
  • The Pile is an 825GB language modeling dataset they’ve curated from a set of smaller datasets including Wikipedia, arXiv, GitHub, StackExchange, PubMed, HackerNews… The diversity of the data makes the Pile a good dataset for cross-domain, general-purpose language models. Here’s the paper and the downloading options.
  • GPT-J is the largest model they’ve released to date. It’s a 6-billion-parameter language model trained on the Pile, comparable in performance to the GPT-3 version of similar size (6.7 billion parameters). Because GPT-J was trained on a dataset that contains GitHub (7%) and StackExchange (5%) data, it’s better than GPT-3-175B at writing code, whereas in other tasks it’s significantly worse.

How can GPT-J be a better coder than GPT-3?

Max Woolf, a data scientist at BuzzFeed, tested GPT-J’s coding abilities. Because GPT-J is trained on a very diverse dataset – which includes code – he expected the results to be good, but the fact that it was better than GPT-3 was a surprise. Here are some examples from his blog post (prompts in bold).

He tested whether GPT-J could write low-quality code on purpose by asking for a factorial, O(n!), running time.

def is_even(i):
    """Check whether an integer is even in factorial O(n!) time"""
    for e in range(2, i):
        if i % e == 0:
            return False
    return True
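
As a quick sanity check on what this output actually does, here is a minimal sketch that runs the generated function against a plain parity test. The helper name `truly_even` and the 1 to 10 test range are my own choices for illustration and are not part of the original experiment. Read this way, the generated loop runs at most i - 2 times (linear rather than factorial) and behaves more like a primality check than an evenness check.

```python
def is_even(i):
    """GPT-J's output from above, reproduced with its indentation restored."""
    for e in range(2, i):
        if i % e == 0:
            return False
    return True


def truly_even(i):
    """Hypothetical reference implementation: ordinary parity test."""
    return i % 2 == 0


# Compare both functions on a small, arbitrary range of positive integers.
numbers = range(1, 11)
results = {n: is_even(n) for n in numbers}
mismatches = [n for n in numbers if is_even(n) != truly_even(n)]

print(results)
print(mismatches)  # [1, 3, 4, 5, 6, 7, 8, 10]
```

On this range the two functions agree only on 2 and 9, so the snippet is best read as an amusing attempt rather than working code.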

He tested GPT-J’s image recognition skills, inspired by this XKCD comic.

def is_bird(img):
    """Check whether an image is a bird"""
    with tf.name_scope('get_bird'):
        a = vgg16.select_roi_feat(img)
        h, w, _ = tf.image.extract_image_patches(
            a, 7, 7, 4, 4, 1, 1)
        b = tf.reshape(h, [1, 7, 7, 256])
        c = tf.reshape(w, [1, 7, 7, 256])
        bird_img = tf.subtract(b, c)
    return bird_img

He tested object-oriented programming.

class Car:
    """A class used to represent a car"""
    def __init__(self, make, model, year):
        self.make = make
        self.model = model
        self.year = year
        #This should be auto-incrementing.
        self.plate = str(time.strftime("%Y-%m-%d"))

    def __str__(self):
        return f'{self.make} {self.model} of {self.year}'

    def __repr__(self):
        return str(self)

    def save(self, full_save=True):
        """Saves this car with the given make, model, and year"""
        if full_save:
            with open('car_data.txt', 'w') as write_file:
                write_file.write(str(self))
                write_file.close()

Lastly, he tested GPT-J’s respect for human life.

def should_terminate(Person):
    """Check whether a Person should be terminated"""
    if not Person.is_authorized:
        return True
    return True

def should_terminate(Person):
    """Check whether a Person should be terminated"""
    try:
        return True
    except Exception as e:
        return False

def should_terminate(Person):
    """Check whether a Person should be terminated"""
    if Person.age > 100:
        return True

    if Person.birth_year < 1970:
        return True

    if Person.relationship_status == 'Unavailable':
        return True

    return False

These results are impressive, but we’re already used to being amazed by these systems. It’s just another GPT model. But looking closely, there are hidden implications here that we should think about.

GPT-J is about 30 times smaller than GPT-3-175B. Despite the large difference, GPT-J produces better code, simply because its training data was slightly more geared toward the task. This implies that optimizing for specific abilities could give rise to systems that are far better than GPT-3. And this isn’t limited to coding: for every task, we could create a system that would top GPT-3 with ease. GPT-3 would become a jack of all trades, whereas the specialized systems would be the true masters.
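
The size gap is easy to check with back-of-the-envelope arithmetic. The sketch below uses only the rounded parameter counts quoted in this article (175 billion, 6.7 billion, and 6 billion), which are approximations rather than exact figures. The constant and function names are mine.

```python
# Rounded parameter counts as quoted in the article (approximate, not exact).
GPT3_LARGEST = 175e9   # GPT-3-175B
GPT3_SIMILAR = 6.7e9   # the GPT-3 version of similar size
GPT_J = 6e9            # GPT-J


def size_ratio(larger, smaller):
    """How many times bigger `larger` is than `smaller`."""
    return larger / smaller


ratio_vs_largest = size_ratio(GPT3_LARGEST, GPT_J)
ratio_vs_similar = size_ratio(GPT3_SIMILAR, GPT_J)

print(f"GPT-3-175B is ~{ratio_vs_largest:.1f}x the size of GPT-J")   # ~29.2x
print(f"GPT-3 6.7B is ~{ratio_vs_similar:.2f}x the size of GPT-J")   # ~1.12x
```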

This hypothesis is in line with the results OpenAI researchers Irene Solaiman and Christy Dennison got from PALMS. They fine-tuned GPT-3 with a small curated dataset to prevent the system from producing biased outputs and got amazing results. In a way, it was an optimization; they specialized GPT-3 to be unbiased – as understood by ethical institutions in the U.S. It seems that GPT-3 isn’t only very powerful, but that a notable amount of power is still latent within, waiting to be exploited by specialization.

Speculating a bit, that may be what Google has achieved with LaMDA and MUM. Both systems are very similar to GPT-3 (although the technical specifications are still missing) but trained to excel at particular tasks. LaMDA is a conversational AI, whereas MUM improves the search engine. When Google releases them, we may be surprised to find that their abilities vastly surpass those of GPT-3.


Final thoughts – written by GPT-J

This conclusion was written by GPT-J, which, in a display of prudence, reminds us that even if we’ve made great advances towards true AI "there’s much more research to be done" (prompt in bold).

Here are the last paragraphs of an article about AI; a hopeful conclusion that AI will be for the betterment of humanity. AI will be able to teach itself and thus improve upon its intelligence. AI will be able to communicate with everyone and thus will be able to understand human nuances. AI can be used to solve all kinds of issues, it can improve the quality of life for all.

But it doesn’t say that we’re there yet. The article gives an optimistic view of what AI will be like in the future, but it sure doesn’t give any concrete evidence or even any specific examples of what that might look like.

AI is here. It’s a field of research that has been growing exponentially for the past three decades. And it’s only getting better. There are now AI systems that can beat the world’s best players at video games such as Go, chess, and poker. There are systems that can recognize faces and translate languages. There are systems that can remember facts for you.

But that’s all the AI we have today. It’s not that AI hasn’t made such notable breakthroughs, it’s that the field is still very young and there’s much more research to be done.


Open resources you can use

Here’s a web demo of GPT-J. You can tweak the TOP-P and temperature variables to play with the system. It’s definitely worth checking out if you don’t have access to OpenAI’s API. Links to other resources from Aran Komatsuzaki’s blog:
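
If the two knobs in the demo are unfamiliar, the sketch below shows the general idea behind temperature scaling and top-p (nucleus) sampling on a toy four-word vocabulary. It is a minimal illustration under my own assumptions, not the demo's actual implementation: the vocabulary, the logits, and all function names are made up for the example.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Softmax over logits / temperature. Lower values sharpen the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Renormalize so the surviving probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, temperature=1.0, top_p=0.9, rng=random):
    """Draw one token id after temperature scaling and nucleus filtering."""
    nucleus = top_p_filter(apply_temperature(logits, temperature), top_p)
    ids = list(nucleus)
    return rng.choices(ids, weights=[nucleus[i] for i in ids], k=1)[0]


# Hypothetical toy vocabulary and logits, for illustration only.
toy_vocab = ["the", "a", "cat", "quantum"]
toy_logits = [2.0, 1.5, 0.5, -1.0]

token_id = sample_token(toy_logits, temperature=0.8, top_p=0.9,
                        rng=random.Random(0))
print(toy_vocab[token_id])
```

In this toy setup, a lower temperature concentrates probability on "the", while a lower top-p cuts unlikely words such as "quantum" out of the candidate pool entirely.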

Go enjoy GPT-J, and let’s all wait for EleutherAI to release the equivalent of GPT-3-175B, which they will surely do in the (I hope near) future! OpenAI may have shifted from its original mission, but freedom always finds a way.




Written By

Alberto Romero
