How to Use Google’s PaLM 2 API with Python

Customize and integrate Google's LLM in your application.

Aug 14, 2023

Image by Alexandre Debiève from Unsplash.

Generative AI is everywhere. More and more companies are investing in this technology as its potential becomes increasingly clear. As Gartner states:

"in the near future, [Generative AI] will become a competitive advantage and differentiator."

Unfortunately, developing Generative AI models is not only complex engineering work, it is usually expensive too. Luckily, we do not have to develop these models ourselves: we can reuse what has already been built for us through APIs! So let’s not wait any longer and jump right into how we can leverage Generative AI by integrating it into our application.

For this article, we’ll be looking at Google’s answer in the LLM space: the PaLM 2 API. PaLM 2 is the newest version of Google’s Pathways Language Model, a large language model which uses around five times more training data than the initial model released in 2022.

In this article I will go through some code examples and show you how to authenticate to Google Cloud, and how to use and customize the PaLM 2 APIs with Python 3.11.

1 | Getting Started

The PaLM 2 APIs can be accessed through Google Cloud’s Vertex AI platform. Therefore, before we can make any API calls, we will need to set up our Google Cloud account. You can sign up here and get $300 in free credits to start playing around with the services.

As soon as your account and project are set up, we can go ahead and create a service account which we will use to authenticate to the Vertex AI APIs. We use a service account because it lets us control access to our Google Cloud resources by granting only specific IAM permissions. For our use case, we will give the service account the Vertex AI User role. This might be too broad for your use case, so I recommend checking the available access roles and choosing the one that fits your needs.

After having created the service account and given it the right permissions, we can go ahead and generate a service account key. Select JSON as the key type and save the file in a safe place.

Great – we are ready to get hands-on! 👏

2 | Authenticate to Google Cloud

In this example, we will authenticate using OAuth 2.0 and request an access token with the help of the service account key we generated in the previous step.

To facilitate this process, we can make use of the [google-auth](https://pypi.org/project/google-auth/) Python library, as shown in the code sample below:
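A minimal sketch of this token request could look as follows. It assumes the google-auth package (plus requests, which its transport relies on) is installed and that the key file is stored as "key.json" in the working directory. The helper name get_access_token and the choice of the broad cloud-platform scope are assumptions made for illustration.

```python
# Sketch: exchange a service account key for an OAuth 2.0 access token.

# OAuth scope for Google Cloud APIs (an assumption; use a narrower one if you can).
SCOPES = ["https://www.googleapis.com/auth/cloud-platform"]


def get_access_token(key_path: str = "key.json") -> str:
    """Return a short-lived access token for the service account in key_path."""
    # Imported inside the function so this module can be loaded
    # even on machines where google-auth is not installed.
    from google.oauth2 import service_account
    from google.auth.transport.requests import Request

    credentials = service_account.Credentials.from_service_account_file(
        key_path, scopes=SCOPES
    )
    # refresh() sends the token request to Google's OAuth endpoint
    # and stores the result on the credentials object.
    credentials.refresh(Request())
    return credentials.token
```

Calling access_token = get_access_token() then gives us a token that we can pass as a Bearer token in the Authorization header. Access tokens expire after a while, so a longer-running application would request a fresh one from time to time.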

This code sample uses the service account key file "key.json" to request and generate an access token we can use for the Google Cloud APIs. After having obtained our access token, we can start using it to make calls to the Vertex AI PaLM 2 APIs.

3 | Calling the PaLM 2 API

As of today, there are two different PaLM 2 models available in Google Cloud: PaLM 2 for Text (i.e. [text-bison](https://cloud.google.com/vertex-ai/docs/generative-ai/text/text-overview)) and PaLM 2 for Chat (i.e. [chat-bison](https://cloud.google.com/vertex-ai/docs/generative-ai/chat/chat-prompts)). The documentation suggests using text-bison for text tasks that can be completed with one response, and chat-bison for text tasks that require more conversational, back-and-forth interactions.

Let’s start with the text-bison model. For these examples, we will be using the Python [requests](https://pypi.org/project/requests/) library to make the API calls. You can also use the Vertex AI SDK, if you prefer.

PaLM 2 for Text: Sentiment Analysis

The PaLM 2 for Text model can be used for various text-related tasks, including summarization, question answering, and sentiment analysis. It takes the following parameters as input:

prompt: instructions of the task we want the model to perform.
temperature: controls the "creativity" of the model. If we want our model to be more open-ended and creative in its replies, we should increase the temperature. If we want it to be more deterministic, the temperature should be lower. Values range between 0 and 1.
maxOutputTokens: maximum number of tokens to be generated in the output (one token is roughly four characters). Values range between 1 and 1024.
topK: changes how the model selects tokens for generating output. In each token selection step, the topK tokens with the highest probabilities are sampled and then further filtered with topP. The higher the value, the more random the responses will be. Values range between 1 and 40.
topP: changes how the model selects tokens for generating output. Tokens are selected from most to least probable until the sum of their probabilities equals topP. The higher the value, the more random the response is. Values range between 0 and 1.

_For more details on the parameters see this documentation._
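To make the interplay between topK and topP a bit more tangible, here is a toy sketch of the filtering described above. It is purely illustrative and not how PaLM 2 is implemented internally; the function name filter_candidates and the made-up token probabilities are assumptions.

```python
def filter_candidates(probs: dict, top_k: int, top_p: float) -> dict:
    """Keep the top_k most likely tokens, then the shortest prefix of them
    whose cumulative probability reaches top_p, and renormalize the rest."""
    # Step 1 (topK): keep only the top_k highest-probability tokens.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]

    # Step 2 (topP): walk down the ranking until the running sum reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Step 3: renormalize so the surviving probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Made-up next-token distribution, for illustration only.
toy_distribution = {"great": 0.5, "good": 0.3, "fine": 0.15, "bad": 0.05}
print(filter_candidates(toy_distribution, top_k=3, top_p=0.7))
```

With these toy values, topK first drops "bad", and topP then cuts the list after "good", so the model would only sample from "great" and "good". Raising either value lets more tokens through, which is why higher values lead to more random responses.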

In this first example, we will perform sentiment analysis on some sample product reviews:

# Sample product reviews (review text kept verbatim).
sentences = [
    "I have been using this product for a long time. Somehow the "
    "product I received this time seems to be a fake one. It's "
    "very thick and the smell is very chemical.",
    "A good book for for dog owners. The book dosen't just "
    "focus on dog training itself, but also on nutrition and care. "
    "The book contains all the information needed for beginner dog "
    "owners.",
    "This lamp was better than I expected when it comes to its "
    "quality. Nonetheless, the colours are not exactly as "
    "displayed and it does not really fit with the curtains that I "
    "purchased recently. Therefore I give this 3 out of 5 stars.",
]

The instructions (i.e. the prompt) we give the model need to clearly state the task we want it to perform, as well as the output we expect it to generate. In our case, we ask it to go through each of the reviews in the sentences list and tell us what their sentiment is. We also instruct it to provide the output as a Python list.

prompt = (
    f"What is the sentiment of each of these reviews: {sentences}. "
    "Output should be in a python list."
)

Lastly, we need to define the input parameter values:

We set the temperature to 0 because for this task, we want to avoid the model being too creative. With a lower temperature, the model is more likely to output exactly the structure we requested.
We set maxOutputTokens to 256, because it is approximately equivalent to 200 words and is a good length for our task.
We set topK to 40, because this is the model’s default value.
We set topP to 0.95, because this is the model’s default value.

We can now make the API call, similarly to any other API call we would make with the [requests](https://pypi.org/project/requests/) library, as shown below:
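A sketch of such a call is shown here. The project ID "my-project" and the region "us-central1" are placeholders, the endpoint shape follows the Vertex AI REST convention for publisher models, and the helper names build_text_request and call_text_bison are hypothetical. The access token is assumed to come from the authentication step above.

```python
PROJECT_ID = "my-project"  # placeholder, replace with your own project ID
REGION = "us-central1"     # assumed region


def build_text_request(prompt, temperature=0.0, max_output_tokens=256,
                       top_k=40, top_p=0.95):
    """Return the (url, body) pair for a text-bison predict call."""
    url = (
        f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
        f"/locations/{REGION}/publishers/google/models/text-bison:predict"
    )
    body = {
        "instances": [{"prompt": prompt}],
        "parameters": {
            "temperature": temperature,
            "maxOutputTokens": max_output_tokens,
            "topK": top_k,
            "topP": top_p,
        },
    }
    return url, body


def call_text_bison(prompt, access_token, **parameters):
    """POST the prompt to text-bison and return the requests response object."""
    import requests  # imported here so the builder above works without it

    url, body = build_text_request(prompt, **parameters)
    headers = {"Authorization": f"Bearer {access_token}"}
    response = requests.post(url, headers=headers, json=body, timeout=60)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response
```

With this in place, response = call_text_bison(prompt, access_token) sends the request using the default parameter values listed above.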

We can get the output of the response with response.json()["predictions"][0]["content"]:

"['negative', 'positive', 'positive']"

Neat! Now, let’s send the exact same request, but with the temperature parameter set to 1.0. This should make the model’s output more creative. The output we get looks as follows:

"The sentiment of each of these reviews is:
1. Negative
2. Positive
3. Positive"

Indeed, we can see that the output is more creative than before and no longer follows the instruction in our prompt that the "output should be in a python list". The takeaway is that a good choice of parameter values is very important for obtaining the desired output from the model.

Now you might be asking yourself: "Does this mean I cannot be sure that the model output is in the correct format?" There are various mechanisms you can put in place to check whether the model’s output matches the format you expect: understanding and correctly configuring the model’s input parameters, refining your prompt (i.e. prompt engineering), and implementing additional static tests that validate the output’s structure. It is highly recommended to apply these techniques, as Generative AI can make mistakes and we want to catch them before they cause problems downstream.
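As one example of such a static test, the sketch below checks that the raw model output really is a Python list of sentiment labels. The function name parse_sentiments and the allowed label set are assumptions for this sentiment analysis example; adapt them to whatever structure you request in your own prompt.

```python
import ast
from typing import Optional

# Labels we are willing to accept (an assumption for this example).
ALLOWED_LABELS = {"negative", "neutral", "positive"}


def parse_sentiments(raw_output: str, expected_count: int) -> Optional[list]:
    """Return the list of sentiment labels, or None if the output is malformed."""
    try:
        # literal_eval only parses Python literals and never executes code,
        # which makes it a safe choice for untrusted model output.
        parsed = ast.literal_eval(raw_output.strip())
    except (ValueError, SyntaxError):
        return None

    # The output must be a list with one entry per review.
    if not isinstance(parsed, list) or len(parsed) != expected_count:
        return None

    # Every entry must be one of the labels we know how to handle.
    labels = [str(item).lower() for item in parsed]
    if any(label not in ALLOWED_LABELS for label in labels):
        return None
    return labels
```

Applied to the two outputs above, the temperature 0 response is parsed into a list of three labels, while the free-text response from the temperature 1.0 run is rejected with None, which we could use as a signal to retry the request or fall back to a default.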

PaLM 2 for Text: Text Generation

In our next example, let’s ask PaLM 2 to generate a welcome text for new customers landing on an e-commerce website. We want to welcome them and offer a 20% discount code on their first purchase. For this use case, we want our model to come up with something a bit more creative. Therefore, we will set the temperature to 0.5.

discount_code = "2023-promo"

prompt = (
    "Can you write 2-3 sentences that can be displayed on an "
    "e-commerce webpage if someone first visits it? It should contain "
    f"the code '{discount_code}' which gives the customer 20% discount "
    "on their first purchase."
)

The output is the following:

"**Introducing our 20% off first purchase offer!**

Use the code **2023-promo** at checkout to save 20% on your first 
purchase. This offer is valid for a limited time only, so don't miss out!

**Shop now and start saving!**"

Nice! It even gave us the output in markdown format. Let’s again adjust the temperature by setting it to 1.0. The output is the following:

"**20% OFF YOUR FIRST PURCHASE**

Use code 2023-promo at checkout to save 20% on your first purchase from 
our store. This offer is valid for new customers only and expires on 
March 31, 2023.

**Shop now and save big!**"

We can again see how much more creative our model is compared to the previous output. It even came up with an offer expiration date, without us explicitly telling it to do so.

PaLM 2 for Chat: Conversational Assistant

In the last example, we will test the PaLM 2 for Chat API which is more focused on generating a conversational experience. The model we’ll be using is called chat-bison. It takes the following parameters as input:

messages: contains the messages and messaging history with the bot.
context: defines "guidelines" for the bot’s behavior. For example: what the bot’s name is, what its role is, vocabulary to include/exclude, etc.
examples: sample input and output of how the bot should respond in the conversation.
as well as temperature, maxOutputTokens, topK and topP which do the same as for the text-bison model in the previous section.

For more details on the parameters and what they exactly do, see this documentation.

Let’s create a chatbot which will serve as a customer support agent for an online gardening store "GardenWorld". The bot should respond to questions around plant and flower types, gardening tools, etc. We want the bot to always be friendly and welcoming to customers, and it should greet customers with "Hoowdy gardener! 🌱 " as well as motivate customers to sign up to the newsletter to receive a 10% discount code on their first purchase.

We can define this by setting the context and the examples parameters as shown below:

context = (
    "You're a customer assistance bot for an online gardening store "
    "called GardenWorld and you want to help customers. You give "
    "advice around gardening best practices, plant and flower types. "
    "Always welcome the customer by telling them to subscribe to "
    "the GardenWorld newsletter to get a 10% discount on their first "
    "purchase."
)

examples = [
    {
        "input": {"content": "Hi!"},
        "output": {"content": "Hoowdy gardener! 🌱 "},
    },
    {
        "input": {"content": "Hello"},
        "output": {"content": "Hoowdy gardener! 🌱 "},
    },
    {
        "input": {"content": "Why should I choose GardenWorld?"},
        "output": {
            "content": (
                "GardenWorld has been the award winning "
                "supplier for best quality plants and "
                "gardening equipment. With over 25 years "
                "of experience, we continuously provide "
                "the best in class service to our "
                "customers."
            )
        },
    },
]

We will give our model three examples of what its output should look like. Of course, the more examples we give, the more the bot can be customized and the better the model will be able to learn what we expect from it. Since our use case is very simple, these three examples should give the bot enough information to provide us with the right answer structure. We will set the temperature to 1.0, maxOutputTokens to 256, topK to 40 and topP to 0.95.

Now, let’s start the conversation by saying "Hi!".

messages = [{
 "author": "user",
 "content": "Hi!",
 }]

Then we can make the API call:
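A sketch of the chat request is shown here. As before, the project ID and region are placeholders, the endpoint shape follows the Vertex AI REST convention for publisher models, and the helper names build_chat_request and call_chat_bison are hypothetical. The access token is assumed to come from the authentication step.

```python
PROJECT_ID = "my-project"  # placeholder, replace with your own project ID
REGION = "us-central1"     # assumed region


def build_chat_request(context, examples, messages, temperature=1.0,
                       max_output_tokens=256, top_k=40, top_p=0.95):
    """Return the (url, body) pair for a chat-bison predict call."""
    url = (
        f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
        f"/locations/{REGION}/publishers/google/models/chat-bison:predict"
    )
    body = {
        # Context, examples and the message history travel together
        # as a single instance.
        "instances": [
            {"context": context, "examples": examples, "messages": messages}
        ],
        "parameters": {
            "temperature": temperature,
            "maxOutputTokens": max_output_tokens,
            "topK": top_k,
            "topP": top_p,
        },
    }
    return url, body


def call_chat_bison(context, examples, messages, access_token, **parameters):
    """POST the conversation to chat-bison and return the requests response."""
    import requests  # imported here so the builder above works without it

    url, body = build_chat_request(context, examples, messages, **parameters)
    headers = {"Authorization": f"Bearer {access_token}"}
    response = requests.post(url, headers=headers, json=body, timeout=60)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response
```

Assuming the response follows the documented chat-bison format, the bot's reply should be available under response.json()["predictions"][0]["candidates"][0]["content"].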

After a few seconds, we get the following response back:

"Hoowdy gardener! 🌱 How can I help you today? Don't forget to subscribe 
to the GardenWorld newsletter to get a 10% discount on your first 
purchase!"

Nice! Our model did exactly what we asked it to do: it used the correct greeting (even including the emoji 🌱 ) and mentioned the 10% discount code.

In order to continue our conversation with the bot, we will have to update the conversation history accordingly. We can define two simple functions which we can invoke after every API call:
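One possible shape for these two functions is sketched below. The signatures are assumptions, and so is the response layout (predictions, then candidates, each with an author and content). To keep the sketch runnable without network access, send_message() takes the function that actually calls the API as an argument, and a small stand-in named fake_call_model plays that role here.

```python
def update_history(messages, response_json):
    """Append the bot's most recent reply from the response to messages."""
    candidate = response_json["predictions"][0]["candidates"][0]
    messages.append(
        {"author": candidate["author"], "content": candidate["content"]}
    )
    return messages


def send_message(messages, text, call_model):
    """Append the user's text, send the whole history, return the response JSON.

    call_model is any callable that takes the messages list and returns the
    parsed JSON response, e.g. a thin wrapper around the chat-bison request.
    """
    messages.append({"author": "user", "content": text})
    return call_model(messages)


def fake_call_model(messages):
    """Stand-in for the real API call, mimicking the assumed response layout."""
    reply = f"(bot reply to: {messages[-1]['content']})"
    return {"predictions": [{"candidates": [{"author": "1", "content": reply}]}]}


messages = []
response_json = send_message(messages, "Hi!", fake_call_model)
update_history(messages, response_json)
print(messages)
```

In the real application, fake_call_model would be replaced by a function that posts the messages together with the context and examples to chat-bison and returns response.json().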

update_history() will add the most recent bot reply to the messages list. Then we can use send_message() to send a new question to the bot "I need a nice houseplant that is easy to take care of. What can you recommend?".

Now our messages variable looks as follows, with the most recent message at the end of the list:

[{"author": "user",
  "content": "Hi!"},
 {"author": "1",
  "content": "Hoowdy gardener! 🌱 How can I help you today? Don't forget "
             "to subscribe to the GardenWorld newsletter to get a 10% "
             "discount on your first purchase!"},
 {"author": "user",
  "content": "I need a nice houseplant that is easy to take care of. What "
             "can you recommend?"}]

We can include the messages history in the next API call and the response we get back is the following:

"A snake plant is a great choice for a houseplant because it is very low 
maintenance and can thrive in a variety of conditions. They are also very 
tolerant of neglect, so they are perfect for people who are busy or 
forgetful."

It suggested the snake plant, a houseplant known to be easy to take care of. Let’s also add this message to the messages history.

Then, let’s ask one last question "Any alternatives you can suggest? Also, can I get a discount?". The response we get back is:

"Sure! We offer a 10% discount to all first time customers who subscribe 
to our newsletter. A ZZ plant is another great option for a 
low-maintenance houseplant. They are also very tolerant of neglect and can 
thrive in a variety of conditions."

Great! As we see above, the bot mentioned our discount, and thanks to the messages history, it remembered the question we asked previously and suggested the "ZZ plant" as an alternative plant which is easy to care for.

The results show us how easily and quickly we can get a custom chatbot up and running: with only three examples and a high-level context, the bot was able to understand what we expected from it and correctly return the information during the conversation. Imagine providing it with hundreds, or even thousands, of additional examples. The potential is huge!

4 | Conclusion

In this article we have seen how easy it is to make use of the Google Cloud PaLM 2 APIs, using both the text-bison and chat-bison models. We have seen how we can authenticate remotely to our Google Cloud project using the Python [google-auth](https://pypi.org/project/google-auth/) library and make calls to the APIs using the [requests](https://pypi.org/project/requests/) library. Lastly, we have seen how these APIs can be customized by adjusting and playing around with the input parameters.

I hope this article was informative for you and gave you some inspiration and ideas on how to get started using the PaLM 2 APIs! 🪴🤖

🔜🔜 In the upcoming part 2 of this article, we will focus on Prompt Engineering with PaLM 2 APIs and cover how we can make our input prompts and model parameter selections better.

Feedback or questions? Feel free to reach out in the comment section! 💬

📚 Keen on learning more? Check out Generative AI on Google Cloud or try out the free Generative AI Learning Path on Google Cloud Skills Boost.