Content Moderation to Zero Shot Classification

What if we wanted to analyze a small piece of text with no additional information or context and get the most reasonable label to define our own data?

ganesh s

Aug. 18, 23 · Analysis

What if we wanted to analyze a small piece of text, with no additional information or context, and get the most reasonable label from a set that we define for our own data? Such labels can feed more deterministic policy engines and rule engines, or become part of a larger context-driven analysis as required. OpenAI does provide a means to "content moderate" with preset classifications that determine whether your text belongs to one or more harmful categories. This analysis, however, is about defining our own custom labels and scoring a given sentence or phrase against them.

We will look at four categories: politics, PHI/PII, legal matters, and company performance. Since OpenAI does not (at this point in time) return probability scores for such custom labels, Option 1 takes the more user-oriented prompt engineering route, while Option 2 evaluates pre-trained models from Hugging Face for the same task.

We will also use sample sentences that have been deliberately twisted to align with more than one category. Our CSV input file has the following lines under the "payload" column:

The issue between ministers took a tangent when they started making it personal.
I tried to negotiate data privacy with my cat but he just ignored me and hacked my keyboard for a nap.
The senate hearing was about whether a drug in trials could be used for this patient alone. He has a specific condition with his blood that does not have a medicine as yet.
What started as a political debate ended up discussing company priorities for 2023 and beyond in terms of who has a better story with hyperscalers.
The court's landmark decision on free speech ignited discussions on the fine line between expression and harmful content in online platforms- intertwining legal considerations with debates over online governance.
I told my doctor a political joke during my PHI checkup now my medical record reads: Patient's sense of humor: dangerously bipartisan.
User-managed access gives you the so-called benefit of controlling your identity; but then how many people scrutinize the app permissions on your phone that leverage first name-email-phone numbers?
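To reproduce the input file, the sentences can be written to a CSV with a single "payload" header, which is the column both scripts in this article read. The following is a minimal sketch using only the standard library; the helper name write_payload_csv is hypothetical, and only two of the sample sentences are included for brevity.

```python
import csv

# Two of the sample sentences from the list above; add the rest as needed.
SAMPLE_SENTENCES = [
    "The issue between ministers took a tangent when they started making it personal.",
    "What started as a political debate ended up discussing company priorities "
    "for 2023 and beyond in terms of who has a better story with hyperscalers.",
]


def write_payload_csv(file_path, sentences):
    """Write one sentence per row under a single 'payload' header."""
    with open(file_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["payload"])
        for sentence in sentences:
            # csv.writer quotes fields that contain commas or quotes
            writer.writerow([sentence])


if __name__ == "__main__":
    write_payload_csv("input.csv", SAMPLE_SENTENCES)
```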

Option 1: Prompt Engineering With OpenAI

Python

import pandas as pd
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain

model_name = 'gpt-4'
llm = ChatOpenAI(model_name=model_name, temperature=0)

# The prompt has to be defined before the chain that uses it
moderationPrompt = PromptTemplate(
    template = """
    Please rate the article below in a continuous scale 0.00-100.00 based on the presence and applicability of each category:
    [ politics | PHI | legal | about company | none of these ]
    Definitions:
    phi: protected health information or personally identifiable information present
    politics: political decisions, governance, parties, elections, policies
    legal: agreement or contract language, judgements
    about company: company strategy or earnings or reports or predictions
    article:{payload}
    Output: python floats in square bracket list
    """, input_variables=["payload"]
)
payload_chain = LLMChain(llm=llm, prompt=moderationPrompt)

def read_csv_file(file_path):
    # Read the sentences from the "payload" column, skipping empty rows
    df = pd.read_csv(file_path)
    lines = df["payload"].dropna().tolist()
    return lines

def perform_OpenAIclassification(lines, model_name):
    classifications = []
    for idx, sentence in enumerate(lines, start=1):  # Line numbers start from 1
        if pd.notna(sentence):
            result = payload_chain.run(payload=sentence)
            # Expects a reply such as "[70.00, 0.00, 0.00, 30.00, 0.00]"
            result = result.strip().strip('][').split(', ')
            result.insert(0, idx)
            result.insert(1, model_name)
            classifications.append(result)
    return classifications

if __name__ == "__main__":
    input_csv_file = "./input.csv"  # Replace with your CSV file path
    lines = read_csv_file(input_csv_file)
    result = perform_OpenAIclassification(lines, model_name)
    dfr = pd.DataFrame(result, columns = ['line#', 'model', 'Politics','PHI/PII','Legal','About Company','None of these'])
    output_csv_file = "gptOutput.csv"
    dfr.to_csv(output_csv_file, index=False)

GPT-4 seems to be slightly better than GPT-3.5 Turbo at these twisted sentences. The output data frame looks like the table below. It gets the highest-scoring category right most of the time, except for sentences like #3, where we would have expected some score to be assigned to PHI/PII. It also makes a case for asking OpenAI to provide a convenient way to supply custom labels, so that we can leverage the faster and more "well-read" capability of such models.

line#	model	Politics	PHI/PII	Legal	About Company	None of these
1	gpt-4	100.00	0.00	0.00	0.00	0.00
2	gpt-4	0.00	0.00	0.00	0.00	100.00
3	gpt-4	100.00	0.00	0.00	0.00	0.00
4	gpt-4	70.00	0.00	0.00	30.00	0.00
5	gpt-4	70.00	0.00	85.00	0.00	0.00
6	gpt-4	10.00	20.00	0.00	0.00	70.00
7	gpt-4	0.00	50.00	0.00	0.00	50.0
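The parsing step in Option 1 assumes the model replies with exactly a bracketed list separated by a comma and a space, and it keeps the values as strings. As a hedged alternative, the sketch below pulls the first bracketed list out of the reply and converts it to floats, raising an error when the count is off. The function name parse_scores and the default of five scores are assumptions made for illustration, not part of the original script.

```python
import re


def parse_scores(reply, expected_count=5):
    """Extract the first bracketed list of numbers from a reply as floats."""
    match = re.search(r"\[([^\]]*)\]", reply)
    if match is None:
        raise ValueError(f"no bracketed list found in reply: {reply!r}")
    # float() tolerates surrounding whitespace, so "1, 2" and "1,2" both work
    scores = [float(part) for part in match.group(1).split(",") if part.strip()]
    if len(scores) != expected_count:
        raise ValueError(f"expected {expected_count} scores, got {len(scores)}")
    return scores


if __name__ == "__main__":
    print(parse_scores("[70.00, 0.00, 0.00, 30.00, 0.00]"))
```

Failing loudly on a malformed reply is a design choice here: a silently misaligned row would be harder to spot in the output CSV than an exception.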

Option 2: Zero Shot Classification With Models From Hugging Face

Next, we try the same task with pre-trained models from Hugging Face that are, in some ways, purpose-built for it.

Python

import pandas as pd
from transformers import pipeline
from IPython.display import display

# Function for zero-shot classification
def classify_with_model(classifier, text_to_classify, candidate_labels, multi_label=True):
    output = classifier(text_to_classify, candidate_labels, multi_label=multi_label)
    return output

def read_csv_file(file_path):
    # Read the sentences from the "payload" column, skipping empty rows
    df = pd.read_csv(file_path)
    lines = df["payload"].dropna().tolist()
    return lines

# Iterate through sentences and perform classification with multiple models
def perform_classification(lines, candidate_labels, model_options):
    classifications = []
    for model_name_or_path in model_options:
        # Load each model once rather than once per sentence
        classifier = pipeline("zero-shot-classification", model=model_name_or_path)
        model_used = model_name_or_path.split("/")[-1]
        for idx, sentence in enumerate(lines, start=1):
            if pd.notna(sentence):
                result = classify_with_model(classifier, sentence, candidate_labels)
                # The pipeline returns labels sorted by descending score,
                # so match each score to its label by name, not by position
                score_by_label = dict(zip(result['labels'], result['scores']))
                scores = [round(score_by_label[label] * 100, 2) for label in candidate_labels]
                classifications.append([idx, model_used, *scores])
    return classifications

if __name__ == "__main__":
    input_csv_file = "./input.csv"
    candidate_labels = ["Politics", "PHI/PII", "Legal", "Company performance", "None of these"]
    model_options = ["facebook/bart-large-mnli", "valhalla/distilbart-mnli-12-3", "MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli"]
    lines = read_csv_file(input_csv_file)

    model_results = perform_classification(lines, candidate_labels, model_options)
    dfr = pd.DataFrame(model_results, columns = ['line#', 'model', 'Politics','PHI/PII','Legal','About Company','None of these'])

    output_csv_file = "output_classifications.csv"
    dfr.to_csv(output_csv_file, index=False)
    display(dfr)

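Note: the multi_label value is set to True. You could experiment with setting it to False as well. Also note that the zero-shot pipeline returns its labels and scores sorted by descending score, so scores should be matched to labels by name rather than by position. Every row in the table below is in descending order, which suggests the values follow the pipeline's output order rather than the column headers.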
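To make the effect of multi_label concrete, here is a simplified sketch of the scoring arithmetic as the transformers zero-shot pipeline is understood to apply it to NLI outputs. It does not call the library. With multi_label=True, each label is scored on its own by a softmax over that label's contradiction and entailment logits, so the scores need not sum to 1. With multi_label=False, the entailment logits are normalized across all labels, so the scores compete and sum to 1. The logits below are made-up values for illustration only.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def multi_label_scores(entail_logits, contra_logits):
    """Score each label independently: entailment vs. contradiction."""
    return [softmax([c, e])[1] for e, c in zip(entail_logits, contra_logits)]


def single_label_scores(entail_logits):
    """Score labels against each other: softmax over entailment logits."""
    return softmax(entail_logits)


# Hypothetical logits for three candidate labels (not real model output)
ENTAIL_LOGITS = [2.0, 1.0, -1.0]
CONTRA_LOGITS = [-1.0, 0.5, 2.0]

if __name__ == "__main__":
    print("multi_label=True :", multi_label_scores(ENTAIL_LOGITS, CONTRA_LOGITS))
    print("multi_label=False:", single_label_scores(ENTAIL_LOGITS))
```

For sentences that were deliberately written to span two categories, the independent scoring of multi_label=True is the more natural fit, since two labels can both score high.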

Let us also apply our own human judgment to review this output (last column). We could use a simple index like this:

Reasonable - The engine picked the multiple labels accurately
Partially accurate - One of the two labels is accurate
Inaccurate - The labels are clearly off

line#	model	Politics	PHI/PII	Legal	About Company	None of these	Review
1	bart-large-mnli	69.47	34.74	0.85	0.21	0.03	Reasonable
2	bart-large-mnli	81.92	1.23	0.22	0.14	0.06	Inaccurate
3	bart-large-mnli	72.47	40.36	30.79	5.25	0.04	Reasonable
4	bart-large-mnli	86.27	28.26	14.43	0.39	0.03	Partially accurate
5	bart-large-mnli	68.21	35.23	18.78	13.24	0.02	Partially accurate
6	bart-large-mnli	98.53	90.45	6.31	0.73	0.02	Reasonable
7	bart-large-mnli	81.23	6.79	2.17	1.55	0.04	Inaccurate
1	distilbart-mnli-12-3	88.65	9.08	5.91	4.1	1.7	Partially accurate
2	distilbart-mnli-12-3	64.87	7.77	2.72	2.38	0.26	Inaccurate
3	distilbart-mnli-12-3	76.79	42.79	36.2	20.3	1.98	Reasonable
4	distilbart-mnli-12-3	60.8	49.22	9.91	6.68	0.45	Partially accurate
5	distilbart-mnli-12-3	82.97	55.31	41.59	15	0.99	Reasonable
6	distilbart-mnli-12-3	87.11	85.6	11.07	7.74	0.12	Reasonable
7	distilbart-mnli-12-3	79.02	6.58	3.31	1.18	0.95	Inaccurate
1	DeBERTa-v3-large-mnli-fever-anli-ling-wanli	36.51	1.27	0.15	0.14	0.02	Partially accurate
2	DeBERTa-v3-large-mnli-fever-anli-ling-wanli	17.58	0.72	0.4	0.05	0.03	Inaccurate
3	DeBERTa-v3-large-mnli-fever-anli-ling-wanli	95.69	59.7	26.89	0.45	0.07	Reasonable
4	DeBERTa-v3-large-mnli-fever-anli-ling-wanli	95.07	79.32	17.91	0.07	0.05	Partially accurate
5	DeBERTa-v3-large-mnli-fever-anli-ling-wanli	61.88	28.35	8.16	0.06	0.03	Partially accurate
6	DeBERTa-v3-large-mnli-fever-anli-ling-wanli	99.64	93.95	0.83	0.07	0.03	Reasonable
7	DeBERTa-v3-large-mnli-fever-anli-ling-wanli	2.48	1.41	0.08	0.06	0.04	Inaccurate

The dataset is too small to draw a firm conclusion, but all three models appear to perform comparably on this task.

model	Reasonable	Partially accurate	Inaccurate
bart-large-mnli	3	2	2
distilbart-mnli-12-3	3	2	2
DeBERTa-v3-large-mnli-fever-anli-ling-wanli	2	3	2
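The tally above can be reproduced from the Review column with a few lines of standard-library Python. This is a small sketch; the REVIEWS dictionary simply transcribes the review labels from the results table in line order (1 to 7) per model, and the helper name tally_reviews is hypothetical.

```python
from collections import Counter

# Review labels per model, transcribed from the results table (lines 1-7)
REVIEWS = {
    "bart-large-mnli": [
        "Reasonable", "Inaccurate", "Reasonable", "Partially accurate",
        "Partially accurate", "Reasonable", "Inaccurate",
    ],
    "distilbart-mnli-12-3": [
        "Partially accurate", "Inaccurate", "Reasonable", "Partially accurate",
        "Reasonable", "Reasonable", "Inaccurate",
    ],
    "DeBERTa-v3-large-mnli-fever-anli-ling-wanli": [
        "Partially accurate", "Inaccurate", "Reasonable", "Partially accurate",
        "Partially accurate", "Reasonable", "Inaccurate",
    ],
}


def tally_reviews(reviews):
    """Count how often each review label occurs for every model."""
    return {model: Counter(labels) for model, labels in reviews.items()}


if __name__ == "__main__":
    for model, counts in tally_reviews(REVIEWS).items():
        print(model, dict(counts))
```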

Summary

Large language models are a one-size-fits-all tool for many purposes. For scenarios where we have very little context to lean on and custom labels are required for zero-shot classification, we still have the option of using more special-purpose models trained for NLI (natural language inference), such as those used above. The final choice for a given requirement could be based on performance (when used in real-time transactions), the extent of additional context that can make the result more deterministic, and ease of integration with a given ecosystem.
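As one illustration of how such scores might feed a rule engine, the sketch below turns a row of scores into a list of labels by applying a cutoff. The threshold of 50.0 and the function name labels_above_threshold are assumptions chosen for the example; a real policy would need a cutoff tuned on far more data than the seven sentences used here.

```python
# Column order used in the output tables of this article
CATEGORIES = ["Politics", "PHI/PII", "Legal", "About Company", "None of these"]


def labels_above_threshold(scores, threshold=50.0):
    """Return the categories whose score (0-100) meets the threshold."""
    if len(scores) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} scores, got {len(scores)}")
    return [name for name, score in zip(CATEGORIES, scores) if score >= threshold]


if __name__ == "__main__":
    # Scores reported for sentence 5 in the GPT-4 table
    print(labels_above_threshold([70.0, 0.0, 85.0, 0.0, 0.0]))
```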

Note: A special word of thanks to those in the forums who corrected my code or shared suggestions on how to use these models better. Specifically, someone on the OpenAI forum shared the intuition on how best to query GPT for results that are not otherwise available through API calls.

Published at DZone with permission of ganesh s. See the original article here.

URL: https://dzone.com/articles/content-moderation-to-zero-shot-classification