Recently, I had the opportunity to work with GPT APIs on one of my projects. The task was straightforward: monitor messages in a chat group and classify each one with a specific violation reason based on predefined guidelines. For example, reasons like "advertising" or "harassment" needed to be clearly identified and labelled.
While GPT APIs are incredibly powerful, they have a well-documented tendency to "hallucinate," or generate responses that deviate from expectations. In this case, it meant the model could return outputs that didn’t strictly adhere to the required set of enums. For our project, accuracy was non-negotiable—we needed the model to return the exact predefined reasons, not approximations or alternatives.
It’s not just this task. For most things, devs need a fixed, structured JSON response from a model. Plain text is hard to parse reliably and leaves more room for hallucinated values.
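To make the problem concrete, here is a minimal sketch of what recovering a label from plain text tends to look like. The helper `labels_mentioned` and the sample reply are made up for illustration and are not from the project.

```python
ALLOWED = ["ADVERTISING", "HARASSMENT", "SPAM", "NONE"]

def labels_mentioned(reply: str) -> list[str]:
    """Return every allowed label whose name appears in a free-text reply."""
    upper = reply.upper()
    return [label for label in ALLOWED if label in upper]

# Hypothetical free-text reply: the negated mention of harassment still matches.
reply = "This is not harassment, but it appears to be advertising content."
found = labels_mentioned(reply)
print(found)
```

The scan finds two labels for a single message, so the caller still has to guess which one the model meant. A fixed JSON field removes that guesswork.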
# What we want
{
    "message_id": "123",
    "violation_type": "ADVERTISING"
}
# What we might get
"This message appears to be advertising content..."
To tackle this, we can send a JSON schema with the API call, and GPT will respond using only the defined schema. Here’s an example of the schema we used:
ViolationSchema = [{
    "name": "classify_violation",
    "description": "Classifies a message according to content guidelines",
    "strict": True,  # ask the API to enforce the schema exactly, including the enum
    "parameters": {
        "type": "object",
        "properties": {
            "violation_type": {
                "type": "string",
                "enum": ["ADVERTISING", "HARASSMENT", "SPAM", "NONE"]
            },
            "message_id": {
                "type": "string"
            }
        },
        "required": ["violation_type", "message_id"],
        "additionalProperties": False  # required when strict is enabled
    }
}]
We can send this schema with every API call. If `strict` is enabled on the schema, the API constrains the model's output to match it exactly:
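from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Moderate the message"},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    # ViolationSchema holds function definitions, so pass each one as a tool
    tools=[{"type": "function", "function": fn} for fn in ViolationSchema],
    # Force a call to classify_violation instead of a free-text reply
    tool_choice={"type": "function", "function": {"name": "classify_violation"}},
)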
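Even with a schema in place, it can be worth checking the result before acting on it. The sketch below assumes the model's answer arrives as a JSON string (for example, the `arguments` of a tool call, or the message `content` when a response format is used). The names `parse_violation` and `ALLOWED_VIOLATIONS` are hypothetical and only here for illustration.

```python
import json

# Assumed to mirror the enum in the schema sent to the API.
ALLOWED_VIOLATIONS = {"ADVERTISING", "HARASSMENT", "SPAM", "NONE"}

def parse_violation(raw: str) -> dict:
    """Parse a JSON string and check it against the expected fields and enum."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on plain prose
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    violation = data.get("violation_type")
    if not isinstance(violation, str) or violation not in ALLOWED_VIOLATIONS:
        raise ValueError(f"unexpected violation_type: {violation!r}")
    if not isinstance(data.get("message_id"), str):
        raise ValueError("message_id must be a string")
    return {"message_id": data["message_id"], "violation_type": violation}

result = parse_violation('{"message_id": "123", "violation_type": "ADVERTISING"}')
print(result)
```

Anything outside the allowed set raises a `ValueError`, so a bad label fails loudly instead of slipping into the moderation pipeline.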
Happy Coding 😃