@shubhamkakkar
Created March 16, 2026 19:28
| Filter Type | What It Does |
| --- | --- |
| Harmful Language Filters | Detect and block hate speech, abusive language or profanity. Sensitivity thresholds can be tuned to balance safety with the risk of false positives. |
| PII Filters | Identify personally identifiable information, such as phone numbers, emails or account numbers, and prevent it from being exposed. |
| Advanced Safety Filters | Use more comprehensive models to flag issues like jailbreak attempts, bias, hallucinated responses or violent, unethical content. |
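
As a rough illustration of the first two filter types, here is a minimal Python sketch. It is not tied to any particular guardrail product: the regular expressions, the `redact_pii` and `is_blocked` helpers, and the 0.8 default threshold are all assumptions made for this example, and the harm score is assumed to come from some upstream classifier that is not shown.

```python
import re

# Hypothetical patterns for illustration only; production PII detection
# needs far broader coverage (names, addresses, account formats, locales).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def is_blocked(score: float, threshold: float = 0.8) -> bool:
    """Block when a harm score (assumed to lie in 0..1) meets the threshold.

    A lower threshold makes the filter more sensitive, which catches more
    harmful text but also raises the risk of false positives.
    """
    return score >= threshold


if __name__ == "__main__":
    print(redact_pii("Mail jane@example.com or call +1 415-555-0100."))
    print(is_blocked(0.5))                 # default threshold
    print(is_blocked(0.5, threshold=0.4))  # stricter, more sensitive setting
```

The threshold parameter is the tuning knob the table describes: the same score of 0.5 passes at the default setting but is blocked once the threshold is lowered to 0.4.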