| Filter Type | What It Does |
|---|---|
| Harmful Language Filters | Detect and block hate speech, abusive language or profanity. Sensitivity thresholds can be tuned to balance safety with the risk of false positives. |
| PII Filters | Identify personally identifiable information, such as phone numbers, email addresses or account numbers, and prevent it from being exposed. |
| Advanced Safety Filters | Use more comprehensive models to flag issues such as jailbreak attempts, bias, hallucinated responses, and violent or unethical content. |
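
As a rough illustration of the first two rows, here is a minimal sketch of a PII redactor and a threshold-based harmful-language check. Everything in it is an assumption made for the example: the names `PII_PATTERNS`, `TERM_SEVERITY`, `redact_pii` and `is_harmful`, the regular expressions, and the severity scores are hypothetical and do not come from any particular guardrail product. Production filters typically rely on trained classifiers rather than word lists.

```python
import re

# Hypothetical patterns for illustration only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

# Hypothetical blocklist with per-term severity scores in [0, 1].
TERM_SEVERITY = {"idiot": 0.4, "scum": 0.8}


def redact_pii(text: str) -> str:
    """Replace each PII match with a placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def is_harmful(text: str, threshold: float = 0.5) -> bool:
    """Return True if the most severe listed term meets the threshold."""
    words = re.findall(r"[a-z']+", text.lower())
    score = max((TERM_SEVERITY.get(w, 0.0) for w in words), default=0.0)
    return score >= threshold
```

Lowering `threshold` makes the check stricter: `is_harmful("You idiot")` passes at the default of 0.5 but is blocked at 0.3. This is the safety versus false-positive trade-off the table describes.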

Created March 16, 2026 19:28