@suneel-pi
Created September 16, 2025 06:24
Pi Scorer + Braintrust.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/suneel-pi/fbce6f246fa7893fae00570da455515d/pi-scorer-braintrust.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"<a href=\"https://withpi.ai\"><img src=\"https://play.withpi.ai/logo/logoFullBlack.svg\" width=\"240\"></a>\n",
"\n",
"<a href=\"https://code.withpi.ai\"><font size=\"4\">Documentation</font></a>\n",
"\n",
"<a href=\"https://build.withpi.ai\"><font size=\"4\">Copilot</font></a>"
],
"metadata": {
"id": "pi-masthead"
}
},
{
"cell_type": "markdown",
"source": [
"[Pi Scorer](https://build.withpi.ai) offers an alternative to LLM-as-a-judge with several advantages:\n",
"\n",
"* Significantly faster\n",
"\n",
"* Highly consistent: always returns the same score for the same inputs\n",
"\n",
"* Eliminates the need for prompt tuning or adjustments"
],
"metadata": {
"id": "m669TxSGVUmm"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Uc9WGlKvEOXx"
},
"outputs": [],
"source": [
"%%capture\n",
"%pip install -U braintrust openai datasets autoevals"
]
},
{
"cell_type": "code",
"source": [
"# @title Setup API Keys\n",
"\n",
"from google.colab import userdata\n",
"import os\n",
"\n",
"os.environ[\"BRAINTRUST_API_KEY\"] = userdata.get(\"BRAINTRUST_API_KEY\")\n",
"\n",
"# Get a Pi API key at https://build.withpi.ai/account/keys\n",
"os.environ[\"WITHPI_API_KEY\"] = userdata.get(\"WITHPI_API_KEY\")"
],
"metadata": {
"id": "Qh-NG9Y7HZxy"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# @title Load a sample dataset\n",
"\n",
"from datasets import load_dataset\n",
"\n",
"inputs = [\n",
" \"\"\"{\"Clause\":\"The Recipient shall indemnify, defend, and hold harmless the Disclosing Party from and against any and all losses, damages, liabilities, deficiencies, claims, actions, judgments, settlements, interest, awards, penalties, fines, costs, or expenses of whatever kind, including reasonable attorneys' fees, arising out of or relating to any third-party claim alleging a breach by the Recipient of its obligations under this Agreement.\",\"Explanation\":\"If you break this agreement and we get sued because of it, you are responsible for all the legal costs and any money we have to pay. This includes our lawyer fees and any damages awarded by a court.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"The Recipient shall indemnify, defend, and hold harmless the Disclosing Party from and against any and all losses, damages, liabilities, deficiencies, claims, actions, judgments, settlements, interest, awards, penalties, fines, costs, or expenses of whatever kind, including reasonable attorneys' fees, arising out of or relating to any third-party claim alleging a breach by the Recipient of its obligations under this Agreement.\",\"Explanation\":\"If someone sues us because of something you did wrong, you have to pay for it.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"If any term or provision of this Agreement is found by a court of competent jurisdiction to be invalid, illegal, or unenforceable, such invalidity, illegality, or unenforceability shall not affect any other term or provision of this Agreement.\",\"Explanation\":\"If one part of the contract is bad, the rest is still good.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"The relationship between the parties is that of independent contractors. Nothing contained in this Agreement shall be construed as creating any agency, partnership, joint venture, or other form of joint enterprise, employment, or fiduciary relationship between the parties.\",\"Explanation\":\"We are two separate companies working together on a project. This contract does not make us business partners, create a new company together, or make one of us an employee of the other.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"The relationship between the parties is that of independent contractors. Nothing contained in this Agreement shall be construed as creating any agency, partnership, joint venture, or other form of joint enterprise, employment, or fiduciary relationship between the parties.\",\"Explanation\":\"We are independent contractors, not partners or employees.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"No waiver by any party of any of the provisions hereof shall be effective unless explicitly set forth in writing and signed by the party so waiving. No failure to exercise, or delay in exercising, any right, remedy, power, or privilege arising from this Agreement shall operate or be construed as a waiver thereof.\",\"Explanation\":\"A waiver has to be in writing. Not exercising a right isn't a waiver.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"This Agreement may be executed in counterparts, each of which is deemed an original, but all of which together are deemed to be one and the same agreement.\",\"Explanation\":\"We can sign different copies and it's still one agreement.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"The headings in this Agreement are for reference only and do not affect the interpretation of this Agreement.\",\"Explanation\":\"The titles of each section (like 'Confidentiality' or 'Term') are just there to make the document easier to read. They don't change the legal meaning of the paragraphs within those sections.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"All notices or other communications required or permitted hereunder shall be in writing and shall be deemed to have been duly given when delivered in person, by nationally recognized overnight courier, or by registered or certified mail, return receipt requested, postage prepaid.\",\"Explanation\":\"Any official message about this contract must be in writing. We'll consider it officially delivered only if it's handed over in person, sent via a major overnight service like FedEx, or mailed using registered or certified mail.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"All notices or other communications required or permitted hereunder shall be in writing and shall be deemed to have been duly given when delivered in person, by nationally recognized overnight courier, or by registered or certified mail, return receipt requested, postage prepaid.\",\"Explanation\":\"You have to send us notices in writing.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"Neither party may use the other party's name or logo in any press release, advertising, or other promotional materials without the other party's prior written consent.\",\"Explanation\":\"You can't use our company name or logo for your own marketing or announcements, and we can't use yours, unless you get our explicit permission in writing beforehand.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"Neither party may use the other party's name or logo in any press release, advertising, or other promotional materials without the other party's prior written consent.\",\"Explanation\":\"Don't use our name or logo without asking.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"The parties agree to negotiate in good faith to resolve any dispute between them regarding this Agreement before resorting to arbitration.\",\"Explanation\":\"We have to negotiate before arbitration.\"}\"\"\",\n",
"]\n",
"\n",
"data = [{\"input\": topic} for topic in inputs]\n",
"\n",
"display(data)"
],
"metadata": {
"id": "WQ8sWNn-Fl4i"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# @title Braintrust tracing setup\n",
"\n",
"import braintrust\n",
"from openai import OpenAI\n",
"\n",
"MODEL = \"gpt-4o-mini\"\n",
"\n",
"client = braintrust.wrap_openai(\n",
" OpenAI(\n",
" base_url=\"https://api.braintrust.dev/v1/proxy\",\n",
" api_key=os.environ[\"BRAINTRUST_API_KEY\"],\n",
" )\n",
")\n",
"\n",
"@braintrust.traced\n",
"def generate_data(input):\n",
"    messages = [\n",
"        {\n",
"            \"role\": \"system\",\n",
"            \"content\": \"\"\"\n",
"You are a helpful assistant that summarizes legal documents into\n",
"easily understandable language for laypeople. Your goal is to\n",
"provide clear and concise summaries that capture the key aspects\n",
"of complex legal texts, making legal information accessible to\n",
"individuals without legal expertise. The expected output is a\n",
"simplified summary of the original legal document.\n",
"\"\"\",\n",
"        },\n",
"        {\n",
"            \"role\": \"user\",\n",
"            \"content\": input,\n",
"        },\n",
"    ]\n",
"    result = client.chat.completions.create(\n",
"        model=MODEL,\n",
"        messages=messages,\n",
"        max_tokens=4096,\n",
"    )\n",
"    return result.choices[0].message.content"
],
"metadata": {
"id": "mtKp_YIXGBWl"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# @title Pi Scorer Setup\n",
"\n",
"import os\n",
"import requests\n",
"from autoevals import ScorerWithPartial\n",
"from braintrust_core.score import Score\n",
"\n",
"PI_API_URL = \"https://api.withpi.ai/v1/scoring_system/score\"\n",
"HEADERS = {\n",
" \"Content-Type\": \"application/json\",\n",
" \"x-api-key\": os.environ.get(\"WITHPI_API_KEY\"),\n",
"}\n",
"\n",
"rubric = [\n",
" {\n",
" \"label\": \"Clarity of Explanation\",\n",
" \"question\": \"Does the explanation provide a clear and detailed understanding of the clause for a layperson?\",\n",
" \"weight\": 1\n",
" },\n",
" {\n",
" \"label\": \"Logical Flow\",\n",
" \"question\": \"Does the explanation follow a logical flow that makes it easy to follow and understand?\",\n",
" \"weight\": 1\n",
" },\n",
" {\n",
" \"label\": \"Conciseness and Completeness\",\n",
" \"question\": \"Is the explanation concise while still being informative and complete?\",\n",
" \"weight\": 1\n",
" },\n",
" {\n",
" \"label\": \"Specificity of Examples\",\n",
" \"question\": \"Does the explanation include specific examples or scenarios to illustrate the clause's application?\",\n",
" \"weight\": 1\n",
" },\n",
" {\n",
" \"label\": \"Use of Everyday Language\",\n",
" \"question\": \"Is the summarized content presented in a way that is clear and comprehensible to the general public?\",\n",
" \"weight\": 1\n",
" },\n",
" {\n",
" \"label\": \"Layperson Accessibility\",\n",
" \"question\": \"Is the explanation simplified enough to be understood by someone without specialized legal knowledge?\",\n",
" \"weight\": 1\n",
" },\n",
" {\n",
" \"label\": \"Conciseness and Sufficiency\",\n",
" \"question\": \"Is the explanation concise while still providing sufficient information to understand the clause?\",\n",
" \"weight\": 1\n",
" },\n",
" {\n",
" \"label\": \"Error-Free Language\",\n",
" \"question\": \"Does the explanation avoid errors in grammar and spelling?\",\n",
" \"weight\": 1\n",
" },\n",
"]\n",
"#########\n",
"\n",
"class PiScorerBase(ScorerWithPartial):\n",
"    \"\"\"Scores one rubric question with the Pi scoring endpoint.\"\"\"\n",
"\n",
"    question: str = \"\"\n",
"    label: str = \"\"\n",
"\n",
"    def __init__(self, question: str, label: str):\n",
"        self.question = question\n",
"        self.label = label\n",
"\n",
"    def _run_eval_sync(self, output, expected=None, **kwargs):\n",
"        assert \"input\" in kwargs, \"Missing 'input' in kwargs\"\n",
"        payload = {\n",
"            \"llm_input\": kwargs[\"input\"],\n",
"            \"llm_output\": output,\n",
"            \"scoring_spec\": [{\"question\": self.question}],\n",
"        }\n",
"        response = requests.post(PI_API_URL, headers=HEADERS, json=payload, timeout=30)\n",
"        # Fail loudly on HTTP errors rather than on a missing \"total_score\" key below.\n",
"        response.raise_for_status()\n",
"        pi_score = response.json()\n",
"        return Score(name=self.label or self._name(), score=pi_score[\"total_score\"])\n"
],
"metadata": {
"id": "eDMk9LVfGLx6"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# @title Run eval\n",
"\n",
"await braintrust.Eval(\n",
" \"Legal Document Summarization\",\n",
" data=data,\n",
" task=generate_data,\n",
" scores=[PiScorerBase(question=criterion[\"question\"], label=criterion[\"label\"]) for criterion in rubric],\n",
" experiment_name=\"Legal Document Summarization\",\n",
")"
],
"metadata": {
"id": "5CpoSS6BGU0z"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"See results: https://www.braintrust.dev/app/Pi%20Labs/p/Blog%20Post%20Generator/experiments/Pi%20Blog%20Post-5e80589f"
],
"metadata": {
"id": "sePvbeaCxoCO"
}
}
]
}
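
Note on the rubric weights: every rubric entry in the notebook carries a `weight` field, but the eval reports each question as a separate score and never combines them. If a single headline number is wanted, one possible approach is a weighted average computed after the per-question scores come back. The sketch below is an illustration only, not part of the notebook or of the Pi or Braintrust APIs: the helper name `weighted_total` and the shape of the `scores` mapping (label to score in the 0 to 1 range) are assumptions.

```python
from typing import Mapping, Sequence


def weighted_total(
    rubric: Sequence[Mapping[str, object]],
    scores: Mapping[str, float],
) -> float:
    """Combine per-question scores into one weighted average.

    `rubric` uses the same shape as the notebook's rubric list
    (dicts with "label" and "weight"); `scores` maps each label to
    the score returned for that question. Both names are hypothetical.
    """
    missing = [c["label"] for c in rubric if c["label"] not in scores]
    if missing:
        raise KeyError(f"No score for rubric labels: {missing}")

    total_weight = sum(float(c["weight"]) for c in rubric)
    if total_weight <= 0:
        raise ValueError("Rubric weights must sum to a positive number")

    weighted_sum = sum(float(c["weight"]) * scores[c["label"]] for c in rubric)
    return weighted_sum / total_weight


# Example with two made-up criteria and made-up scores.
example_rubric = [
    {"label": "Clarity of Explanation", "weight": 1},
    {"label": "Error-Free Language", "weight": 3},
]
example_scores = {"Clarity of Explanation": 0.5, "Error-Free Language": 1.0}
example_total = weighted_total(example_rubric, example_scores)
```

With all weights left at 1, as in the notebook, this reduces to a plain mean of the per-question scores.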