@suneel-pi
Created September 16, 2025 06:25
pi-scorer-arize-phoenix.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/suneel-pi/5844b76e4994cc48f77743fa95f74211/pi-scorer-arize-phoenix.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"<a href=\"https://withpi.ai\"><img src=\"https://play.withpi.ai/logo/logoFullBlack.svg\" width=\"240\"></a>\n",
"\n",
"<a href=\"https://code.withpi.ai\"><font size=\"4\">Documentation</font></a>\n",
"\n",
"<a href=\"https://build.withpi.ai\"><font size=\"4\">Copilot</font></a>\n",
"\n",
"<a href=\"https://discord.com/invite/adh94jZH37\"><font size=\"4\">Support</font></a>"
],
"metadata": {
"id": "pi-masthead"
}
},
{
"cell_type": "markdown",
"source": [
"[Pi Scorer](https://withpi.ai) offers an alternative to LLM-as-a-judge with several advantages:\n",
"* **Fast**: much faster than calling an LLM for the same task\n",
"* **Highly consistent**: always returns the same score for the same inputs\n",
"* **Granular**: gives a score for each granular dimension, combining them into a single aggregate score\n",
"\n",
"This notebook demonstrates how to perform offline evals of spans from [Arize Phoenix](https://phoenix.arize.com/), a platform for LLM observability and evaluation. In this notebook, you will\n",
"1. Set up Arize Phoenix, Google Gemini, and Pi\n",
"2. Generate and capture some example traces with Phoenix\n",
"3. Pull the captured spans from Phoenix\n",
"4. Score the spans with your scorer\n",
"5. Log the evaluations to Phoenix."
],
"metadata": {
"id": "m669TxSGVUmm"
}
},
{
"cell_type": "markdown",
"source": [
"Phoenix can instrument your LLM provider automatically for observability. This notebook uses Google Gemini for demonstrating inference, so we install the OpenInference instrumentation for Google GenAI."
],
"metadata": {
"id": "C2zIt-mtVVo1"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Uc9WGlKvEOXx"
},
"outputs": [],
"source": [
"%%capture\n",
"%pip install -U arize-phoenix withpi google-genai openinference-instrumentation-google-genai"
]
},
{
"cell_type": "code",
"source": [
"# @title Setup API Keys\n",
"\n",
"from google.colab import userdata\n",
"import os\n",
"\n",
"# In the Phoenix dashboard sidebar, go to \"API Keys\" to find your API key.\n",
"os.environ['PHOENIX_CLIENT_HEADERS'] = f'api_key={userdata.get(\"PHOENIX_API_KEY\")}'\n",
"os.environ['PHOENIX_COLLECTOR_ENDPOINT'] = 'https://app.phoenix.arize.com'\n",
"\n",
"# Get a Gemini API key or add your own https://aistudio.google.com/app/apikey\n",
"os.environ['GEMINI_API_KEY'] = userdata.get('GEMINI_API_KEY')\n",
"\n",
"# Get a Pi API key: https://build.withpi.ai/account/keys\n",
"os.environ[\"WITHPI_API_KEY\"] = userdata.get('WITHPI_API_KEY')"
],
"metadata": {
"id": "Qh-NG9Y7HZxy"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# @title Phoenix Tracing Setup\n",
"\n",
"from phoenix.otel import register\n",
"\n",
"PROJECT_NAME = \"Legal Document Summarization\"\n",
"\n",
"# Since we installed `openinference-instrumentation-google-genai`, all calls\n",
"# to Google GenAI will be traced automatically.\n",
"tracer_provider = register(\n",
" project_name=PROJECT_NAME,\n",
" endpoint=\"https://app.phoenix.arize.com/v1/traces\",\n",
" auto_instrument=True\n",
")"
],
"metadata": {
"id": "mtKp_YIXGBWl"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"from google import genai\n",
"from google.genai import types\n",
"from IPython.display import Markdown\n",
"from google.colab import userdata\n",
"\n",
"google = genai.Client(api_key=os.environ['GEMINI_API_KEY'])\n",
"\n",
"def generate(prompt: str) -> str:\n",
" system_instruction = \"\"\"\n",
"You are a helpful assistant that summarizes legal documents into\n",
"easily understandable language for laypeople. Your goal is to\n",
"provide clear and concise summaries that capture the key aspects\n",
"of complex legal texts, making legal information accessible to\n",
"individuals without legal expertise. The expected output is a\n",
"simplified summary of the original legal document.\n",
" \"\"\"\n",
" response = google.models.generate_content(\n",
" model=\"gemini-2.0-flash\",\n",
" config=types.GenerateContentConfig(\n",
" system_instruction=system_instruction\n",
" ),\n",
" contents=prompt\n",
" )\n",
"\n",
" return response.text\n",
"\n",
"# Inputs imported from https://withpi.ai/project/k4krWFWe7JOgczurElRy\n",
"inputs = [\n",
" \"\"\"{\"Clause\":\"The Recipient shall indemnify, defend, and hold harmless the Disclosing Party from and against any and all losses, damages, liabilities, deficiencies, claims, actions, judgments, settlements, interest, awards, penalties, fines, costs, or expenses of whatever kind, including reasonable attorneys' fees, arising out of or relating to any third-party claim alleging a breach by the Recipient of its obligations under this Agreement.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"The Recipient shall indemnify, defend, and hold harmless the Disclosing Party from and against any and all losses, damages, liabilities, deficiencies, claims, actions, judgments, settlements, interest, awards, penalties, fines, costs, or expenses of whatever kind, including reasonable attorneys' fees, arising out of or relating to any third-party claim alleging a breach by the Recipient of its obligations under this Agreement.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"If any term or provision of this Agreement is found by a court of competent jurisdiction to be invalid, illegal, or unenforceable, such invalidity, illegality, or unenforceability shall not affect any other term or provision of this Agreement.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"The relationship between the parties is that of independent contractors. Nothing contained in this Agreement shall be construed as creating any agency, partnership, joint venture, or other form of joint enterprise, employment, or fiduciary relationship between the parties.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"The relationship between the parties is that of independent contractors. Nothing contained in this Agreement shall be construed as creating any agency, partnership, joint venture, or other form of joint enterprise, employment, or fiduciary relationship between the parties.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"No waiver by any party of any of the provisions hereof shall be effective unless explicitly set forth in writing and signed by the party so waiving. No failure to exercise, or delay in exercising, any right, remedy, power, or privilege arising from this Agreement shall operate or be construed as a waiver thereof.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"This Agreement may be executed in counterparts, each of which is deemed an original, but all of which together are deemed to be one and the same agreement.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"The headings in this Agreement are for reference only and do not affect the interpretation of this Agreement.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"All notices or other communications required or permitted hereunder shall be in writing and shall be deemed to have been duly given when delivered in person, by nationally recognized overnight courier, or by registered or certified mail, return receipt requested, postage prepaid.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"All notices or other communications required or permitted hereunder shall be in writing and shall be deemed to have been duly given when delivered in person, by nationally recognized overnight courier, or by registered or certified mail, return receipt requested, postage prepaid.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"Neither party may use the other party's name or logo in any press release, advertising, or other promotional materials without the other party's prior written consent.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"Neither party may use the other party's name or logo in any press release, advertising, or other promotional materials without the other party's prior written consent.\"}\"\"\",\n",
" \"\"\"{\"Clause\":\"The parties agree to negotiate in good faith to resolve any dispute between them regarding this Agreement before resorting to arbitration.\"}\"\"\",\n",
"]\n",
"\n",
"for input in inputs:\n",
" # Each generation will be traced automatically in Phoenix. You can confirm\n",
" # this is the case by visiting the project dashboard.\n",
" generate(input)\n"
],
"metadata": {
"id": "5YC3vb_tU94h"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# @title Pi Scorer Setup\n",
"\n",
"import os\n",
"import requests\n",
"import withpi\n",
"import pandas as pd\n",
"\n",
"pi = withpi.PiClient(api_key=os.environ['WITHPI_API_KEY'])\n",
"\n",
"# Scoring spec imported from https://withpi.ai/project/k4krWFWe7JOgczurElRy\n",
"rubric = [\n",
" {\n",
" \"label\": \"Clarity of Explanation\",\n",
" \"question\": \"Does the explanation provide a clear and detailed understanding of the clause for a layperson?\",\n",
" \"weight\": 1\n",
" },\n",
" {\n",
" \"label\": \"Logical Flow\",\n",
" \"question\": \"Does the explanation follow a logical flow that makes it easy to follow and understand?\",\n",
" \"weight\": 1\n",
" },\n",
" {\n",
" \"label\": \"Conciseness and Completeness\",\n",
" \"question\": \"Is the explanation concise while still being informative and complete?\",\n",
" \"weight\": 1\n",
" },\n",
" {\n",
" \"label\": \"Specificity of Examples\",\n",
" \"question\": \"Does the explanation include specific examples or scenarios to illustrate the clause&#x27;s application?\",\n",
" \"weight\": 1\n",
" },\n",
" {\n",
" \"label\": \"Use of Everyday Language\",\n",
" \"question\": \"Is the summarized content presented in a way that is clear and comprehensible to the general public?\",\n",
" \"weight\": 1\n",
" },\n",
" {\n",
" \"label\": \"Layperson Accessibility\",\n",
" \"question\": \"Is the explanation simplified enough to be understood by someone without specialized legal knowledge?\",\n",
" \"weight\": 1\n",
" },\n",
" {\n",
" \"label\": \"Conciseness and Sufficiency\",\n",
" \"question\": \"Is the explanation concise while still providing sufficient information to understand the clause?\",\n",
" \"weight\": 1\n",
" },\n",
" {\n",
" \"label\": \"Error-Free Language\",\n",
" \"question\": \"Does the explanation avoid errors in grammar and spelling?\",\n",
" \"weight\": 1\n",
" },\n",
"]\n",
"\n",
"def score(input: str, output: str) -> pd.DataFrame:\n",
" result = pi.scoring_system.score(llm_input=input, llm_output=output, scoring_spec=rubric)\n",
" return pd.Series(data={\n",
" 'Total Score': result.total_score,\n",
" **{ question: score for question, score in result.question_scores.items() }\n",
" })\n"
],
"metadata": {
"id": "eDMk9LVfGLx6"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# @title Pull Traces from Phoenix\n",
"\n",
"import phoenix as px\n",
"from phoenix.trace import SpanEvaluations\n",
"from phoenix.session.evaluation import get_qa_with_reference\n",
"import json\n",
"\n",
"def inputs_and_outputs(spans: pd.DataFrame) -> pd.DataFrame:\n",
" # See the OpenInference semantic conventions for how Phoenix stores messages:\n",
" # https://github.com/Arize-ai/openinference/blob/main/spec/semantic_conventions.md\n",
"\n",
" def parse_input(inp):\n",
" try:\n",
" return json.loads(inp)['contents']\n",
" except:\n",
" return inp\n",
"\n",
" def parse_output(out):\n",
" try:\n",
" return json.loads(out)['candidates'][0]['content']['parts'][0]['text']\n",
" except:\n",
" return out\n",
"\n",
" io = pd.DataFrame(columns=['Input', 'Output'])\n",
" io['Input'] = spans['attributes.input.value'].apply(parse_input)\n",
" io['Output'] = spans['attributes.output.value'].apply(parse_output)\n",
" return io\n",
"\n",
"spans_df = inputs_and_outputs(px.Client().get_spans_dataframe(project_name=PROJECT_NAME))\n",
"spans_df.head()"
],
"metadata": {
"id": "U0SHvfsvaSqG"
},
"execution_count": null,
"outputs": []
},
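{
"cell_type": "markdown",
"source": [
"The output-parsing fallback above can be tried offline. This is a minimal sketch, not Phoenix code: the sample payload is hypothetical and only mirrors the `candidates[0].content.parts[0].text` path used above, and `extract_text` is an illustrative name.\n",
"\n",
"```python\n",
"import json\n",
"\n",
"def extract_text(out):\n",
"    # Return the first candidate's text, or the raw value if parsing fails.\n",
"    try:\n",
"        return json.loads(out)['candidates'][0]['content']['parts'][0]['text']\n",
"    except (TypeError, ValueError, KeyError, IndexError):\n",
"        return out\n",
"\n",
"# Hypothetical payload shaped like the response path assumed above.\n",
"sample = json.dumps({'candidates': [{'content': {'parts': [{'text': 'Plain summary.'}]}}]})\n",
"\n",
"print(extract_text(sample))          # JSON payload: the nested text is extracted\n",
"print(extract_text('already text'))  # not JSON: the value is returned unchanged\n",
"```\n",
"\n",
"Returning the raw value on failure keeps every span in the DataFrame, at the cost of possibly scoring an unparsed payload, so it is worth checking `spans_df.head()` before scoring."
],
"metadata": {}
},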
{
"cell_type": "code",
"source": [
"# @title Evaluate Spans\n",
"\n",
"import concurrent.futures\n",
"\n",
"def eval_spans(spans: pd.DataFrame) -> pd.DataFrame:\n",
" with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:\n",
" evals = list(executor.map(score, spans['Input'], spans['Output']))\n",
" return pd.DataFrame(evals, index=spans.index)\n",
"\n",
"evals_df = eval_spans(spans_df)\n",
"spans_df.join(evals_df).head()"
],
"metadata": {
"id": "DN1cV65XhbHF"
},
"execution_count": null,
"outputs": []
},
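{
"cell_type": "markdown",
"source": [
"The evaluation cell above reuses `spans.index` for the results, which is safe because `executor.map` yields results in the order of its inputs. The sketch below illustrates this without calling Pi; `fake_score` and the span IDs are hypothetical stand-ins.\n",
"\n",
"```python\n",
"import concurrent.futures\n",
"\n",
"import pandas as pd\n",
"\n",
"def fake_score(inp, out):\n",
"    # Hypothetical stand-in for the Pi call: one row of scores per span.\n",
"    return pd.Series({'Total Score': len(out) / 10})\n",
"\n",
"spans = pd.DataFrame(\n",
"    {'Input': ['a', 'b', 'c'], 'Output': ['x', 'yy', 'zzz']},\n",
"    index=['span-1', 'span-2', 'span-3'],\n",
")\n",
"\n",
"with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:\n",
"    # map() preserves input order even if calls finish out of order.\n",
"    rows = list(executor.map(fake_score, spans['Input'], spans['Output']))\n",
"\n",
"evals = pd.DataFrame(rows, index=spans.index)\n",
"print(spans.join(evals))\n",
"```"
],
"metadata": {}
},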
{
"cell_type": "code",
"source": [
"# @title Log Evaluations to Phoenix\n",
"\n",
"for dimension in evals_df.columns:\n",
" span_evals = SpanEvaluations(dimension, evals_df[[dimension]].rename(columns={dimension: 'score'}))\n",
" px.Client().log_evaluations(span_evals)"
],
"metadata": {
"id": "kspiKiCG7OEY"
},
"execution_count": null,
"outputs": []
},
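{
"cell_type": "markdown",
"source": [
"The logging loop above turns each score column into its own single-column frame named `score`. A minimal pandas-only sketch of that reshaping follows; the score values are made up, and the `context.span_id` index name is an assumption about how Phoenix keys spans.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Made-up scores; the index name is an assumed Phoenix convention.\n",
"evals = pd.DataFrame(\n",
"    {'Total Score': [0.8, 0.6], 'Logical Flow': [0.9, 0.5]},\n",
"    index=pd.Index(['span-1', 'span-2'], name='context.span_id'),\n",
")\n",
"\n",
"for dimension in evals.columns:\n",
"    # Keep one column and rename it to the generic name 'score'.\n",
"    per_dimension = evals[[dimension]].rename(columns={dimension: 'score'})\n",
"    print(dimension, list(per_dimension.columns), per_dimension['score'].tolist())\n",
"```"
],
"metadata": {}
},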
{
"cell_type": "markdown",
"source": [
"Go to your project in Phoenix Dashboard. Each trace should have annotations for the total score and all dimension scores."
],
"metadata": {
"id": "0C1CZyBSwTuq"
}
}
]
}