@suneel-pi
Last active September 16, 2025 06:32
pi-scorer-langfuse.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/suneel-pi/bcc6e0af2f9ecbc48bbf89d05f205951/pi-scorer-langfuse.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"<a href=\"https://withpi.ai\"><img src=\"https://play.withpi.ai/logo/logoFullBlack.svg\" width=\"240\"></a>\n",
"\n",
"<a href=\"https://code.withpi.ai\"><font size=\"4\">Documentation</font></a>\n",
"\n",
"<a href=\"https://build.withpi.ai\"><font size=\"4\">Copilot</font></a>"
],
"metadata": {
"id": "pi-masthead"
}
},
{
"cell_type": "markdown",
"source": [
"[Pi Scorer](https://build.withpi.ai) offers an alternative to LLM-as-a-judge with several advantages:\n",
"\n",
"* Significantly faster\n",
"\n",
"* Highly consistent — always returns the same score for the same inputs\n",
"\n",
"* Eliminates the need for prompt tuning or adjustments"
],
"metadata": {
"id": "eiR5tdXsVdNk"
}
},
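{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "PONlf847Xp-A"
},
"outputs": [],
"source": [
"%%capture\n",
"# Pin langfuse below v3: the langfuse.decorators, item.observe() and langfuse.score()\n",
"# calls used in this notebook belong to the v2 Python SDK.\n",
"%pip install \"langfuse<3\" openai langchain_openai langchain datasets --upgrade"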
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "fAquPGo3X8hg"
},
"outputs": [],
"source": [
"# @title Set up API keys\n",
"\n",
"import os\n",
"from google.colab import userdata\n",
"\n",
"# Get the keys for your project from the project settings page:\n",
"# https://cloud.langfuse.com\n",
"os.environ[\"LANGFUSE_PUBLIC_KEY\"] = userdata.get(\"LANGFUSE_PUBLIC_KEY\")\n",
"os.environ[\"LANGFUSE_SECRET_KEY\"] = userdata.get(\"LANGFUSE_SECRET_KEY\")\n",
"# os.environ[\"LANGFUSE_HOST\"] = \"https://cloud.langfuse.com\" # 🇪🇺 EU region\n",
"os.environ[\"LANGFUSE_HOST\"] = \"https://us.cloud.langfuse.com\" # 🇺🇸 US region\n",
"\n",
"# Your OpenAI API key\n",
"os.environ[\"OPENAI_API_KEY\"] = userdata.get(\"OPENAI_API_KEY\")\n",
"\n",
"# Get a Pi API key: https://build.withpi.ai/account/keys\n",
"os.environ[\"WITHPI_API_KEY\"] = userdata.get(\"WITHPI_API_KEY\")"
]
},
{
"cell_type": "code",
"source": [
"# @title Load a legal-clause dataset into Langfuse\n",
"\n",
"from langfuse import Langfuse\n",
"\n",
"# We included the inputs from Pi Studio, but feel free to add more.\n",
"inputs = [\n",
"    \"\"\"{\"Clause\":\"The Recipient shall indemnify, defend, and hold harmless the Disclosing Party from and against any and all losses, damages, liabilities, deficiencies, claims, actions, judgments, settlements, interest, awards, penalties, fines, costs, or expenses of whatever kind, including reasonable attorneys' fees, arising out of or relating to any third-party claim alleging a breach by the Recipient of its obligations under this Agreement.\"}\"\"\",\n",
"    \"\"\"{\"Clause\":\"The Recipient shall indemnify, defend, and hold harmless the Disclosing Party from and against any and all losses, damages, liabilities, deficiencies, claims, actions, judgments, settlements, interest, awards, penalties, fines, costs, or expenses of whatever kind, including reasonable attorneys' fees, arising out of or relating to any third-party claim alleging a breach by the Recipient of its obligations under this Agreement.\"}\"\"\",\n",
"    \"\"\"{\"Clause\":\"If any term or provision of this Agreement is found by a court of competent jurisdiction to be invalid, illegal, or unenforceable, such invalidity, illegality, or unenforceability shall not affect any other term or provision of this Agreement.\"}\"\"\",\n",
"    \"\"\"{\"Clause\":\"The relationship between the parties is that of independent contractors. Nothing contained in this Agreement shall be construed as creating any agency, partnership, joint venture, or other form of joint enterprise, employment, or fiduciary relationship between the parties.\"}\"\"\",\n",
"    \"\"\"{\"Clause\":\"The relationship between the parties is that of independent contractors. Nothing contained in this Agreement shall be construed as creating any agency, partnership, joint venture, or other form of joint enterprise, employment, or fiduciary relationship between the parties.\"}\"\"\",\n",
"    \"\"\"{\"Clause\":\"No waiver by any party of any of the provisions hereof shall be effective unless explicitly set forth in writing and signed by the party so waiving. No failure to exercise, or delay in exercising, any right, remedy, power, or privilege arising from this Agreement shall operate or be construed as a waiver thereof.\"}\"\"\",\n",
"    \"\"\"{\"Clause\":\"This Agreement may be executed in counterparts, each of which is deemed an original, but all of which together are deemed to be one and the same agreement.\"}\"\"\",\n",
"    \"\"\"{\"Clause\":\"The headings in this Agreement are for reference only and do not affect the interpretation of this Agreement.\"}\"\"\",\n",
"    \"\"\"{\"Clause\":\"All notices or other communications required or permitted hereunder shall be in writing and shall be deemed to have been duly given when delivered in person, by nationally recognized overnight courier, or by registered or certified mail, return receipt requested, postage prepaid.\"}\"\"\",\n",
"    \"\"\"{\"Clause\":\"All notices or other communications required or permitted hereunder shall be in writing and shall be deemed to have been duly given when delivered in person, by nationally recognized overnight courier, or by registered or certified mail, return receipt requested, postage prepaid.\"}\"\"\",\n",
"    \"\"\"{\"Clause\":\"Neither party may use the other party's name or logo in any press release, advertising, or other promotional materials without the other party's prior written consent.\"}\"\"\",\n",
"    \"\"\"{\"Clause\":\"Neither party may use the other party's name or logo in any press release, advertising, or other promotional materials without the other party's prior written consent.\"}\"\"\",\n",
"    \"\"\"{\"Clause\":\"The parties agree to negotiate in good faith to resolve any dispute between them regarding this Agreement before resorting to arbitration.\"}\"\"\",\n",
"]\n",
"data = [{\"input\": clause} for clause in inputs]\n",
"display(data)\n",
"\n",
"# Upload to Langfuse\n",
"langfuse = Langfuse()\n",
"langfuse.create_dataset(name=\"legal_docs\")\n",
"for item in data:\n",
"    langfuse.create_dataset_item(\n",
"        dataset_name=\"legal_docs\",\n",
"        # any Python object or value\n",
"        input=item[\"input\"]\n",
"    )"
],
"metadata": {
"id": "03cwaJ_SAeVJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# @title Application Setup\n",
"\n",
"from langfuse.openai import openai\n",
"from langfuse.decorators import observe, langfuse_context\n",
"\n",
"@observe()\n",
"def run_my_custom_llm_app(input, system_prompt):\n",
"    # Send the system prompt and the clause to the model; the Langfuse OpenAI\n",
"    # wrapper records the call as part of the current trace.\n",
"    messages = [\n",
"        {\"role\":\"system\", \"content\": system_prompt},\n",
"        {\"role\":\"user\", \"content\": input}\n",
"    ]\n",
"\n",
"    completion = openai.chat.completions.create(\n",
"        model=\"gpt-4o-mini\",\n",
"        messages=messages\n",
"    ).choices[0].message.content\n",
"\n",
"    return completion"
],
"metadata": {
"id": "lgLGstfrBxjC"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# @title Pi Scorer Setup\n",
"\n",
"import os\n",
"import requests\n",
"\n",
"PI_API_URL = \"https://api.withpi.ai/v1/scoring_system/score\"\n",
"HEADERS = {\n",
"    \"Content-Type\": \"application/json\",\n",
"    \"x-api-key\": os.environ.get(\"WITHPI_API_KEY\"),\n",
"}\n",
"\n",
"def get_score(input: str, output: str, question: str):\n",
"    \"\"\"Score one input/output pair against a single question with Pi Scorer.\"\"\"\n",
"    payload = {\n",
"        \"llm_input\": input,\n",
"        \"llm_output\": output,\n",
"        \"scoring_spec\": [{\"question\": question}]\n",
"    }\n",
"    response = requests.post(PI_API_URL, headers=HEADERS, json=payload)\n",
"    # Fail loudly on HTTP errors (e.g. a missing or invalid API key)\n",
"    # instead of raising a confusing KeyError below.\n",
"    response.raise_for_status()\n",
"    pi_score = response.json()\n",
"    return pi_score[\"total_score\"]"
],
"metadata": {
"id": "szKYrvl8CT9k"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# @title Set up the Langfuse experiment\n",
"\n",
"# Score name -> question sent to Pi Scorer for that dimension\n",
"SCORING_QUESTIONS = {\n",
"    \"Clarity of Explanation\": \"Does the explanation provide a clear and detailed understanding of the clause for a layperson?\",\n",
"    \"Logical Flow\": \"Does the explanation follow a logical flow that makes it easy to follow and understand?\",\n",
"    \"Conciseness and Completeness\": \"Is the explanation concise while still being informative and complete?\",\n",
"    \"Specificity of Examples\": \"Does the explanation include specific examples or scenarios to illustrate the clause's application?\",\n",
"    \"Use of Everyday Language\": \"Is the summarized content presented in a way that is clear and comprehensible to the general public?\",\n",
"    \"Layperson Accessibility\": \"Is the explanation simplified enough to be understood by someone without specialized legal knowledge?\",\n",
"    \"Conciseness and Sufficiency\": \"Is the explanation concise while still providing sufficient information to understand the clause?\",\n",
"    \"Error-Free Language\": \"Does the explanation avoid errors in grammar and spelling?\",\n",
"}\n",
"\n",
"def run_experiment(experiment_name, system_prompt):\n",
"    dataset = langfuse.get_dataset(\"legal_docs\")\n",
"\n",
"    for item in dataset.items:\n",
"        # item.observe() returns a trace_id that can be used to add custom evaluations later;\n",
"        # it also automatically links the trace to the experiment run\n",
"        with item.observe(run_name=experiment_name) as trace_id:\n",
"\n",
"            # run the application, passing the input and system prompt\n",
"            output = run_my_custom_llm_app(item.input, system_prompt)\n",
"\n",
"            # optional: attach one Pi Scorer result per question to the experiment trace\n",
"            for name, question in SCORING_QUESTIONS.items():\n",
"                langfuse.score(\n",
"                    trace_id=trace_id,\n",
"                    name=name,\n",
"                    value=get_score(item.input, output, question=question)\n",
"                )"
],
"metadata": {
"id": "1jH5xDNnDbha"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# @title Run a Langfuse experiment\n",
"\n",
"from langfuse.decorators import langfuse_context\n",
"\n",
"run_experiment(\n",
"    experiment_name=\"Legal Document Summarization\",\n",
"    system_prompt=\"\"\"\n",
"You are a helpful assistant that summarizes legal documents into\n",
"easily understandable language for laypeople. Your goal is to\n",
"provide clear and concise summaries that capture the key aspects\n",
"of complex legal texts, making legal information accessible to\n",
"individuals without legal expertise. The expected output is a\n",
"simplified summary of the original legal document.\n",
"\"\"\"\n",
")\n",
"\n",
"# Make sure all queued events are sent to the Langfuse API\n",
"langfuse_context.flush()\n",
"langfuse.flush()"
],
"metadata": {
"id": "pNCSv9LMEp7H"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Langfuse dataset screenshot.\n",
"![Result](https://raw.githubusercontent.com/withpi/cookbook-withpi/main/colabs/Langfuse_screenshot.png)"
],
"metadata": {
"id": "vyi_lM0nXjao"
}
}
],
"metadata": {
"colab": {
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}