@soenmie
soenmie / kql
Created September 25, 2025 15:01
KQL Queries for Error and Phone Call Monitoring
AppLogV2_CL
| where TimeGenerated >= ago(7d)
| where location_s has "call_decision_helpers.py" and Level == "ERROR"
| summarize ErrorCount = count() by bin(TimeGenerated, 1h)
| order by TimeGenerated asc
| render columnchart with (title="Error Count - Past 7 Days", xtitle="Time", ytitle="Error Count")
soenmie / jupyter_python_llm_usage.kql
Created September 4, 2025 07:09
Jupyter/Python LLM usage analysis - KQL query and results
let FilteredData = LiteLLMLog_CL
| where TimeGenerated >= datetime(2025-07-25 00:00:00)
| where model_s !in (
'text-embedding-3-large',
'openai/text-moderation-stable',
'deepseek-r1-distill-llama-70b-specdec',
'gemini-1.5-flash'
)
| where caller_tag_s in (
'data_visualization',
soenmie / sample_python_tool_calls.py
Created September 4, 2025 05:07
Sample Python Tool Calls Analysis
#!/usr/bin/env python3
"""
Sample 10 random Python tool calls from evaluation results
"""
import json
import random
from pathlib import Path
from typing import Dict, List, Any
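The preview stops after the imports. As a rough illustration only, and not the gist's actual code, sampling 10 random tool calls could look like the sketch below. The function name `sample_tool_calls`, the record shape, and the `seed` parameter are assumptions made for this example.

```python
import random
from typing import Any, Dict, List, Optional


def sample_tool_calls(
    calls: List[Dict[str, Any]], k: int = 10, seed: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Return up to k tool calls chosen uniformly at random, without replacement."""
    rng = random.Random(seed)  # a local generator keeps the sampling reproducible
    return rng.sample(calls, min(k, len(calls)))  # never ask for more than exist


# Hypothetical records; the real script would read them from evaluation result files.
calls = [{"id": i, "tool": "python"} for i in range(25)]
picked = sample_tool_calls(calls, k=10, seed=42)
```

Capping `k` at `len(calls)` avoids the `ValueError` that `random.sample` raises when asked for more items than the population holds.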
soenmie / batch_evaluate_tool_usage.sh
Last active September 3, 2025 09:47
Tool Usage Evaluation Scripts: a Python script (evaluate_tool_usage.py) that uses an LLM to evaluate Python tool usage in conversations, identifying inappropriate tool calls with a severity assessment. New features: a --no-cache parameter to control LLM response caching; a --limit parameter for batch processing control; improved error handling for non-d…
#!/bin/bash
# Batch evaluate tool usage for all evaluation prompt files with parallel processing
# Usage: ./batch_evaluate_tool_usage.sh [num_parallel_jobs] [limit]
# Number of parallel jobs (default: 4)
PARALLEL_JOBS=${1:-4}
# Optional limit on number of files to process (default: all)
LIMIT=${2:-0}
soenmie / generate_all_evaluation_prompts.sh
Created September 3, 2025 06:59
LLM Evaluation System for Python Tool Usage
#!/bin/bash
# Generate evaluation prompts for all weighted project JSON files
# Usage: ./generate_all_evaluation_prompts.sh [num_parallel_jobs]
# Number of parallel jobs (default: 4)
PARALLEL_JOBS=${1:-4}
echo "🚀 Starting batch evaluation prompt generation..."
echo "Parallel jobs: $PARALLEL_JOBS"
soenmie / conversation_display.j2
Created September 2, 2025 13:27
GenSpark Conversation Extractor - Extract clean conversation logs from project JSON files
{% set messages = data.session_state.messages -%}
# Conversation
{%- for message in messages %}
{%- if message.role == "user" %}
## User
{%- if message.content is string %}
{{ message.content }}
{%- elif message.content is iterable %}
{%- for item in message.content %}
soenmie / fetch_weighted_projects.py
Created September 1, 2025 14:30
Fetch GenSpark project JSON data from CSV
#!/usr/bin/env python3
"""
Fetch the JSON data for every project listed in weighted_sampling_projects.csv
"""
import asyncio
import csv
import json
import os
import sys
soenmie / analyze_weighted_sampling.py
Created September 1, 2025 13:41
Frequency-weighted random sampling analysis of BLOCKING_SAVE logs
#!/usr/bin/env python3
"""
Analyze BLOCKING_SAVE logs using frequency-weighted random sampling
"""
import sys
import random
from pathlib import Path
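The preview ends at the imports, so the sampling logic itself is not visible. A minimal sketch of frequency-weighted random sampling without replacement, assuming each project's weight is the number of BLOCKING_SAVE log entries it produced, might look like this. The function name `weighted_sample` and the sample project IDs are hypothetical and do not come from the gist.

```python
import random
from collections import Counter
from typing import Dict, List, Optional


def weighted_sample(
    frequencies: Dict[str, int], k: int, seed: Optional[int] = None
) -> List[str]:
    """Draw up to k distinct keys; each draw is proportional to the remaining weights."""
    rng = random.Random(seed)
    # Zero-weight keys can never be drawn, so drop them up front.
    remaining = {key: count for key, count in frequencies.items() if count > 0}
    picked: List[str] = []
    while remaining and len(picked) < k:
        keys = list(remaining)
        weights = [remaining[key] for key in keys]
        choice = rng.choices(keys, weights=weights, k=1)[0]
        picked.append(choice)
        del remaining[choice]  # remove the key so it cannot be drawn twice
    return picked


# Hypothetical project IDs, one per log line; real input would be parsed from the logs.
log_project_ids = ["p1", "p1", "p1", "p2", "p2", "p3"]
frequencies = Counter(log_project_ids)
sample = weighted_sample(frequencies, k=2, seed=0)
```

Drawing one key at a time and deleting it keeps frequent projects more likely to appear while guaranteeing that no project is selected twice.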
soenmie / analyze_jupyter_trends.py
Last active September 3, 2025 11:54
jupyter_code_executor time-trend cost analysis - KQL query, cost data, and trend analysis script (2025-09-01)
soenmie / ai_overconfirmation_issues.py
Created August 22, 2025 15:52
Texas Restaurant AI Phone Calls Analysis Scripts - 11 comprehensive analysis tools for evaluating AI performance, extracting flaws, and improving phone call behavior
#!/usr/bin/env python3
"""
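Analyze the AI over-confirmation problem and extract the original conversations with timestamps.
Find cases where the AI kept interrupting after the restaurant said "please hold".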
"""
import json
from typing import Dict, List, Optional
from datetime import datetime