Sebastián Estévez phact

@phact
phact / openai_embedding_ai_safety_moderation.py
Last active November 4, 2023 01:32
I noticed that some text from the SQuAD dataset fails embedding generation with OpenAI's ada-002 model, so I decided to do a bit of digging.
import openai
texts2 = ['What form of destruction was considered too limited by a smaller group of experts?', 'Prior to being a formal legal term, how was the word "genocide" used in an indictment scenario?', 'Who ultimately defined genocide as a series of strategies leading up to the annihilation of an entire group?', "Lemming's concept of genocide triggered legal action in which realm?", 'What was the nationality of anthropologist Peg LeVine?', 'What relative term did LeVine coin to refer to cultural destruction, without the death of its members?', 'What term was coined to describe the destruction of culture?', 'What kind of scientist is Peg LeVine?', 'What elements of group existence, other than people themselves, can be targets of genocide?', 'What has been the primary focus in the study of genocide?', 'In prosecuting genocide, what must the act be formally acknowledged as?', 'In a general aspect, what is genocide viewed as?', 'In trials of genocidal crimes, what responsibly party is difficult to prosec
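One way to dig into which inputs are responsible is to bisect the list: embed the whole batch, and on failure split it in half and retry each half until single texts remain. The sketch below is not taken from the gist. `find_failing_texts` and its `embed` parameter are hypothetical names, and `embed` stands in for whatever client call is used to request embeddings. It also assumes a failure is caused by individual texts rather than by a combination of them.

```python
from typing import Callable, List, Sequence


def find_failing_texts(
    texts: Sequence[str],
    embed: Callable[[List[str]], object],
) -> List[str]:
    """Return the texts for which `embed` raises, found by recursive bisection.

    `embed` is a hypothetical callable that takes a batch of strings and
    raises an exception when the batch cannot be embedded.
    """
    batch = list(texts)
    if not batch:
        return []
    try:
        embed(batch)
        return []  # the whole batch embedded cleanly
    except Exception:
        if len(batch) == 1:
            return batch  # isolated a single failing text
        mid = len(batch) // 2
        # Recurse into each half; original order is preserved.
        return find_failing_texts(batch[:mid], embed) + find_failing_texts(
            batch[mid:], embed
        )
```

Passing a thin wrapper around the real embedding request as `embed` would narrow a list such as `texts2` down to the individual strings that trigger the failure, at the cost of extra requests for each failing region.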
@phact
phact / repro.py
Last active August 31, 2023 05:50
cuML bug report
import cudf
from cuml.neighbors import NearestNeighbors
from cuml.datasets import make_blobs
X, _ = make_blobs(n_samples=10, centers=5,
n_features=10, random_state=42)
# build a cudf Dataframe
df_numeric = cudf.DataFrame(X)
package main
import (
"crypto/tls"
"crypto/x509"
"encoding/json"
"fmt"
. "github.com/gocql/gocql"
"io/ioutil"
"log"
#!/bin/bash
PEERING_CONNECTION_ID=
REGION=
RECEIVER_VPC_ROUTE_TABLE_ID=
aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id "$PEERING_CONNECTION_ID" --region "$REGION"
DEST_CIDR=$(aws ec2 describe-vpc-peering-connections --vpc-peering-connection-ids "$PEERING_CONNECTION_ID" --region "$REGION" | jq -r ".VpcPeeringConnections[].RequesterVpcInfo.CidrBlock")
aws ec2 create-route --route-table-id "$RECEIVER_VPC_ROUTE_TABLE_ID" --destination-cidr-block "$DEST_CIDR" --vpc-peering-connection-id "$PEERING_CONNECTION_ID" --region "$REGION"
#!/bin/bash
CLIENT_NAME=
CLIENT_SECRET=
CLIENT_ID=
DB_ID=
TOKEN=$(curl --request POST \
  --url https://api.astra.datastax.com/v2/authenticateServiceAccount \
  --header 'accept: application/json' \

application.conf

# Configuration for akka-persistence-cassandra
akka.persistence.cassandra {
  events-by-tag {
    bucket-size = "Day"
    # for reduced latency
    eventual-consistency-delay = 200ms
    flush-interval = 50ms
@phact
phact / user-defined-compaction.sh
Last active June 21, 2019 15:58
Surgical script that kicks off user-defined compactions to purge tombstones in Cassandra (C*)
#!/bin/bash
#set -x
DATA_DIRECTORY="/var/lib/cassandra/data"
CASS_TOOLS_BIN_DIRECTORY="/DSE_DIR_HERE/resources/cassandra/tools/bin/"
KEYSPACE_NAME="assethub"
TABLE_NAME="clusters"
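The rest of the script is not shown in this preview. As a rough illustration of the first step such a script would likely need, the sketch below locates the SSTable data files for one table under the data directory. It relies on Cassandra's on-disk layout, where each table lives in a `<table>-<id>` directory under its keyspace and data files end in `-Data.db`. The function name `list_sstable_data_files` is hypothetical, and how the files are then handed to a user-defined compaction is not covered here.

```python
from pathlib import Path
from typing import List


def list_sstable_data_files(data_directory: str, keyspace: str, table: str) -> List[Path]:
    """List the *-Data.db files belonging to one table (hypothetical helper).

    Assumes the standard layout <data_directory>/<keyspace>/<table>-<id>/.
    """
    keyspace_dir = Path(data_directory) / keyspace
    files: List[Path] = []
    # Table names cannot contain '-', so "<table>-*" only matches this table.
    for table_dir in sorted(keyspace_dir.glob(f"{table}-*")):
        if table_dir.is_dir():
            files.extend(sorted(table_dir.glob("*-Data.db")))
    return files
```

With the values from the preview, this would be called as `list_sstable_data_files("/var/lib/cassandra/data", "assethub", "clusters")`.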
#!/bin/bash
# This gnarly one-liner downloads dsbulk and then pulls an entire keyspace's worth of data down to your local machine (assuming it fits!)
if [ -z "$CAS_HOST" ]; then
echo "Comma separated contact points (or set CAS_HOST environment variable):"
read CAS_HOST
fi
if [ -z "$KEYSPACE" ]; then
echo "Keyspace (or set KEYSPACE environment variable):"
read KEYSPACE
system.graph('geo1').create()
:remote config alias g geo1.g
:remote config timeout max
//create the geo schema elements for this graph
graph.schema().propertyKey("point").Point().create();
graph.schema().propertyKey("line").Linestring().create();
graph.schema().propertyKey("polygon").Polygon().create();
graph.tx().commit();

Table definition:

CREATE TYPE physician(
    physician_first_name text,
    physician_last_name text,
    physician_license_state_code1 text,
    physician_license_state_code2 text,
    physician_license_state_code3 text,
    physician_license_state_code4 text,