💫
in the latent space

Vicki Boykis veekaybee

how to properly select from DuckDB

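SELECT review_text, title, description, goodreads.average_rating, goodreads_authors.name
FROM goodreads
JOIN goodreads_reviews
  ON goodreads.book_id = goodreads_reviews.book_id
JOIN goodreads_authors
  -- Take the author id from each book's own authors value. An uncorrelated subquery
  -- over all of goodreads does not tie the id to the current row. REGEXP_EXTRACT
  -- already returns the first match as a string, so no [1] index is needed.
  ON goodreads_authors.author_id = REGEXP_EXTRACT(CAST(goodreads.authors AS VARCHAR), '[0-9]+')
LIMIT 10;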
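The author join depends on extracting an author id from the `authors` column, and that id has to come from the same book row being joined. The plain-Python sketch below uses hypothetical toy rows, not the real Goodreads schema. It shows the per-row extraction and the two inner joins. The `first_author_id` helper is an assumption that imitates `REGEXP_EXTRACT(authors, '[0-9]+')`, except that it returns `None` instead of an empty string when nothing matches.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy rows standing in for the goodreads, goodreads_reviews,
# and goodreads_authors tables (not the real schema).
books = [
    {"book_id": 1, "title": "Book A", "authors": "[{'author_id': '101', 'role': ''}]"},
    {"book_id": 2, "title": "Book B", "authors": "[{'author_id': '202', 'role': ''}]"},
]
reviews = [
    {"book_id": 1, "review_text": "Loved it"},
    {"book_id": 2, "review_text": "Not for me"},
    {"book_id": 3, "review_text": "No such book"},
]
author_names: Dict[str, str] = {"101": "Author One", "202": "Author Two"}


def first_author_id(authors_field: str) -> Optional[str]:
    """First run of digits, like REGEXP_EXTRACT(authors, '[0-9]+'), or None if absent."""
    match = re.search(r"[0-9]+", authors_field)
    return match.group(0) if match else None


def reviews_with_authors(limit: int = 10) -> List[Tuple[str, str, str]]:
    books_by_id = {book["book_id"]: book for book in books}
    rows = []
    for review in reviews:
        book = books_by_id.get(review["book_id"])
        if book is None:
            continue  # inner join: review has no matching book
        # The author id is extracted from *this* book's authors value
        author_id = first_author_id(book["authors"])
        name = author_names.get(author_id) if author_id else None
        if name is None:
            continue  # inner join: book has no matching author
        rows.append((review["review_text"], book["title"], name))
        if len(rows) == limit:
            break
    return rows


print(reviews_with_authors())
```

The review for `book_id` 3 is dropped because no book matches it. This is the same inner-join behavior the SQL `JOIN` gives.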
@veekaybee
veekaybee / pyscript.html
Created November 26, 2022 16:21
Testing out Pyscript
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Some plotting</title>
<link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />
<script defer src="https://pyscript.net/alpha/pyscript.js"></script>
<py-env>

To run: dot -Tpng trie.dot -o trie.png
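The command above renders a Graphviz file of a trie into a PNG. The excerpt doesn't show how `trie.dot` was produced, so the following is only a hypothetical sketch. It builds a trie from a few example words using nested dicts, with `$` as an assumed end-of-word marker. It then prints the trie as DOT text: one node per prefix, edges labeled by character, and complete words drawn as double circles.

```python
from typing import Dict, Iterable, List

Trie = Dict[str, "Trie"]


def build_trie(words: Iterable[str]) -> Trie:
    """Build a nested-dict trie; the key "$" marks the end of a word (assumed unused in words)."""
    root: Trie = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root


def trie_to_dot(root: Trie) -> str:
    """Render the trie as Graphviz DOT: one node per prefix, edges labeled by character."""
    lines: List[str] = ["digraph trie {", '  n0 [label=""];']
    next_id = 1

    def visit(node: Trie, node_id: int) -> None:
        nonlocal next_id
        for ch, child in sorted(node.items()):
            if ch == "$":
                # Mark nodes where a complete word ends
                lines.append(f"  n{node_id} [shape=doublecircle];")
                continue
            child_id = next_id
            next_id += 1
            lines.append(f'  n{child_id} [label=""];')
            lines.append(f'  n{node_id} -> n{child_id} [label="{ch}"];')
            visit(child, child_id)

    visit(root, 0)
    lines.append("}")
    return "\n".join(lines)


print(trie_to_dot(build_trie(["to", "tea", "ten", "inn"])))
```

You could redirect this output to `trie.dot`, for example with `python make_trie.py > trie.dot` (the script name is hypothetical), and then run the `dot` command above.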

veekaybee / distance.md
Last active December 30, 2021 15:41
Different Distance Measures

Jaccard Similarity

import numpy
import typing

a = [1, 2, 3, 4, 5, 11, 12]
b = [2, 3, 4, 5, 6, 8, 9]

cats = ["calico", "tabby", "tom"]
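Jaccard similarity is the size of the intersection divided by the size of the union. The example lists `a` and `b` share 4 values (2, 3, 4, 5) out of 10 distinct values in total, so their similarity is 0.4. The pure-Python sketch below treats its inputs as sets, so any duplicate entries are ignored. Returning 1.0 for two empty inputs is a convention chosen here to avoid dividing by zero.

```python
from typing import Hashable, Iterable


def jaccard_similarity(x: Iterable[Hashable], y: Iterable[Hashable]) -> float:
    """|x ∩ y| / |x ∪ y|, treating both inputs as sets."""
    sx, sy = set(x), set(y)
    if not sx and not sy:
        return 1.0  # convention for two empty sets (avoids 0 / 0)
    return len(sx & sy) / len(sx | sy)


a = [1, 2, 3, 4, 5, 11, 12]
b = [2, 3, 4, 5, 6, 8, 9]
print(jaccard_similarity(a, b))  # 4 shared / 10 distinct -> 0.4
```

The corresponding Jaccard distance is `1 - jaccard_similarity(a, b)`.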
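import com.twitter.scalding._
class WordCountJob(args: Args) extends Job(args) {
  // Read posts.txt one line at a time
  val lines = TypedPipe.from(TextLine("posts.txt"))
  lines.flatMap { line => tokenize(line) } // split each line into words
    .groupBy { word => word }              // group identical words together
    .size                                  // count the words in each group
    .groupAll                              // put all (word, count) pairs under a single key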
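The Scalding job above is a word count. It splits each line into words, groups identical words, and takes the size of each group. For readers without a Scalding setup, here is the same logic as a plain-Python sketch. `tokenize` isn't defined in the excerpt, so the rule used below (lowercase, then split on anything that isn't a letter or digit) is an assumption.

```python
import re
from collections import Counter
from typing import Iterable, List


def tokenize(line: str) -> List[str]:
    """Hypothetical tokenizer: lowercase, then split on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", line.lower()) if tok]


def word_count(lines: Iterable[str]) -> Counter:
    """flatMap(tokenize) -> groupBy(word) -> size, collapsed into one Counter."""
    return Counter(word for line in lines for word in tokenize(line))


counts = word_count(["The cat sat.", "The cat ran!"])
# Ranking needs every count in one place, similar in spirit to .groupAll
print(counts.most_common(2))
```

A `Counter` keeps every count in memory. That is fine for a single blog's posts, and it is exactly the constraint that Scalding's distributed grouping avoids.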
"com.lihaoyi" %% "os-lib" % "0.7.8"
// Clone my static site repo, loop through posts and get all files as a single file
val wd = os.pwd / "_posts"
val sd = os.Path("/Users/vicki/IdeaProjects/scalding/scalding-repl")
// Concatenates all the files
os.write.over(
wd / "posts.md",
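The os-lib script concatenates every post in `_posts` into a single `posts.md` using `os.write.over`, which overwrites the target file. The sketch below does something comparable with Python's `pathlib`. It assumes that posts are `*.md` files, joined in filename order with a blank line between them. The separator and the ordering are assumptions, not details taken from the original.

```python
from pathlib import Path


def concatenate_posts(posts_dir: Path, output_name: str = "posts.md") -> Path:
    """Concatenate every Markdown file in posts_dir into posts_dir / output_name.

    Assumptions: posts are *.md files, joined in filename order and separated
    by a blank line. The output file itself is skipped so reruns don't include it.
    """
    output = posts_dir / output_name
    parts = [
        path.read_text(encoding="utf-8")
        for path in sorted(posts_dir.glob("*.md"))
        if path.name != output_name
    ]
    # write_text overwrites any existing file, like os.write.over
    output.write_text("\n\n".join(parts), encoding="utf-8")
    return output
```

Calling `concatenate_posts(Path("_posts"))` from the site's root would write `_posts/posts.md`. The output file is skipped because it also matches `*.md`. Without that check, a second run would fold the previous output into the new file.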
veekaybee / gist:b7d1184c63c10887ef83
Last active August 5, 2020 01:46
Installing mpltools in IPython via Anaconda

[mpltools](http://tonysyu.github.io/mpltools/index.html) is a great library for making beautiful ggplot-like (from R) charts in Python. Here are some examples. Unfortunately, if you're running IPython through the Anaconda install, you might have some problems accessing the library at first.

If you run `pip install mpltools`, it will install the library into your Python 2.7 installation. But the IPython notebook in Anaconda uses a different Python, which `which python` shows as `/Users/yourname/anaconda/bin/python`.

To see where mpltools is installed, you can run this in the interpreter:
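One way to check from inside a notebook is to print the running interpreter and then ask `importlib` where it would load the package from. This is a sketch, not necessarily the command the original went on to show. `module_location` is a hypothetical helper name.

```python
import importlib.util
import sys
from typing import Optional


def module_location(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if it isn't importable."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Which interpreter is running (compare with `which python` in the shell)
print(sys.executable)
# Where mpltools would be imported from in this interpreter, if anywhere
print(module_location("mpltools"))
```

Suppose `module_location("mpltools")` returns `None` while `sys.executable` points at Anaconda. In that case, installing with that interpreter (for example `/Users/yourname/anaconda/bin/python -m pip install mpltools`) puts the package where the notebook can import it.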

veekaybee / privacy.md
Last active February 1, 2020 13:33
A work-in-progress post on how to protect your data and privacy online

Work-in-progress

How to protect your data and privacy online for the average user

Table of Contents

  1. Introduction and Motivation
     1a. About me
  2. Ad profiling: What can be tracked
  3. Government tracking: What can be tracked
  4. Low-effort
  5. Medium-effort
veekaybee / wholesome-data-science.md
Last active August 16, 2019 06:40
Wholesome data science.

Wholesome Data Science

Data science has had a bad reputation recently. Between Facebook's privacy violations, facial scanning at restaurant kiosks, and racism in algorithms, there are many cases where surveillance, invasion of privacy, and unethical algorithms dominate the news.

These cases are important to publicize, study, and prevent. But it's just as important to collect examples of good data science use cases (ones that aren't hyperbole or PR fluff) so that we, as an industry, can focus on them and learn what makes them work.

Have some? Make some? Feel free to leave a comment or edit.

Examples