# Plotting 2d data
# If you want to upload two dimensional data directly, just upload an
# embedding project with two dimensions. Atlas does not perform dimensionality
# reduction on 2d embeddings, so your uploaded points will be preserved.
# Clifford attractor function simplified from https://examples.holoviz.org/gallery/attractors/attractors.html
import numpy as np, pandas as pd
from math import sin, cos
from nomic.atlas import map_data
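For illustration only, here is a minimal sketch of how two-dimensional points could be generated with the standard Clifford map, x' = sin(a·y) + c·cos(a·x), y' = sin(b·x) + d·cos(b·y). The function name `clifford_points` and the parameter values are assumptions, not taken from the gist, and the sketch stops at producing the points; the upload step through `map_data` is not shown.

```python
from math import sin, cos

def clifford_points(a, b, c, d, n=1000, x0=0.0, y0=0.0):
    """Iterate the Clifford map n times and return the visited (x, y) points."""
    points = []
    x, y = x0, y0
    for _ in range(n):
        # Both updates read the previous (x, y), hence the tuple assignment.
        x, y = sin(a * y) + c * cos(a * x), sin(b * x) + d * cos(b * y)
        points.append((x, y))
    return points

# Hypothetical parameter values, chosen only to produce a non-trivial orbit.
points = clifford_points(-1.3, -1.3, -1.8, -1.9, n=5000)
xs = [p[0] for p in points]
ys = [p[1] for p in points]
```

Because each coordinate is a sine plus a scaled cosine, every x stays within 1 + |c| of zero and every y within 1 + |d|, so the resulting cloud has a known bounding box before it is plotted.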
@bmschmidt
bmschmidt / tadpole_pmids.txt
Created May 30, 2023 22:04
comma-delimited pmids of "tadpole mill" papers from Elisabeth Bik https://scienceintegritydigest.com/2020/02/21/the-tadpole-paper-mill/
31621972,31758846,31746030,31793082,31793712,31804760,31913544,31912578,31967380,32048412,29465760,32053286,28636101,28688193,28722813,29023945,29058777,29057547,31778242,31785072,31889348,31889339,31958364,32027070,28379605,28513871,28731278,28857346,29068476,29073728,29219208,29231261,29345333,29231992,29315773,29315756,29315794,29323740,29377269,29377244,29737538,29667771,29693289,29737579,29806884,29893429,30058243,30126001,30074264,30132983,30145837,30156009,30132953,30117199,30191608,30129054,30414207,30145831,30161272,30125989,30450834,30430634,30378143,30320894,30368887,30317672,30417553,30565729,30701574,30834562,30848513,30916824,30916820,30989694,30983015,31009136,31017717,31111564,31081145,31111550,31144384,31222827,31297873,31310378,31318088,31512791,31633219,31633220,31674073,31692056,31674072,31680303,31692058,31691358,31886589,31886569,31886568,31904148,31961007,32020674,32108372,30770559,30770563,30770562,30854680,30941768,30945283,31001861,31016760,31032946,31026064,31102286,31124134,3119035

Updating

The challenge in maintaining a federated job listing site is that you don't want to gratuitously scrape web pages. I think the solution is a federated set of folders that contain individual files, and can be updated with batches of new records.

Dataset

The basic element should be batches of individual jobs. Probably easiest to distribute these as JSON records, but I think it's also worth validating with Arrow for ease of processing and to catch schema departures. (December '99).
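A rough sketch of what catching schema departures in a batch could look like. This uses only the Python standard library as a stand-in for an Arrow schema check, and it assumes each batch is distributed as JSON Lines (one job per line). The field names in `JOB_SCHEMA` and both function names are hypothetical.

```python
import json

# Hypothetical schema: field name -> required Python type.
JOB_SCHEMA = {"id": str, "title": str, "employer": str, "posted": str}

def schema_departures(record, schema=JOB_SCHEMA):
    """Return a list of human-readable problems for one job record."""
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems

def validate_batch(lines):
    """Split a batch of JSON Lines into valid records and (line number, problems) rejects."""
    valid, rejected = [], []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            rejected.append((line_number, [f"invalid JSON: {err.msg}"]))
            continue
        if not isinstance(record, dict):
            rejected.append((line_number, ["record is not a JSON object"]))
            continue
        problems = schema_departures(record)
        if problems:
            rejected.append((line_number, problems))
        else:
            valid.append(record)
    return valid, rejected

# Made-up example batch: one clean record and two schema departures.
batch = [
    '{"id": "1", "title": "Archivist", "employer": "Example U", "posted": "2023-05-30"}',
    '{"id": 2, "title": "Historian", "employer": "Example U", "posted": "2023-05-31"}',
    '{"id": "3", "title": "Librarian", "posted": "2023-06-01", "salary": 1}',
]
valid, rejected = validate_batch(batch)
```

Rejects keep their line numbers, so a maintainer of one folder in the federation could be told exactly which records in a batch departed from the shared schema.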

rm(list=ls())
source("SQLFunctions.R")
plotSet = function(filename,dck=701,alpha=.01,lwd=8) {
  library(grid)
  # Fetch LON/LAT for the requested DCK, ordered by voyagenum and yearday.
  paths = tbl(tblsrc,"paths") %>%
    filter(DCK==dck) %>%
    arrange(voyagenum,yearday) %>%
    select(LON,LAT,voyagenum) %>%
    collect()
bmschmidt / wordlist.txt
Created June 10, 2020 14:36
MIT wordlist
a
aa
aaa
aaron
ab
abandoned
abc
aberdeen
abilities
ability
bmschmidt / CovidCollege.Rmd
Created December 13, 2020 17:38
Why does the New York Times hate colleges?
---
title: "R Notebook"
output: html_notebook
---
This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the *Run* button within the chunk or by placing your cursor inside it and pressing *Cmd+Shift+Enter*.
```{r}
bmschmidt / path.md
Created June 13, 2018 23:41
An ordered path between two reddit comments about college majors, from darkness to light.

[1] "b"Similar to how a 28 year old English major might feel as he waits tables. Now is the time of his life where he should be getting ahead, setting up a career, making use of the skills he worked so hard to aquire. Now all that's been torn apart, he's been lied to, swindled and betrayed.\n""
[2] "b"My husband (turned 30 last month) has had the opposite experience. He was encouraged by absolutely everyone to major in comp sci because that's where all the jobs were gonna be, and he'd been using computers all his life so it would be easy. It was easy, and that's the only reason he did it, and right around the time he graduated, the market became absolutely saturated and he's had the hardest time find

bmschmidt / mapping.Rmd
Created November 6, 2020 19:46
Code for mapping 2020 elections by subway lines and ethnic data in NYC. Extraordinarily messy, will not run, all sorts of filepath names on the data, etc. Most data was downloaded from the City of New York; this includes code to live-scrape the 2020 elections data by ED.
```{r}
library(tidyverse)
library(sf)
shapes = st_read("/drobo/Downloads/Election Districts/geo_export_2ab7b79f-931c-423a-8e71-a78b1e084d86.shp", stringsAsFactors=FALSE)
other_stuff = tibble(`Unit Name` = c("Public Counter", "Emergency", "Absentee/Military", "Federal", "Affidavit"))
read_election = function(fr) {
fr %>% read_csv() %>%
This file has been truncated, but you can view the full file.
[{"type":"Feature","id":"05089","properties":{"x":549.5033164020946,"y":353.84850365677363},"geometry":{"type":"Polygon","coordinates":[[[546.3416255691812,348.7833499026798],[545.7745597736042,348.9598619691144],[545.9536331827338,357.4618598357144],[546.9086913647583,357.4324411579753],[546.9683825011348,358.4032575233656],[554.5790023891419,358.3150014901483],[554.4894656845771,356.07918198197666],[553.2061062524818,355.28487768302097],[553.3851796616113,354.10813057345695],[552.0421290931396,353.7256877628486],[552.3405847750222,352.99022081937113],[551.4452177293742,352.8725461084147],[551.0572253429268,351.8428923875462],[552.1316657977043,351.0780067663296],[552.1018202295161,348.5774191585061],[547.7742128422179,348.7245125472016],[546.3416255691812,348.7833499026798]]]}},{"type":"Feature","id":"06079","properties":{"x":54.614037989887905,"y":318.5037510604011},"geometry":{"type":"Polygon","coordinates":[[[40.60847261906041,305.2142881710721],[40.966619437319565,308.1561559449821],[42.78719909680362,3