@vdavez
vdavez / docx2md.md
Last active April 21, 2024 20:05
Convert a Word Document into MD

Converting a Word Document to Markdown in Two Moves

The Problem

A lot of important government documents are created and saved in Microsoft Word (*.docx). But Word's file format is proprietary, and it isn't well suited to presenting documents on the web. So, I wanted to find a way to convert a .docx file into markdown.

The Solution

As it turns out, there are several open-source tools that allow for conversion between file types. Pandoc is one of them, and it's powerful. In fact, pandoc's website says "If you need to convert files from one markup format into another, pandoc is your swiss-army knife." But, although pandoc can convert from markdown into .docx, it doesn't work in the other direction.
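
As a rough illustration of the direction described above as working (markdown into .docx), here is a minimal Python sketch that shells out to the `pandoc` command line. The helper names `build_pandoc_command` and `convert_markdown_to_docx` and the file names are hypothetical, chosen only for this example; it assumes pandoc is installed and on the PATH, and relies on pandoc inferring formats from the file extensions.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_pandoc_command(source: str, target: str) -> List[str]:
    """Build the argument list for a pandoc conversion (hypothetical helper).

    pandoc infers input and output formats from the file extensions,
    so "notes.md" -> "notes.docx" needs no explicit format flags.
    """
    return ["pandoc", source, "-o", target]


def convert_markdown_to_docx(source: str, target: str) -> Path:
    """Run pandoc to turn a markdown file into a .docx file (hypothetical helper)."""
    # Fail early with a clear message if pandoc is not available.
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    # check=True raises CalledProcessError if pandoc exits with an error.
    subprocess.run(build_pandoc_command(source, target), check=True)
    return Path(target)
```

Calling `convert_markdown_to_docx("notes.md", "notes.docx")` would run the equivalent of `pandoc notes.md -o notes.docx`.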

@vdavez
vdavez / naics.json
Created March 12, 2024 13:56
SBA Table Size Standards
[
  {
    "code": 111110,
    "description": "Soybean Farming",
    "sector": {
      "id": 11,
      "description": "Agriculture, Forestry, Fishing and Hunting"
    },
    "subsector": {
      "id": 111,
@vdavez
vdavez / convert_csv_to_parquet.py
Created January 19, 2024 18:13
Download CSV directly to parquet
import aiohttp
import asyncio
import polars as pl
import tempfile

async def convert_csv_to_parquet(url: str, output_file: str):
    """
    A script that rapidly streams a CSV url to a parquet file
    Args:
@vdavez
vdavez / shortcodes.lua
Last active March 17, 2023 11:44
Shortcodes and Pandoc
-- Handles hugo-book's `hint` shortcode and converts into a blockquote
incomment = false
function Para(el)
  if el.content[1].text == "{{<" and el.content[3].text == "hint" and el.content[7].text == ">}}" then
    incomment = true
    hint_type = el.content[5].text
    return pandoc.Para(pandoc.Str())
  elseif el.content[1].text == "{{<" and el.content[3].text == "/hint" and el.content[5].text == ">}}" then
@vdavez
vdavez / wpr.recipe
Created October 16, 2022 00:41
Wisconsin Public Radio Calibre News Feed Recipe
#!/usr/bin/env python
# vim:fileencoding=utf-8
from calibre.web.feeds.recipes import BasicNewsRecipe
class WPR(BasicNewsRecipe):
    title = 'Wisconsin Public Radio'
    author = 'V David Zvenyach'
    description = 'Daily news from Wisconsin Public Radio'
    no_stylesheets = True
    compress_news_images = True
@vdavez
vdavez / test
Created August 4, 2021 11:37
test
test
@vdavez
vdavez / s3-to-mturk.ipynb
Created February 26, 2017 16:47
S3 to Mechanical Turk instructions
@vdavez
vdavez / keybase.md
Created November 15, 2020 19:54
keybase.md

Keybase proof

I hereby claim:

  • I am vdavez on github.
  • I am vdavez (https://keybase.io/vdavez) on keybase.
  • I have a public key whose fingerprint is 13EF 9DA8 FD9D 098C 2FCA E038 E6EC C235 4FED 252A

To claim this, I am signing this object:

@vdavez
vdavez / githubPost.js
Created November 6, 2015 20:15
Micro-purchase Google Apps Script
function githubPost() {
  var ss = SpreadsheetApp.openById(ssID); // replace the ssID with the Spreadsheet ID
  var sheet = ss.getSheetByName("Form Responses 1");
  var formResponses = FormApp.getActiveForm().getResponses();
  var res = formResponses[formResponses.length-1].getItemResponses();
  var amt = res[0].getResponse();
  var title = "Load Schedule 70 data into CALC. >>> Current bid: " + amt + " <<<";
@vdavez
vdavez / choropleth.js
Last active August 24, 2020 13:52
Massachusetts Tracts SVI
/**
 * Creates a d3 Sparkline
 *
 * @param {str} elem the element's id
 * @param {Object} data {"fips_id":float}
 * @param {Object} dims a dictionary with width, height, and a
 * margin dictionary with margin.left, margin.right,
 * margin.top, and margin.bottom
 * @param {Object} opts a dictionary with options
 *