@tdmalone
tdmalone / aws-services.py
Created December 28, 2023 04:39
Using an undocumented (and thus subject-to-change) API, provides a list of all AWS services colourised as to whether or not they're available in the given region. Alternatively, given two regions, colourised based on whether services are available in one region, both regions, or neither of them.
#!/usr/bin/env python
import json, sys
from urllib.request import urlopen
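# ANSI colour helpers; the parameter is named `text` so it does not shadow the built-in `str`.
def green(text): return f'\033[92m{text}\033[0m'
def orange(text): return f'\033[38;5;214m{text}\033[0m'
def red(text): return f'\033[91m{text}\033[0m'
def yellow(text): return f'\033[93m{text}\033[0m'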
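The two-region mode mentioned in the description can be sketched roughly as follows. This is a minimal illustration that assumes the availability data has already been fetched into plain Python sets; the `classify` and `colourise` helpers, the colour assignment, and the sample service names are hypothetical and are not taken from the gist or from the undocumented API.

```python
# ANSI colour helpers in the same style as the gist preview.
def green(text): return f'\033[92m{text}\033[0m'
def orange(text): return f'\033[38;5;214m{text}\033[0m'
def red(text): return f'\033[91m{text}\033[0m'
def yellow(text): return f'\033[93m{text}\033[0m'


def classify(service, region_a, region_b):
    """Return 'both', 'first', 'second' or 'neither' for one service name."""
    in_a, in_b = service in region_a, service in region_b
    if in_a and in_b:
        return 'both'
    if in_a:
        return 'first'
    if in_b:
        return 'second'
    return 'neither'


# Hypothetical colour assignment; the gist's actual mapping may differ.
COLOURS = {'both': green, 'first': orange, 'second': yellow, 'neither': red}


def colourise(services, region_a, region_b):
    """Return the service names, sorted, each wrapped in its colour code."""
    return [COLOURS[classify(s, region_a, region_b)](s) for s in sorted(services)]


# Made-up sample data standing in for the real API response.
all_services = {'service-a', 'service-b', 'service-c', 'service-d'}
first_region = {'service-a', 'service-b'}
second_region = {'service-a', 'service-c'}

print('\n'.join(colourise(all_services, first_region, second_region)))
```

Keeping the classification separate from the colouring makes the set logic easy to check without inspecting escape codes.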
@justmarkham
justmarkham / python_version_history.ipynb
Last active March 27, 2024 06:18
Data School blog post
@MaxHalford
MaxHalford / dataset.csv
Last active March 11, 2023 06:33
Are Airbnb guests less energy efficient than their host?
date,kilowatt-hour,n_hosts,n_guests,temperature
2022-01-01,2.171,0.0,0.0,7.875000000000028
2022-01-02,10.31,0.0,0.5,8.787500000000023
2022-01-03,16.107,0.0,1.0,8.35000000000003
2022-01-04,16.563,0.0,1.0,9.250000000000014
2022-01-05,17.098,0.0,1.0,5.700000000000024
2022-01-06,18.76,0.0,1.0,2.950000000000024
2022-01-07,20.853,0.0,1.0,3.2500000000000284
2022-01-08,19.548,0.0,1.0,8.887500000000031
2022-01-09,20.81,0.0,1.0,8.72500000000003
@ustayready
ustayready / gpt.py
Created January 16, 2023 23:49
CloudGPT - Use ChatGPT to analyze AWS policies for vulnerabilities
import openai
import boto3
import json
import time
from typing import Dict, List
openai.api_key = '### SET YOUR OPENAI API KEY HERE ###'
session = boto3.session.Session()
client = session.client('iam')
@akhan619
akhan619 / tokenizers.md
Last active October 31, 2023 10:22
Exploring Tokenizers from Hugging Face

Hugging Face (HF) has made NLP (Natural Language Processing) a breeze. In this post, we take a hands-on look at tokenization with the help of the Tokenizers library. We will load a real-world dataset containing 10-K filings of public firms and see how to train a tokenizer from scratch based on the BERT tokenization scheme. Along the way, we will examine tokenization in detail and point out some gotchas to watch for.
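To give a feel for what a BERT-style tokenizer does once it has a vocabulary, here is a minimal pure-Python sketch of WordPiece's greedy longest-match-first splitting. It is not the Tokenizers library's API, and it does not cover training; the `wordpiece_tokenize` function and the tiny `toy_vocab` are hypothetical and exist only for illustration.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into sub-word pieces by greedy longest-match-first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (an assumption); a trained vocabulary has thousands of entries.
toy_vocab = {"token", "##ization", "##izer", "##s", "filing"}

print(wordpiece_tokenize("tokenization", toy_vocab))  # ['token', '##ization']
print(wordpiece_tokenize("tokenizers", toy_vocab))    # ['token', '##izer', '##s']
print(wordpiece_tokenize("revenue", toy_vocab))       # ['[UNK]']
```

Note that a single unmatched position turns the whole word into `[UNK]`, which is one reason the vocabulary learned from the training corpus matters so much.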

Background on NLP (Optional)

If you already have an understanding of the NLP pipeline, you can safely skip this section.

For any NLP task, one of the first steps is pre-processing the data so that it can be fed into our NLP models. For those new to NLP, the general pipeline for any NLP task (text classification, question answering, etc.) is as follows:

@rasbt
rasbt / video-subtitles-via-whisper.py
Last active September 19, 2023 21:14
Script that creates subtitles (closed captions) for all MP4 video files in your current directory
# Sebastian Raschka 09/24/2022
# Create a new conda environment and install packages
# conda create -n whisper python=3.9
# conda activate whisper
# conda install mlxtend -c conda-forge
# Install ffmpeg
# macOS & homebrew
# brew install ffmpeg
# Ubuntu
@codewithbas
codewithbas / stats.py
Created January 26, 2022 11:24
Getting stats from Twitter's API
import json
from TwitterAPI import (
TwitterAPI,
TwitterPager
)
consumer_key = "<YOUR API KEY>"
consumer_secret = "<YOUR API KEY SECRET>"
access_token = "<YOUR ACCESS TOKEN>"
access_token_secret = "<YOUR ACCESS TOKEN SECRET>"
@NewscatcherAPI
NewscatcherAPI / all_summary.py
Last active December 30, 2021 09:55
spacy_vs_nltk_newscatcher_blog
summary = [article['summary'] for article in articles]
sentence = summary[0]
import logging
import base64
import boto3
import os
logger = logging.getLogger()
logger.setLevel(logging.INFO)
s3_client = boto3.client('s3')
@aquilax
aquilax / index.html
Created May 15, 2021 14:19
Sort textarea unique
<a href="javascript:(function(){Array.from(document.querySelectorAll('textarea')).map(function(b){var a=document.createElement('div');var d=document.createElement('button');d.textContent='↑';d.addEventListener('click',function(f){f.preventDefault();b.value=Array.from(new Set(b.value.split('\n'))).sort().join('\n')});var c=document.createElement('button');c.textContent='↓';c.addEventListener('click',function(f){f.preventDefault();b.value=Array.from(new Set(b.value.split('\n'))).sort().reverse().join('\n')});a.appendChild(d);a.appendChild(c);b.parentNode.insertBefore(a,b)})})();">Sort textarea unique</a>