@n-kb
n-kb / scrape.py
Created February 13, 2020 20:31
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import csv, time
import os.path
browser = webdriver.Chrome()
for year in range(2013, 2020):
    for month in range(1, 13):
{
  "log": {
    "comment": "",
    "creator": {
      "comment": "",
      "name": "BrowserMob Proxy",
      "version": "2.1.4"
    },
    "entries": [
      {
@xeoncross
xeoncross / YouTubeURLFormats.txt
Created January 16, 2021 03:55 — forked from rodrigoborgesdeoliveira/ActiveYouTubeURLFormats.txt
Example of the various YouTube url formats
http://www.youtube.com/watch?v=-wtIMTCHWuI
http://www.youtube.com/v/-wtIMTCHWuI?version=3&autohide=1
http://youtu.be/-wtIMTCHWuI
http://www.youtube.com/oembed?url=http%3A//www.youtube.com/watch?v%3D-wtIMTCHWuI&format=json
http://www.youtube.com/attribution_link?a=JdfC0C9V6ZI&u=%2Fwatch%3Fv%3DEhxJLojIE_o%26feature%3Dshare
@marcus-at-localhost
marcus-at-localhost / split.php
Created April 13, 2020 18:27
[Split by Comma but not if in Parenthesis] #regex #split #array
<?php
$filters = "1,2,4,(5,6,7),8;9";
var_dump(preg_split("~[,;](?![^(]*\\))~", $filters));
/*
array (size=6)
0 => string '1' (length=1)
1 => string '2' (length=1)
2 => string '4' (length=1)
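For comparison, here is a minimal Python sketch of the same idea: split on a comma or semicolon unless a closing parenthesis follows before any opening one. The function name `split_outside_parens` is made up for this illustration. The lookahead only copes with a single level of parentheses, so nested groups would need a small parser instead.

```python
import re

# Split on "," or ";" unless the separator is followed by a ")" with no
# "(" in between, i.e. unless it sits inside a parenthesised group.
# Assumption: parentheses are balanced and never nested.
SEPARATOR = re.compile(r"[,;](?![^(]*\))")


def split_outside_parens(text):
    """Return the pieces of `text` split on top-level commas/semicolons."""
    return SEPARATOR.split(text)


if __name__ == "__main__":
    print(split_outside_parens("1,2,4,(5,6,7),8;9"))
```

With the sample string from the snippet above, this yields six pieces, keeping `(5,6,7)` as a single element.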
@rocketgeek
rocketgeek / extract_html_atts.php
Created August 4, 2021 14:10
Extract "data-" attributes from WooCommerce HTML tags
<?php
// Extraction utility for getting attributes from HTML tag.
function extract_html_atts( $string, $prefix = "data-" ) {
    $start = 0;
    $end = 0;
    // Compare against false explicitly: strpos() returns 0 (falsy) for a match at offset 0.
    while( false !== strpos( $string, $prefix, $end ) ) {
        $start = strpos( $string, $prefix, $start )+strlen( $prefix );
        $end = strpos( $string, '"', $start )-1;
@xeoncross
xeoncross / tf-idf.php
Created October 10, 2011 15:51
tf-idf Value (for Keywords)
These weights are often combined into a tf-idf value, simply by multiplying them together. The best-scoring words under tf-idf are uncommon ones that are repeated many times in the text, which led early web search engines to be vulnerable to pages stuffed with repeated terms to trick them into ranking those pages highly for those keywords. For that reason, more complex weighting schemes are generally used, but tf-idf is still a good first step, especially for systems where no one is trying to game the system.
There are a lot of variations on the basic tf-idf idea, but a straightforward implementation might look like:
<?php
$tfidf = $term_frequency * // tf
log( $total_document_count / $documents_with_term, 2); // idf
?>
It's worth repeating that the IDF is the total document count over the count of the ones containing the term. So, if there were 50 documents in the collection, and two of them contained the term in question, the IDF would be 50/2 = 25. To be accurate, we s
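As a rough illustration, the calculation in the PHP snippet above can be sketched in Python as follows. The function and parameter names are illustrative only, and the base-2 logarithm simply mirrors the `log( ..., 2 )` call in that snippet; other variants of tf-idf use different bases or smoothing.

```python
import math


def tf_idf(term_frequency, total_document_count, documents_with_term):
    """Multiply the term frequency by the log-scaled inverse document frequency.

    Assumes documents_with_term > 0, i.e. the term occurs somewhere in the collection.
    """
    ratio = total_document_count / documents_with_term  # e.g. 50 / 2 = 25
    idf = math.log2(ratio)                               # log base 2, as in the PHP snippet
    return term_frequency * idf


if __name__ == "__main__":
    # 50 documents, 2 of which contain the term, term appears 3 times in this document.
    print(tf_idf(3, 50, 2))
```

For the 50-document example, the ratio is 25 and its base-2 logarithm is roughly 4.64, so a term that appears three times scores about 13.9. A term that appears in every document gets a ratio of 1, a logarithm of 0, and therefore a score of 0.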
@rocketgeek
rocketgeek / html_to_obj.php
Created July 10, 2019 12:43
Convert HTML DOM to JSON
<?php
// https://stackoverflow.com/questions/23062537/how-to-convert-html-to-json-using-php
function html_to_obj( $html ) {
    $dom = new DOMDocument();
    $dom->loadHTML( $html );
    return element_to_obj( $dom->documentElement );
}
function element_to_obj( $element ) {
@nicklasos
nicklasos / download.php
Last active November 29, 2024 12:56
Curl PHP multiple files downloading
<?php
function multiple_download(array $urls, $save_path = '/tmp')
{
    $multi_handle = curl_multi_init();
    $file_pointers = [];
    $curl_handles = [];
    // Add curl multi handles, one per file we don't already have
    foreach ($urls as $key => $url) {
@nima-rahbar
nima-rahbar / linkvertise-bypass.js
Created January 30, 2021 13:08
Linkvertise Bypass
// ==UserScript==
// @name Linkvertise Bypass
// @namespace https://github.com/nima-rahbar/
// @version 1.0
// @description Bypass links that cannot be bypassed by Universal Bypass
// @author Nima Rahbar
// @match *://*.linkvertise.com/*
// @match *://*.linkvertise.net/*
// @match *://*.link-to.net/*
// @icon https://nimarahbar.com/wp-content/uploads/2017/07/favicon.png
@eusonlito
eusonlito / foldersize.php
Last active March 11, 2025 07:56
PHP function to get the folder size including subfolders
<?php
function folderSize ($dir)
{
    $size = 0;
    foreach (glob(rtrim($dir, '/').'/*', GLOB_NOSORT) as $each) {
        $size += is_file($each) ? filesize($each) : folderSize($each);
    }