Skip to content

Instantly share code, notes, and snippets.

@xeoncross
xeoncross / tf-idf.php
Created October 10, 2011 15:51
tf-idf Value (for Keywords)
These weights are often combined into a tf-idf value, simply by multiplying them together. The best scoring words under tf-idf are uncommon ones which are repeated many times in the text, which lead early web search engines to be vulnerable to pages being stuffed with repeated terms to trick the search engines into ranking them highly for those keywords. For that reason, more complex weighting schemes are generally used, but tf-idf is still a good first step, especially for systems where no one is trying to game the system.
There are a lot of variations on the basic tf-idf idea, but a straightforward implementation might look like:
<?php
$tfidf = $term_frequency * // tf
log( $total_document_count / $documents_with_term, 2); // idf
?>
It's worth repeating that the IDF is the total document count over the count of the ones containing the term. So, if there were 50 documents in the collection, and two of them contained the term in question, the IDF would be 50/2 = 25. To be accurate, we s
@eusonlito
eusonlito / foldersize.php
Last active July 25, 2023 12:50
PHP function to get the folder size including subfolders
<?php
function folderSize ($dir)
{
$size = 0;
foreach (glob(rtrim($dir, '/').'/*', GLOB_NOSORT) as $each) {
$size += is_file($each) ? filesize($each) : folderSize($each);
}
@cmartinbaughman
cmartinbaughman / GoogleHackMasterList.txt
Last active April 17, 2024 14:57
The definitive super list for "Google Hacking".
admin account info" filetype:log
!Host=*.* intext:enc_UserPassword=* ext:pcf
"# -FrontPage-" ext:pwd inurl:(service | authors | administrators | users) "# -FrontPage-" inurl:service.pwd
"AutoCreate=TRUE password=*"
"http://*:*@www&#8221; domainname
"index of/" "ws_ftp.ini" "parent directory"
"liveice configuration file" ext:cfg -site:sourceforge.net
"parent directory" +proftpdpasswd
Duclassified" -site:duware.com "DUware All Rights reserved"
duclassmate" -site:duware.com
@irazasyed
irazasyed / Install Composer using MAMP's PHP.md
Last active April 2, 2024 18:45
Instructions on how to change preinstalled Mac OS X PHP to MAMP's PHP Installation and then install Composer Package Management

Change default Mac OS X PHP to MAMP's PHP Installation and Install Composer Package Management


Instructions to Change PHP Installation


First, Lets find out what version of PHP we're running (To find out if it's the default version).

To do that, Within the terminal, Fire this command:

which php

@nicklasos
nicklasos / download.php
Last active January 31, 2024 09:13
Curl PHP multiple files downloading
<?php
function multiple_download(array $urls, $save_path = '/tmp')
{
$multi_handle = curl_multi_init();
$file_pointers = [];
$curl_handles = [];
// Add curl multi handles, one per file we don't already have
foreach ($urls as $key => $url) {
@lyquix-owner
lyquix-owner / cleanhtml.php
Last active August 14, 2023 12:30
PHP script to automatically clean dirty HTML. Removes unnecessary attributes (e.g. style, id, dir), replaces deprecated tags with valid ones (e.g. <b> to <strong>), and strips undesirable tags (e.g <font>). We have used this script to safely clean hundreds of blog posts that were littered with inline styling.
<?php
// List of tags to be replaced and their replacement
$replace_tags = [
'i' => 'em',
'b' => 'strong'
];
// List of tags to be stripped. Text and children tags will be preserved.
$remove_tags = [
This file has been truncated, but you can view the full file.
{
"log": {
"comment": "",
"creator": {
"comment": "",
"name": "BrowserMob Proxy",
"version": "2.1.4"
},
"entries": [
{
@sundowndev
sundowndev / GoogleDorking.md
Last active May 18, 2024 06:46
Google dork cheatsheet

Google dork cheatsheet

Search filters

Filter Description Example
allintext Searches for occurrences of all the keywords given. allintext:"keyword"
intext Searches for the occurrences of keywords all at once or one at a time. intext:"keyword"
inurl Searches for a URL matching one of the keywords. inurl:"keyword"
allinurl Searches for a URL matching all the keywords in the query. allinurl:"keyword"
intitle Searches for occurrences of keywords in title all or one. intitle:"keyword"
@rocketgeek
rocketgeek / html_to_obj.php
Created July 10, 2019 12:43
Convert HTML DOM to JSON
<?php
// https://stackoverflow.com/questions/23062537/how-to-convert-html-to-json-using-php
function html_to_obj( $html ) {
$dom = new DOMDocument();
$dom->loadHTML( $html );
return element_to_obj( $dom->documentElement );
}
function element_to_obj( $element ) {
@n-kb
n-kb / scrape.py
Created February 13, 2020 20:31
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import csv, time
import os.path
browser = webdriver.Chrome()
for year in range(2013, 2020):
for month in range(1,13):