@RobThree
Last active August 2, 2023 12:56
Pihole OISD.nl caching proxy PHP script

OISD.nl caching proxy PHP script

Why

Last night oisd.nl started returning HTML responses (specifically a login form of some kind) on all known oisd.nl URLs, such as https://big.oisd.nl and https://dbl.oisd.nl, and even on https://oisd.nl itself. No blocklists were returned, just the login form HTML, and every response was served with HTTP status code 200 OK. As a result, Pihole purges all entries it holds for the given oisd.nl URL and then tries to import the invalid "blocklist" (i.e. the login form HTML). Since that response is not a syntactically valid list, nothing is imported: all oisd.nl entries are gone and you're left with far fewer blocked domains.

What

This PHP script caches oisd.nl responses: each response is first validated as a blocklist (see Support for other lists) and only then written to a local directory, and thus 'cached'. The file is stored GZipped to a) save space and, more importantly, b) avoid compressing the output every time the script is invoked. For big lists that are requested many times, this saves a lot of CPU cycles.
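The benefit of storing the list pre-compressed can be sketched as follows. This is a minimal illustration rather than the script's own code: it uses PHP's zlib functions gzencode and gzdecode, assumes the zlib extension is loaded, and $sample is a made-up stand-in for a downloaded blocklist.

```php
<?php
// Hypothetical sample standing in for a downloaded blocklist.
$sample = "# Version: example\n" . str_repeat("0.0.0.0 ads.example.com\n", 1000);

// Compress once, at the highest level (9), at the moment the list is cached.
$compressed = gzencode($sample, 9);

// Later requests can send $compressed as-is with "Content-Encoding: gzip",
// so no compression work is repeated per request.
printf("raw: %d bytes, gzipped: %d bytes\n", strlen($sample), strlen($compressed));

// Decompressing gives back the original list unchanged.
$roundtrip = gzdecode($compressed);
```

The script itself writes the file with gzopen/gzwrite, which produces the same gzip format.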

When oisd.nl responds with something other than a valid list, the cached copy is kept, and requests for the list get that older version. It may be outdated, but that is better than nothing (see the purging issue described under Why).
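The fallback rule described above can be reduced to a small decision function. This is only a sketch: pick_list_to_serve is a hypothetical helper that does not appear in the script, and it works on strings rather than on files.

```php
<?php
// Hypothetical helper: decide which list body to serve.
//   $cached       - the previously cached list, or null if nothing is cached
//   $fresh        - the body just downloaded, or null if the download failed
//   $freshIsValid - whether $fresh passed the blocklist check
function pick_list_to_serve($cached, $fresh, $freshIsValid)
{
    if ($fresh !== null && $freshIsValid) {
        return $fresh;   // Valid download: it replaces the cached copy
    }
    return $cached;      // Invalid or failed download: keep the old copy (may be null)
}

// An HTML login form fails validation, so the older list is served instead.
echo pick_list_to_serve("# Version: 1 (old)", "<html>login form</html>", false), "\n";
```

When nothing is cached and the download is invalid, the function returns null, which corresponds to the script answering with a 404.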

Requirements:

  • A webserver
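  • PHP 7+ with the zlib extension (the script uses gzopen) and allow_url_fopen enabled (the script downloads lists with file_get_contents)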
  • A directory to put the PHP file and a directory to store the cached files

Script installation

  1. Put the script in a web directory
  2. Create a directory where you want to store the cached files. Make sure the script can write to that directory
  3. Point the $fileroot variable to the cache directory from step 2
  4. (OPTIONAL) Update the $lists variable; this script works with oisd.nl but should work with other blocklists too
  5. Check the script: open a web browser, go to https://myserver.local/oisd/index.php?list=big (where big is the name of any of the defined lists) and make sure that a file is written to the cache directory and that a list is returned to your browser.
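If step 5 returns a 404 and no file appears, the cache directory from step 2 is a likely cause. A minimal sketch for checking it is shown below; cache_dir_problem is a hypothetical helper, and sys_get_temp_dir() is used only as a placeholder for your own $fileroot.

```php
<?php
// Hypothetical helper: return a short description of what is wrong with
// the cache directory, or null if it looks usable.
function cache_dir_problem($dir)
{
    if (!is_dir($dir)) {
        return "not a directory";
    }
    if (!is_writable($dir)) {
        return "not writable by the PHP process";
    }
    return null; // No problem found
}

$fileroot = sys_get_temp_dir(); // Assumption: substitute your own cache directory here
$problem  = cache_dir_problem($fileroot);
echo $problem === null
    ? "OK: $fileroot is usable\n"
    : "Problem: $fileroot is $problem\n";
```

Run the check through the webserver rather than from the command line, since the two often run as different users with different permissions.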

Pihole configuration

  1. Go to Pihole -> Adlists
  2. Add the url to this script (e.g. https://myserver.local/oisd/index.php?list=big)
  3. Disable the original oisd.nl list
  4. Go to Tools -> Update Gravity -> Click update

Support for other lists

As you can see in the $lists variable, the big, small and nsfw lists are supported by default. You can add custom lists from any URL as long as they pass the IsValidBlocklist check, which is currently very basic: it only checks whether the response starts with one of [Adblock Plus], [Adblock Plus 2.0], # Version: or ; Version:. This check will need to be improved to support more lists. All lists from oisd.nl should work, so you should be able to add, for example:

$lists = [                  // Known lists
    'small'             => 'https://small.oisd.nl/',
    'big'               => 'https://big.oisd.nl/',
    'nsfw'              => 'https://nsfw.oisd.nl/',
    'small-domainswild' => 'https://small.oisd.nl/domainswild',
    'big-domainswild'   => 'https://big.oisd.nl/domainswild'
];
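One possible direction for improving the check is sketched below. It is not part of the script: looks_like_blocklist and its $extraPrefixes parameter are hypothetical, and the "# Title:" prefix in the usage line is only an assumed example of a header another list might use. The sketch keeps the four original prefixes, tolerates leading whitespace, and rejects anything that starts like markup.

```php
<?php
// Hypothetical variant of the validity check; not the script's IsValidBlocklist.
function looks_like_blocklist($content, $extraPrefixes = [])
{
    $content = ltrim($content);                      // Ignore leading whitespace
    if ($content === '' || $content[0] === '<') {
        return false;                                // Empty, or markup such as an HTML login form
    }
    // The four prefixes from the original check, plus any caller-supplied ones
    $prefixes = array_merge(
        ['[Adblock Plus]', '[Adblock Plus 2.0]', '# Version:', '; Version:'],
        $extraPrefixes
    );
    foreach ($prefixes as $prefix) {
        if (strncmp($content, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}

// Usage: allow an additional (assumed) header for a custom list.
var_dump(looks_like_blocklist("# Title: my list\n0.0.0.0 ads.example.com\n", ['# Title:']));
```

Like the original, this remains a shallow check of the first bytes only; it does not validate the entries themselves.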
<?php
$fileroot = './cache/';     // Where to store files
$ttl      = 3600;           // Number of seconds to cache a file
$fileext  = '.gz';          // File extension
$lists    = [               // Known lists
    'small' => 'https://small.oisd.nl/',
    'big'   => 'https://big.oisd.nl/',
    'nsfw'  => 'https://nsfw.oisd.nl/'
];

header('Content-Type: text/plain; charset=utf-8');

$list = strtolower($_GET['list'] ?? '');                // Get requested list
if (!array_key_exists($list, $lists)) {                 // List not found
    http_response_code(404);
    echo "List '$list' does not exist";
} else {                                                // List found
    $filename = $fileroot . $list . $fileext;           // Build filename
    // Check current file date (if any); file_exists() avoids a warning when nothing is cached yet
    $filetime = file_exists($filename) ? filemtime($filename) : false;
    $fileage  = $filetime ? time() - $filetime : 0;     // Calculate file age
    if (!$filetime || $fileage >= $ttl) {               // File missing or expired?
        // Download file; @ keeps a failed download from printing a warning into the response
        $content = @file_get_contents($lists[$list]);
        if ($content && IsValidBlocklist($content)) {   // Ensure the file is in correct format
            $fp = gzopen($filename, 'wb9');             // Save GZipped version of file...
            if ($fp !== false) {
                gzwrite($fp, $content);                 // ...so we don't need to GZip it...
                gzclose($fp);                           // ...every time we answer a request
                $fileage = 0;                           // The cached copy was just refreshed
            }
        }
    }

    // Note that if the response wasn't valid we'll serve an old (possibly outdated) version of
    // the file - since that's still better than no file.
    if (file_exists($filename)) {                       // File in cache?
        header("Cache-Control: max-age=$ttl");          // Send caching information headers
        header("Age: $fileage");
        header("Content-Encoding: gzip");               // Our response is in GZip format
        readfile($filename);                            // Output file
    } else {                                            // File wasn't created (upstream response incorrect format?)
        http_response_code(404);
        echo "File '$list' not found or did not contain correct format";
    }
}

// Simple, shallow check whether a response is a valid blocklist
// This method is by no means an exhaustive check
function IsValidBlocklist($content) {
    foreach (['[Adblock Plus]', '[Adblock Plus 2.0]', '# Version:', '; Version:'] as $startstring) {
        // strncmp() is used instead of str_starts_with(), which only exists in PHP 8+
        if (strncmp($content, $startstring, strlen($startstring)) === 0) {
            return true;
        }
    }
    return false;
}