@rdlowrey
Last active December 3, 2022 15:07
PHP vs Node.js scraping
/*
$ npm install request
$ node bench.js
*/
var request = require('request');

var url = 'http://www.google.com';
var total_requests = 100;
var i;
var counter = 0; // number of requests that have completed (success or error)
var start = +new Date();

// Queue every request up front; the HTTP agent limits how many run at once.
for (i = 0; i < total_requests; i++) {
  request(url, function (error, response, body) {
    if (error) {
      console.log("error :(");
    } else {
      console.log(response.statusCode);
    }
    // After the last callback fires, print "<request count>, <elapsed seconds>".
    if (++counter === total_requests) {
      var end = +new Date();
      console.log(i + ", " + (end - start) / 1000);
    }
  });
}

<?php
/*
$ git clone https://github.com/amphp/artax.git
$ cd artax
$ composer install
$ php bench.php
*/
require __DIR__ . '/vendor/autoload.php';

use Amp\Artax\Client;

const URI = 'http://www.google.com';
const MAX_SOCKETS = 5; // <-- Same as node's default (fairness!)
const TOTAL_REQUESTS = 100;

// Print the status code once a request's response notification arrives.
function onUpdate($data, $i) {
    if ($data[0] === Amp\Artax\Notify::RESPONSE) {
        echo "{$i} | ", $data[1]->getStatus(), "\n";
    }
}

$startedAt = microtime(true);
$client = new Client;
$client->setOption(Client::OP_HOST_CONNECTION_LIMIT, MAX_SOCKETS);

$promises = [];
for ($i = 0; $i < TOTAL_REQUESTS; $i++) {
    $promises[] = $promise = $client->request(URI);
    $promise->watch(function($data) use ($i) { onUpdate($data, $i); });
}

// Block until every request has resolved, then report elapsed seconds.
$responses = Amp\wait(Amp\all($promises));
var_dump(microtime(true) - $startedAt);

@rdlowrey
Author

rdlowrey commented Feb 5, 2015

In my tests the PHP version is consistently ~2.5x faster than the Node version on PHP 5.5, PHP 5.6 and the still-unreleased PHP 7. The Node version used is v0.10.33.

@raitucarp

Could you post the benchmark results, and your internet connection speed? Thanks.

ghost commented Feb 5, 2015

1- This is not fair because you are not using multiple workers on node, so you are just using a single CPU core.
2- Requesting to a server through the internet is not reliable because you have way too much stuff between you and the server.
3- 100 total requests is far too few for a reliable benchmark.
4- You didn't post your results.
Biased/10, would point and laugh again.

@rdlowrey
Author

rdlowrey commented Feb 5, 2015

@mrseth ... some problems with your logic ...

1- This is not fair because you are not using multiple workers on node, so you are just using a single CPU core.

Oh, snap! Wrong. PHP is not a threaded language (though there is at least one extension that makes threading a legitimate option). This benchmark uses only a single process, with a single thread, with a single cpu core, in a single event loop ... just like node. The default number of connections node will open to a host at once is 5 (agent.maxSockets). To ensure fairness I've set the same limit in the PHP client via Client::OP_HOST_CONNECTION_LIMIT.
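
For anyone reproducing this on a different Node release: the 5-socket default described above applies to the v0.10 line used here. As far as I know, releases from 0.12 onward raised the default to Infinity, so the cap would need to be set explicitly to keep the two clients comparable. A minimal sketch using only core `http`; the names `cappedAgent` and `optionsFor` are illustrative and not part of the benchmark:

```javascript
var http = require('http');

// Agent that allows at most 5 concurrent sockets per host,
// mirroring MAX_SOCKETS in the PHP script.
var cappedAgent = new http.Agent({ maxSockets: 5 });

// Hypothetical helper: builds the options object for http.get().
function optionsFor(host, path) {
  return { host: host, path: path, agent: cappedAgent };
}

// Usage (not executed here, since it needs network access):
// http.get(optionsFor('www.google.com', '/'), function (res) { res.resume(); });
```

With the `request` module, the equivalent knob should be its `pool` option, e.g. `request({ url: url, pool: { maxSockets: 5 } }, callback)`.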

2- Requesting to a server through the internet is not reliable because you have way too much stuff between you and the server.

s/not reliable/not reliable in a one-off test/ ... However, it's the only reliable method in repeated tests (which were performed). DNS resolution is often poorly done in the real world, and it makes a critical difference in real-world code that you can't capture when testing on the loopback, on the local network, or against IP addresses directly.

3- 100 total requests is far too few for a reliable benchmark.

Go much higher and it's a good way to have your IP banned or throttled. If you want to disprove it the code is here for you.

4- You didn't post your results.

Numbers are only relevant to a specific point in time and space with specific hardware; that's why they aren't posted. Benchmark the code for yourself and disprove the claim, but don't make vague statements about what you think will happen without verifying them first. Still, because of this sort of opinion I will post some (useless) numbers below, which you can easily verify yourself.

Biased/10, would point and laugh again.

Misinformed is a bad look.

@rdlowrey
Author

rdlowrey commented Feb 5, 2015

@raitucarp Ask and you shall receive 😀

Good old fashioned garbage residential internet:

I now get redirected to captchas when trying to query google more than a couple of times in succession (imagine that) ... so instead I re-ran the same benchmark in the code posted above but used the encrypted https://github.com as the target URI. Equivalent results:

uri = https://github.com

|=====|========|=========|======|
| run |   PHP  |   NODE  | reqs |
|=====|========|=========|======|
|  0  |  1.404 |  3.461  | 100  |
|  1  |  1.380 |  4.897  | 100  |
|  2  |  1.375 |  4.396  | 100  |
|  3  |  1.498 |  3.378  | 100  |
|  4  |  1.813 |  3.384  | 100  |
|  5  |  1.452 |  3.450  | 100  |
|  6  |  1.380 |  3.445  | 100  |
|  7  |  1.446 |  3.506  | 100  |
|  8  |  1.406 |  4.409  | 100  |
|  9  |  1.455 |  4.855  | 100  |
|=====|========|=========|======|

* results measured in seconds

/cc @mrseth
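
To condense the ten runs above into a single comparison, the timings can be summarised with a mean and a median. This is only a sketch over the posted numbers; the helper names `mean` and `median` are mine and the arrays are copied by hand from the table:

```javascript
// Timings in seconds, copied from the table above.
var php  = [1.404, 1.380, 1.375, 1.498, 1.813, 1.452, 1.380, 1.446, 1.406, 1.455];
var node = [3.461, 4.897, 4.396, 3.378, 3.384, 3.450, 3.445, 3.506, 4.409, 4.855];

function mean(xs) {
  return xs.reduce(function (sum, x) { return sum + x; }, 0) / xs.length;
}

function median(xs) {
  var sorted = xs.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

var summary = {
  phpMean: mean(php),
  nodeMean: mean(node),
  phpMedian: median(php),
  nodeMedian: median(node),
  meanRatio: mean(node) / mean(php),
  medianRatio: median(node) / median(php)
};

console.log(summary);
```

On these runs the means come out at roughly 1.46 s (PHP) and 3.92 s (Node), a ratio of about 2.7; the median ratio is about 2.4, since the median is less affected by the slower Node runs near 4.4 to 4.9 s.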

@F21

F21 commented Mar 12, 2015

Would you be able to rerun the benchmark for node with 0.12.0 and io.js 1.5.1 and post the results?

@UnbrandedTech

Pretty good internet

UnbrandedTech:~ james$ speedtest-cli 
Retrieving speedtest.net configuration...
Retrieving speedtest.net server list...
Testing from Sai gon Postel Corporation (180.93.246.1)...
Selecting best server based on latency...
Hosted by FPT Telecom (Ho Chi Minh City) [7.56 km]: 4.095 ms
Testing download speed........................................
Download: 40.80 Mbit/s
Testing upload speed..................................................
Upload: 44.87 Mbit/s

Results (http://www.example.com)

NodeJS v5.0.0
request@2.67.0

UnbrandedTech:~ james$ node benchmark.js
100, 0.788

PHP 5.6.16 (cli)
amphp/artax 2.0

UnbrandedTech:~ james$ php benchmark.php
float(4.8010399341583)

Node took ~83.6% less time (roughly 6x faster)?
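
The percentage and the "times faster" figure are two views of the same pair of timings, and they are easy to mix up. A small sketch of the arithmetic, using the two numbers printed above (the variable names are illustrative):

```javascript
var nodeSeconds = 0.788;
var phpSeconds = 4.8010399341583;

// Speedup factor: how many times longer the slower run took.
var speedup = phpSeconds / nodeSeconds;

// Relative time saved: the share of the slower run's wall time that was avoided.
var timeSavedPercent = (1 - nodeSeconds / phpSeconds) * 100;

console.log(speedup.toFixed(2) + 'x, ' + timeSavedPercent.toFixed(1) + '% less time');
```

Both figures come from a single run of each script, so they say less than a repeated series would.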
