@kristopolous
Last active July 24, 2023 04:12
hn job query search
// Usage:
// Copy and paste all of this into a debug console window of the "Who is Hiring?" comment thread
// then use as follows:
//
// query(term | [term, term, ...], term | [term, term, ...], ...)
//
// When arguments are in an array, that means "or"; when they are separate, that means "and"
//
// Term is of the format:
// ((-)text/RegExp) ( '-' means negation )
//
// A first argument of '+' signifies an additional pass on the filtered data as opposed to
// resetting everything.
//
// Example: Let's look for jobs in california that involve rust or python and not crypto:
//
// > query('ca', '-crypto', ['rust', 'python']);
// {filtered: '98.57%', query: 'ca AND NOT crypto AND (rust OR python)'}
//
// Then you see, "oh right, I don't care about blockchain either":
//
// > query('+', '-blockchain');
// {filtered: '98.57%', query: 'ca AND NOT crypto AND (rust OR python) AND NOT blockchain'}
//
// Another example:
// > query(['ca', 'sf', 'san jose', 'mountain view'])
// {filtered: '90.61%', query: '(ca OR sf OR san jose OR mountain view)'}
//
// COVID killed Silicon Valley. Quod Erat Demonstrandum!
//
// Changelog for 2022-08-02
//
// ADDED
//
// * Negation via '-'
//
// * Multi-pass querying via first argument being '+'
//
// * Debugging query string added in the response
//
// CHANGED
//
// * "or" and "and" works the opposite of how it did previously.
// This form seems to be more useful.
//
// * Whole word matching is default
//
// * Terms such as "c++" are properly escaped
//
// UPDATED
//
// * Rewrote as an absurd implementation.
// I had a fun afternoon writing this.
//
function query(...queryList) {
  // HN is done with very unsemantic classes.
  let jobList = [...document.querySelectorAll('.c5a,.cae,.c00,.c9c,.cdd,.c73,.c88')],
    // Traverses up the DOM tree trying to find a match of a specific class
    upto = (node, klass) => node.classList.contains(klass) ? node : upto(node.parentNode, klass),
    display = (node, what) => upto(node, 'athing').style.display = what,
    hide = node => { display(node, 'none'); node.show = false },
    show = node => { display(node, 'block'); node.show = true },
    // Use RegExp as is. Otherwise make it a case insensitive RegExp.
    // Returns [negated, RegExp, original term].
    destring = what => [
      what[0] === '-',
      what.test ? what : new RegExp([
        '\\b',
        what.toString()
          .replace(/^-/, '')
          .replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
        '\\b'
      ].join(''), 'i'),
      what
    ];

  // This is our grand reset
  if (queryList[0] !== '+') {
    jobList.forEach(show);
    // Have fun with that.
    query.hidden = +!( query.fn = [] );
  } else {
    queryList.shift();
  }

  // The AND is an artifact of the design. It's just iterative mapped subsets
  query.fn = query.fn.concat(queryList.map(arg => {
    // Make it an array if it isn't one and pass it through our destring
    let orList = Array.of(arg).flat().map(destring);

    // If we're showing the job, then go through the list of terms
    // If all of them do not match, hide it, then return the length.
    query.hidden += jobList.filter(node => node.show
      && orList.every(([neg, r]) => neg ^ !(node.innerHTML.search(r) + 1))
    ).map(hide).length;

    // You're on your own here - this is just the construction of
    // the debug string. There's far more reasonable ways to do this
    // But what fun would that be?!
    // (toString() is there so RegExp terms don't throw on .slice)
    return (
      ' ('[+!!(orList.length - 1)] +
      orList.map(([neg, ig, r]) => ['', 'NOT '][+neg] + r.toString().slice(+neg)).join(' OR ') +
      ' )'[+!!(orList.length - 1)]
    ).trim();
  }));

  return {
    filtered: (100 * query.hidden / jobList.length).toFixed(2) + '%',
    query: query.fn.join(' AND ')
  };
}
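
The matching rules can also be tried outside the browser. What follows is a minimal sketch, not part of the gist: the helper names `toMatcher` and `matches` are hypothetical, and it works on plain strings instead of DOM nodes. It mirrors the term handling described in the usage notes (leading '-' negates, strings become escaped, whole-word, case-insensitive RegExps, arrays mean "or", separate arguments mean "and").

```javascript
// Hypothetical helpers, not part of the gist: the same matching rules as
// query(), applied to plain strings so they can be tried in any console.

// Turn a term into [negated, RegExp]. Strings are escaped, wrapped in \b
// word boundaries and made case-insensitive; RegExp objects pass through.
function toMatcher(term) {
  if (term instanceof RegExp) return [false, term];
  const negated = term[0] === '-';
  const text = term.replace(/^-/, '').replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return [negated, new RegExp('\\b' + text + '\\b', 'i')];
}

// Each argument is an AND clause; an array argument is an OR group.
function matches(text, ...clauses) {
  return clauses.every(clause =>
    [clause].flat().some(term => {
      const [negated, re] = toMatcher(term);
      const found = text.search(re) !== -1;
      return negated !== found;
    })
  );
}

// Usage: same shape as the query() call in the header comment.
console.log(matches('Acme | San Francisco, CA | Rust, Python', 'ca', '-crypto', ['rust', 'python']));
```

One caveat of whole-word matching: because of the trailing `\b`, a string term that ends in a non-word character, such as 'c++', only matches when a word character follows it directly. Passing a RegExp such as /c\+\+/i sidesteps that.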
@frosas

frosas commented Nov 3, 2017

Thanks @kristopolous and @meiamsome, browser search functionality is definitely not enough to search hundreds of job positions!

Because I wanted a mix of both scripts (i.e. nested criteria AND regular expressions being first-class objects), and because it was fun to write, I ended up creating yet another version, which looks like this:

// Non-Angular Javascript contract positions in London or remote
hn.filter(
  hn.or(/(javascript|typescript)/i, /ES\d/, 'JS'),
  hn.not(/angular/i),
  /contract/i,
  hn.or(hn.and('ONSITE', /london/i), 'REMOTE')
);

Details at https://gist.github.com/frosas/4cadd8392a3c4af82ef640cbedea3027
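
For readers wondering how combinators of this shape can be built, here is a minimal sketch. It is not the code from the linked gist: the names `toPredicate`, `and`, `or`, `not` and `filter`, and the rule that a string criterion means a case-sensitive substring match, are assumptions made for illustration.

```javascript
// Illustrative only: predicate combinators over plain text.
// A criterion is a string (substring match), a RegExp, or a predicate function.
const toPredicate = c =>
  typeof c === 'function' ? c :
  c instanceof RegExp ? text => text.search(c) !== -1 :
  text => text.includes(c);

const and = (...cs) => text => cs.every(c => toPredicate(c)(text));
const or = (...cs) => text => cs.some(c => toPredicate(c)(text));
const not = c => text => !toPredicate(c)(text);

// Keep only the postings that satisfy every top-level criterion.
const filter = (postings, ...cs) => postings.filter(and(...cs));

// Usage with two made-up postings.
const postings = ['REMOTE | JavaScript contract', 'ONSITE London | Angular contract'];
console.log(filter(postings, or(/javascript/i, 'JS'), not(/angular/i), /contract/i));
```

Since every combinator returns a plain function, the results nest freely, which is what makes calls like or(and(...), ...) possible.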

@Ivanca

Ivanca commented Dec 2, 2017

This script loads all pages via AJAX; you can run it before the query script so that you search all pages instead of just the first one.

;(function ajaxLoadNextPage () {
    var more = document.querySelector('.comment-tree > tbody > tr:last-child a');
    if (more && more.innerHTML === "More") {    
        var httpRequest = new XMLHttpRequest();
        httpRequest.onreadystatechange = function () {
            if (httpRequest.readyState === XMLHttpRequest.DONE) {
                if (httpRequest.status === 200) {
                  more.remove();
                  var div = document.createElement('div');
                  div.innerHTML = httpRequest.responseText;
                  var nextHTML = div.querySelector('.comment-tree > tbody').innerHTML;
                  document.querySelector('.comment-tree > tbody').innerHTML += nextHTML;
                  ajaxLoadNextPage();
                } else {
                  alert('There was a problem with the request to ' + more.href);
                }
            }
        };
        httpRequest.open('GET', more.href);
        httpRequest.send();
    }
})();

@janklimo

janklimo commented Dec 2, 2017

Any plans to package this as an extension?

@kristopolous
Author

I was revisiting this this month ... I think what I really want these days is exclusion more than inclusion. For instance, I don't care about healthcare, remote e-learning or fintech (I find them to be hucksters trying to arbitrage broken markets with snake oil tech) but anyway ... a blacklist seems really useful ... I should do that instead.

@kristopolous
Author

kristopolous commented Jul 6, 2022

This also works; replace the id with whatever you want.

curl 'https://hacker-news.firebaseio.com/v0/item/31947297.json?print=pretty' | jq '.kids' | grep -Po '[0-9]*' | xargs -n 1 -P 20 -I %% wget https://hacker-news.firebaseio.com/v0/item/%%.json\?print=pretty

Then you can grep that.
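
The same idea can be sketched in JavaScript. The helper names `fetchTopLevelComments` and `getJson` are hypothetical, the `fetchJson` parameter exists only so a stub can be swapped in, and the sketch assumes each item's JSON carries a `kids` array of comment ids and a `text` field, as in the output of the curl command above.

```javascript
// Hypothetical helper: download every top-level comment of an HN item.
const API = 'https://hacker-news.firebaseio.com/v0/item/';

// Default fetcher; relies on a global fetch being available.
const getJson = url => fetch(url).then(r => r.json());

async function fetchTopLevelComments(id, fetchJson = getJson) {
  const thread = await fetchJson(`${API}${id}.json`);
  // `kids` holds the ids of the direct replies; treat a missing list as empty.
  const kids = (thread && thread.kids) || [];
  const items = await Promise.all(kids.map(kid => fetchJson(`${API}${kid}.json`)));
  // Skip entries that came back empty or without any text.
  return items.filter(item => item && item.text);
}

// Usage (needs network access), filtering the results much like grep would:
// fetchTopLevelComments(31947297)
//   .then(items => items.filter(item => /rust/i.test(item.text)))
//   .then(console.log);
```

Unlike `xargs -P 20`, `Promise.all` starts every request at once, so a very large thread may call for batching.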

@kristopolous
Author

kristopolous commented Aug 3, 2022

ok I updated it to implement all the things I've been musing about for 7 years and to hopefully make you laugh out loud while reading it.

It is extremely silly but hopefully not stupid and still legible

@nemanjam

nemanjam commented Sep 1, 2022

image

@kristopolous
Author

Damn it

I'm still in bed. I'll look when I'm at my office

@kristopolous
Author

kristopolous commented Sep 1, 2022

You're right. I was so careful in this. damn it. That's extremely disappointing. I apparently foolishly introduced the bug here when I was just using a phone and their textbox interface: https://gist.github.com/kristopolous/19260ae54967c2219da8/revisions#diff-63e9a5e5dead19d4e7a3ee13c24221089b165a04e534e6e675d491e9422576d1

Fixed. Thanks for the report @nemanjam

@nemanjam

nemanjam commented Sep 1, 2022

Thank you.

@gabrielsroka

Updated pagination code using fetch and async/await, and while instead of recursion. Forked from @Ivanca

It'd be nice to merge this into query().

(async function () {
    var more;
    while (more = document.querySelector('a.morelink')) {
        const r = await fetch(more.href);
        more.remove();
        const div = document.createElement('div');
        div.innerHTML = await r.text();
        document.querySelector('.comment-tree > tbody').innerHTML += div.querySelector('.comment-tree > tbody').innerHTML;
    }
})();

Also, maybe a bookmarklet? You can drag/drop or copy/paste it to your bookmarks toolbar, e.g.:

javascript:
/* /Say hello# */
(function () {
  alert('Hello, HN');
})();
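
To turn a longer script, such as the pagination loop above, into a bookmarklet, the source can be wrapped in a function and URL-encoded. A minimal sketch, assuming a hypothetical helper named `toBookmarklet`:

```javascript
// Hypothetical helper: wrap a script in an IIFE and encode it as a javascript: URL.
function toBookmarklet(source) {
  const wrapped = `(function () {${source}})();`;
  // encodeURIComponent keeps spaces, newlines and quotes intact inside the URL.
  return 'javascript:' + encodeURIComponent(wrapped);
}

// Usage: paste the printed URL into a bookmark's address field.
console.log(toBookmarklet("alert('Hello, HN');"));
```

The wrapping function keeps the script's variables out of the page's global scope.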

