@bekicot
Last active May 22, 2018 11:48
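# Runs the Wappalyzer wrapper (index.js) against every URL in ./no_dups_with_http
# and appends the detected applications to web_types_local.yaml.
require 'net/http'
require 'irb'
require 'json'
require 'open3'
require 'active_support/core_ext/hash/conversions'
require 'active_support/core_ext/array/grouping'
require 'thread'
require 'yaml'

def identify(links)
  mutex = Mutex.new
  threads = []
  results = {}
  # Split the URLs into 10 groups and scan each group in its own thread.
  links.in_groups(10, false).each do |group|
    threads << Thread.new do
      group.each do |webo|
        web = webo.strip
        next if web.empty?
        puts web
        # Pass the URL as a separate argument so it is never interpreted by a shell.
        resp, _status = Open3.capture2('node', 'index.js', web)
        begin
          apps = Array(JSON.parse(resp)['applications']).map { |v|
            {
              'name' => v['name'],
              'categories' => v['categories']
            }
          }
          # select{ |app|
          #   app.keys.include?('name') || app.keys.include?('categories')
          # }
        rescue JSON::ParserError => e
          # index.js prints nothing on stdout when the analysis fails.
          warn "#{web}: #{e.message}"
          next
        end
        mutex.synchronize {
          # results[web] = JSON.parse(resp)
          results[web] = apps
          yml = { web => apps }.to_yaml
          # Drop the leading "---" line so every entry can be appended to the same file.
          File.write('web_types_local.yaml', yml.split("\n")[1..-1].join("\n") + "\n", mode: 'a')
        }
      end
    end
  end
  threads.each(&:join)
  results
end

websites = File.readlines('./no_dups_with_http')
results = identify(websites)
# File.write('web_types_local_2.json', JSON.pretty_generate(results))
binding.irb
# f = results.select { |k,x| x['status'] }.map { |k,v| v['apps'].values[0] }.map{|v| v['Issue Tracker']}.compact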
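
The script above appends one YAML mapping per site (URL to a list of `name`/`categories` entries) to `web_types_local.yaml`. As a rough sketch of how that output could be summarized afterwards, the snippet below counts how often each application name occurs. The helper name `tally_applications` and the sample data are made up for illustration; with a real run, the inline sample would presumably be replaced by `YAML.load_file('web_types_local.yaml')`.

```ruby
require 'yaml'

# Hypothetical helper (not part of the gist): count application names in a
# mapping of URL => [{ 'name' => ..., 'categories' => ... }, ...].
# Returns [name, count] pairs, most frequent first, ties sorted by name.
def tally_applications(sites)
  counts = Hash.new(0)
  sites.values.flatten.each { |app| counts[app['name']] += 1 }
  counts.sort_by { |name, count| [-count, name] }
end

# Invented sample in the same shape the script writes; not real scan results.
sample = YAML.safe_load(<<~DOC)
  http://example.org:
  - name: Nginx
    categories:
    - Web Servers
  - name: Drupal
    categories:
    - CMS
  http://example.net:
  - name: Nginx
    categories:
    - Web Servers
DOC

tally_applications(sample).each { |name, count| puts "#{name}: #{count}" }
```

Sites with no detected applications map to an empty list and simply contribute nothing to the counts.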
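// index.js: analyzes the URL passed as the first CLI argument and prints the
// result as JSON (invoked by the Ruby script as `node index.js <url>`).
const Wappalyzer = require('wappalyzer');

const options = {
  debug: false,
  delay: 500,
  maxDepth: 3,
  maxUrls: 10,
  maxWait: 5000,
  recursive: true,
  userAgent: 'Wappalyzer',
  htmlMaxCols: 2000,
  htmlMaxRows: 2000,
};

const wappalyzer = new Wappalyzer(process.argv[2], options);

wappalyzer.analyze()
  .then(json => {
    // Success: JSON on stdout, exit code 0.
    process.stdout.write(JSON.stringify(json, null, 2) + '\n');
    process.exit(0);
  })
  .catch(error => {
    // Failure: message on stderr, exit code 1.
    process.stderr.write(error + '\n');
    process.exit(1);
  });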
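
The Node script above follows a simple contract: JSON on stdout with exit code 0 on success, an error message on stderr with exit code 1 on failure. A Ruby caller could lean on that contract roughly as sketched below. The helper name `run_analyzer` and its `command:` keyword are assumptions made for this example, and the demo swaps in a stand-in Ruby one-liner so the sketch runs without Node or the `wappalyzer` package installed.

```ruby
require 'json'
require 'open3'
require 'rbconfig'

# Hypothetical helper (not part of the gist): run an analyzer command for one
# URL and return the parsed JSON, or nil when the command fails, is missing,
# or prints something that is not valid JSON.
def run_analyzer(url, command: %w[node index.js])
  # Arguments are passed as a list, so the URL never goes through a shell.
  stdout, stderr, status = Open3.capture3(*command, url)
  unless status.success?
    warn "#{url}: #{stderr.strip}"
    return nil
  end
  JSON.parse(stdout)
rescue JSON::ParserError, Errno::ENOENT => e
  warn "#{url}: #{e.message}"
  nil
end

# Stand-in for `node index.js`: echoes the URL back with an empty result list.
stand_in = [RbConfig.ruby, '-rjson', '-e',
            'puts JSON.generate("url" => ARGV[0], "applications" => [])']
p run_analyzer('http://example.org', command: stand_in)
```

Checking the exit status before parsing keeps a failed scan from surfacing as a confusing JSON parse error.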
http://threedtk.de
https://sourceforge.net
http://slam6d.sourceforge.net
https://52north.org
https://wiki.52north.org
http://52north.org
https://twitter.com
https://blog.52north.org
http://www.aerospaceresearch.net
https://lists.shackspace.de
https://aerospaceresearch.net
https://plus.google.com
http://aerospaceresearch.net
http://github.com
https://gitter.im
https://github.com
http://www.amahi.org
http://forums.amahi.org
http://talk.amahi.org
http://plus.google.com
http://twitter.com
http://blog.amahi.org
https://goo.gl
https://wiki.amahi.org
http://aossie.org
http://gitter.im
https://apache.org
http://apache.org
https://wiki.apache.org
https://blogs.apache.org
http://s.apache.org
http://www.apertium.org
http://wiki.apertium.org
http://www.apertus.org
https://www.apertus.org
https://lab.apertus.org
https://appleseedhq.net
https://groups.google.com
https://slackin-naplrzjfoz.now.sh
https://vimeo.com
http://www.fetchaveryshortfilm.com
http://www.gafferhq.org
http://ardupilot.org
https://discuss.ardupilot.org
http://write.flossmanuals.net
https://beagleboard.org
https://bbb.io
http://beagleboard.org
http://bbb.io
http://beamcommunity.github.io
https://pybee.org
https://djangoproject.com
http://pybee.org
https://cyber.harvard.edu
https://blogs.harvard.edu
https://www.blender.org
http://projects.blender.org
https://code.blender.org
https://wiki.blender.org
http://www.boost.org
https://svn.boost.org
http://lists.boost.org
http://groups.google.com
https://brlcad.org
http://brlcad.org
https://brlcad.zulipchat.com
http://fb.me
http://computationalgenomics.ca
https://genap.ca
https://bitbucket.org
https://summerofcode.withgoogle.com
http://catrobat.org
http://catrob.at
http://developer.catrobat.org
https://ccextractor.org
http://cross.ucsc.edu
https://cross.ucsc.edu
http://programmability.us
http://hepsoftwarefoundation.org
http://www.cern.ch
https://www.cgal.org
http://www.cgal.org
https://chaoss.community
https://wiki.linuxfoundation.org
https://checkerframework.org
https://rawgit.com
https://civicrm.org
https://chat.civicrm.org
https://lab.civicrm.org
https://wiki.civicrm.org
https://dmaster.demo.civicrm.org
https://google.github.io
http://cltk.org
http://docs.cltk.org
https://www.uantwerpen.be
http://www.clips.uantwerpen.be:3001
http://www.clips.uantwerpen.be
https://docs.google.com
https://www.cncf.io
https://cncf.io
http://cloudcv.org
https://gsoc.cloudcv.org
http://gsoc.cloudcv.org
https://coala.io
http://coala.io
https://blog.coala.io
http://projects.coala.io
https://projects.coala.io
http://helikarlab.org
https://Conversations.im
https://conversations.im
https://dino.im
https://cdli.ucla.edu
https://cdli-gh.github.io
http://dbpedia.org
http://wiki.dbpedia.org
https://dbpedia.slack.com
http://blog.dbpedia.org
https://tinyurl.com
http://support.dbpedia.org
https://developers.google.com
https://debian.org
https://lists.debian.org
https://wiki.debian.org
https://planet.debian.org
https://developers.italia.it
https://slack.developers.italia.it
https://medium.com
http://teom.org
https://digitalimpactalliance.org
https://forum.osc.dial.community
https://dial.zulipchat.com
http://osc.dial.community
http://gsoc.dial.community
https://www.djangoproject.com
https://docs.djangoproject.com
https://gist.github.com
https://code.djangoproject.com
https://www.drupal.org
https://groups.drupal.org
http://drupal.org
http://drupalladder.org
http://esipfed.org
http://www.lists.esipfed.org
https://esip-slack-invite.herokuapp.com
https://www.elastic.co
https://discuss.elastic.co
https://eta-lang.org
https://blog.eta-lang.org
https://getfedora.org
https://lists.fedoraproject.org
https://webchat.freenode.net
https://fedoraproject.org
https://communityblog.fedoraproject.org
https://docs.fedoraproject.org
https://www.ffmpeg.org
http://ffmpeg.org
https://trac.ffmpeg.org
https://fossasia.org
https://blog.fossasia.org
https://labs.fossasia.org
https://fossology.org
https://www.fossology.org
http://fossology.slack.com
https://wiki.fossology.org
https://fossi-foundation.org
https://lists.librecores.org
https://librecores.org
https://orconf.org
http://www.freeukgenealogy.org.uk
https://www.freeukgenealogy.org.uk
https://waffle.io
https://www.FreeBSD.org
https://lists.freebsd.org
https://wiki.freebsd.org
https://planet.FreeBSD.org
https://gsoc.FreeBSD.org
http://freetype.org
https://lists.nongnu.org
https://www.freetype.org
https://freifunk.net
https://lists.freifunk.net
http://irc.freifunk.net
http://ninux.org
http://guifi.net
https://openwrt.org
http://www.olsr.org
https://www.open-mesh.org
http://libremesh.org
http://retroshare.sourceforge.net
https://blog.freifunk.net
https://projects.freifunk.net
http://www.ensembl.org
http://lists.ensembl.org
http://www.ensembl.info
https://www.genivi.org
https://lists.genivi.org
https://at.projects.genivi.org
https://gentoo.org
https://wiki.gentoo.org
https://planet.gentoo.org
http://gfoss.eu
http://opensource-devs@ellak.gr
http://ellak.gr
https://ellak.gr
http://git-scm.com
http://libgit2.github.io
https://git.github.io
http://ga4gh.org
https://www.ga4gh.org
https://www.gnome.org
https://wiki.gnome.org
https://planet.gnome.org
http://gnss-sdr.org
http://lists.sourceforge.net
https://gcc.gnu.org
http://gcc.gnu.org
https://www.octave.org
https://lists.gnu.org
http://planet.octave.org
https://savannah.gnu.org
https://wiki.octave.org
http://wiki.octave.org
https://www.gnu.org
https://www.gnuradio.org
https://wiki.gnuradio.org
https://godotengine.org
https://sfconservancy.org
http://webchat.freenode.net
https://grpc.io
http://grpc.io
http://www.haiku-os.org
https://www.haiku-os.org
http://haskell.org
https://www.haskell.org
https://summer.haskell.org
https://brew.sh
http://incf.org
https://www.incf.org
http://inclusivedesign.ca
https://wiki.fluidproject.org
http://idrc.ocadu.ca
http://www.ocadu.ca
https://www.inkscape.org
https://lists.sourceforge.net
https://inkscape.org
https://bugs.launchpad.net
http://wiki.inkscape.org
https://www.sofa-framework.org
http://physiology.kitware.com
http://ai.uni-bremen.de
https://01.org
https://lists.01.org
http://intermine.org
http://intermine.readthedocs.io
https://discord.gg
http://registry.intermine.org
https://intermineorg.wordpress.com
http://www.archive.org
http://blog.archive.org
http://www.isc.org
https://lists.isc.org
https://www.isc.org
http://ietf.org
http://kea.isc.org
http://www.jboss.org
https://developer.jboss.org
http://jderobot.org
https://jenkins.io
http://www.opensource.org
http://jgrapht.org
https://jitsi.org
http://ice4j.org
http://www.unistra.fr
https://www.joomla.org
http://irc.lc
https://volunteers.joomla.org
https://magazine.joomla.org
https://docs.joomla.org
https://issues.joomla.org
https://forum.joomla.org
http://www.jsk.t.u-tokyo.ac.jp
http://www.jsk.imi.i.u-tokyo.ac.jp
https://www.kde.org
https://mail.kde.org
https://userbase.kde.org
https://planet.kde.org
https://community.kde.org
http://www.kiwix.org
https://www.ecured.cu
https://www.wired.com
http://wiki.kiwix.org
http://www.lua.inf.puc-rio.br
https://languagetool.org
https://forum.languagetool.org
http://wiki.languagetool.org
https://leap.se
http://librehealth.io
https://forums.librehealth.io
https://chat.librehealth.io
https://www.libreoffice.org
https://wiki.documentfoundation.org
https://blog.documentfoundation.org
https://lists.freedesktop.org
https://libvirt.org
http://libvirt.org
http://planet.virt-tools.org
http://wiki.libvirt.org
http://llvm.org
http://lists.llvm.org
http://blog.llvm.org
http://luarocks.org
http://luarocks.github.io
https://mariadb.org
https://mariadb.com
https://matrix.org
https://matrix.to
https://fosdem.org
https://riot.im
https://metabrainz.org
https://community.metabrainz.org
https://blog.musicbrainz.org
https://wiki.musicbrainz.org
https://metasploit.com
https://blog.rapid7.com
https://www.mixxx.org
https://mixxx.zulipchat.com
http://www.mlpack.org
https://www.mrpt.org
http://www.mrpt.org
http://ingmec.ual.es
http://reference.mrpt.org
http://mapir.isa.uma.es
https://www.youtube.com
http://www.ros.org
https://moodle.org
https://docs.moodle.org
https://telegram.me
https://moodle.com
http://terasology.org
http://forum.terasology.org
https://skaldarnar.github.io
https://mozilla.org
https://wiki.mozilla.org
https://www.mozilla.org
http://blog.mozilla.org
https://musescore.org
https://dev-list.musescore.org
https://kiwiirc.com
http://www.nrnb.org
http://wikipathways.org
http://sbml.org
http://cbioportal.org
https://neovim.io
http://www.netfilter.org
http://vger.kernel.org
http://irc.netsplit.de
http://people.netfilter.org
https://www.numfocus.org
http://cantera.org
https://conda-forge.org
https://fenicsproject.org
https://julialang.org
http://docs.pymc.io
http://shogun.ml
http://www.numfocus.org
https://omegaup.org
https://blog.omegaup.com
https://www.open-bio.org
http://obf.github.io
http://news.open-bio.org
https://openchemistry.org
https://public.kitware.com
http://wiki.openchemistry.org
https://opendatakit.org
https://forum.opendatakit.org
http://slack.opendatakit.org
https://youtube.com
https://world.openfoodfacts.org
https://slack-ssl-openfoodfacts.herokuapp.com
https://en.blog.openfoodfacts.org
https://en.wiki.openfoodfacts.org
https://www.open-roberta.org
https://lab.open-roberta.org
https://www.osrfoundation.org
http://osrfoundation.org
https://openstates.org
https://openstates-slack.herokuapp.com
https://blog.openstates.org
http://www.OpenAstronomy.org
http://openastronomy.org
http://www.nasa.gov
https://www.skatelescope.org
http://sdo.gsfc.nasa.gov
http://flash.uchicago.edu
https://ssd.jpl.nasa.gov
http://astropy.org
http://www.glueviz.org
http://juliaastro.github.io
http://opencensus.io
https://opensource.googleblog.com
https://openmrs.org
https://talk.openmrs.org
http://irc.openmrs.org
https://blog.openmrs.org
https://om.rs
http://www.opensips.org
https://blog.opensips.org
https://opensips.slack.com
http://www.openstreetmap.org
http://lists.openstreetmap.org
http://wiki.openstreetmap.org
https://blog.openstreetmap.org
https://www.opensuse.org
https://en.opensuse.org
https://news.opensuse.org
http://101.opensuse.org
http://openwisp.org
http://www.cittametropolitanaroma.gov.it
https://www.cineca.it
http://netjson.org
http://opntec.org
https://blog.opntec.org
http://blog.opntec.org
https://www.oppia.org
https://osgeo.org
https://www.osgeo.org
https://wiki.osgeo.org
http://planet.osgeo.org
http://wiki.osgeo.org
http://www.ow2.org
https://mail.ow2.org
https://gitlab.ow2.org
https://tc.ow2.org
https://www.owasp.org
http://owasp.blogspot.com
http://www.p2psp.org
http://pecanproject.org
http://pecanproject.github.io
http://pcp.io
https://www.phpbb.com
https://blog.phpbb.com
https://www.phpmyadmin.net
https://lists.phpmyadmin.net
https://planet.phpmyadmin.net
https://plone.org
https://community.plone.org
https://planet.plone.org
https://docs.plone.org
https://pmd.github.io
https://pslab.fossasia.org
http://blog.fossasia.org
http://pollylabs.org
https://en.wikipedia.org
https://polly.llvm.org
http://isl.gforge.inria.fr
http://barvinok.gforge.inria.fr
http://ppcg.gforge.inria.fr
http://pet.gforge.inria.fr
https://postgresql.org
https://lists.postgresql.org
https://www.postgresql.org
https://planet.postgresql.org
https://wiki.postgresql.org
https://probot.github.io
http://expressjs.com
http://publiclab.org
https://publiclab.org
https://git.purrdata.net
http://disis.music.vt.edu
http://www.hydra-gsoc.appspot.com
https://www.w3.org
http://hydra-gsoc.appspot.com
https://www.python.org
https://mail.python.org
http://python-gsoc.org
https://pyfound.blogspot.com
https://qemu.org
https://wiki.qemu.org
https://www.qemu.org
https://quill.org
https://community.quill.org
https://www.fastcompany.com
https://blog.google
https://www.quill.org
https://trello.com
https://www.r-project.org
http://www.r-bloggers.com
http://radare.org
http://radare.today
https://reactos.org
https://www.reactos.org
https://readthedocs.org
http://docs.readthedocs.io
https://blog.readthedocs.com
http://redhenlab.org
https://red-hen-gsoc.slack.com
https://sites.google.com
http://robocomp.org
https://robocomp.github.io
https://rocket.chat
https://open.rocket.chat
https://rspamd.com
https://www.rtems.org
https://devel.rtems.org
https://www.ruby-lang.org
http://slack.bundler.io
http://rubyonrails.org
http://weblog.rubyonrails.org
http://sciruby.com
http://www.ruby.or.jp
https://www.sagemath.org
https://planet.sagemath.org
https://wiki.sagemath.org
http://www.scala-lang.org
https://users.scala-lang.org
https://scala-lang.org
https://www.scilab.org
http://www.scilab.org
https://wiki.scilab.org
https://scummvm.org
http://wiki.scummvm.org
http://planet.scummvm.org
http://www.seastar-project.org
http://seastar-dev@googlegroups.com
https://scylladb-users.slack.com
http://docs.seastar-project.org
https://www.sosy-lab.org
https://cpachecker.sosy-lab.org
http://linuxtesting.org
http://space.vt.edu
https://spdx.org
https://lists.spdx.org
http://wiki.spdx.org
https://www.stemformatics.org
https://stellar-group.org
https://mail.cct.lsu.edu
https://bmi.stonybrookmedicine.edu
https://strace.io
https://lists.strace.io
http://man7.org
https://streetmix.net
https://streetmix-slack.herokuapp.com
https://streetmix.readme.io
http://submitty.org
https://join.slack.com
https://rcos.io
https://cs.rpi.edu
http://www.sugarlabs.org
http://lists.sugarlabs.org
http://chat.sugarlabs.org
https://wiki.sugarlabs.org
http://www.scorelab.org
https://swift.org
http://www.sympy.org
https://anitab.org
http://www.systers.org
http://systers.io
http://systers-opensource.blogspot.com
http://code.v.igoro.us
https://calendly.com
http://teammatesv4.appspot.com
http://tinyurl.com
http://ccl.northwestern.edu
http://netlogoweb.org
https://ccl.northwestern.edu
http://www.eclipse.org
https://accounts.eclipse.org
https://projects.eclipse.org
http://planet.eclipse.org
https://wiki.eclipse.org
https://honeynet.org
https://gsoc-slack.honeynet.org
https://libreswan.org
https://lists.libreswan.org
http://www.linuxfoundation.org
http://www.linux-foundation.org
https://www.macports.org
https://trac.macports.org
http://mifos.org
https://mifosforge.jira.com
https://www.NetBSD.org
https://www.netbsd.org
https://wiki.netbsd.org
https://blog.NetBSD.org
https://www.nsnam.org
https://ns-3.zulipchat.com
http://processingfoundation.org
https://forum.processing.org
http://processing.org
http://py.processing.org
http://wiki.qt.io
http://lists.qt-project.org
https://wiki.qt.io
http://blog.qt.io
https://syslog-ng.org
https://lists.balabit.hu
https://vega.github.io
https://communityinviter.com
http://medium.com
http://bit.ly
https://www.winehq.org
https://wiki.winehq.org
https://code.timvideos.us
https://hdmi2usb.tv
https://lamport.azurewebsites.net
http://xena.ucsc.edu
http://biojs.io
https://ucscgenomics.soe.ucsc.edu
https://www.videolan.org
https://mailman.videolan.org
http://planet.videolan.org
https://wiki.videolan.org
https://visp.inria.fr
https://team.inria.fr
http://visp-doc.inria.fr
https://vitrivr.org
http://www.vitrivr.org
https://webpack.js.org
http://webpack.js.org
http://wikimediafoundation.org
https://lists.wikimedia.org
https://wikimedia.zulipchat.com
https://blog.wikimedia.org
https://www.mediawiki.org
http://worldbrain.io
http://join-worldbrain.herokuapp.com
http://www.x.org
https://www.x.org
https://planet.freedesktop.org
https://xapian.org
https://trac.xapian.org
http://kodi.tv
http://kodi.wiki
https://kodi.tv
https://www.reddit.com
https://botbot.me
http://google.github.io
https://xpra.org
http://lists.devloop.org.uk
https://www.xpra.org
http://www.xwiki.org
http://dev.xwiki.org
http://gsoc.xwiki.org
https://zulip.com
https://chat.zulip.org
https://zulipchat.com
http://zulip.readthedocs.io
https://blog.zulip.org
https://zulip.readthedocs.io
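
The list above still contains a few entries that differ only in host casing (for example `https://Conversations.im` and `https://conversations.im`) and a couple that look like mailing-list addresses with `http://` prepended (for example `http://opensource-devs@ellak.gr`). A minimal sketch of one way to clean such a list before scanning follows; the helper name `normalize_urls` is hypothetical, and treating any URL with a userinfo part as a non-website is an assumption, not something the gist does.

```ruby
require 'uri'

# Hypothetical helper (not part of the gist): lowercase hosts, drop blank or
# unparsable lines and entries that carry userinfo, then remove duplicates.
# http:// and https:// variants of the same host are deliberately kept apart.
def normalize_urls(lines)
  urls = lines.map(&:strip).reject(&:empty?).map do |line|
    begin
      uri = URI.parse(line)
      next nil if uri.host.nil? || uri.userinfo
      uri.host = uri.host.downcase
      uri.to_s
    rescue URI::InvalidURIError
      nil
    end
  end
  urls.compact.uniq
end

lines = [
  "https://Conversations.im\n",
  "https://conversations.im\n",
  "http://opensource-devs@ellak.gr\n",
  "http://www.clips.uantwerpen.be:3001\n"
]
puts normalize_urls(lines)
```

With the real file, `lines` would presumably come from `File.readlines('./no_dups_with_http')`, as in the Ruby script.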