@nloadholtes
Created April 11, 2012 02:43
A quick-and-dirty word frequency counter
217 python
113 web
77 django
55 linux
55 javascript
54 programming
50 design
47 mysql
47 data
44 computer
40 software
40 html
40 css
34 sql
34 application
31 degree
30 code
28 postgresql
25 source
25 solid
25 open
25 jquery
25 developing
25 communication
25 agile
24 java
24 git
24 framework
24 databases
24 database
23 languages
21 testing
21 server
21 must
21 etc
21 and/or
20 team
20 high
19 one
19 frameworks
19 desirable
19 comfortable
19 building
19 able
17 written
17 relational
16 system
16 nosql
15 ruby
15 new
15 management
15 including
15 familiar
15 extensive
15 apache
14 similar
14 projects
14 not
14 mongodb
14 language
14 integration
14 english
14 engineering
14 c++
14 are
13 very
13 version
13 large
13 complex
13 coding
13 but
12 unit
12 time
12 practices
12 oriented
12 fluent
12 driven
12 ajax
12 administration
11 problem
11 perl
11 object-oriented
11 nginx
11 ideally
11 any
10 xml
10 twisted
10 technical
10 solving
10 restful
10 project
10 php
10 passion
10 oracle
10 object
10 more
10 minimum
10 mercurial
10 learn
10 field
10 expert
10 distributed
10 architecture
9 write
9 will
9 use
9 ubuntu
9 test
9 sqlalchemy
9 some
9 solutions
9 scripting
9 scale
9 record
9 rdbms
9 postgres
9 operating
9 mvc
9 multiple
9 like
9 learning
9 highly
9 following
9 designing
9 deploying
9 can
9 best
9 based
9 background
9 apis
9 all
9 about
8 your
8 well
8 verbal
8 unix
8 tools
8 technology
8 structures
8 servers
8 scalable
8 rest
8 relevant
8 redis
8 python/django
8 implementation
8 html5
8 deployment
8 demonstrated
8 aws
8 algorithms
7 tdd
7 svn
7 small
7 services
7 pylons
7 production
7 machine
7 keen
7 internet
7 flask
7 exposure
7 developer
7 commercial
7 clean
7 bonus
7 bash
7 automated
7 api
7 also
6 when
6 website
6 unix/linux
6 understand
6 tornado
6 their
6 teams
6 standards
6 spoken
6 service
6 server-side
6 proven
6 principles
6 position
6 patterns
6 own
6 orm
6 networking
6 mathematics
6 mac
6 library
6 level
6 json
6 implementing
6 higher
6 fast
6 facebook
6 dynamic
6 creating
6 concepts
6 collaborative
6 cloud
6 client
6 candidates
6 both
6 big
6 basic
6 attention
6 area
6 apps
6 approach
6 analytical
6 analysis
5 yrs
5 wsgi
5 writing
5 willing
5 where
5 user
5 two
5 track
5 sites
5 shell
5 security
5 scrum
5 schema
5 rackspace
5 quickly
5 pyramid
5 protocols
5 proficient
5 problems
5 optional
5 motivated
5 maintaining
5 know
5 into
5 interest
5 hands-on
5 hadoop
5 grasp
5 google
5 get
5 environments
5 detail
5 delivering
5 continuous
5 computing
5 command
5 citizenship
5 backbone
5 bachelor’s
5 asset
5 another
5 advanced
4 would
4 within
4 what
4 via
4 variety
4 university
4 things
4 than
4 test-driven
4 successful
4 standard
4 stack
4 social
4 simple
4 sense
4 scaling
4 scala
4 rapid
4 quality
4 practical
4 phd
4 performance
4 passionate
4 parsing
4 over
4 operational
4 oop
4 on-site
4 office
4 network
4 mongo
4 modeling
4 maintain
4 issues
4 integrating
4 initiative
4 information
4 industry
4 independently
4 i'm
4 html/css
4 fundamentals
4 functional
4 front-end
4 fabric
4 eye
4 experienced
4 existing
4 enjoy
4 ec2
4 details
4 desire
4 database-driven
4 create
4 configuration
4 clients
4 caching
4 build
4 aspects
4 amazon
3 you're
3 windows
3 web-services
3 web-based
3 users
3 twitter
3 tuning
3 troubleshooting
3 top
3 tool
3 thinking
3 thing
3 templating
3 template
3 techniques
3 support
3 subversion
3 startup
3 sphinx
3 solr
3 snmp
3 skill
3 sets
3 secure
3 scripts
3 reviews
3 results
3 regular
3 recent
3 really
3 python-based
3 puppet
3 prototype
3 process
3 prior
3 points
3 player
3 packaging
3 out
3 others
3 osx
3 oral
3 optimization
3 open-source
3 nltk
3 *nix
3 necessary
3 mining
3 methodology
3 methodologies
3 memcache
3 meet
3 maths
3 make
3 maintainable
3 love
3 load
3 link
3 linguistics
3 levels
3 lean
3 leading
3 large-scale
3 jinja
3 javascript/jquery
3 it's
3 its
3 interactive
3 history
3 has
3 hard
3 great
3 gis
3 getting
3 general
3 frontend
3 flash
3 five
3 finance
3 fast-paced
3 experience;
3 exceptional
3 excel
3 enterprise
3 embedded
3 email
3 either
3 don't
3 done
3 documented
3 django's
3 django/python
3 developed
3 develop
3 dev
3 delivery
3 deep
3 debugging
3 debian
3 cycle
3 customer
3 creative
3 couchdb
3 confident
3 company
3 community
3 communications
3 come
3 combination
3 cms
3 candidate
3 bootstrap
3 back-end
3 backend
3 bachelor's
3 availability
3 attitude
3 areas
3 architectures
3 applicants
3 additional
3 3-5
2 you've
2 you’re
2 you'll
2 years’
2 years'
2 year
2 xhtml
2 willingness
2 want
2 w3c
2 volume
2 voip
2 visual
2 virtualenv
2 view
2 video
2 various
2 value
2 valid
2 useful
2 up-to-date
2 unfamiliar
2 typical
2 turbogears
2 translate
2 tracking
2 trac
2 tolerance
2 ticket
2 through
2 three
2 thorough
2 this
2 these
2 them
2 that's
2 telephony
2 teamwork
2 styles
2 style
2 store
2 startups
2 start
2 sql;
2 spirit
2 specifically
2 speak
2 south
2 sources
2 solve
2 solution
2 solaris
2 situations
2 significant
2 side
2 should
2 short
2 set
2 send
2 semantic
2 self-starter
2 self
2 selenium
2 see
2 sector
2 second
2 sdk's
2 scoping
2 scm
2 scipy/numpy
2 scientific
2 scalability
2 sass
2 san
2 running
2 ror
2 role
2 revision
2 retrieval
2 resourceful
2 research
2 requirements
2 relocate
2 release
2 references
2 reduce
2 real
2 read
2 raw
2 querying
2 query
2 python;
2 purpose
2 public
2 providers
2 provide
2 prototypes
2 programs
2 programmer
2 productive
2 product
2 producing
2 produce
2 processing
2 prefer
2 pragmatic
2 positive
2 popular
2 polished
2 plots
2 platforms
2 platform
2 planning
2 place
2 physical
2 phase
2 person
2 permanent
2 party
2 participation
2 package
2 paced
2 otherwise
2 orms
2 organizational
2 openid
2 ood
2 online
2 nyc
2 nose
2 non-relational
2 non
2 networks;
2 needs
2 native
2 mysql/postgresql
2 multimedia
2 mssql
2 most
2 months
2 month
2 modelling
2 mobile
2 mixture
2 mind
2 memcached
2 may
2 matlab
2 mastery
2 master’s
2 masters
2 market
2 maps
2 map
2 managing
2 manage
2 mainly
2 machines
2 lua
2 looking
2 live
2 listen
2 linux/bsd
2 line
2 lifecycle
2 life
2 libraries
2 learner
2 leadership
2 leader
2 layout
2 lamp
2 knowledgeable
2 jvm
2 jira
2 james
2 iterative
2 irc
2 interpersonal
2 international
2 interfaces
2 interface
2 interesting
2 interaction
2 install
2 in-depth
2 html/css/javascript
2 how
2 his
2 high-volume
2 high-level
2 her
2 help
2 gunicorn
2 graduates
2 generate
2 full
2 front
2 friendly
2 foundation
2 fluency
2 flow
2 first
2 firm
2 financial
2 files
2 file
2 fault
2 faced
2 evidence
2 event-driven
2 essential
2 erlang
2 enthusiastic
2 engines
2 engine
2 eligibility
2 elastic
2 efficient
2 education
2 econometrics
2 e-commerce
2 eager
2 duration
2 documentation
2 distribution
2 disparate
2 discipline
2 devices
2 developers
2 desktop
2 designs
2 designed
2 deadlines
2 day
2 cycles
2 customers
2 css3
2 cross-platform
2 cross-browser
2 critical
2 couple
2 contributions
2 contract
2 contact
2 constantly
2 concurrency
2 complete
2 competent
2 competencies
2 compatibility
2 communicating
2 common
2 commitments
2 comments
2 combined
2 college
2 collaboration
2 closely
2 client-side
2 client/server
2 client-server
2 chef
2 celery
2 cassandra
2 care
2 card
2 business
2 built
2 bs/ms
2 broadway's
2 broad
2 boto
2 boston
2 beneficial
2 basis
2 basics
2 base
2 bachelor's/master's
2 bachelors
2 available
2 australian
2 asynchronous
2 architectural
2 appreciation
2 app
2 analyze
2 analysing
2 advantage
2 administering
2 admin
2 adept
2 address
2 across
2 accurate
2 access
2 academic
2 2-5
2 10+
1  html
1 yui
1 you!
1 york
1 xslt
1 xserve/os
1 xquery
1 xls
1 wtforms
1 writes
1 workstations
1 workplace
1 workload
1 workflow
1 worker
1 worked
1 workable
1 without
1 //wiki
1 wielding
1 widgets
1 whisperer
1 while
1 whether
1 what’s
1 what's
1 werkzeug
1 we’re
1 well-known
1 welcome
1 webtests
1 webservices
1 weblogic
1 web-based applications
1 weather
1 wear
1 water
1 wai
1 volumes
1 vocal
1 visualization
1 visualising
1 vision
1 virtualization
1 vim
1 versed
1 versatility
1 varnish/redis
1 varied
1 variations
1 variable
1 valued
1 uses
1 used
1 upon
1 un*x
1 until
1 unix-like
1 unix;
1 unit-tested
1 unit-testable
1 units
1 unicode
1 under
1 uml
1 typically
1 type
1 tweaking
1 turning
1 tunnelling
1 tue/wed
1 troubleshoot
1 triumph
1 trial
1 transition
1 transactional
1 training
1 traffic
1 traditional
1 trackers
1 tor
1 tomcat
1 title
1 tiniest
1 throughput
1 throughout
1 thrive
1 threading
1 though
1 thoroughly
1 third
1 think
1 they
1 there's
1 theme
1 textmate
1 text
1 tests
1 tested
1 tertiary
1 terms
1 terminals
1 terminal
1 terabyte
1 temporary
1 tell
1 telecommuting
1 telecommute
1 telecom
1 technology-agnostic
1 technology;
1 technologies;
1 team-spirit
1 team;
1 tdm
1 tddspry
1 tcp/ip
1 tcpdump
1 tcp
1 taylor@emory
1 taylor
1 tastypie
1 taking
1 take
1 tags
1 tackle
1 tables
1 system/application
1 syntax
1 swedish
1 suse
1 supervisord
1 supervisor
1 supervision
1 superb
1 summarize
1 sugar
1 subset
1 submit
1 structuring;
1 structure
1 stringent
1 strengths
1 stream
1 strategies
1 storing
1 stores
1 storage
1 still
1 statistics
1 statistical
1 statements
1 state
1 standards-driven
1 standards-based
1 stand-alone
1 stage
1 staff
1 ssh
1 sqlite
1 spss
1 sproutcore
1 spring
1 spotify
1 sponsorship
1 spine
1 specify
1 specific
1 specifications
1 specification
1 sparsly
1 sophisticated
1 soon
1 something
1 someone
1 sockets
1 socket
1 sociable
1 soap
1 smarter
1 smart
1 small-scale
1 smaller
1 s/m
1 slicehost
1 skills;
1 skillful
1 skilled
1 sketching
1 situation
1 sites/companies
1 sites;
1 site
1 simultaneously
1 sides
1 show
1 shove
1 shouldn't
1 shop
1 ships
1 shipped
1 ship
1 shaping
1 several
1 setting
1 sessions
1 serving
1 server-browser
1 /server
1 series
1 seoul
1 senior-level
1 senior
1 sence
1 semantics
1 self-management
1 self-manage
1 self-built
1 selects
1 search
1 sdlc
1 sdk’s
1 sdk
1 scraping
1 scrapers
1 sc/m
1 scipy
1 sciences
1 science/engineering
1 scheme-like
1 scales
1 scaled
1 scalajava
1 s/b
1 same
1 salary
1 saas
1 rules
1 rrd
1 rpm
1 roll
1 robust
1 rise
1 right
1 revolutionary
1 reviewed
1 reversion
1 re-used
1 re-usability
1 return
1 result
1 restructuredtext
1 reston
1 rest-ful
1 responsible
1 responsibility
1 respond
1 resides
1 requrirements
1 requiring
1 requirejs
1 representation
1 reports
1 reliability
1 relationships
1 regularly
1 regression
1 regardless
1 regarding
1 refactoring
1 redmine
1 recruiters
1 recreational
1 recommendation
1 recognised
1 reasonable
1 reads
1 rdf/triple
1 range
1 rails
1 rabbitmq
1 qunit
1 quicktest
1 queries
1 quantities
1 qualities
1 qualified
1 qualifications
1 pyunit
1 python/ruby
1 pythonic
1 python!
1 pyramid]
1 pypy
1 pylons/pyramid
1 purposes
1 punks
1 pull
1 pubsub
1 publication
1 psu
1 proxies
1 provision
1 protocol
1 /project
1 programming/production
1 programming;
1 program
1 profilers
1 professionals
1 professionally
1 products
1 processes
1 procedural
1 problem-solving
1 proactive
1 prioritization
1 principals
1 principal
1 primarily
1 previous
1 pretoria
1 pressured
1 pressure
1 presentations
1 presentation-layer
1 present
1 prerequisites
1 preprocessors
1 preferable
1 practises
1 practices'-minded
1 powerful
1 power
1 potentially
1 postgis
1 possibly
1 possible
1 possession
1 positioning
1 pos
1 porting
1 polymaps
1 polish
1 plug-in/api
1 plone
1 please
1 player;
1 play
1 plans
1 pip
1 pick
1 physics
1 photovoltaics
1 photoshop
1 photography
1 phone
1 philosophy
1 personal
1 permitted
1 periods
1 period
1 performing
1 perform
1 perforce
1 payment
1 patching
1 paste
1 past
1 passions
1 pass
1 particularly
1 particular
1 part
1 paris
1 paradigms
1 pandas
1 pair
1 paid
1 page
1 packages;
1 ownership
1 overcoming
1 outstanding
1 outside
1 organized
1 organization
1 organise
1 organisational
1 optimizing
1 optimizations
1 optimised
1 optimisation
1 opportunities
1 operators
1 operations
1 open-source-style
1 openlayers
1 openerp
1 onsite
1 only
1 off-site
1 offices
1 offer
1 occur
1 obviously
1 objective-c
1 numpy
1 normalization
1 non-technical
1 non-programmers
1 node
1 nlp’s
1 nix
1 night
1 newcastle
1 never
1 netbsd
1 negotiable
1 need
1 navigating
1 nature
1 natural
1 naming
1 mysql/oracle/ms
1 mysql/mssql
1 mysql/mongodb
1 mysql;
1 /mysql
1 “must”
1 multi-user
1 multitude
1 multi-tiered
1 multi-task
1 multitask
1 multi-server
1 multidisciplinary
1 multi-device
1 mule
1 much
1 mssql/oracle
1 mountain
1 motivation
1 moodle
1 monkey
1 modules
1 mods
1 modify
1 models
1 modelling;
1 model
1 mock
1 mis-specified
1 ministry
1 minimal
1 migrations
1 might
1 mid-june
1 middleware
1 mid
1 microsoft
1 mibs
1 mets
1 meticulous
1 methods
1 methodical
1 method
1 metaprogramming
1 metadata
1 mentality
1 memory
1 member
1 medium
1 mediacore
1 media
1 matter
1 mathematical
1 math
1 master
1 massively
1 marc
1 map/reduce
1 mapreduce
1 many
1 manner
1 manipulation
1 manhattan
1 mandatory
1 manajement
1 manageable
1 mako
1 makes
1 major
1 maintenance
1 maintains
1 maintainability
1 main
1 mailing
1 lucene
1 low
1 loves
1 lover
1 london
1 lolcat
1 log
1 location
1 located
1 local
1 lived
1 lists
1 list
1 linux/unix
1 linux/apache/mysql/python
1 like/play
1 lighttpd
1 life-cycle
1 lieu
1 lies
1 letter
1 less
1 legacy
1 learned
1 leaders
1 lead
1 ldap
1 layers
1 laugh
1 latest
1 larger
1 languages/techniques
1 korn
1 korean
1 knows
1 know-it-alls
1 kind
1 key
1 keeps
1 keep
1 kdb+
1 juggle
1 judgment
1 js/jquery
1 jquery/backbone/extjs/dojo
1 jquery;
1 jose
1 joins
1 jobs
1 jinja2
1 jetty
1 jerks
1 jerk
1 javasscript
1 javascript/html/css
1 javascript/css
1 javascript/ajax
1 javascript;
1 jade
1 iterations
1 is required
1 issue
1 iphone
1 i/o
1 involving
1 investigator
1 intuitive
1 interviews
1 interview
1 interpretive
1 interpreted
1 internationalisation/localisation
1 internal
1 interfacing
1 interested
1 interacting
1 interact
1 intelligent
1 intelligence
1 instruction
1 instincts
1 installation
1 insert
1 input
1 innovation
1 initiatives
1 infro
1 informal
1 infographics
1 indesign
1 independent
1 improvising
1 improve
1 importantly
1 important
1 implement
1 immediately
1 imagine
1 illustrator
1 iis
1 idioms
1 identifying
1 ideas
1 hypothesis
1 hypervisors
1 hurts
1 humor
1 html/css/js
1 html5/css/js
1 html5/css3/javascript/query/flask/nginx
1 html5/css3
1 hours
1 host
1 hopefully
1 home-grown
1 homegrown
1 home
1 hold
1 hive/hadoop
1 high-traffic
1 high-quality
1 high-performance
1 hessian
1 helpful
1 heavy
1 hear
1 haystack/solr
1 haystack
1 hats
1 haskell
1 hardcore
1 happy
1 handy
1 handlebars
1 handle
1 handcraft
1 haml
1 hadoop/hive
1 hacking
1 hacker
1 guis
1 guides
1 gtd
1 groups
1 ground
1 groovy
1 grid-based
1 green
1 graphs
1 graphical
1 graph
1 gradle
1 google+
1 goals
1 gnu/linux
1 gmake
1 global
1 giving
1 give
1 germany
1 geo-spatial
1 geodjango
1 genshi
1 generating
1 generalist
1 gateway
1 gaming
1 games
1 game
1 functioning
1 functionality
1 full-time
1 frontend/backend
1 fresno
1 frequent
1 french;
1 french
1 france
1 framework;
1 foundations
1 forums
1 formulae
1 forms
1 format
1 formal
1 focused
1 focus
1 flux
1 flickr
1 flavour
1 flavors
1 flair
1 fix
1 fit
1 firms
1 finite
1 finished
1 filters
1 filesystem
1 figure
1 fields
1 field;
1 few
1 feedback
1 fedora
1 features
1 feature
1 fear
1 fast-growth
1 faster
1 fastcgi
1 failover
1 face
1 extremely
1 extreme
1 extjs
1 extension
1 extended
1 extend
1 expressions
1 experiance
1 expected
1 expanding
1 expand
1 exosystem
1 exit
1 execute
1 excuse
1 excited
1 examples
1 events
1 event
1 etsy
1 etl
1 ethic
1 estjs
1 estimates
1 especially
1 esb
1 error
1 enviroment
1 enough
1 enhance
1 engagements
1 energy
1 end-to-end
1 end
1 encryption
1 enables
1 emphasis
1 embrace
1 e-mail
1 emacs
1 electricity
1 efficiency
1 effectively
1 edu/galaxyishiring
1 edu
1 editors
1 editor
1 edge
1 ecosystem
1 ecommerceexperience
1 ecommerce
1 east
1 each
1 dynamodb
1 dsp
1 drupal
1 dropbox
1 drive
1 dream
1 down
1 dom
1 dojo
1 docbook
1 dns
1 django-celery
1 django/
1 /django
1 distributions
1 distributed/cloud
1 displays
1 display
1 disco
1 directories
1 directly
1 direct
1 different
1 difference
1 dhtml/xhtml
1 dhtml
1 devs
1 develops
1 developments
1 development;
1 developing/designing
1 developer!
1 develop/analyze
1 detection
1 detailed
1 design patterns
1 design/implementation
1 design-focused
1 department
1 demonstrate
1 demonstrable
1 delete
1 definition
1 decorators
1 decide
1 debuggers
1 debit/credit
1 debian-flavor
1 deadline-driven
1 db's
1 datatypes
1 databses
1 database-
1 data/
1 daemons
1 cyclone
1 cutting
1 curriculum
1 current
1 cs4
1 cross-team
1 cross-functional
1 creativity
1 crap
1 covers
1 coursework
1 couldn't
1 core-java
1 coordinating
1 cooperative
1 co-op
1 conventions
1 contribution
1 contributed
1 contribute
1 contracts
1 content
1 contemporary
1 consumption
1 consumer-facing
1 consumer
1 consulting
1 consult
1 consistent
1 connect
1 conformance
1 configure
1 confidence
1 concisely
1 concise
1 concerns
1 conceptual
1 concepts;
1 conceived
1 conceive
1 computers
1 computational
1 compsci
1 comprehension
1 complicated
1 compliant
1 complexity
1 completed
1 competitive
1 competence
1 compensation
1 comp
1 communities
1 communicator
1 communicate
1 comms
1 commit
1 command/shell
1 collective
1 colleagues
1 collaborated
1 collaborate
1 cognitive
1 coffeescript
1 coder
1 codebase
1 “code
1 cnri
1 clustered
1 cluster
1 cloud-based
1 closures
1 climate
1 client's
1 clearly
1 clearance
1 clear
1 cleansing
1 chosen
1 chooses
1 chinese
1 chicago
1 cherrypy
1 checked
1 challenges
1 centers
1 c/c++
1 cause
1 catalyst
1 casual
1 capability
1 capabilities
1 can't
1 calls
1 bug
1 bruno
1 browser-based
1 brilliant
1 breathe
1 break
1 bottle
1 blueprint
1 bigtable-esque
1 big-data
1 bid
1 better;
1 better
1 'best
1 berlin
1 benchmarking
1 believe
1 being
1 behind
1 becoming
1 beauty”
1 beauty
1 beautiful
1 bay
1 bases;
1 bases
1 balancer
1 back-ends
1 backends
1 back
1 ba/bs
1 awk
1 awesometudity
1 awesome
1 awake
1 autonomously
1 autonomous
1 automation
1 authorization
1 authenticate
1 atompub
1 asterisk
1 assumes
1 assist
1 asset;
1 assess
1 asp
1 arrays
1 arduino
1 architecting/designing
1 arch
1 appscript
1 appropriate
1 apply
1 application;â
1 appliations
1 appengine
1 api’s
1 apache/nginx/gunicorn
1 any;
1 [any
1 ant
1 answer
1 analyzing
1 amqplib
1 always
1 alternative
1 alternate
1 algorithmic
1 algorithm
1 ajaz/websockets/other
1 afrikaans
1 affinity
1 aesthetics
1 adoption
1 adobe
1 admire
1 administration/web
1 administration;
1 addressing
1 added
1 adapting
1 adaptable
1 adapt
1 activity/enthusiast
1 actively
1 active
1 actionscript
1 action
1 acquisition
1 achieving
1 achieve
1 accomodation
1 acclimated
1 accessibility
1 academically
1 abstracted
1 abreast
1 above
1 6/2
1 3rd
1 2-6
1 1-2
1 10g
1 100%
1 0-3
1 000ft
#
# keywords.py
# Nick Loadholtes <nick@ironboundsoftware.com>
#
# Quick script to see which skills are hot these days.
#
# For best results, run it this way:
# python3 keywords.py ; sort -rn job_keywords.txt | more
import bs4  # http://www.crummy.com/software/BeautifulSoup/

# Common words that say nothing about a specific skill. 'equivlent' is kept
# as written; it presumably matches a misspelling in the listings.
stopwords = ('and', 'the', 'from', 'with', 'for', 'our', 'job', 'http', 'com',
             'experience', 'development', 'years', 'plus', 'skills', 'strong',
             'knowledge', 'work', 'systems', 'working', 'that', 'have',
             'excellent', 'such', 'equivlent', 'related', 'other', 'least',
             'control', 'ability', 'science', 'applications', 'understanding',
             'good', 'you', 'technologies', 'environment', 'preferably',
             'equivalent', 'preferred', 'proficiency', 'professional',
             'familiarity', 'expertise', 'required', 'using')
# Punctuation and whitespace that are turned into word separators.
replaceables = ('\n', '\t', ',', '(', ')', ':', '"', '.')

# Saved copy of http://www.python.org/community/jobs/
with open('jobs.html') as f:
    data = f.read()

# Name the parser explicitly so bs4 does not have to guess one.
soup = bs4.BeautifulSoup(data, 'html.parser')

# Collect the text of the list that follows each "Requirements" heading.
ttext = []
for strong in soup.find_all('strong'):
    if strong.next_element == 'Requirements':
        ul = strong.find_next('ul')
        if ul is not None:
            ttext.append(ul.get_text().lower())

text = ' '.join(ttext)
for r in replaceables:
    text = text.replace(r, ' ')
stext = text.split(' ')

# Count every word of three or more characters that is not a stopword.
count = {}
for w in stext:
    if len(w) < 3 or w in stopwords:
        continue
    count[w] = count.get(w, 0) + 1

# Write one "count word" pair per line, ready for `sort -rn`.
with open('job_keywords.txt', 'w') as f2:
    for c in count:
        f2.write('%d %s\n' % (count[c], c))
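
The counting step in keywords.py can also be sketched with collections.Counter from the standard library, sorting in Python instead of piping the file through sort -rn. This is only a rough alternative, not the script's own method. The names count_keywords and format_counts are hypothetical, the stopword tuple is an abbreviated stand-in for the full one, and the sample sentence is made up. One deliberate difference: it calls split() with no argument, which splits on any run of whitespace rather than on single spaces.

```python
from collections import Counter

# Hypothetical helpers for illustration. STOPWORDS is an abbreviated
# stand-in for the full tuple in keywords.py.
REPLACEABLES = ('\n', '\t', ',', '(', ')', ':', '"', '.')
STOPWORDS = ('and', 'the', 'with', 'for', 'experience')


def count_keywords(text, stopwords=STOPWORDS, min_len=3):
    """Count words of at least min_len characters that are not stopwords."""
    text = text.lower()
    for r in REPLACEABLES:
        text = text.replace(r, ' ')
    # split() with no argument splits on any run of whitespace.
    words = (w for w in text.split()
             if len(w) >= min_len and w not in stopwords)
    return Counter(words)


def format_counts(counts):
    """Return 'count word' lines, highest count first.

    Ties are ordered in reverse alphabetical order, which roughly mirrors
    what `sort -rn` does with the file written by keywords.py.
    """
    ordered = sorted(counts.items(),
                     key=lambda item: (item[1], item[0]),
                     reverse=True)
    return ['%d %s' % (n, word) for word, n in ordered]


if __name__ == '__main__':
    # Made-up sample text, not taken from the job listings.
    sample = 'Experience with Python and Django. Python, SQL (MySQL) and Linux.'
    for line in format_counts(count_keywords(sample)):
        print(line)
```

Sorting inside the script removes the need for the external sort command, at the cost of holding the sorted list in memory, which is negligible for a word list of this size.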