As websites become more JavaScript heavy, it's harder to automate things like screenshotting for archival purposes. I've seen examples and suggestions to use PhantomJS for visual testing/archiving of websites, but have run into issues such as the non-rendering of webfonts. I've never tried out Selenium until today...and while I'm not thinking about performance implications yet, Selenium seems far more accurate than PhantomJS...which makes sense since it actually opens a real browser. And it's not too hard to script to do complex interactions: here's an [example of how to log in to Twitter, write a tweet, upload an image, and send a tweet via Selenium and DOM element selection](https://gist.github.com/dannguyen/8a6fa49253c1d6a0eb92
This is a very simple approach to doing role-based access control with Neo4j. It is optimistic, in the sense that all items are assumed to be world-readable unless they have specific constraints. Item visibility can be constrained to either individual users or all users who belong to a role. Roles are also hierarchical, so can inherit privileges from other roles.
First, lets create our basic example data:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Joe Sandbox API wrapper. | |
# REQUIRES: python-requests http://docs.python-requests.org/en/latest/ | |
import sys | |
import time | |
import random | |
import getpass | |
import requests | |
try: |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
If you want, I can try and help with pointers as to how to improve the indexing speed you get. Its quite easy to really increase it by using some simple guidelines, for example: | |
- Use create in the index API (assuming you can). | |
- Relax the real time aspect from 1 second to something a bit higher (index.engine.robin.refresh_interval). | |
- Increase the indexing buffer size (indices.memory.index_buffer_size), it defaults to the value 10% which is 10% of the heap. | |
- Increase the number of dirty operations that trigger automatic flush (so the translog won't get really big, even though its FS based) by setting index.translog.flush_threshold (defaults to 5000). | |
- Increase the memory allocated to elasticsearch node. By default its 1g. | |
- Start with a lower replica count (even 0), and then once the bulk loading is done, increate it to the value you want it to be using the update_settings API. This will improve things as possibly less shards will be allocated to each machine. | |
- Increase the number of machines you have so |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
cd ~ | |
sudo yum update | |
sudo yum install java-1.7.0-openjdk.i686 -y | |
wget https://github.com/downloads/elasticsearch/elasticsearch/elasticsearch-0.19.9.tar.gz -O elasticsearch.tar.gz | |
tar -xf elasticsearch.tar.gz | |
rm elasticsearch.tar.gz | |
mv elasticsearch-* elasticsearch | |
sudo mv elasticsearch /usr/local/share |