Open a browser

```php
<?php
// start an instance of Firefox with php-webdriver (selenium-webdriver)
require_once 'vendor/autoload.php';

use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Remote\WebDriverCapabilityType;

$browser_type = 'firefox';
$host = 'http://localhost:4444/wd/hub';

$capabilities = array(WebDriverCapabilityType::BROWSER_NAME => $browser_type);
$driver = RemoteWebDriver::create($host, $capabilities);
```
Have you ever wanted to get specific data from another website, but there's no API available for it? That's where web scraping comes in: if the data is not exposed through an API, we can just scrape it from the website itself.
But before we dive in, let us first define what web scraping is. According to Wikipedia:
{% blockquote %} Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites. Usually, such software programs simulate human exploration of the World Wide Web by either implementing low-level Hypertext Transfer Protocol (HTTP), or embedding a fully-fledged web browser, such as Internet Explorer or Mozilla Firefox. {% endblockquote %}
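The php-webdriver snippet above takes the second approach (embedding a full browser). As a minimal sketch of continuing it into an actual scrape — the URL and CSS selector here are placeholders, not from any particular site:

```php
// continuing from the setup above; URL and selector are placeholder values
use Facebook\WebDriver\WebDriverBy;

// load a page and pull out one element's text
$driver->get('http://example.com');
$heading = $driver->findElement(WebDriverBy::cssSelector('h1'))->getText();
echo $heading . PHP_EOL;

$driver->quit();
```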
Elasticsearch was created in 2010 by Shay Banon after he set aside work on an earlier search solution, Compass, also built on Lucene, which he had created in 2004.
```nginx
server {
    listen 80 default_server;
    listen [::]:80 default_server;

    root /your/root/path;
    index index.html;
    server_name you.server.com;
}
```

```bash
# to generate your dhparam.pem file, run in the terminal
openssl dhparam -out /etc/nginx/ssl/dhparam.pem 2048
```
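That dhparam.pem file only takes effect once it is wired into a TLS server block. A minimal sketch, assuming your certificates already live at placeholder paths under /etc/nginx/ssl/:

```nginx
server {
    listen 443 ssl default_server;
    listen [::]:443 ssl default_server;
    server_name you.server.com;

    # placeholder certificate paths; adjust to wherever your certs live
    ssl_certificate     /etc/nginx/ssl/fullchain.pem;
    ssl_certificate_key /etc/nginx/ssl/privkey.pem;
    ssl_dhparam         /etc/nginx/ssl/dhparam.pem;

    root /your/root/path;
    index index.html;
}
```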
- What do Etcd, Consul, and ZooKeeper do? (A short Consul sketch follows this list.)
  - Service registration:
    - Host, port number, and sometimes authentication credentials, protocols, version numbers, and/or environment details.
  - Service discovery:
    - The ability for a client application to query the central registry to learn the location of a service.
  - A consistent and durable general-purpose K/V store across a distributed system.
    - Some solutions support this better than others.
    - Based on Paxos or some derivative (e.g. Raft) algorithm to quickly converge to a consistent state.
    - Centralized locking can be based on this K/V store.
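As a minimal sketch of the first two ideas against Consul's HTTP API (PUT /v1/agent/service/register, then GET /v1/catalog/service/<name>) — the service name "web", its address, and port below are made-up values for illustration:

```php
<?php
// Hedged sketch: register a service with the local Consul agent,
// then discover it through the catalog. Name/address/port are
// placeholder values, not from any real deployment.

$payload = json_encode([
    'Name'    => 'web',
    'Address' => '10.0.0.5',
    'Port'    => 8080,
]);

$ctx = stream_context_create([
    'http' => [
        'method'  => 'PUT',
        'header'  => "Content-Type: application/json\r\n",
        'content' => $payload,
    ],
]);

// Service registration: PUT /v1/agent/service/register
file_get_contents('http://localhost:8500/v1/agent/service/register', false, $ctx);

// Service discovery: GET /v1/catalog/service/<name>
$instances = json_decode(
    file_get_contents('http://localhost:8500/v1/catalog/service/web'),
    true
);

foreach ($instances as $inst) {
    // ServiceAddress falls back to the node address when unset
    $addr = $inst['ServiceAddress'] ?: $inst['Address'];
    echo "{$addr}:{$inst['ServicePort']}\n";
}
```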
```nginx
# Change YOUR_TOKEN to your prerender token and uncomment that line if you want to cache urls and view crawl stats
# Change example.com (server_name) to your website url
# Change /path/to/your/root to the correct value
server {
    listen 80;
    server_name example.com;
    root /path/to/your/root;
    index index.html;
```
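The excerpt stops after the basics. As a hedged sketch of the remainder, following the pattern Prerender's own nginx config documents (the crawler user-agent list here is abridged, and YOUR_TOKEN stays a placeholder), the rest of the server block looks roughly like this:

```nginx
    location / {
        try_files $uri @prerender;
    }

    location @prerender {
        # proxy_set_header X-Prerender-Token YOUR_TOKEN;
        set $prerender 0;
        if ($http_user_agent ~* "googlebot|bingbot|yandex|twitterbot") {
            set $prerender 1;
        }
        if ($prerender = 1) {
            # pass the full original URL through to the Prerender service
            rewrite .* /$scheme://$host$request_uri? break;
            proxy_pass http://service.prerender.io;
        }
        if ($prerender = 0) {
            rewrite .* /index.html break;
        }
    }
}
```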
Packer
Packer is used to build an image from a base image, run provisioners against it, and store (commit) the final image.
We use Packer templates and provisioners to do the actual work of creating the final image.
We use Ansible for provisioning.
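As a minimal sketch of such a template (the base image ubuntu:16.04, the playbook path, and the repository name myorg/myapp are all placeholder values, not our actual setup): it builds from a Docker base image, runs an Ansible playbook as the provisioner, and commits and tags the result.

```json
{
  "builders": [{
    "type": "docker",
    "image": "ubuntu:16.04",
    "commit": true
  }],
  "provisioners": [{
    "type": "ansible",
    "playbook_file": "./playbook.yml"
  }],
  "post-processors": [{
    "type": "docker-tag",
    "repository": "myorg/myapp",
    "tag": "latest"
  }]
}
```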
I have been an aggressive Kubernetes evangelist over the last few years. It has been the hammer with which I have approached almost all my deployments, the one tool I have mentioned (shoved down clients' throats) in almost all my initial communications with clients, and my go-to choice when I was mocking up my first startup (saharacluster.com).
A few weeks ago Docker 1.13 was released and I was tasked with replicating a client's Kubernetes deployment on Swarm, more specifically testing running compose on Swarm.
And it was a dream!
All our apps were already dockerised, and all I had to do was make a few modifications to an existing compose file that I had used for testing before the aforementioned deployment on Kubernetes.
And given the ease with which I was able to expose our endpoints, manage volumes, handle networking, and deploy and tear down the setup, I honestly see no reason not to use Swarm. There is no mission-critical feature, or incredibly convenient nice-to-have feature, in Kubernetes that I'm going to miss.
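To make "a few modifications" concrete: a minimal compose v3 file that `docker stack deploy` (new in Docker 1.13) will accept looks roughly like this — the service name, image, port mapping, and replica count are placeholders, not the client's actual config:

```yaml
version: "3"
services:
  web:
    image: myorg/myapp:latest   # placeholder image
    ports:
      - "80:8080"
    deploy:
      replicas: 3               # Swarm-only section; ignored by plain compose
```

```bash
# deploy the stack onto a Swarm cluster
docker stack deploy -c docker-compose.yml mystack
```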
This will get you routable containers with IPs on your existing subnets, advertising to Consul. They will also be scalable and placed across a cluster of Swarm hosts. It's assumed that you are already running Consul, so if not, there are a ton of tutorials out there. It's also assumed you know how to install Docker and various Linux kernels.
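As a hedged sketch of the routable-containers piece using Docker's macvlan driver (the subnet, gateway, parent interface, network name, and image below are placeholders; your values will differ):

```bash
# Create a macvlan network bound to an existing subnet so each container
# gets its own routable IP on that subnet.
docker network create -d macvlan \
  --subnet=10.0.0.0/24 \
  --gateway=10.0.0.1 \
  -o parent=eth0 \
  routable

# Containers attached to it are reachable from the rest of the network;
# a registrator-style agent can then advertise them to Consul.
docker run -d --network routable --name web nginx
```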
Bonus: We add an autoscaling API called Orbiter (https://gianarb.it/blog/orbiter-the-swarm-autoscaler-moves).
So you have an existing environment. You use Consul for service discovery. Life is good. Containers are now a thing and you want to work them in without having to worry about overlay networking or reverse proxies. You also don't want to add extra latency (as some naysayers could use it as fuel to kill your hopes and dreams). Lastly, you don't have a lot of time to invest in a complex orchestration tool, such as Kubernetes.