Felipe Forbeck fforbeck

## gist:43f3e946357a045a6cf3
If you want, I can try to help with pointers on how to improve the indexing speed you get. It's quite easy to increase it substantially by following some simple guidelines, for example:

- Use create in the index API (assuming you can).
- Relax the real-time aspect from 1 second to something a bit higher (index.engine.robin.refresh_interval).
- Increase the indexing buffer size (indices.memory.index_buffer_size); it defaults to 10%, meaning 10% of the heap.
- Increase the number of dirty operations that trigger automatic flush (so the translog won't get really big, even though it's FS based) by setting index.translog.flush_threshold (defaults to 5000).
- Increase the memory allocated to the elasticsearch node. By default it's 1g.
- Start with a lower replica count (even 0), and then once the bulk loading is done, increase it to the value you want using the update_settings API. This will improve things, as possibly fewer shards will be allocated to each machine.
- Increase the number of machines you have so
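
The replica tip in the list above can be sketched as a small shell helper. This is only a minimal sketch, not part of the original note: the host `http://localhost:9200`, the index name `myindex`, and the function names `replica_settings_body` and `set_replicas` are all assumptions for illustration. It only covers the `number_of_replicas` setting sent to the index `_settings` endpoint.

```shell
#!/bin/sh
# Assumed values (not from the original note): adjust for your cluster.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-myindex}"

# Hypothetical helper: build the JSON body for a replica-count update.
# $1 = desired number of replicas
replica_settings_body() {
  printf '{"index":{"number_of_replicas":%d}}' "$1"
}

# Hypothetical helper: send the body to the index settings endpoint.
set_replicas() {
  curl -s -XPUT "$ES_URL/$INDEX/_settings" \
       -H 'Content-Type: application/json' \
       -d "$(replica_settings_body "$1")"
}

# Intended usage (not executed here):
#   set_replicas 0    # before the bulk load
#   ...run the bulk indexing...
#   set_replicas 1    # once the load is done
```

Keeping the body builder separate from the `curl` call makes it possible to check the payload without a running cluster.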

## Clojure.sublime-settings
// installed Clojure packages:
//
// * BracketHighlighter
// * lispindent
// * SublimeREPL
// * sublime-paredit

{
  "word_separators": "/\\()\"',;!@$%^&|+=[]{}`~?",
  "paredit_enabled": true,

## profiles.clj
{:user {:dependencies [[org.clojure/tools.namespace "0.2.3"]
                       [spyscope "0.1.3"]
                       [criterium "0.4.1"]]
        :injections [(require '(clojure.tools.namespace repl find))
                     ; try/catch to work around an issue where `lein repl` outside a project dir
                     ; will not load reader literal definitions correctly:
                     (try (require 'spyscope.core)
                       (catch RuntimeException e))]
        :plugins [[lein-pprint "1.1.1"]
                  [lein-beanstalk "0.2.6"]

## ReCaptchaUtil.java
import static org.apache.commons.lang.StringUtils.isBlank;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.log4j.Logger;

import net.tanesha.recaptcha.ReCaptchaImpl;
import net.tanesha.recaptcha.ReCaptchaResponse;

## interesse-dos-usuários.adoc

      
Last active December 24, 2015 07:19
Graph-gist challenge: http://www.neo4j.org/learn/graphgist_challenge
    User interests


English version


Motivation


For companies working in online advertising, specifically DSPs and their real-time bidding platforms, it is very important to collect and analyze information about users' behavior and interests as they browse the internet.
In this graph gist I therefore describe a basic approach that can be used to analyze such data, considering a given time period and the products viewed by each user.
Let's suppose that some Breaking Bad characters browsed the internet a few days ago, found some products (chemical elements), and are interested in buying them. Knowing a given user's profile and interests is extremely important when placing a bid in an advertising auction.
So we store these users, products, and view dates so that we can extract this information later.

  
## Graph Gist Sample
#New graph
START root=node(0)
CREATE
(User1 { name:'User1' }),
(User2 { name: 'User2' }),
(User3 { name: 'User3' }),

(Mac { name: 'Mac' }),
(Samsung { name: 'Samsung' }),
(Brastemp { name: 'Brastemp' }),

## 1-users-interests.adoc

      
Last active December 22, 2015 21:19, forked from nawroth/GraphGist-syntax.adoc
Graph-gist challenge: http://www.neo4j.org/learn/graphgist_challenge

              
    User interests


Portuguese version


Motivation


For companies that work in online advertising, specifically DSPs and their real-time bidding platforms, it is very important to collect and analyze information about the behavior and interests of users while they browse the internet.
Therefore, in this graph gist, I describe a basic approach that can be used to analyze such data, considering a given period of time and the products viewed by each user.
Let's suppose that some characters from Breaking Bad browsed the internet a few days ago, found some products (chemical elements), and are thinking about buying them. Knowing the profile and interests of a given user is extremely important when placing a bid in an advertising auction.
So we store these users, products, and view dates so that we can extract this information in the future.

  
## add_ebs_to_ubuntu_ec2
You need to format the EBS volume (block device) with a file system between step 1 and step 2. So the entire process with your sample mount point is:

Create EBS volume.

Attach EBS volume to /dev/sdf (EC2's external name for this particular device number).

Format file system /dev/xvdf (Ubuntu's internal name for this particular device number):

sudo mkfs.ext4 /dev/xvdf
Mount file system (with update to /etc/fstab so it stays mounted on reboot):
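
The mount step described above could look roughly like the following. This is a sketch under stated assumptions: the device `/dev/xvdf` comes from the text, but the mount point `/mnt/data`, the `defaults,nofail` options, and the function names `fstab_entry` and `mount_ebs` are hypothetical and were not part of the original answer.

```shell
#!/bin/sh
# Device name is taken from the text; the mount point is an assumed placeholder.
DEVICE="/dev/xvdf"
MOUNT_POINT="/mnt/data"

# Hypothetical helper: print one /etc/fstab line for an ext4 volume.
# $1 = device, $2 = mount point
# "nofail" lets the instance boot even if the volume is not attached.
fstab_entry() {
  printf '%s %s ext4 defaults,nofail 0 2\n' "$1" "$2"
}

# Hypothetical helper: mount the volume now and persist it across reboots.
# Defined but not called here, because it needs root and an attached volume.
mount_ebs() {
  sudo mkdir -p "$MOUNT_POINT"
  sudo mount "$DEVICE" "$MOUNT_POINT"
  fstab_entry "$DEVICE" "$MOUNT_POINT" | sudo tee -a /etc/fstab >/dev/null
}

# Preview the line that would be appended to /etc/fstab.
fstab_entry "$DEVICE" "$MOUNT_POINT"
```

Printing the fstab line first is a cheap way to review it before `mount_ebs` appends anything to a file that affects boot.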

## fix_java_plugin_macos
sudo mkdir -p /Library/Internet\ Plug-Ins/disabled
sudo mv /Library/Internet\ Plug-Ins/JavaAppletPlugin.plugin /Library/Internet\ Plug-Ins/disabled
sudo ln -sf /System/Library/Java/Support/Deploy.bundle/Contents/Resources/JavaPlugin2_NPAPI.plugin /Library/Internet\ Plug-Ins/JavaAppletPlugin.plugin
sudo ln -sf /System/Library/Frameworks/JavaVM.framework/Commands/javaws /usr/bin/javaws

## git_install.sh
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:git-core/ppa
sudo apt-get update
sudo apt-get install git