cristiano74/chi square python

## chi square python
#Chi square - python


[toc]


##Definizione del processo

1. Generazione di una popolazione a partire da una tupla generica
2. estrazione casuale di un campione di n_elementi dalla pop creata
3. calcolo della frequenze attese e ossevarte in base alla tupla estratta
4. calcolo del chi-quadrato con p_value associato e definizione dei livelli di accuratezza del risulato.

+ State your conclusion in terms of your hypothesis.

+ **If the p value for the calculated chi-square is p > 0.05, accept your hypothesis. 'The deviation is small enough that chance alone accounts for it**.

+ A p value of **0.6**, for example, means that **there is a 60% probability that any deviation from expected is *due to chance only***. This is within the range of acceptable deviation.

+ If the p value for the calculated chi-square is ***p < 0.05*, REJECT your hypothesis, and conclude that some factor other than chance** is operating for the deviation to be so great.

+ For example, a p value of 0.01 means that there is only a 1% chance that this deviation is due to chance alone. Therefore, other factors must be involved.

+ The chi-square test will be used to test for the "goodness to fit" between observed and expected data from several laboratory investigations in this lab manual.

LINK (http://www2.lv.psu.edu/jxm57/irp/chisquar.html)


##Esempio

 **La tupla generica estratta è:**

`    px [0.0, 0.05, 0.15, 0.1, 0.35, 0.15, 0.2]`

Da una popolazione di 100,000 elementi , estraggo 1000 elementi:
Quindi la freq. **attesa** assoluta (FA) è la seguente (totale = 1000 elementi):

`fa_pre [50, 150, 100, 350, 150, 200]`

*Notare che il **valore 0** è stato eliminato dal vettore, per [bug con il sistema di calcolo python][6]


Quindi la freq. **osservata** assoluta (FO) è :

`    fo_pre [53, 155, 96, 332, 162, 202]`

Il confronto fra FA e FO:

`[ 50 150 100 350 150 200] 	--> FA`
`[ 53 155  96 332 162 202]  --> FO`


Calcolo del pvalue: **`Chi square test: 0.789628937062`**

A ciscun valore di p_value corrisponde un livello di accuratezza, come nella tabella sotto:

Dalla elaborazione della [v1.1](#v11), appare come risultato quanto segue:

###Risultato ottenuto con [CSTP](#cstp) v1.1

BASH line$ in linux: `python test4.py 500 100000 1000 10  `


` ('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 149, ' Med:', 201, ' Low:', 150, 'nullo', 0)
`


name					|	label
------------------------|
'pop=', 100000,			|   100k casi considerati come  pop
'campione=', 1000    	|   1000 casi estratti (sample)
'Tuple estratte=' 500,  |   500 combinazioni di freschezza su 230k totali utilizzate nel processo
' High:', 149,   		| 149 su 500 combinazioni hanno dato p value alto (vedere [tabella](#ptable))
' Med:', 201,   		| 201 su 500 combinazioni hanno dato p value medio
' Low:', 150,   		| 150 su 500 combinazioni hanno dato p value medio
'nullo', 0  			| da trascurare - solo per controllo

Il processo è ripetuto n volte, con la stessa numerosità della poopolazione e del campione, ma con differenti tuple estratte.
I risultati sono memorizzati nel file [**SAMPLERESULTS_CHISQUARE.txt**](https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/SAMPLERESULTS_CHISQUARE.txt?edit]).

Un estratto del file :
```
('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 139, ' Med:', 221, ' Low:', 140, 'nullo', 0)
('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 163, ' Med:', 177, ' Low:', 160, 'nullo', 0)
('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 170, ' Med:', 179, ' Low:', 151, 'nullo', 0)
('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 142, ' Med:', 209, ' Low:', 149, 'nullo', 0)
('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 146, ' Med:', 195, ' Low:', 159, 'nullo', 0)
('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 146, ' Med:', 204, ' Low:', 150, 'nullo', 0)
```

###<a id="ptable">Tabella di referimento per il p_value</a>

p value   | level
--------- | -----
<0.30     | low accuracy
0.30-0.70 | medium accuracy
\>0.70	  | High accuracy


----------
## CODE
### Create chi square test process [CSTP] <a id="cstp"></a>- test4.py
> note: Chi Square Test Process = CSTP

**`CODICE per la generazione del pvalue CHI SQUARE`**
[source code ] [7]


####<a id="v11">V1.1</a>
> - introdotto il ciclo sulle ripetizioni del processo, param 4
> 	- la lista delle tuple viene ri-estratta ogni ciclo
> - save dei risultati in: 	[**SAMPLERESULTS_CHISQUARE.txt**](https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/SAMPLERESULTS_CHISQUARE.txt?edit])
> - dipende dalle tuple create con [Cretate tuples v1.0](#createtuplesv1.0)

```
############################################
#   USAGE:
#   python test4.py 100 100000 5000
#   param1= numero estrazioni casuali di tuple
#   param2= num pop
#   param3= num sample
#   param4= num di ripetizioni del processo
############################################
import sys
import random
import collections
import numpy
#QUI SI APRE IL FILE CON LE COMBINAZIONI
#################################
lista=[]
import pickle
#f = file("test.txt", "r")
f = file("outfile.txt", "r")
lista=pickle.load(f)
#############################
#from scipy.stats import mannwhitneyu
#from scipy.stats import chisquare
#from scipy.stats import kruskal
from scipy.stats import rv_discrete

import scipy.stats.mstats as mst

tuple_tot = 0
acc=0
high = 0
low = 0
med = 0
nullo = 0
p_value=0
# a= random.sample(range(100), 100)
n_extract = int(sys.argv[1]) #, indare numero di estrazioni casuali di tuple dalle permutazioni

fa = []
fa_pre = []
fo_pre = []
fo =[]
tot=int(sys.argv[2])
Nsample = int(sys.argv[3])

for i in xrange(int(sys.argv[4])):
    lista = random.sample(lista,n_extract)
    acc=0
    high = 0
    low = 0
    med = 0
    nullo = 0
    p_value=0
    for e in lista:

        print e

        #px=[float(e[0]/100),float(e[1]/100),float(e[2]/100),float(e[3]/100),float(e[4]/100),float(e[5]/100),float(e[6]/100)]  # 7 rep]

        e1= ((e[0]/float(100)))
        e2= ((e[1]/float(100)))
        e3= ((e[2]/float(100)))
        e4= ((e[3]/float(100)))
        e5= ((e[4]/float(100)))
        e6= ((e[5]/float(100)))
        e7= ((e[6]/float(100)))


        px=[e1,e2,e3,e4,e5,e6,e7]  # 7 rep]
        print "px", px
        x =[1,2,3,4,5,6,7]
        fa_pre=([int(e1*Nsample),int(e2*Nsample),int(e3*Nsample),int(e4*Nsample),int(e5*Nsample),int(e6*Nsample),int(e7*Nsample)])
        fa_pre[:] = (value for value in fa_pre if value != 0.0)
        print "fa_pre",fa_pre
        fa= numpy.array(fa_pre)
        #fa = numpy.array([int(e1*Nsample),int(e2*Nsample),int(e3*Nsample),int(e4*Nsample),int(e5*Nsample),int(e6*Nsample),int(e7*Nsample)])

        pop = rv_discrete(values=(x, px)).rvs(size=tot)
        random.shuffle(pop)
        sample1 = random.sample(pop,Nsample)
        counter_sample=collections.Counter(sample1)
        fo_pre = [counter_sample[1],counter_sample[2],counter_sample[3],counter_sample[4],counter_sample[5],counter_sample[6],counter_sample[7]]
        fo_pre[:] = (value for value in fo_pre if value != 0.0)
        print "fo_pre",fo_pre
        fo = numpy.array(fo_pre)
        #fo = numpy.array([counter_sample[1],counter_sample[2],counter_sample[3],counter_sample[4],counter_sample[5],counter_sample[6],counter_sample[7]])
        print fa
        print fo
        #print mst.chisquare(fo, fa)

        #print(samples)
        try:
            u, p_value = mst.chisquare(fo, fa)
            #u, p_value  =   2*mannwhitneyu(pop, sample1)
            #u, p_value =   kruskal(pop, sample1)
            if p_value > 0.70:
                acc += 1
                test = "HIGH precision"
                high += 1
            else:
                if p_value < 0.30:
                    test = "Low precision"
                    low += 1
                else:
                    test = "Medium precision"
                    med += 1
        except:
                nullo +=1
                p_value = 0
                pass
        print "Chi square test:", p_value
    final = "<<<Risultato>>>",'pop=', tot, "; campione=", Nsample ,"Estrazioni=",n_extract, "High:",high," Med:",med," Low:",low, "nullo", nullo
    print final
    log = open("SAMPLERESULTS_CHISQUARE.txt","a")
    print >> log,final
    log.close()

```

####v 1.0


    ############################################
    #   USAGE:
    #   python test4.py 100 100000 5000
    #   param1= replicazioni test, casuali tuple
    #   param2= num pop
    #   param3= num sample
    ############################################
    import sys
    import random
    import collections
    import numpy

	#QUI SI APRE IL FILE CON LE COMBINAZIONI
    #################################
    lista=[]
    import pickle
    #f = file("test.txt", "r")
    f = file("outfile.txt", "r")
    lista=pickle.load(f)
    #############################

    #from scipy.stats import mannwhitneyu
    #from scipy.stats import chisquare
    #from scipy.stats import kruskal
    from scipy.stats import rv_discrete

    import scipy.stats.mstats as mst

    tuple_tot = 0
    acc=0
    high = 0
    low = 0
    med = 0
    nullo = 0
    p_value=0
    # a= random.sample(range(100), 100)
    n_extract = int(sys.argv[1]) #, indare numero di estrazioni casuali di tuple dalle permutazioni
    lista = random.sample(lista,n_extract)
    fa = []
    fa_pre = []
    fo_pre = []
    fo =[]
    for e in lista:
        tot=int(sys.argv[2])
        Nsample = int(sys.argv[3])
        print e

        #px=[float(e[0]/100),float(e[1]/100),float(e[2]/100),float(e[3]/100),float(e[4]/100),float(e[5]/100),float(e[6]/100)]  # 7 rep]

        e1= ((e[0]/float(100)))
        e2= ((e[1]/float(100)))
        e3= ((e[2]/float(100)))
        e4= ((e[3]/float(100)))
        e5= ((e[4]/float(100)))
        e6= ((e[5]/float(100)))
        e7= ((e[6]/float(100)))


        px=[e1,e2,e3,e4,e5,e6,e7]  # 7 rep]
        print "px", px
        x =[1,2,3,4,5,6,7]
        fa_pre=([int(e1*Nsample),int(e2*Nsample),int(e3*Nsample),int(e4*Nsample),int(e5*Nsample),int(e6*Nsample),int(e7*Nsample)])
        fa_pre[:] = (value for value in fa_pre if value != 0.0)
        print "fa_pre",fa_pre
        fa= numpy.array(fa_pre)
        #fa = numpy.array([int(e1*Nsample),int(e2*Nsample),int(e3*Nsample),int(e4*Nsample),int(e5*Nsample),int(e6*Nsample),int(e7*Nsample)])

        pop = rv_discrete(values=(x, px)).rvs(size=tot)
        random.shuffle(pop)
        sample1 = random.sample(pop,Nsample)
        counter_sample=collections.Counter(sample1)
        fo_pre = [counter_sample[1],counter_sample[2],counter_sample[3],counter_sample[4],counter_sample[5],counter_sample[6],counter_sample[7]]
        fo_pre[:] = (value for value in fo_pre if value != 0.0)
        print "fo_pre",fo_pre
        fo = numpy.array(fo_pre)
        #fo = numpy.array([counter_sample[1],counter_sample[2],counter_sample[3],counter_sample[4],counter_sample[5],counter_sample[6],counter_sample[7]])
        print fa
        print fo
        #print mst.chisquare(fo, fa)

        #print(samples)
        try:
            u, p_value = mst.chisquare(fo, fa)
            #u, p_value  =   2*mannwhitneyu(pop, sample1)
            #u, p_value =   kruskal(pop, sample1)
            if p_value > 0.70:
                acc += 1
                test = "HIGH precision"
                high += 1
            else:
                if p_value < 0.30:
                    test = "Low precision"
                    low += 1
                else:
                    test = "Medium precision"
                    med += 1
        except:
                nullo +=1
                p_value = 0
                pass
        print "Chi square test:", p_value
    final = "<<<Risultato>>>",'pop=', tot, "; campione=", Nsample ,"Estrazioni=",n_extract, "High:",high," Med:",med," Low:",low, "nullo", nullo
    print final

###<a id="createtuples">Create tuples-test1.py</a>
> file per la generazione delle tuple con permutazione , con i seguenti valori di percentuale:

`v=[0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100];`

>[Source code for test1.py][8]

Il risultato delle 230230 tuple è memorizzato nel file OUTFILE.TXT
####<a id="createtuplesv1.0">V1.0</a>

> esempio BASH : **python filename.py 1**
> il valore 1 indica che il testmode=1, cioe' falso.
```
##################################################
#QUESTO FILE GENERA LE TUPLE DA UTILIZZARE NEL FILE
#DA ESEGUIRE UNA SOLA VOLTA
#IL FILE : OUTFILE.TXT
##################################################

import itertools;
v=[0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100];

import sys

# esempio BASH : python filename.py 1


#testmode=0
testmode=int(sys.argv[1])

target=100      #somma da raggiungere - test

if testmode==1:
    n_rep=3

else:
    n_rep=7

############### CREAZIONE TUPLA
combos = [a for a in itertools.product(v,repeat=n_rep) if sum(a)==target]
#############################

lista = []
pop = []

############### AGGIUNGE TUPLA ALLA LISTA
for e in combos:
        # print e     #indica la tupla
        lista.append(e)
#########################################

############### SCRIVE LISTA SU FILE OUTFILE.TXT
import pickle
f = file("outfile.txt", "w")
pickle.dump(lista, f)
#print lista
##################################

```
##Notes

####Chi square Goodness of fit: - python
1. [Articolo di un modulo python per chi square GOF][4]
2. [**How to Test Your Discrete Distribution- Minitab**] [5]
3. [Difference between scipy.stats.mstats.chisquare and
scipy.stats.mstats in Python][6]
1. [p VALUE - INTERPRETATION][9]
1. [How to generate numbers based on an arbitrary **discrete distribution**?][2]
2.	[ANOVA, Nonparametric Tests, and More Added to SVS via New **Python** Capabilities][3]

####PYTHON CODE. SUM and permutation
1. http://mathoverflow.net/questions/9477/uniquely-generate-all-permutations-of-three-digits-that-sum-to-a-particular-value
2. http://stackoverflow.com/questions/4632322/finding-all-possible-combinations-of-numbers-to-reach-a-given-sum
4. Utilizzo *LIST* in python: http://effbot.org/zone/python-list.htm
5. [Sample size for pilot studies](http://www.statstodo.com/SSiz2Props_Pgm.php)


####Algorithm
1. http://web.eecs.utk.edu/~vose/Publications/random.pdf
2. Bootstrap Python : https://github.com/cgevans/scikits-bootstrap
3. [wilcoxon](http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wicoxon.html#scipy.stats.wilcoxon)
5. [mann withney](http://www.alglib.net/hypothesistesting/mannwhitneyu.php)


----------


[1]: https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/6_example.py?edit
[2]: http://stats.stackexchange.com/questions/26858/how-to-generate-numbers-based-on-an-arbitrary-discrete-distribution
[3]:[http://blog.goldenhelix.com/?p=595]
[4]: (http://statsmodels.sourceforge.net/stable/stats.html#goodness-of-fit-tests-and-measures)
[5]: http://blog.minitab.com/blog/adventures-in-statistics/how-to-test-your-discrete-distribution
[6]:(http://stackoverflow.com/questions/13913572/difference-between-scipy-stats-mstats-chisquare-and-scipy-stats-mstats-in-python)
[7]:https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/test4.py?edit
[8]:https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/test1.py?edit
[9]: (http://www2.lv.psu.edu/jxm57/irp/chisquar.html)
	#Chi square - python



	[toc]



	##Definizione del processo

	1. Generazione di una popolazione a partire da una tupla generica
	2. estrazione casuale di un campione di n_elementi dalla pop creata
	3. calcolo della frequenze attese e ossevarte in base alla tupla estratta
	4. calcolo del chi-quadrato con p_value associato e definizione dei livelli di accuratezza del risulato.

	+ State your conclusion in terms of your hypothesis.

	+ If the p value for the calculated chi-square is p > 0.05, accept your hypothesis. 'The deviation is small enough that chance alone accounts for it.

	+ A p value of 0.6, for example, means that *there is a 60% probability that any deviation from expected is due to chance only***. This is within the range of acceptable deviation.

	+ If the p value for the calculated chi-square is **p < 0.05, REJECT your hypothesis, and conclude that some factor other than chance** is operating for the deviation to be so great.

	+ For example, a p value of 0.01 means that there is only a 1% chance that this deviation is due to chance alone. Therefore, other factors must be involved.

	+ The chi-square test will be used to test for the "goodness to fit" between observed and expected data from several laboratory investigations in this lab manual.

	LINK (http://www2.lv.psu.edu/jxm57/irp/chisquar.html)


	##Esempio

	La tupla generica estratta è:

	` px [0.0, 0.05, 0.15, 0.1, 0.35, 0.15, 0.2]`

	Da una popolazione di 100,000 elementi , estraggo 1000 elementi:
	Quindi la freq. attesa assoluta (FA) è la seguente (totale = 1000 elementi):

	`fa_pre [50, 150, 100, 350, 150, 200]`

	Notare che il valore 0* è stato eliminato dal vettore, per [bug con il sistema di calcolo python][6]


	Quindi la freq. osservata assoluta (FO) è :

	` fo_pre [53, 155, 96, 332, 162, 202]`

	Il confronto fra FA e FO:

	`[ 50 150 100 350 150 200] --> FA`
	`[ 53 155 96 332 162 202] --> FO`


	Calcolo del pvalue: `Chi square test: 0.789628937062`

	A ciscun valore di p_value corrisponde un livello di accuratezza, come nella tabella sotto:

	Dalla elaborazione della [v1.1](#v11), appare come risultato quanto segue:

	###Risultato ottenuto con [CSTP](#cstp) v1.1

	BASH line$ in linux: `python test4.py 500 100000 1000 10 `


	` ('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 149, ' Med:', 201, ' Low:', 150, 'nullo', 0)
	`


	name \| label
	------------------------\|
	'pop=', 100000, \| 100k casi considerati come pop
	'campione=', 1000 \| 1000 casi estratti (sample)
	'Tuple estratte=' 500, \| 500 combinazioni di freschezza su 230k totali utilizzate nel processo
	' High:', 149, \| 149 su 500 combinazioni hanno dato p value alto (vedere [tabella](#ptable))
	' Med:', 201, \| 201 su 500 combinazioni hanno dato p value medio
	' Low:', 150, \| 150 su 500 combinazioni hanno dato p value medio
	'nullo', 0 \| da trascurare - solo per controllo

	Il processo è ripetuto n volte, con la stessa numerosità della poopolazione e del campione, ma con differenti tuple estratte.
	I risultati sono memorizzati nel file [SAMPLERESULTS_CHISQUARE.txt](https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/SAMPLERESULTS_CHISQUARE.txt?edit]).

	Un estratto del file :
	```
	('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 139, ' Med:', 221, ' Low:', 140, 'nullo', 0)
	('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 163, ' Med:', 177, ' Low:', 160, 'nullo', 0)
	('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 170, ' Med:', 179, ' Low:', 151, 'nullo', 0)
	('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 142, ' Med:', 209, ' Low:', 149, 'nullo', 0)
	('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 146, ' Med:', 195, ' Low:', 159, 'nullo', 0)
	('<<<Risultato>>>', 'pop=', 100000, '; campione=', 1000, 'Tuple estratte=', 500, 'High:', 146, ' Med:', 204, ' Low:', 150, 'nullo', 0)
	```

	###<a id="ptable">Tabella di referimento per il p_value</a>

	p value \| level
	--------- \| -----
	<0.30 \| low accuracy
	0.30-0.70 \| medium accuracy
	\>0.70 \| High accuracy


	----------
	## CODE
	### Create chi square test process [CSTP] <a id="cstp"></a>- test4.py
	> note: Chi Square Test Process = CSTP

	`CODICE per la generazione del pvalue CHI SQUARE`
	[source code ] [7]



	####<a id="v11">V1.1</a>
	> - introdotto il ciclo sulle ripetizioni del processo, param 4
	> - la lista delle tuple viene ri-estratta ogni ciclo
	> - save dei risultati in: [SAMPLERESULTS_CHISQUARE.txt](https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/SAMPLERESULTS_CHISQUARE.txt?edit])
	> - dipende dalle tuple create con [Cretate tuples v1.0](#createtuplesv1.0)

	```
	############################################
	# USAGE:
	# python test4.py 100 100000 5000
	# param1= numero estrazioni casuali di tuple
	# param2= num pop
	# param3= num sample
	# param4= num di ripetizioni del processo
	############################################
	import sys
	import random
	import collections
	import numpy
	#QUI SI APRE IL FILE CON LE COMBINAZIONI
	#################################
	lista=[]
	import pickle
	#f = file("test.txt", "r")
	f = file("outfile.txt", "r")
	lista=pickle.load(f)
	#############################
	#from scipy.stats import mannwhitneyu
	#from scipy.stats import chisquare
	#from scipy.stats import kruskal
	from scipy.stats import rv_discrete

	import scipy.stats.mstats as mst

	tuple_tot = 0
	acc=0
	high = 0
	low = 0
	med = 0
	nullo = 0
	p_value=0
	# a= random.sample(range(100), 100)
	n_extract = int(sys.argv[1]) #, indare numero di estrazioni casuali di tuple dalle permutazioni

	fa = []
	fa_pre = []
	fo_pre = []
	fo =[]
	tot=int(sys.argv[2])
	Nsample = int(sys.argv[3])

	for i in xrange(int(sys.argv[4])):
	lista = random.sample(lista,n_extract)
	acc=0
	high = 0
	low = 0
	med = 0
	nullo = 0
	p_value=0
	for e in lista:

	print e

	#px=[float(e[0]/100),float(e[1]/100),float(e[2]/100),float(e[3]/100),float(e[4]/100),float(e[5]/100),float(e[6]/100)] # 7 rep]

	e1= ((e[0]/float(100)))
	e2= ((e[1]/float(100)))
	e3= ((e[2]/float(100)))
	e4= ((e[3]/float(100)))
	e5= ((e[4]/float(100)))
	e6= ((e[5]/float(100)))
	e7= ((e[6]/float(100)))


	px=[e1,e2,e3,e4,e5,e6,e7] # 7 rep]
	print "px", px
	x =[1,2,3,4,5,6,7]
	fa_pre=([int(e1Nsample),int(e2Nsample),int(e3Nsample),int(e4Nsample),int(e5Nsample),int(e6Nsample),int(e7*Nsample)])
	fa_pre[:] = (value for value in fa_pre if value != 0.0)
	print "fa_pre",fa_pre
	fa= numpy.array(fa_pre)
	#fa = numpy.array([int(e1Nsample),int(e2Nsample),int(e3Nsample),int(e4Nsample),int(e5Nsample),int(e6Nsample),int(e7*Nsample)])

	pop = rv_discrete(values=(x, px)).rvs(size=tot)
	random.shuffle(pop)
	sample1 = random.sample(pop,Nsample)
	counter_sample=collections.Counter(sample1)
	fo_pre = [counter_sample[1],counter_sample[2],counter_sample[3],counter_sample[4],counter_sample[5],counter_sample[6],counter_sample[7]]
	fo_pre[:] = (value for value in fo_pre if value != 0.0)
	print "fo_pre",fo_pre
	fo = numpy.array(fo_pre)
	#fo = numpy.array([counter_sample[1],counter_sample[2],counter_sample[3],counter_sample[4],counter_sample[5],counter_sample[6],counter_sample[7]])
	print fa
	print fo
	#print mst.chisquare(fo, fa)

	#print(samples)
	try:
	u, p_value = mst.chisquare(fo, fa)
	#u, p_value = 2*mannwhitneyu(pop, sample1)
	#u, p_value = kruskal(pop, sample1)
	if p_value > 0.70:
	acc += 1
	test = "HIGH precision"
	high += 1
	else:
	if p_value < 0.30:
	test = "Low precision"
	low += 1
	else:
	test = "Medium precision"
	med += 1
	except:
	nullo +=1
	p_value = 0
	pass
	print "Chi square test:", p_value
	final = "<<<Risultato>>>",'pop=', tot, "; campione=", Nsample ,"Estrazioni=",n_extract, "High:",high," Med:",med," Low:",low, "nullo", nullo
	print final
	log = open("SAMPLERESULTS_CHISQUARE.txt","a")
	print >> log,final
	log.close()

	```

	####v 1.0



	############################################
	# USAGE:
	# python test4.py 100 100000 5000
	# param1= replicazioni test, casuali tuple
	# param2= num pop
	# param3= num sample
	############################################
	import sys
	import random
	import collections
	import numpy

	#QUI SI APRE IL FILE CON LE COMBINAZIONI
	#################################
	lista=[]
	import pickle
	#f = file("test.txt", "r")
	f = file("outfile.txt", "r")
	lista=pickle.load(f)
	#############################

	#from scipy.stats import mannwhitneyu
	#from scipy.stats import chisquare
	#from scipy.stats import kruskal
	from scipy.stats import rv_discrete

	import scipy.stats.mstats as mst

	tuple_tot = 0
	acc=0
	high = 0
	low = 0
	med = 0
	nullo = 0
	p_value=0
	# a= random.sample(range(100), 100)
	n_extract = int(sys.argv[1]) #, indare numero di estrazioni casuali di tuple dalle permutazioni
	lista = random.sample(lista,n_extract)
	fa = []
	fa_pre = []
	fo_pre = []
	fo =[]
	for e in lista:
	tot=int(sys.argv[2])
	Nsample = int(sys.argv[3])
	print e

	#px=[float(e[0]/100),float(e[1]/100),float(e[2]/100),float(e[3]/100),float(e[4]/100),float(e[5]/100),float(e[6]/100)] # 7 rep]

	e1= ((e[0]/float(100)))
	e2= ((e[1]/float(100)))
	e3= ((e[2]/float(100)))
	e4= ((e[3]/float(100)))
	e5= ((e[4]/float(100)))
	e6= ((e[5]/float(100)))
	e7= ((e[6]/float(100)))


	px=[e1,e2,e3,e4,e5,e6,e7] # 7 rep]
	print "px", px
	x =[1,2,3,4,5,6,7]
	fa_pre=([int(e1Nsample),int(e2Nsample),int(e3Nsample),int(e4Nsample),int(e5Nsample),int(e6Nsample),int(e7*Nsample)])
	fa_pre[:] = (value for value in fa_pre if value != 0.0)
	print "fa_pre",fa_pre
	fa= numpy.array(fa_pre)
	#fa = numpy.array([int(e1Nsample),int(e2Nsample),int(e3Nsample),int(e4Nsample),int(e5Nsample),int(e6Nsample),int(e7*Nsample)])

	pop = rv_discrete(values=(x, px)).rvs(size=tot)
	random.shuffle(pop)
	sample1 = random.sample(pop,Nsample)
	counter_sample=collections.Counter(sample1)
	fo_pre = [counter_sample[1],counter_sample[2],counter_sample[3],counter_sample[4],counter_sample[5],counter_sample[6],counter_sample[7]]
	fo_pre[:] = (value for value in fo_pre if value != 0.0)
	print "fo_pre",fo_pre
	fo = numpy.array(fo_pre)
	#fo = numpy.array([counter_sample[1],counter_sample[2],counter_sample[3],counter_sample[4],counter_sample[5],counter_sample[6],counter_sample[7]])
	print fa
	print fo
	#print mst.chisquare(fo, fa)

	#print(samples)
	try:
	u, p_value = mst.chisquare(fo, fa)
	#u, p_value = 2*mannwhitneyu(pop, sample1)
	#u, p_value = kruskal(pop, sample1)
	if p_value > 0.70:
	acc += 1
	test = "HIGH precision"
	high += 1
	else:
	if p_value < 0.30:
	test = "Low precision"
	low += 1
	else:
	test = "Medium precision"
	med += 1
	except:
	nullo +=1
	p_value = 0
	pass
	print "Chi square test:", p_value
	final = "<<<Risultato>>>",'pop=', tot, "; campione=", Nsample ,"Estrazioni=",n_extract, "High:",high," Med:",med," Low:",low, "nullo", nullo
	print final

	###<a id="createtuples">Create tuples-test1.py</a>
	> file per la generazione delle tuple con permutazione , con i seguenti valori di percentuale:

	`v=[0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100];`

	>[Source code for test1.py][8]

	Il risultato delle 230230 tuple è memorizzato nel file OUTFILE.TXT
	####<a id="createtuplesv1.0">V1.0</a>

	> esempio BASH : python filename.py 1
	> il valore 1 indica che il testmode=1, cioe' falso.
	```
	##################################################
	#QUESTO FILE GENERA LE TUPLE DA UTILIZZARE NEL FILE
	#DA ESEGUIRE UNA SOLA VOLTA
	#IL FILE : OUTFILE.TXT
	##################################################

	import itertools;
	v=[0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100];

	import sys

	# esempio BASH : python filename.py 1



	#testmode=0
	testmode=int(sys.argv[1])

	target=100 #somma da raggiungere - test

	if testmode==1:
	n_rep=3

	else:
	n_rep=7

	############### CREAZIONE TUPLA
	combos = [a for a in itertools.product(v,repeat=n_rep) if sum(a)==target]
	#############################

	lista = []
	pop = []

	############### AGGIUNGE TUPLA ALLA LISTA
	for e in combos:
	# print e #indica la tupla
	lista.append(e)
	#########################################

	############### SCRIVE LISTA SU FILE OUTFILE.TXT
	import pickle
	f = file("outfile.txt", "w")
	pickle.dump(lista, f)
	#print lista
	##################################

	```
	##Notes

	####Chi square Goodness of fit: - python
	1. [Articolo di un modulo python per chi square GOF][4]
	2. [How to Test Your Discrete Distribution- Minitab] [5]
	3. [Difference between scipy.stats.mstats.chisquare and
	scipy.stats.mstats in Python][6]
	1. [p VALUE - INTERPRETATION][9]
	1. [How to generate numbers based on an arbitrary discrete distribution?][2]
	2. [ANOVA, Nonparametric Tests, and More Added to SVS via New Python Capabilities][3]

	####PYTHON CODE. SUM and permutation
	1. http://mathoverflow.net/questions/9477/uniquely-generate-all-permutations-of-three-digits-that-sum-to-a-particular-value
	2. http://stackoverflow.com/questions/4632322/finding-all-possible-combinations-of-numbers-to-reach-a-given-sum
	4. Utilizzo LIST in python: http://effbot.org/zone/python-list.htm
	5. [Sample size for pilot studies](http://www.statstodo.com/SSiz2Props_Pgm.php)


	####Algorithm
	1. http://web.eecs.utk.edu/~vose/Publications/random.pdf
	2. Bootstrap Python : https://github.com/cgevans/scikits-bootstrap
	3. [wilcoxon](http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wicoxon.html#scipy.stats.wilcoxon)
	5. [mann withney](http://www.alglib.net/hypothesistesting/mannwhitneyu.php)


	----------


	[1]: https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/6_example.py?edit
	[2]: http://stats.stackexchange.com/questions/26858/how-to-generate-numbers-based-on-an-arbitrary-discrete-distribution
	[3]:[http://blog.goldenhelix.com/?p=595]
	[4]: (http://statsmodels.sourceforge.net/stable/stats.html#goodness-of-fit-tests-and-measures)
	[5]: http://blog.minitab.com/blog/adventures-in-statistics/how-to-test-your-discrete-distribution
	[6]:(http://stackoverflow.com/questions/13913572/difference-between-scipy-stats-mstats-chisquare-and-scipy-stats-mstats-in-python)
	[7]:https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/test4.py?edit
	[8]:https://www.pythonanywhere.com/user/cristiano74/files/home/cristiano74/mysite/test1.py?edit
	[9]: (http://www2.lv.psu.edu/jxm57/irp/chisquar.html)