ibartomeus/pref

## pref
---
title: "Preferring a preference index"
author: "I. Bartomeus"
output: html_document
---

I've been reading about preference indexes lately, speciphically for characterizing pollinator preferences for plants, so here is what I learnt. Preference is defined as using an item (e.g. plant) more than expected given the item abundance.

First I like to use a quantitative framework (you can use ranks-based indices as in Williams et al 2011, which has nice propiertiest too). The simpliest quantitative index is the forage ratio:

```{r}
#obs: observed frequency of visits to item 1
#exp: expected frequency of visits to item 1 (i.e. resource abundance/availability)
fr <- function(obs, exp){
    p_obs <- obs/sum(obs)
    p_exp <- exp/sum(exp)
    out <- p_obs/p_exp
    out
}
```

But it ranges from 0 to infinity, which makes it hard to interpret.

Second, the most used index is "Electivity" (Ivlev 1961), which is nice because is bounded between -1 and 1. "Jacobs" index is similar, but correcting for item depletion, which do not apply to my question, but see below for completness.

```{r}
electicity <- function(obs, exp){
    p_obs <- obs/sum(obs)
    p_exp <- exp/sum(exp)
    out <- (p_obs-p_exp)/(p_obs+p_exp)
}

jacobs <- function(obs, exp){
    p_obs <- obs/sum(obs)
    p_exp <- exp/sum(exp)
    out <- (p_obs-p_exp)/(p_obs+p_exp-2*p_obs*p_exp)
    out
}
```

However, those indexes do not tell you if pollintors have general preferences over several items, and which of this items are preferred or not. To solve that we can use a simple chi test. Chi statistics can be used to assess if there is an overall preference. The Chi test Pearson residuals (obs-exp/√exp) estimate the magnitude of preference or avoidance for a given item based on deviation from expected values. Significance can be assessed by building Bonferroni confidence intervals (see Neu et al. 1974 and Beyers and Steinhorst 1984). That way you know which items are preferred or avoided.

```{r}
chi_pref <- function(obs, exp, alpha = 0.05){
    chi <- chisq.test(obs, p = exp, rescale.p = TRUE)
    print(chi) #tells you if there is an overall preference. (sig = pref)
    res <- chi$residuals
    #res <- (obs-exp)/sqrt(exp) #hand calculation, same result.
    #calculate bonferoni Z-statistic for each plant.
    alpha <- alpha
    k <- length(obs)
    n <- sum(obs)
    p_obs <- obs/n
    ak <- alpha/(2*k)
    Zak <- abs(qnorm(ak))
    low_interval <- p_obs - (Zak*(sqrt(p_obs*(1-p_obs)/n)))
    upper_interval <- p_obs + (Zak*(sqrt(p_obs*(1-p_obs)/n)))
    p_exp <- exp/sum(exp)
    sig <- ifelse(p_exp >= low_interval & p_exp <= upper_interval, "ns", "sig")
    plot(c(0,k+1), c(min(low_interval),max(upper_interval)), type = "n",
         ylab = "Preference", xlab = "items", las = 1)
    arrows(x0 = c(1:k), y0 = low_interval, x1 = c(1:k), y1 = upper_interval, code = 3
           ,angle = 90)
    points(p_exp, col = "red")
    out <- data.frame(chi_test_p = rep(chi$p.value, length(res)),
                      chi_residuals = res, sig = sig)
    out
}
```

And we wrap up all indexes...

```{r}
#wrap up for all indexes
preference <- function(obs, exp, alpha = 0.05){
    f <- fr(obs, exp)
    e <- electicity(obs, exp)
    #j <- jacobs(obs, exp)
    c <- chi_pref(obs, exp, alpha = alpha)
    out <- data.frame(exp = exp, obs = obs, fr = f, electicity = e, c)
    out
}
```

It works well for preferences among items with similar availability, and when all are used to some extent. The plot shows expected values in red, and the observed confidence interval in black. If is not overlapping the expected, indicates a significant preference.

```{r}
x <- preference(obs = c(25, 22,30,40), exp = c(40,12,12,53))
x
#pairs(x[,c(-5,-7)])
```

We can see that indices are correlated by simulating lots of random values.

```{r}
x <- preference(obs = sample(c(0:100),100), exp = sample(c(0:100),100))
#x
pairs(x[,c(-5,-7)]) #more or less same patern, across indexes.
```

But the indexes do not behave well in two situations. First, when an item is not used, it is significanly un-preferred regardless of its abundance. I don't like that, because a very rare plant is expected to get 0 visits from a biological point of view. This is an issue when building bonferroni intervals around 0, which are by definition 0, and any value different from 0 appears as significant.


```{r}
x <- preference(obs = c(0,25,200), exp = c(0.00003,50,100))
x
```

The second issue is that if you have noise due to sampling (as always), rare items are more likely to be wrongly assessed than common items.

```{r}
#assume no preferences
x <- preference(obs = c(5,50,100), exp = c(5,50,100))
x
#now assume some sampling error of +-4 observations
x <- preference(obs = c(1,54,96), exp = c(5,50,100))
x
```

As a conclusion, If you have all items represented (not highly uneven distributions) or all used to some extent (not 0 visits) this is a great tool.

Happy to know better options in the comments, if available.

*Refs:*
-Williams et al 2011. Basic and Applied Ecology 12, 332-341.
-Neu, C.W., C.R. Byers, and J.M. Peek. 1974. A technique for analysis of utilization‐
availability data. Journal of Wildlife Management 38:541‐545.
-Beyers, C.R., and R.K. Steinhorst. 1984. Clarification of a technique for analysis of
utilization‐availability data. Journal of Wildlife Management 48:1050‐1053.
-Ivlev 1961 Experimental ecology of the feeding of fishes.yale University Press. CN, USA.
	---
	title: "Preferring a preference index"
	author: "I. Bartomeus"
	output: html_document
	---

	I've been reading about preference indexes lately, speciphically for characterizing pollinator preferences for plants, so here is what I learnt. Preference is defined as using an item (e.g. plant) more than expected given the item abundance.

	First I like to use a quantitative framework (you can use ranks-based indices as in Williams et al 2011, which has nice propiertiest too). The simpliest quantitative index is the forage ratio:

	```{r}
	#obs: observed frequency of visits to item 1
	#exp: expected frequency of visits to item 1 (i.e. resource abundance/availability)
	fr <- function(obs, exp){
	p_obs <- obs/sum(obs)
	p_exp <- exp/sum(exp)
	out <- p_obs/p_exp
	out
	}
	```

	But it ranges from 0 to infinity, which makes it hard to interpret.

	Second, the most used index is "Electivity" (Ivlev 1961), which is nice because is bounded between -1 and 1. "Jacobs" index is similar, but correcting for item depletion, which do not apply to my question, but see below for completness.

	```{r}
	electicity <- function(obs, exp){
	p_obs <- obs/sum(obs)
	p_exp <- exp/sum(exp)
	out <- (p_obs-p_exp)/(p_obs+p_exp)
	}

	jacobs <- function(obs, exp){
	p_obs <- obs/sum(obs)
	p_exp <- exp/sum(exp)
	out <- (p_obs-p_exp)/(p_obs+p_exp-2p_obsp_exp)
	out
	}
	```

	However, those indexes do not tell you if pollintors have general preferences over several items, and which of this items are preferred or not. To solve that we can use a simple chi test. Chi statistics can be used to assess if there is an overall preference. The Chi test Pearson residuals (obs-exp/√exp) estimate the magnitude of preference or avoidance for a given item based on deviation from expected values. Significance can be assessed by building Bonferroni confidence intervals (see Neu et al. 1974 and Beyers and Steinhorst 1984). That way you know which items are preferred or avoided.

	```{r}
	chi_pref <- function(obs, exp, alpha = 0.05){
	chi <- chisq.test(obs, p = exp, rescale.p = TRUE)
	print(chi) #tells you if there is an overall preference. (sig = pref)
	res <- chi$residuals
	#res <- (obs-exp)/sqrt(exp) #hand calculation, same result.
	#calculate bonferoni Z-statistic for each plant.
	alpha <- alpha
	k <- length(obs)
	n <- sum(obs)
	p_obs <- obs/n
	ak <- alpha/(2*k)
	Zak <- abs(qnorm(ak))
	low_interval <- p_obs - (Zak(sqrt(p_obs(1-p_obs)/n)))
	upper_interval <- p_obs + (Zak(sqrt(p_obs(1-p_obs)/n)))
	p_exp <- exp/sum(exp)
	sig <- ifelse(p_exp >= low_interval & p_exp <= upper_interval, "ns", "sig")
	plot(c(0,k+1), c(min(low_interval),max(upper_interval)), type = "n",
	ylab = "Preference", xlab = "items", las = 1)
	arrows(x0 = c(1:k), y0 = low_interval, x1 = c(1:k), y1 = upper_interval, code = 3
	,angle = 90)
	points(p_exp, col = "red")
	out <- data.frame(chi_test_p = rep(chi$p.value, length(res)),
	chi_residuals = res, sig = sig)
	out
	}
	```

	And we wrap up all indexes...

	```{r}
	#wrap up for all indexes
	preference <- function(obs, exp, alpha = 0.05){
	f <- fr(obs, exp)
	e <- electicity(obs, exp)
	#j <- jacobs(obs, exp)
	c <- chi_pref(obs, exp, alpha = alpha)
	out <- data.frame(exp = exp, obs = obs, fr = f, electicity = e, c)
	out
	}
	```

	It works well for preferences among items with similar availability, and when all are used to some extent. The plot shows expected values in red, and the observed confidence interval in black. If is not overlapping the expected, indicates a significant preference.

	```{r}
	x <- preference(obs = c(25, 22,30,40), exp = c(40,12,12,53))
	x
	#pairs(x[,c(-5,-7)])
	```

	We can see that indices are correlated by simulating lots of random values.

	```{r}
	x <- preference(obs = sample(c(0:100),100), exp = sample(c(0:100),100))
	#x
	pairs(x[,c(-5,-7)]) #more or less same patern, across indexes.
	```

	But the indexes do not behave well in two situations. First, when an item is not used, it is significanly un-preferred regardless of its abundance. I don't like that, because a very rare plant is expected to get 0 visits from a biological point of view. This is an issue when building bonferroni intervals around 0, which are by definition 0, and any value different from 0 appears as significant.


	```{r}
	x <- preference(obs = c(0,25,200), exp = c(0.00003,50,100))
	x
	```

	The second issue is that if you have noise due to sampling (as always), rare items are more likely to be wrongly assessed than common items.

	```{r}
	#assume no preferences
	x <- preference(obs = c(5,50,100), exp = c(5,50,100))
	x
	#now assume some sampling error of +-4 observations
	x <- preference(obs = c(1,54,96), exp = c(5,50,100))
	x
	```

	As a conclusion, If you have all items represented (not highly uneven distributions) or all used to some extent (not 0 visits) this is a great tool.

	Happy to know better options in the comments, if available.

	Refs:
	-Williams et al 2011. Basic and Applied Ecology 12, 332-341.
	-Neu, C.W., C.R. Byers, and J.M. Peek. 1974. A technique for analysis of utilization‐
	availability data. Journal of Wildlife Management 38:541‐545.
	-Beyers, C.R., and R.K. Steinhorst. 1984. Clarification of a technique for analysis of
	utilization‐availability data. Journal of Wildlife Management 48:1050‐1053.
	-Ivlev 1961 Experimental ecology of the feeding of fishes.yale University Press. CN, USA.