Monday, August 22, 2011

False Negatives

I continue to analyze the data gathered in my first field season studying Northern Goshawks in the Sawtooth National Forest as a part of my Master's program in Raptor Biology. This is a continuation of my first post on the analysis process. I actually received requests from MULTIPLE people asking me to continue blogging on the analysis phase of my project. So hold on tight and stick with me. I will try to make this as painless as possible!

First nest discovered with nestlings!

In the first post, I illustrated, using straightforward confidence intervals, that avian abundance was a significant predictor for goshawk nest occupancy and success. Here I will highlight another, more sophisticated approach that arrives at a very similar answer. Then I will discuss the problem with these two approaches and why they are not valid for this particular case, and I will talk a little bit about where I will be taking it from here.

In the previous post I introduced the concept of AIC-based model selection. I mentioned that in ecological studies it is now a foundational expectation for data analysis. Here I will illustrate analyzing nest occupancy and success using AIC model selection.

The first step is to hypothesize which variables should be predictors for nest occupancy and success. An important constraint is that the number of candidate explanatory variables must be limited based on the sample size. There are different rules of thumb: one is that you need 10 samples for each variable, though some say five. I am going to stretch this a bit and try three variables for 24 samples. I predict that Avian prey abundance and Mammalian prey abundance are significant predictors. I also want to include the percent of the territory in sagebrush habitat (versus forested habitat), as I believe that has affected my abundance measurements. The next step is to ensure that none of the predictor variables are overly correlated with one another. For this I will treat any pair with a correlation coefficient greater than 0.70 as too highly correlated.
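For readers who want to try the correlation screen themselves, here is a minimal Python sketch. It is not the script I ran. The function names and the five-territory numbers are made up for illustration, and I am assuming the 0.70 threshold applies to the absolute value of the coefficient, so that strong negative correlations are flagged too.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def overly_correlated_pairs(predictors, threshold=0.70):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(sorted(predictors), 2):
        r = pearson_r(predictors[a], predictors[b])
        if abs(r) > threshold:  # assumption: screen on absolute value
            flagged.append((a, b, r))
    return flagged


# Made-up values for five territories, NOT the field data.
example = {
    "avian": [12.0, 8.0, 15.0, 6.0, 10.0],
    "mammalian": [3.0, 5.0, 2.0, 6.0, 4.0],
    "sagebrush_pct": [10.0, 40.0, 35.0, 5.0, 20.0],
}
flagged = overly_correlated_pairs(example)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

Any pair that shows up in the flagged list would mean dropping or combining one of the two variables before building models.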

Graph of relationships with correlation coefficients.

The graph indicates a low amount of correlation between the predictor variables (0.24, -0.29, 0.34). This was a surprise, as I expected mammalian abundance to be highly correlated with the amount of sagebrush habitat (it seemed that we saw more mammals in open habitat). But low correlation coefficients are good.

I then calculated the AIC values (actually AICc values, which are AIC values adjusted for small sample size) for each of the eight combinations of these three variables as predictors for nest success, including the null model (no predictor). The models are ranked by AICc, lowest value first. This analysis indicates that the model using mammalian abundance and avian abundance best explains the data, although the model using only avian abundance is very close (models within 2 AICc units of each other can be assumed roughly equivalent). There are various approaches to moving forward from here: 1. Use the top model; 2. "Model Average" those within 2 AICc; 3. "Model Average" those within 4 AICc; 4. "Model Average" all eight models. Some have even recommended model averaging within 10 AICc. Here I will simply take the top model and generate coefficient estimates with 95% confidence intervals. If a confidence interval does not include 0, then we can assume that predictor has a significant influence.
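To make the bookkeeping concrete, here is a rough Python sketch of how the eight candidate models can be enumerated and ranked by AICc. It uses the standard small-sample formula, AICc = -2 log L + 2k + 2k(k+1)/(n - k - 1). I am assuming each model has one coefficient per predictor plus an intercept, as in a logistic regression. The log-likelihood values are invented placeholders, not my fitted results, and the function names are my own.

```python
from itertools import combinations


def aicc(log_likelihood, k, n):
    """AIC with the small-sample correction; k counts every estimated parameter."""
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def candidate_models(predictors):
    """All subsets of the predictors, from the null model to the full model."""
    models = []
    for size in range(len(predictors) + 1):
        models.extend(combinations(predictors, size))
    return models


def rank_by_aicc(log_likelihoods, n):
    """Return (model, AICc, delta AICc) rows sorted from best to worst."""
    scored = []
    for model, log_lik in log_likelihoods.items():
        k = len(model) + 1  # assumption: one coefficient per predictor plus intercept
        scored.append((model, aicc(log_lik, k, n)))
    scored.sort(key=lambda row: row[1])
    best = scored[0][1]
    return [(model, score, score - best) for model, score in scored]


models = candidate_models(("avian", "mammalian", "sagebrush_pct"))

# Hypothetical log-likelihoods for illustration only, NOT the fitted values.
made_up = {model: -15.0 + 0.8 * len(model) for model in models}
ranking = rank_by_aicc(made_up, n=24)

# Models a "within 2 AICc" averaging rule would keep.
within_2 = [model for model, _, delta in ranking if delta <= 2.0]
for model, score, delta in ranking:
    print(model or ("null",), round(score, 2), round(delta, 2))
```

With these placeholder numbers the extra predictors do not improve the fit enough to pay for their penalty, so the null model ranks first. With real fitted log-likelihoods the ordering would of course differ.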

From this we can take away that a model of both Mammalian prey abundance and Avian prey abundance best predicts nest occupancy and success, and that the influence of Avian prey abundance is significant (confidence interval does not overlap with zero). Hmm, that was the same answer as the previous analysis!
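The "does the interval overlap zero" check is simple enough to sketch in a few lines of Python. I am assuming a Wald-style interval here (estimate plus or minus 1.96 standard errors), which is one common choice and not necessarily what my statistics package used. The coefficient and standard error values are hypothetical.

```python
def wald_ci_95(estimate, std_error):
    """Approximate 95% confidence interval: estimate +/- 1.96 standard errors."""
    z = 1.96  # approximate 97.5th percentile of the standard normal
    return (estimate - z * std_error, estimate + z * std_error)


def excludes_zero(interval):
    """True when the whole interval lies on one side of zero."""
    low, high = interval
    return low > 0.0 or high < 0.0


# Hypothetical coefficients and standard errors, NOT my fitted values.
avian_ci = wald_ci_95(0.9, 0.4)
mammalian_ci = wald_ci_95(0.3, 0.4)

print("avian significant:", excludes_zero(avian_ci))
print("mammalian significant:", excludes_zero(mammalian_ci))
```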

The Problem

But there is a problem with both of these analyses: they both suffer from "False Negatives". In these analyses, I had 8 successful nests and 16 failed or unoccupied nests. The issue is that I cannot be sure that none of those 16 territories held a successful nest that we failed to discover. A "No Nest Detected" territory could reflect one of three situations: 1. There was no occupied nest there; 2. There was a nest there but it failed before we detected it; 3. There was a nest there and it was successful, but we failed to detect it. Of course, this last category is very troublesome. For large conspicuous creatures, you might assume this probability is very, very low. For goshawks it is not low. We have walked right under nests without seeing them. We know this because we found those nests later.

Rosenstock et al. (2002) analyzed 224 papers published in nine different journals between 1989 and 1998 and found that only 13% of these studies acknowledged and addressed the case of false negative detections. The bar for publication is clearly higher today.

The Solution

The solution to this issue is to incorporate the detection probability into the analysis. Essentially, instead of saying a territory is "not occupied", we say the territory has an X% probability of being occupied and successful. The whole statistical approach changes. The challenge is determining whether the detection probability is constant, what it is, and what it depends upon. There are a number of ways to estimate this, which I am still investigating. At a minimum I will use the values provided by Woodbridge and Hargis (2006), who analyzed the discovery method I used in the field and determined it has a 90% detection rate. I should say that, having used the methodology in the field, I believe that they are being quite optimistic!
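To give a feel for what "X% probability" means, here is a small Python sketch using Bayes' rule. It is only an illustration, not the full occupancy model I will end up fitting. The 90% detection rate comes from Woodbridge and Hargis (2006). The 50% prior chance that a territory holds a nest and the 60% "pessimistic" detection rate are numbers I made up, and the function assumes that repeat visits miss nests independently of each other.

```python
def prob_occupied_given_no_detection(prior_occupied, detection_prob, visits=1):
    """Probability a nest is present even though no visit detected one.

    Bayes' rule: P(present | missed) =
        P(present) * P(missed | present) / P(missed)
    Assumes independent visits and no false positives.
    """
    miss = (1.0 - detection_prob) ** visits
    missed_and_present = prior_occupied * miss
    truly_absent = 1.0 - prior_occupied
    return missed_and_present / (missed_and_present + truly_absent)


# Assumed 50% prior occupancy (hypothetical); 90% detection from the source.
optimistic = prob_occupied_given_no_detection(0.5, 0.90)

# Hypothetical lower detection rate, to show how sensitive the answer is.
pessimistic = prob_occupied_given_no_detection(0.5, 0.60)

print(f"P(nest present | none found), 90% detection: {optimistic:.3f}")
print(f"P(nest present | none found), 60% detection: {pessimistic:.3f}")
```

Under these assumed numbers, a territory scored as "No Nest Detected" still has roughly a 9% chance of holding a nest at 90% detection, and that chance roughly triples if the true detection rate is closer to 60%.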

That's it for now. Thanks for sticking with me. More to come. I welcome your feedback.
