Friday, January 20, 2012

Integrating Detection Probabilities

I guess it has been over a month since I have provided an update on my Goshawk research. That doesn't mean that progress isn't being made. In fact, quite the contrary is true. For those new here, I study breeding season diet and prey abundance influences on the Northern Goshawk in a small isolated forest in southern Idaho. By choosing this filter, you will see all of my previous posts on the first year of my study (stories, photos, videos, etc.). I am now preparing for my second field season this coming May.

In December I finished and submitted my annual report to the Forest Service. This was a major accomplishment and it's great to have that behind me. As I mentioned in my previous post, I have been studying Hierarchical Bayesian Statistics on my own so that I can rework my analysis to account for detection probabilities and to deal with underlying variance in my prey abundance estimates. This has been a hard path to take. After trying to force my way through it, I have decided to step back and work through a textbook end-to-end, including all of the exercises. This is intended to increase my overall confidence in the method and ensure I can adequately explain it in my thesis. The textbook I have chosen is Doing Bayesian Data Analysis: A Tutorial with R and BUGS. Don't the puppies on the cover make it look easy!?! I am now on chapter five and I can honestly say that this is an excellent book. Even though I have a math degree, it has been over 24 years since I have used calculus. This book is written in relatively easy-to-understand English with just the right amount of math mixed in.

While working on that exercise, I have continued to apply the concepts to my data analysis. I have made significant progress on this front; however, I won't know how correct my analysis is until later. Regardless, I have decided to explain what I have done and what the PRELIMINARY results look like. This will provide a basis for asking the experts to look over my work and give me feedback.

I summarized the first step of this Bayesian re-analysis in a previous post. Essentially the first step was to perform the same analysis which I previously completed using Logistic Regression, but this time using Bayesian approaches. I was successful in accomplishing this task. The results continued to show that avian prey abundance and mammalian prey abundance were important predictors of nest occupancy, but only avian abundance was significant. However, the model including both avian and mammalian prey abundance still fit the data better than the model with only avian prey abundance included. This is why I can say that mammalian abundance was important even though it was not significant.

So the next major step in the analysis, and one of the reasons I am shifting to Bayesian statistics, is to include detection probabilities to deal with the issue of false negatives. Essentially the analysis up to this point has assumed that if I found a goshawk on a nest then that territory was occupied - safe assumption - but if I did not find a goshawk then the site was not occupied - a very unsafe assumption. By integrating the probability of detecting an occupied nest into the analysis, I am essentially allowing each of those "unoccupied" territories to instead be "possibly occupied, but not detected". The really cool part is that you can actually do this, through some really fancy calculus. The inclusion of detection probability can have a dramatic impact on the results. It has been shown to have a huge impact on abundance measurements, but my study is not looking into abundance. However, it is still applicable to my study and could strengthen or dilute the effect of predictor variables such as avian or mammalian prey abundance.
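To make the "possibly occupied, but not detected" idea concrete, here is a minimal Python sketch of the likelihood of one territory's survey history. It is an illustration under assumptions, not the BUGS model behind this analysis: the names site_likelihood, psi (probability that a territory is occupied), and p (probability of detecting goshawks on a single visit to an occupied territory) are hypothetical, and visits are assumed to be independent.

```python
def site_likelihood(history, psi, p):
    """Likelihood of one territory's detection history.

    history: list of 1 (detected), 0 (not detected), or None (no visit made).
    psi: assumed probability that the territory is occupied.
    p: assumed probability of detection on one visit, given occupancy.
    """
    visits = [y for y in history if y is not None]

    # Probability of seeing this history if the territory is occupied.
    if_occupied = psi
    for y in visits:
        if_occupied *= p if y == 1 else (1.0 - p)

    # An unoccupied territory can only ever produce all-zero histories.
    if_unoccupied = (1.0 - psi) if all(y == 0 for y in visits) else 0.0

    # Sum over the two possible (unobserved) occupancy states.
    return if_occupied + if_unoccupied


# Illustrative parameter values only, not estimates from the study.
for history in ([1, None], [0, 1], [0, None]):
    print(history, site_likelihood(history, psi=0.5, p=0.7))
```

The key line is the final sum: a territory with no detection contributes both an "occupied but missed" term and a "truly unoccupied" term, which is how a single-visit zero stops being treated as a certain absence.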

Detection probability is usually determined from intrinsic factors in the data. Ideally surveys would be repeated up to three times. These repeat visits provide a basis for estimating imperfect detection, because the animal is sometimes detected on one visit but not on another. However, repeat visits are very expensive. Last season we had over 450 survey points. We lacked the time and money to perform repeat visits. My approach to address this took two steps. First, there were two nests where we had broadcast a goshawk alarm call within 200m of the nest and did not receive a response. We later discovered the nest. In these cases I have counted the first visit as no detection and the second visit as a detection. I set all other second visits to NA (no second visit performed). Here is a sample of what the observation table might look like.

Nest      Visit 1   Visit 2
...       ...       ...
Nest 13   1         NA
Nest 14   0         1
Nest 15   1         NA
Nest 16   0         NA
...       ...       ...

The Bayesian approach is robust to unbalanced sampling designs such as this. In addition to this minimal data from my surveys, my Bayesian models specify "prior" knowledge of detection probabilities. One of the strengths, and one of the criticisms, of the Bayesian approach is that you can include "prior" knowledge that influences the results. Those who use frequentist statistics often use this argument against Bayesian results. However, the frequentist approach also has "prior" knowledge built into its assumptions, which in most ecological studies are not fully met!

I know of three other goshawk studies using the same search protocol that I used (it's a Forest Service standard). These studies found detection probabilities of 0.54, 0.75, and 0.90 for single-visit surveys. I included in my model the "prior" knowledge that the detection probability was uniformly distributed between 0.5 and 0.9. In other words, it has an equal chance of being any value between 0.5 and 0.9. With this "prior" knowledge and my limited repeat-visit data points, I should get a new estimate of the detection probability for my study and an estimate of what we actually missed.
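As a rough illustration of how a Uniform(0.5, 0.9) prior combines with a handful of visits, here is a small grid approximation in Python. It is a deliberately simplified sketch, not the model I actually ran: it only uses visits to territories assumed to be occupied, it ignores occupied territories that were never detected, and the function names and the example counts are hypothetical.

```python
def detection_posterior(detections, misses, lo=0.5, hi=0.9, steps=401):
    """Grid approximation of the posterior for detection probability p.

    Prior: p ~ Uniform(lo, hi).
    Likelihood: each visit to an occupied territory is an independent
    Bernoulli(p) trial (a simplifying assumption).
    Returns (grid, weights), where the weights sum to 1.
    """
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    # The uniform prior is constant across the grid, so the unnormalized
    # posterior is just the binomial likelihood kernel at each grid point.
    unnormalized = [p ** detections * (1.0 - p) ** misses for p in grid]
    total = sum(unnormalized)
    return grid, [w / total for w in unnormalized]


def posterior_mean(grid, weights):
    """Posterior mean of p under the grid approximation."""
    return sum(p * w for p, w in zip(grid, weights))


# Hypothetical counts for illustration only (not the study's data).
grid, weights = detection_posterior(detections=7, misses=3)
print(round(posterior_mean(grid, weights), 3))
```

With no data at all the posterior is just the prior, centered on 0.7. Each missed detection pulls the estimate toward 0.5 and each detection pulls it toward 0.9, but the prior never lets it leave that range.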

First, it is time to perform model selection. I ran all of the models that I previously used in logistic regression, but this time in a Bayesian analysis with detection probability included. The top-rated model was the one including avian prey abundance but not mammalian prey abundance or the percent of the territory which is forested. So, detection probability did have an impact on model selection! Looking into the model results, avian prey abundance remained a significant predictor of nest occupancy (Figure 1). Significance is indicated by the 95% Highest Density Interval (HDI) not overlapping zero. The Highest Density Interval is roughly analogous to the frequentist 95% confidence interval.

Figure 1: Posterior distribution of model coefficient for avian prey abundance.
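For readers who want to see how a 95% HDI can be computed from posterior samples, here is a minimal Python sketch. It uses the common "shortest interval containing 95% of the sorted samples" approach, which assumes a unimodal posterior. The function names hdi and excludes_zero are hypothetical and are not taken from the textbook's code.

```python
import math


def hdi(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples.

    Assumes the posterior is unimodal, so the densest region is one interval.
    """
    ordered = sorted(samples)
    n = len(ordered)
    n_inside = math.ceil(mass * n)  # samples the interval must cover
    if n_inside < 2:
        raise ValueError("need more samples to form an interval")
    # Slide a window of n_inside samples and keep the narrowest one.
    start = min(
        range(n - n_inside + 1),
        key=lambda i: ordered[i + n_inside - 1] - ordered[i],
    )
    return ordered[start], ordered[start + n_inside - 1]


def excludes_zero(interval):
    """True when the whole interval lies on one side of zero."""
    low, high = interval
    return low > 0 or high < 0
```

In use, one would pass the MCMC samples of a coefficient to hdi and then check excludes_zero on the result, which is the rule of thumb for "significance" described above.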

The combination of the "prior" for detection probability and my data produces a posterior estimate of 67.8% for the probability of detection in my survey (Figure 2).

Figure 2: Posterior distribution for Probability of Detection.

The truly unique result comes from the estimate of how many occupied nests there really were. The sampling approach used to solve these problems essentially produced 50,000 estimates of all of the possible values, each constrained by the model and the priors. The result is a distribution of those estimates, which provides a credible probability of the true value given the priors and the data. Figure 3 illustrates the estimated number of occupied territories out of the original set of 24 historical goshawk territories. Based on that distribution, there is a 95% chance that there were 11 to 16 occupied territories out of 24. The expected value is 12 nests (mode) or 13 nests (mean). The "observed" value, those that we actually found, was 10. The model predicts that we most likely missed three occupied territories. Had you asked me prior to this analysis, I would have guessed that we missed two. I am not surprised that it could be as high as six.

Figure 3: Posterior distribution of estimated occupied nests within 24 historical territories.
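The logic behind a distribution like Figure 3 can be sketched in a few lines of Python. This is a simplified illustration rather than my actual model: it assumes one occupancy probability (psi) shared by all territories with no prey covariates, a single visit to each undetected territory, and placeholder parameter draws standing in for real MCMC output. Only the counts of 10 detected and 14 undetected territories come from the study.

```python
import random
from collections import Counter


def simulate_occupied_counts(param_draws, n_detected, n_undetected, seed=1):
    """For each (psi, p) draw, sample the total number of occupied territories.

    Detected territories are certainly occupied. A territory surveyed once
    with no detection is occupied with probability
        psi * (1 - p) / (psi * (1 - p) + (1 - psi)),  for 0 < psi < 1.
    """
    rng = random.Random(seed)
    counts = []
    for psi, p in param_draws:
        missed = psi * (1.0 - p)             # occupied but not detected
        q = missed / (missed + (1.0 - psi))  # P(occupied | no detection)
        hidden = sum(rng.random() < q for _ in range(n_undetected))
        counts.append(n_detected + hidden)
    return counts


# Placeholder draws (hypothetical values, NOT the study's posterior).
draw_rng = random.Random(0)
draws = [(draw_rng.uniform(0.4, 0.7), draw_rng.uniform(0.5, 0.9))
         for _ in range(5000)]

counts = simulate_occupied_counts(draws, n_detected=10, n_undetected=14)
mode = Counter(counts).most_common(1)[0][0]
mean = sum(counts) / len(counts)
print("mode:", mode, "mean:", round(mean, 1))
```

Each simulated count is the 10 territories we found plus however many of the 14 "empty" ones turn out to be occupied in that draw, which is why the distribution can never fall below the observed value.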

Now it's back to the textbook to improve my comprehension and confidence with all that I have explained here. The next step with this analysis will be to integrate the uncertainty of my prey abundance estimates into this model.

Writing this post has helped me clarify which concepts I fully grasp and which still need some work. If you are still reading, thanks for sticking around. I appreciate it. Be sure to let me know if you have any feedback.
