Sunday, July 31, 2011

Plucking the data for the verdict

The data analysis has begun! Now that my first field season of studying Northern Goshawks in the Sawtooth National Forest is complete, I have started to analyze the data. This will involve watching 3 months' worth of video and performing many elaborate statistical procedures. The first analysis I have chosen to focus on is determining the prey abundance in each of the Northern Goshawk territories. These values will be used in further calculations in future steps of my analysis. Here I present what I hope is a simplified description of the process. Don't be frightened; I will try to make it as straightforward as possible.

One of my key thesis questions is whether or not prey abundance influences Northern Goshawk nest occupancy and success and, further, which prey items have the greatest influence. The first step, and the step summarized here, is focused on prey abundance. These data were gathered by performing linear transect surveys within each historic Northern Goshawk territory. As each survey was performed, we logged the perpendicular distance of each prey item from the transect "line" by measuring it with a rangefinder. Prey could be detected by sight or sound. My field partner Lauren and I walked 96 transects, each 750 meters long, noting each prey item we detected and its distance from the line. Each goshawk territory had four of these transects randomly placed within its bounds. We tried our best to spread the surveys out over the nine weeks of the project, but late access to a number of territories prevented this. We also attempted to ensure that each of us performed two surveys in each territory. Scheduling prevented this as well, but we were able to ensure that each of us covered at least one of the surveys in each territory. This procedure resulted in 507 observations of prey items along with a number of non-prey items - coyotes, deer, elk, moose, cows, Red-tailed Hawks, Common Ravens, etc.

The theory behind this process assumes that it is easy to detect prey items on the line, but that the "detection rate" decreases with distance from the line. Detecting an animal at 0 meters is easier than at 20 meters, which is easier than at 40 meters, etc. Once a full set of data is collected, you would expect more detections near the line, dropping off with distance from the line. In fact, the method assumes that the detection rate is 100% on the line and drops off from there. 100% detection on the line is rarely possible, but the procedures are fairly robust against some violation of this assumption. Mathematically, a function is fitted to the data to model how the recorded observations fall off with distance, and this function can then be used to estimate abundance within a given area. Don't worry if you are a little lost. It's complicated, but it should become clearer as I present the results below.
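A minimal sketch of the idea in Python, not the actual analysis. It assumes a half-normal detection function, which is one common choice; the post does not say which function was fitted. The scale value sigma, the truncation distance, and the count n are made-up numbers for illustration only. The total line length comes from the 96 transects of 750 meters each.

```python
import math


def half_normal(x, sigma):
    """Assumed detection probability at perpendicular distance x (meters).

    Equals 1.0 on the line (x = 0) and declines with distance.
    """
    return math.exp(-x ** 2 / (2 * sigma ** 2))


def effective_half_width(sigma, w, steps=10_000):
    """Integrate the detection function from 0 to the truncation distance w.

    Uses the trapezoid rule. The result is the half-width (meters) of a strip
    in which every animal would have been detected.
    """
    h = w / steps
    total = 0.5 * (half_normal(0.0, sigma) + half_normal(w, sigma))
    for i in range(1, steps):
        total += half_normal(i * h, sigma)
    return total * h


def density_per_hectare(n, sigma, w, line_length_m):
    """Estimated animals per hectare from n detections within distance w."""
    mu = effective_half_width(sigma, w)
    effective_area_m2 = 2 * mu * line_length_m  # both sides of the line
    return n / effective_area_m2 * 10_000


# Hypothetical inputs: sigma = 25 m, truncation at 100 m, 100 detections.
total_line_m = 96 * 750  # 96 transects of 750 m each (from the survey design)
print(half_normal(0, 25), half_normal(20, 25), half_normal(40, 25))
print(density_per_hectare(n=100, sigma=25, w=100, line_length_m=total_line_m))
```

The key design point is that the count is divided by an "effective" area that is narrower than the full strip, which corrects for the animals that were missed farther from the line.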

I am using a more sophisticated approach within the process which allows for the inclusion of covariates. Covariates consist of any conditions which might change the detection of the animals. The method I used allows for covariates to influence the scale of the detection function, but not the shape. In my case, this assumption is valid. An example of a covariate is the day of the year. We would expect some prey to be more or less abundant or more or less easily detected as the season progresses. Thus, the day of the year the individual survey was performed is probably the most common covariate used in this type of study. Time of day is a second covariate I am evaluating. It seemed like some prey was more easily detected later in the day. Who performed the survey is a third important factor. Lauren and I have different experiences, knowledge, and recognition of prey species. Thus, the method allows for adjustments to be made based on who performed which survey. For example, I emphasized strongly to Lauren to "mind the line". In other words, she should devote a large amount of focus to detecting all animals on the line and less to those away from the line. It is clear from the data that she did this. I, however, have a lower detection rate on the line than 10-20 meters away. Oops. The great news is that our two sets of surveys combined look great! There are other biases influenced by who performed the survey. For example, I detected Green-tailed Towhees by sight and by call, but not by song. Lauren detected them by sight only. The procedures are robust in handling these differences as long as who performed the survey is considered as a covariate. The fourth covariate I am evaluating, recorded for each observation, is whether the animal was seen or heard. The data show what you might expect: detections by sight are limited to animals closer to the line, whereas detections by sound extend out much more evenly to about 50 meters.
The last covariate deals with the distance of each survey that was performed in open sage/grass versus in forested habitat. Remember the surveys were laid out randomly across the landscape. This one is tricky, as we were often walking in sage next to a forest stand where the bird was observed. We did not record the habitat of each observation, only the structure of the transect. Next year I may collect more detailed habitat information.

After entering all of the data, the first step in the process is referred to as exploratory data analysis (Thomas et al. 2010). In this process, the data are checked against the assumptions and the covariates are evaluated for importance. This uses a fairly complicated model selection procedure based on the Akaike Information Criterion (AIC). This method has become the standard in ecological research and is one I used in my undergraduate research (Miller et al. 2011). The good news is that unless you are performing the research you don't have to understand the method, only how to interpret the results. The process is to calculate an AIC value for each combination (or only the relevant combinations) of covariates. In our case, all of the covariates appeared to be relevant. Five covariates produced 32 combinations, including the model with no covariates. The AIC score represents how well that model or combination of covariates fits the data that were recorded. Essentially, you choose the model with the lowest AIC value. Sometimes you could instead choose a model with an AIC within 2 points of the lowest which includes fewer covariates (models within 2 points are considered roughly equivalent). Generally fewer covariates are better, but AIC already penalizes the use of too many, so most people just choose the lowest AIC. I should note the AIC value is meaningless except for comparison between models using the EXACT SAME data. Blah, blah, blah, I hope that wasn't too much theory...
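As a quick illustration of where the 32 combinations come from, here is a small Python sketch. The short covariate labels are the ones used in the model names (Julian for day of year, Time, Who, Seen, Sage); the aic helper is the standard textbook formula, and the log-likelihood values passed to it are made up for illustration.

```python
from itertools import combinations

# Short labels for the five covariates described in the post.
covariates = ["Julian", "Time", "Who", "Seen", "Sage"]

# Every subset of the covariates, from the empty set (no covariates) to all five.
candidate_models = [
    combo
    for size in range(len(covariates) + 1)
    for combo in combinations(covariates, size)
]
print(len(candidate_models))  # 2**5 subsets


def aic(log_likelihood, n_params):
    """Standard AIC: lower is better; each extra parameter costs 2 points."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical example: a better-fitting model with one extra parameter only
# wins if its log-likelihood improves by more than 1.
print(aic(log_likelihood=-195.0, n_params=3))
print(aic(log_likelihood=-194.5, n_params=4))
```

The second pair of calls shows the penalty at work: the four-parameter model fits slightly better but still ends up with the higher (worse) AIC.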

So after entering all of the data, I started the model selection for the covariates. I have chosen to process birds and mammals separately as they are generally detected in different ways, are detected at different distances, and are of different sizes and thus could have different impacts on the diet of goshawks. Also note that we are only including prey items for goshawks. We did not count small warblers, of which a nestling goshawk would have to eat 40-50 a day. While goshawks may occasionally eat small prey, that prey is not likely to have an important impact on nest success. For birds we included robins, woodpeckers, doves, tanagers, towhees, bluebirds, blackbirds, and grouse. Mammals included ground squirrels and chipmunks. The following table shows the mammal output for the top models. Note the smallest AIC is at the top. Most AIC tables, like this one, include a delta AIC which performs the math for you. The top model is 0 and the delta AIC indicates the AIC difference from the top model.

Name                                # params   Delta AIC   AIC
Mammal Time Who                     3          0.00        393.12
Mammal Julian Time Who              4          0.93        394.05
Mammal Time Who Seen                4          1.25        394.37
Mammal Time Who Sage                4          1.83        394.95
Mammal Who                          2          1.90        395.02
Mammal Julian Time Who Seen         5          2.05        395.17
Mammal Julian Time Who Sage         5          2.80        395.92
Mammal Who Sage                     3          3.22        396.34
Mammal Time Who Seen Sage           5          3.24        396.35
Mammal Who Seen                     3          3.50        396.61
Mammal Julian Who                   3          3.70        396.82
Mammal Julian Time Who Seen Sage    6          4.05        397.17
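
For readers who want to see the delta AIC arithmetic, here is a short Python sketch using the names, parameter counts, and AIC values from the table above. The 2-point threshold is the rule of thumb mentioned earlier; the variable names are just for illustration.

```python
# (model name, number of parameters, AIC) copied from the mammal table.
models = [
    ("Mammal Time Who", 3, 393.12),
    ("Mammal Julian Time Who", 4, 394.05),
    ("Mammal Time Who Seen", 4, 394.37),
    ("Mammal Time Who Sage", 4, 394.95),
    ("Mammal Who", 2, 395.02),
    ("Mammal Julian Time Who Seen", 5, 395.17),
    ("Mammal Julian Time Who Sage", 5, 395.92),
    ("Mammal Who Sage", 3, 396.34),
    ("Mammal Time Who Seen Sage", 5, 396.35),
    ("Mammal Who Seen", 3, 396.61),
    ("Mammal Julian Who", 3, 396.82),
    ("Mammal Julian Time Who Seen Sage", 6, 397.17),
]

# The best model is the one with the lowest AIC.
best_name, best_params, best_aic = min(models, key=lambda m: m[2])

# Delta AIC is each model's AIC minus the lowest AIC.
deltas = {name: round(aic - best_aic, 2) for name, _, aic in models}

# Models within 2 points of the best are considered roughly equivalent.
roughly_equivalent = [m for m in models if m[2] - best_aic <= 2.0]

# Among those, the most parsimonious model has the fewest parameters.
simplest = min(roughly_equivalent, key=lambda m: m[1])

print("Lowest AIC:", best_name)
print("Within 2 points:", [m[0] for m in roughly_equivalent])
print("Fewest parameters among those:", simplest[0])
```

Note that the two selection rules can point to different models, which is why it matters to say which rule was used. The results discussed below follow the lowest-AIC rule.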

Here we can see the top model for mammals only uses the time of day of the survey and who performed the survey, dropping the other potential covariates and simplifying the model. The bottom line is that ground squirrels were more easily detected later in the day and Lauren and I detected them differently. The analysis of bird detections chose a model using the day of year, who performed the survey, whether the bird was seen or heard, and the amount of survey performed in open sage/grass versus forest. Excellent.

Now let's see how the data look once they are adjusted by these covariates, extreme values are truncated, and the data are grouped into 10 meter buckets to smooth the curve. The mammal data came out looking exactly like they should, as illustrated in the graph below! The sampled data is in blue and the "fitted curve" is in red. The fitted curve will be used to estimate abundance in each territory.

Sampled mammal data (blue) and fitted curve (red).

This may not look exciting, but it is absolutely awesome! This illustrates that my survey method worked. Not only did it work, it worked great! I was ecstatic when I first saw this. As scientists we like to find significant results, but we like even more to implement methods that when executed produce valid results. This did just that!

The avian results, as illustrated below, are not quite as clean due to a "training and assumption" issue in our field methods. The only issue is the increased detection probability near 50m. As you can see from the curve below, it had little influence on the fitted model and falls well within the bounds of the methodology. The end effect is potentially a slight underestimate of abundance within each territory, applied consistently across all of my territories. Since I am using abundance as a relative measure, the end effect is irrelevant.

Sampled bird data (blue) and fitted curve (red).

Thanks for sticking with me. All of this work simply leads us to the conclusion that the method works, the data is valid, and abundance estimates produced can be used in the next steps of my analysis. It is often the case that more work is required to prove the method than to interpret the results.

Now, on to the real results. I categorized each of the goshawk nesting territories as either "no occupied nest detected", "occupied nest that failed", or "occupied nest that successfully fledged young". After calculating mammal and avian prey abundance in each of these territories, we can compare the results between these categories. Mammalian prey abundance did not have a significant impact on nest occupancy or nest success. This is a bit of a surprise, as we know from the not yet quantified camera footage and from personal observation that goshawks in the South Hills do eat a lot of ground squirrels.

Mammalian prey abundance in goshawk territories categorized by success - not significant.

A significant result would occur if the height of one bar fell outside of the "whiskers" on another bar. The bar represents the expected value and the whiskers represent the range within which the true value should fall with 95% confidence. If a bar were to fall outside the 95% confidence range of another category, we could call it significantly different. In this case none of the bars fall outside of the whiskers of any other bar. The whiskers on the "breeding, nest failed" bar are wider since there are only two territories in this category. Fewer samples result in larger uncertainty and a wider range of possible values. But the avian prey results are significant!!

Avian abundance is a significant predictor of nest success!

If you think I was excited before, this is outstanding! This illustrates that avian prey abundance in successful territories and in breeding/failed territories are both significantly different from avian prey abundance in territories where no nest was detected. The difference in abundance between successful and failed nests, however, is not significant. This far exceeds my expectations for my first field season! Awesome!

Why might avian prey abundance be significant while mammalian prey abundance is not? I will leave most of this for the discussion section of my research publication, but a likely answer is that nest occupancy is determined by two factors - past breeding success and prey abundance in February/March when the territories are chosen. Ground squirrels are unavailable in Feb/March and thus likely don't influence initial occupancy. Past success is dependent upon many phases including early spring, breeding season, and the post-fledging dependency period (the time after birds fledge but before they are independent from their parents). During this later period ground squirrels begin to estivate (summer hibernation) and the goshawk diet has to shift back to birds. Thus, bird abundance early and late in the season may be the limiting factor for success. This theory would be consistent with my results. If you made it this far, send me an email or post a comment so I know that this write-up was valuable.

I should note these results are still preliminary and I have months' worth of further work and analysis to do. I plan to provide other analysis updates as I make progress. I also plan to present these results and the next few analysis steps at the annual Raptor Research Foundation conference in Duluth, MN in early October (their logo is even a goshawk!).


John B. said...

Nice summary! I'm looking forward to the next installment.

I'm wondering if there is a way to test the importance of avian prey early in the season vs. late in the season (i.e., once ground squirrels are estivating). I would think the earlier prey abundance would be more important since that influences site selection and keeps the parents fed during the incubation period. But I could see an argument for the later since that is when chicks need to grow towards independence.

Rob said...

Thanks John. I have been thinking about how to measure prey early and late. Late is no problem, but early presents a number of challenges.

Drew said...

Thanks for sharing a bit of how you are analyzing and interpreting your data. Looks like you have some good stuff in there and I'll be waiting to see what else you share with us.