Using machine learning to predict chronic wasting disease in white-tailed deer at the county scale

Despite the governance autonomy of management agencies, free-ranging wildlife cross jurisdictional boundaries. Thus, wildlife management agencies across North America would benefit from collaborative efforts designed to understand common risk factors for the disease. Our study was the first to use regional data representing a single species subject to diverse management objectives, population dynamics, habitat types, and regulations across 16 US states. Additionally, by applying state-of-the-art ML techniques to wildlife health data, we were able to identify counties with characteristics similar to those of Midwestern and Eastern US counties where CWD has been confirmed.

The results of the LGB algorithm revealed the importance of regulations for the CWD predictions shown in Figure 2. Indeed, wildlife experts have long pointed to human-assisted transfer of prions via live deer, carcasses, trophy heads, deer parts, and urine lures as risk factors for CWD introduction.^8,9 As a result, wildlife management authorities have implemented a variety of regulatory measures to limit or eliminate these anthropogenic routes of introduction.³⁴ The LGB results also pointed to the natural movement of deer.³⁵ We strongly caution, however, that these features and their importance are a phenomenon of the data and may not be absolute. Indeed, the three other candidate algorithms performed similarly on these data (see Supplementary Table S2), with their results relying on an entirely different set and ranking of feature importance. Specifically, the RF algorithm ranked hunter harvest (a proxy for deer density),³⁶ clay soil,^{37,38,39,40,41} forest cover,¹³ and distance to river³⁵ as the top features, in that order, influencing predictions of CWD status. The DT algorithm ranked hunter harvest,³⁶ distance to river,³⁵ clay soil,^{37,38,39,40,41} and forest cover¹³ as the top features, in that order, influencing predictions of CWD status. Finally, the GB algorithm ranked hunter harvest,³⁶ distance to river,³⁵ forest cover,¹³ and clay soil^{37,38,39,40,41} as the top features, in that order, influencing predictions of CWD status. The feature importance of every algorithm is supported by prior research.

These seemingly similar results raise the question: if the LGB, RF, DT, and GB algorithms were comparable in accuracy, how did we choose the LGB algorithm shown in Figure 2? The answer lies in the underlying mathematics. We recognized that there were not yet enough data for a clearly superior predictor to emerge, so we chose the predictor with the highest average precision among the current predictors (even when other algorithms perform better than LGB in certain instances, it is often by random chance). LGB has also been demonstrated to provide the best balance of training and testing accuracy over the RF, DT, and GB options. As more data are incorporated into future fittings of these ML models (see additional discussion below), performance will likely improve, the average will settle to an asymptotic mean according to the law of large numbers,⁴² and any of these four algorithms (including its corresponding feature importance and ranking) could emerge as a good predictor of CWD status.
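The selection logic described above, choosing by average performance rather than by wins in individual fits, can be sketched in a few lines. This is only an illustration: the scores below are made-up numbers, not results from this study, and the function names `best_by_mean` and `wins_per_fit` are hypothetical.

```python
from statistics import mean

# Hypothetical scores from four repeated fits per algorithm.
# Illustrative numbers only; they are NOT results from the study.
scores = {
    "LGB": [0.82, 0.80, 0.83, 0.81],
    "RF": [0.84, 0.76, 0.80, 0.79],
    "DT": [0.78, 0.81, 0.77, 0.80],
    "GB": [0.80, 0.79, 0.84, 0.78],
}


def best_by_mean(scores):
    """Return the name of the algorithm with the highest mean score."""
    return max(scores, key=lambda name: mean(scores[name]))


def wins_per_fit(scores):
    """Count how often each algorithm has the top score in a single fit."""
    n_fits = len(next(iter(scores.values())))
    wins = {name: 0 for name in scores}
    for i in range(n_fits):
        top = max(scores, key=lambda name: scores[name][i])
        wins[top] += 1
    return wins


print(best_by_mean(scores))
print(wins_per_fit(scores))
```

In this toy data every algorithm wins exactly one fit, yet only one has the highest mean, which is the sense in which a single-fit win can reflect chance rather than a superior predictor.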

The feature importance for the LGB, RF, DT, and GB algorithms arose from a spatially diverse dataset, so the results provide additional insights when compared to results obtained using more localized data. However, these features were found to be important only among the factors evaluated, and other factors that may reflect the unique pathobiology of CWD were not included in this study. For example, the pooled dataset³³ did not include data on factors that may be related to CWD status, such as weather, prion strain, type of diagnostic test, deer genetics, apparent dispersal of deer,³⁵ management strategy,⁸ the presence of sympatric susceptible species,⁴³ illegal acts such as the unauthorized movement or release of captive-bred deer from CWD-positive herds,^44,45 or geographic proximity to infections in nearby areas.⁵

Results of the LGB algorithm applied to the pooled dataset³³ indicated that regulations were important in predicting CWD status. However, the two specific regulations identified by the LGB model (Urine attractant and Importing whole bodies) are confounded with other regulations that we excluded due to their high correlation (Breeding facilities, Interstate importation of live cervids, Intrastate movement of live deer). Because of these covarying regulations (and the selection procedure for variables to remove and retain; see Methods), rather than taking the variable names at face value, we interpret the importance of the urine lure and whole carcass regulations as representing common human activities that can lead to contamination of reservoirs.
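To illustrate why a retained regulation can stand in for the ones removed, here is a minimal sketch of a greedy correlation filter. It is not the selection procedure from the Methods; the 0.9 threshold, the column names, and the data are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation signal
    return sxy / sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.9):
    """Keep a column unless it is highly correlated with one already kept.

    Returns (kept, dropped), where dropped maps each removed column to the
    retained column that now stands in for it.
    """
    kept, dropped = [], {}
    for name, values in columns.items():
        match = next(
            (k for k in kept if abs(pearson(values, columns[k])) >= threshold),
            None,
        )
        if match is None:
            kept.append(name)
        else:
            dropped[name] = match
    return kept, dropped


# Hypothetical county-level variables (1 = regulation in place).
columns = {
    "urine_lure_ban": [1, 1, 0, 0, 1, 0, 1, 0],
    "breeding_facility_rule": [1, 1, 0, 0, 1, 0, 1, 0],
    "forest_cover": [0.2, 0.8, 0.5, 0.9, 0.1, 0.4, 0.7, 0.3],
}
kept, dropped = drop_correlated(columns)
print(kept, dropped)
```

Because the two hypothetical regulations vary together perfectly, any importance assigned to the retained one cannot be separated from the dropped one, which is why the variable name should not be taken at face value.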

There is much room for improvement in this model. The North American Model of Wildlife Conservation recognizes science as the appropriate tool to guide wildlife resource management.⁴⁶ However, each state wildlife agency has the autonomy to determine the scientific approach that best suits its needs.^16,17 Therefore, the first challenge of this study was to find a spatial unit that was the "lowest common denominator" across all states. We decided to conduct our analysis at the county scale because many of the agencies represented in the pooled dataset³³ recorded county in their CWD testing (surveillance) data (or collected ancillary spatial data at units that allowed us to confidently infer county from reported locations). However, we recognize that counties may be relevant neither to deer herd biology nor to the spatial units of interest to wildlife managers. Additionally, the choice of counties created problems for predictions in Minnesota (see discussion below). Nevertheless, using counties had several advantages. First, this decision allowed us to leverage the power of the largest set of existing CWD surveillance data to create the first-ever regional model showing predictions of CWD status in North America. Second, this decision allowed us to compare CWD status across a myriad of local configurations (management and policy) and identify potential unique characteristics of CWD. While work remains to identify the best algorithms for predicting CWD status in North America, our results so far suggest that regulations, hunter harvest (as a proxy for deer density), and habitat variables (forest cover, clay soil, distance to rivers) may affect CWD status independent of local management decisions and policies. Finally, counties are a scale of interest to public health departments⁴⁷ and agencies interested in tracking CWD in wild herds. ML methods require one year of pooled data to train the model and the following year of pooled data to evaluate predictions. Therefore, if other scales are of interest for surveillance planning, we recommend that agencies work together to collect information for two consecutive years at the scales of interest.
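The two-consecutive-year requirement can be sketched as a simple season-based split: one season trains the model and the next season evaluates it. The record fields (`unit`, `season`, `harvest`, `cwd_positive`), the season labels, and the data below are hypothetical and only illustrate the bookkeeping, not the study's pipeline.

```python
def split_by_season(records, train_season, test_season):
    """Split pooled records into a training season and an evaluation season."""
    train = [r for r in records if r["season"] == train_season]
    test = [r for r in records if r["season"] == test_season]
    if not train or not test:
        raise ValueError("both seasons must be present in the pooled data")
    return train, test


def units_in_both(train, test):
    """Spatial units observed in both seasons, so predictions can be checked."""
    return sorted({r["unit"] for r in train} & {r["unit"] for r in test})


# Hypothetical pooled records; the unit could be a county or any other scale.
records = [
    {"unit": "County A", "season": "2019-20", "harvest": 1200, "cwd_positive": False},
    {"unit": "County B", "season": "2019-20", "harvest": 3400, "cwd_positive": True},
    {"unit": "County A", "season": "2020-21", "harvest": 1150, "cwd_positive": False},
    {"unit": "County B", "season": "2020-21", "harvest": 3600, "cwd_positive": True},
    {"unit": "County C", "season": "2020-21", "harvest": 900, "cwd_positive": False},
]

train, test = split_by_season(records, "2019-20", "2020-21")
print(len(train), len(test), units_in_both(train, test))
```

A unit that appears in only one of the two seasons (County C here) cannot contribute to both fitting and evaluation, which is one practical reason to coordinate collection across both years.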

Disagreements in CWD status between CWD Prediction Web App predictions and surveillance data for the 2020-21 season can be explained in one of two ways for all participating states. (Case 1) The CWD Prediction Web App predicted CWD-positive, but surveillance data reported CWD-non detect, and CWD was not actually present in white-tailed deer in that county (so the error was on the side of the model). (Case 2) The CWD Prediction Web App predicted CWD-positive, but surveillance data reported CWD-non detect, whereas CWD was actually present in white-tailed deer in that county. Minnesota-specific disagreements can be explained in a third known way (Case 3): the CWD Prediction Web App predicted the CWD-positive or CWD-non detect status of each Minnesota county using harvest estimates that themselves deviate from reality. [Despite the lack of information to confidently convert harvest data across spatial scales in Minnesota, proportional allocation was used³³ to make county-based approximations of harvest from harvest tallies by Deer Permit Areas (DPAs). Sensitivity analysis of CWD Prediction Web App predictions relative to alterations in harvest revealed vulnerabilities in binary predictions. Specifically, 100% (52/52) of the predicted CWD-non detect counties and 94.3% (33/35) of the predicted CWD-positive counties in Minnesota hinged on the value of harvest obtained through the county-approximation. There is no way to know if or to what extent county approximations differ from reality. Nevertheless, the Supplement contains the county approximation value of hunter harvest used in predictions as well as the bifurcation point differentiating a CWD-positive prediction from a CWD-non detect prediction for each county in Minnesota.]
Reducing the error in (Case 1) can be achieved by rerunning the model for a single season-year including all counties shown here plus counties from additional states with both CWD-positive and CWD-non detect herds (the model cannot be improved by adding additional years of data from counties in states already shown, nor by adding counties in new states without CWD). Reducing the error in (Case 2) can be achieved by taking enough samples in each county so that we can be 95% confident that the CWD-non detect counties in our data are in fact disease-free.⁴⁸ Reducing the error in (Case 3) can be achieved by pooling regional records with perfectly comparable units (or spatial scales) or by using only those records that contain enough information for a one-to-one transformation between units (or spatial scales).
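The bifurcation point mentioned for Minnesota counties, the harvest value at which a binary prediction flips, could be located with a bisection search like the sketch below. This is an assumption about how such a point might be found, not the study's sensitivity analysis: `predict` is a hypothetical stand-in for a fitted model with all other county features held fixed, and the search assumes the prediction flips only once between the two bounds, which a tree ensemble does not guarantee.

```python
def bifurcation_point(predict, low, high, tol=1.0):
    """Bisect for the harvest value at which a binary prediction flips.

    predict: callable mapping a harvest value to True (CWD-positive) or
             False (CWD-non detect), other features held fixed.
    Returns None if the prediction is the same at both bounds.
    Assumes a single flip between low and high.
    """
    p_low, p_high = predict(low), predict(high)
    if p_low == p_high:
        return None  # prediction does not hinge on harvest over this range
    while high - low > tol:
        mid = (low + high) / 2
        if predict(mid) == p_low:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def toy_predict(harvest):
    """Toy stand-in for a fitted classifier: flips at a harvest of 1500."""
    return harvest >= 1500


flip = bifurcation_point(toy_predict, 0, 4000)
print(flip)
```

Comparing a county's approximated harvest with its flip value shows how small an error in the approximation would be enough to change the binary prediction.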

Despite large datasets and powerful modeling tools, the CWD Prediction Web App faces complex statistical and ecological problems. For example, the pooled dataset³³ reported the presence or absence of CWD in counties directly from sample testing data but did not take into account sampling effort, potential introduction time, deer population growth rate, disease transmission, or probability of detection.⁴⁹ While the pooled dataset³³ is the best available regional information on CWD status by county-season-year, we recognize that counties deemed CWD-free may have too few samples to support such a declaration. If this analysis were to be repeated with more agency partners (recommended), we suggest using data from counties where sufficient samples were taken to ensure statistical confidence in CWD status. Also, standardized methods exist for the diagnosis of CWD in captive deer herds.⁵⁰ However, there are no similar criteria for wild cervids, and CWD designation is made by state wildlife officials. Furthermore, we suggest adopting standardized terminology and definitions for all CWD topics to facilitate data comparison in future regional studies.
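As a rough guide to what "sufficient samples" can mean, the sketch below uses the standard freedom-from-disease approximation n = ln(1 - confidence) / ln(1 - prevalence x sensitivity). It assumes random sampling from a large population; the design prevalence and test sensitivity are inputs the analyst must choose, and the values used here are illustrative assumptions, not figures from this study.

```python
from math import ceil, log


def samples_for_freedom(design_prevalence, confidence=0.95, sensitivity=1.0):
    """Samples needed so that all-negative results support disease freedom.

    Assumes random sampling from a large population. If the disease were
    present at `design_prevalence`, at least one positive would be detected
    with probability `confidence`.
    """
    if not 0 < design_prevalence < 1:
        raise ValueError("design_prevalence must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    if not 0 < sensitivity <= 1:
        raise ValueError("sensitivity must be in (0, 1]")
    p_detect_per_sample = design_prevalence * sensitivity
    return ceil(log(1 - confidence) / log(1 - p_detect_per_sample))


# Illustrative design prevalences (assumed values).
print(samples_for_freedom(0.01))  # 1% design prevalence
print(samples_for_freedom(0.05))  # 5% design prevalence
```

The required sample size rises steeply as the design prevalence falls, so a county with only a handful of negative tests says little about whether CWD is absent.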

The CWD Prediction Web App is an important new tool for CWD surveillance programs, especially when managers of vast areas do not know where to start testing for the disease. There are three ways in which the CWD Prediction Web App could be misused. First, it may be tempting to use this tool to predict CWD status in areas smaller than a county, such as game management units. Until the model underlying the CWD Prediction Web App is mature and validated using a known dataset containing true positives and negatives at this geographic scale, we do not recommend this use. Instead, for now we recommend the use of habitat risk models.⁵¹ To conduct such an analysis, the surveillance data for the area in question must be geographically accurate. Second, given the false-negative (FN) findings of this study, the CWD Prediction Web App should not be used alone to determine sampling strategy or as a substitute for annual agency tissue collection and testing in the field. And third, given our findings that predictive performance was similar across the four ML algorithms but feature importance differed, we do not recommend interpreting the importance of LGB features as an absolute truth in CWD prediction.

Although the pooled dataset³³ did not include data on distance to infected areas, the regional map revealed that many of the CWD-positive predictions were close to known infected areas (Figure 2). Agencies may already be searching for CWD in areas adjacent to core infected areas, but the CWD Prediction Web App could help highlight counties that are especially vulnerable to CWD in less obvious locations. For counties predicted to be CWD-positive, the CWD Prediction Web App can be combined with other models that pinpoint local conditions for outbreaks^7,51,52 for surveillance planning. In addition to the error reductions recommended above, we encourage future ML models to incorporate information from geographic proximity data and/or diffusion models to more accurately characterize the spread of disease across landscapes,⁵³ which we did not do.
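One simple form a geographic proximity feature could take is the great-circle distance from a county centroid to the nearest known CWD-positive county. The sketch below uses the haversine formula; the function names and coordinates are hypothetical, and this feature was not part of the model described here.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_positive_km(centroid, positive_centroids):
    """Distance from one centroid to the closest known-positive centroid."""
    if not positive_centroids:
        return None  # no known positives to measure against
    return min(haversine_km(*centroid, *p) for p in positive_centroids)


# Hypothetical centroids as (latitude, longitude) in degrees.
county = (45.0, -93.0)
known_positives = [(43.5, -91.5), (44.0, -92.0)]
print(round(nearest_positive_km(county, known_positives), 1))
```

Straight-line distance ignores barriers and corridors such as rivers, so a diffusion model would be the more faithful, and more demanding, alternative.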
