
Case study - optimal sampling in Taiwan

Sampling is fundamental to most ecological studies, and a representative sampling design is essential for biodiversity monitoring. It has previously been recommended that ecological sampling designs be stratified to improve precision and accuracy and to ensure adequate spatial coverage. Stratified random sampling has therefore become one of the designs most frequently applied in ecological studies. Among the various options for stratified random sampling, Latin hypercube sampling (LHS) is promising. It efficiently samples variables from their multivariate distributions. When it is conditioned on the multidimensional space defined by environmental covariates, it is called conditioned Latin Hypercube Sampling (cLHS). The cLHS approach can be used to optimise the sampling design and to improve predictions of species distributions. It does this by introducing the spatial structures of the explanatory variables, and their cross-spatial structures, into the cLHS optimisation procedure. This is especially important because overestimates in species distribution models often result from a lack of relevant explanatory variables or from spatial autocorrelation.

Environmental variables and species distribution data are frequently recorded at different cell (grain) sizes, so spatial resolution is critical in any analysis of species distributions. Species distribution models (SDMs) are affected in particular, because the environmental descriptors of samples are often recorded at different resolutions and may have to be rescaled to a common one. Reliable methods for downscaling environmental variables or species distributions from coarse to fine grain therefore have potential benefits for ecology and conservation studies. One such method, Area-to-Point (ATP) kriging, downscales predictors while taking their spatial structure and spatial dependence into account, so that species distributions can be predicted at the finer resolution.
We illustrate the approach using Swinhoe's blue pheasants (Lophura swinhoii) in Taiwan as an example.
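ATP kriging can be illustrated with a short Python/NumPy sketch of ordinary area-to-point kriging. The sketch rests on two assumptions:

- square coarse cells are discretized into a regular grid of points;
- the point-support covariance is known, here an exponential model with hypothetical sill and range.

In practice the point-support model has to be derived from the coarse-scale variogram by deconvolution, which is not shown. The function names, cell values and parameters are illustrative and are not the authors' implementation.

```python
import numpy as np

def exp_cov(h, sill=1.0, range_=3.0):
    # Assumed exponential point-support covariance (the case study does not report its model)
    return sill * np.exp(-h / range_)

def discretize(origin, size, m):
    # m x m grid of points at sub-cell centres inside a square block (lower-left corner `origin`)
    offsets = (np.arange(m) + 0.5) * size / m
    xx, yy = np.meshgrid(origin[0] + offsets, origin[1] + offsets)
    return np.column_stack([xx.ravel(), yy.ravel()])

def mean_cov(a, b):
    # Average point covariance between two point sets (block-to-block or block-to-point)
    h = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return exp_cov(h).mean()

def atp_krige(blocks, values, targets):
    """Ordinary area-to-point kriging of point values from coarse block averages."""
    n = len(blocks)
    A = np.ones((n + 1, n + 1))  # kriging matrix with an unbiasedness row/column
    A[n, n] = 0.0
    for i in range(n):
        for j in range(n):
            A[i, j] = mean_cov(blocks[i], blocks[j])
    preds = []
    for x in targets:
        rhs = np.ones(n + 1)
        rhs[:n] = [mean_cov(b, x[None, :]) for b in blocks]
        weights = np.linalg.solve(A, rhs)[:n]  # drop the Lagrange multiplier
        preds.append(weights @ values)
    return np.array(preds)

# Hypothetical example: 3 x 3 coarse cells of 2 km, each split into 2 x 2 fine cells of 1 km
blocks = [discretize((2.0 * i, 2.0 * j), 2.0, 2) for i in range(3) for j in range(3)]
values = np.array([12.0, 14.5, 13.1, 15.2, 16.8, 14.0, 17.3, 18.1, 16.5])
fine = atp_krige(blocks, values, blocks[4])  # 1 km predictions inside the centre cell
print(fine.round(2), fine.mean())
```

The block covariances are built from the same point covariance that the point predictions use. As a result, averaging the fine-scale predictions over a coarse cell returns that cell's value, which is the coherence property of ATP kriging. The printed mean can be used to check this.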

In Latin Hypercube sampling, one must first decide how many sample points to use and then record, for each sample point, the stratum (the "row" and "column" of the hypercube) in which it was taken. Statistically, the spatial downscaling conditioned Latin Hypercube Sampling (sdcLHS) approach addresses the following optimisation problem: given N sample sites with environmental variables Z, select n sample sites (n ≪ N) such that the sampled sites form a Latin hypercube. For k continuous variables, each of the k components of Z is divided into n equally probable strata based on its distribution, and z denotes the selected sub-sample of Z (Figure 1). The spatially conditioned Latin Hypercube Sampling (scLHS) procedure is the sampling part (Steps 2 to 8) of sdcLHS (Figure 1(a)). It was developed as a Windows-based tool to optimise the selection of sampling sites (Figure 1(b)).
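The stratification can be made concrete with a minimal Python/NumPy sketch, in which all names are hypothetical. It divides each of the k variables in Z into n equally probable strata using quantiles. It then scores a candidate sub-sample z by how far it is from a Latin hypercube, that is, from exactly one selected site per stratum of every variable. This score mirrors the continuous-variable component of the cLHS objective of Minasny & McBratney (2006); the spatial components used in scLHS are not included.

```python
import numpy as np

def strata_edges(Z, n):
    # Quantile edges that split each column of Z (N sites x k variables) into n equally probable strata
    return np.quantile(Z, np.linspace(0.0, 1.0, n + 1), axis=0)  # shape (n + 1, k)

def stratum_counts(z, edges):
    # Number of selected sites (rows of z) that fall in each stratum of each variable
    n, k = edges.shape[0] - 1, z.shape[1]
    counts = np.zeros((n, k), dtype=int)
    for j in range(k):
        # Search only the interior edges so extreme values land in the first or last stratum
        stratum = np.searchsorted(edges[1:-1, j], z[:, j], side="right")
        counts[:, j] = np.bincount(stratum, minlength=n)
    return counts

def lhs_deviation(z, edges):
    # 0 when every stratum of every variable holds exactly one selected site (a Latin hypercube)
    return int(np.abs(stratum_counts(z, edges) - 1).sum())

# Hypothetical data: N = 1000 candidate sites with k = 3 environmental variables
rng = np.random.default_rng(42)
Z = rng.normal(size=(1000, 3))
edges = strata_edges(Z, n=10)
z = Z[rng.choice(1000, size=10, replace=False)]  # a purely random sub-sample
print(lhs_deviation(z, edges))
```

A purely random sub-sample usually leaves some strata empty and others doubly occupied. The optimisation in Steps 4 to 8 is designed to drive this deviation towards zero.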

Figure 1. (a) Flowchart of the spatial downscaling conditioned Latin Hypercube Sampling (sdcLHS) procedure; (b) interface of the Windows-based spatially conditioned Latin Hypercube Sampling (scLHS) tool. Step 1: ATP kriging downscales the environmental variables Z from the coarse to the fine scale. Step 2: the quantile distribution of each variable in Z is calculated and divided into n strata. Step 3: n random samples are selected from the N sites, and the correlation matrix T of the sub-sample z is calculated. Step 4: the objective functions are calculated; the overall objective function integrates four components (for details see Lin et al. 2014), which are given equal weights for general applications. Steps 5 to 7: optimisation procedures (for details see Minasny & McBratney 2006). Step 8: Steps 4 to 7 are repeated until the objective function value reaches a given stopping criterion or 10,000 iterations have been completed.
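Steps 2 to 8 can be sketched as a simulated-annealing loop in Python/NumPy. The sketch is simplified and hypothetical:

- It uses only two objective components with assumed equal weights: the strata-occupancy term and the difference between the full-data correlation matrix and the sub-sample correlation matrix T (as in Minasny & McBratney 2006).
- It uses a random swap move and an assumed geometric cooling schedule.
- It omits the spatial-structure components of the four-part objective in Lin et al. (2014).

```python
import numpy as np

def clhs_anneal(Z, n, max_iter=10_000, t0=1.0, cooling=0.95, stop=0.0, seed=0):
    """Simplified cLHS by simulated annealing: strata-occupancy and correlation terms only."""
    rng = np.random.default_rng(seed)
    N, k = Z.shape
    edges = np.quantile(Z, np.linspace(0.0, 1.0, n + 1), axis=0)  # Step 2: n strata per variable
    C = np.corrcoef(Z, rowvar=False)  # correlation matrix of all N sites

    def objective(idx):
        z = Z[idx]
        occupancy = 0
        for j in range(k):
            stratum = np.searchsorted(edges[1:-1, j], z[:, j], side="right")
            occupancy += np.abs(np.bincount(stratum, minlength=n) - 1).sum()
        T = np.corrcoef(z, rowvar=False)  # correlation matrix of the sub-sample z
        return occupancy + np.abs(C - T).sum()  # Step 4: equal weights (assumed)

    idx = rng.choice(N, size=n, replace=False)  # Step 3: random initial sample
    obj, temp = objective(idx), t0
    best_idx, best_obj = idx, obj
    for it in range(max_iter):
        # Steps 5-7: swap one selected site for an unselected one; Metropolis acceptance
        cand = idx.copy()
        cand[rng.integers(n)] = rng.choice(np.setdiff1d(np.arange(N), idx))
        new = objective(cand)
        if new <= obj or rng.random() < np.exp((obj - new) / temp):
            idx, obj = cand, new
            if obj < best_obj:
                best_idx, best_obj = idx, obj
        if it % 100 == 99:
            temp *= cooling  # assumed geometric cooling schedule
        if best_obj <= stop:  # Step 8: stopping criterion (or max_iter reached)
            break
    return best_idx, best_obj

# Hypothetical data: 500 candidate sites, 3 environmental variables, select 20 sites
Z = np.random.default_rng(1).normal(size=(500, 3))
sites, score = clhs_anneal(Z, n=20, max_iter=2000)
print(sorted(sites.tolist()), round(float(score), 3))
```

Accepting an occasionally worse swap while the temperature is high helps the search escape local minima. Keeping track of the best sample seen means the returned design is never worse than the random starting sample.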

We applied the optimal sampling method with the downscaling approach for two purposes: to locate optimal sampling sites at the 2km × 2km scale, and to improve the identification of the spatial structure of the distribution of Swinhoe's blue pheasant (Lophura swinhoii) (Figure 2) in Taiwan. The distribution of the focal species was estimated with Maximum Entropy, once from the 803 existing 2km × 2km samples and separately from 725 1km × 1km samples (Figure 3). These estimated distributions were treated as the true distributions of the focal species when evaluating the proposed approach.

Figure 2. Photos of Swinhoe's blue pheasant (Lophura swinhoii): (a) female; (b) male.

Figure 3. Observed samples with presence and absence data of Swinhoe's Blue Pheasant (Lee et al., 2004) in (a) 2km × 2km sample sites; (b) 1km × 1km sample sites; and (c) rain-gauge stations used for scaling validation. (Blue: presence; Green: absence).

Species presences at the sampled locations, taken from the presence-absence data, were used as input for the calculations together with the values of a selected set of environmental variables. The output is the distribution of maximum entropy among all distributions that satisfy a set of constraints. These constraints require the expected value of each environmental variable under the estimated distribution to be close to its empirical average. Model performance was evaluated with the Kappa statistic and the area under the receiver operating characteristic curve (AUC).
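The constrained maximum-entropy problem has a well-known solution. Among all distributions over the study cells whose expected environmental values match the empirical averages at presence cells, the entropy maximiser is a Gibbs (exponential-family) distribution, p(x) ∝ exp(λ · f(x)). The Python/NumPy sketch below fits that form by gradient ascent. It is a simplified, hypothetical illustration with linear features only, no regularisation and synthetic data. It is not the Maximum Entropy implementation used in this study, which may add regularisation and other feature types.

```python
import numpy as np

def fit_maxent(F, presence_idx, lr=0.5, n_iter=5000):
    """Unregularised MaxEnt over grid cells, p(x) proportional to exp(lam . f(x)), with linear features."""
    X = (F - F.mean(axis=0)) / F.std(axis=0)  # standardise so one step size suits every variable
    target = X[presence_idx].mean(axis=0)     # empirical averages at presence cells (constraints)

    def gibbs(lam):
        s = X @ lam
        w = np.exp(s - s.max())  # subtract the max for numerical stability
        return w / w.sum()

    lam = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Gradient of the mean presence log-likelihood: empirical minus model expectation
        lam += lr * (target - gibbs(lam) @ X)
    return gibbs(lam), lam, X, target

# Hypothetical landscape: 400 cells, 2 environmental variables; presences in the 100 cells
# with the highest value of the first variable
rng = np.random.default_rng(7)
F = rng.normal(size=(400, 2))
presence_idx = np.argsort(F[:, 0])[-100:]
p, lam, X, target = fit_maxent(F, presence_idx)
print(lam.round(3), (p @ X).round(3), target.round(3))
```

At convergence, the model expectation `p @ X` matches the empirical averages `target`, which is exactly the constraint described above.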

Several environmental variables share similar spatial patterns and structures (variograms), so the sample locations at both the 2km × 2km and the 1km × 1km resolution were partially clustered (Figure 4). In model validation using 401 samples at the 2km × 2km resolution, the Maximum Entropy model achieved a Kappa of 0.38 and an AUC of 0.86. Performance was higher with 362 samples at the 1km × 1km resolution (Kappa = 0.58; AUC = 0.92). Predictions based on 200, 400 and 600 optimal samples drawn from the assumed true distributions performed consistently well at both resolutions:

- 1km × 1km cells: AUC values of 0.99 and Kappa values of 0.97-1.00.
- 2km × 2km cells: AUC values of 0.98 and Kappa values of 0.96-0.98.
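The two validation statistics reported above can be computed with the following pure-Python sketch:

- Cohen's kappa measures agreement between observed and predicted presence/absence beyond what is expected by chance.
- The AUC is the probability that a randomly chosen presence site receives a higher predicted value than a randomly chosen absence site.

The data and the 0.5 threshold used to convert predicted values into presence/absence are hypothetical; the case study does not state which threshold was used.

```python
def cohen_kappa(obs, pred):
    """Cohen's kappa for binary labels (1 = presence, 0 = absence)."""
    n = len(obs)
    po = sum(o == q for o, q in zip(obs, pred)) / n   # observed agreement
    p_obs, p_pred = sum(obs) / n, sum(pred) / n       # presence prevalence in each series
    pe = p_obs * p_pred + (1 - p_obs) * (1 - p_pred)  # agreement expected by chance
    return (po - pe) / (1 - pe)

def auc(obs, scores):
    """AUC = probability that a presence outscores an absence (ties count one half)."""
    pos = [s for o, s in zip(obs, scores) if o == 1]
    neg = [s for o, s in zip(obs, scores) if o == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical validation data: observed presence/absence and predicted suitability
obs = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold
print(cohen_kappa(obs, pred), auc(obs, scores))  # 0.5 0.9375
```

The AUC is threshold-free, whereas Kappa depends on the chosen threshold. This is one reason the two statistics can differ markedly for the same model, as in the 2km × 2km validation above.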

Figure 4. Locations of (a) 200, (b) 400, and (c) 600 2km × 2km and 1km × 1km samples derived by the optimal sdcLHS approach (sdcLHS: spatial downscaling conditioned Latin Hypercube Sampling).

Incorporating the spatial dependence of variables recorded at different resolutions into sampling approaches is critical to achieving efficient, unbiased spatial sampling. Our analysis showed that fine-scale data yielded accurate presence/absence maps from a subset of optimally located presence-absence samples. At the coarser cell size, sample locations tended to become more spatially non-random as the sample size increased. In terms of cost and resource efficiency, a sample size of 200 sites was sufficient in this case study to capture the spatial structure and predict the spatial distribution of the focal species without losing its spatial structure.


Keil, P., Belmaker, J., Wilson, A.M., Unitt, P., Jetz, W. 2013. Downscaling of species distribution models: a hierarchical approach. Methods in Ecology and Evolution 4: 82-94.

Lin, Y.-P., Lin, W.-C., Li, M.-Y., Chen, Y.-Y., Chiang, L.-C., Wang, Y.-C. 2014 (in press). Identification of spatial distributions and uncertainties of multiple heavy metal concentrations by using spatial conditioned Latin Hypercube sampling. Geoderma. doi:10.1016/j.geoderma.2014.03.015.

Minasny, B., McBratney, A.B. 2006. A conditioned Latin hypercube method for sampling in the presence of ancillary information. Computers & Geosciences 32: 1378-1388.

Phillips, S.J., Dudík, M., Elith, J., Graham, C.H., Lehmann, A., Leathwick, J., et al. 2009. Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecological Applications 19: 181-197.
Copyright and disclaimer: SCALES and SCALETOOL

CONDITIONS OF USE: We explicitly encourage the use of SCALETOOL. SCALETOOL is freely available for non-commercial use provided you acknowledge SCALES as the source. For more extensive access to databases (e.g. for statistical analyses or if you want to contribute data), tools, or background material, please contact the SCALES coordinator by email.


© 2010 - 2018 SCALES. All rights reserved.