*Lophura swinhoii*) in Taiwan as an example.

In Latin hypercube sampling, one must first decide how many sample points to use and then record, for each sample point, the row and column from which it was taken. Stated statistically, the sdcLHS approach addresses the following optimization problem: given *N* sample sites with environmental variables (Z), select *n* sample sites (*n << N*) such that the sampled sites form a Latin hypercube. For *k* continuous variables, each component of Z is divided into *n* equally probable strata based on its distribution, and z denotes a sub-sample of Z (Figure 1). The spatial conditioned Latin Hypercube Sampling (scLHS) is the sampling part (steps 2 to 8) of sdcLHS (Figure 1(a)) and was developed as a Windows-based tool to optimise sampling site selection (Figure 1(b)).
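The stratification described above can be illustrated with a minimal Python sketch. It is not the scLHS tool itself: the function names `stratum_edges` and `lhs_deviation` and the synthetic data are assumptions made for illustration, and the deviation measure (the absolute difference between each stratum count and one) follows the spirit of the Latin hypercube criterion in Minasny & McBratney (2006). A sub-sample forms a perfect Latin hypercube when every stratum of every variable contains exactly one site, in which case the deviation is zero.

```python
import numpy as np

def stratum_edges(Z, n):
    """Quantile edges that split each column of Z into n equally probable strata."""
    probs = np.linspace(0.0, 1.0, n + 1)
    return np.quantile(Z, probs, axis=0)  # shape (n + 1, k)

def lhs_deviation(Z, idx, edges):
    """Sum over variables and strata of |count - 1| for the sub-sample Z[idx]."""
    n = edges.shape[0] - 1
    total = 0
    for j in range(Z.shape[1]):
        # Use interior edges only, so every value maps to a stratum in 0..n-1.
        strata = np.searchsorted(edges[1:-1, j], Z[idx, j], side="right")
        counts = np.bincount(strata, minlength=n)
        total += np.abs(counts - 1).sum()
    return int(total)

# Hypothetical data: N = 1000 sites, k = 3 environmental variables, n = 20 sites.
rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 3))
edges = stratum_edges(Z, 20)
idx = rng.choice(Z.shape[0], size=20, replace=False)
print("deviation of a purely random sub-sample:", lhs_deviation(Z, idx, edges))
```

A purely random sub-sample usually leaves some strata empty and others over-filled, which is why the selection is treated as an optimization problem rather than a single random draw.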

**Figure 1**. (a) Flowchart of the procedure of spatial downscaling conditioned Latin Hypercube Sampling (sdcLHS); (b) Interface of the Windows-based tool of spatially conditioned Latin Hypercube Sampling (scLHS). Step 1. Area-to-point (ATP) kriging to downscale the environmental variables Z from the coarse scale to the fine scale; Step 2. Division of the quantile distribution of Z into n strata; calculation of the quantile distribution for each variable; Step 3. Selection of n random samples from N; calculation of the correlation matrix of z (T); Step 4. Calculation of the objective functions. The overall objective function integrates four different components (objective functions) (for details see Lin et al. 2014). For general applications, each component is given equal weight in the overall objective function; Steps 5 to 7. Optimization procedures (for details see Minasny & McBratney 2006); Step 8. Repetition of steps 4 to 7 until either the objective function value met a given stop criterion or 10,000 iterations were completed.
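Steps 2 to 8 of the flowchart can be sketched as a simulated-annealing search, the kind of optimization used by Minasny & McBratney (2006). The sketch below is a simplification under stated assumptions: it uses only two equally weighted objective components (stratum counts and correlation mismatch) rather than the four described by Lin et al. (2014), and the function names, cooling schedule and default parameters are hypothetical choices, not those of the scLHS tool.

```python
import numpy as np

def objective(Z, idx, edges, corr_full):
    """Equal-weight sum of two components (a simplification of the four in the paper)."""
    n = len(idx)
    z = Z[idx]
    o1 = 0.0
    for j in range(Z.shape[1]):
        strata = np.searchsorted(edges[1:-1, j], z[:, j], side="right")
        o1 += np.abs(np.bincount(strata, minlength=n) - 1).sum()
    # Mismatch between the correlation matrix of Z and that of the sub-sample z.
    o2 = np.abs(corr_full - np.corrcoef(z, rowvar=False)).sum()
    return float(o1 + o2)

def anneal(Z, n, max_iter=10_000, stop=0.0, temp=1.0, cooling=0.95, seed=0):
    """Swap one site at a time; accept worse swaps with a Metropolis probability."""
    rng = np.random.default_rng(seed)
    N = Z.shape[0]
    edges = np.quantile(Z, np.linspace(0.0, 1.0, n + 1), axis=0)  # step 2
    corr_full = np.corrcoef(Z, rowvar=False)
    idx = rng.choice(N, size=n, replace=False)                    # step 3
    in_sample = np.zeros(N, dtype=bool)
    in_sample[idx] = True
    current = objective(Z, idx, edges, corr_full)                 # step 4
    best_idx, best = idx.copy(), current
    for _ in range(max_iter):                                     # step 8
        if best <= stop:
            break
        pos = rng.integers(n)                                     # steps 5 to 7
        new = rng.choice(np.flatnonzero(~in_sample))
        trial = idx.copy()
        trial[pos] = new
        value = objective(Z, trial, edges, corr_full)
        delta = value - current
        if delta <= 0 or rng.random() < np.exp(-delta / temp):
            in_sample[idx[pos]] = False
            in_sample[new] = True
            idx, current = trial, value
            if current < best:
                best_idx, best = idx.copy(), current
        temp = max(temp * cooling, 1e-12)  # keep the temperature strictly positive
    return best_idx, best

# Hypothetical data: 500 candidate sites, 2 variables, 10 sites to select.
rng = np.random.default_rng(1)
Z = rng.normal(size=(500, 2))
sites, value = anneal(Z, n=10, max_iter=2000)
print("selected sites:", np.sort(sites), "objective:", round(value, 3))
```

Swapping a single site per iteration keeps each step cheap, and the decreasing temperature makes the search increasingly greedy as it proceeds.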

We applied the optimal sampling method with a downscaling approach to locate optimal sampling sites at the *2km × 2km* scale, and to improve the identification of the spatial structure of the distribution of Swinhoe's blue pheasants (*Lophura swinhoii*) (Figure 2) in Taiwan. The distribution of the focal species was estimated by Maximum Entropy based on the existing 803 *2km × 2km* samples and separately based on 725 *1km × 1km* samples (Figure 3). The estimated distributions were assumed to be the real distribution of the focal species for evaluating our proposed approach.

**Figure 2**. Photo of Swinhoe's blue pheasants (*Lophura swinhoii*) (a) Female; (b) Male.

**Figure 3**. Observed samples with presence and absence data of Swinhoe's Blue Pheasant (Lee et al., 2004) in (a) *2km × 2km* sample sites; (b) *1km × 1km* sample sites; and (c) rain-gauge stations used for scaling validation. (Blue: presence; Green: absence).

Species presence at the sampled locations, determined from presence-absence data, together with the values of a selected set of environmental variables, was used as input for the calculations. The resulting output represented the distribution of maximum entropy among all distributions satisfying a set of constraints. These constraints required that the expected value of each environmental variable under the estimated distribution be nearly equal to its empirical average. The performance of the Maximum Entropy models was evaluated using Kappa and AUC values.
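For readers unfamiliar with the two validation statistics, the following sketch shows how Cohen's Kappa and the AUC can be computed from observed presence-absence data and predicted suitability scores. The function names and the toy data are assumptions made for illustration, and the 0.5 threshold used to turn scores into predicted presences is an arbitrary choice; the source does not state which threshold was used.

```python
import numpy as np

def cohen_kappa(obs, pred):
    """Agreement between observed and predicted presence, corrected for chance."""
    obs = np.asarray(obs, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    p_observed = np.mean(obs == pred)
    p_chance = obs.mean() * pred.mean() + (1 - obs.mean()) * (1 - pred.mean())
    return (p_observed - p_chance) / (1 - p_chance)

def auc(obs, score):
    """Probability that a presence site scores higher than an absence site."""
    obs = np.asarray(obs, dtype=bool)
    score = np.asarray(score, dtype=float)
    diff = score[obs][:, None] - score[~obs][None, :]
    # Ties between a presence and an absence score count as half a correct ranking.
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Toy example: 1 = presence, 0 = absence; scores are predicted suitabilities.
observed = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
predicted = [s >= 0.5 for s in scores]  # assumed threshold of 0.5
print("Kappa:", cohen_kappa(observed, predicted))
print("AUC:", auc(observed, scores))
```

Kappa depends on the chosen threshold, whereas the AUC summarises ranking performance over all thresholds, which is one reason the two statistics are commonly reported together.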

The sample locations at *2km × 2km* and *1km × 1km* resolution were partially clustered due to similar spatial patterns and structures (variograms) of the variation of several environmental parameters (Figure 4). The Kappa value of the Maximum Entropy model was 0.38 and the AUC value was 0.86 in model validations using 401 samples at the *2km × 2km* resolution. The Kappa and AUC values were higher when using 362 samples (Kappa = 0.58; AUC = 0.92) at the *1km × 1km* resolution. The predictions with 200, 400 and 600 optimal samples taken from the assumed real distributions showed consistently high performance, with AUC values of 0.99 and Kappa values of 0.97-1.00 for *1km × 1km* cells and AUC values of 0.98 and Kappa values of 0.96-0.98 for *2km × 2km* cells.

**Figure 4**. Locations of (a) 200, (b) 400, and (c) 600 *2km × 2km* and *1km × 1km* samples derived by the optimal sdcLHS approach (sdcLHS: spatial downscaling conditioned Latin Hypercube Sampling).

Incorporating the spatial dependency of variables at different resolutions into sampling approaches is critical to achieving efficient, unbiased spatial sampling. Our analysis showed that fine-scale data yielded accurate presence/absence maps from a subset of optimally located presence-absence data. Sample locations tended to be spatially non-random as sample size increased at the coarser cell size. With regard to cost and resource efficiency, a sample size of 200 sites is sufficient to capture the spatial structure and predict the spatial distribution of the focal species.

**References**

Lin, Y.-P., Lin, W.-C., Li, M.-Y., Chen, Y.-Y., Chiang, L.-C., Wang, Y.-C. 2014 (in press). Identification of spatial distributions and uncertainties of multiple heavy metal concentrations by using spatial conditioned Latin Hypercube sampling. Geoderma. doi:10.1016/j.geoderma.2014.03.015.

Minasny, B., McBratney, A.B. 2006. A conditioned Latin hypercube method for sampling in the presence of ancillary information. Computers & Geosciences 32: 1378-1388.

Phillips, S.J., Dudík, M., Elith, J., Graham, C.H., Lehmann, A., Leathwick, J., et al. 2009. Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecological Applications 19: 181-197.