The spatial distribution of the woodland communities and their associated environmental drivers in the Golden Gate Highlands National Park , South Africa

Several studies across the world have attempted to predict the impact of environmental factors on the spatial distribution of plant communities in different landscapes (Hutyra et al. 2005; Salazar, Nobre & Oyama 2007). The findings from these studies predict changes in dry woodland and Afromontane forests because of climatic changes. The increase in temperatures and other environmental factors are also expected to result in increased fire frequencies and changes in woodland communities (Abiem et al. 2020; Adelabu, Adepoju & Mofokeng 2020; Everand 1986).


Introduction
Several studies across the world have attempted to predict the impact of environmental factors on the spatial distribution of plant communities in different landscapes (Hutyra et al. 2005;Salazar, Nobre & Oyama 2007). The findings from these studies predict changes in dry woodland and Afromontane forests because of climatic changes. The increase in temperatures and other environmental factors are also expected to result in increased fire frequencies and changes in woodland communities (Abiem et al. 2020;Adelabu, Adepoju & Mofokeng 2020;Everand 1986).
Previous studies on structural and floristic composition of the Afromontane forests and other woodland communities in Golden Gate Highlands National Park (GGHNP) were restricted to a smaller area before the expansion of the park to its current footprint (Manfred 1990). Since then, the park has expanded considerably from 11 346 ha to 32 758 ha. Woody communities found in The extreme variability in the topography, altitude and climatic conditions in the temperate Grassland Mountains of Southern Africa is associated with the complex mosaic of grassland communities with pockets of woodland patches. Understanding the relationships between plant communities and environmental parameters is essential in biodiversity conservation, especially for current and future climate change predictions. This article focused on the spatial distribution of woodland communities and their associated environmental drivers in the Golden Gate Highlands (GGHNP) National Park in South Africa. A generalized linear model (GLM) assuming a binomial distribution, was used to determine the optimal environmental variables influencing the spatial distribution of the woodland communities. The Coefficient of Variation (CV) was relatively higher for the topographic ruggedness index (68.78%), topographic roughness index (68.03), aspect (60.04%), coarse fragments (37.46%) and the topographic wetness index (31.33) whereas soil pH, bulk density, sandy and clay contents had relatively less variation (2.39%, 3.23%, 7.56% and 8.46% respectively). In determining the optimal number of environmental variables influencing the spatial distribution of woodland communities, roughness index, topographic wetness index, soil coarse fragments, soil organic carbon, soil cation exchange capacity and remote-sensing based vegetation condition index were significant (p < 0.05) and positively correlated with the woodland communities. Soil nitrogen, clay content, soil pH, fire and elevation were also significant but negatively correlated with the woodland communities. The area under the curve (AUC) of the receiver operating characteristics (ROC) was 0.81. This was indicative of a Parsimonious Model with explanatory predictive power for determination of optimal environmental variables in vegetation ecology.
the study area form an important biogeographical link between larger forest areas in the wider Drakensberg region (White 1978). These woodland communities in the Grassland biome are mostly restricted to deep valleys and drainage lines as compared to the dispersed spatial distribution of trees in the savanna biome. Studies undertaken in the grassland ecosystem suggest that plant communities respond in different ways to the increasing environmental changes and availability of moisture and protection against fire (Adagbasa, Adelabu & Okello 2018;Adagbasa, Adelabu & Okello 2020;Adelabu et al. 2020;Botha, Archibald & Greve 2020;Everand 1986). Other studies in the African continent showed elevation, slope and aspect as the determinants for the spatial and temporal distribution of plant communities and species composition (Sala & Paruelo 1997). Other environmental determinants such as topographic position, physical and chemical properties of the soil were also found to influence plant community types and the associated ecosystem services (Havstad et al. 2007;Sala & Paruelo 1997).
To monitor the ecological status and impact on these woodland communities, vegetation composition and the canopy structural parameters are typically measured in situ as indicators of change (Khan et al. 2012;Rahman et al. 2016;Ribichich & Protomastro 1998). However, employing in situ sampling in these areas can be time-consuming and impractical because of inaccessible terrains. The basic field measurements and conventional statistics in this instance can also not explain the spatial variation because of multiple interactions amongst state (vegetation, species distribution, understory cover, soil, topography, etc.) and other variables such as climate and human factors (Karahan & Erşahin 2018).
In order to provide a statistical relationship between the spatial distribution of woodland communities and the environmental variables at previously un-sampled locations, the exploratory and predictive models were used to predict probability of occurrence (Guisan, Edwards & Hastie 2002;Li et al. 2014;Pal 2005). Statistical analysis such as regression has been widely used in ecological modelling of the spatial distribution of species and their associated environmental parameters (Franklin 1998;Guisan, Weiss & Weiss 1999;Guisan & Zimmermann 2000;Lenihan 1993;Scott et al. 2002). In this study, we looked at the spatial distribution of the woodland communities and their associated environmental drivers. We used both exploratory statistics and logistic regression model (LRM) (Austin, Nicholls & Margules 1990) to determine the strength of the statistical relationship between a response (e.g. plant species presence) and a suite of environmental variables such as topography, edaphic variables, remotely sensed vegetation condition index (VCI) and fire.

Study area
This study was undertaken in the GGHNP located in the foothills of the Drakensberg Mountains in eastern Free State Province of South Africa. Golden Gate Highlands National Park lies between latitude 28°30' and 28°45' south and longitude 28°30' and 28°37' east, on the border between South Africa and Lesotho ( Figure 1). The park forms a part of the Maloti Drakensberg Catchment Complex that produces about 50% of the total water supply in South Africa. The area falls under summer rainfall region where the rainy season stretches from September to April with mean annual rainfall of 780 mm. The park is underlain by rock formations representing the upper Karoo Sequence which is intruded by dolerite dykes and sills.
The vegetation of the park is predominantly grassland, with a very small percentage of woodland communities along the rivers and in sheltered places that are protected from fire. Mucina and Rutherford (eds. 2006)

Delineation of forest patches, field data collection, data preparation and analysis
The high-resolution imagery provided via Google Earth has been increasingly used in scientific research (Fortin & Edwards 2001;Madin, Madin & Booth 2011;Pringle 2010) to assist in the selection of field sampling locations (Tang et al. 2010). The woodland polygons were delineated using high-resolution images within Google Earth mainly from Maxar technologies responsible for WorldView 1-3, and Geoeye with less than 1-m resolution from 2015 to 2020. Because the study area is a mountainous environment, time-slide in Google Earth was used to access historical image archives to offset shadow effects. The woodland community polygons were converted from Google Earth files to shapefile and topologically corrected in ArcGIS version 10.5.1. Fieldwork was also conducted to ensure that the woodland community polygons are adjusted and that the boundaries are correctly delineated for further analysis. Purposive sampling was undertaken using the delineated woodland polygons to verify mapped and unmapped woodland communities covered by the mountain shadows. The woodland community polygons data were used as training samples for statistical analysis (Estes et al. 2012).

Environmental data sets
Different environmental data sets were used to determine how they influence the spatial distribution of the woodland communities across the landscape. The environmental variables were downloaded from different databases sources (Table 1). Soil chemical and physical properties were one of the environmental variables used. Soil chemical variables included pH, nitrogen (N), soil organic carbon, cation exchange capacity (CEC), whilst soil physical properties included silt, coarse fragments, bulk density, sand, and clay content (Hengl et al. 2017). Shuttle Radar Topography Mission (SRTM) Digital Elevation Models (DEM) was also acquired (https://www2.jpl.nasa.gov/srtm/). Slope and aspect were computed from DEM using ArcGIS 10.5.1.
Topographical variables such as slope and aspect were computed using ArcGIS version 10.5.1. Topographic Position Index (TPI), Topographic Wetness Index (TWI), Topographic Ruggedness Index (TRI) and Topographic Roughness were computed using QGIS version 3.0 (Guisan et al. 1999;Mokarram, Roshan & Negahban 2015;Muddarisna et al. 2020;Radułaa, Szymura & Szymura 2018;Riley, DeGloria & Elliot 1999). The TPI computes the differences of elevation at a specific pixel (central pixel) and the average elevation around it within a defined radius (Gallant & Wilson 2000;Weiss 2001). The elevation is an input to the computations of the TPI. The TWI is one of the hydrologically based topographic index. Topographic Wetness Index describes the tendency of a cell to accumulate water (Gruber & Peckham 2009), and can be used as an indicator of soil moisture content. The input variable to calculate TWI is the slope as computed from the elevation data. The TRI was computed according to Riley et al. (1999) providing summary of change in elevation over 3 × 3 pixel window. It computes the terrain heterogeneity measurement for every location, and each pixel contains the difference in elevation from a centre pixel and the eight pixels surrounding it. The extraction of values from various environmental variables were extracted using a tool 'Extracting values to points' embedded in ArcGIS version 10.5.1. The fire frequency was done using Moderate Resolution Imaging Spectroradiometer (MODIS) burn area index, and the VCI for February 2020 was computed from 2001 to 2020 MODIS normalized difference vegetation index (NDVI) ( Table 1). Vegetation condition index is a good indicator of how vegetation is affected by drought (Kogan 1990).

Data preparation for analysis
To prepare data for analysis, delineated woodland polygons were used to extract data from a series of environmental data. The first step was to create a regular point shapefile layer with a spacing of 250 m using QGIS version 3.0. The point data were then overlaid on the delineated woodland polygons to create a binary layer (250 m spacing) indicating the presence (ones) and absence (zeros) of woodland polygons. The 250 m spacing was selected because most of the environmental layers had 250 m spatial resolution. The binary layer was used to extract all the environmental variables associated with the presence and the absence data for statistical analysis (Table 1). There were about 1500 presence points and approximately the same amount of absence points for the park. The binary layer with associated environmental variables was converted into a text file for further statistical analysis. For statistical analysis, the binary field in the latter layer is the dependent variable and all the environmental variables are independent.

Data analysis and modelling
Descriptive analysis (minimum, maximum, mean, standard deviation and coefficient of variation (standard deviation/ mean) × 100) was done to determine the spread and variation of the data sets across the study area, associated with the presence and absence locations. Cross-correlation was also done to determine a level to which explanatory variables are significantly related based on the 95% confidence level.
A generalized linear model (GLM) assuming a binomial distribution (LRM) (Bolker et al. 2009;Hosmer, Lemeshow & Sturdivant 2013), was used to determine optimal environmental variables influencing the spatial distribution of the woodland communities. To minimise multicollinearity in highly correlated independent variables and over-fitting in training and test datasets, the stepwise logistic model was used: Where P is the probability of occurrence for the forest patches as explained by several environmental variables, and y = β1x1+ β2x2+ β3x3… + c, where β = slope, x = variables and c = intercept: To determine the optimal number of environmental variables influencing the spatial distribution of woodland communities, several significance tests were done at 95% confidence level (p < 0.05). 'Optimal environmental variables' in this article refer to significant variables for explaining the spatial distribution of the woodland communities or a model with the best fit. The optimal variables, and their statistical estimates and confidence levels were also recorded (Table 2). To validate the LRMs used, the data was split into 70% calibration and 30% validation and tested accordingly. The area under-thecurve (AUC) was used as an accuracy indicator for the model (Bradley 1997;Hand 2009 The Spearman rank order correlation coefficient, (ρ) was also used to measure the strength and direction of association that exists between two variables (Eqn 3): Where d ι = difference in paired ranks and n = number of cases. Figure 2 provides a summary of methods and procedures followed in data collection and analysis.

Delineation of forest patches
Thirty-two woodland polygons were identified in GGHNP and delineated using the Google Earth programme. The woodland polygons were converted to a shapefile and edited to ensure topology in ArcGIS version 10.5.1. Field verification was done for grouping woodland polygons into communities. Using GGHNP broad vegetation map and classification by Manfred (1990), woodland polygons were grouped into four different categories (Figure 3), namely Afromontane forests (Olinia, Podocarpus, Kiggelaria), Euclea woodland, Leucosidea woodland, and Protea woodland. The largest extent of woodland community, Leucosidea woodland was associated with the drainage lines and forest margins, although encroaching the footslopes and midslopes in some areas. Afromontane forests (Olinia, Podocarpus, and Kiggelaria) was confined to sheltered gorges and deep valleys. The Euclea woodland community was found at base of cliffs and shelter of large boulders, and also in small clumps in rocky open grassland. The Protea woodland community occurred in open grassland and associated with well drained soils in the footslopes and midslopes.

Exploratory analysis (descriptive)
The descriptive statistics showed relative variability amongst the environmental parameters with appreciably different means (Table 2). For example, variation was relatively higher for TRI (68.78%), topographic roughness (68.03), aspect (60.04%), coarse fragments (37.46%) and TWI (31.33), whereas pH, bulk density, sand and clay content had relatively less variation (2.39%, 3.23%, 7.56% and 8.46%, respectively). However, the topographical position index showed a relatively higher negative value (−818.18%). According to studies on topographic position and landform analysis, negative TPI values signify locations that are lower than their surroundings (gorges, deep valleys), whereas positive TPI values represent locations that are higher than the average of their surroundings (De Reu et al. 2013;Weiss 2001).
The Spearman Rank Correlation showed no significant correlation between most of the environmental parameters ( Figure 4). However, significant and positive correlations were found between a few variables such as Topographic Roughness versus TRI (r = 0.985), CEC versus Soil Organic Carbon (r = 0.823); Silt versus CEC (r = 0.783); Elevation versus Soil Organic Carbon (r = 0.757), Clay Content versus Soil Organic Carbon (r = 0.64) etc., whilst negative and significant correlations were found between variables such as Sand versus Silt (r = −0.879); Sand versus Clay Content

Optimal environmental parameters influencing woodland communities
In determining the optimal number of environmental variables influencing the spatial distribution of woodland communities, topography, edaphic factors (soil chemical and physical properties), fire frequency and remote sensing-based VCI were found to be significant at 95% confidence level (Table 3). The environmental variables such as topographic roughness, TWI, soil organic carbon, CEC and coarse fragments were significant and correlated positively with the woodland assemblages (Table 3). Elevation, soil nitrogen, soil pH, clay content, and fire were also significant but negatively correlated with the woodland assemblages.

Model validation
The AUC (= 0.81) of the receiver operating characteristic (ROC) was used in evaluating the GLM model for predicting the accuracy of the distributions of the woodland assemblages ( Figure 5) (Wang & Zheng 2013). The AUC indicated that the optimal environmental variables selected as determinants of the spatial distribution of woodland assemblages were based on the good or parsimonious model.

Discussion
The study assessed the spatial distribution of the woodland communities and their environmental drivers in the GGHNP. The use of high-resolution imagery from Google Earth and ESRI Base maps, and field verification data was effective in delineating the woodland in the current study. Georeferenced data sourced from different databases were also found to be of great value in describing the spatial distribution of the woodland and their associated environmental parameters. Our GLM model analysis identified topography, edaphic factors (soil chemical and physical properties), fire frequency and remote sensing-based VCI as the most important environmental variables influencing the spatial distribution of the woodland communities. Three topographic factors, namely elevation, topographic roughness and TWI influenced the occurrence of the woodland communities. Although temperature and precipitation were not included as environmental variables in the study, they are critical variables influencing the state of the vegetation in the mountain ecosystems (Basist, Bell & Meentemeyer 1994). However, local climatic conditions strongly influence the relationship between topography and the spatial distribution of precipitation (Basist et al. 1994); hence, temperature and precipitation were excluded in the study. This suggests that one can estimate the spatial distribution of mean annual precipitation using topographically based regression equations (Basist et al. 1994). In addition, we also used VCI, which is an index for assessing drought impact on vegetation.
Elevation in the GGHNP was found to be one of the environmental parameters limiting the distribution of the woodland communities. Elevation was significant but negatively correlated with the woodland communities. Woodland communities were absent above 2757 m and below 1662 m above sea level. In the Natal Drakensberg, the woodland communities limit lies in the montane and subalpine vegetation belts just below 2500 m above sea level (Chawla et al. 2008;Killick 1963). Topographic wetness index and topographic roughness derived from digital elevation model also had significant and positive correlation, implying that the soil moisture content was an important driver in the spatial distribution of the four different types of woodland communities (Li & McCarty 2019).
Although soil clay content was significantly and negatively associated with the woodland communities, relatively low variation in soil clay content did occur (coefficient variation = 8.46%). The woodland communities associated with the gorges, deep valleys and drainage lines (i.e. Afromontane forests and Leucosidea shrublands) have relative high water retention capacity because of the clay content (Groen et al. 2008). These communities are also associated with high soil organic content because of positive correlation between soil organic content versus clay content (r = 0.64). The Euclea woodland is mostly associated with sheltered boulders and shallow soils whereas Protea woodland is associated with well drained sandy soils in the footslopes and midlopes of GGHNP. The rocky and sandy habitats are poor in soil organic matter and have low water retention capacity and decreased water infiltration (Chow et al. 2007). This association is supported by negative and significant correlation found between soil organic carbon versus sand (r = −0832), and sand versus clay content (r = −0.825).
Soil nitrogen and soil pH were also found to be significant but negatively correlated with the woodland communities.
AUC, areas under the curve.   The Protea and Euclea woodland communities are associated with low soil pH in steeper and drier slopes because of drained nutrients (Woldemariam et al. 2008). Studies in the Maloti-Drakensberg Park also showed steep topography to be associated with leached nutrients and low pH soils (Carbutt & Edward 2015). However, despite intrinsically high levels of total soil nitrogen, the soil economy of the Drakensberg Mountains (> 1800 m above sea level) is characterised by low inorganic soil nitrogen availability because of the effect of low temperatures on soil nitrogen mineralisation (Carbutt et al. 2013).
Fire frequency was significant but also negatively correlated with the woodland communities. Nearly 26.9% spatial variation in fire frequency can be ascribed to different habitats associated with the woodland communities. The Afromontane forests and the Leucosidea woodland communities occurring in gorges and deep valleys are protected from fire (Manfred 1990). The Euclea woodland community remain relatively self-protected from burning because of rocky habitat and the lack of grass biomass required for intense frequent fires (Van Langevelde et al. 2003). Positive correlation was also found between satellite-based VCI and woodland communities. Approximately 19.69% of the spatial variations in VCI can be alluded to evergreen woodland communities occurring in areas of deep valley, drainage lines and gorges (Afromontane forest and Leucosidea woodland) and deciduous communities such as Euclea woodland community.

Conclusion
The GGHNP forms part of the Maluti Drakensberg Transfrontier region characterised by diverse topography and steep altitudinal gradients. It is an ideal region in which to study possible vegetation changes or shifts as a consequence of environmental changes (Hill 1996).
The analysis from the environmental data in GGHNP found topography, soil properties (chemical and physical), fire, and VCI to be the most influential factors in the spatial distribution of the woodland communities. Topographical variables such as roughness and topographical wetness index were found to be the most significant variables positively correlated with the woodland communities. Soil organic carbon, coarse fragments, CEC and VCI were also significant and positively correlated with the spatial distribution of the plant communities. Elevation, nitrogen, pH, fire frequency and clay content were also significant but negatively correlated with the woodland communities.
The data derived from this study will be of great conservation value as the woodland communities found in the area are the remnants of the Afromontane forests in the Maloti Drakensberg region (Kotze & Lawes 2007;White 1978). These woodland communities act as the nutrients pump and seedbank to the surrounding ecosystems, whilst maintaining genetic and floristic diversity. Understanding the relationships between these woodland communities and environmental parameters is also essential in the biodiversity conservation of the grassland biome and their associated woodland communities.