Mapping alien and indigenous vegetation in the KwaZulu-Natal Sandstone Sourveld using remotely sensed data

Background: The indigenous KwaZulu-Natal Sandstone Sourveld (KZN SS) grassland is highly endemic and species-rich, yet critically endangered and poorly conserved. Ecological threats to this grassland ecosystem are exacerbated by the encroachment of woody plants, which has severe negative environmental and economic consequences. Hence, there is an increasing need to reliably determine the extent of encroached or invaded areas in order to design optimal mitigation measures. Because of the inherent limitations of traditional approaches such as field surveys and aerial photography, remotely sensed data offer a reliable and timely means of mapping landscape processes.

Introduction
The grassland biome is one of the most extensive land cover types on earth, covering approximately 52.5 million km² (40.5%) of the global terrestrial landscape, excluding Greenland and Antarctica (Carlier et al. 2009; World Resources Institute 2000). In South Africa, grassland is the second largest and most diverse biome, covering 16.5% of the country (Matsika 2007). In the KwaZulu-Natal province, a significant portion of the landscape is characterised by the indigenous KwaZulu-Natal Sandstone Sourveld (KZN SS), a veld type dominated by a diversity of short grass species, shrubs, legumes and trees (Rutherford et al. 2006). However, the veld has been steadily diminishing. Jewitt (2011), for instance, notes that only 11.4% of the KZN SS remained in a natural state as of 2008. The eThekwini Municipality (Figure 1), which includes the port city of Durban, is the most densely populated area in the province, with a population of about 3.5 million on about 2300 km² (eThekwini Municipality 2012). Increasing immigration, agriculture, land tenure challenges and the radial growth of the city have put growing pressure on the city's natural landscape (Roberts et al. 2012). As a result, about 73% of the veld has been lost to agriculture and physical development, and a mere 116 ha (0.74%) of the veld is formally protected (eThekwini Municipality 2012).
The remnant KZN SS covers a portion of the municipality; it anchors soil, contributes to carbon sequestration, sustains the agricultural sector, provides habitat for a variety of endangered species and is a source of traditional medicine (Martindale 2007; Rutherford et al. 2006). It is diverse in graminoids and herbaceous species and hosts an array of endemic plants that play an integral role in the grassland's ecosystem (Rutherford et al. 2006). However, recent studies have shown that the KZN SS is threatened by the encroachment of woody plants. Such encroachment is known to significantly compromise a grassland's quality and quantity; for instance, an increase in shrub or tree density may reduce grass biomass, density and cover (Briggs, Schaafsma & Trekov 2007; Van Auken 2009). Generally, species richness and composition are negatively altered as woody plants begin to dominate the landscape (Van Auken 2009). The problem is further exacerbated if the species are invasive (Lalla 2014). Hence, timely and cost-effective mapping of indigenous and invasive encroachment is necessary for designing appropriate mitigation measures within the municipality's Municipal Adaptation Plan (MAP). Specifically, understanding the proportion and spatial configuration of alien versus indigenous woody encroachment is critical for discerning the severity and type of encroachment. However, within the municipality, the spatial distribution of indigenous and alien woody plants within the KZN SS remains largely unexplored.
To date, woody encroachment has commonly been studied using field surveys, field-based knowledge and the interpretation and analysis of hard-copy maps and aerial photographs, amongst other approaches (Shekede, Murwira & Masocha 2015; Yuan et al. 2005). However, although relatively accurate, these approaches require intensive field work and ancillary data analysis, which are labour intensive and susceptible to human error (Shekede et al. 2015). Furthermore, these approaches are often time consuming, and therefore impractical for large-scale implementation, and commonly lack the required geometric accuracy (Mansour, Mutanga & Everson 2012; Xie, Sha & Yu 2008). In comparison, remotely sensed data sets and approaches offer a more practical and economical means of classifying and quantifying vegetation characteristics and density (Mansour et al. 2012). The recent wide adoption of remote sensing approaches can be attributed to advances in sensor technology, the proliferation of image data sets and improvements in software and hardware capabilities (Rogan & Chen 2004). For instance, improvements in satellite-based remote sensing technologies have led to the acquisition of imagery with finer spatial and higher spectral resolutions, which are necessary for improving classification accuracies (Adelabu, Mutanga & Adam 2014).
A number of multispectral data sets (e.g. Landsat) have been adopted in land cover mapping because of their long data archives (Odindi, Mhangara & Kakembo 2012). However, their lower spectral resolution (fewer than 11 bands) limits their capability to discriminate surface types characterised by subtle reflectance variation (Melgani & Bruzzone 2004). Munyati, Shaker and Phasha (2011) Whereas higher spectral resolution is valuable for discriminating subtle reflectance variability on a landscape, it does not guarantee higher classification accuracy, as this also depends on the classifier adopted (Lu & Weng 2007; Mountrakis, Im & Ogole 2011). A number of classifiers, including classical pattern recognition approaches such as maximum likelihood, minimum distance to mean and nearest neighbour, as well as discriminant analysis, have been adopted for land cover mapping. However, Lu and Weng (2004) note that these classifiers are often unable to deal effectively with complex landscapes and the mixed pixel phenomenon, compromising mapping accuracy. To deal with these limitations, more advanced and robust machine learning classification algorithms have recently emerged (Sesnie et al. 2010; Watanachaturaporn, Arora & Varshney 2008). Specifically, the Random Forest (RF) algorithm has shown great potential in vegetation mapping (Adelabu et al. 2013; Lawrence, Wood & Sheley 2006; Naidoo et al. 2012). First developed by Breiman (2001), RF is a non-parametric statistical algorithm that can handle discrete and continuous data and has been applied to a wide range of remotely sensed data sets, including multispectral (Pal 2005), hyperspectral (Ham et al. 2005), LiDAR (Guo et al. 2011), synthetic aperture radar (Loosevelt et al. 2012) and aerial imagery (Chapman et al. 2009). A number of studies have adopted the algorithm to predict the occurrence of trees in grassland and savannah landscapes. Naidoo et al. (2012), for instance, used the RF algorithm to classify tree species within the greater Kruger National Park in South Africa, whilst Lawrence et al. (2006) assessed the capability of the RF algorithm to identify alien invasive species in Montana, United States of America. Whereas the abovementioned studies have attempted to discriminate woody vegetation in grasslands using remotely sensed data, none, to our knowledge, has sought to distinguish alien from indigenous species within grasslands in a heterogeneous urban landscape. In this study, we aim to map the distribution of alien and indigenous woody cover within the KZN SS using the RF algorithm on RapidEye imagery.

The study area
The study area lies within a subtropical climate zone, characterised by hot, humid summers and sunny, mild winters. Most of the mean annual rainfall of 762 mm falls during the summer months (October-March), whilst the winter months (April-August) are generally dry (Mucina & Rutherford 2006). Average midday temperatures are 27 °C in summer and 21.6 °C in winter, a relatively small seasonal difference.
Temperate and moist conditions, associated with the underlying clastic sedimentary sandstone that allows water to percolate, make the study area ideal for grasses to flourish (Martindale 2007).
The KZN SS is a unique grassland ecosystem endemic to the KwaZulu-Natal province of South Africa (Jewitt 2011; Mucina & Rutherford 2006). It is characterised by a matrix of short, diverse grasses with isolated shrubs and woody plants. The grassland often dominates plateau surfaces formed on erosion-resistant Natal Group Sandstone. Soils underlain by the Natal Group Sandstone are often shallow, infertile and poorly drained (Mucina & Rutherford 2006). Patches of KZN SS grassland are predominantly found in the south-eastern part of the KwaZulu-Natal province (Mucina & Rutherford 2006). Within the eThekwini Municipality, patches of the grassland occupy some of the remnant natural landscapes in the western part of the area (Figure 1). The ecosystem is mainly threatened by commercial agriculture (particularly sugarcane and forest plantations), subsistence farming and urbanisation (Mucina & Rutherford 2006). A detailed description of the grassland can be found in Mucina and Rutherford (2006).

Image acquisition
To cover the entire study area, 12 scenes of RapidEye imagery, detailed in

Random Forest
The RF ensemble, with optimised parameters, was implemented in the EnMAP-Box to classify the RapidEye imagery. RF is a machine learning approach established by Breiman (2001) that improves on single classification and regression trees (CART) by combining a large set of de-correlated decision trees (Lin et al. 2010). It benefits from bagging and from the random selection of input variables (Lin et al. 2010). In the classification process, RF first draws a number of bootstrap samples, with replacement, from the original observations and builds a binary classification tree from each; the number of trees is referred to as ntree. As each tree is grown, a specified number of input variables (referred to as mtry), in this case RapidEye spectral bands, is randomly selected at each node, and the optimal split is determined using only that subset. The trees are not pruned: all trees in the forest are allowed to grow maximally, and the diversity amongst them reduces the variance of the ensemble (Breiman 2001; Lin et al. 2010). To classify an input vector, it is passed down every tree in the forest. Each tree provides a classification, regarded as a 'vote' for a class, and the forest assigns the class with the highest number of votes. Observations not included in a tree's bootstrap sample are referred to as out-of-bag (OOB) samples. These samples, typically about 37% of the entire data set, are used to estimate the misclassification error and variable importance.
To improve the accuracy of the classification, the mtry and ntree parameters require optimisation (Adam et … Peters et al. 2007). Furthermore, RF is relatively easy to interpret and implement. However, because the algorithm uses thresholds to separate classes and splits on one attribute at a time, only vertical and horizontal (axis-parallel) decision boundaries can be formed (Abdel-Rahman et al. 2014b; Breiman 2001). An extended description of RF can be found in Breiman (2001).
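The procedure described above can be sketched in plain Python. This is a minimal teaching sketch, not the EnMAP-Box implementation: it assumes Gini impurity as the split criterion, uses made-up three-band reflectance values, and all function names (`fit_forest`, `oob_error` and so on) are hypothetical. It shows bootstrap sampling with replacement, a fresh random subset of mtry bands at every node, unpruned trees with axis-parallel threshold splits, majority voting and the OOB error estimate, and it checks that the expected OOB fraction approaches about 37%.

```python
import math
import random
from collections import Counter

def expected_oob_fraction(n):
    """Probability that an observation is left out of one bootstrap sample of size n."""
    return (1 - 1 / n) ** n  # tends to exp(-1), about 0.368, as n grows

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def best_split(X, y, idx, features):
    """Axis-parallel split: try midpoints between sorted values of the candidate features."""
    best = None  # (weighted Gini impurity, feature, threshold)
    for f in features:
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in idx if X[i][f] <= t]
            right = [y[i] for i in idx if X[i][f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(idx)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def grow_tree(X, y, idx, mtry, rng):
    """Grow an unpruned tree; every node draws its own random subset of mtry features."""
    labels = [y[i] for i in idx]
    if len(set(labels)) == 1:
        return labels[0]  # pure leaf
    features = rng.sample(range(len(X[0])), mtry)
    split = best_split(X, y, idx, features)
    if split is None:  # drawn features are constant in this node: majority leaf
        return majority(labels)
    _, f, t = split
    left = [i for i in idx if X[i][f] <= t]
    right = [i for i in idx if X[i][f] > t]
    return (f, t, grow_tree(X, y, left, mtry, rng), grow_tree(X, y, right, mtry, rng))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, ntree=100, mtry=2, seed=0):
    rng = random.Random(seed)
    n, forest = len(X), []
    for _ in range(ntree):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        oob = set(range(n)) - set(boot)               # out-of-bag observations of this tree
        forest.append((grow_tree(X, y, boot, mtry, rng), oob))
    return forest

def predict_forest(forest, x):
    return majority([predict_tree(tree, x) for tree, _ in forest])  # one vote per tree

def oob_error(forest, X, y):
    """Each observation is voted on only by trees that never saw it during training."""
    wrong = counted = 0
    for i, x in enumerate(X):
        votes = [predict_tree(tree, x) for tree, oob in forest if i in oob]
        if votes:
            counted += 1
            wrong += majority(votes) != y[i]
    return wrong / counted if counted else math.nan

# Made-up, well-separated "reflectance" values in three bands (band 3 is pure noise).
data_rng = random.Random(1)
X, y = [], []
for _ in range(30):
    X.append([data_rng.uniform(0.02, 0.03), data_rng.uniform(0.30, 0.33), data_rng.random()])
    y.append("indigenous")
    X.append([data_rng.uniform(0.08, 0.09), data_rng.uniform(0.45, 0.48), data_rng.random()])
    y.append("alien")
forest = fit_forest(X, y, ntree=50, mtry=2, seed=0)
print(round(expected_oob_fraction(len(X)), 3), oob_error(forest, X, y))
```

Because each split tests a single band against a threshold, the decision boundaries this sketch produces are axis-parallel, as noted above.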

Random Forest optimisation
The RF algorithm was optimised by tuning its two parameters, ntree and mtry. This process aimed to determine the parameter values that yield the most accurate classification (Abdel-Rahman et al. 2014a; Adelabu et al. 2013; Breiman 2001). A grid search with 10-fold cross-validation was implemented to achieve this objective. This approach is most effective when only a few parameters are tuned. Furthermore, it is simple to perform, quick to execute and reliable, as it treats the parameters as independent. The optimised RF algorithm was then executed on the training data to map the presence of alien and indigenous trees and the dominant tree species within the KZN SS grassland.
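A grid search scored by k-fold cross-validation can be sketched as follows. To keep the example short and dependency-free, a k-nearest-neighbour classifier on made-up one-dimensional values stands in for RF; in this study the grid would instead span ntree and mtry, and `fold_error` would train an RF on each training fold. All function names are hypothetical.

```python
import itertools
import random

def k_fold_indices(n, n_folds=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into n_folds folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]

def grid_search_cv(param_grid, n, fold_error, n_folds=10, seed=0):
    """Exhaustive grid search, each combination scored by its mean k-fold CV error.

    param_grid maps each parameter name to candidate values, e.g.
    {"ntree": [500, 1000, 5000], "mtry": [1, 2, 3]} (hypothetical values).
    fold_error(params, train_idx, test_idx) fits a model on train_idx and returns
    its error rate on test_idx. Assumes n >= n_folds so no fold is empty.
    Returns (best_params, mean_cv_error)."""
    folds = k_fold_indices(n, n_folds, seed)
    names = sorted(param_grid)
    best = None
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        errors = []
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            errors.append(fold_error(params, train_idx, test_idx))
        mean_error = sum(errors) / n_folds
        if best is None or mean_error < best[1]:
            best = (params, mean_error)
    return best

# Stand-in model for the demo: k-nearest neighbours on made-up 1-D values.
demo_rng = random.Random(3)
X = [demo_rng.gauss(0.30, 0.05) for _ in range(40)] + [demo_rng.gauss(0.45, 0.05) for _ in range(40)]
y = ["indigenous"] * 40 + ["alien"] * 40

def knn_fold_error(params, train_idx, test_idx):
    wrong = 0
    for i in test_idx:
        nearest = sorted(train_idx, key=lambda j: abs(X[j] - X[i]))[: params["k"]]
        votes = [y[j] for j in nearest]
        wrong += max(set(votes), key=votes.count) != y[i]
    return wrong / len(test_idx)

print(grid_search_cv({"k": [1, 3, 5, 7]}, len(X), knn_fold_error))
```

The same folds are reused for every parameter combination, so differences in mean error reflect the parameters rather than a different data partition.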

Map generation
As mentioned, a map of the presence and absence of indigenous and alien woody plant cover was generated in the EnMAP-Box by applying RF to the RapidEye imagery. The spectral reflectance extracted at the field plots was used to separate the classes at each node. A second, more detailed map showing the dominant indigenous and alien tree species was then produced by executing RF on the second training data set (n = 106), with the occurrence of alien and other dominant species mapped individually. This was necessary to illustrate the distribution of individual indigenous and alien woody species.
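Generating a map amounts to applying the trained classifier to every pixel of the image and, where needed, converting pixel counts into area. The sketch below assumes 5 m RapidEye pixels (0.0025 ha each), a toy two-band scene and a made-up threshold rule standing in for the trained RF model; the function names are hypothetical.

```python
from collections import Counter

PIXEL_AREA_HA = 5 * 5 / 10_000  # assumed 5 m x 5 m RapidEye pixel = 0.0025 ha

def classify_image(bands, classify_pixel):
    """Classify every pixel of a band stack.

    bands: list of 2-D grids (rows x cols), one per spectral band, all the same shape.
    classify_pixel: callable mapping one pixel's band values to a class label,
    e.g. a trained model's prediction function (hypothetical here)."""
    rows, cols = len(bands[0]), len(bands[0][0])
    return [[classify_pixel([band[r][c] for band in bands]) for c in range(cols)]
            for r in range(rows)]

def class_areas_ha(class_map):
    """Total mapped area per class, derived from pixel counts."""
    counts = Counter(label for row in class_map for label in row)
    return {label: n * PIXEL_AREA_HA for label, n in counts.items()}

def threshold_rule(px):
    # Made-up stand-in for the trained RF: "woody" if the second band exceeds 0.35.
    return "woody" if px[1] > 0.35 else "grass"

# Toy 2 x 3 scene with two bands (made-up values).
red = [[0.04, 0.05, 0.10], [0.03, 0.11, 0.12]]
nir = [[0.45, 0.40, 0.30], [0.50, 0.28, 0.25]]
class_map = classify_image([red, nir], threshold_rule)
print(class_map)  # [['woody', 'woody', 'grass'], ['woody', 'grass', 'grass']]
print(class_areas_ha(class_map))
```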

Accuracy assessment
Accuracy assessment is paramount in any classification process. In this study, a 70% subset of the observed data (n = 110 for the first data set and n = 106 for the second data set) was used for training and modelling. The remaining 30% (n = 49 for the first data set and n = 46 for the second data set) was used for model validation. A confusion matrix was used to assess the mapped presence and absence of alien vegetation and to derive the producer's, user's and overall accuracies. Producer's accuracy indicates the probability of a reference pixel being classified correctly; it is calculated by dividing the number of correctly classified pixels in a category by the total number of reference pixels in that category (Congalton 1991). User's accuracy indicates the probability that a pixel classified on the map actually represents that category on the ground; it is calculated by dividing the number of correctly classified pixels in a category by the total number of pixels classified into that category (Congalton 1991). Overall accuracy is the total number of correctly classified pixels divided by the total number of pixels (Congalton 1991).
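The three accuracy measures can be computed directly from a confusion matrix. This minimal sketch assumes that rows hold the classified (map) categories and columns the reference categories, and it uses hypothetical counts rather than the study's error matrix; `accuracy_report` is a hypothetical helper name.

```python
def accuracy_report(matrix, classes):
    """Producer's, user's and overall accuracy from a confusion matrix.

    Convention assumed here: matrix[i][j] is the number of pixels classified as
    classes[i] (map) whose reference class is classes[j] (ground truth).
    Every category must have at least one reference and one classified pixel."""
    n = len(classes)
    map_totals = [sum(matrix[i]) for i in range(n)]                       # row totals
    ref_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # column totals
    producers = {classes[j]: matrix[j][j] / ref_totals[j] for j in range(n)}
    users = {classes[i]: matrix[i][i] / map_totals[i] for i in range(n)}
    overall = sum(matrix[k][k] for k in range(n)) / sum(map_totals)
    return producers, users, overall

# Hypothetical counts for illustration only (not the study's error matrix).
classes = ["grass", "indigenous", "alien"]
matrix = [[8, 2, 0],
          [1, 9, 0],
          [1, 1, 8]]
producers, users, overall = accuracy_report(matrix, classes)
print(producers)           # {'grass': 0.8, 'indigenous': 0.75, 'alien': 1.0}
print(users)               # {'grass': 0.8, 'indigenous': 0.9, 'alien': 0.8}
print(round(overall, 3))   # 0.833
```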
Quantity disagreement and allocation disagreement, as proposed by Pontius and Millones (2011), were also used to assess classification accuracy. Quantity disagreement is the amount of difference between the reference data and the classified map that is due to an imperfect match in the proportions of the categories, whereas allocation disagreement is the amount of difference that is due to an imperfect match in the spatial allocation of the categories, given their proportions (Pontius & Millones 2011). An extended description, including formulae, can be found in Pontius and Millones (2011).
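For a confusion matrix expressed as proportions p_ij (map category i, reference category j), quantity disagreement is half the sum over categories g of |p_+g - p_g+|, and allocation disagreement is half the sum of 2 min(p_g+ - p_gg, p_+g - p_gg); together they equal 1 minus the overall accuracy. The sketch below assumes the sample proportions represent the study area (no stratum weighting) and uses a hypothetical matrix; the function name is hypothetical.

```python
def quantity_allocation_disagreement(matrix):
    """Quantity (Q) and allocation (A) disagreement after Pontius & Millones (2011).

    matrix[i][j]: count of pixels mapped as category i with reference category j.
    Assumes the sample proportions represent the study area (no stratum weighting)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    p = [[matrix[i][j] / total for j in range(n)] for i in range(n)]
    map_prop = [sum(p[g]) for g in range(n)]                       # p_{g+}
    ref_prop = [sum(p[i][g] for i in range(n)) for g in range(n)]  # p_{+g}
    # q_g = |p_{+g} - p_{g+}|; Q = sum(q_g) / 2
    quantity = sum(abs(ref_prop[g] - map_prop[g]) for g in range(n)) / 2
    # a_g = 2 * min(p_{g+} - p_{gg}, p_{+g} - p_{gg}); A = sum(a_g) / 2
    allocation = sum(2 * min(map_prop[g] - p[g][g], ref_prop[g] - p[g][g])
                     for g in range(n)) / 2
    return quantity, allocation

# Hypothetical three-class matrix for illustration (not study data).
matrix = [[8, 2, 0],
          [1, 9, 0],
          [1, 1, 8]]
q, a = quantity_allocation_disagreement(matrix)
overall = sum(matrix[k][k] for k in range(3)) / sum(map(sum, matrix))
print(round(q, 4), round(a, 4), round(q + a, 4), round(1 - overall, 4))
```

The last two printed values coincide, illustrating that quantity and allocation disagreement partition the total disagreement.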

Random Forest optimisation and classification
After the grid search optimisation procedure, the mtry and ntree values that produced the lowest error were selected to classify the presence of woody alien and indigenous plants in the KZN SS. The default number of trees (ntree = 500) did not yield a sufficiently low error; ntree was therefore increased to 5000. An mtry value of 2 produced the smallest OOB error (18%). Overall, based on the error matrix for the training and test data, the RF algorithm yielded a relatively good accuracy (86%) when mapping the occurrence of indigenous and alien woody cover (Figure 2). These general classes were further classified into the dominant indigenous and alien species within the KZN SS, for which a relatively good accuracy (74%) was achieved.

Discussion
To date, a number of studies (e.g. Belluco, Camuffo & Ferrari 2006; Govender et al. 2008; Thenkabail et al. 2004) have demonstrated the superiority of hyperspectral sensors in discriminating tree species. This is attributed to their higher spectral resolution, which allows for the discrimination of subtle variation in vegetation canopies. However, constraints such as cost, availability and high data dimensionality have limited their wide adoption. Results in this study have shown that RapidEye imagery, characterised by fewer but strategically positioned bands such as the blue and red-edge bands, provides a viable alternative to hyperspectral imagery for determining the distribution of alien and indigenous woody cover.
According to Gilmore et al. (2008), vegetation spectral characteristics and species separability are determined by leaf pigmentation, water content, leaf size and leaf structure, amongst other factors. The additional red-edge band available on the RapidEye sensor has been shown to be sensitive to subtle variability in these characteristics, and is hence valuable for discriminating plant species (Cho et al. 2012). The spectral configuration and improved spatial resolution that characterise new generation sensors such as RapidEye offer new opportunities in land cover mapping (Govender et al. 2008). Specifically, the increased number of bands in these sensors facilitates the discrimination of surface objects with subtle reflectance variation. Novack et al. (2011), for instance, showed improved classification accuracy on an urban landscape using new generation sensors with additional, strategically positioned bands, in comparison to the QuickBird-2 sensor (with four traditional bands). The red-edge band is particularly valuable in discriminating vegetation classes and species (Safri, Salleh & Ghiyamat 2006). According to Cho and Skidmore (2006), the red-edge position is similar amongst different vegetation species but is sensitive to variation in chlorophyll content and internal leaf structure.
As plant chlorophyll content and internal leaf structure vary between species, the band has shown great promise for inter- and intra-species discrimination and for determining biomass quantity, stages of crop development and plant health (Gitelson & Merzlyak 1994). In this study, over 86% overall accuracy was achieved for the major vegetation categories (i.e. grass, indigenous trees and alien vegetation), whilst the dominant species were classified with an overall accuracy of 74% and individual accuracies of at least 68%. Although the influence of the red-edge band on classification accuracy was not tested, the reliable classification accuracy achieved is consistent with Adelabu et al. (2013), who obtained accuracies of 78%-80.25% and 85%-88.75% when the red-edge band was excluded from and included in the classification, respectively.
A suite of traditional classification algorithms has been developed for remotely sensed imagery. However, amongst the existing algorithms, the maximum likelihood and minimum distance to mean approaches have been popular Waske and Braun (2009), who attributed RF's high classification accuracy to its non-parametric handling of remotely sensed data. These findings are also consistent with Naidoo et al. (2012), who achieved 88% accuracy in classifying woody species within the Kruger National Park's savannah vegetation using the algorithm. Watts et al. (2009) note that, unlike existing statistical approaches, RF makes no distributional assumptions about the input data, is not prone to over-fitting, can handle unbalanced data sets and is computationally efficient. Therefore, non-parametric approaches such as RF are a viable option for generating robust classifications from complex image brightness values, and are suitable for highly heterogeneous landscapes (Ghimire, Rogan & Miller 2010). The high classification accuracy achieved for grass can be attributed to its greater spectral difference from woody species, which reduces the probability of misclassification (Abdel-Rahman et al. 2014a).
Results in this study showed high individual accuracies for the alien (86%) and indigenous (88%) classes. Spectrally, the two classes are relatively similar, particularly in their reflectance in the green and NIR regions of the electromagnetic spectrum. However, as mentioned, the high classification accuracy can be attributed to the sensor's higher spatial and spectral resolution, including the red-edge band, which facilitates the discrimination of species (Oldeland et al. 2010). Relatively good accuracies, ranging between 68% and 95%, were achieved for the dominant vegetation species (Figure 3). In land cover mapping, a classification accuracy of 70% is often considered the minimum acceptable (Thomlinson, Bolstad & Cohen 1999). We acknowledge that the 68% classification accuracy for Lantana camara in our study was below this threshold. We attribute this to its indistinct spectral characteristics, which vary with growth stage and cause spectral confusion with other species. Furthermore, Lantana camara often grows intermixed with other vegetation types, causing further confusion. Nevertheless, we consider the 68% classification accuracy valuable for the initial screening of areas invaded by Lantana camara. Generally, these findings are consistent with Adelabu et al. (2013), who discriminated five indigenous and exotic tree species with over 70% overall classification accuracy and allocation and quantity disagreement scores of 10% and 3%, respectively.
Larger patches of indigenous and alien species are predominantly found in the western part of the eThekwini Municipality. Furthermore, this study shows that there is substantial indigenous and alien woody plant cover within the KZN SS. The study therefore offers insight into the distribution of indigenous and alien woody vegetation within the grassland that is valuable for its management and optimisation.
The aim of this study was to map woody vegetation in the KZN SS using satellite remote sensing. The results of the study have shown that:
• RF was successful in discriminating between the two types of woody cover, with overall and individual accuracies above 80%
• RF was successful in differentiating between five tree species (both indigenous and alien), with individual accuracies of 68% or more
• low quantity disagreement and allocation disagreement scores confirmed the robustness of the model.
Overall, the results of this study have shown the importance of new generation sensors in mapping and discriminating the distribution of trees at a landscape scale. Specifically, this study provides an understanding of alien and indigenous woody vegetation encroachment within the KZN SS, which is valuable for optimising conservation management plans aimed at restoring or maintaining the grassland's ecological integrity.

FIGURE 2: Indigenous and alien trees in the eThekwini Municipality, mapped using Random Forest.

FIGURE 3: User and producer accuracies obtained from the Random Forest classification algorithm for dominant cover types.

Field data were collected in September 2014. Stratified purposive non-random sampling was used to select stands dominated by grass, indigenous plants and invasive alien species. This approach was adopted to ensure a representative sample. Using the visual categorisation and estimation recommended by Fraser, Abuelgasim and Latifovic (2005), a 10 × 10 m quadrat was established at each stand and the grass, indigenous and invasive cover was estimated and recorded. The 10 × 10 m quadrats were used to ensure that the 5 × 5 m RapidEye pixels were accommodated within the sampling sites. Stands with over 75% indigenous canopy cover were classified as indigenous, whilst those with over 75% alien cover were categorised as alien. A total of 65 alien, 74 indigenous tree and 20 pure grass stands were sampled (Table 2).
A second data set, totalling 153 samples, was also collected in the field, in which the canopy cover of the five dominant woody species was recorded. The indigenous species considered in this study were Syzygium cordatum, Millettia grandis and Strelitzia nicolai, whilst the alien species were Lantana camara and Eucalyptus grandis. A random 70/30 split (Tables 2 and 3) was applied to the two data sets to generate training and test data. In comparison to other split ratios, a number of studies (e.g. Adelabu, Mutanga & Adam 2015; Henderson et al. 2005; Rogan et al. 2008) noted that the 70/30 split yields the highest classification accuracy and the lowest standard deviation. Co-ordinates were recorded at the sampled plots, and reflectance was extracted from the five RapidEye bands using Zonal Statistics, a tool within the Spatial Analyst toolbox in ArcGIS 10.3.1 (Environmental Systems Research Institute 2016).
Source: Authors' own work
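The stand-labelling rule and the random 70/30 split described in the sampling text can be sketched as follows. This is a minimal illustration under stated assumptions: the cover values are hypothetical, the >75% rule for pure grass stands is an assumption (the text states the rule only for indigenous and alien stands), the split is simple random rather than stratified, and the function names are hypothetical.

```python
import random

def label_stand(grass_pct, indigenous_pct, alien_pct, threshold=75.0):
    """Label a 10 x 10 m quadrat from its estimated cover (%) using the >75 % rule.

    Returns None for mixed stands that meet no rule."""
    if indigenous_pct > threshold:
        return "indigenous"
    if alien_pct > threshold:
        return "alien"
    if grass_pct > threshold:  # assumption: stand-in definition of a "pure grass" stand
        return "grass"
    return None

def split_70_30(samples, train_frac=0.7, seed=0):
    """Simple random (non-stratified) train/test split."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical quadrat cover estimates (grass %, indigenous %, alien %), not field data.
plots = [(5, 90, 5), (10, 5, 85), (95, 3, 2), (30, 40, 30)]
print([label_stand(*p) for p in plots])  # ['indigenous', 'alien', 'grass', None]

train, test = split_70_30(range(10))
print(len(train), len(test))  # 7 3
```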

TABLE 2: Training and test data of indigenous and alien plots collected within the KZN SS.

TABLE 3: Training and test data of dominant indigenous and alien tree species collected within the KZN SS.