Satellite image classification using Vegetation, soils, and water indices
This article introduces satellite imagery analysis in a broad context, using tools that are easy to learn and apply. To give a broader understanding of spectral bands, we will use two datasets: one with fewer bands but higher-quality files, and another with more bands but lower-quality files.
Contents of the blog:
- Datasets 1 and 2
- Data Analysis
- Vegetation and Soil Indices
- Water Indices
- Geology Indices
- Land cover classification (Using Unsupervised learning)
- LightGBM and PCA
- Conclusion
Data
Dataset-1 (Sundarbans Imagery)
The Sundarbans, a region formed by the confluence of the Ganges, Brahmaputra, and Meghna rivers in the Bay of Bengal, is truly a wonder of nature. Its expansive mangrove forests cover an area of approximately 10,000 square kilometers across both India and Bangladesh, with 40% of the region situated within India’s borders.
For simplicity, this article uses a small part of the Sundarbans satellite data.
The satellite data has 954 * 298 pixels and 12 bands, with the spatial resolution varying from 10 to 60 meters depending on the band. The data can be downloaded using the link below.
Dataset-2
The source does not make this dataset public, so the download link can’t be shared. But here are the data files:
Data Analysis (Importing Libraries and EDA)
We will use libraries such as EarthPy, Rasterio, matplotlib, and Plotly for data visualisation and analysis to execute various operations on the Sundarbans data.
(Kindly note that I have imported more libraries than I needed to. This happened because I was experimenting with different datasets and methods.)
Read Data
Let’s use Rasterio to read the 12 bands and numpy.stack() to stack them into an n-dimensional array. After stacking, the resulting data has the shape (12, 954, 298).
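Here is a minimal sketch of the stacking step. Synthetic arrays stand in for the band files so that the example runs without any data. With real files, each array would come from Rasterio, as shown in the comment. The helper name `stack_bands` and the variable name `arr_st` are hypothetical.

```python
import numpy as np

def stack_bands(bands):
    """Stack a list of 2-D band arrays into one (n_bands, rows, cols) array."""
    return np.stack(bands)

# Synthetic stand-ins for the 12 band rasters (954 x 298 pixels each).
# With real files, each entry would be read with something like:
#     with rasterio.open(path) as src:
#         band = src.read(1)
rng = np.random.default_rng(0)
bands = [
    rng.integers(0, 10000, size=(954, 298), dtype=np.uint16)
    for _ in range(12)
]

arr_st = stack_bands(bands)
print(arr_st.shape)
```

The band axis comes first, so `arr_st[i]` returns the full 2-D image of band `i + 1`.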
Visualize Bands
As previously stated, the data has 12 bands. Let’s use the EarthPy library to visualize each band. The plot_bands() function accepts a stack of bands and plots each one. It also supports custom titles: pass a list with one title per image via the title= option.
RGB Composite Image
The Sundarbans data contains many bands, ranging from visible to infrared, which makes it difficult for humans to visualize directly. An RGB composite image makes the data easier to interpret. To create one, plot the red, green, and blue bands (bands 4, 3, and 2, respectively). Because Python uses zero-based indexing, you must subtract 1 from each band number. As a result, the index for the red band is 3, the index for the green band is 2, and the index for the blue band is 1.
The composite images can look dark when the pixel brightness values are skewed toward zero. A contrast stretch, which spreads the brightness values across the full display range, is the usual way to deal with this.
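The composite and the stretch can be sketched in plain NumPy as below. This is an illustration, not the article's original code: the function name `rgb_composite`, the 2% clip value, and the synthetic dark scene are all assumptions. EarthPy's `plot_rgb()` offers a similar stretch through its `stretch` and `str_clip` arguments.

```python
import numpy as np

def rgb_composite(stack, rgb=(3, 2, 1), clip=2.0):
    """Build an 8-bit RGB image from a (bands, rows, cols) stack.

    Each channel is stretched linearly between its `clip` and
    `100 - clip` percentiles, which brightens dark scenes.
    """
    channels = []
    for idx in rgb:
        band = stack[idx].astype(np.float64)
        lo, hi = np.percentile(band, (clip, 100.0 - clip))
        if hi <= lo:
            # Flat band: nothing to stretch, avoid dividing by zero
            channels.append(np.zeros_like(band))
            continue
        channels.append(np.clip((band - lo) / (hi - lo), 0.0, 1.0))
    image = np.stack(channels, axis=-1)  # shape (rows, cols, 3)
    return (image * 255).round().astype(np.uint8)

# Synthetic dark scene: brightness values crowded near zero
rng = np.random.default_rng(1)
stack = rng.gamma(shape=2.0, scale=150.0, size=(12, 60, 40))

image = rgb_composite(stack)
print(image.shape, image.dtype, image.min(), image.max())
```

The resulting array can be passed straight to `matplotlib.pyplot.imshow()`.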
Histograms
Plotting histograms of the bands in the multispectral image dataset helps us understand the distribution of pixel values inside each band. The hist function from earthpy.plot handles the work by plotting histograms for the bands in the stack that we previously created. We can also change the number of columns, the title, and the color of each individual histogram. Let’s look at the code for creating the histograms.
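As a library-independent sketch, the per-band histograms can also be computed with `numpy.histogram`. The helper name `band_histograms` and the synthetic data are assumptions for illustration. With EarthPy installed, `ep.hist(stack, title=..., colors=..., cols=...)` plots the equivalent figures.

```python
import numpy as np

def band_histograms(stack, bins=20):
    """Return one (counts, bin_edges) pair per band of a 3-D stack."""
    return [np.histogram(band.ravel(), bins=bins) for band in stack]

# Synthetic 12-band stack standing in for the real data
rng = np.random.default_rng(2)
stack = rng.normal(loc=1500, scale=300, size=(12, 50, 30))

hists = band_histograms(stack)
for i, (counts, edges) in enumerate(hists[:3], start=1):
    print(f"Band {i}: values from {edges[0]:.0f} to {edges[-1]:.0f}, "
          f"fullest bin holds {counts.max()} pixels")
```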
Vegetation and Soil Indices
Images calculated from multispectral satellite images are called normalized satellite indices. These images draw attention to a specific phenomenon that is present while suppressing other factors that would otherwise obscure it. For example, a vegetation index shows healthy vegetation as bright in the index image, while unhealthy vegetation has lower values and barren terrain appears dark. The indices are designed to emphasize an object’s color rather than its intensity or brightness, because shading from terrain variation (hills and valleys) affects the intensity of images.
Normalized Difference Vegetation Index (NDVI)
To determine the amount of vegetation present on a piece of land, scientists observe the various wavelengths of visible (VIS) and near-infrared (NIR) sunlight that the plants reflect. The Normalized Difference Vegetation Index (NDVI) measures the difference between near-infrared light, which vegetation strongly reflects, and red light, which vegetation absorbs.
NDVI = (NIR - Red) / (NIR + Red)
NDVI values range from -1 to +1.
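NDVI can be sketched in a few lines of NumPy. The function name and the reflectance values below are made up for illustration. In a 12-band Sentinel-2 style stack, NIR (B8) and red (B4) would sit at zero-based indices 7 and 3, but that band order is an assumption about the data. EarthPy provides a comparable helper, `earthpy.spatial.normalized_diff`.

```python
import numpy as np

def normalized_difference(b1, b2):
    """Compute (b1 - b2) / (b1 + b2); NaN where the denominator is zero."""
    b1 = np.asarray(b1, dtype=np.float64)
    b2 = np.asarray(b2, dtype=np.float64)
    denom = b1 + b2
    out = np.full(denom.shape, np.nan)
    np.divide(b1 - b2, denom, out=out, where=denom != 0)
    return out

# Hypothetical reflectances: dense vegetation, sparse vegetation,
# water-like pixel, and a no-data pixel
nir = np.array([[0.50, 0.40], [0.05, 0.0]])
red = np.array([[0.08, 0.10], [0.10, 0.0]])

ndvi = normalized_difference(nir, red)
print(np.round(ndvi, 3))
```

Pixels where both bands are zero come out as NaN instead of raising a division warning, which keeps no-data areas out of later statistics.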
Soil Adjusted Vegetation Index (SAVI)
The Soil-Adjusted Vegetation Index (SAVI) is a vegetation index that attempts to minimize soil brightness influences using a soil-brightness correction factor. This is often used in arid regions where vegetative cover is low.
SAVI = ((NIR - Red) / (NIR + Red + L)) x (1 + L)
The L value varies depending on the amount of green vegetative cover. Generally, in areas with no green vegetation cover, L=1; in areas of moderate green vegetative cover, L=0.5; and in areas with very high vegetation cover, L=0 (which is equivalent to the NDVI method). This index outputs values between -1.0 and 1.0. Let’s see the code for the implementation of SAVI.
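A minimal sketch of the SAVI formula above, with a hypothetical function name and made-up reflectance values. Note one assumption: the L factor only has a meaningful effect when the bands are scaled as reflectance between 0 and 1. With raw digital numbers in the thousands, adding 0.5 to the denominator changes almost nothing.

```python
import numpy as np

def savi(nir, red, L=0.5):
    """SAVI = ((NIR - Red) / (NIR + Red + L)) * (1 + L)."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red + L
    out = np.full(denom.shape, np.nan)
    np.divide((nir - red) * (1.0 + L), denom, out=out, where=denom != 0)
    return out

# Hypothetical reflectances in the 0-1 range
nir = np.array([0.50, 0.30])
red = np.array([0.10, 0.20])

print(np.round(savi(nir, red, L=0.5), 3))  # moderate vegetation cover
print(np.round(savi(nir, red, L=0.0), 3))  # L=0 reduces to NDVI
```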
Visible Atmospherically Resistant Index (VARI)
The Visible Atmospherically Resistant Index (VARI) is designed to emphasize vegetation in the visible portion of the spectrum while mitigating illumination differences and atmospheric effects. It is ideal for RGB or color images; it utilizes all three color bands.
VARI = (Green - Red)/ (Green + Red - Blue)
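The VARI formula translates directly into NumPy. The function name and sample values are illustrative only. In the stack described earlier, red, green, and blue would be at indices 3, 2, and 1.

```python
import numpy as np

def vari(red, green, blue):
    """VARI = (Green - Red) / (Green + Red - Blue); NaN where undefined."""
    red = np.asarray(red, dtype=np.float64)
    green = np.asarray(green, dtype=np.float64)
    blue = np.asarray(blue, dtype=np.float64)
    denom = green + red - blue
    out = np.full(denom.shape, np.nan)
    np.divide(green - red, denom, out=out, where=denom != 0)
    return out

# Hypothetical reflectances: a green canopy pixel and a bare-soil pixel
red = np.array([0.08, 0.20])
green = np.array([0.12, 0.16])
blue = np.array([0.05, 0.10])

print(np.round(vari(red, green, blue), 3))
```

Unlike the normalized differences, VARI is not bounded to the range -1 to 1, because the blue band can make the denominator very small.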
Water Indices
Surface water change is a very important indicator of environmental, climatic, and anthropogenic activities. Satellite missions such as Sentinel-2 and Landsat provide data that are useful for extracting land cover types such as forests and water, with the Landsat archive alone spanning more than four decades. Researchers have proposed many surface water extraction techniques, among which index-based methods are popular owing to their simplicity and cost-effectiveness.
Modified Normalized Difference Water Index (MNDWI)
The Modified Normalized Difference Water Index (MNDWI) uses green and SWIR bands for the enhancement of open water features. It also diminishes built-up area features that are often correlated with open water in other indices.
MNDWI = (Green - SWIR) / (Green + SWIR)
The code below implements MNDWI, and the output follows it.
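Here is one way MNDWI might look in NumPy, followed by a simple water mask. The reflectance values are made up, and the threshold of 0 is a common starting point, not a value from this article. In a 12-band Sentinel-2 style stack, green (B3) and SWIR (B11) would be at indices 2 and 10, which is an assumption about the band order.

```python
import numpy as np

def mndwi(green, swir):
    """MNDWI = (Green - SWIR) / (Green + SWIR); NaN where the sum is zero."""
    green = np.asarray(green, dtype=np.float64)
    swir = np.asarray(swir, dtype=np.float64)
    denom = green + swir
    out = np.full(denom.shape, np.nan)
    np.divide(green - swir, denom, out=out, where=denom != 0)
    return out

# Hypothetical reflectances: left column water-like, right column land-like
green = np.array([[0.10, 0.08], [0.09, 0.07]])
swir = np.array([[0.02, 0.25], [0.03, 0.20]])

index = mndwi(green, swir)
water_mask = index > 0.0  # assumed threshold; tune it for each scene
print(np.round(index, 3))
print(water_mask)
```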
Normalized Difference Moisture Index (NDMI)
The Normalized Difference Moisture Index (NDMI) is sensitive to the moisture levels in vegetation. It is used to monitor droughts as well as monitor fuel levels in fire-prone areas. It uses NIR and SWIR bands to create a ratio designed to mitigate illumination and atmospheric effects.
NDMI = (NIR - SWIR1)/(NIR + SWIR1)
Let’s see the implementation and the output:
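A sketch of NDMI computed straight from a band stack. The band positions (`NIR_IDX = 7` for B8, `SWIR1_IDX = 10` for B11) assume a 12-band Sentinel-2 style ordering and should be checked against the actual files. The values are synthetic.

```python
import numpy as np

# Assumed zero-based positions in a 12-band Sentinel-2 style stack
NIR_IDX = 7     # B8
SWIR1_IDX = 10  # B11

def ndmi_from_stack(stack, nir_idx=NIR_IDX, swir1_idx=SWIR1_IDX):
    """NDMI = (NIR - SWIR1) / (NIR + SWIR1), computed from a 3-D stack."""
    nir = stack[nir_idx].astype(np.float64)
    swir1 = stack[swir1_idx].astype(np.float64)
    denom = nir + swir1
    out = np.full(denom.shape, np.nan)
    np.divide(nir - swir1, denom, out=out, where=denom != 0)
    return out

# Tiny synthetic stack: only the two bands we need are filled in
stack = np.zeros((12, 2, 2))
stack[NIR_IDX] = [[0.40, 0.30], [0.35, 0.0]]
stack[SWIR1_IDX] = [[0.20, 0.35], [0.15, 0.0]]

ndmi = ndmi_from_stack(stack)
print(np.round(ndmi, 3))
```

Higher values point to wetter vegetation, and values near or below zero point to dry canopy or bare ground.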
Geology Indices
Satellite imagery and aerial photography have proven to be important tools in support of mineral exploration projects, and they can be used in a variety of ways. First, they show geologists and field crews the location of tracks, roads, fences, and inhabited areas.
Clay Minerals
The clay ratio is the ratio of the SWIR1 and SWIR2 bands. It leverages the fact that hydrous minerals such as clays and alunite absorb radiation in the 2.0–2.3 micron portion of the spectrum. Because it is a ratio, this index mitigates illumination changes due to terrain.
Clay Minerals Ratio = SWIR1 / SWIR2
Ferrous Minerals
The ferrous minerals ratio highlights iron-bearing materials. It uses the ratio between the SWIR band and the NIR band.
Ferrous Minerals Ratio = SWIR / NIR
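Both geology ratios in this section are plain band divisions, so one guarded helper covers them. The function name `band_ratio` and the reflectance values are hypothetical. In a 12-band Sentinel-2 style stack, SWIR1, SWIR2, and NIR would be at indices 10, 11, and 7, assuming the usual band order.

```python
import numpy as np

def band_ratio(numerator, denominator):
    """Element-wise ratio of two bands; NaN where the denominator is zero."""
    numerator = np.asarray(numerator, dtype=np.float64)
    denominator = np.asarray(denominator, dtype=np.float64)
    out = np.full(numerator.shape, np.nan)
    np.divide(numerator, denominator, out=out, where=denominator != 0)
    return out

# Hypothetical reflectances for three pixels
swir1 = np.array([0.30, 0.24, 0.10])
swir2 = np.array([0.20, 0.24, 0.0])
nir = np.array([0.25, 0.30, 0.40])

clay_ratio = band_ratio(swir1, swir2)    # Clay Minerals Ratio = SWIR1 / SWIR2
ferrous_ratio = band_ratio(swir1, nir)   # Ferrous Minerals Ratio = SWIR / NIR
print(np.round(clay_ratio, 3))
print(np.round(ferrous_ratio, 3))
```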
Unsupervised learning
Unsupervised learning algorithms can be used for land cover classification, which involves categorizing different types of land cover, such as forests, crops, water bodies, and urban areas, in satellite imagery.
One approach is to use clustering algorithms, such as k-means or hierarchical clustering, to group pixels with similar spectral characteristics together. These clusters can then be assigned to different land cover classes based on visual interpretation or ground truth data.
Another approach is to use dimensionality reduction techniques, such as principal component analysis (PCA), to reduce the number of spectral bands in satellite imagery while retaining the most important information. The reduced dataset can then be classified using a supervised learning algorithm, such as a decision tree or random forest, or an unsupervised algorithm like clustering.
However, unsupervised learning methods may not always provide accurate results as they do not consider the spatial context of the pixels, and may group together pixels with different land cover types that have similar spectral characteristics. Therefore, it is often necessary to combine unsupervised and supervised learning approaches, along with ancillary data, such as topography and land use/land cover maps, to improve the accuracy of land cover classification.
PCA
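The dimensionality reduction described above can be sketched with NumPy's SVD. This is a plain-NumPy illustration under stated assumptions, not the article's original code: the function name `pca_bands`, the choice of three components, and the synthetic, strongly correlated bands are all made up. A library such as scikit-learn provides an equivalent `PCA` class.

```python
import numpy as np

def pca_bands(stack, n_components=3):
    """Project a (bands, rows, cols) stack onto its first principal components.

    Returns the component images with shape (n_components, rows, cols) and
    the fraction of total variance that each returned component explains.
    """
    n_bands, rows, cols = stack.shape
    # One row per pixel, one column per band; astype makes a copy
    X = stack.reshape(n_bands, -1).T.astype(np.float64)
    X -= X.mean(axis=0)  # center each band
    # Rows of Vt are the principal axes, ordered by decreasing variance
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    components = scores.T.reshape(n_components, rows, cols)
    return components, explained[:n_components]

# Synthetic stack: 12 bands that are scaled copies of one pattern plus noise
rng = np.random.default_rng(3)
base = rng.normal(size=(40, 30))
weights = np.linspace(1.0, 2.0, 12)
stack = base * weights[:, None, None] + rng.normal(scale=0.1, size=(12, 40, 30))

components, explained = pca_bands(stack, n_components=3)
print(components.shape)
print(np.round(explained, 4))
```

Because the synthetic bands are nearly copies of each other, the first component carries almost all of the variance. Real imagery usually needs a few components to reach the same share.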
K-means
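Below is a small NumPy sketch of k-means clustering applied to pixels, where each pixel is a point whose coordinates are its band values. The function name, the two-class synthetic scene, and the spectra are assumptions for illustration. In practice, a library implementation such as scikit-learn's `KMeans` is the more common choice.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Basic Lloyd's algorithm. X has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct samples chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every sample to every center: (n_samples, k)
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its members (keep it if empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic 4-band scene: left half water-like, right half vegetation-like
rng = np.random.default_rng(4)
water = np.array([0.06, 0.08, 0.05, 0.02])
vegetation = np.array([0.04, 0.08, 0.05, 0.45])
stack = np.empty((4, 20, 20))
stack[:, :, :10] = water[:, None, None]
stack[:, :, 10:] = vegetation[:, None, None]
stack += rng.normal(scale=0.01, size=stack.shape)

pixels = stack.reshape(4, -1).T           # (n_pixels, n_bands)
labels, centers = kmeans(pixels, k=2)
label_map = labels.reshape(20, 20)        # back to image layout
print(label_map[0])
```

The cluster numbers are arbitrary, so each cluster still has to be assigned to a land cover class by visual interpretation or ground truth data, as described above.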
Conclusion
This article introduced methods for analyzing Sundarbans satellite data using Python: data visualization; normalized vegetation, water, and geology indices; and unsupervised learning for land cover classification.