Research-to-impact process

Outlined below are several new ideas and ongoing research projects in our lab. This research feeds into a careful cadence for the development and deployment of products built around the CoRE stack. See this manual as an example of research outputs for many datasets and indicators related to water security, and how the outputs are now being used by field partners. Using data and technology for climate action and environmental sustainability requires careful translational work from research to deployment, so this process works well only through collaboration between academia, engineers, and implementation organizations. We are continually refining this process so that it can become a model for translating research to impact at scale.

Skillsets: Most of these projects involve the use of machine learning on geospatial data and statistical analysis, but also span human-computer interaction and systems building for scaling computation. Interested students should write to me.

Last updated – Oct 2024.

 

Forests and ecology

These projects are being integrated into a broader product vision of guiding forest restoration projects to provide decision support and evidence for communities to manage their forests in a sustainable manner. We want to be able to help spot specific locations requiring restoration, then provide guidance in terms of which tree species to plant in these areas, and build a process to do this in consultation with communities to leverage local ecological knowledge.

Tree Species Biodiversity Estimation and Traits Classification - Shiva, Dhruvi

Biodiverse ecosystems enhance resilience, allowing forests to adapt to changing conditions, such as shifting temperatures and increased pests. A mix of species can improve carbon sequestration, and support healthy soil and water systems, which are vital for ecosystem stability. In this project we aim to monitor tree species biodiversity using satellite imagery. We use PCA and unsupervised clustering methods to determine high and low biodiverse hotspots in different eco-regions. We are also working on identifying traits such as leaf-shed duration and leaf size, to build a finer assessment of tree species occurrence.

Co-creation partnerships: ATREE, NCF.

 

Species Distribution Modelling - Jyotiraditya, Shourya, Dhruvi

Utilizing Species Distribution Modeling (SDM), the project evaluates various environmental and climatic factors that influence the distribution and habitat preferences of different tree species. The aim of this study is to support forest restoration planning and conservation efforts by recommending specific species in degraded or less biodiverse sites to improve conservation strategies across India's diverse eco-regions.

Co-creation partnerships: ATREE, NCF.

 

Forecasting Forest Fires in India Using Climate and Social Variables - Raman, Anaad

In India, large forest fires are usually natural, while small forest fires are typically caused by human activity. Both types of fires can be detrimental to forest resources, biodiversity, and communities that live alongside the forests. Although large forest fires are easier to detect through remote sensing data, small forest fires often go unnoticed. Furthermore, while efforts are made to detect forest fires, there is currently no model capable of forecasting both small and large forest fires. A reliable forecast could provide communities and forest authorities with valuable time to reach the fire site, control the blaze, and evacuate people if necessary. We are building models to detect and forecast both types of fires. Climate variables, such as latent heat flux and long-wave radiation flux, indicate the conditions necessary for a fire to occur. Meanwhile, human factors, including the extent of settlements, development, and the proximity of indigenous populations to the forest, reflect the likelihood of human-caused fires. Our goal is to model these variables to predict the probability of forest fires in specific regions.

 

Multi-resolution mapping of trees - TBD

We want to develop a methodology to identify a forest patch of a few dozen acres and profile it repeatedly through drone monitoring on a fortnightly basis. When coupled with geo-tagging of trees done by field workers, this will provide us with detailed imagery of changes in tree characteristics such as leaf shedding, flowering, fruiting, etc. We will use computer vision techniques to automatically annotate the forest patch, and then use these annotations as groundtruth data to train models that estimate the tree characteristics from lower-resolution satellite data. This way, through drone-assisted groundtruth data collection in a small area, we will be able to characterize changes in large forest areas. We want to start doing this on the IIT Delhi campus, where we have already done a detailed geo-tagging of trees.

 

Agro-ecology

These projects are meant to feed into a broader product vision of matching locations with appropriate agroforestry and cropping systems to build climate resilience, improved livelihoods, and food security for local communities.

Plantation Suitability Scoring Model - Anoushka, Shiva, Dhruvi

Tree plantations can be economically beneficial for farmers but could be undertaken in areas that cannot meet the demand of water or other resources for the specific tree species. It is therefore important to understand the suitability of a region for the plantation of a particular tree species. This project aims to assess the suitability of putting up a plantation at a given site by assigning a score that aggregates multiple ecological and social parameters related to climate, soil, topography etc. Remote sensing information of variables like mean temperature, soil texture and NDVI are aggregated using weights and suitability classes defined by the study and by domain experts.
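The weighted aggregation described above can be sketched roughly as follows. This is only an illustration and not the project's actual model: the class boundaries, class scores, and weights are hypothetical placeholders, and in practice they would be defined by the study and by domain experts.

```python
# Hypothetical suitability classes: (upper bound of the class, class score in [0, 1]).
# All numbers below are illustrative placeholders, not values from the study.
SUITABILITY_CLASSES = {
    "mean_temperature_c": [(15, 0.2), (25, 0.6), (32, 1.0), (float("inf"), 0.4)],
    "ndvi": [(0.2, 0.3), (0.5, 0.7), (float("inf"), 1.0)],
}
# Soil texture is categorical, so it gets a direct lookup table (also hypothetical).
SOIL_TEXTURE_SCORES = {"clay": 0.5, "loam": 1.0, "sand": 0.3}
# Hypothetical expert-assigned weights for each variable.
WEIGHTS = {"mean_temperature_c": 0.4, "ndvi": 0.3, "soil_texture": 0.3}


def class_score(value, classes):
    """Return the score of the first class whose upper bound covers the value."""
    for upper_bound, score in classes:
        if value <= upper_bound:
            return score
    raise ValueError(f"no suitability class covers value {value}")


def suitability_score(site):
    """Aggregate per-variable class scores into one weighted score in [0, 1]."""
    scores = {
        "mean_temperature_c": class_score(
            site["mean_temperature_c"], SUITABILITY_CLASSES["mean_temperature_c"]
        ),
        "ndvi": class_score(site["ndvi"], SUITABILITY_CLASSES["ndvi"]),
        "soil_texture": SOIL_TEXTURE_SCORES[site["soil_texture"]],
    }
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / total_weight


example_site = {"mean_temperature_c": 28, "ndvi": 0.4, "soil_texture": "loam"}
example_score = suitability_score(example_site)
```

Dividing by the total weight keeps the score in the range 0 to 1 even if the weights are later changed and no longer sum to one.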

Co-creation partnerships: Say Trees, FES.

 

Cropping Diversity Assessment - Prakkhyaat

The importance of diversity in cropping patterns is well recognized: it allows for soil regeneration, better nutrition, and higher resilience to climatic factors or pest attacks. In this study, we use district-wise and season-wise crop production statistics, supplemented with Sentinel-2 spectral data, to estimate cropping diversity in an area. We use PCA and unsupervised clustering methods to process and analyze the data, and calculate biodiversity indices that can be traced over time.
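The text does not name the specific indices used, so as an assumption the sketch below shows two commonly used ones, the Shannon index and the Simpson index, computed from the area (or production) under each crop in a district and season. The input numbers are made up.

```python
import math


def shannon_index(crop_areas):
    """Shannon diversity H = -sum(p_i * ln p_i) over crop area shares p_i."""
    total = sum(crop_areas)
    if total <= 0:
        raise ValueError("total crop area must be positive")
    shares = [area / total for area in crop_areas if area > 0]
    return -sum(p * math.log(p) for p in shares)


def simpson_index(crop_areas):
    """Simpson diversity 1 - sum(p_i^2); 0 means a single crop dominates fully."""
    total = sum(crop_areas)
    if total <= 0:
        raise ValueError("total crop area must be positive")
    return 1.0 - sum((area / total) ** 2 for area in crop_areas)


# Made-up areas (hectares) under four crops in one district and season.
kharif_areas = [1200.0, 800.0, 300.0, 100.0]
kharif_shannon = shannon_index(kharif_areas)
kharif_simpson = simpson_index(kharif_areas)
```

Computing either index per district and season for each year gives a series that can be traced over time.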

 

Hydrology

These projects aim to improve the underlying datasets and methods built into the Know Your Landscape and Commons Connect tools for water security planning by communities.

Evapotranspiration Downscaling - Vidushi, Utkarsh, Shivani

In countries like India, which have historically been reliant on rainfed agriculture, the increasing need for irrigation water to support greater cropping intensity and shifts towards horticulture has largely been met through groundwater-based irrigation. Cheap electricity has enabled a rapid increase in borewells across almost the entire country, which to some extent has enabled more equitable access to water than other irrigation approaches like canals, but has also led to groundwater stress in many regions. One way to indirectly estimate groundwater abstraction is to estimate evapotranspiration from cropping areas as a proxy for crop water consumption. Remote sensing has been used to estimate evapotranspiration, but existing open data products largely have a low spatial resolution that is not adequate to support local decision making for water use. In our initial experiments, we used machine learning methods to downscale MODIS evapotranspiration outputs (originally at 500m) to 30m, using meteorological variables from GLDAS and land surface characteristics from Landsat as input features. We observed that our method was not able to accurately match in situ data but was able to capture relative differences in evapotranspiration. As a further effort to build an accurate and scalable Indian ET product, we plan to conduct the next set of experiments using surface energy balance methods, land cover specific models, and crop specific models to estimate ET at finer scales.

Co-creation partnerships: WELL Labs.

 

Impact of New Waterbodies on Downstream Waterbodies - Ankit, Shivani

Our CSO partner, FES, has built a method to assess the groundwater recharge potential of a site to determine its suitability for water conservation structures like checkdams, trenches, and percolation tanks. During field testing of several of these outputs, experienced CSO field staff and volunteers indicated that in addition to assessing this recharge potential, it will be useful to also assess the runoff accumulation capacity for the proposed site. We are building geospatial algorithms that simulate rainfall and runoff on digital elevation maps to determine potential runoff accumulation in drought and non-drought years. Further, once new water bodies are constructed, they may reduce runoff accumulation in downstream water bodies. We have built a method to obtain a connectivity graph of water bodies, and then study how the new water bodies may alter the runoff flow.
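The project's own algorithms are not described here. As a rough illustration of how runoff accumulation can be simulated on a digital elevation map, the sketch below uses the standard D8 approach, in which each cell passes its accumulated runoff to its steepest downslope neighbour. D8 is an assumption made for this example, and the sketch skips steps a real pipeline would need, such as filling pits in the elevation data.

```python
def flow_accumulation(dem, runoff=None):
    """D8 flow accumulation on a small grid.

    dem:    2D list of elevations.
    runoff: optional 2D list of runoff generated in each cell
            (defaults to 1.0 per cell, i.e. a count of contributing cells).
    """
    rows, cols = len(dem), len(dem[0])
    if runoff is None:
        runoff = [[1.0] * cols for _ in range(rows)]
    acc = [row[:] for row in runoff]
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

    # Visit cells from highest to lowest, so every upstream cell has already
    # passed on its water before a cell forwards its own total downstream.
    cells = sorted(
        ((dem[r][c], r, c) for r in range(rows) for c in range(cols)), reverse=True
    )
    for elevation, r, c in cells:
        best_slope, target = 0.0, None
        for dr, dc in neighbours:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                distance = (dr * dr + dc * dc) ** 0.5
                slope = (elevation - dem[nr][nc]) / distance
                if slope > best_slope:
                    best_slope, target = slope, (nr, nc)
        if target is not None:  # cells with no lower neighbour keep their water
            acc[target[0]][target[1]] += acc[r][c]
    return acc


# Toy elevation grid sloping towards the bottom-right corner.
toy_dem = [
    [9, 8, 7],
    [8, 5, 4],
    [7, 4, 1],
]
toy_accumulation = flow_accumulation(toy_dem)
```

Passing a `runoff` grid estimated for a drought year and for a non-drought year would give the two accumulation scenarios mentioned above.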

Co-creation partnerships: FES.

 

Modelling Changes in Groundwater Levels in Shallow and Deep Aquifers - Badrinath, Shivani

Changes in groundwater levels can be correlated with changes in cropping patterns, cropping intensity, construction of groundwater recharge structures, and rainfall patterns, to understand the interplay between hydrological and social aspects impacting groundwater use. The Central Ground Water Board (CGWB) conducts seasonal measurements at about 25,000 observation wells across the country. The data is known to have several limitations, though. The spatial resolution is too coarse to understand groundwater patterns at finer scales such as villages and panchayats, which is the level at which decision making is done. The observation wells mostly measure water table levels in the unconfined shallow aquifer zones. Most regions in India have mixed aquifer systems, and CGWB data often does not tally with ground reports where borewells are used to access groundwater from deeper or confined aquifer zones. The Atal Bhujal Yojana (ABY), a Government of India scheme focused on improving groundwater levels in India, monitors groundwater levels in deeper aquifers using borewell data. We have implemented a water-balance method to estimate changes in groundwater levels at a micro-watershed scale on a fortnightly basis. We take the precipitation over a micro-watershed, subtract from it the runoff, and subtract further the evapotranspiration from the micro-watershed, leaving the balance as the change in groundwater. The methodology currently does not distinguish between groundwater usage in the shallow and deeper aquifer zones and does not take lateral flow into account. In ongoing research, we are using CGWB well data and ABY borewell data to build a machine learning based calibration model that controls for aquifer properties to get more accurate measures of groundwater changes in shallow and deep aquifers.
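The water-balance step described above amounts to a simple residual, change in groundwater = precipitation - runoff - evapotranspiration, computed per micro-watershed and per fortnight. A minimal sketch, with made-up input values and function names chosen only for this example:

```python
from itertools import accumulate


def groundwater_change(precipitation_mm, runoff_mm, evapotranspiration_mm):
    """Water-balance residual for one micro-watershed and one fortnight (mm)."""
    return precipitation_mm - runoff_mm - evapotranspiration_mm


def cumulative_change(fortnights):
    """Running total of the residual over consecutive fortnights.

    fortnights: iterable of (precipitation, runoff, evapotranspiration) in mm.
    """
    changes = [groundwater_change(p, r, et) for p, r, et in fortnights]
    return list(accumulate(changes))


# Made-up values for three consecutive fortnights in one micro-watershed.
example_fortnights = [(120.0, 40.0, 50.0), (10.0, 0.0, 45.0), (0.0, 0.0, 30.0)]
example_cumulative = cumulative_change(example_fortnights)
```

As noted in the text, this residual does not separate shallow from deep aquifer usage and ignores lateral flow.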

 

Identifying groundwater recharge and discharge zones - TBD

Depth to water in wells when measured from a sufficiently high density of wells in an area can reveal the underlying aquifer structure and contours of groundwater availability. This is especially useful to identify groundwater recharge and discharge areas, and guide the planning of water structures.

Co-creation partnerships: WASSAN.

 

Flood hazard mapping and forecasting - Vibhanshu

Floods affect millions of people globally each year, causing significant loss of life, displacement, and economic damage, with regions like the Kosi River Basin being particularly vulnerable. Accurately identifying areas that face flood hazards and forecasting future flooding events are both useful. We are using methods to detect transient surface water presence using Sentinel data to first build a hazard map of areas prone to flooding. Using this hazard map, we are then building machine learning models to predict new areas that could see floods in the coming fortnight based on weather forecasting data. We are also trying to learn about streamflow data availability in different river basins so that it can be factored into early-warning systems as well.
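One simple way to turn repeated water detections into a hazard map is to compute, for each pixel, the fraction of observations in which water was present, and to set aside pixels that are wet almost all the time as permanent water. The sketch below illustrates that idea only; the function name and the 0.9 threshold are assumptions, not the project's actual method.

```python
def flood_hazard_map(water_masks, permanent_threshold=0.9):
    """Per-pixel frequency of transient surface water.

    water_masks: list of 2D lists of 0/1 values, one mask per observation date,
                 where 1 means water was detected in that pixel.
    permanent_threshold: hypothetical cut-off; pixels wet at least this often
                 are treated as permanent water and given a hazard of 0.
    """
    if not water_masks:
        raise ValueError("need at least one water mask")
    n_obs = len(water_masks)
    rows, cols = len(water_masks[0]), len(water_masks[0][0])
    hazard = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            frequency = sum(mask[r][c] for mask in water_masks) / n_obs
            if frequency < permanent_threshold:
                hazard[r][c] = frequency
    return hazard


# Four made-up observation dates over a 2 x 2 pixel area.
example_masks = [
    [[1, 0], [1, 0]],
    [[1, 1], [1, 0]],
    [[1, 0], [0, 0]],
    [[1, 0], [0, 0]],
]
example_hazard = flood_hazard_map(example_masks)
```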

 

Impact assessment and social-ecological interactions

Good modeling can lead to valuable new insights for planning. These projects are meant to integrate into the Know Your Landscape and Commons Connect tools to conduct impact projections and compare different water security plans with one another.

Site Level Impact Assessment of Farm Ponds - Sanya, Anika, Ramneek

NREGA is a significant initiative aimed at promoting sustainable livelihoods by providing demand-driven employment, with a focus on natural resource management projects such as farm ponds. Farm ponds play a crucial role in water conservation, improving irrigation, and enhancing agricultural resilience in drought-prone regions. However, evaluating the impact of these projects has been challenging, as past studies have often relied on traditional survey methods and were limited to small regions, making it difficult to capture broader variability. This project utilizes Double Machine Learning (DML) on remote sensing data to improve the precision of impact assessments and better capture the Heterogeneous Treatment Effects (HTE) of farm ponds on crop yields. DML, combined with Difference-in-Differences (DiD), allows us to control for confounding factors and generate more accurate, region-specific insights. Initial results indicate a significant increase in crop productivity at certain treated sites, with further analysis expected to uncover unobserved factors influencing these outcomes. Moving forward, this approach could optimize NREGA's strategies, enhance policy decision-making, and provide more reliable projections for future impacts, particularly in understanding how natural resource management interventions can be tailored to different environmental contexts.
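For readers unfamiliar with Difference-in-Differences, the basic two-group, two-period estimate is sketched below. This is plain DiD on made-up numbers, shown only to illustrate the idea; it does not include the Double Machine Learning step or the heterogeneous treatment effect estimation used in the project.

```python
from statistics import mean


def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """Basic 2 x 2 DiD estimate.

    Change in the treated group minus change in the control group, which
    removes any trend shared by both groups (the parallel trends assumption).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up crop yield proxies (e.g. a vegetation index) before and after
# farm pond construction, for treated sites and comparison sites.
example_effect = difference_in_differences(
    treated_pre=[2.0, 2.2],
    treated_post=[3.0, 3.2],
    control_pre=[2.0, 2.0],
    control_post=[2.4, 2.4],
)
```

In this toy example the treated sites improve by 1.0 and the comparison sites by 0.4, so the estimated effect of the intervention is about 0.6.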

 

Landscape Level Impact Assessment of NRM structures for Water Conservation - Aryansh, Ramneek

Natural Resource Management (NRM) interventions such as farm ponds, check dams, trenches cum bunds, and so on, are believed to positively impact the crop yield and drought resilience in the surrounding cropping region either through supplemental irrigation or through direct improvement in soil moisture in the immediate neighbourhood of the intervention. Our recent study on assessing the site-level impact of farm ponds substantiates this hypothesis. However, at a landscape level, NRM structures could have a negative impact on downstream regions as they capture the runoff which would otherwise have flowed into and benefitted the downstream regions, or improved water availability due to the NRM structures could prompt an increase in groundwater extraction and leave less water for other cropping areas. In this study, we aim to study the impact of NRM structures on the zone of influence of downstream water bodies and on cropping areas outside the zone of influence of water bodies by leveraging remote-sensing data products and employing the Difference-in-Differences and the Double ML methods. We are currently building models to estimate the zone of influence around water bodies and other NRM structures as a function of their size. Our preliminary tests confirm the hypothesis that the soil moisture around NRM structures decreases with increasing distance. The next step in the framework is to identify sites that are similar to treated sites but did not receive the intervention (counterfactual sites), and then compare the outcomes (vegetation indices) in their zone of influence.
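The matching approach for counterfactual sites is not specified in the text. One common option, shown here purely as an assumption, is nearest-neighbour matching: for each treated site, pick the untreated site that is closest in a space of covariates. The covariates in the example are hypothetical and assumed to be already scaled to comparable ranges.

```python
import math


def nearest_control(treated_covariates, control_sites):
    """Index of the untreated site closest to a treated site in covariate space."""
    return min(
        range(len(control_sites)),
        key=lambda i: math.dist(treated_covariates, control_sites[i]),
    )


def match_sites(treated_sites, control_sites):
    """Match every treated site to its nearest control (with replacement)."""
    return [nearest_control(site, control_sites) for site in treated_sites]


# Hypothetical, pre-scaled covariates per site, e.g. (rainfall, slope).
treated = [(0.1, 0.2), (0.9, 0.8)]
controls = [(1.0, 1.0), (0.0, 0.0), (0.5, 0.5)]
matches = match_sites(treated, controls)
```

Each pair of matched sites can then be compared on the outcome of interest, such as vegetation indices within the zone of influence.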

 

Co-evolution of Economic Wealth and Environmental Sustainability in Indian Villages - Badrinath, Mrunal

The Government of India conducts nationwide surveys on socio-economic, health, and education indicators. However, these surveys are time-consuming, conducted at variable frequencies, and have significant delays in delivering results. Our project aims to address this by developing a Relative Wealth Index (RWI) using satellite data as an indicator to quantify the socio-economic status of villages. Additionally, we are working on identifying sustainability factors to assess how villages can develop economically while also being environmentally sustainable. We are currently analyzing changes in the RWI of villages against environmental factors such as forest cover, agricultural land-use, and groundwater stress, with the caste composition of the population and climate variables as additional covariates. We classify villages into forest-fringe, rainfed, or irrigated (via canal or groundwater) categories, and assess RWI and sustainability changes over time. Eventually we also want to build indicators of ecological pressure to identify areas that are especially vulnerable to ecological stress.

 

System Dynamics and Modeling of Microwatersheds - Anamitra, Shivani, Ramneek

The sustainable management of microwatersheds is critical for addressing long-term water security, particularly in regions reliant on groundwater for agriculture. However, understanding the intricate dynamics of these systems, which combine climatic, hydrological, physiological, and socio-economic factors at the microwatershed level, presents a complex challenge. In this study, we leverage remote sensing data to model and analyze the dynamics of microwatersheds, so that "what if" scenarios can be constructed to evaluate the impacts of various interventions like the creation of water bodies or the building of NREGA structures. These interventions aim to reduce groundwater stress and facilitate a transition toward rainwater-fed irrigation systems. Our simulations incorporate various climate-induced scenarios to assess the influence of prospective interventions on water availability and system sustainability.

 

Simplifying the practice of socio-ecological sustainability on the ground

We believe that technology will positively impact communities only when they are able to understand and operate it themselves. Building upon our experience of having done this successfully at Gram Vaani with communities being able to use voice-based participatory media for social accountability, knowledge sharing, and community awareness, we now envision a similar model wherein community stewards from within local communities can use the data and technology outputs of the CoRE stack for better planning and ecological management of their landscapes. To do this, we need training and knowledge building tools for community stewards.

Leveraging LLMs for Landscape Assessment and Climate Action Recommendations - Priyadarshini, Shivani

The planning for NRM assets needs to be locally contextual at fine scales of micro-watersheds, characterized by variables including rainfall patterns, groundwater stress, soil types, terrain, etc., and described in easy ways for field cadre and community stewards to follow. Large Language Models (LLMs) can potentially play a role in automatically generating such descriptions using their comprehensive knowledge base and ability to process time-series data. We intend to generate LLM-assisted landscape-level descriptions by using data on various socio-ecological variables and a series of prompts. Given the landscape characterization, we then intend to explore whether LLMs are also able to recommend landscape-specific NRM assets by using their own knowledge base along with data on MGNREGA assets that have been approved in different areas, region-specific CSO reports on water security, and recently prepared DPRs consisting of proposed water assets. We have started by investigating the ability of LLMs to interpret univariate and multivariate time-series of various socio-ecological variables.

Co-creation partnerships: Viksit Labs, Saarland University.

 

Land-use and land-cover classification

"You can only manage what you can measure" -- high fidelity monitoring of land-use changes is critical to understand the current state of affairs and to use the data to uncover causal pathways that would have led to the current situation. These projects improve and add new elements to the extensive monitoring already available on the CoRE stack.

Crop Classification using Remote Sensing Data - Pratham

Accurate crop identification is essential to monitor agricultural practices. We are developing a model that predicts crop types by leveraging vegetation indices obtained from satellite imagery. The methodology involves training machine learning models using historical crop data, remote sensing inputs, and feature engineering to capture relevant patterns. Initial results demonstrate promising accuracy in distinguishing between major crops in an area, for example between maize, paddy, and other cereals commonly grown during the Kharif season in Karnataka. This research paves the way for more efficient crop monitoring systems that can help guide sustainable agricultural management.

Co-creation partnerships: WELL Labs.

 

Identification of Ponds, Wells, and Plantations from Google Maps - Aatif

The Mahatma Gandhi National Rural Employment Guarantee Act (MGNREGA) has led to the creation of numerous assets across India, including ponds, wells, and plantations. However, there is a lack of comprehensive information regarding these assets' precise locations and conditions. Our study presents an approach to automatically detect, delineate, and geographically locate MGNREGA assets using satellite maps. We download high-resolution image tiles from mapping sources at various zoom levels and apply object detection and image segmentation models to identify these assets. We then plan to use this mapping to assess (a) equity in access to these assets within and across villages among different social groups; (b) the impact of these assets on agricultural productivity; and (c) the potential future benefits of new assets. By providing a scalable solution for identifying MGNREGA assets, our research offers valuable insights for rural development policymakers and researchers. The developed methodology complements efforts to ensure quality planning and long-term monitoring of assets created over the years.

 

Entropy-Based Differentiation of Agricultural Land from Scrubland Using High-Resolution Geo-Spatial Data - Raman

In India, single-cropped agricultural land and scrubland (also referred to as shrublands, wastelands, etc.) often exhibit similar spectral signatures, making it challenging to distinguish between them using traditional methods applied on remote sensing data. However, the recent availability of high-resolution data and advancements in computer vision applied to geo-spatial data have shown promising results in delineating field boundaries. Our investigation reveals that when these models are applied to both agricultural land and scrubland, they provide satisfactory delineation of fields while also segmenting scrubland into approximate blobs. To further differentiate these blobs from fields, we have developed an entropy-based method. This method operates on the assumption that agricultural fields will exhibit less variation in texture or display structured variations due to human activity, whereas scrubland, being naturally formed, exhibits more random and unstructured texture. This insight is central to our approach, which we aim to model and refine to improve our land use and land cover (LULC) classification.
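The exact entropy formulation is not given in the text. As a minimal sketch of the underlying idea, the Shannon entropy of the gray-level histogram of an image patch is low when the texture is uniform and higher when pixel values are spread across many levels. The quantisation into 8 levels and the function name are assumptions made for this example. Note that histogram entropy ignores the spatial arrangement of pixels, so capturing the structured variations mentioned above would need a spatially aware measure.

```python
import math
from collections import Counter


def patch_entropy(patch, levels=8, max_value=255):
    """Shannon entropy (bits) of the quantised gray-level histogram of a patch.

    patch:  2D list of pixel intensities in the range 0..max_value.
    levels: number of gray-level bins used for the histogram (hypothetical choice).
    """
    bins = Counter(
        min(int(value * levels / (max_value + 1)), levels - 1)
        for row in patch
        for value in row
    )
    n_pixels = sum(bins.values())
    return -sum(
        (count / n_pixels) * math.log2(count / n_pixels) for count in bins.values()
    )


# Toy patches: a uniform "field-like" patch and a more varied "scrub-like" patch.
uniform_patch = [[100, 100, 100, 100] for _ in range(4)]
varied_patch = [
    [0, 40, 80, 120],
    [160, 200, 240, 20],
    [60, 100, 140, 180],
    [220, 10, 90, 250],
]
uniform_entropy = patch_entropy(uniform_patch)
varied_entropy = patch_entropy(varied_patch)
```

A blob produced by the field delineation model could then be labelled by comparing its entropy against a threshold calibrated on known fields and known scrubland.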

Co-creation partnerships: WELL Labs.

 

Identifying and monitoring Orans - TBD

Orans are large community-managed tracts of land in arid areas of Rajasthan, where a delicate balance needs to be maintained between water use for cropping, livestock grazing on trees and grasses, and socio-economic development to meet the aspirations of the local communities. KRAPAVIS has surveyed over 100 Orans and identified key stress markers in each of them. But there are over 5,000 Orans in Rajasthan that need to be mapped and understood to be able to guide further development.

Co-creation partnerships: KRAPAVIS.

 

Scaling computation

All of the above work requires significant computation on large datasets - satellite imagery, remote sensing multispectral data, machine learning, groundtruth validation, error correction, etc. Managing long data flow chains and being able to run the algorithms at scale is crucial to make this vision a reality.

Optimizing Satellite Image Processing Workloads - Vatsal, Rahul

Planetary-scale computation on large datasets, such as those covering India, presents significant challenges in terms of efficiency, particularly for long pixel dependency tasks that current platforms like Google Earth Engine do not handle efficiently. The problem is exacerbated by limited GPU memory and the need for distributed processing to manage the substantial memory requirements of some tasks. To address this, we are developing a framework utilizing GPUs and efficient kernels, combining Python libraries like rasterio for raster I/O with GPU-based libraries like cupy and cucim, to perform hydrological and environmental analyses more effectively. Initial benchmarks for our runoff script have shown promising results, with accuracy and processing times comparable to Google Earth Engine, demonstrating the potential of our approach. Future work will focus on implementing long pixel dependency pipelines for flow accumulation, developing a distributed setup to overcome memory constraints, and utilizing Popper for efficient workflow execution. This approach has the potential to transform large-scale environmental and hydrological analyses by significantly enhancing performance and scalability.