Outlined below are several new ideas for PhD topics aligned with the CoRE Stack. Interested students should write to me.

Last updated: Apr 2025

 

 

Multi-resolution self-supervised learning

Every little piece of data can potentially contribute new information. High-resolution RGB map imagery supports very accurate land-use classification, but only for the present. Multi-spectral time-series from satellites have lower spatial resolution but provide useful information on land-use changes over time. Other factors like terrain, rainfall, temperature, soil, etc. contribute to unique vegetation classes. Self-supervised learning on such multi-resolution data, through transformers or masked auto-encoder architectures, can thus embed both historical and current information about patches of land, which can be helpful in many ways. For example, it can help to understand the characteristics of restoration sites, which have been shaped into their present form by various historical events, or to build a cropping history profile of a farm plot that can explain the plot's soil health. Many such applications can be conceived by training models on long time-series of historical data.

 

Data flow management and scalable architecture for geospatial data processing

Building geospatial datasets and indicators can involve complex data flow pipelines. These start with base datasets, to which algorithms (including data-driven algorithms like ML models) are applied to produce new downstream datasets, to which further algorithms are applied, and so on. The result is a directed acyclic graph, and managing the datasets on it requires version control of both data and algorithms. If a base dataset changes or an intermediate algorithm is updated, it should trigger a re-computation along the impacted downstream paths. Similarly, some of these datasets are temporal in nature and need to be updated at a regular frequency. A very relevant PhD thesis would build out standards and implement this architecture, potentially generalize it to operate over the web so that the data flow graph can span multiple data hosting and computation nodes, and implement scalable versions of many algorithms that can leverage GPUs.
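
As a minimal sketch of the re-computation trigger described above, the following Python snippet represents the data flow graph as a mapping from each dataset to its inputs and, given a set of changed nodes, lists the downstream datasets to rebuild in dependency order. The function names and the dataset and algorithm names (lulc_map, lulc_model, and so on) are hypothetical and chosen only for illustration; algorithms are modeled here as ordinary nodes so that an algorithm update propagates the same way as a data update. This is one possible design, not a description of an existing implementation.

```python
from collections import deque
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def stale_nodes(deps, changed):
    """Return every node downstream of the changed nodes.

    deps maps each node to the set of nodes it depends on (its inputs).
    """
    # Invert the edges: input -> nodes that consume it.
    consumers = {}
    for node, inputs in deps.items():
        for inp in inputs:
            consumers.setdefault(inp, set()).add(node)

    # Breadth-first walk along consumer edges from the changed nodes.
    stale = set()
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for consumer in consumers.get(current, ()):
            if consumer not in stale:
                stale.add(consumer)
                queue.append(consumer)
    return stale


def recompute_order(deps, changed):
    """List the stale nodes so that inputs always precede their consumers."""
    stale = stale_nodes(deps, changed)
    order = TopologicalSorter(deps).static_order()
    return [node for node in order if node in stale]


# Hypothetical pipeline: names are illustrative assumptions only.
deps = {
    "lulc_map": {"satellite_imagery", "lulc_model"},
    "cropping_intensity": {"lulc_map"},
    "drought_indicator": {"rainfall", "cropping_intensity"},
    "terrain_clusters": {"elevation"},
}

print(recompute_order(deps, {"lulc_model"}))
```

In this hypothetical pipeline, updating lulc_model marks lulc_map, cropping_intensity, and drought_indicator for re-computation, while terrain_clusters is left untouched. A production system would additionally record data and algorithm versions on each node and schedule periodic refreshes for temporal datasets.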

 

Factoring climate projections in natural resource management and planning

Various modeling projects, such as the ones outlined in ongoing work, currently do not take climate projections into account. We want to build enhanced versions of these methods that respond to climate projections, especially of extreme weather events, and factor the impact of those projections into their recommendations for planning and management. In addition, recent research outlines the effect of land-use changes on climate patterns, such as the role that cropping cycles, forest restoration, urbanization, and air pollution can play in shaping them, and highlights the need to consider feedback loops when planning at large geospatial scales. A useful model would be one that can operate at multiple levels to relate local action to regional-scale outcomes.

 

Computationalizing a relational model for ecological management

Recent research, especially on forest ecological systems, has highlighted the complex relationships that exist between different trees, tree species, and other flora and fauna. Many of these relationships have been published in research papers, but many others remain undiscovered; some of these may be predictable from observed species behavior. First, we can use the advances made in LLMs to extract published relationships automatically into a structured database. Second, we can use remote sensing and other data to track forest ecosystems and use these observations to predict new relationships, much like how new chemicals are discovered. Third, these relationships can be used to build better forest restoration plans and go beyond some of the ongoing work in our lab.

 

Understanding agroecology through extensive instrumentation

Imagine if you could fly repeated drone runs over farm plots to observe the whole crop cycle: from field preparation (the extent of tilling, and the ridge and furrow spacing maintained), through the growth stages (exactly when flowers, fruits, and seeds appeared), to harvest and any post-harvest field treatment such as mulching. Imagine, additionally, that you had sensors that regularly reported soil moisture, air temperature, soil temperature, etc., along with soil health in terms of acidity, NPK, soil organic carbon, etc. If such data could be collected and matched with remotely sensed satellite data, then detailed crop growth models could be built and simulated to predict crop yield from satellite data. This can be invaluable for providing precision advisories to smallholder farmers and helping them make crop insurance claims without having to deploy expensive sensors in their fields.