Data flow management and scalable architecture for geospatial data processing
Building geospatial datasets and indicators can involve complex data flow pipelines: base datasets are fed to algorithms, including data-driven ones such as ML models, to produce new downstream datasets, to which further algorithms are applied, and so on. The result is a directed acyclic graph in which version control of both data and algorithms is critical for managing the datasets. A change to a base dataset or an update to an intermediate algorithm should trigger re-computation along the affected downstream paths. Similarly, some of these datasets are temporal in nature and need to be updated at regular intervals. A PhD thesis that develops standards and implements the architecture, potentially generalizes it to operate over the web so that the data flow graph can span multiple data hosting and computation nodes, and implements scalable versions of many algorithms that can leverage GPUs would be a very relevant contribution.
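As a rough illustration of the re-computation trigger described above, the following Python sketch fingerprints each derived dataset from its algorithm version and the fingerprints of its inputs, so that a change anywhere upstream marks every affected downstream dataset as stale. The class name DataFlowGraph, its methods, and the idea of hashing version strings are assumptions made for this sketch, not an existing standard or the architecture the thesis would define.

```python
import hashlib
from graphlib import TopologicalSorter


class DataFlowGraph:
    """Hypothetical sketch: version-aware DAG of datasets and algorithms."""

    def __init__(self):
        self.sources = {}       # base dataset name -> data version string
        self.derived = {}       # derived dataset name -> (algorithm version, input names)
        self.computed = {}      # dataset name -> fingerprint at last computation

    def add_source(self, name, version):
        self.sources[name] = version

    def add_derived(self, name, algo_version, inputs):
        self.derived[name] = (algo_version, tuple(inputs))

    def _order(self):
        # Map each node to its predecessors; static_order yields inputs first.
        deps = {name: set() for name in self.sources}
        for name, (_, inputs) in self.derived.items():
            deps[name] = set(inputs)
        return list(TopologicalSorter(deps).static_order())

    def fingerprints(self):
        """Fingerprint every node from its own version and its inputs' fingerprints."""
        fp = {}
        for name in self._order():
            if name in self.sources:
                payload = "source:" + self.sources[name]
            else:
                algo_version, inputs = self.derived[name]
                payload = algo_version + "|" + "|".join(fp[i] for i in inputs)
            fp[name] = hashlib.sha256(payload.encode()).hexdigest()
        return fp

    def stale(self):
        """Derived datasets whose stored fingerprint no longer matches, inputs first."""
        fp = self.fingerprints()
        return [n for n in self._order()
                if n in self.derived and self.computed.get(n) != fp[n]]

    def mark_computed(self):
        """Record that all datasets are up to date with the current versions."""
        self.computed = self.fingerprints()


if __name__ == "__main__":
    # Dataset and algorithm names below are illustrative placeholders.
    g = DataFlowGraph()
    g.add_source("satellite_imagery", "v1")
    g.add_source("rainfall", "v1")
    g.add_derived("land_cover", "classifier-1.0", ["satellite_imagery"])
    g.add_derived("rainfall_anomaly", "anomaly-1.0", ["rainfall"])
    g.add_derived("water_stress", "index-1.0", ["land_cover", "rainfall"])
    g.mark_computed()

    g.add_source("satellite_imagery", "v2")   # a base dataset changes
    print(g.stale())                          # only the impacted downstream paths
```

Fingerprinting by content of the version chain, rather than by timestamps, means the same check covers both data updates and algorithm updates. In a distributed setting the fingerprints could be exchanged between hosting nodes instead of the data itself, though that extension is not shown here.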
Factoring climate projections in natural resource management and planning
Various modeling projects, such as the ones outlined in ongoing work, currently do not take climate projections into account. We want to build enhanced versions of these methods that respond to climate projections, especially of extreme weather events, and factor their impact into recommendations for planning and management. In addition, recent research outlines the effect of land-use changes on climate patterns, such as the role that cropping cycles, forest restoration, urbanization, and air pollution can play in shaping them, and highlights the need to consider feedback loops when planning at large geospatial scales. A useful model would be one that can operate at multiple levels, relating local action to its effects at regional scales.
Computationalizing a relational model for ecological management
Recent research, especially on forest ecological systems, has highlighted the complex relationships that exist between different trees, tree species, and other flora and fauna. Many of these relationships have been published in research papers, but many remain undiscovered; some of these, however, may be predictable from observed species behavior. First, we can use advances in LLMs to extract the published relationships automatically into a structured database. Second, we can use remote sensing and other data to track forest ecosystems and use these observations to predict new relationships, much as new chemicals are discovered. Third, these relationships can be used to build better forest restoration plans that go beyond some of the ongoing work in our lab.