May 31, 2023
Interview
Introducing a new series of conversations with people at the intersection of AI and EO
Berend Schuit is my colleague and a PhD candidate at SRON Netherlands Institute for Space Research, where he studies methane as part of SRON's Earth programme. My own research builds directly on his work on methane plume detection, and he helped me get up to speed on the challenges of detecting methane emissions with machine learning. In other words, he has loads of experience answering my questions, so I was excited to talk to him!
How did you join this research group?
I completed the Spaceflight master's at TU Delft. During the course "Planetary Sciences", where we learned about measuring properties of exoplanets and moons, we had a guest lecture by Prof.dr. Ilse Aben [Senior Scientist & methane expert at SRON] about detecting methane from space. I already had an interest in ML and AI, as well as a personal interest in climate. This led to my master's thesis project at SRON, studying the detection of methane plumes. I was very excited about this project because it united those two interests with the topic of my master's degree, and I am happy that I can now continue this work as a PhD candidate. I now work on finding the locations of large methane emissions by looking for plumes.
Can you tell me about your research?
I work with data from the TROPOMI instrument, which was launched in 2017. The instrument was initially intended to obtain long-term averages of gas concentrations above large regions, although we also expected to see some oil & gas leaks. As it turns out, the data quality is so high that large methane plumes are visible even in individual measurements. This is possible even though each pixel covers a very large area, 7 × 5 km.
A lot of data is available. If you know a location with likely large emissions, such as an oil or gas plant, it is very easy to go through all the data from the past 5 years and find exactly what you need. TROPOMI also gives us the potential to monitor the whole Earth, including locations we are not yet aware of, for example sites whose emissions are intermittent rather than constant. The catch is that you would need to inspect all of that data manually, which would take an incredible amount of work.
The goal of my research is to use the existing TROPOMI data to train a machine learning model that automatically sifts through the data, recognising methane plumes and distinguishing real plumes from other signals that resemble them. The model can then be used to find plumes in the new data that is continuously coming in.
How do you use ML in your work?
In contrast to a case study, we want to find locations with high methane emissions that we do not know about yet. There is simply too much data to do this manually. We can use ML as a tool to efficiently find the data we're interested in with minimal investment of human time. However, to achieve that, we first need a labelled dataset to train the model, and it was a lot of work to create a dataset that's usable for ML.
How did you learn about ML?
I took an elective course on Neural Networks at LIACS [the computer science department at Leiden University, employer of yours truly], because I was interested in learning more about neural networks and machine learning. But that was just a single course, so beyond that I read a lot online. The course gave me a foundation and taught me how to look for the information I needed.
Where do you find information about ML?
Mostly just googling; I don't rely on a specific source. The documentation of Keras and TensorFlow is really good, and there are plenty of examples online from people who have done similar things for other applications. Apart from that, I watched online lectures that I found interesting, and I read a number of papers.
What is your biggest struggle with using ML?
One of the biggest challenges I encountered was related to the training data. The first version of my model struggled to classify many unseen images because they were different from the ones in the training set. For instance, the training set initially included fairly clear examples of images with plumes and images without plumes. In the real data, however, there were many images that looked like plumes, or were classified as plumes, but where something else was going on. Along the way, I found more of these challenging cases and was able to make the training set more representative. Building and improving this dataset was a very time-intensive process.
What is your favourite thing about ML?
I think the coolest thing about ML is the potential to fully automate a task that is actually really complex. Labelling methane plumes is already challenging for humans: you need quite a lot of domain knowledge to classify these images accurately. Yet ML makes it possible to automate this task. All you need is a CNN and enough training data. Models trained on existing data generalise to new data surprisingly well, with pretty accurate results. It's awesome that you can develop a monitoring system for a task that you initially thought would be too challenging to automate.
What are you missing to use ML better in your work?
More training data, which is always the problem, of course. Obtaining good data is an aspect of ML that just takes a lot of time. We do have a good dataset now, but only for one specific problem and one satellite. For a different problem, you will need to build a new dataset. Creating my methane plume dataset took up the most time in the research project, more than developing the model. I expect the same will be true for my future research projects.
How do you see the future of ML in EO?
I think the use of ML in EO is going to increase a lot. If you look specifically at methane, the whole research field is really new. Before the launch of TROPOMI, most satellites produced only a few measurements per month; now we have daily global coverage. There is a lot happening with high-resolution satellites, and more methane satellites will be launched in the medium term. This is amazing news, but the amount of data coming in every day will increase rapidly, and it would be really difficult to analyse all of this new data manually. ML could be a solution for processing all of it. Another big advantage of ML is that a model can run 24/7 and alert you when it finds something interesting, which lets you detect large emissions sooner. People can't do that. I think ML will be instrumental in both respects: handling the growing amount of data and flagging emissions more quickly. It will help us keep making progress in methane emission monitoring.