In recent years, more and more satellites have been sent into orbit. More satellites mean more space data traffic going back and forth, which significantly increases the chances of misconfigurations and misalignments when coordinating all this communication. As a result, radio frequency interference has become a severe problem for satellite operators.
We designed an intelligent AI-based satellite signal processing system with Antwerp Space and ESA. This edge AI solution detects interference and requires AI algorithms that can retrain themselves autonomously onboard the satellite. Building solutions for the space industry requires a different approach, as these systems must work autonomously in space: we cannot just go up there if something is wrong. While working on this and several other space projects, we picked up some best practices that apply to space projects, but you could also use them for other AI projects. Let’s dive in.
We had to train an AI model to detect interference for the satellite signal processing system. In this case, we had the unique opportunity that our partner Antwerp Space could generate as much data as we wanted.
As a rule of thumb, the more data you can feed your models, the better your results will be. This is true for all AI projects, and in this project, our results indeed improved as we added more data. What we also noticed, however, is that more data made our experiments take longer to complete. The lead time got so long (think 48 hours for one experiment) that we lost much of our freedom to experiment.
You should aim to find the sweet spot between accuracy and lead time. First, use a smaller but representative dataset for your experiments. Once you have results, apply those insights and conclusions to adjust the model, and only then retrain it with all available data. This way, you can iterate on model architecture adjustments much faster.
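As an illustration, here is a minimal sketch of this workflow in Python. It assumes a labeled dataset held in arrays and uses scikit-learn to draw a class-balanced subset; the function name and the 10% fraction are our own illustrative choices, not a prescription.

```python
from sklearn.model_selection import train_test_split

def make_experiment_subset(X, y, fraction=0.1, seed=42):
    """Draw a small, class-balanced subset for fast experiments."""
    # Stratifying on the labels keeps the class distribution of the
    # full dataset intact in the smaller subset.
    X_small, _, y_small, _ = train_test_split(
        X, y, train_size=fraction, stratify=y, random_state=seed
    )
    return X_small, y_small

# Iterate quickly on the subset...
# X_small, y_small = make_experiment_subset(X_full, y_full)
# ...and retrain only the final candidate model on (X_full, y_full).
```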
Building on the idea that your partner can generate and simulate the data, consider taking control of part of the simulation process yourself. It is incredibly convenient to have a piece of the simulation in your own hands so that you can, for example, model noise. You can manipulate the data in any way you want, and the more control you have over your data, the better you can augment it.
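To make the noise-modeling idea concrete, here is a small sketch that corrupts a simulated baseband signal with white Gaussian noise at a chosen signal-to-noise ratio. The function name and the SNR sweep are illustrative assumptions, not the project’s actual simulation code.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng=None):
    """Add white Gaussian noise to a (possibly complex) signal at snr_db."""
    if rng is None:
        rng = np.random.default_rng()
    signal_power = np.mean(np.abs(signal) ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    if np.iscomplexobj(signal):
        # Split the noise power evenly over the real and imaginary parts.
        noise = np.sqrt(noise_power / 2) * (
            rng.standard_normal(signal.shape)
            + 1j * rng.standard_normal(signal.shape)
        )
    else:
        noise = np.sqrt(noise_power) * rng.standard_normal(signal.shape)
    return signal + noise

# Sweeping the SNR turns one clean simulated signal into many
# augmented training examples:
# augmented = [add_noise_at_snr(x, snr) for snr in range(0, 21, 5)]
```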
We have the necessary expertise in artificial intelligence. Our partners have space-specific domain knowledge. And we needed to build a bridge between these two worlds to deliver the best results possible.
As we worked on our model, we noticed we needed more information about the space industry to build the best possible solution. You can get to a solution in many different ways, but domain-specific knowledge gets you there the fastest. Having the necessary information goes both ways: your client might need to learn a lot about AI, and if they want to understand the solution, you should give them the information they need to know why something does or does not work.
We think it’s essential to educate the client so they know the limitations of a project and which data they should deliver so you can achieve optimal results. Again, this goes both ways: in the end, we should have enough space knowledge to interpret the results and fine-tune the model correctly. Without that knowledge, you probably won’t be able to identify why your model does not perform as well as you expected.
Imagine that your results aren’t as good as you thought they would be. As an AI engineer without domain knowledge, you might conclude your model is simply bad; with some expertise in the domain, you would recognize a typical, explainable error. Since explaining this in signal interference terms is quite complex, here is an oversimplified analogy. Imagine you’re building a model that has to identify animals. You have a lot of domain knowledge (you can identify all animals yourself) and notice that the model can’t differentiate between lions and tigers, although it does differentiate between lions and elephants. As an expert, you know the difference between lions and tigers is minimal, so it’s logical that the model makes this error.
To counter this problem, you can do many things. You can educate yourself or hire an expert. At Edgise, we are actively expanding our space knowledge and building bridges between the AI world and the space industry.
You should always get to know your data thoroughly. Learn about it, analyze it, and visualize it before you start working. You shouldn’t just take the data from the client and expect it to be high quality from the get-go. As an AI engineer, you are responsible for verifying that your data is as high-quality and complete as possible in all respects. If it isn’t, you need to capture more data or capture it differently.
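What that first pass can look like is sketched below for tabular data in a pandas DataFrame; for raw signal data you would plot spectrograms instead, but the idea is the same. The function name and the checks shown are a minimal, assumed starting point rather than a complete audit.

```python
import pandas as pd

def first_look(df: pd.DataFrame, label_col: str) -> None:
    """A first-pass sanity check before any modeling happens."""
    print(df.shape)                       # how much data do we actually have?
    print(df.dtypes)                      # are the column types what we expect?
    print(df.isna().sum())                # missing values per column
    print(df[label_col].value_counts())   # is the class balance reasonable?
    print(df.describe())                  # ranges, outliers, constant columns
```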
We get it: we are often eager to build the actual model instead of focusing on the data first. The pressure to deliver results quickly often overshadows the necessary first step of getting the required data. In the end, however, you won’t get good results if you skip that step.
This case taught us that when you can simulate the data yourself, analyzing it is more straightforward. We asked ourselves whether everything we needed was available in the simulated data, and if not, how we could add it.
Or, as the well-known phrase goes: keep it simple, stupid. Keep this in mind when you start building your model’s architecture. In this case, we realized it too late, but you shouldn’t have to reinvent the wheel: there are already many reliable, proven architectures. Read papers to see what other people are using to build their solutions.
We built our own architecture first, and in the end, we tested an existing architecture that worked almost as well. If we were to start the project all over again, we would choose something that had already proven its worth. Of course, we did not start from scratch, as we already have a lot of experience building model architectures for different solutions. Still, as this is a different domain, other approaches could work just as well or even better.
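To make this concrete, here is a minimal PyTorch sketch of reusing a proven, published architecture instead of designing one from scratch. The choice of ResNet-18 and the framing of signal spectrograms as images are hypothetical illustrations, not what we used in the project.

```python
import torch
from torchvision import models

# Start from a well-published backbone and adapt only the head,
# e.g. for a binary interference / no-interference decision.
backbone = models.resnet18(weights=None)
backbone.fc = torch.nn.Linear(backbone.fc.in_features, 2)
```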
There might be some discussion about this best practice, but automated reporting in Slack is a true game-changer. It lets you set up a framework to log and analyze results more quickly and step in if something is off.
For instance, we once started an experiment on Friday afternoon, knowing it would take the whole weekend to run, but unfortunately, a bug caused it to stop. We only noticed this on Monday, so we lost a lot of time. With automated reporting in Slack, we might have seen the issue sooner, as it is an app you are likely to open a few times over the weekend. The barrier to checking your Slack notifications is much smaller than opening up your laptop and checking the results manually.
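Setting this up can be as small as the sketch below, which posts status updates through a Slack incoming webhook; the webhook URL is a placeholder you generate in your own workspace, and the reporting hooks in the comments are assumed examples.

```python
import requests

# Incoming-webhook URL generated in your Slack workspace (placeholder here).
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

def report(message: str) -> None:
    """Post a one-line status update to the team channel."""
    requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=10)

# Typical hooks in a long-running experiment:
# report(f"Epoch {epoch}: val_loss={val_loss:.4f}")
# try:
#     run_experiment()
#     report("Experiment finished.")
# except Exception as exc:
#     report(f"Experiment crashed: {exc!r}")  # the weekend-bug scenario
#     raise
```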
Of course, Slack reporting is also efficient for keeping all stakeholders up-to-date. Think of a project manager who wants to check the results of the latest experiments and keeps disturbing the AI engineer. With automated reporting, results are no longer centralized with the AI engineer, and you could even add your client to the channel if that is relevant.
Setting up this kind of automation is a relatively small effort. It works well for us because we had already used it for other projects, but unfortunately, we did not think of it immediately when we started this one. A lesson learned!
The idea of regression testing is that while you are still iterating on the model, you set up a test flow that exercises the model from start to end. If the regression test fails, you know your model doesn’t do what it should. This way, you can keep iterating with confidence that you won’t put anything buggy into production. This is a necessary step when you deliver the model to your client, as, of course, you do not want to ship an untested model. In the past, this was common practice only for software, but we believe models should be tested too.
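A minimal sketch of such a test, in pytest style, is shown below. The module name interference_model, the file paths, and the 0.95 threshold are hypothetical placeholders; the point is that the whole pipeline runs on a frozen evaluation set and must stay above a fixed quality bar.

```python
import numpy as np

# Hypothetical project helpers; substitute your own loading/metric code.
from interference_model import load_model, accuracy

def test_pipeline_end_to_end():
    """The full pipeline must run and stay above a frozen quality bar."""
    model = load_model("artifacts/latest")
    X = np.load("tests/data/frozen_inputs.npy")   # fixed evaluation set,
    y = np.load("tests/data/frozen_labels.npy")   # checked into the repo

    preds = model.predict(X)

    # The output contract must not silently change...
    assert preds.shape == y.shape
    # ...and quality must not regress below the last accepted release.
    assert accuracy(preds, y) >= 0.95
```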
In addition, the machine learning pipeline should be delivered as a whole. This means pre- and post-processing are included with the AI model and not shipped as separate parts. Imagine a client updating their model without the pre-processing and being surprised it doesn’t work anymore… You want to avoid this scenario at all costs. So, deliver the whole pipeline and not just the model!
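One common way to enforce this, sketched here with scikit-learn, is to bundle the pre-processing and the model into a single Pipeline and ship that one artifact; the scaler, classifier, and file name are illustrative assumptions.

```python
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Pre-processing travels inside the same artifact as the model, so the
# client cannot accidentally run one without the other.
detector = Pipeline([
    ("scaler", StandardScaler()),         # pre-processing step
    ("model", RandomForestClassifier()),  # the actual classifier
])

# detector.fit(X_train, y_train)
# joblib.dump(detector, "interference_detector.joblib")  # ship this one file
# loaded = joblib.load("interference_detector.joblib")
# loaded.predict(X_new)  # scaling happens automatically inside the pipeline
```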
Remember our six best practices for bringing artificial intelligence to space, and to other industries as well: find the sweet spot between accuracy and lead time, gain domain-specific knowledge, get to know your data, keep it simple, set up automated reporting in Slack, and test models as thoroughly as you would test software.
Contact us or check out our other insights.