SOUTH AFRICA
Astronomers, says Professor Mattia Vaccari, “over the last few years have found themselves in the unenviable position of moving from a relatively small data problem to a very big data problem – perhaps in the fastest way ever experienced by a science domain”. AI is crucial to handling massive amounts of data and astronomers are preparing for a data tsunami as new radio telescopes come on stream.
The data challenge for astronomy was driven by advances in cameras and electronics that make it possible to store petabytes of data. Around a decade ago, artificial intelligence began to be deployed to make sense of the burgeoning data, said Vaccari, an Italian-born astroinformatics research professor and director of eResearch at the University of Cape Town, South Africa.
Today, AI is indispensable to astronomy and astrophysics, with their ‘blue skies’ research.
Astronomical telescopes are often best located in the southern hemisphere, from where many of the most interesting or important astronomical phenomena may be seen. This comparative advantage spawned South Africa’s long-standing and major research into the universe.
An enormous increase in astronomy research infrastructure available to South African researchers has been underway for the past decade, alongside the ongoing development of cutting-edge multi-wavelength systems that are aimed at revolutionising understanding of the universe. The telescopes are located in South Africa, under international partnerships of countries that share the funding and research.
Thus, Vaccari told University World News: “Data is a much bigger problem we’re going to face in a few years’ time. That’s what we are trying to prepare for.”
Cutting-edge multi-wavelength telescope systems
The Southern African Large Telescope (SALT) has been in full operation since 2011 and is the southern hemisphere’s largest single optical telescope, with an 11.1 by 9.8 metre mirror. It is at the South African Astronomical Observatory near Sutherland, and is funded by an international consortium including India, Poland, South Africa, the United States and Britain.
The MeerKAT radio telescope comprises 64 radio dishes spread over eight kilometres in the Northern Cape. It was completed in 2018, enabling deep surveys of the sky at radio wavelengths. Its observations look among other things at galaxy formation and dark matter.
MeerKAT is the precursor to the Square Kilometre Array (SKA), an international mega-science project that will be the world’s largest radio telescope when completed in 2028. It will consist of many hundreds of antennas across sites in Southern Africa and Australia, with the SKA Observatory global headquarters in the United Kingdom. SKA will be the leading infrastructure for radio astronomy globally.
Another major telescope project is the Vera C Rubin Observatory in Chile, funded by the American government with headquarters in Tucson, Arizona. From next year, Rubin will kick off a 10-year Legacy Survey of Space and Time (LSST) designed to investigate four areas: probing dark energy and dark matter; taking an inventory of the solar system; exploring the transient optical sky; and mapping the Milky Way.
“When switched on, the LSST will find a million transients in the sky every night. How do you make sense of that? It is only possible using machine learning and AI,” said astronomy Professor Patrick Woudt, interim director of the Inter-University Institute for Data Intensive Astronomy (IDIA).
Data intensive astronomy
The institute came about because of these astronomy telescope developments. IDIA’s goal is to build within the research community the capacity and expertise in data-intensive research to enable South Africa’s leadership of the large SALT, MeerKAT and SKA projects.
“The realisation was that we need to have an infrastructure that can process and manage the tremendous amount of data coming from these telescopes in a way that facilitates researchers to work with it,” Woudt told University World News. IDIA has supported researchers across the world, but particularly in Africa and South Africa, to work with these amazing data products.
“‘Ilifu’ (the Xhosa language word for cloud) is a federated cloud computing resource that’s set up in collaboration with other universities. No single university can do this alone,” said Woudt. The partner universities are Cape Town, Western Cape and Pretoria, with input from the South African Radio Astronomy Observatory and others.
The IDIA platform serves astronomers as well as other researchers, such as bioinformaticians, and for this reason South Africa’s Department of Science and Innovation supported setting up a broader infrastructure that services the needs of multiple research communities.
Researchers run algorithms on the platform, said Woudt, “enabling the discovery of new space phenomena through the data science, machine learning and artificial intelligence tools that we have at our disposal. It is where researchers challenge their data with their models and through machine learning and other tools”.
He gave an example of the massive data challenge: an image of a supermassive black hole with a shadow around it. The image was made by combining observations from telescopes all around the world, and was highly data intensive – this is the speciality of IDIA partner the University of Pretoria.
“Very important is our focus on postgraduate training,” Woudt continued. “We teach a module, data science for astronomy, at postgraduate level. We also use the IDIA ilifu cloud-based platform so that the students have first-hand exposure to what it’s like to work in that kind of environment. So, IDIA is for researchers, but it’s also for training the next generation of researchers.”
Dr Sally Macfarlane is associate director of development and outreach at IDIA. The scale of the data that is and will be coming through the MeerKAT and SKA, she told University World News, is “so unfathomable that we’ve had to develop new systems to be able to handle it”.
As researchers are developing these new systems, they are also developing data skills that have uses beyond astronomy. Among other things Macfarlane looks at how to take these skills and use them in other research areas, as well as to empower the next generation and the public.
For example, she said: “It is pretty amazing how you can apply these skills to the United Nations’ Sustainable Development Goals.” The institute organises initiatives such as hackathons and other short, intensive data courses that are run for different target groups.
AI and blue sky
Vaccari painted a big picture: “Some of the problems we face are ridiculously naive and simple, but they involve some of the most powerful software and hardware technology that humans have developed. It’s an interesting, and humbling, experience, if you think of it. Astronomers are asking pretty boring questions, such as whether a given new star ‘switched on’, but of an incredible amount of data. Our challenge is to find a way to focus on what is interesting – to be able to find the ‘needles’ in an effective manner.”
Soon, said Vaccari: “we will be able to observe most of the sky most of the time, repeating observations every few nights. So, obtaining a snapshot of the sky every few nights in order to see whether there are things that ‘go boom’ in the night and follow them up, because exotic, rapidly exploding objects are very interesting and also point to interesting new physics.
“Exploring images of the full sky with billions of objects on a routine basis, every week, is obviously impossible to do with the naked eye, or with conventional methods because they’re not fast enough. AI is a way to speed things up by a massive factor”.
For instance, researchers developed algorithms that put galaxies into different classes and allow their properties to be studied separately. “We are reusing the very same algorithms in order to tell different galaxy shapes from each other,” said Vaccari.
Discovering phenomena that pop up rapidly, such as supernovas, is another problem that AI is well suited to solve. And last but not least, being able to discern patterns in data allows researchers to identify very peculiar sources.
Astronomaly – Machine-learning discovery
One example comes from Dr Michelle Lochner, a rising star in the world of astronomy based at the University of the Western Cape and South African Radio Astronomy Observatory. Indeed, she agreed, machine learning “is changing the way science is done and is quickly becoming the tool of choice for handling masses of incoming data from current and future telescopes.
“It is only with advanced machine learning technology that we can cope with the data deluge,” including from the Vera C Rubin Observatory and SKA, she told University World News. She is also a South African principal investigator for the Rubin Observatory.
“This incredible influx of data represents both a challenge and an opportunity: groundbreaking scientific discoveries are almost guaranteed if the ‘needle-in-a-haystack’ problem can be solved. Anomaly detection is an active area of research in machine learning,” she said.
Lochner made use of machine learning and AI tools to build an automated anomaly detection framework, which is publicly available, called Astronomaly.
“Astronomaly uses several machine learning algorithms, including cutting-edge deep learning tools to automatically simplify complex data into features as well as machine learning-based anomaly detection algorithms such as isolation forest, to detect rare or never-seen-before sources in astronomical datasets,” Lochner said.
“Astronomaly uses a web-based frontend to obtain input from a human user to improve its anomaly detection capabilities, a branch of machine learning known as active learning. As such, Astronomaly is the first recommendation engine for astronomical anomalies and homes in on interesting sources to present to scientists in a similar way to the recommendations given by popular streaming services,” she added.
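The isolation forest idea Lochner mentions can be illustrated with a minimal sketch. This is not Astronomaly's code: the two-dimensional "features" are synthetic stand-ins for values a deep learning model might extract from images, and the function names, tree count and sample size are assumptions chosen for illustration. The technique builds many random partitioning trees. Points that are separated from the rest after only a few random splits receive a high anomaly score and are ranked first for a human to inspect.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split the points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node):
    """Number of splits needed to reach a leaf, plus a correction for unsplit leaves."""
    depth = 0
    while "split" in node:
        node = node["left"] if point[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + c_factor(node["size"])


def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """Return one score per point in (0, 1]; higher means easier to isolate."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    return [
        2 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
        for p in points
    ]


# Synthetic stand-in data: 200 "ordinary" sources and one unusual source.
data_rng = random.Random(42)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
points.append((8.0, 8.0))  # hypothetical anomalous source

scores = anomaly_scores(points)
# Rank sources from most to least anomalous, as a shortlist for a scientist.
ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
print("Top candidates for human review:", ranked[:5])
```

In practice a maintained library implementation, such as scikit-learn's `IsolationForest`, would normally be used instead of hand-written trees. The human feedback step described above would then adjust this ranking, which the sketch does not attempt.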
Astronomaly, Lochner continued, “has been used to find strong lenses and merging galaxies among over four million optical images of galaxies, flare stars and variable stars among tens of thousands of optical time series data, and, most excitingly, a new type of radio galaxy which could be the remnant of a supermassive black hole merger.”
This last object, called SAURON – a Steep and Uneven Ring of Nonthermal Radiation – was discovered in data from MeerKAT. “The dataset was not particularly large and yet SAURON was not detected by the human scientists working with it. This amazing discovery highlights the potential of combining the raw processing power of machine learning with the experience and intuition of a human scientist,” said Lochner.
University support for AI-related research
IDIA is strongly linked to the eResearch initiative at the University of Cape Town, which supports researchers to use big data and AI to accelerate their research.
The UCT eResearch initiative has been around for a decade. It falls under the deputy vice-chancellor for research and internationalisation, and is a partnership between the university’s ICT services, libraries and research office – all of which have developed IT-based services for researchers.
“The idea was to have a very strong grounding in the research community, and have a two-way communication with researchers,” said Vaccari. “Researchers are often the ones who find out about new tricks and slowly these become accepted and used by the large part of the community. Eventually they become part of standard university support.
“It’s a discovery space where you try and keep your ears on the ground. What is happening both on the industry side of things, because obviously now the cloud industry and the computing industry is big business, and also on the research side,” he explained.
Vaccari said eResearch runs a forum where researchers can share expertise, experience and opportunities for students. “We organise workshops that allow people to start learning about AI, particularly for those who do not have run-of-the-mill courses happening in their faculties. This has been rather helpful. We’ve partnered with other organisations for machine learning and AI algorithms courses in the university for both students and researchers,” he said.
While eResearch engagements are quite informal in nature and mostly geared towards bringing people together and providing a platform where they can read about the latest developments and resources available for training, it also has several initiatives underway, for instance around AI and ethics and in the field of bioinformatics.
However, unsurprisingly, the computing infrastructure at Cape Town is dominated by astronomy and also bioinformatics – the hybrid science that develops methods and software tools to support multiple areas of research across scientific disciplines.
“We found lots of synergies in the data processing and data storage needs between astronomers and bioinformaticians. They’re both big data sciences,” said Vaccari. Astronomy is ‘blue skies’ research while bioinformatics is applied research that is crucial for the livelihoods, health and future of South Africa. They collaborate and work well together, and use similar tools.
Searching around the university, the eResearch unit found “a small but very productive, very well researched pool of staff who are doing hardcore AI research. Developing their own software and leading research in the interpretation of AI algorithms and the interpretability of AI results”, said Vaccari.
But there are also many researchers who dabble in AI, often working with research collaborators in other institutions and countries. “They have interesting questions they would like to ask of their data. They needed a statistical and interpretative toolbox, but that is difficult to acquire,” added Vaccari.
“So we focused on this large long tail of researchers who do not have the technical expertise, to try and assess their needs better. Of course, their needs are partly to understand the overall framework of where AI is going and what is currently possible, but also to train their junior researchers in these disciplines, even if they don’t have an overly technical background.
“I guess the playing field is becoming a little more even, because there are lots of tools that claim to be able to provide a no-code AI experience; being able to run AI and interpret results without any exposure to writing code, to software programming,” he explained.
Vaccari worries that researchers might be lured into the idea of throwing data and an algorithm at a problem, without worrying enough about what the results look like and without interpreting the results properly. On the other hand, there is deep understanding among researchers that AI can be tricky and can upset ways of working that have been developed over decades – or centuries.
“I’m not 100% sure I trust this approach, but it’s interesting to try things out, and to see whether this leads to new insights or whether it leads to bad science. The jury’s still out. Unfortunately, the pace of progress is sometimes too quick for us researchers to keep track of it.”
