As satellites collect larger and larger amounts of data, engineers and researchers are implementing solutions to manage these huge increases.
The cutting-edge Earth science satellites launching in the next couple of years will give more detailed views of our planet than ever before. We’ll be able to track small-scale ocean features like coastal currents that move nutrients vital to marine food webs, monitor how much fresh water flows through lakes and rivers, and spot movement in Earth’s surface of less than half an inch (a centimeter). But these satellites will also produce a deluge of data that has engineers and scientists setting up systems in the cloud capable of processing, storing, and analyzing all of that digital information.
“About five or six years ago, there was a realization that future Earth missions were going to be generating a huge volume of data and that the systems we were using would become inadequate very quickly,” said Suresh Vannan, manager of the Physical Oceanography Distributed Active Archive Center based at NASA’s Jet Propulsion Laboratory in Southern California.
The center is one of several under NASA’s Earth Science Data Systems program responsible for processing, archiving, documenting, and distributing data from the agency’s Earth-observing satellites and field projects. The program has been working for several years on a solution to the information-volume challenge by moving its data and data-handling systems from local servers to the cloud – software and computing services that run on the internet instead of locally on someone’s machine.
The Sentinel-6 Michael Freilich satellite, part of the U.S.-European Sentinel-6/Jason-CS (Continuity of Service) mission, is the first NASA satellite to use this cloud system, although the amount of data the spacecraft sends back isn’t as large as the data many future satellites will return.
Two of those forthcoming missions, SWOT and NISAR, will together produce roughly 100 terabytes of data a day. One terabyte is about 1,000 gigabytes – enough digital storage for approximately 250 feature-length movies. SWOT, short for Surface Water and Ocean Topography, will produce about 20 terabytes of science data a day while the NISAR (NASA-Indian Space Research Organisation Synthetic Aperture Radar) mission will generate roughly 80 terabytes daily. Data from SWOT will be archived with the Physical Oceanography Distributed Active Archive Center while data from NISAR will be handled by the Alaska Satellite Facility Distributed Active Archive Center. NASA’s current Earth science data archive is around 40 petabytes (1 petabyte is 1,000 terabytes), but by 2025 – a couple of years after SWOT and NISAR are launched – the archive is expected to hold more than 245 petabytes of data.
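The daily volumes quoted above can be put in perspective with some quick arithmetic. The short Python sketch below uses only the article's figures, plus one assumption: that both missions deliver data at the quoted rate every day of a 365-day year. The archive projection covers all of NASA's Earth science holdings, so these two missions are not expected to account for it on their own.

```python
# Figures quoted in the article (decimal units: 1 PB = 1,000 TB).
SWOT_TB_PER_DAY = 20
NISAR_TB_PER_DAY = 80
MOVIES_PER_TB = 250  # the article's rough equivalence, not an exact figure

# Combined daily output of the two missions.
combined_tb_per_day = SWOT_TB_PER_DAY + NISAR_TB_PER_DAY

# The same daily volume expressed as feature-length movies.
movies_per_day = combined_tb_per_day * MOVIES_PER_TB

# Assumption: both missions run at the quoted rate every day of a 365-day year.
combined_pb_per_year = combined_tb_per_day * 365 / 1000

print(f"Combined output: {combined_tb_per_day} TB per day")
print(f"Movie equivalent: about {movies_per_day:,} movies per day")
print(f"Yearly total: about {combined_pb_per_year} PB")
```

Under that assumption, the two missions together would add a few tens of petabytes a year, which is comparable in size to the entire current archive.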
Both NISAR and SWOT will use radar-based instruments to gather information. Targeting a 2023 launch, NISAR will monitor the planet’s surface, collecting data on environmental characteristics including shifts in the land associated with earthquakes and volcanic eruptions, changes to Earth’s ice sheets and glaciers, and fluctuations in agricultural activities, wetlands, and the size of forests.
Set for a 2022 launch, SWOT will monitor the height of the planet’s surface water, both ocean and freshwater, and will help researchers compile the first survey of the world’s fresh water and small-scale ocean currents. SWOT is being jointly developed by NASA and the French space agency Centre National d’Études Spatiales.
“This is a new era for Earth observation missions, and the huge amount of data they will generate requires a new era for data handling,” said Kevin Murphy, chief science data officer for NASA’s Science Mission Directorate. “NASA is not just working across the agency to facilitate efficient access to a common cloud infrastructure, we’re also training the science community to access, analyze, and use that data.”
Currently, Earth science satellites send data back to ground stations where engineers turn the raw information from ones and zeroes into measurements that people can use and understand. Processing the raw data increases the file size, but for older missions that send back relatively small amounts of information, this isn’t a huge problem. The measurements are then sent to a data archive that keeps the information on servers. In general, when a researcher wants to use a dataset, they log on to a website, download the data they want, and then work with it on their machine.
However, with missions like SWOT and NISAR, that won’t be feasible for most scientists. If someone wanted to download a day’s worth of information from SWOT onto their computer, they’d need 20 laptops, each capable of storing a terabyte of data. If a researcher wanted to download four days’ worth of data from NISAR, the download would take about a year on an average home internet connection. Working with data stored in the cloud means scientists won’t have to buy huge hard drives to download the data or wait months as numerous large files download to their system. “Processing and storing high volumes of data in the cloud will enable a cost-effective, efficient approach to the study of big-data problems,” said Lee-Lueng Fu, JPL project scientist for SWOT.
Infrastructure limitations won’t be as much of a concern, either, since organizations won’t have to pay to store mind-boggling amounts of data or maintain the physical space for all those hard drives. “We just don’t have the additional physical server space at JPL with enough capacity and flexibility to support both NISAR and SWOT,” said Hook Hua, a JPL science data systems architect for both missions.
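The laptop and download-time estimates above can be checked with a rough calculation. The Python sketch below assumes decimal units (1 terabyte = 10^12 bytes) and a sustained 80 megabits per second; that speed is an assumed stand-in for an “average home internet connection,” since the article does not give a figure. The function name `download_days` is purely illustrative.

```python
BYTES_PER_TB = 10**12      # decimal terabyte
SECONDS_PER_DAY = 86_400

def download_days(terabytes, megabits_per_second):
    """Days needed to download `terabytes` at a sustained link speed."""
    total_bits = terabytes * BYTES_PER_TB * 8
    seconds = total_bits / (megabits_per_second * 10**6)
    return seconds / SECONDS_PER_DAY

# One day of SWOT data (20 TB) spread across 1 TB laptops.
laptops_needed = 20 // 1

# Four days of NISAR data at 80 TB per day.
nisar_four_days_tb = 4 * 80

# Assumption: 80 Mbps sustained, with no protocol overhead or interruptions.
days = download_days(nisar_four_days_tb, 80)

print(f"Laptops needed for one day of SWOT data: {laptops_needed}")
print(f"Days to download four days of NISAR data: {days:.0f}")
```

At the assumed speed, the 320 terabytes take a little over a year, consistent with the article's estimate. A slower connection would lengthen the wait proportionally.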
NASA engineers have already taken advantage of this aspect of cloud computing for a proof-of-concept product using data from Sentinel-1. The satellite is an ESA (European Space Agency) mission that also looks at changes to Earth’s surface, although it uses a different type of radar instrument than the ones NISAR will use. Working with Sentinel-1 data in the cloud, engineers produced a colorized map showing the change in Earth’s surface from more vegetated areas to deserts. “It took a week of constant computing in the cloud, using the equivalent of thousands of machines,” said Paul Rosen, JPL project scientist for NISAR. “If you tried to do this outside the cloud, you’d have had to buy all those thousands of machines.”
Cloud computing won’t replace all of the ways in which researchers work with science datasets, but at least for Earth science, it’s certainly gaining ground, said Alex Gardner, a NISAR science team member at JPL who studies glaciers and sea level rise. He envisions that most of his analyses will happen elsewhere in the near future instead of on his laptop or personal server. “I fully expect in five to 10 years, I won’t have much of a hard drive on my computer and I will be exploring the new firehose of data in the cloud,” he said.