June 4, 2024

Why Satellogic Released Six Million Images for Free

Co-Founder Gerardo Richarte outlines the rationale behind a recent open release of a massive dataset designed for AI model creation.

Amid the current boom in artificial intelligence, it's becoming increasingly clear that access to high-quality data is paramount to any given solution's success. A strong, functional algorithm is important, but it doesn’t matter without the data. These models need good data – “garbage in, garbage out” is a common saying in AI – and they also need a lot of it. That last part can be particularly difficult for sectors like Earth Observation, which has a limited number of satellites and high data acquisition costs. There’s little doubt that the Earth Observation sector would benefit from a variety of AI models, but for those outside of the largest companies and organizations, getting hold of the data needed to create these models is easier said than done.

That is, unless someone happens to just give it away for free.

This is exactly what Satellogic, a Montevideo-based Earth Observation company, did last month. In early May, the company announced that it had released a large dataset of high-resolution satellite imagery from its archive to support the training of foundation EO models. Specifically, it released six million images, each 384 x 384 pixels, covering three million unique locations around the world. The data, available on Hugging Face, was released under a Creative Commons CC-BY 4.0 license, which allows for commercial use of the data with attribution.

It’s natural to wonder why a company like Satellogic, whose business in large part centers around providing this data to end users for a price, would release all of this data. Gerardo Richarte, one of Satellogic’s co-founders and current Chief Innovation Officer as well as Chief Information Security Officer, tells Geo Week News that it’s a common reaction. At a recent summit he attended in Washington, DC, he said, people came up to ask whether the company was crazy.

Image from Satellogic's open release

Perhaps they are, but Richarte passed along valid reasons for the decision, which largely tie back to the points made at the top. The industry needs these models, since there is simply too much data to work through without them, and if they are to serve the industry’s needs they have to be trained on high-quality data. Richarte believes that visual language models (VLMs) are going to “get big,” and that while current models perform well on some images, they are not yet what the EO sector really needs.

“We’ve been trying many VLMs out there, and though they perform really well on general images, on satellite images – specifically of our resolution and detail – they don’t perform so well. Even models that were trained on satellite images, usually they train on Sentinel [with] 10 meter resolution, so they don’t perform very well when you want to see features that you can only see at high resolution.”

Satellogic, according to Richarte, delivers imagery at 70 centimeter native resolution and 50 centimeters for super resolved imagery. The released dataset also includes imagery from Sentinel and other open data satellites.

When putting together this release, the company was deliberate about which data to include, Richarte says. For one thing, they had to decide whether to include near infrared data, which they did. They also had to decide whether to filter the archive and put out only the best images they had to offer. Ultimately, they decided against that and included some images that still had artifacts and other attributes that make them less than ideal. That was a deliberate choice.

“We decided not to filter from a quality point of view. We are releasing images with artifacts, because we want people to see the real thing. They’re going to train models, and maybe they want to train more to detect and improve on the artifacts. So, we put it out and if they want to make a model that even works when there are artifacts, we give them that material.”

Since this is going out to the general public for anyone to utilize for model creation, it’s hard to say exactly what kind of models will be created from this data. Because of that, diversity in the dataset was important. In addition to leaving in some images with artifacts, they also tried to include imagery from all parts of the world and all kinds of landscapes (and marine areas) so that asset detection models, or any other kind of model, could be covered.

Image from Satellogic's open release

Additionally, as mentioned above, the release includes six million images covering three million areas, which means that there is some overlap where certain areas have multiple images included in the release. It’s not the case that there are exactly two for each area – some may have one, others may have three or more – but there is overlap so that change detection models can be created via this release as well. Richarte told Geo Week News that it is their “intention to update the dataset periodically,” though he didn’t make a commitment as to what the cadence of those releases would look like.
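To make the overlap point concrete, here is a minimal Python sketch of how repeat captures of the same area could be turned into before-and-after pairs for a change detection model. The record layout (a location ID, a capture date, and an image ID) and the sample values are hypothetical and used purely for illustration; they are not taken from Satellogic's actual dataset schema.

```python
from collections import defaultdict
from itertools import combinations


def pairs_for_change_detection(records):
    """Group (location_id, capture_date, image_id) records by location and
    return (location_id, earlier_image, later_image) tuples for every
    location that was captured more than once."""
    by_location = defaultdict(list)
    for location_id, capture_date, image_id in records:
        by_location[location_id].append((capture_date, image_id))

    pairs = []
    for location_id, captures in by_location.items():
        captures.sort()  # ISO-formatted dates sort chronologically as strings
        for (_, earlier), (_, later) in combinations(captures, 2):
            pairs.append((location_id, earlier, later))
    return pairs


# Hypothetical sample: one area seen twice, one seen once, one seen three times.
records = [
    ("loc-a", "2022-03-01", "img-1"),
    ("loc-a", "2023-07-15", "img-2"),
    ("loc-b", "2022-11-20", "img-3"),
    ("loc-c", "2021-05-02", "img-4"),
    ("loc-c", "2022-05-02", "img-5"),
    ("loc-c", "2023-05-02", "img-6"),
]
pairs = pairs_for_change_detection(records)
```

In this sample, the area with a single image contributes no pairs, while the area with three images contributes three, which is why an uneven overlap can still be useful for change detection.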

It’s clear from talking to Richarte that Satellogic views the development of these models as an important next step in the Earth Observation industry, and views this release as a crucial step toward that goal. That said, he’s also clear that they see this as a way to build their business as well, with a couple of avenues toward profit.

For the first, he points to larger companies and organizations who may be utilizing this open data for their own models. This dataset may be sufficient to start building these models, but ultimately the organizations are going to want to build large models, for which they’ll need more data. In that case, they would have to go to Satellogic to purchase more of that data.

Additionally, Richarte points to smaller organizations creating models, and in those cases it would be the end users of said models who would likely turn to Satellogic.

“We expect people to start publishing models, right? When the end user wants to use those models, they really want to run them on fresh data,” Richarte told Geo Week News. “You only get fresh data – not from this dataset – from us or from our partners.”

That revenue will likely require some lead time, but Satellogic believes the bet will pay off. And even beyond that, Richarte does make clear that they see true value for the industry, and the world more broadly, around the creation of these models.

“We really believe this is the beginning and we believe that the only way to work through a massive archive of data is through AI.”
