Scientists, academics, programmers, librarians, journalists, activists, and concerned citizens came together at Northeastern University’s Snell Library recently to preserve at-risk federal data.
The hack-a-thon, organized by Data Rescue Boston, was one of several such events that have taken place across the country in the last few months as public interest in protecting scientific data increases along with the risks posed by President Donald Trump’s anti-science rhetoric.
Since Trump took office, government agency websites have removed references to climate change, Environmental Protection Agency employees have been forbidden from talking to the press or publishing new research, and harsh budget cuts have been proposed for the EPA and the National Oceanic and Atmospheric Administration.
To counter the fear and despair many are feeling at the idea of the government sabotaging its own scientific data, a community dedicated to data rescue has risen up in resistance.
“For participants, it is something really concrete that you can do that is not just feeling that you’re unsettled by the way this administration approaches science,” said Sara Wylie, an assistant professor of sociology and health science at Northeastern and one of the organizers of the event. “It is a really good, tangible activity for people to become a part of where they meet people of like minds. That is a really important outcome, the community-building part of it.”
Born of a collaboration between the Environmental Data & Governance Initiative (EDGI), DataRefuge, Boston Civic Media, the Engagement Lab at Emerson College, and Northeastern’s Social Science Environmental Health Research Institute, the event was held in order to archive data from the U.S. Fish and Wildlife Service.
Hack-a-thon attendees split into three groups: harvesters, seeders, and storytellers.
The harvesters wrote code in three different languages to scrape data on topics ranging from water quality and snow cover to grain phenotype and genotype. The seeders collected 1,100 URLs from the U.S. Fish and Wildlife Service pages and nominated them to the Internet Archive — a nonprofit digital library that has saved more than 286 billion web pages. The storytellers created signs for the March for Science, which took place on April 22, made a visualization of #MyEPA tweets over time, and worked on redesigning the EDGI website.
The event was primarily the work of EDGI, which formed in December with the goal of demonstrating that there is a public interest in the existence and the preservation of federal data. In addition to putting on data rescue events, EDGI is monitoring 25,000 federal websites for any change to data and conducting interviews with people leaving federal agencies to record the human experience of this historical transition.
“The term ‘data rescue’ has been uniquely picked up for this moment, but the concept of preserving data is certainly not new,” said Wylie, who is also a founding member of EDGI.
There has always been some concern about preserving data when the White House transitions from one president to another. In 2008, the End of Term Harvest Project was formed to harvest federal government domains in order to create a portrait of the George W. Bush administration and track changes made after Barack Obama became president. But its goal was to protect against more benign threats than those federal agencies are currently facing.
“We’re used to hearing the occasional story of lost research data, but those are generally situations that happen either accidentally or through benign neglect, when data hardware or formats become obsolete and can no longer be accessed,” said Jen Ferguson, the research data management librarian at Northeastern and an organizer of the event. She cited the famous incident in which NASA accidentally taped over footage of the moon landing.
“But this is on a much larger scale, the idea that data could be disappeared en masse to serve a point of view — in other words, to remove or obscure evidence of climate change,” said Ferguson.
According to Ferguson, this is the first time in American history that federal data and the scientific research it supports have been so at risk from the government itself. However, one only has to look as far as Canada to see historical similarities.
In 2006, Stephen Harper became the prime minister of Canada and implemented a broad policy forbidding scientists from speaking to the press about their work. Under Harper, the government cut funding for scientific research and slashed the size of agencies. With Prime Minister Harper, as with President Trump, anti-intellectual sentiment was used to discredit the research that scientists were doing for government agencies.
“There was a feeling that the government was not interested in expert opinion, and I think it’s the same kind of thing that you are probably going to see with the new [Trump] administration,” David Tarasick, a senior research scientist at Environment and Climate Change Canada (the equivalent of the U.S. EPA), told Scientific American in December.
The parallels between what is happening now and what happened under the Harper administration are one of the reasons that the data rescue movement caught on so quickly in America, according to Ferguson. And because they have experienced it before, Canadian scientists have been quick to join the data rescue effort.
It is illegal to destroy government data, but access can always be denied in other ways.
“Because the data that we’re talking about is accessible over the internet, no one would even necessarily have to destroy the data. Just breaking links to it would serve the same purpose,” said Ferguson. “And while other copies of a given data set no doubt exist in the world — it’s been backed up and downloaded by people — in today’s political climate, can you imagine the argument that would ensue over the legitimacy of a data set that was in the hands of a scientist studying climate change?”
Lost scientific data has the potential to set scientific research back very far, very quickly — and in some cases permanently. It would also change the lives of the millions of Americans who work with and rely on data.
“All of science depends on data if you think about it,” said Wylie. “Climate science completely depends on our knowledge of past records of temperature or past records of water depth. There is harm to any kind of scientific enterprise if there is a loss of a data set that is key to being able to follow patterns in the world.”
Ultimately, the data rescue movement is not about fear, but about civic engagement with public data (data funded by tax dollars) and building a community that will last well beyond the current threats to science.
Data rescue “is certainly something that was started because of the election, but I think the ideas of it are hopefully much longer lasting than just a few years,” said EDGI member and Harvard doctoral candidate Maya Anjur-Dietrich. “There is also the question of federal data being something that people are aware of and interested in and taking ownership of because this is your data, this is public data.”
She added: “And so while this idea that a Trump presidency and an anti-science perspective has scared people, I think having civic engagement with public knowledge is something that ideally we would have all the time.”
This multimedia story was produced as part of WGBH News contributor Dan Kennedy's class in Digital Storytelling and Social Media at Northeastern University.