The Deluge of Data

by Michael Mullaney on July 26, 2011

So, are the family photos I burned to a DVD and uploaded to Shutterfly going to last forever?

Sadly, probably not. This is the kind of problem that Rensselaer Vice President for Research and Computer Science Professor Fran Berman has been studying since she was in grad school. But instead of photos on a DVD, she’s thinking about digitized hospital records, government data, and other massive data sets.

Making another backup of your wedding snapshots is one thing. Making a backup copy of the digital reserves of the Library of Congress is quite another.

Berman talks about this tricky challenge of digital data preservation in today’s Academic Minute, which aired this afternoon on NPR affiliates all across the country.

You can listen to the wonderful 90-second piece here. The local NPR affiliate, WAMC Northeast Public Radio, launched its Academic Minute segment last year to great success. What started as a regional endeavor is now making waves nationally.

Here is the transcript of Berman’s Academic Minute:

Do you have digital photos of friends and family? Do you keep your tax records on a computer? Do you listen to music on your iPod?

Today, digital information is everywhere, and comes in many forms: text, video, audio, sensor data. Those in the know talk about the “deluge of data.”

In fact, by current estimates, we are generating over a billion trillion bytes of data – equivalent to a stack of books reaching from the sun to Pluto and back.

Some of this data isn’t worth keeping, but some of it is very valuable and needs to be retained. This includes official e-records, important scientific data, electronic health information, etc.

The problem is that data is fragile. To access digital information for decades or more, we need to keep it moving, again and again, from last generation storage to next generation storage.

Just as our music has moved from records to CDs, to DVDs, and to iPods, our digital information will move from the hard drives, discs, and tape we use today to the storage media we will be using tomorrow.

All of this comes with a price: we must begin to think of data storage as critical infrastructure for the Information Age, and about how we will support it.

Today, government, businesses, and libraries are creating and supporting data repositories to host valuable data, to ensure that the data on which we depend today will be there when we need it in the future.

Read more about Berman’s research in this great feature story, and in this news release. Oh and here, too.