One of my Berkeley colleagues, Brian Hamlin, recently attended the 5th International Symposium on Digital Earth (ISDE) and returned to share some very interesting speculations on data collection and presentation that are helping to catalyze a few very slowly emerging thoughts.
Everywhere I look in the natural sciences, there is a sudden, significant maturing of large-scale distributed science projects that involve active real-time sensing of one or more aspects of the physical planet and its environs. These projects include NEON, the first widely distributed ecologically-based sensing project; the Keck Hydrowatch project based in the American West; and a burgeoning number of geological and space sensing systems. These efforts are often grouped under the sobriquet Global Earth Observation System of Systems (GEOSS), in the ultimate hope that their data and insights may inform each other.
Individually, network-intensive science projects of this type are usually labeled “cyberinfrastructure,” after a report that impressively summarized some of their most significant common elements and pressing requisites, Revolutionizing Science and Engineering Through Cyberinfrastructure [pdf], prepared by Daniel Atkins, now the Director of the Office of Cyberinfrastructure at the U.S. National Science Foundation.
This is an emergent form of Big Science; it is vitally important, because it is potentially planet-saving; and it is on the threshold of revolutionizing digital data generation and management. As Greg White of the U.S. National Oceanic and Atmospheric Administration (NOAA) observed at ISDE, the development of small, autonomous sensor devices, often capable of joining self-forming and self-healing networks, has revolutionized systems monitoring, delivering at least a 100-fold increase in the amount of information being gathered.
We don’t know how to deal with this type of data generation, and its onslaught, and we will have to learn. Fast. We can acquire it; store it; increasingly, we can preserve it. We are struggling to learn how to describe it; publish it; search across it; make it available for access and re-use, for teaching and disparate forms of research; and most critically, we must learn how to learn from the data itself.
This is not about data mining. It is about making data architectures and systems that are as much alive as the earth they capture, that permit us to dynamically understand, manipulate, and research new propositions about our living and increasingly dying planet. This is hard work, it is new work, and as I fly east on the tail of a weather dragon that has left 100+ degree Fahrenheit temperatures in Idaho, and flooding in South Texas, I have to believe that it is urgent work.
The insight that my friend Brian brought back from the ISDE conference is that there is an increasingly visible “bright line of digital information” that, like a great river, cuts between two wholly different ranges of data. On one side, there is already extant (either actively digitized, or digitally prepared) data gathered, harvested, and presented for discovery and use. This is the land of Google and other search engines, grabbing the world’s available online data, indexing it, mining it, integrating it with other data sources, and providing compelling windows into a comparatively static and viscous digitized world. That’s where a good measure of CS/EE and IR attention rests now.
On the other side of the Bright Line are the data lying latent upon the earth, sky, and space, sleeping quietly until they are woken with sensing, and now flooding in real time like a sea, imminently bursting forth across our international network of high-speed science grids.
There are tremendous opportunities here: new ways of thinking about data, and about how to develop usable interfaces on a wide range of devices. GEOSS requires us to rethink systems design from the ground up. Scales are refactored: hundreds of large-scale distributed systems, with thousands of sensors linked in community networks, each producing gigabytes or more per second, continuously delivered, and susceptible to combination.
GEOSS projects are seeking radically new forms of systems architectures for data management, on the very edge of science. All of these projects are a click away.