The Sensing Earth

One of my Berkeley colleagues, Brian Hamlin, recently attended the 5th International Symposium on Digital Earth (ISDE) and returned to lend me some very interesting speculations on data collection and presentation that are helping catalyze a few very slowly emerging thoughts.

Everywhere I look in the natural sciences, there is a sudden, significant maturing of large-scale distributed science projects that involve active real-time sensing of one or more aspects of the physical planet and its environs. These projects include NEON, the first widely distributed ecologically-based sensing project; the Keck Hydrowatch project based in the American West; and a burgeoning number of geological and space sensing systems. Together, these efforts are often grouped under the sobriquet Global Earth Observation System of Systems (GEOSS), in the ultimate hope that their data and insights may inform each other.

Individually, network-intensive science projects of this type are usually labeled “cyberinfrastructure,” after a report that impressively summarized some of their most significant common elements and pressing requisites, Revolutionizing Science and Engineering Through Cyberinfrastructure [pdf], prepared by Daniel Atkins, now the Director of the Office of Cyberinfrastructure at the U.S. National Science Foundation.

This is an emergent form of Big Science; it is vitally important, because it is potentially planet-saving; and it is on the threshold of revolutionizing digital data generation and management. As Greg White of the U.S. National Oceanic and Atmospheric Administration (NOAA) observed at ISDE, the development of small, autonomous sensor devices, often capable of joining self-forming and self-healing networks, has revolutionized systems monitoring, delivering at least a hundredfold increase in the amount of information being gathered.

We don’t know how to deal with this type of data generation, and its onslaught, and we will have to learn. Fast. We can acquire it; store it; increasingly, we can preserve it. We are struggling to learn how to describe it; publish it; search across it; make it available for access and re-use, for teaching and disparate forms of research; and most critically, we must learn how to learn from the data itself.

This is not about data mining. It is about making data architectures and systems that are as much alive as the earth they capture, that permit us to dynamically understand, manipulate, and research new propositions about our living and increasingly dying planet. This is hard work, it is new work, and as I fly east on the tail of a weather dragon that has left 100+ degree Fahrenheit temperatures in Idaho, and flooding in South Texas, I have to believe that it is urgent work.

The insight that my friend Brian brought back from the ISDE conference is that there is an increasingly visible “bright line of digital information” that, like a great river, cuts between two wholly different ranges of data. On one side, there is already extant (either actively digitized, or digitally prepared) data gathered, harvested, and presented for discovery and use. This is the land of Google and other search engines, grabbing the world’s available online data, indexing it, mining it, integrating it with other data sources, and providing compelling windows into a comparatively static and viscous digitized world. That’s where a good measure of CS/EE and IR attention rests now.

On the other side of the Bright Line are the data lying latent upon the earth, sky, and space, sleeping quietly until they are woken by sensing, and now flooding in real time like a sea, imminently bursting forth across our international network of high-speed science grids.

There are tremendous opportunities here, new ways of thinking about data, about how to develop usable interfaces on a wide range of devices. GEOSS requires us to rethink systems design from the ground up. Scales are refactored: hundreds of large-scale distributed systems, with thousands of sensors linked in community networks, each producing gigabytes or more per second, continuously delivered, and susceptible to combination.

GEOSS projects are seeking radically new forms of systems architectures for data management, on the very edge of science. All of these projects are a click away.

Let’s engage.

  • Peder Burgaard

    Very interesting information, and I hope lots of people get engaged. I was part of the review/comment team for the Metaverse Roadmap Project, where one of the forecasted projections is sensors everywhere (ware) and the implications of data mining all that information.

    As a future extension of that work, I suggested including what I call the BioWeb, or Internet of Living Things, as a scenario to coexist with all the manmade sensors. Technological advances will enable the use of living organisms as sensors, connecting them (e.g. trees, plants, etc.) to the network to gather real-time information on the health of ecological systems.

    I think this area of science is really important because it gives data on pollution that is directly correlated with human environmental actions, without the current delay before the effects of those actions become visible.

  • http://verabass.blogspot.com Vera Bass

    Here’s my first image:

    Picture a plain old-fashioned spreadsheet. One (or many) for each discipline or sub-discipline or project. Instead of mapping revenue and expense projections, though, these spreadsheets calculate data generated by observations against anything you wish (i.e., proven or hypothetical theoretical values). Connect the multiple spreadsheets at common points in a three-dimensional model, and add an interface function for theorists to merge any elements or data fields they choose.

    Vera

  • Ross Stapleton-Gray

    We have lots of evidence of the effects of pollution; the problem is less one of a lack of data, than a lack of logic, or short-term profit taking at everyone’s collective expense. More talking trees won’t address that.

    On the other hand, it’ll be interesting when one’s foliage can rat you out to neighbors and the authorities: “My owner’s running a meth lab! Hey, over here!”

  • Peder Burgaard

    @Ross

    I agree that the problem is also a lack of logic, and that there is plenty of data documenting human-caused pollution, but that data is not widely available to a mainstream audience in real time, for their local surroundings, at the click of a mobile device.

    Accessing the pollution level and other data from organic structures outside your house or down the street would create real-time local awareness and close the feedback delay and the inaccessibility of the data. Thus, hopefully, it would bypass the lack of human logic.

  • barry.b

    @Ross
    @Peder

    actually guys, I disagree.

    “the problem is less one of a lack of data, than a lack of logic”

    I’d say it’s the lack of visualising how meaningful that data is, so people can assess the full ramifications of actions, choices, etc.

    If we can see the trends in, say, lead poisoning of fish, and tie those economic losses to the diversion of revenue for the cleanup away from, say, education, and then back to the impact of lower literacy levels and a drop in productivity, I’m sure a few Flash-based graphs could cover it.

    Modelling complex data simply and clearly is not at all commonplace, probably because it’s just too expensive to do properly. Or the tools are still evolving.

    nah, enough data is there, we just can’t see the wood for the trees…