IDEA – Building a Digital Infrastructure to Make Sense of Environmental Health

by Mary Martialay on May 5, 2016

(This is the first in a series of guest posts about research related to the Rensselaer IDEA — the Institute for Data Exploration and Applications — a campus-wide institute dedicated to helping researchers navigate the increasingly data-driven landscape of scientific enquiry. In this post, Lindsay Poirier, a doctoral candidate and research assistant in the Department of Science and Technology Studies, explains her work building a data infrastructure for research in the humanities.)

For the past three years, I’ve been working to build a digital platform that supports humanistic research on complex problems. This platform, the Platform for Experimental Collaborative Ethnography, or PECE, grew out of research on the global asthma epidemic for a project called The Asthma Files. Led by Science and Technology Studies professors Kim Fortun and Mike Fortun, The Asthma Files aims to analyze and document the history and culture of environmental health research and governance in cities across the globe. We have been developing PECE to support The Asthma Files, and it is now available to other research groups striving to build interdisciplinary perspectives on complex global challenges.

In building PECE, The Asthma Files is working closely with the Rensselaer IDEA, which brings together key research areas and advanced technologies to revolutionize the way we use data. As we face complex global problems — climate change, water scarcity, pollution, and rising public health concerns — we need better information. New computational capabilities and data infrastructures offer opportunities for using data to sense and make sense of our environment, and developing those opportunities may help us to tackle those problems with greater insights and better solutions.

While the science and engineering disciplines have led the way in developing approaches to collect, exchange, and analyze data, in the humanities and social sciences we have been considering how digital infrastructures can best support our own methods and research workflows to help us better understand global problems.

For example, from a science and engineering perspective on asthma, typical “big data” might include air quality data collected from monitors, traffic data collected from satellites, or hospitalization data collected from admissions information systems. All of this data comes from different sources and is measured on different scales and described with different terminology, yet to characterize the environmental health of a city it must all be tracked in real time.

The data that we work with on The Asthma Files — historical and cultural data — is a notable departure. For one thing, it’s not as “big” and doesn’t move as quickly as the data I just described. But it is equally tricky to work with.

The Asthma Files project collects and analyzes empirical artifacts — interviews with scientific experts, images of smog, news articles about climate deniers, publications on increasing asthma rates, etc. All of this data comes in a variety of formats, from a variety of locales, and has been collected for a variety of purposes. Making sense of this data through social analysis requires bringing lots of different theories and perspectives to the table — a type of analysis that, dare I say it, would be impossible for a computer to perform. Building a digital platform to support this work is complicated; it demands that we re-imagine the types of data analysis that a computer can afford and the ways in which researchers engage with digital tools. We are approaching these challenges in building PECE.

PECE is a digital platform designed to facilitate a different type of data analysis than that pursued by many other projects in IDEA. In one sense, PECE is an archival space for storing artifacts. Through our affiliation with IDEA, we have worked closely with representatives from the Research Data Alliance and data scientists at the Tetherless World Constellation to ensure that the platform contextualizes this data with proper metadata and historical data. These standards are important in our work; all too often, computationally enabled analysis of social data neglects the significance of richly capturing where data comes from, at times ignoring data gaps (such as communities that have been ignored in scientific research or locales where data hasn’t been made publicly available for various political reasons). We aim for our platform to capture gaps in data richly and help us see them; in fact, noting the politics of data availability in various locales tells us something important about the culture of environmental health research.

In another sense, PECE facilitates collaborative analysis of the data stored on the platform. The “collaborative” part is important here. In many scientific disciplines, data infrastructure that supports collaborative analysis is built with the explicit aim of ensuring reproducibility: when different researchers analyze the same data points, they should be able to produce the same results. This is notably not the aim for PECE. We work in a humanities tradition that contends that the best way to understand a social phenomenon is not to find the one right way to explain it. Instead, we aim to draw together diverse and sometimes even conflicting perspectives to analyze artifacts stored on the platform. From this we can build layered descriptions of a social phenomenon; each layer adds a new lens through which we can interpret the data, enriching our understanding of it.

For instance, consider a map of asthma rates in Pennsylvania, an artifact that could be added to The Asthma Files platform. PECE includes an annotation tool that encourages diverse “readings” of the map. Using this tool, a social scientist who studies Pennsylvania may “read” from the map that areas where asthma is prevalent map onto the state’s low-income neighborhoods. A social scientist who studies the politics of data may read that the scale at which the data is visualized hides some important details about which demographics are disproportionately exposed to environmental burdens. A social scientist who studies the effects of industrial pollution may read that asthma rates are higher in areas next to shale gas sites. All will likely have shared questions: Why was the map produced in the first place? What expertise and modes of data collection were required to produce it? Who funded the creation of the map? As researchers use the annotation tool to interrogate the map and record their diverse insights, their annotations also become platform data. On PECE, annotations can be juxtaposed and re-mixed, creating a kaleidoscopic view of environmental health in geographic space.
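For readers who think in code, the annotation idea above can be pictured with a small sketch. This is not PECE's actual data model; the class names, the fields (`lens`, `question`, `reading`), and the `juxtapose` method are all hypothetical, chosen only to show how several readings of one artifact might be stored as data and placed side by side without being merged into a single answer.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Annotation:
    """One researcher's reading of an artifact (hypothetical structure)."""
    author: str    # who produced the reading
    lens: str      # the analytic perspective being applied
    question: str  # the shared question the reading responds to
    reading: str   # the interpretation itself


@dataclass
class Artifact:
    """An archived item plus the readings attached to it (hypothetical)."""
    title: str
    source: str  # where the artifact came from
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, annotation: Annotation) -> None:
        # Annotations are stored with the artifact, so they become data too.
        self.annotations.append(annotation)

    def juxtapose(self, question: str) -> List[Tuple[str, str]]:
        # Return every reading of one question side by side, in the order
        # recorded, without collapsing them into a single "right" answer.
        return [(a.lens, a.reading)
                for a in self.annotations
                if a.question == question]


# Illustrative data based on the map example in the text.
asthma_map = Artifact(
    title="Map of asthma rates in Pennsylvania",
    source="illustrative placeholder",
)
shared_question = "What does the map show?"
asthma_map.annotate(Annotation(
    "researcher_a", "regional studies", shared_question,
    "High-asthma areas map onto low-income neighborhoods."))
asthma_map.annotate(Annotation(
    "researcher_b", "politics of data", shared_question,
    "The scale of visualization hides who is disproportionately exposed."))
asthma_map.annotate(Annotation(
    "researcher_c", "industrial pollution", shared_question,
    "Asthma rates are higher near shale gas sites."))

readings = asthma_map.juxtapose(shared_question)
for lens, reading in readings:
    print(f"[{lens}] {reading}")
```

The design choice worth noting is that `juxtapose` returns all readings rather than reconciling them, which mirrors the layered, multi-perspective analysis described above.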

In our work, there’s always more data analysis that can be done: descriptions can always be thickened; perspectives can always be further multiplied. We hope that by creating a space where researchers with diverse insights can collaborate to theorize the history and culture of global problems, such as asthma, we can make better sense of environmental health.