NCSA Brown Dog

Brown Dog Users/Contributors Cheatsheet

 

The Super Mutt

of Software

Brown Dog seeks to develop a service that will make past and present un-curated data accessible and useful to scientists while also demonstrating the novel science and scholarship that can be conducted from such data.

Brown Dog will not attempt to construct a single piece of software that magically understands all data. Instead, it will use every possible source of automatable help already in existence (e.g. software, tools, libraries, other services) in a robust and provenance-preserving manner to create a service that can deal with as much of this data as possible. Brown Dog is the proverbial “super mutt” of software, serving as a low-level data infrastructure to interface with digital data content across the web and enabling a new era of science and applications at large. The broader impact of this work is in its potential to serve not just the scientific community but the general public as a “DNS for data”, transforming data on the fly to more accessible forms through a distributed and extensible collection of data manipulation tools, moving civilization towards an era where a user’s access to data is not limited by a file’s format or un-curated collections.

 

Towards a Data Cyberinfrastructure

Brown Dog is part of the DataNet/DIBBs program funded by NSF beginning in 2008. DataNet was conceived to address the increasingly digital and data-intensive nature of science and engineering research and education. Digital data are not only the output of research but provide input to new hypotheses, enabling new scientific insights and driving innovation. Therein lies one of the major challenges of this scientific generation: how to develop the new methods, management structures and technologies to manage the diversity, size, and complexity of current and future data sets and data streams. DataNet addresses that challenge by creating a set of exemplar national and global data research infrastructure organizations (dubbed DataNet Partners) that provide unique opportunities to communities of researchers to advance science and/or engineering research and learning.

Brown Dog is, more specifically, part of a follow-on effort called DIBBs (Data Infrastructure Building Blocks), focused on building broadly usable software cyberinfrastructure to support current and foreseen scientific data needs. All of the DIBBs projects are meant to provide complementary services, each building on the others' capabilities.

More about the DataNet and DIBBs program

 

DataNet & DIBBs

PARTNERS

NSF Program: DIBBs

 

NSF Program: DIBBs

 

NSF Program: DIBBs

Software: DAP, DTS

 

NSF Program: DIBBs

Software: SkyServer

Data: Sloan Digital Sky Survey

 

NSF Program: DIBBs

Software: HUBzero

 

NSF Program: DIBBs

Software: SLASH2

 

NSF Program: DataNet

Software: DataONE

Data: Biology and Environmental

 

 

NSF Program: DataNet

Software: ACR, Virtual Archive

Data: Social and Environmental

 

Software: iRODS

Data: Ocean Observatory, Hydrology, Genome, Social Science, Education

 

Data: Census/Survey, Remote Sensing, Climate

 

 

Project Updates

The Team

Kenton McHenry, Ph.D.

PI

Senior Research Scientist, NCSA

mchenry@illinois.edu

Jong Lee, Ph.D.

Co-PI

Senior Research Scientist, NCSA

jonglee1@illinois.edu

 

Michael Dietze, Ph.D.

Co-PI

Assistant Professor of Biology, Boston University

dietze@bu.edu

 

Barbara Minsker, Ph.D.

Co-PI

Professor of Civil & Environmental Engineering, UIUC

minsker@illinois.edu

 

Praveen Kumar, Ph.D.

Co-PI

Professor of Civil & Environmental Engineering, UIUC

kumar1@illinois.edu

 

Art Schmidt, Ph.D.

Research Assistant Professor of Civil & Environmental Engineering, UIUC

aschmidt@illinois.edu

 

Jay Roloff

Senior Project Manager, NCSA

jayr@illinois.edu

 

Jay Alameda

Senior Technical Program Manager, NCSA

alameda@illinois.edu

Rob Kooper

Senior Research Programmer, NCSA

kooper@illinois.edu

 

Jason Votava

Project Manager, NCSA

jvotava@illinois.edu

 

Jerome McDonough, Ph.D.

Associate Professor of Library & Information Science, UIUC

jmcdonou@illinois.edu

 

Chris Navarro

Senior Research Programmer, NCSA

cmnavarr@illinois.edu

 

Bill Sullivan, Ph.D.

Professor of Landscape Architecture, UIUC

wcsulliv@illinois.edu

 

Richard Marciano, Ph.D.

Professor of Information Studies, Director Digital Curation Innovation Center, UMD

marciano@umd.edu

 

Luigi Marini

Senior Research Programmer, NCSA

marini@illinois.edu

 

Smruti Padhy

Research Programmer, NCSA

spadhy@illinois.edu

 

Eugene Roeder

Programmer, NCSA

eroeder@illinois.edu

 

Rui Liu

Software Developer, NCSA

ruiliu@illinois.edu

 

Josh Manthooth

Graduate Student, Biology, Boston University

jam2767@gmail.com

 

Marcus Slavenas

Research Programmer, NCSA

slavenas@illinois.edu

 

Betsy Cowdery

Graduate Student, Biology, Boston University

emcowdery@gmail.com

 

Greg Jansen

Research Software Architect, University of Maryland

jansen@umd.edu

 

Mostafa Elag

Postdoctoral Research Associate, Civil & Environmental Engineering, UIUC

mostafaelag@gmail.com

 

Inna Zharnitsky

Programmer, NCSA

inna@illinois.edu

 

 

Dongkook Woo

Graduate Student, Civil & Environmental Engineering, UIUC

dwoo5@illinois.edu

 

Qina Yan

Graduate Student, Civil & Environmental Engineering, UIUC

qinayan2@illinois.edu

 

Kunxuan Wang

Graduate Student, Civil & Environmental Engineering, UIUC

kswang3@illinois.edu

 

Ankit Rai

Graduate Student, Civil & Environmental Engineering, UIUC

rai5@illinois.edu

 

Sun Young Park

Graduate Student, Civil & Environmental Engineering, UIUC

spark185@illinois.edu

 

Pongsakorn (Tum) Suppakittpaisarn

Graduate Student, Landscape Architecture, UIUC

psuppak2@illinois.edu

 

Wenqi Ji

Graduate Student, Landscape Architecture, UIUC

wenqiji2@illinois.com

 

Xiangrong Jiang

Graduate Student, Landscape Architecture, UIUC

xjiang19@illinois.edu

 

Dongying Li

Graduate Student, Landscape Architecture, UIUC

dli13@illinois.com

 

Advisory Board

David Forsyth, Ph.D.

Professor, Computer Science, UIUC

daf@illinois.edu

Jysoo Lee, Ph.D.

Director, Korean Institute of Science and Technology Information (KISTI)

jysoo@kisti.re.kr

 

Norma Kenyon, Ph.D.

Professor, Surgery, Medicine, Microbiology and Immunology and Biomedical Engineering, University of Miami

nkenyon@med.miami.edu

 

Tschangho Kim, Ph.D.

Professor of Civil, Environmental, and Infrastructure Engineering, George Mason University

tjohnkim@gmail.com

 

Brian Wee, Ph.D.

Chief of Strategic Alliances, NEON

bwee@neoninc.org

 

Bringing

Long-Tail Data

Into the Light

Much of the data generated by science, social science, and the humanities is smaller, unstructured, un-curated and thus not easily shared. Taken together, however, this “long-tail” data, both past and present, represents a vast amount of research data with the potential to greatly impact future research in many areas of study.

The unstructured, un-curated nature of this data, however, means that once the data is gathered and the research published, the data often never sees the light of day again. In addition, contemporary science relies on digital data and software that evolves and disappears quickly as underlying technology changes. Thus we are entering a period where scientific results are no longer easily reproducible.  Since reproducibility is foundational to scientific discovery, development of a method for easily accessing legacy data and software is essential to maintaining the viability of large bodies of research.

Example Use Cases

The success of Brown Dog depends, in part, on the data and use cases that we have to build and test the system against. Groups within the EarthCube communities have identified the inaccessibility of long-tail data as a problem that must be addressed. Developers and researchers from some of these communities will work hand-in-hand to explore three compelling scientific use cases that span geoscience, engineering, biology and social science.

Long Tail Vegetation Data in Ecology and Global Change Biology

Michael Dietze, Boston University

Data on the abundance, species composition, and size structure of vegetation is critically important for a wide array of sub-disciplines in ecology, conservation, natural resource management, and global change biology. However, addressing many of the pressing questions in these disciplines will require that terrestrial biosphere and hydrologic models are able to assimilate the large amount of long-tail data that exists but is largely inaccessible. The Brown Dog team in cooperation with these researchers will facilitate the capture of a huge body of smaller research-oriented vegetation data sets collected over many decades and historical vegetation data embedded in Public Land Survey data dating back to 1785. This data will be used as initial conditions for models, to make sense of other large data sets and for model calibration and validation.

Designing Green Infrastructure Considering Storm Water and Human Requirements

Barbara Minsker, UIUC

William Sullivan, UIUC

Arthur Schmidt, UIUC

This case study involves developing novel green infrastructure design criteria and models that integrate requirements for storm water management with those for ecosystem and human health and wellbeing. Data accessibility and availability are a major challenge in addressing the scientific and social problems associated with the design of green spaces. This study will focus on identified areas of the Green Healthy Neighborhood Planning region within the City of Chicago where existing local sewer performance is most deficient and where changes in impervious area through green infrastructure would be beneficial to underserved neighborhoods. Brown Dog will be used to extract long-tail experimental data on human landscape preferences and health impacts. This data will be used to develop a human health impacts model that will then be linked together with a terrestrial biosphere model and a storm water model using Brown Dog technology.

Development and Application for Critical Zone Studies

Praveen Kumar, UIUC

The Critical Zone (CZ) is the “skin” of the earth, extending from the treetops to the bedrock. It is created by life processes working at scales from microbes to biomes, and it supports all terrestrial living systems. Its upper part is the biomantle. This is where terrestrial biota live, reproduce, use and expend energy, and where their wastes and remains accumulate and decompose. It encompasses the soil, which acts as a geomembrane through which water and solutes, energy, gases, solids, and organisms interact with the atmosphere, biosphere, hydrosphere, and lithosphere. A variety of drivers affect this biodynamic zone, ranging from climate and deforestation to agriculture, grazing and human development. Understanding and predicting these effects is central to managing and sustaining vital ecosystem services such as soil fertility, water purification, and production of food resources, and, at larger scales, global carbon cycling and carbon sequestration.

The CZ provides a unifying framework for integrating terrestrial surface and near-surface environments, and reflects an intricate web of biological and chemical processes and human impacts occurring at vastly different temporal and spatial scales. The nature of these data creates significant challenges for inter-disciplinary studies of the CZ, because integrating the variety and number of data products and models has been a barrier. On the other hand, CZ data provide an excellent opportunity for defining, testing and implementing Brown Dog technologies. In this context, “unstructured” data is viewed broadly as comprising heterogeneous data with formats that reflect temporal and disciplinary legacies; data from emerging low-cost, open-hardware-based sensors and embedded sensor networks that lack well-defined metadata and sensor characteristics; and data that are available as maps, images and text.

General Public Use Case

 

In the same way the internet has opened up information sharing for people around the world, the broader impact of Brown Dog will be to make the ever-growing stores of data on the web as easy to search and access as a webpage is now. It will accomplish this through its two core component technologies: the Data Access Proxy (DAP) and the Data Tilling Service (DTS).

Brown Dog’s DAP will allow users to seamlessly access data files that would otherwise be unreadable on their client devices. Similar to an internet gateway or Domain Name System (DNS) server, the DAP configuration would be entered into a user’s machine settings and forgotten thereafter. From then on, with modifications in the form of plugins to most browsers, data requests over HTTP would first be examined by the DAP to determine whether the native file format is readable on the client device. If not, the DAP would be called in the background to convert the file into the best possible format readable by the client machine. Alternatively, the user would have the option of specifying the desired format themselves, instead of the DAP doing it automatically.
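The decision the DAP makes for each request can be illustrated with a small sketch. This is not Brown Dog's code or API: the conversion table, the function name `plan_conversion`, and the use of "fewest conversion steps" as a stand-in for "best possible format" are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical conversion table (an assumption, not Brown Dog's catalog):
# each format maps to the formats some existing tool can convert it into.
CONVERSIONS = {
    "wpd": ["doc", "txt"],
    "doc": ["pdf", "txt"],
    "tiff": ["png"],
    "png": ["jpg"],
}


def plan_conversion(native, readable, requested=None, conversions=CONVERSIONS):
    """Return the chain of formats to pass through, or None if no route exists.

    native    -- format of the file being requested
    readable  -- set of formats the client device can open
    requested -- optional target format chosen explicitly by the user
    """
    # A user-specified format overrides the automatic choice.
    targets = {requested} if requested else set(readable)
    if native in targets:
        return [native]  # already readable; no conversion needed

    # Breadth-first search finds a route with the fewest conversion steps.
    queue = deque([[native]])
    seen = {native}
    while queue:
        path = queue.popleft()
        for nxt in conversions.get(path[-1], []):
            if nxt in seen:
                continue
            if nxt in targets:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    print(plan_conversion("wpd", {"pdf"}))
    print(plan_conversion("wpd", {"pdf", "txt"}, requested="txt"))
```

Chaining conversions through intermediate formats reflects the idea of reusing whatever existing tools are available rather than requiring one tool that handles every format pair.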

The second component, the DTS, will allow users to search collections of data using an existing file to discover other similar files in the data. Again, once the machine and browser settings are configured, a search field will be appended to the browser where example files can be dropped in by the user. Doing this triggers the DTS to search the contents of all the files under a given URL for files similar to the one provided by the user. For example, while browsing an online image collection, a user could drop an image of three people into the search field, and the DTS would return all images in the collection that also contain three people. If the DTS encounters a file format it is unable to parse, it will utilize the DAP to make the file accessible. The DTS will also perform general indexing of the data and extract and append metadata to files and collections, enabling users to gain some sense of the type of data they are encountering.
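The search-by-example idea can be sketched in a few lines, assuming the metadata has already been extracted and indexed. Everything here is hypothetical: the index contents, the example.org URLs, the `people` field, and the function `search_by_example` are illustrative and are not part of the DTS itself.

```python
# Hypothetical index: URL -> metadata that extractors are assumed to have
# already produced for each file in a collection.
INDEX = {
    "http://example.org/images/a.jpg": {"type": "image", "people": 3},
    "http://example.org/images/b.jpg": {"type": "image", "people": 1},
    "http://example.org/images/c.png": {"type": "image", "people": 3},
    "http://example.org/docs/d.pdf": {"type": "document"},
}


def search_by_example(example_meta, index, keys=None):
    """Return URLs whose metadata matches the example on the given keys.

    example_meta -- metadata extracted from the file the user dropped in
    index        -- mapping of URL to previously extracted metadata
    keys         -- metadata fields to compare (default: all example fields)
    """
    keys = list(keys) if keys is not None else list(example_meta)
    return sorted(
        url
        for url, meta in index.items()
        if all(meta.get(k) == example_meta.get(k) for k in keys)
    )


if __name__ == "__main__":
    # The dropped-in example is an image in which three people were detected.
    example = {"type": "image", "people": 3}
    for url in search_by_example(example, INDEX):
        print(url)
```

Exact matching on extracted fields is the simplest possible notion of similarity; a real service would more plausibly rank results by a similarity score over richer features.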

Together, these components will greatly expand general access and understanding of data on the web.

Once developed, general public users will be able to download the browser plugins and other tools from the Brown Dog tool catalog.

Data Transformation as a Service

How It Works . . .

Infographic illustrating how a person will access and use the Brown Dog technologies: DAP and DTS.

Early User Workshop Demo

Extensible

Brown Dog

Partners

University of Illinois Dept of Civil & Environmental Engineering

University of Illinois Dept. of Landscape Architecture

Boston University

University of Maryland

 

 

Kenton McHenry, PI    mchenry@illinois.edu

Jason Votava, Project Manager    jvotava@illinois.edu

Mailing list     browndog@ncsa.illinois.edu

 

This material is based upon work supported by the National Science Foundation under Grant No. ACI-1261582.

Nsf.gov

Learn more about this award

 

 

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.