Author

Jong Lee

Bio: Jong Lee is an academic researcher from the University of Illinois at Urbana–Champaign. The author has contributed to research in topics including Cyberinfrastructure and Data management. The author has an h-index of 6 and has co-authored 15 publications receiving 123 citations. Previous affiliations of Jong Lee include the National Center for Supercomputing Applications.

Papers
Proceedings ArticleDOI
31 Aug 2015
TL;DR: This paper describes the conceptual framework motivating the SEAD project and the suite of data services developed and deployed as an initial implementation of this approach, and identifies some key architectural features of the approach as well as open challenges to fully realizing its value in the broad ecosystem of cyberinfrastructure.
Abstract: When the effort to curate and preserve data is made at the end of a project, there is little opportunity to leverage ongoing research work to reduce curation costs or, conversely, to leverage curation efforts to improve research productivity. In the Sustainable Environment Actionable Data (SEAD) project, we have envisioned a more active approach to data curation and preservation in which these processes occur in parallel with research and generate sufficient short- and long-term return on researcher investments for self-interest to drive their adoption. In this paper, we describe the conceptual framework motivating the SEAD project and the suite of data services we have developed and deployed as an initial implementation of this approach. Use cases in which these services can reduce curation effort and aid ongoing research are highlighted and, based on our experience to date, we identify some key architectural features of our approach as well as open challenges to fully realizing the value of this approach in the broad ecosystem of cyberinfrastructure.

29 citations

Journal ArticleDOI
TL;DR: A set of workflows is demonstrated that facilitates rapid and repeatable creation of GI landscape designs, which are incorporated into complex ecohydrological models using web applications and services at watershed scales sufficient to address diverse ecosystem service goals.
Abstract: Land use planners, landscape architects, and water resource managers are using Green Infrastructure (GI) designs in urban environments to promote ecosystem services including mitigation of storm water flooding and water quality degradation. An expanded set of urban sustainability goals also includes increasing carbon sequestration and songbird habitat, reducing urban heat island effects, and improving landscape aesthetics. GI is conceptualized to improve water and ecosystem quality by reducing storm water runoff at the source but, when properly designed, may also benefit these expanded goals. With the increasing use of GI in urban contexts, there is an emerging need to facilitate participatory design and scenario evaluation to enable better communication between GI designers and groups impacted by these designs. A major barrier to this type of public participation is the complexity of parameterizing, operating, visualizing, and interpreting the results of complex ecohydrological models at the watershed scales sufficient to address diverse ecosystem service goals. This paper demonstrates a set of workflows to facilitate rapid and repeatable creation of GI landscape designs, which are incorporated into complex models using web applications and services. For this project, we use the RHESSys (Regional Hydro-Ecologic Simulation System) ecohydrologic model to evaluate participatory GI landscape designs generated by stakeholders and decision makers, but note that the workflow could be adapted to a set of other watershed models.
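The abstract does not specify the web service interface behind these workflows, but a minimal sketch can illustrate the participatory-design loop it describes: a stakeholder-drawn GI design is submitted to a hypothetical model-execution service that translates the features into model parameters and queues a watershed run. The service URL, endpoint, payload schema, and response fields below are illustrative assumptions, not the project's published interface.

```python
# Hypothetical sketch: submit a participatory GI design to a web service that
# parameterizes and runs a watershed model such as RHESSys. The URL, endpoint,
# and payload schema are illustrative assumptions, not the project's API.
import requests

SERVICE = "https://gi-designer.example.org/api/runs"  # placeholder endpoint

# A stakeholder-drawn design: two GI features expressed as minimal GeoJSON.
design = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-88.2272, 40.1106]},
         "properties": {"gi_type": "rain_garden", "area_m2": 40}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-88.2301, 40.1099]},
         "properties": {"gi_type": "bioswale", "area_m2": 25}},
    ],
}

# Submit the design along with the scenario to evaluate; the service is
# assumed to convert the features into model parameters and queue a run.
resp = requests.post(SERVICE, json={"watershed": "example-watershed",
                                    "scenario": "10yr-storm",
                                    "design": design})
resp.raise_for_status()
print("Run queued:", resp.json().get("run_id"))
```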

22 citations

Proceedings ArticleDOI
22 Jul 2018
TL;DR: Some of the challenges encountered in designing and developing a system that can be easily adapted to different scientific areas are discussed, including support for large amounts of data, horizontal scaling of domain-specific preprocessing algorithms, and the ability to provide new data visualizations in the web browser.
Abstract: Clowder is an open source data management system to support data curation of long tail data and metadata across multiple research domains and diverse data types. Institutions and labs can install and customize their own instance of the framework on local hardware or on remote cloud computing resources to provide a shared service to distributed communities of researchers. Data can be ingested directly from instruments or manually uploaded by users and then shared with remote collaborators using a web front end. We discuss some of the challenges encountered in designing and developing a system that can be easily adapted to different scientific areas, including digital preservation, geoscience, materials science, medicine, social science, cultural heritage, and the arts. Some of these challenges include support for large amounts of data, horizontal scaling of domain-specific preprocessing algorithms, the ability to provide new data visualizations in the web browser, a comprehensive web service API for automatic data ingestion and curation, a suite of social annotation and metadata management features to support data annotation by communities of users and algorithms, and a web-based front end to interact with code running on heterogeneous clusters, including HPC resources.
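As a rough illustration of the automatic-ingestion pattern the abstract mentions, the Python sketch below creates a dataset on a Clowder instance and uploads a file to it over the REST API. The endpoint paths, instance URL, API-key handling, and response fields shown are assumptions based on typical Clowder deployments rather than details taken from the paper; check a specific instance's API documentation before relying on them.

```python
# Hypothetical sketch of programmatic ingestion into a Clowder instance.
# Endpoint paths, host, auth scheme, and response fields are assumptions.
import requests

BASE = "https://clowder.example.org"   # placeholder instance URL
KEY = {"key": "YOUR_API_KEY"}          # placeholder API key

# Create an empty dataset to hold the uploaded data.
resp = requests.post(f"{BASE}/api/datasets/createempty",
                     params=KEY,
                     json={"name": "Sensor run 2018-07-22",
                           "description": "Raw instrument output"})
resp.raise_for_status()
dataset_id = resp.json()["id"]

# Upload a local file into the new dataset; extractors registered on the
# instance could then run domain-specific preprocessing on it automatically.
with open("run001.csv", "rb") as f:
    upload = requests.post(f"{BASE}/api/uploadToDataset/{dataset_id}",
                           params=KEY,
                           files={"File": f})
upload.raise_for_status()
print("Uploaded file id:", upload.json()["id"])
```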

20 citations

Proceedings ArticleDOI
29 Oct 2015
TL;DR: Brown Dog is presented: two highly extensible services that aim to leverage any existing pieces of code, libraries, services, or standalone software towards providing users with a simple-to-use and programmable means of automated aid in the curation and indexing of distributed collections of uncurated and/or unstructured data.
Abstract: We present Brown Dog, two highly extensible services that aim to leverage any existing pieces of code, libraries, services, or standalone software (past or present) towards providing users with a simple-to-use and programmable means of automated aid in the curation and indexing of distributed collections of uncurated and/or unstructured data. Data collections such as these, encompassing large varieties of data in addition to large amounts of data, pose a significant challenge within modern-day "Big Data" efforts. The two services, the Data Access Proxy (DAP) and the Data Tilling Service (DTS), focusing on format conversions and content-based analysis/extraction respectively, wrap relevant conversion and extraction operations within arbitrary software, manage their deployment in an elastic manner, and manage job execution from behind a deliberately compact REST API. We describe the motivation and need/scientific drivers for such services, the constituent components that allow arbitrary software/code to be used and managed, and lastly an evaluation of the systems' capabilities and scalability.
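To make the "compact REST API" pattern concrete, the sketch below submits a file to a format-conversion endpoint and polls for the converted result, which is one way a DAP-style service could be driven programmatically. The host name, endpoint path, and response conventions are hypothetical placeholders, not the published DAP/DTS interface.

```python
# Hypothetical sketch of driving a format-conversion service behind a compact
# REST API, in the spirit of the Data Access Proxy. Host, path, and response
# conventions are illustrative assumptions, not the published interface.
import time
import requests

BASE = "https://dap.example.org"       # placeholder service URL

# Ask the service to convert an uploaded file into a target format.
with open("drawing.dwg", "rb") as f:
    job = requests.post(f"{BASE}/convert/pdf", files={"file": f})
job.raise_for_status()
result_url = job.text.strip()          # assume the service returns a URL to poll

# Poll until the converted file is ready, then download it.
for _ in range(30):
    out = requests.get(result_url)
    if out.status_code == 200:
        with open("drawing.pdf", "wb") as g:
            g.write(out.content)
        break
    time.sleep(2)                      # conversion still running; wait and retry
else:
    raise TimeoutError("conversion did not finish in time")
```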

17 citations

01 Jan 2014
TL;DR: The Reproducibility@XSEDE workshop, as discussed by the authors, focused on reproducibility in large-scale computational research and highlighted four areas of particular interest to XSEDE: documentation and training that promotes reproducible research; system-level tools that provide build- and run-time information at the level of the individual job; the need to model best practices in research collaborations involving XSEDE staff; and continued work on gateways and related technologies.
Abstract: This is the final report on Reproducibility@XSEDE, a one-day workshop held in conjunction with XSEDE14, the annual conference of the Extreme Science and Engineering Discovery Environment (XSEDE). The workshop's discussion-oriented agenda focused on reproducibility in large-scale computational research. Two important themes capture the spirit of the workshop submissions and discussions: (1) organizational stakeholders, especially supercomputer centers, are in a unique position to promote, enable, and support reproducible research; and (2) individual researchers should conduct each experiment as though someone will replicate that experiment. Participants documented numerous issues, questions, technologies, practices, and potentially promising initiatives emerging from the discussion, but also highlighted four areas of particular interest to XSEDE: (1) documentation and training that promotes reproducible research; (2) system-level tools that provide build- and run-time information at the level of the individual job; (3) the need to model best practices in research collaborations involving XSEDE staff; and (4) continued work on gateways and related technologies. In addition, an intriguing question emerged from the day's interactions: would there be value in establishing an annual award for excellence in reproducible research?

15 citations


Cited by
Journal ArticleDOI
TL;DR: The Whole Tale project, as discussed by the authors, aims to connect computational, data-intensive research efforts with the larger research process, transforming the knowledge discovery and dissemination process into one where data products are united with research articles to create "living publications" or tales.

99 citations

Journal ArticleDOI
TL;DR: The many objectives and meanings of reproducibility are discussed within the context of scientific computing, and technical barriers to reproducible work are described.
Abstract: Reproducibility is widely considered to be an essential requirement of the scientific process. However, a number of serious concerns have been raised recently, questioning whether today's computational work is adequately reproducible. In principle, it should be possible to specify a computation in sufficient detail that anyone can reproduce it exactly. But in practice, there are fundamental, technical, and social barriers to doing so. The many objectives and meanings of reproducibility are discussed within the context of scientific computing. Technical barriers to reproducibility are described, extant approaches are surveyed, and open areas of research are identified.

91 citations

Journal ArticleDOI
TL;DR: A taxonomy of workflow management system (WMS) characteristics is proposed, including aspects previously overlooked, that frames a review of prevalent WMSs used by the scientific community, elucidates their evolution to handle the challenges arising with the emergence of the “fourth paradigm,” and identifies research needed to maintain progress.
Abstract: Modern scientific collaborations have opened up the opportunity to solve complex problems that require both multidisciplinary expertise and large-scale computational experiments. These experiments typically consist of a sequence of processing steps that need to be executed on selected computing platforms. Execution poses a challenge, however, due to (1) the complexity and diversity of applications, (2) the diversity of analysis goals, (3) the heterogeneity of computing platforms, and (4) the volume and distribution of data. A common strategy to make these in silico experiments more manageable is to model them as workflows and to use a workflow management system to organize their execution. This article looks at the overall challenge posed by a new order of scientific experiments and the systems they need to be run on, and examines how this challenge can be addressed by workflows and workflow management systems. It proposes a taxonomy of workflow management system (WMS) characteristics, including aspects previously overlooked. This frames a review of prevalent WMSs used by the scientific community, elucidates their evolution to handle the challenges arising with the emergence of the “fourth paradigm,” and identifies research needed to maintain progress in this area.

82 citations

Journal ArticleDOI
TL;DR: The authors found that, overall, tree density and understory vegetation density are positively associated with preference in a power-curve relationship, even though the nature of the relationship between bioretention density and preference remains unclear.

45 citations

Journal ArticleDOI
TL;DR: This article discusses how research infrastructures are identified and referenced by scholars in the research literature and how those references are being collected and analyzed for the purposes of evaluating impact, and it identifies notable challenges that impede the analysis of impact metrics.
Abstract: Recent policy shifts on the part of funding agencies and journal publishers are causing changes in the acknowledgment and citation behaviors of scholars. A growing emphasis on open science and reproducibility is changing how authors cite and acknowledge “research infrastructures”—entities that are used as inputs to or as underlying foundations for scholarly research, including data sets, software packages, computational models, observational platforms, and computing facilities. At the same time, stakeholder interest in quantitative understanding of impact is spurring increased collection and analysis of metrics related to use of research infrastructures. This article reviews work spanning several decades on tracing and assessing the outcomes and impacts from these kinds of research infrastructures. We discuss how research infrastructures are identified and referenced by scholars in the research literature and how those references are being collected and analyzed for the purposes of evaluating impact. Synthesizing common features of a wide range of studies, we identify notable challenges that impede the analysis of impact metrics for research infrastructures and outline key open research questions that can guide future research and applications related to such metrics.

43 citations