Author

Aidan Curtis

Bio: Aidan Curtis is an academic researcher from Rice University. The author has contributed to research in topics including computer science and motion planning, has an h-index of 3, and has co-authored 11 publications receiving 64 citations. Previous affiliations of Aidan Curtis include the Massachusetts Institute of Technology.

Papers
Posted Content
TL;DR: Initial experiments enabled by the ThreeDWorld platform are presented, including multi-modal physical scene understanding, multi-agent interactions, models that "learn like a child", and attention studies in humans and neural networks.
Abstract: We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation. With TDW, users can simulate high-fidelity sensory data and physical interactions between mobile agents and objects in a wide variety of rich 3D environments. TDW has several unique properties: 1) real-time near photo-realistic image rendering quality; 2) a library of objects and environments with materials for high-quality rendering, and routines enabling user customization of the asset library; 3) generative procedures for efficiently building classes of new environments; 4) high-fidelity audio rendering; 5) believable and realistic physical interactions for a wide variety of material types, including cloths, liquids, and deformable objects; 6) a range of "avatar" types that serve as embodiments of AI agents, with the option for user avatar customization; and 7) support for human interactions with VR devices. TDW also provides a rich API enabling multiple agents to interact within a simulation and return a range of sensor and physics data representing the state of the world. We present initial experiments enabled by the platform around emerging research directions in computer vision, machine learning, and cognitive science, including multi-modal physical scene understanding, multi-agent interactions, models that "learn like a child", and attention studies in humans and neural networks. The simulation platform will be made publicly available.
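The command-driven, multi-agent loop the abstract describes can be pictured with a minimal sketch. The code below is a self-contained toy, not TDW's actual API: the SimulatorClient class, its step method, and the command dictionaries are hypothetical stand-ins for the real controller/command protocol.

```python
# Toy sketch of a command/response simulation loop: agents submit commands,
# the simulator steps, and per-agent sensor data comes back. All names here
# are hypothetical illustrations, not TDW's real API.
from dataclasses import dataclass, field


@dataclass
class Observation:
    """Per-agent sensor return for one simulation step."""
    rgb: bytes = b""                                   # rendered image (placeholder)
    positions: dict = field(default_factory=dict)      # object id -> (x, y, z)


class SimulatorClient:
    """Stand-in for a client that sends commands and receives sensor data."""

    def __init__(self, num_agents: int):
        self.num_agents = num_agents
        self.state = {i: (0.0, 0.0, 0.0) for i in range(num_agents)}

    def step(self, commands: list[dict]) -> list[Observation]:
        # Apply each agent's command, then return one observation per agent.
        for cmd in commands:
            if cmd["type"] == "move":
                x, y, z = self.state[cmd["agent"]]
                dx, dy, dz = cmd["delta"]
                self.state[cmd["agent"]] = (x + dx, y + dy, z + dz)
        return [Observation(positions=dict(enumerate(self.state.values())))
                for _ in range(self.num_agents)]


client = SimulatorClient(num_agents=2)
for _ in range(3):
    obs = client.step([{"type": "move", "agent": a, "delta": (0.1, 0.0, 0.0)}
                       for a in range(2)])
```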

151 citations

Posted Content
TL;DR: This work proposes a graph neural network architecture for predicting object importance in a single pass, thereby incurring little overhead while substantially reducing the number of objects that must be considered by the planner.
Abstract: Real-world planning problems often involve hundreds or even thousands of objects, straining the limits of modern planners. In this work, we address this challenge by learning to predict a small set of objects that, taken together, would be sufficient for finding a plan. We propose a graph neural network architecture for predicting object importance in a single inference pass, thus incurring little overhead while greatly reducing the number of objects that must be considered by the planner. Our approach treats the planner and transition model as black boxes, and can be used with any off-the-shelf planner. Empirically, across classical planning, probabilistic planning, and robotic task and motion planning, we find that our method results in planning that is significantly faster than several baselines, including other partial grounding strategies and lifted planners. We conclude that learning to predict a sufficient set of objects for a planning problem is a simple, powerful, and general mechanism for planning in large instances. Video: this https URL Code: this https URL
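A minimal sketch of the core idea, under assumptions about the architecture: objects become graph nodes, relations become edges, a few rounds of message passing produce per-object importance scores, and only high-scoring objects are passed to the planner. The layer sizes, features, and threshold below are illustrative, not the paper's exact model.

```python
# Per-object importance scoring with simple message passing, followed by
# pruning the planning problem to the top-scoring objects.
import torch
import torch.nn as nn


class ObjectImportanceGNN(nn.Module):
    def __init__(self, node_dim: int, hidden: int = 64, rounds: int = 2):
        super().__init__()
        self.encode = nn.Linear(node_dim, hidden)
        self.message = nn.Linear(2 * hidden, hidden)
        self.score = nn.Linear(hidden, 1)
        self.rounds = rounds

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (num_objects, node_dim) object features
        # adj: (num_objects, num_objects) 0/1 relation matrix
        h = torch.relu(self.encode(x))
        for _ in range(self.rounds):
            agg = adj @ h                       # sum messages from neighbors
            h = torch.relu(self.message(torch.cat([h, agg], dim=-1)))
        return torch.sigmoid(self.score(h)).squeeze(-1)  # importance in [0, 1]


model = ObjectImportanceGNN(node_dim=8)
feats = torch.randn(100, 8)                     # 100 objects
adj = (torch.rand(100, 100) < 0.05).float()     # sparse random relations
scores = model(feats, adj)
keep = torch.nonzero(scores > 0.5).squeeze(-1)  # objects handed to the planner
```

A single forward pass scores every object at once, which is what keeps the overhead small relative to repeatedly querying the planner.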

43 citations

Posted Content · DOI
23 May 2021 - bioRxiv
TL;DR: This article found that lexicality is encoded by early activity in mid-fusiform (mFus) cortex and precentral sulcus, that word frequency is represented first in mFus and later in the inferior frontal gyrus (IFG) and inferior parietal sulcus (IPS), and that orthographic neighborhood is encoded solely in the IPS.
Abstract: Reading words aloud is a foundational aspect of the acquisition of literacy. The rapid rate at which multiple distributed neural substrates are engaged in this process can only be probed via techniques with high spatiotemporal resolution. We used direct intracranial recordings in a large cohort to create a holistic yet fine-grained map of word processing, enabling us to derive the spatiotemporal neural codes of multiple word attributes critical to reading: lexicality, word frequency and orthographic neighborhood. We found that lexicality is encoded by early activity in mid-fusiform (mFus) cortex and precentral sulcus. Word frequency is also first represented in mFus followed by later engagement of the inferior frontal gyrus (IFG) and inferior parietal sulcus (IPS), and orthographic neighborhood is encoded solely in the IPS. A lexicality decoder revealed high weightings for electrodes in the mFus, IPS, anterior IFG, and the precentral sulcus. These results elaborate the neural codes underpinning extant dual-route models of reading, with parallel processing via the lexical route, progressing from mFus to IFG, and the sub-lexical route, progressing from IPS to anterior IFG.
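As a rough illustration of the kind of lexicality decoder described, one can train a linear classifier on per-electrode activity features and inspect its weights. The simulated features, labels, and scikit-learn pipeline below are assumptions for illustration, not the paper's analysis code.

```python
# Classify word vs. pseudoword trials from per-electrode activity, then
# inspect which electrodes carry high weight. Data here is simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_electrodes = 200, 50
X = rng.normal(size=(n_trials, n_electrodes))  # e.g., mean gamma power per electrode
y = rng.integers(0, 2, size=n_trials)          # 1 = word, 0 = pseudoword
X[y == 1, :5] += 0.8                           # pretend 5 electrodes are informative

clf = LogisticRegression(max_iter=1000)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
clf.fit(X, y)
top = np.argsort(np.abs(clf.coef_[0]))[::-1][:5]  # highest-weight electrodes
print("top electrodes:", top)
```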

8 citations

Proceedings Article
09 Jul 2020
TL;DR: ThreeDWorld (TDW), introduced in this paper, is a platform for interactive multi-modal physical simulation that allows users to simulate high-fidelity sensory data and physical interactions between mobile agents and objects in a wide variety of rich 3D environments.
Abstract: Identical to the abstract of the ThreeDWorld (TDW) preprint listed above.

7 citations

Proceedings Article · DOI
11 Oct 2019
TL;DR: HealthSense is designed, developed, and validated from a software-inspired viewpoint of clinical trial design, enabling the expression of complex ideas and the composition of diverse devices and services while keeping the platform simple for clinical research users.
Abstract: With the rise of ever-more sophisticated wearables and sensing technologies, mobile health continues to be an active area of research. However, from a clinical researcher's point of view, testing novel uses of mobile health innovations remains a major hurdle, as composing a clinical trial from a combination of technologies still remains in the realm of computer scientists. We take a software-inspired viewpoint of clinical trial design to design, develop, and validate HealthSense, enabling the expression of complex ideas and the composition of diverse devices and services while keeping the platform maximally simple for clinical research users. A key innovation in HealthSense is the concept of a study state manager (SSM) that modifies parameters of the study over time as data accumulate and can trigger external events that affect the participant; this design allows nearly arbitrary clinical trial designs to be implemented. The SSM can funnel data streams to custom or third-party cloud processing pipelines, and the results can be used to deliver interventions and modify parameters of the study. HealthSense supports both Android and iOS platforms and is secure, scalable, and fully operational. We outline three trials (two with clinical populations) that highlight the simplicity, composability, and expressibility of HealthSense.
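The SSM idea can be sketched as a small event-driven loop: data arrives, study parameters update, and a rule can fire an intervention. Everything below (class names, the heart-rate rule, the parameters) is a hypothetical illustration of the pattern, not HealthSense code.

```python
# Toy "study state manager": accumulate participant data, update study
# parameters, and trigger an intervention when a rule fires.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StudyState:
    survey_interval_hours: int = 24
    readings: list = field(default_factory=list)


class StudyStateManager:
    def __init__(self, intervene: Callable[[str], None]):
        self.state = StudyState()
        self.intervene = intervene  # e.g., push a notification to the participant

    def on_data(self, heart_rate: float) -> None:
        self.state.readings.append(heart_rate)
        recent = self.state.readings[-10:]
        # Example rule: sustained elevated heart rate shortens the survey
        # interval and triggers an intervention message.
        if len(recent) == 10 and sum(recent) / 10 > 100:
            self.state.survey_interval_hours = 6
            self.intervene("elevated heart rate: please complete a check-in survey")


ssm = StudyStateManager(intervene=print)
for hr in [98, 102, 105, 101, 104, 110, 99, 103, 108, 107]:
    ssm.on_data(hr)
```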

5 citations


Cited by
Posted Content
TL;DR: It is shown that the full interactivity of the scenes enables agents to learn useful visual representations that accelerate the training of downstream manipulation tasks, and that the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of human-demonstrated (mobile) manipulation behaviors.
Abstract: We present iGibson 1.0, a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes. Our environment contains 15 fully interactive home-sized scenes with 108 rooms populated with rigid and articulated objects. The scenes are replicas of real-world homes, with the distribution and layout of objects aligned to those of the real world. iGibson 1.0 integrates several key features to facilitate the study of interactive tasks: i) generation of high-quality virtual sensor signals (RGB, depth, segmentation, LiDAR, flow and so on), ii) domain randomization to change the materials of the objects (both visual and physical) and/or their shapes, iii) integrated sampling-based motion planners to generate collision-free trajectories for robot bases and arms, and iv) an intuitive human-iGibson interface that enables efficient collection of human demonstrations. Through experiments, we show that the full interactivity of the scenes enables agents to learn useful visual representations that accelerate the training of downstream manipulation tasks. We also show that iGibson 1.0 features enable the generalization of navigation agents, and that the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of human-demonstrated (mobile) manipulation behaviors. iGibson 1.0 is open-source, equipped with comprehensive examples and documentation. For more information, visit our project website: this http URL
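The domain-randomization feature (ii) can be pictured with a short sketch: sample visual and physical material properties per episode. The Material fields and sampling ranges below are illustrative assumptions, not iGibson's API.

```python
# Per-episode domain randomization over object materials, covering both
# visual (texture) and physical (friction, mass) properties.
import random
from dataclasses import dataclass


@dataclass
class Material:
    texture: str
    friction: float
    mass_scale: float


TEXTURES = ["wood", "metal", "fabric", "plastic"]


def randomize_material(rng: random.Random) -> Material:
    return Material(
        texture=rng.choice(TEXTURES),          # visual randomization
        friction=rng.uniform(0.2, 1.0),        # physical randomization
        mass_scale=rng.uniform(0.8, 1.2),
    )


rng = random.Random(0)
episode_materials = {obj: randomize_material(rng)
                     for obj in ["table", "mug", "door"]}
```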

115 citations

Posted Content
Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ B. Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri S. Chatterji, Annie Chen, Kathleen Creel, Jared Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah D. Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Ahmad Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf H. Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Yang Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
TL;DR: The authors provide a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications.
Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

76 citations

Proceedings Article · DOI
07 Mar 2022
TL;DR: This paper introduces Kubric, an open-source Python framework that interfaces with PyBullet and Blender to generate photo-realistic scenes with rich annotations, and that seamlessly scales to large jobs distributed over thousands of machines, generating TBs of data.
Abstract: Data is the driving force of machine learning, with the amount and quality of training data often being more important for the performance of a system than architecture and training details. But collecting, processing, and annotating real data at scale is difficult, expensive, and frequently raises additional privacy, fairness, and legal concerns. Synthetic data is a powerful tool with the potential to address these shortcomings: 1) it is cheap, 2) it supports rich ground-truth annotations, 3) it offers full control over the data, and 4) it can circumvent or mitigate problems regarding bias, privacy, and licensing. Unfortunately, software tools for effective data generation are less mature than those for architecture design and training, which leads to fragmented generation efforts. To address these problems we introduce Kubric, an open-source Python framework that interfaces with PyBullet and Blender to generate photo-realistic scenes with rich annotations, and that seamlessly scales to large jobs distributed over thousands of machines, generating TBs of data. We demonstrate the effectiveness of Kubric by presenting a series of 13 different generated datasets for tasks ranging from studying 3D NeRF models to optical flow estimation. We release Kubric, the assets used, all of the generation code, and the rendered datasets for reuse and modification.
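One way to picture the scale-out design: each render job consumes a self-contained, randomized scene specification, so thousands of workers can run independently. The spec format below is a hypothetical illustration, not Kubric's actual scene API.

```python
# Sample randomized scene descriptions that render workers could consume.
# Field names and sampling ranges are illustrative assumptions.
import json
import random


def sample_scene(seed: int) -> dict:
    rng = random.Random(seed)
    return {
        "seed": seed,
        "camera": {"position": [rng.uniform(-2, 2) for _ in range(3)]},
        "objects": [
            {
                "shape": rng.choice(["cube", "sphere", "cylinder"]),
                "position": [rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(0, 1)],
                "velocity": [rng.uniform(-0.5, 0.5) for _ in range(3)],
            }
            for _ in range(rng.randint(3, 8))
        ],
        "annotations": ["rgb", "depth", "segmentation", "optical_flow"],
    }


# One JSON spec per job; a fleet of workers can render these independently.
specs = [sample_scene(seed) for seed in range(4)]
print(json.dumps(specs[0], indent=2))
```

Because every spec is fully determined by its seed, a failed job can be retried anywhere without coordination, which is what makes the distribution over thousands of machines straightforward.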

71 citations

Book Chapter · DOI
21 Jul 2020
TL;DR: Foley Music, a system that can synthesize plausible music for a silent video clip of people playing musical instruments, is introduced, along with a Graph-Transformer framework that can accurately predict MIDI event sequences in accordance with the body movements.
Abstract: In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip of people playing musical instruments. We first identify two key intermediate representations for a successful video-to-music generator: body keypoints from videos and MIDI events from audio recordings. We then formulate music generation from videos as a motion-to-MIDI translation problem. We present a Graph-Transformer framework that can accurately predict MIDI event sequences in accordance with the body movements. The MIDI events can then be converted to realistic music using an off-the-shelf music synthesizer tool. We demonstrate the effectiveness of our models on videos containing a variety of music performances. Experimental results show that our model outperforms several existing systems in generating music that is pleasant to listen to. More importantly, the MIDI representations are fully interpretable and transparent, enabling us to perform music editing flexibly. We encourage readers to watch the supplementary video with audio turned on to experience the results.
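The last step of the pipeline, turning predicted MIDI events into playable music, can be sketched with the mido MIDI library. The predicted events below are made up, and the (pitch, onset, duration) format is an assumed model output, not the paper's exact representation.

```python
# Convert a predicted MIDI event sequence into a .mid file that any
# off-the-shelf software synthesizer can render to audio.
import mido

# Hypothetical model output: (pitch, onset_ticks, duration_ticks) per note,
# sorted by onset and non-overlapping for simplicity.
predicted_events = [(60, 0, 240), (64, 240, 240), (67, 480, 480)]

mid = mido.MidiFile()
track = mido.MidiTrack()
mid.tracks.append(track)

now = 0  # mido uses delta times between messages, so track the cursor
for pitch, onset, duration in predicted_events:
    track.append(mido.Message("note_on", note=pitch, velocity=64, time=onset - now))
    track.append(mido.Message("note_off", note=pitch, velocity=64, time=duration))
    now = onset + duration

mid.save("predicted_music.mid")  # render with any software synthesizer
```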

71 citations

Proceedings Article · DOI
22 Apr 2021
TL;DR: In this paper, the authors propose a framework for object manipulation built upon the physics-enabled, visually rich AI2-THOR framework and present a new challenge to the Embodied AI community known as ArmPointNav.
Abstract: The domain of Embodied AI has recently witnessed substantial progress, particularly in navigating agents within their environments. These early successes have laid the building blocks for the community to tackle tasks that require agents to actively interact with objects in their environment. Object manipulation is an established research domain within the robotics community and poses several challenges, including manipulator motion, grasping, and long-horizon planning, particularly when dealing with oft-overlooked practical setups involving visually rich and complex scenes, manipulation using mobile agents (as opposed to tabletop manipulation), and generalization to unseen environments and objects. We propose a framework for object manipulation built upon the physics-enabled, visually rich AI2-THOR framework and present a new challenge to the Embodied AI community known as ArmPointNav. This task extends the popular point navigation task [2] to object manipulation and offers new challenges, including 3D obstacle avoidance, manipulating objects in the presence of occlusion, and multi-object manipulation that necessitates long-term planning. Popular learning paradigms that are successful on PointNav challenges show promise, but leave substantial room for improvement.
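The task setup can be pictured as a standard observe-act loop over a discrete action space covering both base and arm. The environment below is a toy stand-in, not AI2-THOR's API.

```python
# Toy agent-environment loop for a mobile-manipulation task: observe,
# pick an action for base or arm, step, repeat until success.
import random

ACTIONS = ["move_base_forward", "rotate_base", "move_arm", "grasp", "release"]


class MockManipulationEnv:
    """Stand-in environment: the episode succeeds once 'grasp' is chosen."""

    def reset(self) -> dict:
        return {"target_visible": False}

    def step(self, action: str) -> tuple[dict, float, bool]:
        done = action == "grasp"
        reward = 1.0 if done else -0.01  # small step penalty, success bonus
        return {"target_visible": True}, reward, done


env = MockManipulationEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, reward, done = env.step(random.choice(ACTIONS))
    total += reward
print("episode return:", round(total, 2))
```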

63 citations