Author

Shan Jiang

Bio: Shan Jiang is an academic researcher at Tufts University. The author has contributed to research on the topics Materials science and Medicine. The author has an h-index of 14 and has co-authored 28 publications receiving 1836 citations. Previous affiliations of Shan Jiang include Massachusetts Institute of Technology.

Papers
Journal Article
TL;DR: This work presents methods to estimate average daily origin–destination trips from triangulated mobile phone records of millions of anonymized users, which form the basis for much of the analysis and modeling that inform transportation planning and investments.
Abstract: In this work, we present methods to estimate average daily origin–destination trips from triangulated mobile phone records of millions of anonymized users. These records are first converted into clustered locations at which users engage in activities for an observed duration. These locations are inferred to be home, work, or other depending on observation frequency, day of week, and time of day, and represent a user’s origins and destinations. Since the arrival time and duration at these locations reflect the observed (based on phone usage) rather than true arrival time and duration of a user, we probabilistically infer departure time using survey data on trips in major US cities. Trips are then constructed for each user between two consecutive observations in a day. These trips are multiplied by expansion factors based on the population of a user’s home Census Tract and divided by the number of days on which we observed the user, distilling average daily trips. Aggregating individuals’ daily trips by Census Tract pair, hour of the day, and trip purpose results in trip matrices that form the basis for much of the analysis and modeling that inform transportation planning and investments. The applicability of the proposed methodology is supported by validation against the temporal and spatial distributions of trips reported in local and national surveys.
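The expansion and averaging step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the record layout, and in particular the expansion factor (tract population divided by the number of observed users living in that tract) are assumptions made for illustration only.

```python
from collections import defaultdict


def average_daily_od(trips, tract_population, users_per_tract, days_observed):
    """Aggregate expanded, per-day-averaged trips into an OD matrix.

    trips: iterable of (user_id, home_tract, origin_tract, dest_tract, hour, purpose)
    tract_population: population of each home Census Tract
    users_per_tract: number of observed users whose home is in each tract
    days_observed: number of days on which each user was observed
    """
    matrix = defaultdict(float)
    for user_id, home, origin, dest, hour, purpose in trips:
        # Assumed expansion factor: people represented by one observed user.
        expansion = tract_population[home] / users_per_tract[home]
        # Divide by observed days so the sum is an average daily trip count.
        weight = expansion / days_observed[user_id]
        matrix[(origin, dest, hour, purpose)] += weight
    return dict(matrix)


# Hypothetical toy data: two users living in tract "A".
trips = [
    ("u1", "A", "A", "B", 8, "work"),
    ("u1", "A", "B", "A", 17, "home"),
    ("u1", "A", "A", "B", 8, "work"),
    ("u2", "A", "A", "B", 8, "work"),
]
od = average_daily_od(
    trips,
    tract_population={"A": 1000},
    users_per_tract={"A": 2},
    days_observed={"u1": 2, "u2": 1},
)
```

In this toy case each observed user stands in for 500 residents, so the 8 a.m. work flow from A to B sums the per-day contributions of both users.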

500 citations

Journal Article
TL;DR: This research provides an innovative data mining framework that synthesizes state-of-the-art techniques for extracting mobility patterns from raw mobile phone CDR data, and designs a pipeline that translates massive, passive mobile phone records into meaningful spatial human mobility patterns readily interpretable for urban and transportation planning purposes.
Abstract: In this study, with Singapore as an example, we demonstrate how we can use mobile phone call detail record (CDR) data, which contains millions of anonymous users, to extract individual mobility networks comparable to the activity-based approach. Such an approach is widely used in the transportation planning practice to develop urban micro simulations of individual daily activities and travel; yet it depends highly on detailed travel survey data to capture individual activity-based behavior. We provide an innovative data mining framework that synthesizes the state-of-the-art techniques in extracting mobility patterns from raw mobile phone CDR data, and design a pipeline that can translate the massive and passive mobile phone records to meaningful spatial human mobility patterns readily interpretable for urban and transportation planning purposes. With growing ubiquitous mobile sensing, and shrinking labor and fiscal resources in the public sector globally, the method presented in this research can be used as a low-cost alternative for transportation and planning agencies to understand the human activity patterns in cities, and provide targeted plans for future sustainable development.

351 citations

Journal Article
TL;DR: This work analyzes an activity-based travel survey conducted in the Chicago metropolitan area over a demographic representative sample of its population and finds that the population can be clustered into 8 and 7 representative groups according to their activities during weekdays and weekends, respectively.
Abstract: Data mining and statistical learning techniques are powerful analysis tools yet to be incorporated in the domain of urban studies and transportation research. In this work, we analyze an activity-based travel survey conducted in the Chicago metropolitan area over a demographic representative sample of its population. Detailed data on activities by time of day were collected from more than 30,000 individuals (and 10,552 households) who participated in a 1-day or 2-day survey implemented from January 2007 to February 2008. We examine this large-scale data in order to explore three critical issues: (1) the inherent daily activity structure of individuals in a metropolitan area, (2) the variation of individual daily activities—how they grow and fade over time, and (3) clusters of individual behaviors and the revelation of their related socio-demographic information. We find that the population can be clustered into 8 and 7 representative groups according to their activities during weekdays and weekends, respectively. Our results enrich the traditional divisions consisting of only three groups (workers, students and non-workers) and provide clusters based on activities of different time of day. The generated clusters combined with social demographic information provide a new perspective for urban and transportation planning as well as for emergency response and spreading dynamics, by addressing when, where, and how individuals interact with places in metropolitan areas.

263 citations

Proceedings Article
11 Aug 2013
TL;DR: Three classes of methods to extract information from triangulated mobile phone signals are presented, and applications with different goals in spatiotemporal analysis and urban modeling are described, to demonstrate the state-of-the-art algorithms that can be adapted to triangulation data for the context of urban computing and modeling applications.
Abstract: In this work, we present three classes of methods to extract information from triangulated mobile phone signals, and describe applications with different goals in spatiotemporal analysis and urban modeling. Our first challenge is to relate extracted information from phone records (i.e., a set of time-stamped coordinates estimated from signal strengths) with destinations by each of the million anonymous users. By demonstrating a method that converts phone signals into small grid cell destinations, we present a framework that bridges triangulated mobile phone data with previously established findings obtained from data at more coarse-grained resolutions (such as at the cell tower or census tract levels). In particular, this method allows us to relate daily mobility networks, called motifs here, with trip chains extracted from travel diary surveys. Compared with existing travel demand models mainly relying on expensive and less-frequent travel survey data, this method represents an advantage for applying ubiquitous mobile phone data to urban and transportation modeling applications. Second, we present a method that takes advantage of the high spatial resolution of the triangulated phone data to infer trip purposes by examining semantic-enriched land uses surrounding destinations in individuals' motifs. In the final section, we discuss a portable computational architecture that allows us to manage and analyze mobile phone data in geospatial databases, and to map mobile phone trips onto spatial networks such that further analysis about flows and network performances can be done. The combination of these three methods demonstrates the state-of-the-art algorithms that can be adapted to triangulated mobile phone data for the context of urban computing and modeling applications.

222 citations

01 Jan 2015
TL;DR: This study demonstrates a complete process that comprises the collection, unification, classification and validation of a type of VGI—online point-of-interest (POI) data—and develops methods to utilize such POI data to estimate disaggregated land use (i.e., employment size by category) at a very high spatial resolution (census block level).
Abstract: Singapore-MIT Alliance for Research and Technology (Singapore. National Research Foundation)

216 citations


Cited by
Journal Article
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal Article
TL;DR: A review of G. S. Becker's A Treatise on the Family (1981), arguing that the book is more about economics and the economic approach to behavior than about the family, and that its impact on family scholars is less than it may have initially appeared.
Abstract: A Treatise on the Family. G. S. Becker. Cambridge, MA: Harvard University Press. 1981. Gary Becker is one of the most famous and influential economists of the second half of the 20th century, a fervent contributor to and expounder of the University of Chicago free-market philosophy, and winner of the 1992 Nobel Prize in economics. Although any book with the word "treatise" in its title is clearly intended to have an impact, one coming from someone as brilliant and controversial as Becker certainly had such a lofty goal. It has received many article-length reviews in several disciplines (Ben-Porath, 1982; Bergmann, 1995; Foster, 1993; Hannan, 1982), which is one measure of its scholarly importance, and yet its impact is, I think, less than it may have initially appeared, especially for scholars with substantive interests in the family. This book is, its title notwithstanding, more about economics and the economic approach to behavior than about the family. In the first sentence of the preface, Becker writes "In this book, I develop an economic or rational choice approach to the family." Lest anyone accuse him of focusing on traditional (i.e., material) economics topics, such as family income, poverty, and labor supply, he immediately emphasizes that those topics are not his focus. "My intent is more ambitious: to analyze marriage, births, divorce, division of labor in households, prestige, and other non-material behavior with the tools and framework developed for material behavior." Indeed, the book includes chapters on many of these issues. One chapter examines the principles of the efficient division of labor in households, three analyze marriage and divorce, three analyze various child-related issues (fertility and intergenerational mobility), and others focus on broader family issues, such as intrafamily resource allocation. His analysis is not, he believes, constrained by time or place. 
His intention is "to present a comprehensive analysis that is applicable, at least in part, to families in the past as well as the present, in primitive as well as modern societies, and in Eastern as well as Western cultures." His tone is profoundly conservative and utterly skeptical of any constructive role for government programs. There is a clear sense of how much better things were in the old days of a gender-based division of labor and low market-work rates for married women. Indeed, Becker is ready and able to show in Chapter 2 that such a state of affairs was efficient and induced not by market or societal discrimination (although he allows that it might exist) but by small underlying household productivity differences that arise primarily from what he refers to as "complementarities" between caring for young children while carrying another to term. Most family scholars would probably find that an unconvincingly simple explanation for a profound and complex phenomenon. What, then, is the salient contribution of Treatise on the Family? It is not literally the idea that economics could be applied to the nonmarket sector and to family life because Becker had already established that with considerable success and influence. At its core, microeconomics is simple, characterized by a belief in the importance of prices and markets, the role of self-interested or rational behavior, and, somewhat less centrally, the stability of preferences. It was Becker's singular and invaluable contribution to appreciate that the behaviors potentially amenable to the economic approach were not limited to phenomena with explicit monetary prices and formal markets. Indeed, during the late 1950s and throughout the 1960s, he did undeniably important and pioneering work extending the domain of economics to such topics as labor market discrimination, fertility, crime, human capital, household production, and the allocation of time.
Nor is Becker's contribution the detailed analyses themselves. Many of them are, frankly, odd, idiosyncratic, and off-putting. …

4,817 citations

Journal Article
TL;DR: The concept of urban computing is introduced, discussing its general framework and key challenges from the perspective of computer sciences, and the typical technologies that are needed in urban computing are summarized into four folds.
Abstract: Urbanization's rapid progress has modernized many people's lives but also engendered big issues, such as traffic congestion, energy consumption, and pollution. Urban computing aims to tackle these issues by using the data that has been generated in cities (e.g., traffic flow, human mobility, and geographical data). Urban computing connects urban sensing, data management, data analytics, and service providing into a recurrent process for an unobtrusive and continuous improvement of people's lives, city operation systems, and the environment. Urban computing is an interdisciplinary field where computer sciences meet conventional city-related fields, like transportation, civil engineering, environment, economy, ecology, and sociology in the context of urban spaces. This article first introduces the concept of urban computing, discussing its general framework and key challenges from the perspective of computer sciences. Second, we classify the applications of urban computing into seven categories, consisting of urban planning, transportation, the environment, energy, social, economy, and public safety and security, presenting representative scenarios in each category. Third, we summarize the typical technologies that are needed in urban computing into four folds, which are about urban sensing, urban data management, knowledge fusion across heterogeneous data, and urban data visualization. Finally, we give an outlook on the future of urban computing, suggesting a few research topics that are somehow missing in the community.

1,290 citations