
Showing papers by "Zhenhui Li published in 2021"


Journal ArticleDOI
TL;DR: In this paper, the authors focus on investigating the recent advances in using reinforcement learning (RL) techniques to solve the traffic signal control problem and classify the known approaches based on the RL techniques they use and provide a review of existing models with analysis on their advantages and disadvantages.
Abstract: Traffic signal control is an important and challenging real-world problem that has recently received a large amount of interest from both the transportation and computer science communities. In this survey, we focus on investigating the recent advances in using reinforcement learning (RL) techniques to solve the traffic signal control problem. We classify the known approaches based on the RL techniques they use and provide a review of existing models with analysis of their advantages and disadvantages. Moreover, we give an overview of the simulation environments and experimental settings that have been developed to evaluate traffic signal control methods. Finally, we explore future directions in the area of RL-based traffic signal control methods. We hope this survey can provide insights to researchers dealing with real-world applications in intelligent transportation systems.
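
As a deliberately toy illustration of the surveyed setting, the sketch below applies tabular Q-learning to a single simulated intersection: the state is the discretized queue on each approach plus the current phase, the action is the next green phase, and the reward is the negative total queue. The dynamics, discretization, and reward here are illustrative assumptions, not any particular method from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PHASES = 2          # e.g., NS-green vs. EW-green (toy assumption)
MAX_QUEUE = 10        # queues are clipped/discretized for a tabular state
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

# Q-table indexed by (ns_queue, ew_queue, current_phase, action)
Q = np.zeros((MAX_QUEUE + 1, MAX_QUEUE + 1, N_PHASES, N_PHASES))

def step(ns, ew, phase, action):
    """Toy intersection dynamics: both approaches receive random arrivals,
    the green approach discharges vehicles. Reward = -(total queue)."""
    ns, ew = ns + rng.poisson(1.0), ew + rng.poisson(1.0)
    if action == 0:
        ns = max(ns - 3, 0)   # NS green discharges up to 3 vehicles
    else:
        ew = max(ew - 3, 0)
    ns, ew = min(ns, MAX_QUEUE), min(ew, MAX_QUEUE)
    return ns, ew, action, -(ns + ew)

ns, ew, phase = 0, 0, 0
for t in range(50_000):
    # epsilon-greedy selection over signal phases
    if rng.random() < EPS:
        a = int(rng.integers(N_PHASES))
    else:
        a = int(np.argmax(Q[ns, ew, phase]))
    ns2, ew2, phase2, r = step(ns, ew, phase, a)
    # standard one-step Q-learning update
    td_target = r + GAMMA * Q[ns2, ew2, phase2].max()
    Q[ns, ew, phase, a] += ALPHA * (td_target - Q[ns, ew, phase, a])
    ns, ew, phase = ns2, ew2, phase2
```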

78 citations


Journal ArticleDOI
TL;DR: The findings suggest that connected communities can influence each other from a distance and that connectivity to less disadvantaged work hubs may decrease local crime – with implications for advancing knowledge on the relational ecology of crime, social isolation, and ecological networks.
Abstract: Research on communities and crime has predominantly focused on social conditions within an area or in its immediate proximity. However, a growing body of research shows that people often travel to ...

17 citations


Proceedings ArticleDOI
19 Apr 2021
TL;DR: In this paper, the authors propose to use pervasive speed data to recover the temporal origin-destination (TOD) of vehicles, with other mobility data used as auxiliary information.
Abstract: Understanding city-wide traffic problems may benefit many downstream applications, such as city planning and public transportation development. One key step in understanding traffic is to reveal how many people travel from one location to another during a given period (we call this TOD, short for temporal origin-destination). With TOD, we can rebuild city-wide traffic by simulating the volume and speed on each road segment. Frequently used mobility data, e.g., GPS trajectories and surveillance cameras, can only cover a subset of vehicles or selected regions of the city. Hence, we propose to use pervasive speed data to recover TOD, and to use other mobility data as auxiliary data. To the best of our knowledge, we are the first to work on this challenging problem. It is highly challenging because speed observations are generated from TOD through a complex process, and there exist multiple TOD distributions that may generate similar city-wide road speed observations. We propose a new method that models the complex process via separate modules and takes in auxiliary data to eliminate infeasible solutions. Extensive experiments on synthetic and real datasets have shown the superior performance of our model over baselines.
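
A minimal sketch of the inverse-problem framing, under a strong simplifying assumption: the forward process from TOD to road speeds is reduced to a fixed linear model (matrix A), whereas the paper models it with separate learned modules. Auxiliary mobility data is likewise reduced to a prior on total demand that regularizes away infeasible solutions; all numbers are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_od, n_roads = 20, 30

# Assumed toy forward model: speed drops linearly with OD-induced volume.
# A is a stand-in route-incidence matrix; the paper's forward process
# (route choice, flow propagation) is far richer.
A = rng.random((n_roads, n_od)) * (rng.random((n_roads, n_od)) < 0.3)
free_flow = 60.0

def speeds_from_tod(tod):
    return free_flow - A @ tod      # toy: speed = free flow minus load

true_tod = rng.uniform(0, 5, n_od)
observed = speeds_from_tod(true_tod) + rng.normal(0, 0.1, n_roads)

# Auxiliary data stand-in: a coarse prior on total demand.
total_demand_prior = true_tod.sum()

tod = np.ones(n_od)
lr, lam = 1e-3, 1.0
for _ in range(5000):
    resid = speeds_from_tod(tod) - observed
    # gradient of 0.5*||resid||^2 plus a soft total-demand penalty
    grad = -A.T @ resid + lam * (tod.sum() - total_demand_prior)
    tod = np.clip(tod - lr * grad, 0, None)   # demand is non-negative
print("mean absolute recovery error:", np.abs(tod - true_tod).mean())
```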

4 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a method to understand when and where certain apps are used by users, a problem that has been challenging due to the highly skewed distribution of the data.
Abstract: Both app developers and service providers have strong motivations to understand when and where certain apps are used by users. However, it has been a challenging problem due to the highly skewed an...

3 citations


Posted Content
Hua Wei, Deheng Ye, Zhao Liu, Hao Wu, Bo Yuan, Qiang Fu, Wei Yang, Zhenhui Li
TL;DR: In this paper, a residual generative model is proposed to reduce policy approximation error for offline RL, which can learn more accurate policy approximations in different benchmark datasets and can learn competitive AI agents in complex control tasks under the multiplayer online battle arena (MOBA) game Honor of Kings.
Abstract: Offline reinforcement learning (RL) tries to learn a near-optimal policy from recorded offline experience without online exploration. Current offline RL research includes: 1) generative modeling, i.e., approximating a policy using fixed data; and 2) learning the state-action value function. While most research focuses on the value function part by reducing the bootstrapping error in value function approximation induced by the distribution shift of the training data, the effects of error propagation in generative modeling have been neglected. In this paper, we analyze the error in generative modeling. We propose AQL (action-conditioned Q-learning), a residual generative model that reduces policy approximation error for offline RL. We show that our method can learn more accurate policy approximations on different benchmark datasets. In addition, we show that the proposed offline RL method can learn more competitive AI agents in complex control tasks in the multiplayer online battle arena (MOBA) game Honor of Kings.
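
A sketch of the residual-generative-model idea in isolation: a base policy proposes an action from the state and a residual head corrects it, with both fitted to logged state-action pairs. The architecture, dimensions, and data below are placeholders; this is not the exact AQL design, whose action-conditioned Q-learning component is omitted here.

```python
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM = 8, 2

class ResidualPolicy(nn.Module):
    """Base generative policy plus a residual head; the residual is meant
    to absorb the approximation error left by the base model (a sketch of
    the residual-generative idea, not the paper's AQL architecture)."""
    def __init__(self):
        super().__init__()
        self.base = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                  nn.Linear(64, ACT_DIM))
        self.residual = nn.Sequential(nn.Linear(STATE_DIM + ACT_DIM, 64),
                                      nn.ReLU(), nn.Linear(64, ACT_DIM))

    def forward(self, s):
        a0 = self.base(s)                      # coarse action proposal
        return a0 + self.residual(torch.cat([s, a0], dim=-1))

# Fit to a logged offline batch (random placeholders for real data).
states = torch.randn(256, STATE_DIM)
actions = torch.randn(256, ACT_DIM)
policy = ResidualPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(200):
    loss = nn.functional.mse_loss(policy(states), actions)
    opt.zero_grad(); loss.backward(); opt.step()
```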

2 citations


DOI
02 Nov 2021
TL;DR: In this paper, a probabilistic spatial demand simulator (PSD-sim) is proposed to learn important spatial patterns in offline retail, which can be used to train policy estimators that discover intelligent, spatial allocation strategies.
Abstract: Connecting consumers with relevant products is a very important problem in both online and offline commerce. In many offline retail settings, product distributors extend bids to place and manage product displays within a retail outlet. The distributor aims to choose a spatial allocation strategy that maximizes revenue given a preset budget constraint. Prior work shows that carefully selecting product locations within a store can minimize search costs and induce consumers to make "impulse" purchases. Such impulse purchases are influenced by the spatial configuration of the store. However, learning important spatial patterns in offline retail is challenging due to the scarcity of data and the high cost of exploration and experimentation in the physical world. To address these challenges, we propose a stochastic model of spatial demand in physical retail, which we call the Probabilistic Spatial Demand Simulator (PSD-sim). PSD-sim is an effective mirror of the real environment because it exploits the structure of common retail datasets through a hierarchical parameter-sharing structure, and it can incorporate spatial and economic knowledge through informative priors. We show that PSD-sim can both recover ground-truth test data better than baselines and generate new data for unseen states. The simulator can naturally be used to train policy estimators that discover intelligent spatial allocation strategies. Finally, we perform a preliminary study of different optimization techniques and find that Deep Q-Learning can learn an effective spatial allocation policy.
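
A numpy sketch of the two ingredients the abstract highlights, with invented numbers throughout: hierarchical parameter sharing (product demand rates drawn around a shared store-level rate, pooling sparse data) and an informative spatial prior (location effects on impulse purchases). PSD-sim's actual model is substantially richer than this.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PRODUCTS, N_LOCATIONS = 5, 4

# Hierarchical parameter sharing (sketch): product-level log-rates are
# drawn around a shared store-level mean. All values are illustrative.
store_log_rate = rng.normal(1.0, 0.3)
product_log_rate = rng.normal(store_log_rate, 0.2, N_PRODUCTS)

# Informative spatial prior: some display locations boost impulse buys.
location_effect = np.array([0.4, 0.1, -0.1, -0.3])

def simulate_sales(allocation):
    """allocation[i] = display location assigned to product i.
    Demand is Poisson with a rate depending on product and placement."""
    rate = np.exp(product_log_rate + location_effect[allocation])
    return rng.poisson(rate)

alloc = rng.integers(N_LOCATIONS, size=N_PRODUCTS)
print("simulated unit sales:", simulate_sales(alloc))
```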

1 citation


Posted Content
TL;DR: In this article, the authors propose a novel framework ImInGAIL to address the problem of learning to simulate the driving behavior from sparse real-world data, which incorporates data interpolation with the behavior learning process of imitation learning.
Abstract: Simulation of real-world traffic can help validate transportation policies. A good simulator produces traffic similar to real-world traffic, which often requires dense traffic trajectories (i.e., with a high sampling rate) to cover dynamic situations in the real world. However, in most cases, real-world trajectories are sparse, which makes simulation challenging. In this paper, we present a novel framework, ImInGAIL, to address the problem of learning to simulate driving behavior from sparse real-world data. The proposed architecture incorporates data interpolation into the behavior learning process of imitation learning. To the best of our knowledge, we are the first to tackle the data sparsity issue for behavior learning problems. We evaluate our framework on both synthetic and real-world trajectory datasets of driving vehicles, showing that our method outperforms various baselines and state-of-the-art methods.
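
The sketch below isolates only the interpolation half of the problem, assuming simple linear interpolation of sparse position samples onto a regular time grid; ImInGAIL instead couples interpolation with the adversarial imitation objective, which is omitted here, and all sample values are invented.

```python
import numpy as np

def densify(timestamps, positions, hz=1.0):
    """Linearly interpolate a sparse (t, x) trajectory onto a regular grid.
    A fixed-preprocessing stand-in for ImInGAIL's interpolation, which is
    learned jointly with the imitation objective."""
    t_dense = np.arange(timestamps[0], timestamps[-1], 1.0 / hz)
    x_dense = np.interp(t_dense, timestamps, positions)
    return t_dense, x_dense

# Sparse GPS-like samples: position along a road, seconds apart.
t = np.array([0.0, 7.0, 15.0, 30.0])
x = np.array([0.0, 90.0, 210.0, 400.0])
t_d, x_d = densify(t, x)
speeds = np.gradient(x_d, t_d)   # derived features (e.g., speed) for imitation
```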

1 citation


Proceedings Article
18 May 2021
TL;DR: In this article, the authors propose Neural Utility Functions, which directly optimize the gradients of a neural network so that they are more consistent with utility theory, a mathematical framework for modeling choice among items.
Abstract: Current neural network architectures have no mechanism for explicitly reasoning about item trade-offs. Such trade-offs are important for popular tasks such as recommendation. The main idea of this work is to give neural networks inductive biases inspired by economic theories. To this end, we propose Neural Utility Functions, which directly optimize the gradients of a neural network so that they are more consistent with utility theory, a mathematical framework for modeling choice among items. We demonstrate that Neural Utility Functions can recover theoretical item relationships better than vanilla neural networks, show analytically that existing neural networks are not quasi-concave and do not inherently reason about trade-offs, and show that augmenting existing models with a utility loss function improves recommendation results. The Neural Utility Functions we propose are theoretically motivated and yield strong empirical results.
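
One way to read "optimize the gradients of a neural network so that they are more consistent with utility theory" is as a gradient penalty computed with autograd. The sketch below enforces a single utility-theoretic property, non-negative marginal utility (monotonicity), as an illustrative stand-in; the paper's actual losses and properties differ.

```python
import torch
import torch.nn as nn

N_ITEMS = 6

utility = nn.Sequential(nn.Linear(N_ITEMS, 32), nn.Tanh(), nn.Linear(32, 1))

def utility_gradient_penalty(bundles):
    """Penalize negative marginal utility dU/dx_i, one utility-theoretic
    property ("more is weakly better"). A sketch of regularizing network
    gradients toward utility theory, not the paper's exact loss."""
    bundles = bundles.requires_grad_(True)
    u = utility(bundles).sum()
    (grads,) = torch.autograd.grad(u, bundles, create_graph=True)
    return torch.relu(-grads).mean()   # zero when all marginal utilities >= 0

bundles = torch.rand(128, N_ITEMS)     # placeholder item bundles
opt = torch.optim.Adam(utility.parameters(), lr=1e-3)
for _ in range(100):
    loss = utility_gradient_penalty(bundles.clone())
    opt.zero_grad(); loss.backward(); opt.step()
```

In practice such a penalty would be added to a task loss (e.g., a recommendation objective) rather than optimized alone.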

Posted Content
TL;DR: In this paper, the authors propose a theory-guided residual network model, where the theoretical part emphasizes the general principles of human routing decisions and the residual part captures driving-condition preferences (e.g., local road or highway).
Abstract: Heavy traffic and related issues have always been concerns for modern cities. With the help of deep learning and reinforcement learning, people have proposed various policies to address these traffic-related problems, such as smart traffic signal control systems and taxi dispatching systems. These policies are usually validated in a city simulator, since directly applying them in a real city incurs real cost. However, policies validated in a city simulator may fail in the real city if the simulator differs significantly from the real world. To tackle this problem, we need to build a realistic traffic simulation system. Therefore, in this paper, we propose to learn the human routing model, which is one of the most essential parts of a traffic simulator. This problem has two major challenges. First, human routing decisions are determined by multiple factors beyond the common factors of time and distance. Second, historical route data usually cover just a small portion of vehicles, due to privacy and device availability issues. To address these problems, we propose a theory-guided residual network model, where the theoretical part emphasizes the general principles of human routing decisions (e.g., fastest route), and the residual part captures driving-condition preferences (e.g., local road or highway). Since the theoretical part is composed of traditional shortest-path algorithms that do not need data to train, our residual network can learn human routing models from limited data. We have conducted extensive experiments on multiple real-world datasets to show the superior performance of our model, especially with small data. We have also used case studies to illustrate why our model is better at recovering real routes.
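
A sketch of the theory-plus-residual decomposition on a toy network: the theoretical part is an ordinary shortest-time path (no training data needed), and the residual part is collapsed to a single hand-set highway-preference weight, whereas the paper learns it with a network. Uses networkx; all edge values are invented.

```python
import networkx as nx

# Toy road network: edges carry travel time and a road-type feature.
G = nx.DiGraph()
edges = [("A", "B", 5.0, 1), ("B", "D", 5.0, 1),   # highway-ish route
         ("A", "C", 6.0, 0), ("C", "D", 6.0, 0)]   # local-road route
for u, v, t, highway in edges:
    G.add_edge(u, v, time=t, highway=highway)

# Theoretical part: the fastest path requires no training data.
theory_route = nx.shortest_path(G, "A", "D", weight="time")

def residual_score(route, w_highway=0.8):
    """Residual preference term (hand-set placeholder weight) capturing
    factors beyond travel time, e.g., favoring highways."""
    return w_highway * sum(G[u][v]["highway"]
                           for u, v in zip(route, route[1:]))

def route_cost(route):
    time = sum(G[u][v]["time"] for u, v in zip(route, route[1:]))
    return time - residual_score(route)   # lower combined cost is better

candidates = list(nx.all_simple_paths(G, "A", "D"))
best = min(candidates, key=route_cost)
print("theory-only route:", theory_route, "| theory+residual route:", best)
```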



Posted Content
TL;DR: In this article, the authors formulate traffic simulation as an inverse reinforcement learning problem, and propose a parameter sharing adversarial inverse RL model for dynamics-robust simulation learning, which is able to imitate a vehicle's trajectories in the real world while simultaneously recovering the reward function that reveals the vehicle's true objective which is invariant to different dynamics.
Abstract: Traffic simulators are an essential component in the operation and planning of transportation systems. Conventional traffic simulators usually employ a calibrated physical car-following model to describe vehicles' behaviors and their interactions with the traffic environment. However, there is no universal physical model that can accurately predict the pattern of vehicles' behaviors in different situations. A fixed physical model tends to be less effective in a complicated environment given the non-stationary nature of traffic dynamics. In this paper, we formulate traffic simulation as an inverse reinforcement learning problem, and propose a parameter-sharing adversarial inverse reinforcement learning model for dynamics-robust simulation learning. Our proposed model is able to imitate a vehicle's trajectories in the real world while simultaneously recovering the reward function that reveals the vehicle's true objective, which is invariant to different dynamics. Extensive experiments on synthetic and real-world datasets show the superior performance of our approach compared to state-of-the-art methods and its robustness to varying traffic dynamics.
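
A sketch of the discriminator at the core of adversarial IRL, with a single reward network shared across all vehicles to reflect the parameter-sharing idea. The standard AIRL discriminator D = exp(f(s,a)) / (exp(f(s,a)) + pi(a|s)) simplifies to a sigmoid in log space; the policy update and full training loop are omitted, and all dimensions are placeholders.

```python
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM = 6, 2

# One reward net shared by all vehicles: the "parameter sharing" idea,
# so every agent's discriminator reuses the same f(s, a).
reward_f = nn.Sequential(nn.Linear(STATE_DIM + ACT_DIM, 64), nn.ReLU(),
                         nn.Linear(64, 1))

def airl_discriminator(s, a, log_pi):
    """AIRL-style discriminator D = exp(f) / (exp(f) + pi(a|s)),
    computed in log space for numerical stability. log_pi comes from
    the current policy, whose optimization is omitted in this sketch."""
    f = reward_f(torch.cat([s, a], dim=-1)).squeeze(-1)
    return torch.sigmoid(f - log_pi)   # equals exp(f) / (exp(f) + pi)

s = torch.randn(32, STATE_DIM)         # placeholder states
a = torch.randn(32, ACT_DIM)           # placeholder actions
log_pi = torch.randn(32)               # placeholder policy log-probs
d = airl_discriminator(s, a, log_pi)   # expert-vs-policy probabilities
```

Because f(s, a) survives training as an explicit reward estimate, this formulation is what lets the recovered objective stay invariant when the underlying dynamics change.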