Author

Jianfeng Liu

Bio: Jianfeng Liu is an academic researcher. The author has contributed to research in topics: DBSCAN & Public transport. The author has an h-index of 4 and has co-authored 5 publications receiving 548 citations.

Papers
Journal ArticleDOI
TL;DR: This paper proposes an efficient and effective data-mining procedure that models the travel patterns of transit riders in Beijing, China, and identifies trip chains based on the temporal and spatial characteristics of their smart card transaction data.
Abstract: To mitigate the congestion caused by the ever increasing number of privately owned automobiles, public transit is highly promoted by transportation agencies worldwide. A better understanding of travel patterns and regularity at the “magnitude” level will enable transit authorities to evaluate the services they offer, adjust marketing strategies, retain loyal customers and improve overall transit performance. However, it is fairly challenging to identify travel patterns for individual transit riders in a large dataset. This paper proposes an efficient and effective data-mining procedure that models the travel patterns of transit riders in Beijing, China. Transit riders’ trip chains are identified based on the temporal and spatial characteristics of their smart card transaction data. The Density-based Spatial Clustering of Applications with Noise (DBSCAN) algorithm then analyzes the identified trip chains to detect transit riders’ historical travel patterns and the K-Means++ clustering algorithm and the rough-set theory are jointly applied to cluster and classify travel pattern regularities. The performance of the rough-set-based algorithm is compared with those of other prevailing classification algorithms. The results indicate that the proposed rough-set-based algorithm outperforms other commonly used data-mining algorithms in terms of accuracy and efficiency.

510 citations

01 Jan 2013
TL;DR: This paper proposes an efficient and effective data-mining procedure that models the travel patterns of transit riders in Beijing, China and indicates that the proposed rough-set-based algorithm outperforms other commonly used data-mining algorithms in terms of accuracy and efficiency.
Abstract: To mitigate congestion caused by the increasing number of privately owned automobiles, public transit is highly promoted by transportation agencies worldwide. With a better understanding of the travel patterns and regularity (the “magnitude” level of travel pattern) of transit riders, transit authorities can evaluate the current transit services to adjust marketing strategies, keep loyal customers and improve transit performance. However, it is fairly challenging to identify the travel pattern of each individual transit rider in a large dataset. Therefore, this paper proposes an efficient and effective data-mining approach that models the travel patterns of transit riders using the smart card data collected in Beijing, China. Transit riders’ trip chains are identified based on the temporal and spatial characteristics of smart card transaction data. Based on the identified trip chains, the Density-based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used to detect each transit rider’s historical travel patterns. The K-Means++ clustering algorithm and the rough-set theory are jointly applied to cluster and classify the travel pattern regularities. The rough-set-based algorithm is compared with other classification algorithms, including Naive Bayes Classifier, C4.5 Decision Tree, K-Nearest Neighbor (KNN) and three-hidden-layers Neural Network. The results indicate that the proposed rough-set-based algorithm outperforms other prevailing data-mining algorithms in terms of accuracy and efficiency.

90 citations
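
As a rough illustration of how DBSCAN can separate a rider's recurring trips from one-off trips, the sketch below clusters boarding times expressed in minutes after midnight. The single time feature, the sample data, and the `eps` and `min_pts` values are assumptions made for this example; the abstract does not state which features or parameters the authors used.

```python
def dbscan_1d(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # All points (including i itself) within eps of point i.
        return [j for j in range(n) if abs(points[i] - points[j]) <= eps]

    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster_id += 1  # point i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise joins as a border point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is also a core point: keep expanding
    return labels


# Hypothetical boarding times for one rider, in minutes after midnight.
boarding_minutes = [485, 490, 478, 482, 1050, 1055, 1048, 780]
labels = dbscan_1d(boarding_minutes, eps=15, min_pts=3)
print(labels)
```

In this toy data the morning and evening boardings form two dense groups, while the isolated midday trip is labeled as noise (-1), which is the behavior that makes a density-based method attractive for finding habitual trips.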

Journal ArticleDOI
TL;DR: To extract passengers’ origin data from recorded SC transaction information, a Markov chain based Bayesian decision tree algorithm is developed in this study and verified with transit vehicles equipped with global positioning system (GPS) data loggers.
Abstract: The automated fare collection (AFC) system, also known as the transit smart card (SC) system, has gained more and more popularity among transit agencies worldwide. Compared with the conventional manual fare collection system, an AFC system has its inherent advantages in low labor cost and high efficiency for fare collection and transaction data archival. Although it is possible to collect highly valuable data from transit SC transactions, substantial efforts and methodologies are needed for extracting such data because most AFC systems are not initially designed for data collection. This is true especially for the Beijing AFC system, where a passenger’s boarding stop (origin) on a flat-rate bus is not recorded on the check-in scan. To extract passengers’ origin data from recorded SC transaction information, a Markov chain based Bayesian decision tree algorithm is developed in this study. Using the time invariance property of the Markov chain, the algorithm is further optimized and simplified to have a linear computational complexity. This algorithm is verified with transit vehicles equipped with global positioning system (GPS) data loggers. Our verification results demonstrated that the proposed algorithm is effective in extracting transit passengers’ origin information from SC transactions with a relatively high accuracy. Such transit origin data are highly valuable for transit system planning and route optimization.

89 citations

Patent
04 May 2011
TL;DR: This patent presents a method for reckoning getting-on (boarding) stops on the basis of data of a one-ticket public-transport integrated circuit (IC) card, relying in part on a Bayesian decision tree method.
Abstract: The invention discloses a method for reckoning getting-on stops on the basis of data of a one-ticket public-transport integrated circuit (IC) card, comprising the following steps: (a) clustering the card-swiping records of a target bus by time, treating time-adjacent card-swiping records as one cluster, with each cluster corresponding to one stop, and forming a stop sequence to be recognized; (b) determining transfer information from the adjacent card-swiping records of each IC card; (c) reckoning, from the transfer information, the crosspoint data between bus routes and the stop information of the bus routes, the actual stops corresponding to the clusters of the IC card, and forming a recognized stop sequence; (d) reckoning, from the recognized stop sequence, the actual stop corresponding to each cluster still to be recognized: when the number of recognized stops is less than 2, a Bayesian decision tree method featured by mobile step is used, and when the number of recognized clusters is greater than or equal to 2, a mode recognition method is used. The method places low requirements on the data and achieves higher accuracy.

11 citations
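
The first step of the patent abstract above, grouping time-adjacent card-swiping records of one bus so that each group corresponds to one stop, can be sketched as a simple gap-based grouping. The function name `cluster_by_time`, the 60-second `max_gap` threshold, and the sample timestamps are assumptions for illustration; the abstract does not say how "time-adjacent" is defined.

```python
from datetime import datetime, timedelta


def cluster_by_time(swipes, max_gap=timedelta(seconds=60)):
    """Group swipe timestamps; a gap larger than max_gap starts a new cluster."""
    clusters = []
    for ts in sorted(swipes):
        if clusters and ts - clusters[-1][-1] <= max_gap:
            clusters[-1].append(ts)  # close to the previous swipe: same stop
        else:
            clusters.append([ts])  # large gap: the bus has moved to a new stop
    return clusters


# Hypothetical swipe times on one bus, as offsets in seconds from 08:00:00.
base = datetime(2011, 5, 4, 8, 0, 0)
offsets = [0, 5, 12, 200, 210, 600]
swipes = [base + timedelta(seconds=s) for s in offsets]

clusters = cluster_by_time(swipes)
sizes = [len(c) for c in clusters]
print(sizes)
```

Each resulting cluster is only a candidate stop in an unlabeled sequence; matching clusters to actual stops is the job of the later steps (transfer information, route crosspoints, and the decision tree or recognition method), which this sketch does not attempt.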

Proceedings ArticleDOI
29 Jun 2016
TL;DR: This paper categorizes Beijing subway ridership characteristics into seven different groups based on their temporal distributions and corresponding land use types by analyzing Beijing subway smart card data, and analyzes the heterogeneity among stop-level, line-level and network-level ridership temporal distributions.
Abstract: Similar to other metropolitan cities in China, urban rail transportation has been highly emphasized in Beijing in the past decades. However, the growing subway system is seriously challenged by several critical issues, one of which is that the ridership must be restricted at certain stations during peak hours due to crowded conditions. This suppresses the transport demand and pushes passengers to other modes of transport. Therefore, the characteristics of subway ridership should be carefully studied to increase the appeal of subways and maximize their potential. This paper categorizes Beijing subway ridership characteristics into seven different groups based on their temporal distributions and corresponding land use types by analyzing Beijing’s subway smart card data. In addition, the heterogeneity among stop-level, line-level, and network-level ridership temporal distributions is analyzed. Temporal distribution characteristics should be incorporated into ridership prediction and subway network optimization, and can thereby improve subway demand forecasting, planning, and design.

2 citations


Cited by
Journal ArticleDOI
TL;DR: This article assesses the different machine learning methods that deal with the challenges in IoT data by considering smart cities as the main use case and presents a taxonomy of machine learning algorithms explaining how different techniques are applied to the data in order to extract higher level information.

690 citations

Journal ArticleDOI
TL;DR: The purpose of this paper is to introduce datasets, concepts, knowledge and methods used in these two fields, and most importantly raise cross-discipline ideas for conversations and collaborations between the two.
Abstract: The last decade has witnessed very active development in two broad, but separate fields, both involving understanding and modeling of how individuals move in time and space (hereafter called "travel behavior analysis" or "human mobility analysis"). One field comprises transportation researchers who have been working in the field for decades and the other involves newcomers from a wide range of disciplines, but primarily computer scientists and physicists. Researchers in these two fields work with different datasets, apply different methodologies, and answer different but overlapping questions. It is our view that there is much hidden synergy between the two fields that needs to be brought out. It is thus the purpose of this paper to introduce datasets, concepts, knowledge and methods used in these two fields, and most importantly raise cross-discipline ideas for conversations and collaborations between the two. It is our hope that this paper will stimulate many future cross-cutting studies that involve researchers from both fields.

425 citations

Journal ArticleDOI
TL;DR: In this article, the authors present a taxonomy of machine learning algorithms that can be applied to the data in order to extract higher level information, and a use case of applying Support Vector Machine (SVM) on Aarhus Smart City traffic data is presented for more detailed exploration.
Abstract: Rapid developments in hardware, software, and communication technologies have allowed the emergence of Internet-connected sensory devices that provide observation and data measurement from the physical world. By 2020, it is estimated that the total number of Internet-connected devices being used will be between 25 and 50 billion. As the numbers grow and technologies become more mature, the volume of data published will increase. Internet-connected devices technology, referred to as Internet of Things (IoT), continues to extend the current Internet by providing connectivity and interaction between the physical and cyber worlds. In addition to increased volume, the IoT generates Big Data characterized by velocity in terms of time and location dependency, with a variety of multiple modalities and varying data quality. Intelligent processing and analysis of this Big Data is the key to developing smart IoT applications. This article assesses the different machine learning methods that deal with the challenges in IoT data by considering smart cities as the main use case. The key contribution of this study is presentation of a taxonomy of machine learning algorithms explaining how different techniques are applied to the data in order to extract higher level information. The potential and challenges of machine learning for IoT data analytics will also be discussed. A use case of applying Support Vector Machine (SVM) on Aarhus Smart City traffic data is presented for a more detailed exploration.

375 citations

01 Jan 2001
TL;DR: The probability of any event is the ratio between the value at which an expectation depending on the happening of the event ought to be computed, and the value of the thing expected upon its happening.
Abstract: Problem: Given the number of times in which an unknown event has happened and failed: Required the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named. SECTION 1. Definition 1. Several events are inconsistent, when if one of them happens, none of the rest can. 2. Two events are contrary when one, or other of them must; and both together cannot happen. 3. An event is said to fail, when it cannot happen; or, which comes to the same thing, when its contrary has happened. 4. An event is said to be determined when it has either happened or failed. 5. The probability of any event is the ratio between the value at which an expectation depending on the happening of the event ought to be computed, and the value of the thing expected upon its happening.

368 citations