Bio: Tao Li is an academic researcher from Nankai University. The author has contributed to research in the topics Language identification and Qualitative research. The author has an h-index of 2 and has co-authored 5 publications receiving 15 citations.
TL;DR: A method based on a mathematical programming model is proposed, in which environmental, social and economic factors are considered and the GA is used to optimize AQMS; results indicate that the proposed method outperforms random siting of AQMS.
Abstract: The design of air quality monitoring stations (AQMS) is very important for environmental protection departments. The task, however, has been proven to be a nondeterministic polynomial (NP)-hard problem. The powerful search capability of the genetic algorithm (GA) helps in selecting optimal monitoring sites. A method based on a mathematical programming model is proposed, in which environmental, social and economic factors are considered and the GA is used to optimize AQMS. Modelling results indicate that the proposed method outperforms random siting of AQMS. The proposed method/framework is also suitable for the design of communication networks. For example, wireless base stations can be placed well by a similar method, providing sufficient signal strength for better service at lower cost.
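The abstract does not give the model's objective function or the GA operators. As a minimal sketch only, assuming a hypothetical objective (number of distinct grid cells covered minus a weighted site cost) and standard GA operators (tournament selection, one-point crossover, bit-flip mutation, elitism), a binary-chromosome GA for choosing monitoring sites could look like this. The names `fitness` and `genetic_site_selection`, the coverage sets and the cost weight are illustrative, not the authors' formulation.

```python
import random

def fitness(chrom, covers, costs, cost_weight):
    """Hypothetical objective: distinct grid cells covered minus weighted site cost."""
    covered = set()
    total_cost = 0.0
    for gene, cells, cost in zip(chrom, covers, costs):
        if gene:  # gene == 1 means "place a station at this candidate site"
            covered |= cells
            total_cost += cost
    return len(covered) - cost_weight * total_cost

def genetic_site_selection(covers, costs, cost_weight=0.5, pop_size=30,
                           generations=60, mutation_rate=0.05, seed=0):
    """Binary-chromosome GA over candidate sites (needs at least 2 candidates)."""
    rng = random.Random(seed)
    n = len(covers)

    def score(chrom):
        return fitness(chrom, covers, costs, cost_weight)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=score)[:]]  # elitism: carry the best chromosome over
        while len(children) < pop_size:
            # Tournament selection: best of 3 random individuals, twice.
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=score)
    return best, score(best)

if __name__ == "__main__":
    # Toy data: which grid cells each candidate site would cover, and its cost.
    covers = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {0, 5}, {1, 4}]
    costs = [1.0] * len(covers)
    print(genetic_site_selection(covers, costs))
```

With these toy sets, choosing sites 0 and 2 covers all six cells at the lowest cost, for an objective value of 5. A GA offers no guarantee of reaching that optimum, which is the usual trade-off accepted for NP-hard siting problems.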
01 Feb 2017
TL;DR: STRHOG, an extended version of HOG, is proposed to improve intelligent character recognition for filtering spam images on the cloud; for a fair comparison with other methods, a nearest neighbor classifier is used for recognition.
Abstract: Cloud storage has become an important way of sharing data in recent years. Data protection for data owners and harmful-data filtering for data recipients are two non-negligible problems in cloud storage. Illegal or unsuitable messages on the cloud have a negative impact on minors, and they are easily converted into images to evade text-based filtering. To detect spam images with embedded harmful messages on the cloud, soft computing methods are required for intelligent character recognition. HOG, proposed by Dalal and Triggs, has so far been demonstrated to be one of the best features for intelligent character recognition. When HOG is applied to recognize a whole word, a pre-defined sliding window is used to generate candidate character images. However, because character sizes differ, the pre-defined window cannot exactly match each character. Variations in scale and translation therefore usually occur in the character image to be recognized, and they have a great influence on the performance of intelligent character recognition. To solve this problem, STRHOG, an extended version of HOG, is proposed in this paper. Experiments on two public datasets and one dataset of our own have shown encouraging results. The improved intelligent character recognition is helpful for filtering spam images on the cloud. To make a fair comparison with other methods, a nearest neighbor classifier is used for intelligent character recognition. It is expected that performance can be further improved by using better classifiers such as fuzzy neural networks.
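To make the sliding-window discussion concrete, here is a minimal sketch of a plain HOG baseline fed to a nearest neighbor classifier, as in the comparison setup described above. It is simplified (per-cell, magnitude-weighted histograms of unsigned gradient orientation with one global L2 normalisation instead of Dalal and Triggs' overlapping block normalisation) and it is not STRHOG, whose details the abstract does not give. The cell size, bin count and toy glyphs are assumptions.

```python
import numpy as np

def hog_descriptor(img, cell=4, bins=9):
    """Simplified HOG: per-cell orientation histograms, globally L2-normalised."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)                      # row-wise and column-wise gradients
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation in [0, 180)
    h, w = img.shape
    feats = []
    for r in range(0, h - cell + 1, cell):
        for c in range(0, w - cell + 1, cell):
            a = ang[r:r + cell, c:c + cell].ravel()
            m = mag[r:r + cell, c:c + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0.0, 180.0), weights=m)
            feats.append(hist)
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-9)

def nearest_neighbor(query, templates, labels):
    """1-NN on descriptors using Euclidean distance."""
    q = hog_descriptor(query)
    dists = [np.linalg.norm(q - hog_descriptor(t)) for t in templates]
    return labels[int(np.argmin(dists))]

if __name__ == "__main__":
    vert = np.zeros((8, 8)); vert[1:7, 3:5] = 1.0    # toy "I"
    horiz = np.zeros((8, 8)); horiz[3:5, 1:7] = 1.0  # toy "-"
    shifted = np.zeros((8, 8)); shifted[1:7, 2:4] = 1.0
    print(nearest_neighbor(shifted, [vert, horiz], ["I", "-"]))
```

Because the cells are fixed, shifting a glyph by one pixel moves gradient energy between cells and changes the descriptor. That is the translation sensitivity the abstract attributes to a fixed sliding window.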
TL;DR: This article explored the research topics and paradigms of questionnaire-based quantitative research on MOOCs by reviewing 126 articles available in the Science Citation Index (SCI) and Social Sciences Citation Index (SSCI) databases from January 2015 to August 2020.
Abstract: Massive open online courses (MOOCs) have attracted much interest from educational researchers and practitioners around the world. There has been an increase in empirical studies about MOOCs in recent years, most of which used questionnaire surveys and quantitative methods to collect and analyze data. This study explored the research topics and paradigms of questionnaire-based quantitative research on MOOCs by reviewing 126 articles available in the Science Citation Index (SCI) and Social Sciences Citation Index (SSCI) databases from January 2015 to August 2020. This comprehensive overview showed that: (a) the top three MOOC research topics were the factors influencing learners’ performance, dropout rates and continuance intention to use MOOCs, and assessing MOOCs; (b) for these three topics, many studies designed questionnaires by adding new factors or adjustments to extant theoretical models or survey instruments; and (c) most researchers used descriptive statistics to analyze data, followed by structural equation modeling, and reliability and validity analysis. This study elaborated on the relationship between research topics and key factors in the research models by building factors-goals (F-G) graphs. Finally, we proposed some directions and recommendations for future research on MOOCs.
13 Aug 2015
TL;DR: A framework is proposed for improving OCR performance with an adaptive dictionary, in which text categorization is used to construct dictionaries from web data and to identify the category of imaged documents.
Abstract: Previous work has shown that OCR benefits from a reduced dictionary size. In this paper, a framework is proposed for improving OCR performance with an adaptive dictionary, in which text categorization is used to construct dictionaries from web data and to identify the category of imaged documents. To facilitate comparison with existing methods that focus on language identification, an implementation is presented that improves OCR performance with language-adaptive dictionaries. Experimental results demonstrate that the performance of the OCR system is significantly improved by the reduced dictionary. Compared with other existing methods for language identification, the proposed method shows better performance. Other categorization schemes are also expected to reduce the dictionary size further. For example, an imaged document in a specific language can be further categorized by its content into sport, law, entertainment, etc.
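The benefit of a smaller, category-specific dictionary can be illustrated with a minimal sketch of a hypothetical pipeline. First, choose the dictionary whose words exactly match the most raw OCR tokens; this is a crude stand-in for the paper's text categorization, not its method. Then snap each token to its nearest dictionary word by Levenshtein edit distance. The word lists and tokens are toy data.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]

def pick_dictionary(tokens, dictionaries):
    """Hypothetical categorizer: the dictionary with the most exact token hits."""
    return max(dictionaries, key=lambda name: sum(t in dictionaries[name] for t in tokens))

def correct(tokens, lexicon):
    """Replace each token by its closest lexicon word (ties broken alphabetically)."""
    words = sorted(lexicon)
    return [min(words, key=lambda w: levenshtein(t, w)) for t in tokens]

if __name__ == "__main__":
    dictionaries = {"en": {"the", "cat", "sat", "mat"},
                    "de": {"die", "katze", "sass", "matte"}}
    tokens = ["the", "cqt", "sat", "the", "mst"]  # raw OCR output with errors
    lang = pick_dictionary(tokens, dictionaries)
    print(lang, correct(tokens, dictionaries[lang]))
```

Restricting the candidate words to one category means fewer near-miss words compete for each noisy token, which is the intuition behind the reduced dictionary.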
23 Aug 2015
TL;DR: A real-time method, named MRRI, is proposed in this paper to identify the machine-readable region of partially blurred document images; it shows better performance than two other image quality assessment methods.
Abstract: Partial blur sometimes occurs in document images captured by a camera, which affects the performance of OCR on the non-blurred text region. A real-time method, named MRRI, is proposed in this paper to identify the machine-readable region of partially blurred document images. First, a reference image is generated by low-pass filtering the given document image. Second, a weight matrix is generated by calculating the structural similarity for each patch. Third, a cost function is minimized to identify the maximum machine-readable region that can be well recognized by OCR. In experiments, two applications of the identified machine-readable region are considered. On the one hand, Tesseract-OCR is used for word recognition to build an index for a given document image. Compared with applying OCR to the whole image, more words are correctly recognized by applying OCR to the identified region. On the other hand, the identified machine-readable region is used to assess the quality of a document image. Compared with two other image quality assessment methods, the method based on the machine-readable region shows better performance. In addition, MRRI is lightweight and fast enough to meet the requirements of real-time applications.
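Steps one and two of MRRI can be sketched as follows. The sketch assumes a 3×3 box filter as the low-pass filter, non-overlapping 8×8 patches, and a single-window SSIM per patch, none of which the abstract specifies. It relies on the common re-blur intuition: an already-blurred patch changes little when blurred again, so its SSIM to the reference stays close to 1. The abstract's third step minimizes a cost function that is not described. The fixed threshold in `readable_mask` is a hypothetical stand-in, not that cost function.

```python
import numpy as np

def box_blur(img, k=3):
    """Assumed low-pass filter: k x k mean filter (k odd) with edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dr in range(k):
        for dc in range(k):
            out += padded[dr:dr + img.shape[0], dc:dc + img.shape[1]]
    return out / (k * k)

def patch_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equally sized patches."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def ssim_weight_matrix(img, patch=8):
    """Steps 1-2: reference = low-pass(img); weight = SSIM(image patch, reference patch)."""
    img = np.asarray(img, dtype=float)
    ref = box_blur(img)
    rows, cols = img.shape[0] // patch, img.shape[1] // patch
    weights = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            block = (slice(i * patch, (i + 1) * patch), slice(j * patch, (j + 1) * patch))
            weights[i, j] = patch_ssim(img[block], ref[block])
    return weights

def readable_mask(weights, threshold=0.9):
    """Hypothetical stand-in for step 3: low SSIM to the blurred reference = sharp patch."""
    return weights < threshold

if __name__ == "__main__":
    img = np.zeros((16, 32))
    img[:, :16] = np.where((np.arange(16) // 2) % 2 == 0, 255.0, 0.0)  # sharp stripes
    img[:, 16:] = np.linspace(0.0, 255.0, 16)                          # smooth ramp
    w = ssim_weight_matrix(img)
    print(np.round(w, 3))
    print(readable_mask(w))
```

On this toy image, the sharp striped half scores well below the smooth half, which is the signal a cost function over the weight matrix would exploit.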
TL;DR: This study presents an effective and feasible procedure for air quality network optimization at a city scale and demonstrated that the algorithm was slightly sensitive to the parameter settings, with the number of generations having the most significant effect.
Abstract: Air quality monitoring networks play a significant role in identifying the spatiotemporal patterns of air pollution, and they need to be deployed efficiently, with a minimum number of sites. The revision and optimal adjustment of existing monitoring networks is crucial for cities that have undergone rapid urban expansion and experience temporal variations in pollution patterns. An approach based on the Weather Research and Forecasting–California PUFF (WRF-CALPUFF) model and a genetic algorithm (GA) was developed to design an optimal monitoring network. Maximization of coverage with minimum overlap and the ability to detect violations of standards were set as the design objectives for the redistributed networks. The non-dominated sorting genetic algorithm was applied to optimize network size and site locations simultaneously for Shijiazhuang, one of the most polluted cities in China. The assessment of the current network identified insufficient spatial coverage of SO2 and NO2 monitoring for the expanding city. The optimization results showed that significant improvements were achieved in multiple objectives by redistributing the original network. Efficient coverage of the resulting designs improved to 60.99% and 76.06% of the urban area for SO2 and NO2, respectively. A redistributed multi-pollutant design with 8 sites was also proposed; its spatial representation covered 52.30% of the urban area, and the overlapping areas decreased by 85.87% compared with the original network. The ability to detect violations of standards did not improve as much as the other two objectives because of the conflicting nature of the multiple objectives. Additionally, the results demonstrated that the algorithm was slightly sensitive to the parameter settings, with the number of generations having the most significant effect.
Overall, our study presents an effective and feasible procedure for air quality network optimization at a city scale.
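Two of the design objectives named above (coverage and overlap) and the non-dominated filtering at the heart of NSGA can be illustrated with a minimal sketch. It assumes circular site footprints of a fixed radius on a regular grid, with coverage defined as the share of grid points within reach of at least one site and overlap as the share within reach of two or more. The sketch only extracts the first Pareto front from an explicit list of candidate designs. It is not a full NSGA-II (no crowding distance, selection or variation), does not use WRF-CALPUFF output, and omits the violation-detection objective. Site coordinates and the radius are hypothetical.

```python
from itertools import combinations

def coverage_and_overlap(sites, radius, grid):
    """Fraction of grid points covered by >= 1 site and by >= 2 sites."""
    covered = overlapped = 0
    for (x, y) in grid:
        hits = sum((x - sx) ** 2 + (y - sy) ** 2 <= radius ** 2 for sx, sy in sites)
        covered += hits >= 1
        overlapped += hits >= 2
    n = len(grid)
    return covered / n, overlapped / n

def dominates(a, b):
    """a, b = (coverage, overlap): maximize coverage, minimize overlap."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def pareto_front(designs, objectives):
    """Designs whose objective vector no other design dominates (first front)."""
    return [d for d, f in zip(designs, objectives)
            if not any(dominates(g, f) for g in objectives)]

if __name__ == "__main__":
    grid = [(x, y) for x in range(10) for y in range(10)]
    candidates = [(2, 2), (7, 7), (2, 7), (7, 2), (5, 5)]
    designs = list(combinations(candidates, 2))  # every 2-site network
    objs = [coverage_and_overlap(d, 3.0, grid) for d in designs]
    for d in pareto_front(designs, objs):
        print(d, objs[designs.index(d)])
```

Returning a front rather than a single design mirrors the conflict between objectives reported in the abstract: improving one objective can require giving ground on another.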
09 Jan 2019
TL;DR: An enhanced HOG feature extraction method is proposed to improve an optical character recognition system for spam images, making it both robust to character variations in scale and translation and computationally cost-effective.
Abstract: Generally, a spam image is an unsolicited message sent electronically to a wide group of arbitrary addresses. Because they are attractive and more difficult to detect, spam images are the most complicated type of spam. One way to counter spam images is optical character recognition (OCR). In this paper, an enhanced HOG feature extraction method is proposed to improve the OCR system for spam so that it is both robust to character variations in scale and translation and computationally cost-effective. For these purposes, two steps, image cropping and input image size normalization, have been added to the pre-processing stage. A support vector machine (SVM) was employed for classification. Two heuristic modifications were also proposed to increase recognition accuracy: thickening thin characters in the pre-processing stage, and not discriminating between uppercase and lowercase letters with the same shapes in the classification stage. In the first modification, when all pixels of the output image are empty (the character has been eliminated), the original image is thickened by one layer. In the second modification, no distinction is made between uppercase and lowercase letters with the same shapes during recognition. The average recognition accuracy of the modified HOG method with the two heuristic modifications was 91.61% on the Char74K database. An optimal classification threshold was then investigated using the ROC curve. The optimal cutoff point was 0.736, giving the highest average accuracy, 94.20%; the area under the curve (AUC) for the ROC and precision–recall (PR) curves was 0.96 and 0.73, respectively. The proposed method was also examined on the ICDAR2003 database, where the average accuracy and its optimum using the ROC curve were 82.73% and 86.01%, respectively.
These recognition accuracies and AUC values for the ROC and PR curves showed a marked improvement over the best recognition rates of previous methods.
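The abstract picks a cutoff on the ROC curve by the highest average accuracy and reports ROC AUC. Here is a minimal sketch of both computations, assuming binary labels and real-valued classifier scores. The scores could be, for example, SVM outputs mapped to [0, 1], though that mapping is an assumption. The cutoff scan tries every observed score, and AUC uses the rank-statistic (Mann-Whitney) form, which equals the area under the empirical ROC curve with ties counted as one half. The data are toy values, not the paper's results.

```python
def best_accuracy_cutoff(scores, labels):
    """Scan observed scores as thresholds (predict positive if score >= t)."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        preds = [s >= t for s in scores]
        acc = sum(p == bool(y) for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:  # strict '>' keeps the lowest threshold among ties
            best_t, best_acc = t, acc
    return best_t, best_acc

def roc_auc(scores, labels):
    """P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    scores = [0.9, 0.8, 0.75, 0.6, 0.4, 0.3]
    labels = [1, 1, 0, 1, 0, 0]
    print(best_accuracy_cutoff(scores, labels), roc_auc(scores, labels))
```

Choosing the cutoff by accuracy, as the abstract does, is one of several criteria; Youden's J or a cost-weighted criterion would generally pick a different point on the same curve.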
TL;DR: The sensor deployment optimization problem is solved by a practical and feasible polynomial-time algorithm whose solutions come with theoretical guarantees, and the effectiveness of the proposed algorithm is demonstrated by implementation in a real test space in a university building.
Abstract: This paper addresses the problem of efficiently deploying sensors in spatial environments, e.g., buildings, to monitor spatio-temporal environmental phenomena. By modeling the environmental fields as spatio-temporal Gaussian processes, a new and efficient optimality cost function that minimizes prediction uncertainty is proposed for finding the best sensor locations. Although the environmental processes vary spatially and temporally, the proposed approach to choosing sensor positions is proven to be unaffected by time variations, which significantly reduces the computational complexity of the optimization problem. The sensor deployment optimization problem is then solved by a practical and feasible polynomial-time algorithm whose solutions come with theoretical guarantees. The proposed method is also compared theoretically and experimentally with existing work. The effectiveness of the proposed algorithm is demonstrated by implementation in a real test space in a university building, where the results obtained are highly promising.
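The abstract does not specify its polynomial-time algorithm. As a sketch of the general idea only, the code below uses a different, commonly used technique: greedily adding the candidate location that most reduces the total Gaussian-process posterior variance over all candidate locations. It assumes a purely spatial RBF kernel, a fixed length scale and a small noise term. A time-free kernel is in the spirit of the abstract's result that sensor positions are unaffected by time variations, but the kernel itself is an assumption. Greedy variance reduction carries its own guarantees only under conditions not checked here.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between point sets a (n, d) and b (m, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)

def greedy_sensor_placement(candidates, k, length_scale=1.0, noise=1e-3):
    """Greedily pick k candidates minimizing total GP posterior variance."""
    K = rbf_kernel(candidates, candidates, length_scale)
    prior_var = np.diag(K)
    chosen = []
    for _ in range(k):
        best, best_score = None, np.inf
        for i in range(len(candidates)):
            if i in chosen:
                continue
            S = chosen + [i]
            Kss = K[np.ix_(S, S)] + noise * np.eye(len(S))  # sensor covariance + noise
            Kxs = K[:, S]
            # Posterior variance at every candidate: diag(K - Kxs Kss^-1 Ksx).
            post_var = prior_var - np.einsum("ij,ji->i", Kxs, np.linalg.solve(Kss, Kxs.T))
            score = post_var.sum()
            if score < best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen

if __name__ == "__main__":
    # Hypothetical candidate positions along a corridor, 1 m apart.
    cands = np.array([[float(x), 0.0] for x in range(11)])
    print(greedy_sensor_placement(cands, 3, length_scale=2.0))
```

Each greedy step needs one small linear solve per remaining candidate, so the cost is polynomial in the number of candidates and sensors.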
TL;DR: This study provided an integration of hybrid multi-criteria decision-making (MCDM) theories and geographical information system (GIS) processes in order to determine suitable areas to establish air quality monitoring stations within Tehran Province.
Abstract: Air pollution is a major concern in some megacities of Iran. Specific cities in the country have reached an extremely harmful level of air pollution, which poses a serious risk to the daily lives of Iranians. According to news reports, the air quality index of the city of Tehran hovers around 159, which is more than three times the World Health Organization's advised maximum. For the purpose of air pollution abatement, it is necessary to know the air pollution distribution in the area precisely. To obtain this information, the city's air quality monitoring stations, which measure the spatial pollutant distribution, must be properly located. According to various reports, the city must have at least 56 air quality monitoring stations to measure Tehran's air quality properly. However, there are currently only 20 stations within the city. Thus, the main purpose of this study was to identify the most suitable areas for deploying new air quality monitoring stations. This study integrated hybrid multi-criteria decision-making (MCDM) theories and geographical information system (GIS) processes to determine suitable areas to establish air quality monitoring stations. Unlike traditional models, the proposed MCDM method, ANP-OWA, is an efficient decision analysis method that considers dependencies between criteria and defines different scenarios between pessimistic and optimistic conditions for decision makers. This method was applied to several parameters, such as point, area, and line sources; population density; sensitive receptors; distance from current air quality stations; prediction error; and the spatial distribution of CO, NO2, SO2, and PM10 pollutants. The output results identified several suitable locations for air pollution monitoring stations within Tehran Province. The stability and reliability of the output results were evaluated with a robust sensitivity analysis method.
Moreover, the results demonstrated that the proposed method produces stable results. Knowledge of the population density, distance from current air quality stations, and spatial distribution of CO criteria is essential when selecting locations for air quality monitoring stations.
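The pessimistic-to-optimistic scenarios in ANP-OWA come from the ordered weighted averaging (OWA) operator, which weights criterion scores by rank rather than by criterion. Here is a minimal sketch, assuming criterion scores already standardized to [0, 1] for one candidate location. The ANP-derived criterion weights are not modeled here, and the scores are hypothetical. Yager's orness measure shows where a set of order weights sits between fully pessimistic (0, which returns the minimum score) and fully optimistic (1, which returns the maximum).

```python
def owa(values, order_weights):
    """Ordered weighted average: weights apply to values sorted largest first."""
    if len(values) != len(order_weights):
        raise ValueError("need one order weight per value")
    if abs(sum(order_weights) - 1.0) > 1e-9:
        raise ValueError("order weights must sum to 1")
    ranked = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(order_weights, ranked))

def orness(order_weights):
    """Yager's orness: 1 = max (optimistic), 0 = min (pessimistic), 0.5 = mean."""
    n = len(order_weights)
    return sum((n - 1 - i) * w for i, w in enumerate(order_weights)) / (n - 1)

if __name__ == "__main__":
    # Hypothetical standardized scores of one site for three criteria.
    scores = [0.9, 0.4, 0.7]
    for name, w in [("optimistic", [1.0, 0.0, 0.0]),
                    ("neutral", [1 / 3] * 3),
                    ("pessimistic", [0.0, 0.0, 1.0])]:
        print(name, round(owa(scores, w), 3), "orness", round(orness(w), 3))
```

Sliding the order weights between these extremes generates the intermediate decision scenarios; a full ANP-OWA would additionally fold in the ANP criterion weights.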