Author
Hong Zhao
Bio: Hong Zhao is an academic researcher from Nankai University. The author has contributed to research in topics: Cloud storage & Character (mathematics). The author has an hindex of 1, co-authored 1 publications receiving 5 citations.
Papers
More filters
01 Feb 2017
TL;DR: STRHOG, an extended version of HOG that is helpful for filtering spam images on cloud and a fair comparison with other methods, nearest neighbor classifier is used for the intelligent character recognition.
Abstract: Cloud storage has become an important way for data sharing in recent years. Data protection for data owner and harmful data filtering for data recipients are two non-negligible problems in cloud storage. Illegal or unsuitable messages on cloud have a negative impact on minors and they are easily converted into images to avoid text-based filtering. To detect the spam image with the embedded harmful messages on cloud, soft computing methods are required for intelligent character recognition. HOG, proposed by Dalal and Triggs, has been demonstrated so far to be one of the best features for intelligent character recognition. A pre-defined sliding window is always used for the generation of candidate character images when HOG is applied to recognize the whole word. However, due to the difference in character sizes, the pre-defined window cannot exactly match with each character. Variations on scale and translation usually occur in the character image to be recognized, which have a great influence on the performance of intelligent character recognition. Aiming to solve this problem, STRHOG, an extended version of HOG, is proposed in this paper. Experiments on two public datasets and one our dataset have shown encouraging results for our work. The improved intelligent character recognition is helpful for filtering spam images on cloud. To make a fair comparison with other methods, nearest neighbor classifier is used for the intelligent character recognition. It is expected that the performance should be further improved by using better classifiers such as fuzzy neural network.
7 citations
Cited by
More filters
09 Jan 2019
TL;DR: The proposed enhanced HOG feature extraction method has been used so that the optical character recognition system of spam has been enhanced by using the HOGfeature extraction method in such a way to be both resistant against the character variations on scale and translation and to be computationally cost-effective.
Abstract: Generally, a spam image is an unsolicited message electronically sent to a wide group of arbitrary addresses. Due to attractiveness and more difficult detection, spam images are the most complicated type of spam. One of the ways to encounter the spam images is an optical character recognition, OCR, method. In this paper, the proposed enhanced HOG feature extraction method has been used so that the optical character recognition system of spam has been enhanced by using the HOG feature extraction method in such a way to be both resistant against the character variations on scale and translation and to be computationally cost-effective. For these purposes, two steps of the cropped image and input image size normalization have been added to pre-processing stages. Support vector machine, SVM, was employed for classification. Two heuristic modifications including thickening of the thin characters in the pre-processing stage and non-discrimination in detecting the uppercase and lowercase letters with the same shapes in the classification stage have been also proposed to increase the system recognition accuracy. In the first heuristic modification, when all pixels of the output image are empty (the character is eliminated), the original image was made thicker by one layer. In the second modification, when recognizing the letters, no differentiation was considered between the uppercase and lowercase letters with the same shapes. An average recognition accuracy of the modified HOG method with two heuristic modifications equals 91.61% on Char74K database. Then, an optimum threshold for classification was investigated by ROC curve. The optimal cutoff point was 0.736 with the highest average accuracy, 94.20%, and AUC, area under curve, for ROC and precision–recall, PR, curves were 0.96 and 0.73, respectively. The proposed method was also examined on ICDAR2003 database, and the average accuracy and its optimum using ROC curve were 82.73% and 86.01%, respectively. These results of recognition accuracy and AUC for ROC and PR curve showed an outstanding enhancement in comparison with the best recognition rate of the previous methods.
27 citations
TL;DR: A reliability analysis method of computer network based on intelligent cloud computing method is proposed that is integrated with the theoretical idea of analytic hierarchy process and shows that it is highly accurate and robust.
Abstract: In the process of reliability analysis of computer network, there is the high probability of network failure in the application process by using the current analysis method. In this paper, a reliab...
13 citations
TL;DR: A novel technique is proposed for automated OCR based on multi-properties features fusion and selection that will help for license plate recognition and text conversion from a printed document to machine-readable.
Abstract: PurposeIn artificial intelligence, the optical character recognition (OCR) is an active research area based on famous applications such as automation and transformation of printed documents into machine-readable text document. The major purpose of OCR in academia and banks is to achieve a significant performance to save storage space.Design/methodology/approachA novel technique is proposed for automated OCR based on multi-properties features fusion and selection. The features are fused using serially formulation and output passed to partial least square (PLS) based selection method. The selection is done based on the entropy fitness function. The final features are classified by an ensemble classifier.FindingsThe presented method was extensively tested on two datasets such as the authors proposed and Chars74k benchmark and achieved an accuracy of 91.2 and 99.9%. Comparing the results with existing techniques, it is found that the proposed method gives improved performance.Originality/valueThe technique presented in this work will help for license plate recognition and text conversion from a printed document to machine-readable.
13 citations
TL;DR: A fully-distributed pattern recognition system within P2P networks using the distributed associative memory tree (DASMET) algorithm to detect spam which is cost-efficient and not prone to a single point of failure, unlike the server-based systems.
Abstract: Spam appears in various forms and the current trend in spamming is moving towards multimedia spam objects. Image spam is a new type of spam attacks which attempts to bypass the spam filters that mostly text-based. Spamming attacks the users in many ways and these are usually countered by having a server to filter the spammers. This paper provides a fully-distributed pattern recognition system within P2P networks using the distributed associative memory tree (DASMET) algorithm to detect spam which is cost-efficient and not prone to a single point of failure, unlike the server-based systems. This algorithm is scalable for large and frequently updated data sets, and specifically designed for data sets that consist of similar occurring patterns.We have evaluated our system against centralised state-of-the-art algorithms (NN, k-NN, naive Bayes, BPNN and RBFN) and distributed P2P-based algorithms (Ivote-DPV, ensemble k-NN, ensemble naive Bayes, and P2P-GN). The experimental results show that our method is highly accurate with a 98 to 99% accuracy rate, and incurs a small number of messages—in the best-case, it requires only two messages per recall test. In summary, our experimental results show that the DAS-MET performs best with a relatively small amount of resources for the spam detection compared to other distributed methods.
9 citations
8 citations