Bio: Jing Zheng is an academic researcher at Jinan University. The author has contributed to research on the topics of cluster analysis and the A* search algorithm, has an h-index of 2, and has co-authored 3 publications receiving 16 citations.
TL;DR: A feature-matching algorithm based on character recognition, built by establishing a database of letters, is presented; it reconstructs the shredded document by row clustering, intrarow splicing, and interrow splicing.
Abstract: The reconstruction of destroyed paper documents has attracted growing interest in recent years. The topic is relevant to forensics, the investigative sciences, and archeology. Previous research on the reconstruction of cross-cut shredded text documents (RCCSTD) has mainly been based on likelihood measures and traditional heuristic algorithms. In this paper, a feature-matching algorithm based on character recognition, built by establishing a database of letters, is presented; it reconstructs the shredded document by row clustering, intrarow splicing, and interrow splicing. Row clustering is executed by a clustering algorithm operating on the clustering vectors of the fragments. Intrarow splicing, regarded as a travelling salesman problem, is solved by an improved genetic algorithm. Finally, the document is reconstructed by interrow splicing according to the line spacing and the proximity of the fragments. Computational experiments suggest that the presented algorithm achieves high precision and efficiency and that it may be useful for cross-cut shredded text documents of different sizes.
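The intrarow-splicing step above treats fragment ordering within a row as a travelling salesman problem solved by a genetic algorithm. The abstract gives no implementation details, so the following is a minimal sketch under assumed conventions: `cost[i][j]` is a hypothetical pairwise mismatch cost for placing fragment `j` immediately to the right of fragment `i`, and the GA uses order crossover with swap mutation.

```python
import random

def splice_cost(order, cost):
    # Total mismatch cost of placing fragments left-to-right in this order.
    return sum(cost[order[i]][order[i + 1]] for i in range(len(order) - 1))

def order_crossover(p1, p2):
    # OX crossover: copy a slice from p1, fill the rest in p2's order.
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    fill = [g for g in p2 if g not in child]
    for i in range(n):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def ga_splice(cost, pop_size=40, generations=300, mut_rate=0.2):
    # Evolve permutations of fragment indices; elitism keeps the better half.
    n = len(cost)
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda o: splice_cost(o, cost))
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            c = order_crossover(*random.sample(elite, 2))
            if random.random() < mut_rate:          # occasional swap mutation
                i, j = random.sample(range(n), 2)
                c[i], c[j] = c[j], c[i]
            children.append(c)
        pop = elite + children
    return min(pop, key=lambda o: splice_cost(o, cost))
```

In the paper's setting the cost matrix would come from the character-database feature matching; here it is left abstract.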
12 Nov 2014
TL;DR: A fragment restoring method based on the genetic algorithm and character identification technology is proposed. Unlike existing methods, it is suitable for the longitudinally and transversely cut fragments produced by various paper shredders and, to a certain degree, guarantees efficiency and accuracy at the same time.
Abstract: The invention discloses a fragment restoring method based on the genetic algorithm and character identification technology. The method comprises the following steps: S1, an English character database is built, and a grayscale matrix of each character is obtained through the character acquisition technology; S2, the characters are identified, and it is judged whether the characters to be identified are cut; S3, the fragments are clustered into lines, where line clustering is based on the character identification technology and baseline distance information and is completed through clustering vectors, clustering centers, and clustering distances; S4, after line clustering, the fragments in each line are spliced through the in-line splicing technology; S5, the lines are spliced through the inter-line splicing technology. The fragment restoring method can be widely applied to the restoration of longitudinally and transversely cut fragments produced by various paper shredders, guarantees efficiency and accuracy to a certain degree at the same time, and provides support for judicial evidence collection, historical document restoration, and military intelligence gathering.
24 Jul 2016
TL;DR: Experiments on the DIMACS benchmark show that this algorithm can not only handle general graphs but also find optimal solutions for the flat series, which most evolutionary heuristic algorithms cannot solve well.
Abstract: Graph coloring is one of the most significant problems in combinatorial optimization. Building on the traditional evolutionary heuristic algorithm, this paper presents a memetic algorithm with partial solutions (MAP) for solving it. The algorithm combines a specialized crossover operator based on independent sets with a tabu search procedure that provides strong local search capability, and it improves both the score function for individuals and the fitness function used in the updating process. Experiments on the DIMACS benchmark show that the algorithm not only handles general graphs but also finds optimal solutions for the flat series, which most evolutionary heuristic algorithms cannot solve well. This demonstrates that MAP has good stability.
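The paper's exact MAP operators are not described in the abstract; the sketch below illustrates only the tabu-search component it builds on, in the style of the classic TabuCol scheme: recolor a conflicting vertex with its best non-tabu color and forbid reverting that vertex to its old color for a few iterations. All names and parameters here are illustrative assumptions.

```python
import random

def conflicts(coloring, edges):
    # Number of edges whose endpoints currently share a color.
    return sum(coloring[u] == coloring[v] for u, v in edges)

def tabu_color(edges, n, k, iters=5000, tenure=5, seed=1):
    # Try to k-color an n-vertex graph by tabu-guided local search.
    rng = random.Random(seed)
    col = [rng.randrange(k) for _ in range(n)]
    tabu = {}                      # (vertex, color) -> step until which move is tabu
    best, best_c = list(col), conflicts(col, edges)
    for step in range(iters):
        if best_c == 0:
            return best
        conflicted = {u for u, v in edges if col[u] == col[v]} | \
                     {v for u, v in edges if col[u] == col[v]}
        u = rng.choice(sorted(conflicted))
        old = col[u]

        def score(c):
            # Conflicts that would result from recoloring u with c.
            col[u] = c
            s = conflicts(col, edges)
            col[u] = old
            return s

        allowed = [c for c in range(k) if c != old and tabu.get((u, c), -1) < step]
        if not allowed:            # everything tabu: fall back to any other color
            allowed = [c for c in range(k) if c != old]
        col[u] = min(allowed, key=score)
        tabu[(u, old)] = step + tenure   # forbid reverting u to its old color
        if conflicts(col, edges) < best_c:
            best, best_c = list(col), conflicts(col, edges)
    return best
```

The full memetic algorithm would layer the independent-set crossover and population management on top of this local search.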
TL;DR: The present study provides a wide-ranging survey of the nature-inspired meta-heuristic methods used for graph coloring, with emphasis on the optimization algorithms for handling GCP instances.
Abstract: The Graph Coloring Problem (GCP) is one of the key problems in graph labeling in graph theory. The general approach is to paint the edges, vertices, or faces of a graph with as few colors as possible. In the simplest case, a coloring is sought in which no two adjacent vertices receive the same color; similarly, two edges sharing an endpoint should not have the same color, and the same applies to coloring the faces of the graph. This is a well-studied NP-hard problem, so many different meta-heuristic techniques have been proposed to solve it with high performance. Despite the importance of nature-inspired meta-heuristic methods for solving the GCP, there has been no inclusive report or detailed review surveying and investigating the crucial problems of the field. The present study therefore provides a wide-ranging survey of the nature-inspired meta-heuristic methods used for graph coloring. The literature review contains a classification of significant techniques, with the main emphasis on optimization algorithms for handling GCP instances. Furthermore, the advantages, disadvantages, and key issues of these meta-heuristic algorithms are examined in order to inform more advanced meta-heuristic techniques in the future.
01 Jan 2019
TL;DR: This chapter reviews applications of Memetic Algorithms in business analytics and data science, with emphasis on the large number of applications in business and consumer analytics published between January 2014 and May 2018.
Abstract: This chapter reviews applications of Memetic Algorithms in the areas of business analytics and data science. The approach originates from the need to address optimization problems that involve combinatorial search processes, some of which arose in operations research, management science, artificial intelligence, and machine learning. The methodology has developed considerably since its beginnings and is now applied to a large number of problem domains. This work gives a historical timeline of events to explain the current developments and, as a survey, emphasizes the large number of applications in business and consumer analytics published between January 2014 and May 2018.
TL;DR: A new clustering algorithm based on horizontal projection and a constrained seed K-means algorithm is proposed to improve clustering accuracy in the reconstruction of cross-cut shredded text documents (RCCSTD), and it is found to offer significantly improved performance.
Abstract: The reconstruction of cross-cut shredded text documents (RCCSTD) is an important problem in forensics and is a real, complex, and notable issue for information security and judicial investigations. It can be considered a special kind of greedy square jigsaw puzzle and has attracted the attention of many researchers. Clustering fragments into rows is a crucial and difficult step in RCCSTD, yet existing approaches achieve low clustering accuracy. This paper therefore proposes a new clustering algorithm based on horizontal projection and a constrained seed K-means algorithm to improve clustering accuracy. The constrained seed K-means algorithm draws on expert knowledge and has the following characteristics: 1) the first fragment in each row is easy to distinguish, and the unidimensional signals extracted from the first fragments can be used as the initial clustering centers; 2) two or more prior fragments cannot be clustered together. To improve the splicing accuracy within rows, a penalty coefficient is added to a traditional cost function. Experiments were carried out on 10 text documents. By our measurements, the clustering accuracy was 99.1% and the overall splicing accuracy was 91.0%. Compared with two other approaches, the algorithm offered significantly improved clustering accuracy and obtained the best results on the RCCSTD problem in our experiments. Moreover, we attempted a more complex and realistic problem, the reconstruction of cross-cut shredded dual text documents (RCCSDTD), and obtained satisfactory results in some cases; to the authors' best knowledge, our method is the first feasible approach for the RCCSDTD problem. The developed system is fundamentally an expert system applied specifically to solving RCCSTD problems.
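The constrained seed K-means idea described above can be sketched as follows. This is a hypothetical reading of the abstract, not the authors' code: the known first fragment of each row seeds an initial cluster center, and cannot-link pairs (fragments known to belong to different rows) are never assigned to the same cluster.

```python
import numpy as np

def seeded_kmeans(signals, seed_idx, cannot_link, iters=10):
    """Cluster 1-D fragment signals into rows.

    signals: (n, d) array of per-fragment horizontal-projection profiles.
    seed_idx: indices of the known first fragment of each row; their
              signals become the initial cluster centers.
    cannot_link: (i, j) pairs that must not share a cluster.
    """
    X = np.asarray(signals, float)
    centers = X[list(seed_idx)].copy()
    k = len(seed_idx)
    labels = np.full(len(X), -1)        # -1 = not yet assigned
    for _ in range(iters):
        for i in range(len(X)):
            # Centers sorted by squared distance to this fragment.
            order = np.argsort(((centers - X[i]) ** 2).sum(axis=1))
            for c in order:             # nearest center that breaks no constraint
                partners = [b if a == i else a
                            for a, b in cannot_link if i in (a, b)]
                if all(labels[j] != c for j in partners):
                    labels[i] = c
                    break
            else:
                labels[i] = order[0]    # fall back to the nearest center
        for c in range(k):              # recompute centers from members
            members = X[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels
```

The real system would add the penalty-augmented cost function for intrarow splicing on top of this clustering step.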
TL;DR: Experiments conducted with real mechanically shredded documents showed that the system proposed here outperformed in accuracy other popular techniques in the literature considering documents with (almost) only textual content.
Abstract: The digital reconstruction of mechanically shredded documents has received increasing attention in recent years, mainly for historical and forensic needs. Computational methods for this problem are highly desirable in order to mitigate the time-consuming human effort and to preserve document integrity. The reconstruction of strip-shredded documents is accomplished by horizontally splicing pieces so that the resulting sequence (solution) is as similar as possible to the original document. A central issue in this context is quantifying the fit between the pieces (strips), which generally involves defining a function that maps a pair of strips to a real value indicating fitting quality. The problem is even more challenging for text documents, such as business letters or legal documents, since they carry little color information. The system proposed here addresses this issue by exploiting character shapes as visual features for compatibility computation. Experiments conducted with real mechanically shredded documents showed that our approach outperformed other popular techniques from the literature in accuracy on documents with (almost) only textual content.
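The abstract does not specify the compatibility function, so the following is only a schematic stand-in: it compares the touching pixel columns of two binary strips, on the assumption that character strokes should continue across a correct cut. The real system uses richer character-shape features.

```python
import numpy as np

def edge_compatibility(left_strip, right_strip):
    """Score how well right_strip continues left_strip.

    Strips are 2-D binary arrays (1 = ink). We compare the last column
    of the left strip with the first column of the right strip:
    continuing character strokes should line up row by row.
    """
    a = np.asarray(left_strip)[:, -1]
    b = np.asarray(right_strip)[:, 0]
    return np.sum(a == b) / len(a)      # fraction of agreeing boundary rows

def best_match(strip, candidates):
    # Index of the candidate whose left edge best continues this strip.
    scores = [edge_compatibility(strip, c) for c in candidates]
    return int(np.argmax(scores))
```

A reconstruction loop would then greedily (or via global optimization) chain strips by repeatedly picking the best-matching right neighbor.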
01 Oct 2018
TL;DR: A deep learning-based compatibility score for the reconstruction of strip-shredded text documents is proposed, trained via digitally simulated shredding of documents from a well-known OCR database; the resulting system achieved an average accuracy of 94.58% in reconstructing mechanically shredded documents.
Abstract: The use of paper-shredder machines (mechanical shredding) to destroy documents can be illicitly motivated when the purpose is hiding evidence of fraud and other crimes. Reconstructing such documents is therefore of great value for forensic investigation, but it is admittedly a stressful and time-consuming task for humans. To address this challenge, several computational techniques have been proposed in the literature, particularly for documents with text-based content. In this context, a critical challenge for automated reconstruction is properly measuring the fit (compatibility) between paper shreds (strips), which has been observed to be the main limitation of the literature on this topic. The main contribution of this paper is a deep learning-based compatibility score to be applied in the reconstruction of strip-shredded text documents. Since real shredded data are not abundant, we propose a training scheme based on digitally simulated shredding of documents from a well-known OCR database. The proposed score was coupled to a black-box optimization tool, and the resulting system achieved an average accuracy of 94.58% in the reconstruction of mechanically shredded documents.
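The simulated-shredding training scheme can be illustrated with a toy pair generator. The slicing scheme and labels below are assumptions for illustration, not the paper's actual pipeline: truly adjacent strips form positive pairs (label 1) and randomly chosen non-adjacent strips form negative pairs (label 0), which a compatibility network could then be trained on.

```python
import random
import numpy as np

def simulate_strips(page, n_strips):
    """Cut a page image (2-D array) into vertical strips of equal width."""
    w = page.shape[1] // n_strips
    return [page[:, i * w:(i + 1) * w] for i in range(n_strips)]

def make_training_pairs(page, n_strips, rng):
    """Build (left, right, label) training examples from one page.

    Positive pairs are truly adjacent strip edges (label 1); negative
    pairs use a random non-adjacent strip as the right side (label 0).
    """
    strips = simulate_strips(page, n_strips)
    pairs = []
    for i in range(n_strips - 1):
        pairs.append((strips[i], strips[i + 1], 1))
        j = rng.choice([j for j in range(n_strips) if j not in (i, i + 1)])
        pairs.append((strips[i], strips[j], 0))
    return pairs
```

In the paper's setting the pages would come from an OCR database and the pairs would feed a deep compatibility model; here the page is an arbitrary array.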