Author

Jun-Yi Shen

Bio: Jun-Yi Shen is an academic researcher from Xi'an Jiaotong University. The author has contributed to research in topics: Cluster analysis & Canopy clustering algorithm. The author has an h-index of 10 and has co-authored 40 publications receiving 300 citations.

Papers
Book ChapterDOI
26 May 2004
TL;DR: The Semantic Sequence Kin (SSK) is tested and it is shown that SSK is excellent for detecting non-rewording plagiarism and valid even if documents are reworded to some extent.
Abstract: The string matching and global word frequency model are two basic models of Document Copy Detection, although both are unsatisfactory in some respects. The String Kernel (SK) and Word Sequence Kernel (WSK) can map string pairs directly into a new feature space in which the data is linearly separable. This idea inspired the Semantic Sequence Kin (SSK), which we apply to document copy detection. SK and WSK take into account only the gap between the first word/term and the last word/term, so they are not well suited to plagiarism detection. SSK considers each common word’s position information so as to detect plagiarism at a fine granularity. SSK is based on semantic density, which is in effect local word frequency information. We believe these measures greatly diminish the noise of rewording. We test SSK on a small corpus with several common copy types. The results show that SSK is excellent for detecting non-rewording plagiarism and remains valid even if documents are reworded to some extent.

29 citations

Journal ArticleDOI
01 Nov 2009
TL;DR: A method for software physical time stage-effort prediction based on grey models GM(1,1) and Verhulst, which can adapt to particular development methodologies automatically by using a novel grey feedback mechanism, and indicates that the method can be effective and has considerable potential.
Abstract: Software effort prediction clearly plays a crucial role in software project management. In keeping with more dynamic approaches to software development, it is not sufficient to only predict the whole-project effort at an early stage. Rather, the project manager must also dynamically predict the effort of different stages or activities during the software development process. This can assist the project manager to reestimate effort and adjust the project plan, thus avoiding effort or schedule overruns. This paper presents a method for software physical time stage-effort prediction based on grey models GM(1,1) and Verhulst. This method establishes models dynamically according to particular types of stage-effort sequences, and can adapt to particular development methodologies automatically by using a novel grey feedback mechanism. We evaluate the proposed method with a large-scale real-world software engineering dataset, and compare it with the linear regression method and the Kalman filter method, revealing that accuracy has been improved by at least 28% and 50%, respectively. The results indicate that the method can be effective and has considerable potential. We believe that stage predictions could be a useful complement to whole-project effort prediction methods.

26 citations
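
The grey model GM(1,1) named in the abstract above can be illustrated with a short sketch. This is the standard textbook form of GM(1,1) only, not the paper's dynamic stage-effort method or its grey feedback mechanism. The function names `gm11_fit` and `gm11_forecast` and the sample effort values are hypothetical, chosen for illustration.

```python
import math


def gm11_fit(x0):
    """Estimate the GM(1,1) parameters (a, b) from a positive sequence x0.

    The model is x0[k] + a * z1[k] = b, where z1 is the mean of consecutive
    values of the accumulated sequence x1 (the "background value").
    """
    if len(x0) < 3:
        raise ValueError("GM(1,1) needs at least three observations")
    # Accumulated generating operation (running sum).
    x1 = []
    total = 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values and the matching targets.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = list(x0[1:])
    n = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    # Ordinary least squares for y = slope * z + b, where slope = -a.
    slope = (n * szy - sz * sy) / (n * szz - sz * sz)
    b = (sy - slope * sz) / n
    return -slope, b


def gm11_forecast(x0, a, b, k):
    """Return the model value of the original series at index k (0-based)."""
    if k == 0:
        return float(x0[0])
    if a == 0:
        raise ValueError("a == 0: the exponential response is undefined")

    def x1_hat(i):
        # Time response of the whitened equation dx1/dt + a * x1 = b.
        return (x0[0] - b / a) * math.exp(-a * i) + b / a

    # Undo the accumulation to get back to the original series.
    return x1_hat(k) - x1_hat(k - 1)


# Hypothetical stage-effort values growing by about 20% per stage.
efforts = [10.0, 12.0, 14.4, 17.28]
a, b = gm11_fit(efforts)
next_effort = gm11_forecast(efforts, a, b, len(efforts))
```

For a growing series the development coefficient `a` comes out negative, and `next_effort` is the one-step-ahead forecast for the stage after the last observed one.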

Proceedings ArticleDOI
04 Nov 2002
TL;DR: This paper proposes a network intrusion detection model based on artificial immune theory and shows that the method can greatly shrink the scale of each generation and create a good niche for evolving patterns.
Abstract: A network intrusion detection model based on artificial immune theory is proposed in this paper. In this model, self patterns and non-self patterns are built upon frequent behavior sequences, and a simple but efficient algorithm for encoding patterns is proposed. Based on the result of encoding, another algorithm for creating detectors is presented, which integrates negative selection with clonal selection. An analysis of the algorithm's performance shows that this method can greatly shrink the scale of each generation and create a good niche for evolving patterns.

25 citations
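
The negative selection step mentioned in the abstract above can be sketched as follows. This is a generic illustration and not the paper's algorithm: the bit-string encoding, the r-contiguous matching rule, and the names `matches` and `generate_detectors` are all assumptions, and the clonal selection stage is omitted.

```python
import random


def matches(detector, pattern, r):
    """Return True if the two strings agree on at least r contiguous positions."""
    run = 0
    for d, p in zip(detector, pattern):
        run = run + 1 if d == p else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, length, r, count, seed=0, max_tries=100000):
    """Draw random candidates and keep only those that match no self pattern."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    detectors = []
    tries = 0
    while len(detectors) < count and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice("01") for _ in range(length))
        # Negative selection: discard any candidate that recognizes "self".
        if not any(matches(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors


# Hypothetical encoded self patterns (normal behavior).
self_patterns = ["00000000", "11110000"]
detectors = generate_detectors(self_patterns, length=8, r=4, count=5)
```

A new pattern would then be flagged as non-self when any surviving detector matches it under the same rule.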

Proceedings ArticleDOI
02 Nov 2003
TL;DR: This paper presents a new text feature extraction model, the semantic sequence model (SSM), which is based on the concepts of word distance, word density and semantic sequence and achieves excellent accuracy in text copy detection.
Abstract: Text feature extraction is a common issue in information retrieval, text mining, Web mining, text classification/clustering, document copy detection, etc. The most popular approach is the word frequency based scheme, which uses a word frequency vector to represent a document. The cosine function, dot product and proportion function are the usual similarity measures between vectors. But such a vector is only a global semantic feature of a document and loses local features and structural information, which prevents us from distinguishing texts well, especially in copy detection. In this paper we present a new text feature extraction model: the semantic sequence model (SSM), which is based on the concepts of word distance, word density and semantic sequence. The semantic sequences of a document contain not only local semantic features but also global features and structural information, with which we obtain excellent accuracy in text copy detection. At the end of the paper, we contrast SSM with VSM and RFM, and the experimental results show that SSM is a superior model.

23 citations
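
The word frequency baseline described in the abstract above can be illustrated with a small sketch. It shows only the cosine measure over word frequency vectors, not SSM itself. The whitespace tokenizer and the function names `word_frequencies` and `cosine_similarity` are assumptions made for this example.

```python
import math
from collections import Counter


def word_frequencies(text):
    """Build a word frequency vector using naive lowercase whitespace splitting."""
    return Counter(text.lower().split())


def cosine_similarity(freq_a, freq_b):
    """Cosine of the angle between two word frequency vectors."""
    dot = sum(count * freq_b[word] for word, count in freq_a.items())
    norm_sq_a = sum(c * c for c in freq_a.values())
    norm_sq_b = sum(c * c for c in freq_b.values())
    if norm_sq_a == 0 or norm_sq_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / math.sqrt(norm_sq_a * norm_sq_b)


# Two sentences with the same words in a different order.
doc_a = word_frequencies("dog bites man")
doc_b = word_frequencies("man bites dog")
score = cosine_similarity(doc_a, doc_b)
```

Here `score` is 1.0 even though the two sentences mean different things, which is the loss of local and structural information that the abstract attributes to global word frequency vectors.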

Proceedings ArticleDOI
04 Nov 2002
TL;DR: The experimental results show that the decision tree method is promising for video genre recognition.
Abstract: Automatic digital video classification is emerging as an important problem in the fields of video analysis and multimedia databases. In this paper the decision tree method is used for automatic video classification. Video clips are first segmented into shots, then features of the video clips are generated, and a video features database is created at the same time. After that, the decision tree method is used to classify the videos into different genres and a set of decision rules is produced. The discovered decision rules are used to predict the genre of a new video. The experimental results show that the decision tree method is promising for video genre recognition.

23 citations


Cited by
01 Jan 2002

9,314 citations

Journal ArticleDOI
28 Feb 2001-JAMA

1,258 citations

01 Jan 1981
TL;DR: In this article, the authors provide an overview of economic analysis techniques and their applicability to software engineering and management, including the major estimation techniques available, the state of the art in algorithmic cost models, and the outstanding research issues in software cost estimation.
Abstract: This paper summarizes the current state of the art and recent trends in software engineering economics. It provides an overview of economic analysis techniques and their applicability to software engineering and management. It surveys the field of software cost estimation, including the major estimation techniques available, the state of the art in algorithmic cost models, and the outstanding research issues in software cost estimation.

283 citations

Journal ArticleDOI
01 Mar 2012
TL;DR: A new taxonomy of plagiarism is presented that highlights differences between literal plagiarism and intelligent plagiarism, from the plagiarist's behavioral point of view, and supports deep understanding of different linguistic patterns in committing plagiarism.
Abstract: Plagiarism can take many different forms, ranging from copying texts to adopting ideas, without giving credit to the originator. This paper presents a new taxonomy of plagiarism that highlights differences between literal plagiarism and intelligent plagiarism, from the plagiarist's behavioral point of view. The taxonomy supports deep understanding of different linguistic patterns in committing plagiarism, for example, changing texts into semantically equivalent ones with different words and organization, shortening texts with concept generalization and specification, and adopting ideas and important contributions of others. Different textual features that characterize different plagiarism types are discussed. Systematic frameworks and methods of monolingual, extrinsic, intrinsic, and cross-lingual plagiarism detection are surveyed and correlated with the plagiarism types listed in the taxonomy. We conduct an extensive study of state-of-the-art techniques for plagiarism detection, including character n-gram-based (CNG), vector-based (VEC), syntax-based (SYN), semantic-based (SEM), fuzzy-based (FUZZY), structural-based (STRUC), stylometric-based (STYLE), and cross-lingual techniques (CROSS). Our study corroborates that existing systems for plagiarism detection focus on copying text but fail to detect intelligent plagiarism when ideas are presented in different words.

275 citations