Author

Jun-Yi Shen

Bio: Jun-Yi Shen is an academic researcher from Xi'an Jiaotong University. The author has contributed to research on the topics of cluster analysis and the canopy clustering algorithm. The author has an h-index of 10 and has co-authored 40 publications that have received 300 citations.

Papers

Book Chapter•DOI•

Semantic Sequence Kin: A Method of Document Copy Detection

Jun-Peng Bao¹, Jun-Yi Shen¹, Xiao-Dong Liu¹, Hai-Yan Liu¹, Xiao-Di Zhang¹ • Institutions (1)

Xi'an Jiaotong University¹

26 May 2004

TL;DR: The Semantic Sequence Kin (SSK) is tested and it is shown that SSK is excellent for detecting non-rewording plagiarism and valid even if documents are reworded to some extent.

Abstract: The string matching and global word frequency model are two basic models of Document Copy Detection, although they are both unsatisfied in some respects. The String Kernel (SK) and Word Sequence Kernel (WSK) may map string pairs into a new feature space directly, in which the data is linearly separable. This idea inspires us with the Semantic Sequence Kin (SSK) and we apply it to document copy detection. SK and WSK only take into account the gap between the first word/term and the last word/term so that it is not good for plagiarism detection. SSK considers each common word’s position information so as to detect plagiarism in a fine granularity. SSK is based on semantic density that is indeed the local word frequency information. We believe these measures diminish the noise of rewording greatly. We test SSK in a small corpus with several common copy types. The result shows that SSK is excellent for detecting non-rewording plagiarism and valid even if documents are reworded to some extent.

29 citations

Journal Article•DOI•

Integrate the GM(1,1) and Verhulst Models to Predict Software Stage Effort

Yong Wang¹, Qinbao Song¹, Stephen G. MacDonell², Martin Shepperd³, Jun-Yi Shen¹ • Institutions (3)

Xi'an Jiaotong University¹, Auckland University of Technology², Brunel University London³

01 Nov 2009

TL;DR: This paper presents a method for software physical time stage-effort prediction based on the grey models GM(1,1) and Verhulst. The method adapts automatically to particular development methodologies through a novel grey feedback mechanism, and the results indicate that it can be effective and has considerable potential.

Abstract: Software effort prediction clearly plays a crucial role in software project management. In keeping with more dynamic approaches to software development, it is not sufficient to only predict the whole-project effort at an early stage. Rather, the project manager must also dynamically predict the effort of different stages or activities during the software development process. This can assist the project manager to reestimate effort and adjust the project plan, thus avoiding effort or schedule overruns. This paper presents a method for software physical time stage-effort prediction based on grey models GM(1,1) and Verhulst. This method establishes models dynamically according to particular types of stage-effort sequences, and can adapt to particular development methodologies automatically by using a novel grey feedback mechanism. We evaluate the proposed method with a large-scale real-world software engineering dataset, and compare it with the linear regression method and the Kalman filter method, revealing that accuracy has been improved by at least 28% and 50%, respectively. The results indicate that the method can be effective and has considerable potential. We believe that stage predictions could be a useful complement to whole-project effort prediction methods.
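
For readers unfamiliar with grey models, the following is a minimal sketch of the standard textbook GM(1,1) forecast only. It is not the paper's method: the Verhulst model and the grey feedback mechanism are not reproduced here. The function name `gm11_forecast`, the four-point minimum, and the sample series are assumptions made for illustration.

```python
import math


def gm11_forecast(series, steps=1):
    """Fit a textbook GM(1,1) grey model and forecast `steps` future values.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(series)
    if n < 4:
        # Assumed minimum; GM(1,1) is commonly fitted on at least four points.
        raise ValueError("need at least 4 observations")

    # Accumulated generating operation (1-AGO): running sums of the series.
    x1 = []
    total = 0.0
    for value in series:
        total += value
        x1.append(total)

    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k - 1] + x1[k]) for k in range(1, n)]
    y = list(series[1:])

    # Least-squares fit of the grey equation x0(k) + a * z(k) = b,
    # i.e. a simple linear regression of y on z with slope -a and intercept b.
    m = len(z)
    sum_z, sum_y = sum(z), sum(y)
    sum_zz = sum(v * v for v in z)
    sum_zy = sum(u * v for u, v in zip(z, y))
    slope = (m * sum_zy - sum_z * sum_y) / (m * sum_zz - sum_z * sum_z)
    intercept = (sum_y - slope * sum_z) / m
    a, b = -slope, intercept
    if abs(a) < 1e-12:
        raise ValueError("development coefficient is ~0; series is not exponential-like")

    def x1_hat(k):
        # Time response of the accumulated series (k is a 0-based index).
        return (series[0] - b / a) * math.exp(-a * k) + b / a

    # Inverse AGO: difference consecutive accumulated predictions.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


# Illustrative stage-effort values (made up), growing by roughly 10% per stage.
print(gm11_forecast([100.0, 110.0, 121.0, 133.1], steps=2))
```

The closed-form regression avoids any dependency beyond the standard library; a matrix least-squares solve would give the same coefficients.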

26 citations

Proceedings Article•DOI•

Artificial immune theory based network intrusion detection system and the algorithms design

Xiang-Rong Yang¹, Jun-Yi Shen¹, Rui Wang¹•Institutions (1)

Xi'an Jiaotong University¹

04 Nov 2002

TL;DR: This paper proposes a network intrusion detection model based on artificial immune theory; performance analysis shows that the method can greatly shrink the scale of each generation and create a good niche for evolving patterns.

Abstract: A network intrusion detection model based on artificial immune theory is proposed in this paper. In this model, self patterns and non-self patterns are built upon frequent behaviors sequences, then a simple but efficient algorithm for encoding patterns is proposed. Based on the result of encoding, another algorithm for creating detectors is presented, which integrates a negative selection with the clonal selection. The algorithm performance is analyzed, which shows that this method can shrink each generation scale greatly and create a good niche for patterns evolving.

25 citations

Proceedings Article•DOI•

A new text feature extraction model and its application in document copy detection

Jun-Peng Bao¹, Jun-Yi Shen¹, Xiao-Dong Liu¹, Qinbao Song¹•Institutions (1)

Xi'an Jiaotong University¹

02 Nov 2003

TL;DR: This paper presents a new text feature extraction model, the semantic sequence model (SSM), which is based on the concepts of word distance, word density and semantic sequence and achieves excellent accuracy in text copy detection.

Abstract: Text feature extraction is a common issue in information retrieval, text mining, Web mining, text classification/clustering and document copy etc. The most popular approach is word frequency based scheme, which uses a word frequency vector to represent a document. Cosine function, dot product and proportion function are regular similarity measures of vector. But that is only global semantic feature of a document and loses local feature and structural information so that it prevents us to distinguish text well, especially in copy detection. In this paper we present a new text feature extraction model: semantic sequence model (SSM) that based on the concepts of word distance, word density and semantic sequence. The semantic sequences of a document contain not only local semantic features but also global feature and structural information, on which we get excellent accuracy of text copy detection. At the end of the paper, we contrast SSM with VSM and RFM and the experimental results show SSM is a superior model.
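
The word-frequency baseline that the abstract criticizes can be sketched as follows. This is an illustration of the cosine measure over word-frequency vectors only, not of SSM itself; the function names and the whitespace tokenizer are assumptions. The usage lines show the loss of structural information the abstract refers to: two sentences with the same words in a different order are indistinguishable under this measure.

```python
import math
from collections import Counter


def word_frequency_vector(text):
    """Map a document to its global word-frequency vector (naive whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two word-frequency vectors."""
    vec_a = word_frequency_vector(text_a)
    vec_b = word_frequency_vector(text_b)
    # Only words present in both documents contribute to the dot product.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as sharing nothing
    return dot / (norm_a * norm_b)


# Same words, different order: word order is invisible to this measure.
print(cosine_similarity("the cat chased the dog", "the dog chased the cat"))
# No shared words at all.
print(cosine_similarity("alpha beta", "gamma delta"))
```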

23 citations

Proceedings Article•DOI•

Automatic video classification using decision tree method

Ye Yuan¹, Qinbao Song¹, Jun-Yi Shen¹•Institutions (1)

Xi'an Jiaotong University¹

04 Nov 2002

TL;DR: The experimental results show that the decision tree method is promising for video genre recognition.

Abstract: Automatic digital video classification is emerging as an important problem in the fields of video analysis and multimedia database. In this paper the decision tree method is used for automatic video classification. Video clips are first segmented into shots, then features of the video clips are generated, and a video features database is created at the same time. After that, the decision tree method is used to classify the videos as different genres and a set of decision rules are produced. The discovered decision rules are used to predicate the genre of a new video. The experimental results show that the decision tree method is promising for video genre recognition.

23 citations

Cited by

Data Mining - Concepts and Techniques.

Petra Perner

01 Jan 2002

9,314 citations

Journal Article•DOI•

Principles and Practice of Sleep Medicine

R. Stafford

28 Feb 2001 - JAMA

1,258 citations

International Joint Conference on Neural Networks

Alan Murray

01 Jan 1993

668 citations

Software engineering economics

Barry Boehm

01 Jan 1981

TL;DR: In this article, the authors provide an overview of economic analysis techniques and their applicability to software engineering and management, including the major estimation techniques available, the state of the art in algorithmic cost models, and the outstanding research issues in software cost estimation.

Abstract: This paper summarizes the current state of the art and recent trends in software engineering economics. It provides an overview of economic analysis techniques and their applicability to software engineering and management. It surveys the field of software cost estimation, including the major estimation techniques available, the state of the art in algorithmic cost models, and the outstanding research issues in software cost estimation.

283 citations

Journal Article•DOI•

Understanding Plagiarism Linguistic Patterns, Textual Features, and Detection Methods

Salha Alzahrani¹, Naomie Salim², Ajith Abraham³•Institutions (3)

Taif University¹, Universiti Teknologi Malaysia², University of Ostrava³

01 Mar 2012

TL;DR: A new taxonomy of plagiarism is presented that highlights differences between literal plagiarism and intelligent plagiarism, from the plagiarist's behavioral point of view, and supports deep understanding of different linguistic patterns in committing plagiarism.

Abstract: Plagiarism can be of many different natures, ranging from copying texts to adopting ideas, without giving credit to its originator. This paper presents a new taxonomy of plagiarism that highlights differences between literal plagiarism and intelligent plagiarism, from the plagiarist's behavioral point of view. The taxonomy supports deep understanding of different linguistic patterns in committing plagiarism, for example, changing texts into semantically equivalent but with different words and organization, shortening texts with concept generalization and specification, and adopting ideas and important contributions of others. Different textual features that characterize different plagiarism types are discussed. Systematic frameworks and methods of monolingual, extrinsic, intrinsic, and cross-lingual plagiarism detection are surveyed and correlated with plagiarism types, which are listed in the taxonomy. We conduct extensive study of state-of-the-art techniques for plagiarism detection, including character n-gram-based (CNG), vector-based (VEC), syntax-based (SYN), semantic-based (SEM), fuzzy-based (FUZZY), structural-based (STRUC), stylometric-based (STYLE), and cross-lingual techniques (CROSS). Our study corroborates that existing systems for plagiarism detection focus on copying text but fail to detect intelligent plagiarism when ideas are presented in different words.

275 citations
