Showing papers by "Paul Cook" published in 2018

Proceedings Article•DOI•

Android authorship attribution through string analysis

Vaibhavi Kalgutkar¹, Natalia Stakhanova¹, Paul Cook¹, Alina Matyukhina¹•Institutions (1)

27 Aug 2018

TL;DR: This work proposes to develop a lightweight system that can generate signatures of malware writers by leveraging the string components present in their Android binaries, and can effectively detect a wide range of existing, as well as any new, malware samples generated by particular authors.

Abstract: With the rising popularity of Android mobile devices, the number of malicious applications targeting the Android platform has been increasing tremendously. To mitigate the risk of malicious apps, there is a need for an automated system to detect these applications. Current detection techniques rely on the signatures of well-documented malware, and hence may not be able to detect new malware samples. Instead of generating signatures for malware samples themselves, in this work, we propose to develop a lightweight system that can generate signatures of malware writers by leveraging the string components present in their Android binaries. Using these author signatures, we can effectively detect a wide range of existing, as well as any new, malware samples generated by particular authors. The proposed system achieved 98%, 96%, and 71% accuracy over datasets of 1559 benign, 262 malicious, and 96 obfuscated Android applications, respectively. The string-based approach achieved 71% accuracy, compared to only 50% obtained with the existing system of Ding and Samadzadeh.
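
The idea of an author signature built from strings can be illustrated with a minimal sketch. This is not the paper's system: the function names, the toy strings, and the use of Jaccard overlap as the matching rule are all assumptions made for illustration, and real Android binaries would first need their strings extracted.

```python
from collections import Counter


def string_profile(apps):
    """Build a hypothetical author profile: for each string, the number
    of that author's apps in which it appears."""
    profile = Counter()
    for strings in apps:
        profile.update(set(strings))
    return profile


def jaccard(a, b):
    """Jaccard overlap between two collections of strings."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def attribute(app_strings, author_profiles):
    """Return the author whose profile overlaps most with the app's strings."""
    return max(
        author_profiles,
        key=lambda author: jaccard(app_strings, author_profiles[author]),
    )


# Toy data (invented for illustration only).
profiles = {
    "author_a": string_profile([
        ["http://a.example/c2", "send_sms", "key=abc"],
        ["http://a.example/c2", "root_check"],
    ]),
    "author_b": string_profile([
        ["ad_banner", "track_user"],
        ["ad_banner", "load_config"],
    ]),
}
unknown_app = ["http://a.example/c2", "send_sms", "new_payload"]
predicted = attribute(unknown_app, profiles)
```

Here the unknown app shares two strings with `author_a` and none with `author_b`, so the overlap rule assigns it to `author_a`. A set-overlap rule is only one possible design; the abstract does not specify how signatures are compared.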

25 citations

Proceedings Article•DOI•

Leveraging distributed representations and lexico-syntactic fixedness for token-level prediction of the idiomaticity of English verb-noun combinations

Milton King¹, Paul Cook¹•Institutions (1)

University of New Brunswick¹

01 Jul 2018

TL;DR: This work proposes and evaluates models for classifying verb-noun combination usages as idiomatic or literal, based on a variety of approaches to forming distributed representations; the results show that a model based on averaging word embeddings performs on par with, or better than, a previously-proposed approach based on skip-thoughts.

Abstract: Verb-noun combinations (VNCs) - e.g., blow the whistle, hit the roof, and see stars - are a common type of English idiom that are ambiguous with literal usages. In this paper we propose and evaluate models for classifying VNC usages as idiomatic or literal, based on a variety of approaches to forming distributed representations. Our results show that a model based on averaging word embeddings performs on par with, or better than, a previously-proposed approach based on skip-thoughts. Idiomatic usages of VNCs are known to exhibit lexico-syntactic fixedness. We further incorporate this information into our models, demonstrating that this rich linguistic knowledge is complementary to the information carried by distributed representations.
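
As a rough illustration of the averaged-embedding representation mentioned in this abstract, the sketch below averages toy word vectors for a sentence and labels a new usage by cosine similarity to labelled examples. The two-dimensional vectors, the example sentences, and the nearest-example rule are assumptions for illustration; the abstract does not state which classifier or embeddings the authors used.

```python
import math


def average_embedding(tokens, embeddings):
    """Represent a sentence as the mean of its known word vectors."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def classify(tokens, labelled_examples, embeddings):
    """Stand-in classifier: copy the label of the most similar example."""
    target = average_embedding(tokens, embeddings)
    best = max(
        labelled_examples,
        key=lambda ex: cosine(target, average_embedding(ex[0], embeddings)),
    )
    return best[1]


# Toy 2-d embeddings (invented; real models use hundreds of dimensions).
EMB = {
    "blew": [0.5, 0.5], "the": [0.1, 0.1], "whistle": [0.5, 0.5],
    "on": [0.1, 0.1], "corruption": [0.0, 1.0], "scandal": [0.0, 0.9],
    "referee": [1.0, 0.0],
}
TRAIN = [
    (["blew", "the", "whistle", "on", "corruption"], "idiomatic"),
    (["referee", "blew", "the", "whistle"], "literal"),
]
label = classify(["blew", "the", "whistle", "on", "scandal"], TRAIN, EMB)
```

With these toy vectors the test sentence lies closest to the idiomatic training example. Averaging discards word order, which is one reason the abstract's separate lexico-syntactic fixedness features can add complementary information.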

10 citations

Proceedings Article•

Do Character-Level Neural Network Language Models Capture Knowledge of Multiword Expression Compositionality?

Ali Hakimi Parizi¹, Paul Cook²•Institutions (2)

Razi University¹, University of New Brunswick²

01 Aug 2018

TL;DR: Experimental results on two kinds of MWEs and two languages suggest that character-level neural network language models capture knowledge of multiword expression compositionality, in particular for English noun compounds and the particle component of English verb-particle constructions.

Abstract: In this paper, we propose the first model for multiword expression (MWE) compositionality prediction based on character-level neural network language models. Experimental results on two kinds of MWEs (noun compounds and verb-particle constructions) and two languages (English and German) suggest that character-level neural network language models capture knowledge of multiword expression compositionality, in particular for English noun compounds and the particle component of English verb-particle constructions. In contrast to many other approaches to MWE compositionality prediction, this character-level approach does not require token-level identification of MWEs in a training corpus, and can potentially predict the compositionality of out-of-vocabulary MWEs.

5 citations

Proceedings Article•

Towards Language Technology for Mi'kmaq.

Anant Maheshwari¹, Léo Bouscarrat, Paul Cook²•Institutions (2)

National Institute of Technology, Karnataka¹, University of New Brunswick²

01 May 2018

TL;DR: This paper first constructs and analyzes a web corpus of Mi’kmaq, then evaluates several approaches to language modelling for Mi’kmaq, including character-level models that are particularly well-suited to morphologically-rich languages.

Abstract: Mi’kmaq is a polysynthetic Indigenous language spoken primarily in Eastern Canada, on which no prior computational work has focused. In this paper we first construct and analyze a web corpus of Mi’kmaq. We then evaluate several approaches to language modelling for Mi’kmaq, including character-level models that are particularly well-suited to morphologically-rich languages. Preservation of Indigenous languages is particularly important in the current Canadian context; we argue that natural language processing could aid such efforts.

4 citations

Proceedings Article•DOI•

UNBNLP at SemEval-2018 Task 10: Evaluating unsupervised approaches to capturing discriminative attributes.

Milton King¹, Ali Hakimi Parizi², Paul Cook¹•Institutions (2)

University of New Brunswick¹, Razi University²

01 Jun 2018

TL;DR: Three unsupervised models for capturing discriminative attributes based on information from word embeddings, WordNet, and sentence-level word co-occurrence frequency are presented, and it is shown that the simple approach based on word co-occurrence performs best.

Abstract: In this paper we present three unsupervised models for capturing discriminative attributes based on information from word embeddings, WordNet, and sentence-level word co-occurrence frequency. We show that, of these approaches, the simple approach based on word co-occurrence performs best. We further consider supervised and unsupervised approaches to combining information from these models, but these approaches do not improve on the word co-occurrence model.
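
A sentence-level co-occurrence approach of the kind named in this abstract might look like the sketch below. The decision rule (an attribute is discriminative if it co-occurs with the first word in more sentences than with the second), the function names, and the toy corpus are assumptions for illustration, not the authors' exact method.

```python
def cooccurrence_count(word, attribute, sentences):
    """Number of sentences that contain both the word and the attribute."""
    return sum(1 for s in sentences if word in s and attribute in s)


def is_discriminative(word1, word2, attribute, sentences):
    """Hypothetical rule: the attribute separates word1 from word2 if it
    co-occurs with word1 in strictly more sentences than with word2."""
    return (
        cooccurrence_count(word1, attribute, sentences)
        > cooccurrence_count(word2, attribute, sentences)
    )


# Toy corpus (invented), each sentence stored as a set of tokens.
CORPUS = [
    set("the apple is red".split()),
    set("a red apple fell".split()),
    set("the banana is yellow".split()),
    set("apple and banana are fruit".split()),
]

red_separates = is_discriminative("apple", "banana", "red", CORPUS)
fruit_separates = is_discriminative("apple", "banana", "fruit", CORPUS)
```

In this toy corpus "red" co-occurs with "apple" twice and with "banana" never, so it counts as discriminative, while "fruit" co-occurs equally with both words and does not. A real system would need a large corpus and probably a tuned threshold rather than a strict comparison.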

3 citations

Book Chapter•DOI•

Text-Based Detection of Unauthorized Users of Social Media Accounts

Milton King¹, Dima Alhadidi¹, Paul Cook¹•Institutions (1)

University of New Brunswick¹

08 May 2018

TL;DR: This work proposes an author verification task in the realm of blog posts to detect and block unauthorized users based on the textual content of their unauthorized post, using different methods to represent a document, such as word frequency and word2vec.

Abstract: Although social media platforms can assist organizations’ progress, they also make them vulnerable to unauthorized users gaining access to their account and posting as the organization. This can have negative effects on the company’s public appearance and profit. Once attackers gain access to a social media account, they are able to post any content from that account. In this paper, we propose an author verification task in the realm of blog posts to detect and block unauthorized users based on the textual content of their unauthorized post. We use different methods to represent a document, such as word frequency and word2vec, and we train two different classifiers over these document representations. The experimental results show that regardless of the classifier the word2vec method outperforms other representations.

3 citations