Showing papers by "Anthony D. Joseph" published in 2013

Open Access

ADAM: Genomics Formats and Processing Patterns for Cloud Scale Computing

Matt Massie, Frank Austin Nothaft, Christopher Hartl, Christos Kozanitis, André Schumacher, Anthony D. Joseph, David A. Patterson

01 Jan 2013

TL;DR: The ADAM format provides explicit schemas for read- and reference-oriented sequence data, variants, and genotypes, and eliminates the need to develop language-specific libraries for format decoding/encoding, and with it the possibility of library incompatibilities.

Abstract: Current genomics data formats and processing pipelines are not designed to scale well to large datasets. The current Sequence/Binary Alignment/Map (SAM/BAM) formats were intended for single-node processing [18]. There have been attempts to adapt BAM to distributed computing environments, but they see limited scalability past eight nodes [22]. Additionally, due to the lack of an explicit data schema, there are well-known incompatibilities between libraries that implement SAM/BAM/Variant Call Format (VCF) data access. To address these problems, we introduce ADAM, a set of formats, APIs, and processing stage implementations for genomic data. ADAM is fully open source under the Apache 2 license and is implemented on top of Avro and Parquet [5, 26] for data storage. Our reference pipeline is implemented on top of Spark, a high-performance in-memory map-reduce system [32]. This combination provides the following advantages: 1) Avro provides explicit data schema access in C/C++/C#, Java/Scala, Python, PHP, and Ruby; 2) Parquet allows access by database systems like Impala and Shark; and 3) Spark improves performance through in-memory caching and reduced disk I/O. In addition to improving the format's cross-platform portability, these changes lead to significant performance improvements. On a single node, we are able to speed up sort and duplicate marking by 2×. More importantly, on a 250 gigabyte (GB) high-coverage (60×) human genome, this system achieves a 50× speedup on a 100-node computing cluster (see Table 1), fulfilling the promise of scalability of ADAM. The ADAM format provides explicit schemas for read- and reference-oriented (pileup) sequence data, variants, and genotypes. As the schemas are implemented in Apache Avro, a cross-platform/language serialization format, they eliminate the need to develop language-specific libraries for format decoding/encoding, and with it the possibility of library incompatibilities. A key feature of ADAM is that any application that implements the ADAM schema is compatible with ADAM. This is important, as it prevents applications from being locked into a specific tool or pattern. The ADAM stack is inspired by the "narrow …

120 citations

Proceedings Article

Approaches to adversarial drift

Alex Kantchelian¹, Sadia Afroz², Ling Huang³, Aylin Caliskan Islam², Brad Miller¹, Michael Carl Tschantz¹, Rachel Greenstadt², Anthony D. Joseph¹, J. D. Tygar¹

University of California, Berkeley¹, Drexel University², Intel³

04 Nov 2013

TL;DR: It is argued that, to be of practical interest, a machine-learning-based security system must engage with the human operators beyond feature engineering and instance labeling to address the challenge of drift in adversarial environments, and it is proposed that designers of such systems broaden the classification goal into an explanatory goal, which would deepen the interaction with the system's operators.

Abstract: In this position paper, we argue that, to be of practical interest, a machine-learning-based security system must engage with the human operators beyond feature engineering and instance labeling to address the challenge of drift in adversarial environments. We propose that designers of such systems broaden the classification goal into an explanatory goal, which would deepen the interaction with the system's operators. To provide guidance, we advocate an approach based on maintaining one classifier for each class of unwanted activity to be filtered. We also emphasize the necessity for the system to be responsive to the operators' constant curation of the training set. We show how this paradigm provides a property we call isolation and how it relates to classical causative attacks. To demonstrate the effects of drift on a binary classification task, we also report on two experiments using a previously unpublished malware data set in which each instance is timestamped according to when it was seen.

78 citations

Machine Learning Methods for Computer Security (Dagstuhl Perspectives Workshop 12371)

Anthony D. Joseph, Pavel Laskov, Fabio Roli, J. Doug Tygar, Blaine Nelson

01 Jan 2013

TL;DR: This workshop featured twenty-two invited talks from leading researchers within the secure learning community covering topics in adversarial learning, game-theoretic learning, collective classification, privacy-preserving learning, security evaluation metrics, digital forensics, authorship identification, adversarial advertisement detection, learning for offensive security, and data sanitization.

Abstract: The study of learning in adversarial environments is an emerging discipline at the juncture between machine learning and computer security that raises new questions within both fields. The interest in learning-based methods for security and system design applications comes from the high degree of complexity of the phenomena underlying the security and reliability of computer systems. As it becomes increasingly difficult to reach the desired properties by design alone, learning methods are being used to obtain a better understanding of various data collected from these complex systems. However, learning approaches can be co-opted or evaded by adversaries, who change to counter them. To date, there has been limited research into learning techniques that are resilient to attacks with provable robustness guarantees, making the task of designing secure learning-based systems a lucrative open research area with many challenges. The Perspectives Workshop "Machine Learning Methods for Computer Security" was convened to bring together interested researchers from both the computer security and machine learning communities to discuss techniques, challenges, and future research directions for secure learning and learning-based security applications. This workshop featured twenty-two invited talks from leading researchers within the secure learning community covering topics in adversarial learning, game-theoretic learning, collective classification, privacy-preserving learning, security evaluation metrics, digital forensics, authorship identification, adversarial advertisement detection, learning for offensive security, and data sanitization. The workshop also featured workgroup sessions organized into three topics: machine learning for computer security, secure learning, and future applications of secure learning.

36 citations

A Study of Cloud Computing Software-as-a-Service (SaaS) in Financial Firms

James Lawler, H Howell-Barber, Supriya Desai, Anthony D. Joseph

01 Aug 2013

TL;DR: This study analyzes factors that can enable firms in the financial industry to formulate a cloud computing strategy from a foundational investment in SaaS, and offers guidance for formulating strategy from initial investments in the technology.

Abstract: Cloud computing is a delivery method of information systems that is being deployed by the financial industry. Software-as-a-Service (SaaS) is the most frequent model of this method in the industry. In this study, the authors analyze factors that can enable firms in the financial industry to formulate a cloud computing strategy from a foundational investment in SaaS. The authors find that business and procedural factors are more critical than technical factors as drivers in an implementation strategy. The findings of the study contribute guidance for formulating strategy from initial investments in the technology.

15 citations

Journal Article

Comparing the Forecasts of Money Demand

Anthony D. Joseph¹, Maurice Larrain¹, Richard E. Ottoo¹

Pace University¹

01 Jan 2013 - Procedia Computer Science

TL;DR: This paper is concerned with forecasting changes in money demand, as opposed to levels, using the price level/inflation, real income, wealth, and the interest rate as independent variables; neural networks yielded a better overall forecast of the changes in money demand.

5 citations

A Study of Cloud Computing Infrastructure-as-a-Service (IaaS) in Financial Firms

H Howell-Barber, James Lawler, Anthony D. Joseph, Stuti Narula

01 Jan 2013

TL;DR: The authors find procedural factors more evident than technical and business factors in IaaS projects, but also find implementation methods more limiting to strategy.

Abstract: The cloud continues to be a delivery method of information systems deployed frequently by financial firms. Infrastructure-as-a-Service (IaaS) is an evolving model of this method in industry. In this study, the authors evaluate a critical few factors that can enable financial firms to formulate a generic strategy from investment in IaaS. The authors find procedural factors more evident than technical and business factors in IaaS projects, but also find implementation methods more limiting to strategy. The findings of this study contribute a framework for investment in this maturing method of cloud computing.

5 citations

A Case Study of Engaging Community Service Students through Visual Storytelling of High School Students with Disabilities

James Lawler, Anthony D. Joseph

01 Jan 2013

TL;DR: In this article, the authors analyzed the benefits of digital storytelling in a course engaging college students with high school students with developmental and intellectual disabilities, and found that a storytelling project progressively enables high student engagement in importance, performance, and satisfaction.

Abstract: Community engagement is a common course in college curricula of computer science and information systems. In this study, the authors analyze the benefits of digital storytelling in a course engaging college students with high school students with developmental and intellectual disabilities. The authors discover that a storytelling project progressively enables high engagement of the students in importance, performance, and satisfaction. The authors also discover that the project enables a progressively high impact on the advocacy of these students for individuals with disabilities, in self-efficacy and sociality. The study will benefit instructors in any discipline evaluating digital storytelling technology as a service-learning tool.

4 citations

Proceedings Article

Influence of entrepreneurial aptitude on technology entrepreneurship course performance

Anthony D. Joseph¹

Pace University¹

01 Oct 2013

TL;DR: The quality of the team projects produced and the correlation analysis of the examination grades in the Technology Entrepreneurship course showed relative improvement over those produced in the Data Mining course.

Abstract: In a computing technology entrepreneurship course offered in fall 2011, students were separated into teams of three or four and taught the concepts and skills of teamwork, innovation, and entrepreneurship. They applied these concepts and skills to an open-ended project for a niche-market financial or healthcare information technology product. Each team produced a product supported by a business plan and a PowerPoint presentation as the project deliverable. The course was supported by mentors for the teams and by guest lecturers. In spring 2011, a Data Mining course was offered in which no direct instruction in teamwork, innovation, and entrepreneurship was provided, but the student teams were assigned a similar open-ended project. The objective of this exploratory study is to evaluate the relative increase in students' and teams' entrepreneurial aptitude. Performance in the two courses was determined by project quality and course grades (average of in-class and final examinations), supplemented by a post-course survey of student perceptions of course-related gains and changes in attitudes. As expected, the quality of the team projects and the correlation analysis of the examination grades in the Technology Entrepreneurship course showed relative improvement over those in the Data Mining course.

3 citations

Journal Article

Internal Capital Markets and Patenting in Emerging Growth Firms

Richard E. Ottoo¹, Maurice Larrain¹, Anthony D. Joseph¹

Pace University¹

26 Aug 2013 - International Journal of Economics and Finance

TL;DR: In this article, the authors show that internal capital markets have a positive and significant relationship with patenting of emerging firms, but for mature firms, the relationship between internal finance and patenting is negative but not significant.

Abstract: Despite the overwhelming theoretical intuition in support of the argument that internally generated capital should be an important determinant of corporate innovation, very little empirical evidence of this association has been established. In this paper, we show that internal capital markets have a positive and significant relationship with patenting by emerging firms. However, for mature firms, the relationship between internal finance and patenting is negative but not significant. Our empirical analysis is grounded in the theoretical modeling of the granting of a patent as the maturity date of an American real call option, with internal capital and R&D expenses serving to shorten the maturity of the growth option and to speed up innovation.

1 citation