Showing papers by "Anthony D. Joseph" published in 2013


01 Jan 2013
TL;DR: The ADAM format provides explicit schemas for read and reference oriented sequence data, variants, and genotypes and eliminates the need for the development of language-specific libraries for format decoding/encoding, which eliminates the possibility of library incompatibilities.
Abstract: Current genomics data formats and processing pipelines are not designed to scale well to large datasets. The current Sequence/Binary Alignment/Map (SAM/BAM) formats were intended for single node processing [18]. There have been attempts to adapt BAM to distributed computing environments, but they see limited scalability past eight nodes [22]. Additionally, due to the lack of an explicit data schema, there are well known incompatibilities between libraries that implement SAM/BAM/Variant Call Format (VCF) data access. To address these problems, we introduce ADAM, a set of formats, APIs, and processing stage implementations for genomic data. ADAM is fully open source under the Apache 2 license, and is implemented on top of Avro and Parquet [5, 26] for data storage. Our reference pipeline is implemented on top of Spark, a high performance in-memory map-reduce system [32]. This combination provides the following advantages: 1) Avro provides explicit data schema access in C/C++/C#, Java/Scala, Python, PHP, and Ruby; 2) Parquet allows access by database systems like Impala and Shark; and 3) Spark improves performance through in-memory caching and reducing disk I/O. In addition to improving the format's cross-platform portability, these changes lead to significant performance improvements. On a single node, we are able to speed up sort and duplicate marking by 2×. More importantly, on a 250 gigabyte (GB), high-coverage (60×) human genome, this system achieves a 50× speedup on a 100 node computing cluster (see Table 1), fulfilling ADAM's promise of scalability. The ADAM format provides explicit schemas for read and reference oriented (pileup) sequence data, variants, and genotypes. As the schemas are implemented in Apache Avro (a cross-platform/language serialization format), they eliminate the need for the development of language-specific libraries for format decoding/encoding, which eliminates the possibility of library incompatibilities. A key feature of ADAM is that any application that implements the ADAM schema is compatible with ADAM. This is important, as it prevents applications from being locked into a specific tool or pattern. The ADAM stack is inspired by the "narrow …

120 citations


Proceedings ArticleDOI
04 Nov 2013
TL;DR: It is argued that to be of practical interest, a machine-learning based security system must engage with the human operators beyond feature engineering and instance labeling to address the challenge of drift in adversarial environments, and it is proposed that designers of such systems broaden the classification goal into an explanatory goal, which would deepen the interaction with the system's operators.
Abstract: In this position paper, we argue that to be of practical interest, a machine-learning based security system must engage with the human operators beyond feature engineering and instance labeling to address the challenge of drift in adversarial environments. We propose that designers of such systems broaden the classification goal into an explanatory goal, which would deepen the interaction with the system's operators. To provide guidance, we advocate for an approach based on maintaining one classifier for each class of unwanted activity to be filtered. We also emphasize the necessity for the system to be responsive to the operators' constant curation of the training set. We show how this paradigm provides a property we call isolation and how it relates to classical causative attacks. In order to demonstrate the effects of drift on a binary classification task, we also report on two experiments using a previously unpublished malware data set where each instance is timestamped according to when it was seen.

78 citations


DOI
01 Jan 2013
TL;DR: This workshop featured twenty-two invited talks from leading researchers within the secure learning community covering topics in adversarial learning, game-theoretic learning, collective classification, privacy-preserving learning, security evaluation metrics, digital forensics, authorship identification, adversarial advertisement detection, learning for offensive security, and data sanitization.
Abstract: The study of learning in adversarial environments is an emerging discipline at the juncture between machine learning and computer security that raises new questions within both fields. The interest in learning-based methods for security and system design applications comes from the high degree of complexity of phenomena underlying the security and reliability of computer systems. As it becomes increasingly difficult to reach the desired properties by design alone, learning methods are being used to obtain a better understanding of various data collected from these complex systems. However, learning approaches can be co-opted or evaded by adversaries, who change to counter them. To date, there has been limited research into learning techniques that are resilient to attacks with provable robustness guarantees, making the task of designing secure learning-based systems a lucrative open research area with many challenges. The Perspectives Workshop "Machine Learning Methods for Computer Security" was convened to bring together interested researchers from both the computer security and machine learning communities to discuss techniques, challenges, and future research directions for secure learning and learning-based security applications. This workshop featured twenty-two invited talks from leading researchers within the secure learning community covering topics in adversarial learning, game-theoretic learning, collective classification, privacy-preserving learning, security evaluation metrics, digital forensics, authorship identification, adversarial advertisement detection, learning for offensive security, and data sanitization. The workshop also featured workgroup sessions organized into three topics: machine learning for computer security, secure learning, and future applications of secure learning.

36 citations


01 Aug 2013
TL;DR: This study analyzes factors that can enable firms in the financial industry to formulate cloud computing strategy from a foundational investment in SaaS to contribute guidance into the formulation of strategy from initial investments in the technology.
Abstract: Cloud computing is a delivery method of information systems that is being deployed by the financial industry. Software-as-a-Service (SaaS) is the more frequent model of this method in the industry. In this study the authors analyze factors that can enable firms in the financial industry to formulate cloud computing strategy from a foundational investment in SaaS. The authors learn that business and procedural factors are more critical than technical factors as drivers in an implementation strategy. The findings of the study contribute guidance into the formulation of strategy from initial investments in the technology.

15 citations


Journal ArticleDOI
TL;DR: This paper is concerned with the forecasting of money demand changes relative to levels and using price level/inflation, real income, wealth, and interest rate as independent variables and neural networks yielded a better overall forecast of the changes in money demand.

5 citations


01 Jan 2013
TL;DR: The authors find procedural factors more evident than technical and business factors on projects of IaaS, but also find implementation methods more limiting in strategy.
Abstract: The cloud continues to be a delivery method of information systems deployed frequently by financial firms. Infrastructure-as-a-Service (IaaS) is an evolving model of this method in industry. In this study, the authors evaluate critical few factors that can enable financial firms to formulate a generic strategy from investment in IaaS. The authors find procedural factors more evident than technical and business factors on projects of IaaS, but also find implementation methods more limiting in strategy. The findings of this study contribute a framework for investment in this maturing method of cloud computing.

5 citations


01 Jan 2013
TL;DR: In this article, the authors analyzed the benefits of digital storytelling in a course engaging college students with high school students with developmental and intellectual disabilities, and found that a project of storytelling progressively enables high engagement of the students, in importance, performance and satisfaction.
Abstract: Community engagement is a common course in college curricula of computer science and information systems. In this study, the authors analyze the benefits of digital storytelling in a course engaging college students with high school students with developmental and intellectual disabilities. The authors discover that a project of storytelling progressively enables high engagement of the students, in importance, performance and satisfaction. The authors also discover that the project enables progressively high impact in the advocacy of these students for individuals with disabilities, in self-efficacy and sociality. The study will benefit instructors in any discipline evaluating digital storytelling technology as a service-learning tool.

4 citations


Proceedings ArticleDOI
01 Oct 2013
TL;DR: The quality of the team projects produced and the correlation analysis of the examination grades in the Technology Entrepreneurship course showed relative improvement over those produced in the Data Mining course.
Abstract: In a computing technology entrepreneurship course offered in fall 2011, students were separated into teams of three and four students and taught the concepts and skills of teamwork, innovation, and entrepreneurship. They applied these concepts and skills to an open-ended project for a niche market financial or healthcare information technology product. Each team produced a product supported by a business plan and PowerPoint presentation as the project deliverable. The course was supported by mentors for the teams and guest lecturers. In spring 2011, a Data Mining course was offered where no direct instruction in teamwork, innovation, and entrepreneurship was provided, but the student teams were assigned a similar open-ended project. The objective of this exploratory study is to evaluate students' and teams' relative increase in entrepreneurial aptitude. The two courses' performances were determined by the project quality and course grades (average in-class and final examinations), supplemented by a post-course survey of student perceptions of course-related gains and changes in attitudes. As expected, the quality of the team projects produced and the correlation analysis of the examination grades in the Technology Entrepreneurship course showed relative improvement over those produced in the Data Mining course.

3 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that internal capital markets have a positive and significant relationship with patenting of emerging firms, but for mature firms, the relationship between internal finance and patenting is negative but not significant.
Abstract: Despite the overwhelming theoretical intuition in support of the arguments that internally generated capital should be an important determinant of corporate innovation, very little empirical evidence of this association has been established. In this paper we show that internal capital markets have a positive and significant relationship with patenting of emerging firms. However, for mature firms, the relationship between internal finance and patenting is negative but not significant. Our empirical analysis is grounded on the theoretical modeling of granting a patent as the maturity date of an American real call option, with internal capital and R&D expenses serving to shorten the maturity of the growth option and to speed up innovation.

1 citation