Author

Mark Daniel Ward

Other affiliations: University of Pennsylvania
Bio: Mark Daniel Ward is an academic researcher from Purdue University. The author has contributed to research on topics including Combinatorics on words and Trie. The author has an h-index of 11 and has co-authored 50 publications receiving 463 citations. Previous affiliations of Mark Daniel Ward include University of Pennsylvania.


Papers
Journal Article
TL;DR: This article motivates the importance of data science proficiency and provides examples and resources for instructors to implement data science in their own statistics curricula, noting that these topics have not traditionally been a major component of undergraduate programs in statistics.
Abstract: A growing number of students are completing undergraduate degrees in statistics and entering the workforce as data analysts. In these positions, they are expected to understand how to use databases and other data warehouses, scrape data from Internet sources, program solutions to complex problems in multiple languages, and think algorithmically as well as statistically. These data science topics have not traditionally been a major component of undergraduate programs in statistics. Consequently, a curricular shift is needed to address additional learning outcomes. The goal of this article is to motivate the importance of data science proficiency and to provide examples and resources for instructors to implement data science in their own statistics curricula. We provide case studies from seven institutions. These varied approaches to teaching data science demonstrate curricular innovations to address new needs. Also included here are examples of assignments designed for courses that foster engagement of und...

151 citations

Journal Article
TL;DR: The exact moments of the number of 2-protected nodes in binary search trees grown from random permutations are derived using a properly normalized version of this tree parameter.
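
The parameter in this summary can be illustrated with a small sketch. It assumes the usual definition, which the summary itself does not state: a node is 2-protected if it is neither a leaf nor the parent of a leaf. The names `Node`, `build_bst`, and `count_two_protected` are hypothetical and chosen for illustration only; this is not the authors' derivation, just a direct count on one tree.

```python
from typing import Iterable, Optional


class Node:
    """A binary search tree node (hypothetical helper for this sketch)."""

    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def build_bst(keys: Iterable[int]) -> Optional[Node]:
    """Insert keys one at a time, in the given order, into an empty BST."""
    root: Optional[Node] = None
    for key in keys:
        if root is None:
            root = Node(key)
            continue
        node = root
        while True:
            if key < node.key:
                if node.left is None:
                    node.left = Node(key)
                    break
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(key)
                    break
                node = node.right
    return root


def is_leaf(node: Node) -> bool:
    return node.left is None and node.right is None


def count_two_protected(root: Optional[Node]) -> int:
    """Count nodes that are neither leaves nor parents of leaves (assumed definition)."""
    if root is None:
        return 0
    count = 0
    stack = [root]
    while stack:
        node = stack.pop()
        children = [c for c in (node.left, node.right) if c is not None]
        # Protected: has at least one child, and no child is a leaf.
        if children and not any(is_leaf(c) for c in children):
            count += 1
        stack.extend(children)
    return count
```

For the permutation 4, 2, 6, 1, 3, 5, 7 the tree is perfectly balanced, and only the root is 2-protected under this definition.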

34 citations

Journal Article
TL;DR: This report proves that, under a Markovian model of order one, the average depth of suffix trees of index n is asymptotically similar to the average depth of tries (a.k.a. digital trees) built on n independent strings.
Abstract: In this report, we prove that under a Markovian model of order one, the average depth of suffix trees of index n is asymptotically similar to the average depth of tries (a.k.a. digital trees) built on n independent strings. This leads to an asymptotic behavior of $(\log{n})/h + C$ for the average depth of the suffix tree, where $h$ is the entropy of the Markov model and $C$ is a constant. Our proof compares the generating functions for the average depth in tries and in suffix trees; the difference between these generating functions is shown to be asymptotically small. We conclude by using the asymptotic behavior of the average depth in a trie under the Markov model found by Jacquet and Szpankowski ([JaSz91]).
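
As a rough illustration of the leading term $(\log{n})/h$, the sketch below computes the entropy rate of a first-order Markov chain and divides $\log n$ by it. It assumes that $h$ is the standard entropy rate $-\sum_i \pi_i \sum_j P_{ij} \log P_{ij}$ with natural logarithms, and that the chain is ergodic and aperiodic so that power iteration converges to the stationary distribution $\pi$. The function names are hypothetical, and the constant $C$ is not computed here.

```python
import math
from typing import List

Matrix = List[List[float]]


def stationary_distribution(P: Matrix, iterations: int = 1000) -> List[float]:
    """Approximate the stationary distribution by power iteration.

    Assumes an ergodic, aperiodic chain; otherwise the iteration may not converge.
    """
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return pi


def markov_entropy(P: Matrix) -> float:
    """Entropy rate h = -sum_i pi_i sum_j P_ij log P_ij (natural log)."""
    k = len(P)
    pi = stationary_distribution(P)
    return -sum(
        pi[i] * P[i][j] * math.log(P[i][j])
        for i in range(k)
        for j in range(k)
        if P[i][j] > 0.0
    )


def leading_depth_term(n: int, P: Matrix) -> float:
    """Leading term (log n) / h of the average depth; the constant C is omitted."""
    return math.log(n) / markov_entropy(P)
```

When every transition probability is 1/2, the chain reduces to a memoryless fair binary source, $h = \log 2$, and the leading term for $n = 1024$ is 10.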

29 citations

01 May 2007
TL;DR: A joint source-channel coding algorithm capable of correcting some errors in the popular Lempel-Ziv'77 (LZ'77) scheme without introducing any measurable degradation in the compression performance is proposed.
Abstract: We propose a joint source-channel coding algorithm capable of correcting some errors in the popular Lempel-Ziv'77 (LZ'77) scheme without introducing any measurable degradation in the compression performance. This can be achieved because the LZ'77 encoder does not completely eliminate the redundancy present in the input sequence. One source of redundancy can be observed when an LZ'77 phrase has multiple matches. In this case, LZ'77 can issue a pointer to any of those matches, and a particular choice carries some additional bits of information. We call a scheme with embedded redundant information the LZS'77 algorithm. We analyze the number of longest matches in such a scheme and prove that it follows the logarithmic series distribution with mean 1/h (plus some fluctuations), where h is the source entropy. Thus, the distribution associated with the number of redundant bits is well concentrated around its mean, a highly desirable property for error correction. These analytic results are proved by a combination of combinatorial, probabilistic, and analytic methods (e.g., Mellin transform, depoissonization, combinatorics on words). In fact, we analyze LZS'77 by studying the multiplicity matching parameter in a suffix tree, which in turn is analyzed via comparison to its independent version, called a trie. Finally, we present an algorithm in which a channel coder (e.g., Reed-Solomon (RS) coder) succinctly uses the inherent additional redundancy left by the LZS'77 encoder to detect and correct a limited number of errors. We call such a scheme the LZRS'77 algorithm. LZRS'77 is perfectly backward-compatible with LZ'77, that is, a file compressed with our error-resistant LZRS'77 can still be decompressed by a generic LZ'77 decoder.
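
The redundancy described in this abstract can be made concrete with a minimal sketch that finds every position in the already-seen text where the longest match for the upcoming phrase begins. This is a brute-force illustration, not the authors' encoder: the names `longest_matches` and `free_bits` are hypothetical, the window is unbounded, and the estimate that a choice among M equally long matches can carry floor(log2 M) bits is an assumption of this sketch rather than a detail taken from the abstract.

```python
from typing import List, Tuple


def longest_matches(history: str, lookahead: str) -> Tuple[int, List[int]]:
    """Return the longest match length and all start positions in `history` achieving it.

    Matches may run past the end of `history` into `lookahead`,
    as LZ'77-style pointers allow overlapping copies.
    """
    text = history + lookahead
    best_len = 0
    positions: List[int] = []
    for start in range(len(history)):
        k = 0
        while k < len(lookahead) and text[start + k] == lookahead[k]:
            k += 1
        if k > best_len:
            best_len = k
            positions = [start]
        elif k == best_len and k > 0:
            positions.append(start)
    return best_len, positions


def free_bits(num_matches: int) -> int:
    """Bits that a choice among `num_matches` pointers could carry (assumed floor(log2 M))."""
    return num_matches.bit_length() - 1 if num_matches > 0 else 0
```

With history `abcabcab` and lookahead `abd`, the longest match has length 2 and starts at positions 0, 3, and 6, so the encoder has three equally good pointers and could embed one extra bit by its choice.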

26 citations

Journal Article
TL;DR: In this paper, the authors consider a sequence of n geometric random variables and interpret the outcome as an urn model, and derive asymptotic equivalents for all (centered or uncentered) moments in a fairly automatic way.
Abstract: We consider a sequence of n geometric random variables and interpret the outcome as an urn model. For a given parameter m, we study several quantities, such as the largest urn containing at least (or exactly) m balls and the number of urns containing at least m balls, among others. Many of these questions have their origin in some computer science problems. Identifying the underlying distributions as (variations of) the extreme value distribution, we are able to derive asymptotic equivalents for all (centered or uncentered) moments in a fairly automatic way.
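
The urn interpretation can be sketched as a small simulation: each geometric value is treated as the index of the urn that receives a ball. The sketch assumes the parameterization P(X = k) = (1 - p)^(k - 1) p for k = 1, 2, ..., which the abstract does not specify, and all function names are hypothetical. It only computes the quantities empirically and says nothing about the asymptotic results.

```python
import random
from collections import Counter
from typing import List, Optional


def sample_geometric(n: int, p: float, seed: int = 0) -> List[int]:
    """Draw n geometric values on {1, 2, ...} with success probability p (assumed convention)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        k = 1
        while rng.random() >= p:  # failure: move on to the next urn index
            k += 1
        values.append(k)
    return values


def urns_with_at_least(values: List[int], m: int) -> int:
    """Number of urns that contain at least m balls."""
    counts = Counter(values)
    return sum(1 for c in counts.values() if c >= m)


def largest_urn_with_at_least(values: List[int], m: int) -> Optional[int]:
    """Largest urn index holding at least m balls, or None if there is no such urn."""
    counts = Counter(values)
    eligible = [urn for urn, c in counts.items() if c >= m]
    return max(eligible) if eligible else None
```

For the outcome 1, 1, 2, 3, 1, 2 and m = 2, urns 1 and 2 each hold at least two balls, so the count is 2 and the largest such urn is urn 2.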

24 citations


Cited by
01 Jan 1989
TL;DR: Arthur Chickering is Distinguished Professor of Higher Education at Memphis State University and Visiting Professor at George Mason University; Zelda Gamson is a sociologist who holds appointments at the John W. McCormack Institute of Public Affairs at the University of Massachusetts-Boston and in the Center for the Study of Higher and Postsecondary Education at the University of Michigan.
Abstract: Arthur Chickering is Distinguished Professor of Higher Education at Memphis State University. On leave from the Directorship of the Center for the Study of Higher Education at Memphis State, he is Visiting Professor at George Mason University. Zelda Gamson is a sociologist who holds appointments at the John W. McCormack Institute of Public Affairs at the University of Massachusetts-Boston and in the Center for the Study of Higher and Postsecondary Education at the University of Michigan.

488 citations

15 May 2015
TL;DR: This article presents a universally applicable attitude and skill set that everyone, not just computer scientists, would be eager to learn and use.
Abstract: It represents a universally applicable attitude and skill set everyone, not just computer scientists, would be eager to learn and use.

430 citations

01 Jan 1947
TL;DR: This chapter discusses Statistical Training and Curricular Revision, which aims to provide a history of the discipline and some of the techniques used to train teachers.
Abstract: Statistical Practice (pp. 1, 124, 254, 297); History Corner (pp. 20, 179); Teacher’s Corner (pp. 26, 173, 263, 335); General (pp. 40, 147, 211, 366); Statistical Computing and Graphics (p. 56); Statistical Computing and Software Reviews (pp. 75, 187); Reviews of Books and Teaching Materials (pp. 92, 189, 281, 401); Brief Reviews (pp. 100, 195, 292, 404); Letters to the Editor (pp. 102, 197, 294, 406); Special Section: Statistical Training and Curricular Revision (p. 105); Errata (p. 200); Special Section: Opportunities and Challenges for the Discipline (p. 201); Software Review (p. 40)

318 citations