Showing papers by "Mark Daniel Ward" published in 2020

Open Access

Journal Article•DOI•

Fostering Undergraduate Data Science

Fulya Gokalp Yavuz¹, Mark Daniel Ward¹•Institutions (1)

Purdue University¹

02 Jan 2020-The American Statistician

TL;DR: The authors advocate an approach to data science training that uses several types of computational tools, often in tandem, including R, bash, awk, regular expressions, SQL, and XPath.

Abstract: Data Science is one of the newest interdisciplinary areas. It is transforming our lives unexpectedly fast. This transformation is also happening in our learning styles and practicing habits. We adv...

11 citations

Journal Article•DOI•

Asymptotic Analysis of the kth Subword Complexity

Lida Ahmadi¹, Mark Daniel Ward¹•Institutions (1)

Purdue University¹

12 Feb 2020-Entropy

TL;DR: This paper evaluates the expected value and the second factorial moment (followed by a corollary on the second moment) of the kth Subword Complexity for binary strings over memory-less sources and investigates the asymptotic behavior for values of k = Θ(log n).

Abstract: Patterns within strings enable us to extract vital information regarding a string’s randomness. Whether a string is random (showing little to no repetition in patterns) or periodic (showing repetitions in patterns) is described by a value called the kth Subword Complexity of the character string. By definition, the kth Subword Complexity is the number of distinct substrings of length k that appear in a given string. In this paper, we evaluate the expected value and the second factorial moment (followed by a corollary on the second moment) of the kth Subword Complexity for binary strings over memory-less sources. We first take a combinatorial approach to derive a probability generating function for the number of occurrences of patterns in strings of finite length. This gives us an exact expression for the two moments in terms of the patterns’ auto-correlation and correlation polynomials. We then investigate the asymptotic behavior for values of k = Θ(log n). In the proof, we compare the distribution of the kth Subword Complexity of binary strings to the distribution of distinct prefixes of independent strings stored in a trie. The methodology that we use involves complex analysis, analytical poissonization and depoissonization, the Mellin transform, and saddle point analysis.
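The definition in the abstract can be illustrated with a small sketch. This is not the authors' method (the paper works with generating functions and asymptotic analysis); it is a brute-force illustration only. The function names `subword_complexity` and `expected_complexity` and the parameter `p` (the probability of the symbol "1" under an assumed memory-less binary source) are hypothetical and chosen for this example.

```python
from itertools import product


def subword_complexity(s: str, k: int) -> int:
    """Return the number of distinct substrings of length k that appear in s."""
    if k <= 0 or k > len(s):
        return 0
    # Collect every window of length k in a set so duplicates are counted once.
    return len({s[i:i + k] for i in range(len(s) - k + 1)})


def expected_complexity(n: int, k: int, p: float = 0.5) -> float:
    """Brute-force expected kth subword complexity over all binary strings of
    length n, assuming each symbol is '1' with probability p independently.
    Enumerates 2**n strings, so it is only practical for small n."""
    total = 0.0
    for bits in product("01", repeat=n):
        s = "".join(bits)
        ones = s.count("1")
        prob = (p ** ones) * ((1 - p) ** (n - ones))
        total += prob * subword_complexity(s, k)
    return total


if __name__ == "__main__":
    print(subword_complexity("0110", 2))   # distinct windows: 01, 11, 10
    print(expected_complexity(2, 1, 0.5))  # average over 00, 01, 10, 11
```

A periodic string such as "0000" has only one distinct substring of each length, while a string with less repetition has more, which is the contrast the abstract describes.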

1 citation

Journal Article•DOI•

The next wave: We will all be data scientists

Margaret Betz¹, Ellen Gundlach¹, Elizabett Hillery¹, Jenna L. Rickus¹, Mark Daniel Ward¹•Institutions (1)

Purdue University¹

01 Dec 2020

TL;DR: All undergraduate students, regardless of background or major, should be thought of as future data scientists and be given the opportunity to apply powerful tools to large data sets, using real-world problems.

Abstract: In the next wave of educating future data scientists, we need to think of all undergraduate students, regardless of background or major, as future data scientists. We should train them in supportive, interdisciplinary environments. Starting from their first day at college, they should be given the opportunity to apply powerful tools to large data sets, using real-world problems. Partnerships with research computing, academic departments, research centers, companies, government, and nonprofits will all be necessary to fully prepare these students for the breadth of the data science workforce.

Funding information: Cummins Incorporated; Foundation for Food and Agriculture Research, Grant/Award Number: 534662; National Institute of Food and Agriculture; National Science Foundation, Grant/Award Numbers: 0939370, 1246818; Society of Actuaries

1 citation