Topic

Trie

About: Trie is a research topic. Over its lifetime, 2,245 publications have been published on this topic, receiving 40,445 citations. The topic is also known as: digital tree & radix tree.


Papers
Proceedings Article
30 Jul 2011
TL;DR: KenLM is a library that implements two data structures for efficient language model queries, reducing both time and memory costs, and is integrated into the Moses, cdec, and Joshua translation systems.
Abstract: We present KenLM, a library that implements two data structures for efficient language model queries, reducing both time and memory costs. The Probing data structure uses linear probing hash tables and is designed for speed. Compared with the widely-used SRILM, our Probing model is 2.4 times as fast while using 57% of the memory. The Trie data structure is a trie with bit-level packing, sorted records, interpolation search, and optional quantization aimed at lower memory consumption. Trie simultaneously uses less memory than the smallest lossless baseline and less CPU than the fastest baseline. Our code is open-source, thread-safe, and integrated into the Moses, cdec, and Joshua translation systems. This paper describes the several performance techniques used and presents benchmarks against alternative implementations.
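The Trie structure above locates child records with interpolation search over sorted word identifiers. KenLM itself is a C++ library; the Python sketch below only illustrates interpolation search on a sorted list of integer keys, and none of the names come from KenLM's API.

```python
def interpolation_search(keys, target):
    """Locate target in a sorted list of integer keys.

    Interpolation search guesses the position from the key's value relative
    to the endpoints, which beats binary search when keys are roughly
    uniformly distributed (as hashed word IDs tend to be)."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[hi] == keys[lo]:
            pos = lo
        else:
            # Estimate the position by linear interpolation between endpoints.
            pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1  # not found


# Example: find the record index for word ID 42 in a sorted child list.
interpolation_search([3, 8, 15, 42, 77, 101], 42)   # -> 3
```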

1,297 citations

Journal Article
Edward Fredkin
TL;DR: In this paper, several paradigms of trie memory are described and compared with other memory paradigms, their advantages and disadvantages are examined in detail, and applications are discussed.
Abstract: Trie memory is a way of storing and retrieving information. It is applicable to information that consists of function-argument (or item-term) pairs--information conventionally stored in unordered lists, ordered lists, or pigeonholes. The main advantages of trie memory over the other memory plans just mentioned are shorter access time, greater ease of addition or up-dating, greater convenience in handling arguments of diverse lengths, and the ability to take advantage of redundancies in the information stored. The main disadvantage is relative inefficiency in using storage space, but this inefficiency is not great when the store is large. In this paper several paradigms of trie memory are described and compared with other memory paradigms, their advantages and disadvantages are examined in detail, and applications are discussed. Many essential features of trie memory were mentioned by de la Briandais [1] in a paper presented to the Western Joint Computer Conference in 1959. The present development is essentially independent of his, having been described in memorandum form in January 1959 [2], and it is fuller in that it considers additional paradigms (finite-dimensional trie memories) and includes experimental results bearing on the efficiency of utilization of storage space.
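As an illustration of the storage-and-retrieval scheme the abstract describes, here is a minimal trie for function-argument (key-value) pairs. It is a Python sketch that uses dictionaries for child links; Fredkin's paper discusses register-array paradigms rather than hash maps, so the layout here is only an assumption for readability.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next symbol -> TrieNode
        self.value = None    # payload stored at the end of a key, if any


class Trie:
    """Store and retrieve (argument, value) pairs keyed by strings of symbols."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.value = value

    def lookup(self, key):
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.value


# Example usage: store and retrieve one (argument, value) pair.
t = Trie()
t.insert("trie", "a kind of digital tree")
t.lookup("trie")   # -> "a kind of digital tree"
t.lookup("tree")   # -> None
```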

1,144 citations

Journal Article
TL;DR: An algorithm for combinatorial optimization where an explicit check for the repetition of configurations is added to the basic scheme of Tabu search, and it is shown that the Hashing or Digital Tree techniques can be used in order to search for repetitions in a time that is approximately constant.
Abstract: We propose an algorithm for combinatorial optimization where an explicit check for the repetition of configurations is added to the basic scheme of Tabu search. In our Tabu scheme the appropriate size of the list is learned in an automated way by reacting to the occurrence of cycles. In addition, if the search appears to be repeating an excessive number of solutions excessively often, then the search is diversified by making a number of random moves proportional to a moving average of the cycle length. The reactive scheme is compared to a "strict" Tabu scheme that forbids the repetition of configurations and to schemes with a fixed or randomly varying list size. From the implementation point of view we show that the Hashing or Digital Tree techniques can be used in order to search for repetitions in a time that is approximately constant. We present the results obtained for a series of computational tests on a benchmark function, on the 0-1 Knapsack Problem, and on the Quadratic Assignment Problem.
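A minimal sketch of the repetition check described above, assuming configurations are 0-1 solution vectors: a hash table maps each visited configuration to the iteration at which it was last seen, giving roughly constant average lookup time. The paper's alternative digital-tree (trie) index is not shown, and the helper names are illustrative.

```python
def make_repetition_checker():
    """Hash-table check for repeated configurations, in the spirit of the
    repetition test the abstract describes (the alternative is a digital
    tree / trie index). Average lookup cost is roughly constant."""
    last_seen = {}  # configuration (as a tuple) -> iteration of last visit

    def check(config, iteration):
        # Return the iteration this configuration was last seen at
        # (or None if it is new), then record the current visit.
        key = tuple(config)
        prev = last_seen.get(key)
        last_seen[key] = iteration
        return prev

    return check


# Example: detect that a 0-1 knapsack configuration repeats at iteration 7.
check = make_repetition_checker()
check([0, 1, 1, 0], 3)   # -> None (first visit)
check([0, 1, 1, 0], 7)   # -> 3 (seen before; cycle length 4)
```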

865 citations

Journal Article
TL;DR: This paper proposes three novel tree structures for efficient incremental and interactive HUP mining; the first can capture incremental data without any restructuring operation, and performance analyses show that the tree structures are efficient and scalable.
Abstract: Recently, high utility pattern (HUP) mining is one of the most important research issues in data mining due to its ability to consider the nonbinary frequency values of items in transactions and different profit values for every item. On the other hand, incremental and interactive data mining provide the ability to use previous data structures and mining results in order to reduce unnecessary calculations when a database is updated, or when the minimum threshold is changed. In this paper, we propose three novel tree structures to efficiently perform incremental and interactive HUP mining. The first tree structure, Incremental HUP Lexicographic Tree (IHUPL-Tree), is arranged according to an item's lexicographic order. It can capture the incremental data without any restructuring operation. The second tree structure is the IHUP transaction frequency tree (IHUPTF-Tree), which obtains a compact size by arranging items according to their transaction frequency (descending order). To reduce the mining time, the third tree, IHUP-transaction-weighted utilization tree (IHUPTWU-Tree) is designed based on the TWU value of items in descending order. Extensive performance analyses show that our tree structures are very efficient and scalable for incremental and interactive HUP mining.
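To make the fixed-ordering idea concrete, the sketch below inserts transactions into a prefix tree whose items are kept in lexicographic order, mirroring the arrangement the abstract attributes to IHUPL-Tree. The utility bookkeeping and the frequency/TWU orderings of the other two trees are omitted, and the class and function names are illustrative, not from the paper.

```python
class ItemNode:
    def __init__(self, item):
        self.item = item
        self.count = 0          # transactions passing through this node
        self.children = {}      # item -> ItemNode


def insert_transaction(root, transaction):
    """Insert one transaction with items in lexicographic order.

    Because the order never depends on the data, appending new transactions
    never triggers a restructuring pass over the tree."""
    node = root
    for item in sorted(transaction):
        node = node.children.setdefault(item, ItemNode(item))
        node.count += 1


# Example: two transactions share the "bread" -> "egg" prefix path.
root = ItemNode(None)
insert_transaction(root, ["milk", "bread", "egg"])
insert_transaction(root, ["egg", "bread"])
```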

555 citations

Journal Article
TL;DR: The main technique, controlled prefix expansion, transforms a set of prefixes into an equivalent set with fewer prefix lengths; optimization techniques based on dynamic programming and local transformations of data structures are used to improve cache behavior.
Abstract: Internet (IP) address lookup is a major bottleneck in high-performance routers. IP address lookup is challenging because it requires a longest matching prefix lookup. It is compounded by increasing routing table sizes, increased traffic, higher-speed links, and the migration to 128-bit IPv6 addresses. We describe how IP lookups and updates can be made faster using a set of transformation techniques. Our main technique, controlled prefix expansion, transforms a set of prefixes into an equivalent set with fewer prefix lengths. In addition, we use optimization techniques based on dynamic programming, and local transformations of data structures to improve cache behavior. When applied to trie search, our techniques provide a range of algorithms (Expanded Tries) whose performance can be tuned. For example, using a processor with 1MB of L2 cache, search of the MaeEast database containing 38000 prefixes can be done in 3 L2 cache accesses. On a 300MHz Pentium II which takes 4 cycles for accessing the first word of the L2 cacheline, this algorithm has a worst-case search time of 180 nsec., a worst-case insert/delete time of 2.5 msec., and an average insert/delete time of 4 usec. Expanded tries provide faster search and faster insert/delete times than earlier lookup algorithms. When applied to Binary Search on Levels, our techniques improve worst-case search times by nearly a factor of 2 (using twice as much storage) for the MaeEast database. Our approach to algorithm design is based on measurements using the VTune tool on a Pentium to obtain dynamic clock cycle counts. Our techniques also apply to similar address lookup problems in other network protocols.
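The core transformation, controlled prefix expansion, can be sketched in a few lines: each prefix is expanded to the nearest allowed length, and longer original prefixes are applied last so they overwrite any expansion that would shadow them, preserving longest-match semantics. The function below is an illustrative Python sketch over bit-string prefixes, not the authors' implementation, and it omits the dynamic-programming choice of the allowed length set.

```python
def controlled_prefix_expansion(prefixes, allowed_lengths):
    """Expand a longest-prefix-match table so every prefix has an allowed length.

    Shorter prefixes are processed first, so longer original prefixes overwrite
    the expansions that would shadow them. Prefixes are bit strings such as
    "1011"; assumes max(allowed_lengths) covers the longest prefix."""
    allowed_lengths = sorted(allowed_lengths)
    table = {}
    for prefix, next_hop in sorted(prefixes.items(), key=lambda kv: len(kv[0])):
        # Smallest allowed length that can hold this prefix.
        length = next(l for l in allowed_lengths if l >= len(prefix))
        pad = length - len(prefix)
        for i in range(1 << pad):
            suffix = format(i, "b").zfill(pad) if pad else ""
            table[prefix + suffix] = next_hop
    return table


# Example: collapse prefix lengths {1, 3} down to the single length 3.
# controlled_prefix_expansion({"0": "A", "011": "B"}, [3])
# -> {"000": "A", "001": "A", "010": "A", "011": "B"}
```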

514 citations


Network Information
Related Topics (5)
Graph (abstract data type): 69.9K papers, 1.2M citations, 87% related
Scalability: 50.9K papers, 931.6K citations, 86% related
Server: 79.5K papers, 1.4M citations, 86% related
Cache: 59.1K papers, 976.6K citations, 84% related
Web page: 50.3K papers, 975.1K citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    36
2022    65
2021    56
2020    79
2019    110
2018    87