AhoCorasickTrie: Fast Searching for Multiple Keywords in Multiple Texts

Aho-Corasick is an optimal algorithm for finding many keywords in a text. It can locate all matches in a text in O(N+M) time; i.e., the time needed scales linearly with the number of keywords (N) and the size of the text (M). Compare this to the naive approach which takes O(N*M) time to loop through each pattern and scan for it in the text. This implementation builds the trie (the generic name of the data structure) and runs the search in a single function call. If you want to search multiple texts with the same trie, the function will take a list or vector of texts and return a list of matches to each text. By default, all 128 ASCII characters are allowed in both the keywords and the text. A more efficient trie is possible if the alphabet size can be reduced. For example, DNA sequences use at most 19 distinct characters and usually only 4; protein sequences use at most 26 distinct characters and usually only 20. UTF-8 (Unicode) matching is not currently supported.

Version: 0.1.2
Imports: Rcpp (≥ 0.12.5)
LinkingTo: Rcpp
Suggests: Biostrings, microbenchmark, testthat
Published: 2020-09-29
Author: Matt Chambers [aut, cre], Tomas Petricek [aut, cph], Vanderbilt University [cph]
Maintainer: Matt Chambers <matt.chambers42 at gmail.com>
BugReports: https://github.com/chambm/AhoCorasickTrie/issues
License: Apache License 2.0
URL: https://github.com/chambm/AhoCorasickTrie
NeedsCompilation: yes
SystemRequirements: C++11
CRAN checks: AhoCorasickTrie results

Documentation:

Reference manual: AhoCorasickTrie.pdf

Downloads:

Package source: AhoCorasickTrie_0.1.2.tar.gz
Windows binaries: r-devel: AhoCorasickTrie_0.1.2.zip, r-release: AhoCorasickTrie_0.1.2.zip, r-oldrel: AhoCorasickTrie_0.1.2.zip
macOS binaries: r-release (arm64): AhoCorasickTrie_0.1.2.tgz, r-oldrel (arm64): AhoCorasickTrie_0.1.2.tgz, r-release (x86_64): AhoCorasickTrie_0.1.2.tgz
Old sources: AhoCorasickTrie archive

Reverse dependencies:

Reverse imports: customProDB, prozor

Linking:

Please use the canonical form https://CRAN.R-project.org/package=AhoCorasickTrie to link to this page.