edlibR: R interface to edlib

The R package edlibR provides bindings to the C/C++ library edlib, which computes exact pairwise sequence alignments using the edit distance (Levenshtein distance). The functions within edlibR are modeled after the API of the Python package edlib on PyPI.
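To make the quantity concrete, here is a minimal sketch of a Levenshtein distance computed with base R's utils::adist(). This function is not part of edlibR and is used here only to illustrate what "edit distance" means; edlibR's own entry point is align(), described below.

```r
# utils::adist() ships with base R and computes the Levenshtein distance.
# It is used here only as an illustration; it is not part of edlibR.
d <- drop(utils::adist("elephant", "telephone"))

# One insertion ("t") plus two substitutions ("a" -> "o", "t" -> "e")
print(d)
## [1] 3
```

The value agrees with the editDistance reported for the same pair of strings in the align() examples.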

There are three functions within edlibR:

align()

The function align() computes the pairwise alignment of the input query against the input target:

align(query, target, [mode], [task], [k], [cigarFormat], [additionalEqualities])

A list is returned with the fields editDistance, alphabetLength, locations, cigar, and cigarFormat, as shown in the examples below.

Examples:

library(edlibR)

algn1 = align("ACTG", "CACTRT", mode="HW", task="path")
print(algn1)
## $editDistance
## [1] 1
## 
## $alphabetLength
## [1] 5
## 
## $locations
## $locations[[1]]
## [1] 1 3
## 
## $locations[[2]]
## [1] 1 4
## 
## 
## $cigar
## [1] "3=1I"
## 
## $cigarFormat
## [1] "extended"
algn2 = align("elephant", "telephone")
print(algn2)
## $editDistance
## [1] 3
## 
## $alphabetLength
## [1] 8
## 
## $locations
## $locations[[1]]
## [1] NA  8
## 
## 
## $cigar
## NULL
## 
## $cigarFormat
## [1] "extended"
algn3 = align("ACTG", "CACTRT", mode="HW", task="path")
print(algn3)
## $editDistance
## [1] 1
## 
## $alphabetLength
## [1] 5
## 
## $locations
## $locations[[1]]
## [1] 1 3
## 
## $locations[[2]]
## [1] 1 4
## 
## 
## $cigar
## [1] "3=1I"
## 
## $cigarFormat
## [1] "extended"
## the previous example with additionalEqualities 
algn4 = align("ACTG", "CACTRT", mode="HW", task="path", additionalEqualities=list(c("R", "A"), c("R", "G")))
print(algn4)
## $editDistance
## [1] 0
## 
## $alphabetLength
## [1] 5
## 
## $locations
## $locations[[1]]
## [1] 1 4
## 
## 
## $cigar
## [1] "4="
## 
## $cigarFormat
## [1] "extended"
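
The printed locations above appear to be 0-based with an inclusive end, following the Python package's convention: the pair 1 3 lines up with "ACT" in "CACTRT" only under that reading. Assuming this holds, the sketch below shows one way to pull the matched region out of the target with base R's 1-based substr(). The helper target_hits() is hypothetical and not part of edlibR, and the input list is written by hand to mirror the algn1 output rather than produced by a package call.

```r
# Hypothetical helper (not part of edlibR): extract the aligned region(s)
# of the target. Assumes locations are 0-based with an inclusive end.
target_hits <- function(result, target) {
  vapply(result$locations, function(loc) {
    # A missing start (as in the task = "distance" example) gives NA
    if (anyNA(loc)) return(NA_character_)
    # Shift both ends by one to get R's 1-based, inclusive indices
    substr(target, loc[1] + 1, loc[2] + 1)
  }, character(1))
}

# Hand-written stand-in for the locations field printed for algn1
mock_algn1 <- list(locations = list(c(1, 3), c(1, 4)))
hits <- target_hits(mock_algn1, "CACTRT")
print(hits)
## [1] "ACT"  "ACTR"
```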

align(): arguments

getNiceAlignment()

The function getNiceAlignment() takes the output of align() and represents it in a visually informative format for human inspection (“NICE format”). The result is a set of strings showing the matches, mismatches, insertions, and deletions.

getNiceAlignment(alignResult, query, target, [gapSymbol])

Note: Users must pass task="path" to align() so that it outputs a CIGAR; otherwise, there will be no CIGAR from which getNiceAlignment() can reconstruct the alignment in “NICE” format. Users must also pass cigarFormat="extended" to align(); otherwise, the CIGAR will be too ambiguous for getNiceAlignment() to correctly reconstruct the alignment in “NICE” format.
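
The two requirements in this note can be checked before calling getNiceAlignment(). The sketch below is one possible guard, written against the list structure shown in the align() examples. The function has_usable_cigar() is hypothetical and not part of edlibR, and the two input lists are written by hand rather than produced by align().

```r
# Hypothetical guard (not part of edlibR): TRUE only if the result carries
# a CIGAR string and that CIGAR is in the extended format.
has_usable_cigar <- function(result) {
  !is.null(result$cigar) && identical(result$cigarFormat, "extended")
}

# Hand-written stand-ins shaped like the align() output shown earlier
with_path     <- list(cigar = "3=1I", cigarFormat = "extended")
distance_only <- list(cigar = NULL,   cigarFormat = "extended")

print(has_usable_cigar(with_path))
## [1] TRUE
print(has_usable_cigar(distance_only))
## [1] FALSE
```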

Examples:

library(edlibR)

query = "elephant"
target = "telephone"
result = align(query, target, task = "path")
nice_algn = getNiceAlignment(result, query, target)
print(nice_algn)
## $query_aligned
## [1] "-elephant"
## 
## $matched_aligned
## [1] "-|||||.|."
## 
## $target_aligned
## [1] "telephone"
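
To show how an extended CIGAR can carry enough information to rebuild the output above, here is a minimal base R sketch. It is an illustration only, not the code used inside edlibR. It assumes a global alignment that starts at the first character of both strings, and it assumes the convention suggested by the "3=1I" example earlier: I marks a query character with no counterpart in the target, and D marks the reverse. The function expand_cigar() is hypothetical, and the CIGAR string "1D5=1X1=1X" is written by hand to be consistent with the output above, not copied from a package run.

```r
# Hypothetical helper (not part of edlibR): expand an extended CIGAR
# into aligned query, match line, and aligned target.
expand_cigar <- function(cigar, query, target, gap = "-") {
  # Split e.g. "1D5=1X" into the tokens "1D", "5=", "1X"
  tokens <- regmatches(cigar, gregexpr("[0-9]+[=XID]", cigar))[[1]]
  counts <- as.integer(sub("[=XID]$", "", tokens))
  ops    <- sub("^[0-9]+", "", tokens)

  q_chars <- strsplit(query, "")[[1]]
  t_chars <- strsplit(target, "")[[1]]
  q_pos <- 0L  # characters of the query consumed so far
  t_pos <- 0L  # characters of the target consumed so far
  q_out <- character(0)
  m_out <- character(0)
  t_out <- character(0)

  for (i in seq_along(ops)) {
    n    <- counts[i]
    gaps <- rep(gap, n)
    if (ops[i] %in% c("=", "X")) {
      # Match or mismatch: consume n characters from both strings
      q_out <- c(q_out, q_chars[q_pos + seq_len(n)])
      t_out <- c(t_out, t_chars[t_pos + seq_len(n)])
      m_out <- c(m_out, rep(if (ops[i] == "=") "|" else ".", n))
      q_pos <- q_pos + n
      t_pos <- t_pos + n
    } else if (ops[i] == "D") {
      # Target characters with a gap in the query
      q_out <- c(q_out, gaps)
      m_out <- c(m_out, gaps)
      t_out <- c(t_out, t_chars[t_pos + seq_len(n)])
      t_pos <- t_pos + n
    } else {
      # "I": query characters with a gap in the target
      q_out <- c(q_out, q_chars[q_pos + seq_len(n)])
      m_out <- c(m_out, gaps)
      t_out <- c(t_out, gaps)
      q_pos <- q_pos + n
    }
  }

  list(query_aligned   = paste(q_out, collapse = ""),
       matched_aligned = paste(m_out, collapse = ""),
       target_aligned  = paste(t_out, collapse = ""))
}

res <- expand_cigar("1D5=1X1=1X", "elephant", "telephone")
print(res$query_aligned)
## [1] "-elephant"
print(res$matched_aligned)
## [1] "-|||||.|."
print(res$target_aligned)
## [1] "telephone"
```

With a standard (non-extended) CIGAR, matches and mismatches would share a single operation code, so the "|" and "." symbols in the middle line could not be told apart without comparing characters. This is consistent with the note above about cigarFormat="extended".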

getNiceAlignment(): arguments

nice_print()

The function nice_print() simply prints the output of getNiceAlignment() to the console for quickly inspecting the alignment. Users can think of this function as a “pretty-print” function for visualization.

library(edlibR)
## example above from getNiceAlignment()

query = "elephant"
target = "telephone"
result = align(query, target, task = "path")
nice_algn = getNiceAlignment(result, query, target)
nice_print(nice_algn)
## [1] "query:   -elephant"
## [1] "matched: -|||||.|."
## [1] "target:  telephone"

For more information regarding edlib, please see the publication in Bioinformatics.