The algorithms package provides built-in similarity functions for computing distance and similarity scores between values.
Import Path: `github.com/kolosys/synapse/algorithms`
All algorithms in this package return a normalized similarity score between 0.0 (completely different) and 1.0 (identical). This makes them directly compatible with Synapse's `SimilarityFunc[K]` type for string keys.
Computes the Levenshtein distance (edit distance) between two strings. This measures the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform one string into another.
```go
score := algorithms.Levenshtein("hello", "helo")
// score = 0.80 (1 deletion needed)

score = algorithms.Levenshtein("kitten", "sitting")
// score = 0.57 (3 edits needed)

score = algorithms.Levenshtein("hello", "hello")
// score = 1.0 (identical)
```

Use cases: general string matching and typo-tolerant lookups.
Complexity: O(n × m) time and space, where n and m are string lengths.
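The examples above are consistent with the edit distance being normalized by the longer string's length; under that assumption (it is not stated explicitly here), the first score works out as score = 1 - distance / max(len(a), len(b)) = 1 - 1/5 = 0.80.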
Similar to Levenshtein but also counts transpositions (swapping adjacent characters) as a single edit. This better handles common typing errors where characters are accidentally swapped.
```go
score := algorithms.DamerauLevenshtein("hello", "hlelo")
// score = 0.80 (1 transposition)

// Compare with standard Levenshtein
score = algorithms.Levenshtein("hello", "hlelo")
// score = 0.60 (2 edits: delete + insert)
```

Use cases: keyboard typos and accidentally transposed characters.
Complexity: O(n × m) time and space.
Computes the Hamming distance between two strings of equal length. This counts the number of positions where corresponding characters differ.
```go
score := algorithms.Hamming("hello", "hallo")
// score = 0.80 (1 position differs)

score = algorithms.Hamming("hello", "world")
// score = 0.20 (4 positions differ)

// Returns 0.0 for different lengths
score = algorithms.Hamming("hello", "hi")
// score = 0.0
```

Use cases: comparing fixed-length identifiers and codes.
Complexity: O(n) time.
Computes Hamming distance between byte slices at the bit level. Useful for comparing binary data or hash values.
```go
a := []byte{0xFF, 0x00} // 11111111 00000000
b := []byte{0xFE, 0x01} // 11111110 00000001

score := algorithms.HammingBytes(a, b)
// score = 0.875 (2 bits differ out of 16)
```

Use cases: comparing binary data and hash values.
Complexity: O(n) time.
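Since the description mentions hash values, here is a minimal sketch comparing two SHA-256 digests (the inputs are arbitrary stand-ins; any equal-length byte slices work). Keep in mind that cryptographic hashes flip roughly half their bits for any change to the input, so scores cluster near 0.5; bit-level comparison is most informative for locality-sensitive fingerprints.

```go
package main

import (
	"crypto/sha256"
	"fmt"

	"github.com/kolosys/synapse/algorithms"
)

func main() {
	// Two nearly identical inputs still produce very different digests,
	// so the bit-level similarity lands close to 0.5.
	h1 := sha256.Sum256([]byte("hello world"))
	h2 := sha256.Sum256([]byte("hello worle"))

	// Both digests are 32 bytes, so the equal-length requirement is met.
	score := algorithms.HammingBytes(h1[:], h2[:])
	fmt.Printf("bit-level similarity: %.3f\n", score)
}
```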
Computes the Euclidean distance between two points in n-dimensional space. Returns a similarity score that decreases as the distance grows.
```go
a := []float64{0, 0}
b := []float64{3, 4}

score := algorithms.Euclidean(a, b)
// score = 0.167 (distance = 5)

// Identical points
score = algorithms.Euclidean([]float64{1, 2, 3}, []float64{1, 2, 3})
// score = 1.0
```

The similarity is calculated as 1.0 / (1.0 + distance).
Use cases: embeddings and other dense vector comparisons.
Complexity: O(n) time.
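Working the first example through the stated formula: distance = sqrt(3² + 4²) = 5, so score = 1.0 / (1.0 + 5) ≈ 0.167.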
Computes the Manhattan distance (L1 norm) between two points. This is the sum of absolute differences along each dimension.
```go
a := []float64{0, 0}
b := []float64{3, 4}

score := algorithms.Manhattan(a, b)
// score = 0.125 (distance = 7)

score = algorithms.Manhattan([]float64{1, 2}, []float64{4, 6})
// score = 0.125 (distance = 7)
```

Use cases: sparse vectors and grid-like distances.
Complexity: O(n) time.
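The scores above suggest Manhattan uses the same 1.0 / (1.0 + distance) mapping as Euclidean (an inference from the examples, not stated explicitly here): |3 - 0| + |4 - 0| = 7, so score = 1.0 / (1.0 + 7) = 0.125.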
All string algorithms can be used directly as similarity functions:
```go
cache := synapse.New[string, string](
	synapse.WithThreshold(0.8),
)

// Use Levenshtein directly
cache.WithSimilarity(algorithms.Levenshtein)

// Or use Damerau-Levenshtein
cache.WithSimilarity(algorithms.DamerauLevenshtein)
```

For vector algorithms, wrap them in a function:
```go
type Vector []float64

cache := synapse.New[Vector, string](
	synapse.WithThreshold(0.5),
)

cache.WithSimilarity(func(a, b Vector) float64 {
	return algorithms.Euclidean(a, b)
})
```

You can implement custom similarity functions. They must:
- Return a `float64` between 0.0 and 1.0
- Return 1.0 for identical keys

```go
// Custom case-insensitive prefix similarity
func prefixSimilarity(a, b string) float64 {
	a, b = strings.ToLower(a), strings.ToLower(b)
	if a == b {
		return 1.0
	}
	shorter, longer := a, b
	if len(b) < len(a) {
		shorter, longer = b, a
	}
	if strings.HasPrefix(longer, shorter) {
		return float64(len(shorter)) / float64(len(longer))
	}
	return 0.0
}

cache.WithSimilarity(prefixSimilarity)
```
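A quick check of the scores this function produces (the values follow directly from the code above):

```go
prefixSimilarity("Data", "database") // 0.5 ("data" covers 4 of 8 characters)
prefixSimilarity("cache", "cache")   // 1.0 (identical after lowercasing)
prefixSimilarity("cache", "shard")   // 0.0 (no prefix relationship)
```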
| Algorithm | Best For | Requirements |
|---|---|---|
| Levenshtein | General string matching, typos | None |
| Damerau-Levenshtein | Keyboard typos, transpositions | None |
| Hamming | Fixed-length identifiers | Equal length strings |
| HammingBytes | Binary data, hashes | Equal length slices |
| Euclidean | Embeddings, dense vectors | Same dimensions |
| Manhattan | Sparse vectors, grid distances | Same dimensions |
For high-throughput similarity searches, consider the per-comparison cost listed above: the O(n) algorithms (Hamming, HammingBytes, Euclidean, Manhattan) are significantly cheaper than the O(n × m) edit-distance algorithms (Levenshtein, Damerau-Levenshtein), especially for long strings.