使用Ruby测量两个字符串之间的距离?

我可以用Ruby测量两个字符串之间的距离吗?

即:

compare('Test', 'est') # Returns 1 compare('Test', 'Tes') # Returns 1 compare('Test', 'Tast') # Returns 1 compare('Test', 'Taste') # Returns 2 compare('Test', 'tazT') # Returns 5 

我找到了这个给你:

 def levenshtein_distance(s, t) m = s.length n = t.length return m if n == 0 return n if m == 0 d = Array.new(m+1) {Array.new(n+1)} (0..m).each {|i| d[i][0] = i} (0..n).each {|j| d[0][j] = j} (1..n).each do |j| (1..m).each do |i| d[i][j] = if s[i-1] == t[j-1] # adjust index into string d[i-1][j-1] # no operation required else [ d[i-1][j]+1, # deletion d[i][j-1]+1, # insertion d[i-1][j-1]+1, # substitution ].min end end end d[m][n] end [ ['fire','water'], ['amazing','horse'], ["bamerindos", "giromba"] ].each do |s,t| puts "levenshtein_distance('#{s}', '#{t}') = #{levenshtein_distance(s, t)}" end 

这是非常棒的输出:=)

 levenshtein_distance('fire', 'water') = 4 levenshtein_distance('amazing', 'horse') = 7 levenshtein_distance('bamerindos', 'giromba') = 9 

资料来源: http : //rosettacode.org/wiki/Levenshtein_distance#Ruby

由于本机C绑定,更容易和更快:

 gem install levenshtein-ffi gem install levenshtein require 'levenshtein' Levenshtein.normalized_distance string1, string2, threshold 

http://rubygems.org/gems/levenshtein http://rubydoc.info/gems/levenshtein/0.2.2/frames

更简单的是,我有时会露出ruby……

 # Levenshtein distance, translated from wikipedia pseudocode by ross def lev s, t return t.size if s.empty? return s.size if t.empty? return [ (lev s.chop, t) + 1, (lev s, t.chop) + 1, (lev s.chop, t.chop) + (s[-1, 1] == t[-1, 1] ? 0 : 1) ].min end 

在Rubygems中有一个实用方法实际上应该是公共的但不是,不管怎么说:

 require "rubygems/text" ld = Class.new.extend(Gem::Text).method(:levenshtein_distance) p ld.call("asd", "sdf") => 2 

我制作了一个damerau-levenshtein gem ,其中算法在C中实现

 require "damerau-levenshtein" dl = DamerauLevenshtein dl.distance("Something", "Smoething") #returns 1 

我喜欢上面的DigitalRoss解决方案。 但是,正如dawg所指出的,它的运行时间呈指数级增长(在O(3^n)的顺序上),这对于更长的字符串是没有好处的。 使用memoization或“动态编程”可以大大加快该解决方案的速度:

 def lev s, t @memo ||= {} return t.size if s.empty? return s.size if t.empty? min = [ (@memo[[s.chop, t]] || (lev s.chop, t)) + 1, (@memo[[s, t.chop]] || (lev s, t.chop)) + 1, (@memo[[s.chop, t.chop]] || (lev s.chop, t.chop)) + (s[-1] == t[-1] ? 0 : 1) ].min @memo[[s, t]] = min end 

然后我们有更好的运行时间,( O(n^2)我相信)。

 [9] pry(main)> require 'benchmark' => true [10] pry(main)> @memo = {} => {} [11] pry(main)> Benchmark.realtime{puts lev("Hello darkness my old friend", "I've come to talk with you again")} 26 => 0.007071999832987785