How to count the number of consecutive occurrences of a character

My code works for a plain character count:

    count = Hash.new(0)
    str.each_char do |char|
      count[char] += 1 unless char == " "
    end
    count

For example, "aaabbaaaaacccbbdddd" gives 'a' = 8, 'b' = 4, 'c' = 3, 'd' = 4.

I want to count how many times each character occurs consecutively. The result I want is: 'a' = 3, 'b' = 2, 'a' = 5, 'c' = 3, 'b' = 2, 'd' = 4. How can I do this?

    "aaabbaaaaacccbbdddd".each_char.chunk(&:itself).map{|k, v| [k, v.length]}
    # => [["a", 3], ["b", 2], ["a", 5], ["c", 3], ["b", 2], ["d", 4]]

I benchmarked sawa's and spickermann's solutions:

    require 'benchmark/ips'

    def sawa(string)
      string.each_char.chunk(&:itself).map{|k, v| [k, v.length] }
    end

    def spickermann(string)
      string.split(//).slice_when { |a, b| a != b }.map { |group| [group.first, group.size] }
    end

    Benchmark.ips do |x|
      string = "aaabbaaaaacccbbdddd"
      x.report("sawa") { sawa string }
      x.report("spickermann") { spickermann string }
      x.compare!
    end

    # Calculating -------------------------------------
    #                 sawa     6.293k i/100ms
    #          spickermann     4.447k i/100ms
    # -------------------------------------------------
    #                 sawa     75.353k (±10.4%) i/s -    371.287k
    #          spickermann     48.661k (±12.0%) i/s -    240.138k
    #
    # Comparison:
    #                 sawa:    75353.5 i/s
    #          spickermann:    48660.7 i/s - 1.55x slower

What about:

    string.split(//).slice_when { |a, b| a != b }.
      map { |group| [group.first, group.size] }
    #=> [['a', 3], ['b', 2], ['a', 5], ['c', 3], ['b', 2], ['d', 4]]

Instead of a Hash, use an Array to store the pairs, like so:

    str = "aaabbaaaaacccbbdddd"

    counts = []
    str.each_char do |char|
      # Get the last seen character and count pair
      last_pair = counts[-1] || []

      if last_pair[0] == char
        # This character is the same as the last one, increment its count
        last_pair[1] += 1
      else
        # New character, push a new pair onto the list
        counts.push([char, 1])
      end
    end

    counts.each { |c| puts "#{c[0]} = #{c[1]}" }

This can be written more concisely using chunk.

    str = "aaabbaaaaacccbbdddd"

    counts = []
    str.chars.chunk(&:itself).each { |char, chars|
      counts << [char, chars.length]
    }

    puts counts.inspect

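chunk splits a list into chunks. It decides where to split by calling the block on each element. As long as the block returns the same value as it did for the previous element, the element is added to the current chunk. Once the value changes, a new chunk is started. This is similar to what we did earlier in the loop by storing the last seen character.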

    if last_seen == char
      # it's the same chunk
    else
      # it's a new chunk
      last_seen = char
    end

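itself returns the object it is called on, here the character. So chunk(&:itself) splits the string into chunks of identical consecutive characters.

Each element of the new list is a pair: the return value of the block (in our case, the character in that chunk) plus the actual chunk (for example, the three characters of "aaa").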
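To make the shape of those pairs concrete, here is a small sketch. The variable names and the even/odd example are only illustrative and not part of the answer above.

```ruby
str = "aaabb"

# chunk yields [block_value, elements] pairs; with &:itself the block value
# is the character and the elements are the consecutive equal characters.
pairs = str.each_char.chunk(&:itself).to_a
p pairs
# => [["a", ["a", "a", "a"]], ["b", ["b", "b"]]]

# Any block works: consecutive elements with the same block value are grouped.
parity_runs = [1, 3, 4, 6, 7].chunk(&:even?).to_a
p parity_runs
# => [[false, [1, 3]], [true, [4, 6]], [false, [7]]]
```

Taking the length of the second element of each pair gives the run length, which is what the map / each step above does.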

I prefer regular expressions for this kind of problem:

    str = "aaabbaaaaacccbbdddd"

    # Group 1 is the whole run, group 2 is the repeated character;
    # \2* lets a run consist of a single character
    counts = str.scan(/((\w)\2*)/).inject([]) do |occurs, match|
      occurs << [match[1], match[0].size]
      occurs
    end

    puts counts.inspect
    #=>[["a", 3], ["b", 2], ["a", 5], ["c", 3], ["b", 2], ["d", 4]]
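As a rough illustration of what scan returns when the pattern has two capture groups, here is a sketch that uses numbered groups (an assumption made for this example). It also shows how the quantifier on the backreference affects runs of length 1.

```ruby
# Group 1 captures the whole run, group 2 the repeated character.
# With \2* a run may have length 1; with \2+ it needs at least two characters.
with_star = "abbc".scan(/((\w)\2*)/)
with_plus = "abbc".scan(/((\w)\2+)/)

p with_star # => [["a", "a"], ["bb", "b"], ["c", "c"]]
p with_plus # => [["bb", "b"]]

# Turn each [run, char] match into a [char, length] pair.
counts = with_star.map { |run, char| [char, run.size] }
p counts    # => [["a", 1], ["b", 2], ["c", 1]]
```

Note that \w only matches word characters, so spaces and punctuation are skipped by this pattern.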

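Edit:

I ran the same benchmark as @sawa and added the regex approach. It comes out ahead: about 1.9x faster than the chunk version in the run below. Also, #itself is not available in Ruby < 2.2.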

    require 'benchmark/ips'

    def sawa(string)
      string.each_char.chunk(&:itself).map{|k, v| [k, v.length] }
    end

    def spickermann(string)
      string.split(//).slice_when { |a, b| a != b }.map { |group| [group.first, group.size] }
    end

    def stathopa(string)
      # Group 1 is the whole run, group 2 is the repeated character
      string.scan(/((\w)\2*)/).inject([]) do |occurs, match|
        occurs << [match[1], match[0].size]
        occurs
      end
    end

    Benchmark.ips do |x|
      string = "aaabbaaaaacccbbdddd"
      x.report("sawa") { sawa string }
      x.report("spickerman") { spickermann string }
      x.report("stathopa") { stathopa string }
      x.compare!
    end

    # Calculating -------------------------------------
    #                 sawa     6.730k i/100ms
    #           spickerman     4.061k i/100ms
    #             stathopa    11.969k i/100ms
    # -------------------------------------------------
    #                 sawa     70.072k (± 8.9%) i/s -    349.960k
    #           spickerman     43.652k (± 9.5%) i/s -    219.294k
    #             stathopa    132.992k (± 8.8%) i/s -    670.264k
    #
    # Comparison:
    #             stathopa:   132992.1 i/s
    #                 sawa:    70072.4 i/s - 1.90x slower
    #           spickerman:    43651.6 i/s - 3.05x slower
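On a Ruby version without #itself, an explicit block should behave the same way as chunk(&:itself). A minimal sketch, where run_lengths is a hypothetical helper name chosen for this example:

```ruby
def run_lengths(string)
  # chunk { |c| c } groups consecutive equal characters, like chunk(&:itself)
  string.each_char.chunk { |c| c }.map { |char, run| [char, run.length] }
end

p run_lengths("aaabbaaaaacccbbdddd")
# => [["a", 3], ["b", 2], ["a", 5], ["c", 3], ["b", 2], ["d", 4]]
```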

To count the longest consecutive run of each character:

    count = Hash.new(0)
    last_char = nil
    occurred = 0
    str.each_char do |char|
      if char != last_char
        occurred = 1
      else
        occurred += 1
      end
      last_char = char
      count[char] = occurred if (count[char]||0) < occurred
    end
    count
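The same longest-run count can also be sketched on top of chunk, assuming a Ruby version where #itself is available. The variable names here are illustrative only.

```ruby
str = "aaabbaaaaacccbbdddd"

# For every run, keep the largest length seen so far for that character.
longest = str.each_char.chunk(&:itself).each_with_object(Hash.new(0)) do |(char, run), max|
  max[char] = run.size if run.size > max[char]
end

longest # => {"a"=>5, "b"=>2, "c"=>3, "d"=>4}
```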

Or, to get a result like [['a', 3], ['b', 2], ['a', 5], ['c', 3], ['b', 2], ['d', 4]]:

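    count = []
    last_char = nil
    occurred = 0
    str.each_char do |char|
      if char != last_char
        # Skip the initial nil placeholder so no [nil, 0] pair is recorded
        count.push([last_char, occurred]) unless last_char.nil?
        occurred = 1
      else
        occurred += 1
      end
      last_char = char
    end
    # Record the final run (there is none for an empty string)
    count.push([last_char, occurred]) unless last_char.nil?
    count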

Here is one way:

    s = "aaabbaaaaacccbbdddd"
    s.chars.uniq.map do |c|
      # Split on everything that is not c, then measure the remaining runs
      [c, s.split(/[^#{c}]+/).reject(&:empty?).map(&:size)]
    end.to_h
    #=> {"a"=>[3, 5], "b"=>[2, 2], "c"=>[3], "d"=>[4]}
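Because the snippet above interpolates each character into a regex character class, characters that are special there (a backslash, for instance) would probably need escaping first. A possible regex-free sketch that builds the same hash with chunk, using illustrative variable names:

```ruby
s = "aaabbaaaaacccbbdddd"

# Collect the length of every run under its character, in order of appearance.
runs = s.each_char.chunk(&:itself).each_with_object({}) do |(char, run), lengths|
  (lengths[char] ||= []) << run.size
end

runs # => {"a"=>[3, 5], "b"=>[2, 2], "c"=>[3], "d"=>[4]}
```

This makes a single pass over the string instead of one split per distinct character.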