为什么使用String＃count比使用Ruby中的String #chars更快地计数字母？

使用以下基准：

def create_genome "gattaca" * 100 end def count_frequency_using_chars(sequence) 100000.times do sequence.chars.group_by{|x| x}.map{|letter, array| [letter, array.count]} end end def count_frequency_using_count(sequence) 100000.times do ["a", "c", "g", "t"].map{|letter| sequence.count(letter)} end end sequence = create_genome count_frequency_using_chars(sequence) count_frequency_using_count(sequence)

我发现，在基于C的Ruby中，对于1.8和1.9.2，使用String#count(letter)比使用Enumerable#group_by和Array#count它们进行排序和计算快约50倍。我对此感到有些惊讶，因为String#count方法每次迭代读取字符串四次，而后者只读取一次。

我尝试在ruby-prof和perftools.rb下运行代码，并且它们都只表示String#chars #chars占用了90％的时间，没有分解花费90％的时间。

如果我不得不猜测为什么会有差异，我会说创造7000万个单字符串会很昂贵，但我怎么能知道呢？（更新： String#chars mainly_execute_a_trivial_block不是罪魁祸首 – 请参阅mainly_execute_a_trivial_block的基准测试）

编辑：使用1.9.2补丁级别180的当前基准：

 require 'pp' require 'benchmark' def create_genome "gattaca" * 100 end ZILLION = 100000 def count_frequency_using_count(sequence) ZILLION.times do ["a", "c", "g", "t"].map{|letter| sequence.count(letter)} end end def count_frequency_using_chars(sequence) ZILLION.times do sequence.chars.group_by{|x| x}.map{|letter, array| [letter, array.count]} end end def count_frequency_using_inject_hash(sequence) ZILLION.times do sequence.chars.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h } end end def count_frequency_using_each_with_object(sequence) ZILLION.times do sequence.chars.each_with_object(Hash.new(0)) { |char, hash| hash[char] += 1} end end def just_group_by(sequence) ZILLION.times do sequence.chars.group_by{|x| x} end end def just_chars_and_trivial_block(sequence) ZILLION.times do sequence.chars() {} end end def mainly_execute_a_trivial_block(sequence) ZILLION.times do sequence.length.times() {} end end def execute_an_empty_loop_instead(sequence) ZILLION.times do i = 0 max = sequence.length until i == max i += 1 end end end sequence = create_genome puts RUBY_VERSION Benchmark.bm do |benchmark| benchmark.report do count_frequency_using_count(sequence) end benchmark.report do count_frequency_using_chars(sequence) end benchmark.report do count_frequency_using_inject_hash(sequence) end benchmark.report do count_frequency_using_each_with_object(sequence) end benchmark.report do just_group_by(sequence) end benchmark.report do just_chars_and_trivial_block(sequence) end benchmark.report do mainly_execute_a_trivial_block(sequence) end benchmark.report do execute_an_empty_for_loop_instead(sequence) end end

结果：

  user system total real 0.510000 0.000000 0.510000 ( 0.508499) # count_frequency_using_count 23.470000 0.030000 23.500000 ( 23.575487) # count_frequency_using_chars 32.770000 0.050000 32.820000 ( 32.874634) # count_frequency_using_inject_hash 31.880000 0.040000 31.920000 ( 31.942437) # count_frequency_using_each_with_object 22.950000 0.030000 22.980000 ( 22.970500) # just_group_by 13.300000 0.020000 13.320000 ( 13.314466) # just_chars_and_trivial_block 5.660000 0.000000 5.660000 ( 5.661516) # mainly_execute_a_trivial_block 1.930000 0.010000 1.940000 ( 1.934861) # execute_an_empty_loop_instead

它与ruby内部无关。您正在将苹果与橙子进行比较。

在第一个示例中，您将700个字符串分组100000次并查找计数。所以这是你的逻辑问题。不计算。在第二种方法，你只是在计算，

在这两种方法中，您只使用计数

只需更改第一个示例

 def count_frequency_using_chars(sequence) grouped = sequence.chars.group_by{|x| x} 100000.times do grouped.map{|letter, array| [letter, array.count]} end end

它和你的第二个一样快

编辑

这种方法比count_frequency_using_count快3倍，检查基准测试

  def count_frequency_using_chars_with_single_group(sequence) grouped = sequence.chars.group_by{|x| x} 100000.times do grouped.map{|letter, array| [letter, array.count]} end end def count_frequency_using_count(sequence) 100000.times do ["a", "c", "g", "t"].map{|letter| sequence.count(letter)} end end Benchmark.bm do |benchmark| benchmark.report do pp count_frequency_using_chars_with_single_group(sequence) end benchmark.report do pp count_frequency_using_count(sequence) end end user system total real 0.410000 0.000000 0.410000 ( 0.419100) 1.330000 0.000000 1.330000 ( 1.324431)

安德鲁给你的评论，

measuring the character composition of 100000 sequences once each, not the character composition of one sequence 100000 times ，仍然你的计数方法比group_by方法慢得多。正如你所说，我只是对大字符串进行了基准测试

 seq = "gattaca" * 10000 #seq length is 70000 arr_seq = (1..10).map {|x| seq} #10 seq items

并改变了处理多个序列的方法

 def count_frequency_using_chars_with_single_group(sequences) sequences.each do |sequence| grouped = sequence.chars.group_by{|x| x} 100000.times do grouped.map{|letter, array| [letter, array.count]} end end end def count_frequency_using_count(sequence) sequences.each do |sequence| 100000.times do ["a", "c", "g", "t"].map{|letter| sequence.count(letter)} end end end Benchmark.bm do |benchmark| benchmark.report do pp count_frequency_using_chars_with_single_group(arr_seq) end benchmark.report do pp count_frequency_using_count(arr_seq) end end

对于处理100000次，10个序列，每个序列长度为70000

  user system total real 3.710000 0.040000 3.750000 ( 23.452889) #modified group_by approach 1039.180000 6.920000 1046.100000 (1071.374730) #simple char count approach

对于高音量字符串，您的简单字符计数方法比修改后的group_by方法慢47％。我运行了上述基准测试，只有10个序列，每个序列长度为70000。假设有100或1000个序列，简单计数永远不会是一个选项。对？

那个慢的东西是group_by。实际上，虽然你需要为count方法执行4次传递，但group_by方法要慢得多，因为为了做那个group_by，它做了很多工作。

打破代码有一个方法只通过以下组：

 def create_genome "gattaca" * 100 end def count_frequency_using_chars(sequence) 100000.times do sequence.chars.group_by{|x| x}.map{|letter, array| [letter, array.count]} end end def count_frequency_using_count(sequence) 100000.times do ["a", "c", "g", "t"].map{|letter| sequence.count(letter)} end end def just_group_by(sequence) 100000.times do sequence.chars.group_by{|x| x} end end sequence = create_genome ... ruby-1.9.2-p180 :068 > puts Time.now() 2011-06-17 11:17:36 -0400 ruby-1.9.2-p180 :069 > count_frequency_using_chars(sequence) => 100000 ruby-1.9.2-p180 :070 > puts Time.now() 2011-06-17 11:18:07 -0400 ruby-1.9.2-p180 :071 > count_frequency_using_count(sequence) => 100000 ruby-1.9.2-p180 :072 > puts Time.now() 2011-06-17 11:18:08 -0400 ruby-1.9.2-p180 :073 > just_group_by(sequence) => 100000 ruby-1.9.2-p180 :074 > puts Time.now() 2011-06-17 11:18:37 -0400

你可以看到

使用group_by花了31秒
使用计数需要1秒钟
刚做了group_by花了29秒

虽然使用group_by来获取所需的信息在概念上很好，但它正在做额外的工作而你不需要做。

更慢的方法是仅传递一次数组。

 hash = sequence.chars.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h } => {"g"=>100, "a"=>300, "t"=>200, "c"=>100}

但实际上它并不快

这是一个完全复制和粘贴失败。

我还是留下了答案，因为它显示了你如何使用标准库中的Benchmark 。

 require 'pp' require 'benchmark' def count_frequency_using_inject_hash(sequence) 100000.times do sequence.chars.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h } end end sequence = create_genome Benchmark.bm do |benchmark| benchmark.report do pp count_frequency_using_chars(sequence) end benchmark.report do pp count_frequency_using_count(sequence) end benchmark.report do pp count_frequency_using_inject_hash(sequence) end end

 用户系统总真实
  41.500000 0.000000 41.500000（42.484375）
   1.312000 0.000000 1.312000（1.406250）
  49.265000 0.000000 49.265000（49.348633）

您可以通过分析VM本身来更好地了解正在发生的事情。

产生过多次的块是罪魁祸首。如果您对VM使用perftools’proferr，请使用https://github.com/tmm1/perftools.rb中 “分析Ruby VM和C扩展”中列出的说明（注意：这或多或少是vanilla perftools，而不是perftools.rb）

 Removing _init from all stack traces. Total: 3883 samples 1321 34.0% 34.0% 3883 100.0% rb_yield_0 273 7.0% 41.1% 274 7.1% _IO_str_pbackfail 191 4.9% 46.0% 191 4.9% __i686.get_pc_thunk.bx 171 4.4% 50.4% 171 4.4% _init 131 3.4% 53.7% 3880 99.9% rb_eval 122 3.1% 56.9% 347 8.9% st_lookup 105 2.7% 59.6% 423 10.9% new_dvar 93 2.4% 62.0% 326 8.4% rb_newobj 89 2.3% 64.3% 89 2.3% _setjmp 77 2.0% 66.3% 400 10.3% str_new 67 1.7% 68.0% 357 9.2% dvar_asgn_internal 63 1.6% 69.6% 204 5.3% malloc 62 1.6% 71.2% 3820 98.4% rb_str_each_char 58 1.5% 72.7% 187 4.8% rb_ary_store 55 1.4% 74.1% 55 1.4% rb_memcmp 55 1.4% 75.5% 3883 100.0% rb_yield # rest snipped for brevity

正如您所看到的， rb_yield_0占据了活动的三分之一以上，因此即使您可以优化其他所有内容，您仍然会比使用String#count rb_yield_0更慢。

您还可以通过执行基准测试来确认这一点，您只需要创建一个不执行任何操作的块。

 require 'pp' require 'benchmark' def create_genome "gattaca" * 100 end ZILLION = 100000 def mainly_execute_a_trivial_block(sequence) ZILLION.times do sequence.length.times() {} end end def execute_an_empty_loop_instead(sequence) ZILLION.times do i = 0 max = sequence.length until i == max i += 1 end end end sequence = create_genome puts RUBY_VERSION Benchmark.bm do |benchmark| benchmark.report do pp mainly_execute_a_trivial_block(sequence) end benchmark.report do pp execute_an_empty_loop_instead(sequence) end end

给

  user system total real 5.700000 0.000000 5.700000 ( 5.727715) # mainly_execute_a_trivial_block 1.920000 0.000000 1.920000 ( 1.942096) # execute_an_empty_loop_instead

为什么使用String＃count比使用Ruby中的String #chars更快地计数字母？

安德鲁给你的评论，

这是一个完全复制和粘贴失败。

多个线程调用相同的函数

使用Ruby脚本通过https登录网站

实际上，bluepill没有检测到进程已成功启动，因此创建了新进程

ruby强烈还是弱类型？

Ruby on Rails：控制台/ .rb文件中的行为之间的差异

安装debugger-linecache（1.1.1）时发生错误，Bundler无法继续

使用windows或linux开始使用Ruby On Rails？

从带有ruby数字的数组生成图形路径

Ruby On Rails Active Admin has_many更改下拉列表以使用不同的列

是否可以根据使用的标签（或不使用）在Cucumber步骤中执行不同的操作？