How can I get word frequencies in Ruby in an efficient way?

Sample input:

"I was 09809 home -- Yes! yes! You was" 

Expected output:

 { 'yes' => 2, 'was' => 2, 'i' => 1, 'home' => 1, 'you' => 1 } 

My code doesn't work:

    def get_words_f(myStr)
      myStr = myStr.downcase.scan(/\w/).to_s
      h = Hash.new(0)
      myStr.split.each do |w|
        h[w] += 1
      end
      return h.to_a
    end

    print get_words_f('I was 09809 home -- Yes! yes! You was')

This works, but I am also fairly new to Ruby. There may be a better solution.

    def count_words(string)
      words = string.split(' ')
      frequency = Hash.new(0)
      words.each { |word| frequency[word.downcase] += 1 }
      return frequency
    end

Instead of .split(' '), you could also use .scan(/\w+/); however, .scan(/\w+/) would split "aren't" into "aren" and "t", while .split(' ') won't.
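To make the trade-off concrete, here is a small sketch comparing the two tokenizers on a made-up sentence. The helper names words_by_split and words_by_scan are hypothetical and exist only for this comparison.

```ruby
# Hypothetical helpers, used only to compare the two approaches.
def words_by_split(text)
  # Breaks on whitespace only, so punctuation stays attached to each word.
  text.downcase.split(' ')
end

def words_by_scan(text)
  # Keeps only runs of word characters, so the apostrophe splits "aren't".
  text.downcase.scan(/\w+/)
end

sample = "Yes! yes, aren't you"
words_by_split(sample)  # => ["yes!", "yes,", "aren't", "you"]
words_by_scan(sample)   # => ["yes", "yes", "aren", "t", "you"]
```

If contractions matter, a pattern such as /[\w']+/ is one possible middle ground, since it drops most punctuation but keeps apostrophes inside words.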

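Example output (this result corresponds to the .scan(/\w+/) variant; with .split(' '), punctuation stays attached to the words, e.g. "yes!"):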

    print count_words('I was 09809 home -- Yes! yes! You was')
    # {"i"=>1, "was"=>2, "09809"=>1, "home"=>1, "yes"=>2, "you"=>1}

First variant:

    def count_words(string)
      string.scan(/\w+/).reduce(Hash.new(0)) { |res, w| res[w.downcase] += 1; res }
    end

Second variant:

    def count_words(string)
      string.scan(/\w+/).each_with_object(Hash.new(0)) { |w, h| h[w.downcase] += 1 }
    end

A version that counts letters only (so numbers such as 09809 are skipped), using group_by:

    def count_words(string)
      Hash[
        string.scan(/[a-zA-Z]+/)
              .group_by { |word| word.downcase }
              .map { |word, words| [word, words.size] }
      ]
    end

    puts count_words 'I was 09809 home -- Yes! yes! You was'
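On Ruby 2.7 or newer, the group-and-count step above can probably be written more compactly with Enumerable#tally. This is only a sketch under that version assumption, and the method name count_words_tally is made up for illustration.

```ruby
# Assumes Ruby 2.7+ (Enumerable#tally is not available in older versions).
def count_words_tally(string)
  string.scan(/[a-zA-Z]+/)   # letters only, so "09809" is skipped
        .map(&:downcase)     # fold case before counting
        .tally               # builds { word => occurrences }
end

count_words_tally('I was 09809 home -- Yes! yes! You was')
# => {"i"=>1, "was"=>2, "home"=>1, "yes"=>2, "you"=>1}
```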

This code asks you for input and then prints the word frequencies, most frequent first:

    puts "enter some text man"
    text = gets.chomp
    words = text.split(" ")
    frequencies = Hash.new(0)
    words.each { |word| frequencies[word.downcase] += 1 }
    frequencies = frequencies.sort_by { |a, b| b }
    frequencies.reverse!
    frequencies.each do |word, frequency|
      puts word + " " + frequency.to_s
    end
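The sort-then-reverse step above can also be done in a single sort_by call. A minimal sketch, assuming you want the most frequent words first with ties broken alphabetically; the tie-break and the method name sorted_frequencies are assumptions, not part of the original answer.

```ruby
# Hypothetical helper: returns [word, count] pairs, most frequent first.
def sorted_frequencies(text)
  counts = Hash.new(0)
  text.split(' ').each { |word| counts[word.downcase] += 1 }
  # Negating the count sorts in descending order; the word breaks ties.
  counts.sort_by { |word, count| [-count, word] }
end

sorted_frequencies('b a b c a b')
# => [["b", 3], ["a", 2], ["c", 1]]
```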

This works, and it ignores numbers:

    def get_words(my_str)
      my_str = my_str.scan(/\w+/)
      h = Hash.new(0)
      my_str.each do |s|
        s = s.downcase
        # Skip tokens that look like numbers
        if s !~ /^[0-9]*\.?[0-9]+$/
          h[s] += 1
        end
      end
      return h
    end

    print get_words('I was there 1000 !')
    puts "\n"  # double quotes, so the escape is interpreted as a newline

You can take a look at code that splits text into words. The basic usage is as follows:

    sentence = "Ala ma kota za 5zł i 10$."
    splitter = SRX::Polish::WordSplitter.new(sentence)
    histogram = Hash.new(0)
    splitter.each do |word, type|
      histogram[word.downcase] += 1 if type == :word
    end
    p histogram

If you want to work with languages other than English, be careful: in Ruby 1.9, downcase does not work as you might expect for letters such as 'Ł'.
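For what it is worth, Ruby 2.4 and later apply Unicode case mapping in String#downcase, so this caveat mostly concerns older versions. Note also that \w in Ruby regular expressions matches only ASCII word characters, so a Unicode-aware pattern such as \p{L} is needed to keep letters like 'Ł' inside a word. A minimal sketch, assuming Ruby 2.4+ and UTF-8 input; the method name count_unicode_words is hypothetical.

```ruby
# Assumes Ruby 2.4+ (Unicode-aware downcase) and UTF-8 encoded input.
def count_unicode_words(text)
  # \p{L}+ matches runs of letters in any script, unlike the ASCII-only \w.
  text.scan(/\p{L}+/).each_with_object(Hash.new(0)) do |word, counts|
    counts[word.downcase] += 1
  end
end

count_unicode_words('Łódź łódź ŁÓDŹ kota')
# => {"łódź"=>3, "kota"=>1}
```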

    class String
      def frequency
        self.scan(/[a-zA-Z]+/).each.with_object(Hash.new(0)) do |word, hash|
          hash[word.downcase] += 1
        end
      end
    end

    puts "I was 09809 home -- Yes! yes! You was".frequency