How to process items in an array in parallel using Ruby (and open-uri)

How can I open multiple concurrent connections with open-uri? I think I need to use threads or fibers, but I'm not sure.

Example code:

    require 'open-uri'
    require 'nokogiri'

    def get_doc(url)
      # URI.open is used because a bare open(url) no longer handles URLs in Ruby 3.0+
      Nokogiri::HTML(URI.open(url).read)
    rescue StandardError => ex
      puts "Failed at #{Time.now}"
      puts "Error: #{ex}"
    end

    array_of_urls_to_process = [......]

    # How can I iterate over items in the array in parallel (instead of one at a time?)
    array_of_urls_to_process.each do |url|
      x = get_doc(url)
      do_something(x)
    end

There is also a gem called Parallel, which is similar to Peach but is actively maintained.

I hope this gives you an idea:

    def do_something(url, secs)
      sleep secs # just to see a difference
      puts "Done with: #{url}"
    end

    threads = []
    urls_ary = ['url1', 'url2', 'url3']

    urls_ary.each_with_index do |url, i|
      # Each URL gets its own thread; the loop does not wait for it to finish
      threads << Thread.new { do_something(url, i + 1) }
      puts "Out of loop #{i + 1}"
    end

    # Wait for all threads before the program exits
    threads.each { |t| t.join }
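If you also need what each thread produced (for example the parsed document), `Thread#value` joins the thread and returns the result of its block. Here is a minimal sketch of that idea. `fake_fetch` and `fetch_all` are hypothetical names used only for illustration, and `fake_fetch` sleeps instead of making a real request:

```ruby
# Hypothetical stand-in for get_doc(url): sleeps instead of doing network I/O.
def fake_fetch(url)
  sleep 0.1
  "body of #{url}"
end

# Hypothetical helper: starts one thread per URL and collects the results.
def fetch_all(urls)
  threads = urls.map { |url| Thread.new { fake_fetch(url) } }
  # Thread#value waits for the thread and returns its block's return value,
  # so the results come back in the same order as the input array.
  threads.map(&:value)
end

fetch_all(['url1', 'url2', 'url3']).each { |body| puts body }
```

Because the threads sleep concurrently, the whole call should take roughly as long as one fetch rather than three.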

Perhaps add a method to Array, like:

    class Array
      def thread_each(&block)
        # Start one thread per element, then wait for all of them
        inject([]) { |threads, e| threads << Thread.new { yield(e) } }.each { |t| t.join }
      end
    end

    [1, 2, 3].thread_each do |i|
      sleep 4 - i # so first one ends later
      puts "Done with #{i}"
    end

The same idea as a module that can be mixed into an individual enumerable object instead of patching Array:

    module MultithreadedEach
      def multithreaded_each
        each_with_object([]) do |item, threads|
          threads << Thread.new { yield item }
        end.each { |thread| thread.join }
        self
      end
    end

Usage:

    arr = [1, 2, 3]
    arr.extend(MultithreadedEach)
    arr.multithreaded_each do |n|
      puts n # Each block runs in its own thread
    end

A simple approach using threads:

    threads = []
    [1, 2, 3].each do |i|
      threads << Thread.new { puts i }
    end
    threads.each(&:join)
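Starting one thread per item can open a lot of connections at once when the URL array is long. As a rough sketch, you can cap concurrency with a fixed number of worker threads that pull work from a `Queue`. `pooled_map` and its `workers:` parameter are hypothetical names, not part of any library, and the block passed in the usage example stands in for the real fetch:

```ruby
# Hypothetical helper: runs the block for every item using at most
# `workers` threads and returns the results in the original order.
def pooled_map(items, workers: 4)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  queue.close # once closed and drained, Queue#pop returns nil instead of blocking

  results = Array.new(items.size)

  threads = Array.new(workers) do
    Thread.new do
      # Each worker keeps taking [item, index] pairs until the queue is empty
      while (pair = queue.pop)
        item, index = pair
        results[index] = yield(item)
      end
    end
  end

  threads.each(&:join)
  results
end

# Usage: at most 2 blocks run at the same time.
upcased = pooled_map(%w[url1 url2 url3 url4 url5], workers: 2) do |url|
  sleep 0.1 # placeholder for the real network call
  url.upcase
end
puts upcased.inspect
```

Each worker writes to a distinct, preallocated slot of `results`, which is why no lock is used here; if you append to a shared array instead, guarding it with a `Mutex` is the safer choice.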

There is a gem called peach (https://rubygems.org/gems/peach) that lets you do this:

    require "peach"

    array_of_urls_to_process.peach do |url|
      do_something(get_doc(url))
    end