Speeding up CSV import

I want to import a large amount of CSV data (not directly into ActiveRecord, but after some extraction), and my code is very slow.

```ruby
def csv_import
  require 'csv'
  path = "#{Rails.public_path}/uploads/shate.csv"
  csv = CSV.open(path, "r:ISO-8859-15:UTF-8", col_sep: ";", row_sep: :auto, headers: :first_row)
  csv.each do |row|
    # Column 0 holds "<part number>_<supplier>"
    ename, esupp = row[0].to_s.split(/_/)
    eprice = row[6]
    eqnt = row[1]
    next unless ename.present? && ename.size > 3

    search_condition = "*" + ename.upcase + "*"
    if esupp.present?
      supplier = Supplier.where("SUP_BRAND like ?", "%#{esupp}%").first
      logger.warn("!!! *** supp !!!")
    end

    if supplier.present?
      search = ArtLookup.where('MATCH (ARL_SEARCH_NUMBER) AGAINST(? IN BOOLEAN MODE)',
                               search_condition.gsub(/[^0-9A-Za-z]/, ''))
      articles = Article.where(ART_ID: search.map(&:ARL_ART_ID))
      wanted = ename.gsub(/[^0-9A-Za-z]/, '')
      art_concret = articles.to_a.select { |item| item.ART_ARTICLE_NR.gsub(/[^0-9A-Za-z]/, '').include?(wanted) }
      # Local variable instead of @aa/@art: an instance variable keeps its value
      # across rows, so a row without a match re-saved the previous row's article.
      # art is already an Article record, so no second find_by_ART_ID is needed.
      art = art_concret.find { |item| item['ART_SUP_ID'] == supplier.SUP_ID }
      if art.present?
        art.PRICEM = eprice
        art.QUANTITYM = eqnt
        art.datetime_of_update = DateTime.now
        art.save
      end
    end
    logger.warn("------------------------------")
  end
end
```

Even when I strip everything out and only read the rows, it is still slow:

```ruby
def csv_import
  require 'csv'
  csv = CSV.open("#{Rails.public_path}/uploads/shate.csv", "r:ISO-8859-15:UTF-8",
                 col_sep: ";", row_sep: :auto, headers: :first_row)
  csv.each do |row|
  end
end
```

Can anyone help me speed this up with FasterCSV?

I don't think it will get much faster.

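That said, some testing shows that a good share of the time is spent on transcoding (about 15% in my test case). So if you can skip it (for example, by creating the CSV in UTF-8 in the first place), you should see some improvement.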
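If the same file is imported more than once, one way to avoid transcoding on every run is to convert it to UTF-8 once and then parse the converted copy without any encoding conversion. A minimal sketch using only the standard library; the method names `convert_to_utf8` and `read_utf8_rows` are hypothetical, not part of the answer:

```ruby
require 'csv'

# Stream-convert an ISO-8859-15 file to UTF-8 line by line, so the whole
# file never has to be held in memory at once.
def convert_to_utf8(src, dst)
  File.open(src, 'r:ISO-8859-15:UTF-8') do |input|
    File.open(dst, 'w:UTF-8') do |output|
      input.each_line { |line| output.write(line) }
    end
  end
  dst
end

# Parse a file that is already UTF-8: no transcoding happens while reading.
# Returns one Hash per data row, keyed by the header names.
def read_utf8_rows(path)
  CSV.foreach(path, encoding: 'UTF-8', col_sep: ';', headers: :first_row).map(&:to_h)
end
```

After the one-off conversion, the import can read the converted file and drop the `ISO-8859-15:UTF-8` option.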

Also, according to ruby-doc.org, the "primary" interface for reading CSV is foreach, so that should be preferred:

```ruby
def csv_import
  require 'csv'
  CSV.foreach("#{Rails.public_path}/uploads/shate.csv",
              encoding: 'ISO-8859-15:UTF-8', col_sep: ';', row_sep: :auto, headers: :first_row) do |row|
    # use row here...
  end
end
```
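A self-contained sketch of the same `foreach` loop outside Rails, which makes it easy to time the parsing on its own. The method name `each_extracted_row` is hypothetical; the column positions and the length check are taken from the question's code:

```ruby
require 'csv'

# Stream the file with CSV.foreach (one row in memory at a time) and yield
# the fields the question's loop extracts: part number and supplier from
# column 0 (split on "_"), quantity from column 1 and price from column 6.
# Rows whose part number is 3 characters or shorter are skipped, as in the
# question; esupp is nil when column 0 contains no "_".
def each_extracted_row(path, encoding: 'ISO-8859-15:UTF-8')
  CSV.foreach(path, encoding: encoding, col_sep: ';', row_sep: :auto, headers: :first_row) do |row|
    ename, esupp = row[0].to_s.split('_')
    next unless ename && ename.size > 3

    yield ename, esupp, row[1], row[6]
  end
end
```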

Update

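You can also try splitting the parsing across multiple threads. I saw some performance gain experimenting with this code (handling of the header row is omitted):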

```ruby
require 'csv'

N = 10000

def csv_import
  # read the whole file, transcoding to UTF-8 while reading
  all_lines = File.read("#{Rails.public_path}/uploads/shate.csv", encoding: 'ISO-8859-15:UTF-8').lines
  # parts will contain the parsed CSV data of the different chunks/slices
  # threads will contain the threads
  parts, threads = [], []
  # iterate over chunks/slices of N lines of the CSV file
  all_lines.each_slice(N) do |plines|
    # add an array object for the current chunk to parts
    result = []
    parts << result
    # create a thread for parsing the current chunk, hand it over the chunk
    # and the current parts sub-array
    threads << Thread.new(plines.join, result) do |tsrc, tresult|
      # parse the chunk
      parsed = CSV.parse(tsrc, col_sep: ';', row_sep: :auto)
      # add the parsed data to the parts sub-array
      tresult.replace(parsed)
    end
  end
  # wait for all threads to finish
  threads.each(&:join)
  # merge all the parts sub-arrays into one big array and iterate over it
  parts.flatten(1).each do |row|
    # use row (Array)
  end
end
```

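This splits the input into chunks of 10000 lines and creates a parsing thread for each chunk. Each thread is handed a sub-array of the array parts in which to store its result. When all threads have finished (after threads.each(&:join)), the results of all chunks in parts are merged, and that's it.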
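A more compact variant of the same chunking idea lets each thread return its parsed chunk and collects the results with `Thread#value`, which joins the thread and returns its block's value, so the chunks come back in file order. This sketch also sets the header row aside and, like the code above, assumes no quoted field spans a line break. On MRI (CRuby) the global VM lock lets only one thread run Ruby code at a time, so any speedup from threaded parsing may be small; measure it on your own data. `parse_in_threads` is a hypothetical name:

```ruby
require 'csv'

# Split the file into chunks of chunk_size lines, parse each chunk in its
# own thread and return [header, rows] with rows in the original order.
# Assumes no quoted field contains a line break, since chunks are cut on lines.
def parse_in_threads(path, chunk_size: 10_000, encoding: 'ISO-8859-15:UTF-8', col_sep: ';')
  lines = File.readlines(path, encoding: encoding)
  return [nil, []] if lines.empty?

  header = CSV.parse_line(lines.shift, col_sep: col_sep)
  threads = lines.each_slice(chunk_size).map do |chunk|
    # The block's value (the parsed chunk) becomes the thread's value.
    Thread.new(chunk.join) { |src| CSV.parse(src, col_sep: col_sep) }
  end
  # Thread#value waits for each thread; flat_map concatenates the chunks.
  [header, threads.flat_map(&:value)]
end
```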

As its name implies, FasterCSV is faster :)

http://fastercsv.rubyforge.org

See also the following for more information:

Ruby on Rails: Moving from CSV to FasterCSV
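On Ruby 1.9 and later, the standard-library CSV class is the FasterCSV code merged into core Ruby, and the question's `"r:ISO-8859-15:UTF-8"` mode string already requires 1.9+, so `require 'csv'` there is likely already FasterCSV; the separate fastercsv gem only matters on Ruby 1.8. A common compatibility shim for code that has to run on both (a sketch, assuming the fastercsv gem is installed on 1.8):

```ruby
# Ruby 1.8: FasterCSV is a separate gem, so alias it to the CSV constant.
# Ruby 1.9+: the standard csv library already is FasterCSV, so just require it.
if RUBY_VERSION < '1.9'
  require 'rubygems'
  require 'fastercsv'
  CSV = FasterCSV unless defined?(CSV)
else
  require 'csv'
end
```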

I'm curious how big the file is and how many columns it has.

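Using CSV.foreach is the preferred way. It would be interesting to look at the memory profile while the application runs. (Sometimes slowness is caused by printing, so make sure you aren't logging more than you need.)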

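You might be able to preprocess the file and drop any rows without an esupp, since your code appears to care only about those rows. You could also truncate any right-hand columns you don't need.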
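A sketch of such a pre-processing pass, assuming the question's layout (`<part>_<supplier>` in column 0, nothing used beyond column 6); `prefilter` and its `keep:` parameter are hypothetical names. It writes a smaller UTF-8 file that the real import can then read:

```ruby
require 'csv'

# Copy the header row plus only those data rows whose first column has a
# supplier part after "_", keeping just the leftmost `keep` columns.
# keep: 7 matches the question, which reads columns 0, 1 and 6.
def prefilter(src, dst, keep: 7, encoding: 'ISO-8859-15:UTF-8', col_sep: ';')
  CSV.open(dst, 'w:UTF-8', col_sep: col_sep) do |out|
    CSV.foreach(src, encoding: encoding, col_sep: col_sep).with_index do |row, i|
      if i.zero?
        out << row.first(keep) # always keep the header row
        next
      end
      _ename, esupp = row[0].to_s.split('_')
      next if esupp.nil? || esupp.empty?

      out << row.first(keep)
    end
  end
end
```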

Another technique is to collect the unique values and put them in a hash. It looks like you fire the same query many times.
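A minimal sketch of that idea: wrap the lookup in a Hash whose default block runs the query once per distinct key and stores the result, including `nil` misses. `memoized` is a hypothetical helper name, and the block passed to it stands in for whatever query is being cached:

```ruby
# Returns a Hash that calls fetch once per distinct key and remembers the
# result. A nil result is stored too, so misses are not queried again.
def memoized(&fetch)
  Hash.new { |cache, key| cache[key] = fetch.call(key) }
end
```

In the question's loop this could be created once before `csv.each`, e.g. `suppliers = memoized { |s| Supplier.where("SUP_BRAND like ?", "%#{s}%").first }`, and then `supplier = suppliers[esupp]` inside it; the same pattern would apply to the full-text search keyed on `ename`.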

You just need to profile it and see where it spends its time.
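One low-tech way to do that without extra gems is to wrap each phase of the loop (CSV parsing, supplier lookup, article search, save) in a timer and compare the totals at the end. A sketch; `PhaseTimer` and the phase names are hypothetical:

```ruby
# Accumulates wall-clock seconds per phase, using the monotonic clock so
# the numbers are not affected by system clock adjustments.
class PhaseTimer
  attr_reader :totals

  def initialize
    @totals = Hash.new(0.0)
  end

  # Runs the block, adds its duration to the phase total and returns the
  # block's value, so it can wrap an assignment.
  def measure(phase)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    @totals[phase] += Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end

  # One line per phase, slowest first.
  def report
    @totals.sort_by { |_, secs| -secs }.map { |phase, secs| format('%-10s %.3fs', phase, secs) }
  end
end
```

For example, `timer.measure(:save) { art.save }` inside the loop, then `logger.info(timer.report.join("\n"))` once after it.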

Take a look at the smarter_csv gem! It can read a CSV file in chunks; you can then create Resque jobs to process those chunks further and insert them into the database.

https://github.com/tilo/smarter_csv
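smarter_csv's own API is documented in its README and is not reproduced here. To illustrate the chunking idea with the standard library alone, a sketch that yields arrays of up to `chunk_size` row hashes keyed by symbolized headers; `each_chunk` is a hypothetical name, and handing chunks to a background job is left to the caller:

```ruby
require 'csv'

# Yield the file as arrays of up to chunk_size Hashes keyed by symbolized
# header names, similar in spirit to smarter_csv's chunked mode. Rows are
# still streamed, so only one chunk is held in memory at a time.
def each_chunk(path, chunk_size: 1000, encoding: 'ISO-8859-15:UTF-8', col_sep: ';')
  CSV.foreach(path, encoding: encoding, col_sep: col_sep,
                    headers: true, header_converters: :symbol)
     .each_slice(chunk_size) { |rows| yield rows.map(&:to_h) }
end
```

Each chunk could then be passed to a background job (for example a Resque job, as suggested above) rather than written to the database inline.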
