Ruby unable to parse a CSV file: CSV::MalformedCSVError (Illegal quoting in line 1.)

Ubuntu 12.04 LTS

Ruby: ruby 1.9.3dev (2011-09-23 revision 33323) [i686-linux]

Rails 3.2.9

Following is the content of the CSV file I receive:

"date/time","settlement id","type","order id","sku","description","quantity","marketplace","fulfillment","order city","order state","order postal","product sales","shipping credits","gift wrap credits","promotional rebates","sales tax collected","selling fees","fba fees","other transaction fees","other","total" "Mar 1, 2013 12:03:54 AM PST","5481545091","Order","108-0938567-7009852","ALS2GL36LED","Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor","1","amazon.com","Amazon","Pasadena","CA","91104-1056","43.00","3.25","0","-3.25","0","-6.45","-3.75","0","0","32.80" 

However, when I try to parse the CSV file, I get an error:

  1.9.3dev :016 > options = { col_sep: ",", quote_char:'"' }
   => {:col_sep=>",", :quote_char=>"\""}
  1.9.3dev :022 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
  CSV::MalformedCSVError: Illegal quoting in line 1.
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
    from (irb):22
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `'

Then I tried simplifying the data, i.e.

 "name","age","email" "jignesh","30","jignesh@example.com" 

But I still get the same error:

  1.9.3dev :023 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
  CSV::MalformedCSVError: Illegal quoting in line 1.
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
    from (irb):23
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `'

I tried simplifying the data again, like this:

  name,age,email
  jignesh,30,jignesh@example.com

This works. See the output below:

  1.9.3dev :024 > CSV.foreach("/tmp/my_data.csv") { |row| puts row }
  name
  age
  email
  jignesh
  30
  jignesh@example.com
   => nil

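But I will be receiving CSV files with quoted data, so removing the quotes is not really the solution I am looking for. I cannot figure out what causes the error CSV::MalformedCSVError: Illegal quoting in line 1 in my earlier examples.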

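I have verified that there are no leading or trailing spaces in the CSV by enabling "Show whitespace characters" and "Show line endings" in my text editor. I also verified the encoding as follows.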

  1.9.3dev :026 > File.open("/tmp/my_data.csv").read.encoding
   => #

Note: I also tried CSV.read, but that method gives the same error.

Can anybody help me solve this problem and understand what is going wrong?

=====================

I just found the following post: http://www.ruby-forum.com/topic/448070 and tried the following:

  file_data = file.read
  file_data.gsub!('"', "'")
  arr_of_arrs = CSV.parse(file_data)
  arr_of_arrs.each do |arr|
    Rails.logger.debug "=======#{arr}"
  end

This produced the following output:

  =======["\xEF\xBB\xBF'date/time'", "'settlement id'", "'type'", "'order id'", "'sku'", "'description'", "'quantity'", "'marketplace'", "'fulfillment'", "'order city'", "'order state'", "'order postal'", "'product sales'", "'shipping credits'", "'gift wrap credits'", "'promotional rebates'", "'sales tax collected'", "'selling fees'", "'fba fees'", "'other transaction fees'", "'other'", "'total'"]
  =======["'Mar 1", " 2013 12:03:54 AM PST'", "'5481545091'", "'Order'", "'108-0938567-7009852'", "'ALS2GL36LED'", "'Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor'", "'1'", "'amazon.com'", "'Amazon'", "'Pasadena'", "'CA'", "'91104-1056'", "'43.00'", "'3.25'", "'0'", "'-3.25'", "'0'", "'-6.45'", "'-3.75'", "'0'", "'0'", "'32.80'"]

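This did not read the data correctly, because the default col_sep is the comma character, so the date field was split in two. So I tried using the quote_char option like this: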

  arr_of_arrs = CSV.parse(file_data, :quote_char => "'") 

But it ended up with the following error:

  CSV::MalformedCSVError (Illegal quoting in line 1.): 

Thanks, Jignesh

  # Try each candidate quote character in turn until the file parses.
  quote_chars = %w(" | ~ ^ & *)
  begin
    @report = CSV.read(csv_file, headers: :first_row, quote_char: quote_chars.shift)
  rescue CSV::MalformedCSVError
    quote_chars.empty? ? raise : retry
  end

It's not perfect, but it works most of the time.

N.B. CSV.parse takes the same options as CSV.read, so either a file or in-memory data can be used.
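As an illustration of that note, here is a minimal sketch of the same retry idea applied to in-memory data with CSV.parse. The helper name `parse_with_fallback_quotes` and the sample data are hypothetical, not something from the thread, and the candidate list simply mirrors the one above.

```ruby
require "csv"

# Hypothetical helper (not part of the CSV library): try each candidate quote
# character in turn until the data parses, re-raising if none of them work.
def parse_with_fallback_quotes(data, candidates = ['"', "|", "~", "^", "&", "*"])
  quote_chars = candidates.dup
  begin
    CSV.parse(data, headers: :first_row, quote_char: quote_chars.shift)
  rescue CSV::MalformedCSVError
    quote_chars.empty? ? raise : retry
  end
end

# Made-up sample: the stray inch mark is illegal while '"' is the quote character,
# so the helper falls through to the next candidate.
sample = %(item,qty\n12" monitor,5\n)
table = parse_with_fallback_quotes(sample)
puts table.first["item"]
```

Note that this only hides the symptom: if the file really does use `"` for quoting, parsing it with another quote character leaves the quotes in the field values.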

Anand, thank you for the encoding suggestion. It solved my illegal quoting problem.

Note: if you want the iterator to skip the header row, add headers: :first_row, like so:

 CSV.foreach("test.csv", encoding: "bom|utf-8", headers: :first_row) 
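The `\xEF\xBB\xBF` bytes in front of `'date/time'` in the question's debug output are a UTF-8 byte order mark, which would explain why the first quote is not seen at the start of the field. The following is a small sketch, using a made-up temporary file, of how one might check for the BOM and then read the file with the `bom|utf-8` encoding shown above.

```ruby
require "csv"
require "tempfile"

# Made-up sample: a UTF-8 byte order mark (U+FEFF) in front of quoted fields.
data = %(\uFEFF"name","age","email"\n"jignesh","30","jignesh@example.com"\n)

rows = []
has_bom = nil
Tempfile.create(["bom_sample", ".csv"]) do |file|
  File.binwrite(file.path, data)

  # The first three bytes of a UTF-8 file with a BOM are EF BB BF.
  has_bom = File.binread(file.path, 3) == "\xEF\xBB\xBF".b

  # "bom|utf-8" asks Ruby to skip the BOM, if present, before CSV sees the data.
  CSV.foreach(file.path, encoding: "bom|utf-8", headers: :first_row) do |row|
    rows << row.to_h
  end
end

puts has_bom
puts rows.inspect
```

Checking the raw bytes is more telling than calling `encoding` on the string, since that only reports the encoding Ruby assumed, not whether a BOM is present.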

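I just ran into a problem like this and found that CSV does not like spaces between the col_sep and the quote character. Once I removed those, everything went smoothly. So I had: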

 12, "N", 12, "Pacific/Majuro" 

But once I gsubbed the spaces out using

 .gsub(/,\s+"/, ',"')

resulting in

 12,"N", 12,"Pacific/Majuro" 

everything went smoothly.
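A short sketch of that cleanup applied to the sample line, assuming the data can be preprocessed as a string before parsing:

```ruby
require "csv"

line = %(12, "N", 12, "Pacific/Majuro")

# Drop whitespace between a comma and the opening quote that follows it.
cleaned = line.gsub(/,\s+"/, ',"')
fields = CSV.parse_line(cleaned)
puts fields.inspect
```

The unquoted third field keeps its leading space, so it may still need a `strip`. This substitution is also naive: it would rewrite the same comma, space, quote sequence if it happened to occur inside a quoted field.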

I had a problem with a trademark character, which triggered this error.

The trademark character had been converted to \"! in UTF-8, so it was the unclosed quote that threw the error. So I did this:

.gsub!("\"!", "")

Then I tried creating my CSV object, and it worked fine.
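One plausible explanation, which is an assumption and not something the poster confirmed, is that the trademark sign (U+2122) arrived as UTF-16LE bytes and was read as single-byte text: its two bytes are 0x22 and 0x21, which are `"` and `!`. A sketch with a made-up field:

```ruby
require "csv"

trademark = "\u2122"

# Assumption: the bytes were UTF-16LE but were treated as single-byte text.
bytes = trademark.encode("UTF-16LE").bytes
misread = bytes.pack("C*").force_encoding("UTF-8")

# Made-up field containing the misread sequence, cleaned as in the snippet above.
raw = %(Widget#{misread},5)
fields = CSV.parse_line(raw.gsub("\"!", ""))
puts bytes.inspect
puts fields.inspect
```

If that is what happened, fixing the encoding at the source would preserve the character, whereas the gsub simply drops it.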

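Try this tip:

  1. Open the CSV file in a text editor
  2. Select the whole file and copy it
  3. Open a new text file
  4. Paste the CSV data into the new file and save it
  5. Import the new CSV file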