How do I force one field in Ruby's CSV output to be wrapped in double quotes?

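I'm generating some CSV output using Ruby's built-in CSV. Everything works fine, but the customer wants the name field in the output to be wrapped in double quotes, so the output looks like the input file. For instance, the input looks something like this: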

    1,1.1.1.1,"Firstname Lastname",more,fields
    2,2.2.2.2,"Firstname Lastname, Jr.",more,fields

CSV's output, which is correct, looks like this:

    1,1.1.1.1,Firstname Lastname,more,fields
    2,2.2.2.2,"Firstname Lastname, Jr.",more,fields

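I know CSV is doing the right thing: it does not quote the third field merely because it has embedded blanks, and it does wrap the field in double quotes when it has an embedded comma. What I'd like to do, to help the customer feel warm and fuzzy, is tell CSV to always double-quote the third field.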

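I tried wrapping the field in double quotes in my to_a method, which passes a "Firstname Lastname" field to CSV, but CSV laughed at my puny-human attempt and output """Firstname Lastname""". That is the correct thing to do, because it is escaping the double quotes, so that didn't work.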
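The quoting behaviour described so far can be reproduced with a minimal sketch. It assumes CSV.generate_line in place of the to_a and open calls mentioned above, which are not shown in this post; the sample values are the rows from the input example.

```ruby
require 'csv'

# A field that only contains embedded blanks is left unquoted.
puts CSV.generate_line([1, '1.1.1.1', 'Firstname Lastname', 'more', 'fields'])

# A field with an embedded comma is wrapped in double quotes.
puts CSV.generate_line([2, '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields'])

# A field that already carries literal quotes gets quoted again,
# and each embedded quote is doubled, hence the triple quotes.
puts CSV.generate_line([1, '1.1.1.1', '"Firstname Lastname"', 'more', 'fields'])
```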

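Then I tried setting CSV's :force_quotes => true in the open method, which wrapped every field in double quotes, as expected. The customer didn't like that, and neither did I. So that didn't work either.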

I've looked through the Table and Row docs, and nothing appeared to give me access to the "generate a String field" method, or a way to set a "for field n always use quoting" flag.

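I'm about to dive into the source to see if there are some super-secret tweaks, or a way to monkey-patch CSV and bend it to my will, but I wondered whether anyone has some special knowledge or has run into this before.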

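And, yes, I know I could roll my own CSV output, but I prefer not to reinvent well-tested wheels. I'm also aware of FasterCSV; it is now part of Ruby 1.9.2, which I'm using, so explicitly using FasterCSV buys me nothing special. Also, I'm not using Rails and have no intention of rewriting this in Rails, so unless you have a cute way of implementing it with a small subset of Rails, don't bother. I'll downvote any recommendation to use one of those approaches, because it means you didn't bother to read this far.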

Well, there is a way to do it, but it isn't as clean as I'd hoped the CSV code would allow.

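I had to subclass CSV, then override its << method (the one that writes a row) and add another method, forced_quote_fields=, so that I can define which fields I want to force quoting on, plus pull two lambdas out of other methods. At least it works for what I want: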

    require 'csv'

    class MyCSV < CSV
      def <<(row)
        # make sure headers have been assigned
        if header_row? and [Array, String].include? @use_headers.class
          parse_headers  # won't read data for Array or String
          self << @headers if @write_headers
        end

        # handle CSV::Row objects and Hashes
        row = case row
              when self.class::Row then row.fields
              when Hash            then @headers.map { |header| row[header] }
              else                      row
              end

        @headers = row if header_row?
        @lineno += 1

        @do_quote ||= lambda do |field|
          field         = String(field)
          encoded_quote = @quote_char.encode(field.encoding)
          encoded_quote + field.gsub(encoded_quote, encoded_quote * 2) + encoded_quote
        end

        @quotable_chars      ||= encode_str("\r\n", @col_sep, @quote_char)
        @forced_quote_fields ||= []

        @my_quote_lambda ||= lambda do |field, index|
          if field.nil?  # represent +nil+ fields as empty unquoted fields
            ""
          else
            field = String(field)  # Stringify fields
            # represent empty fields as empty quoted fields
            if (
              field.empty? or
              field.count(@quotable_chars).nonzero? or
              @forced_quote_fields.include?(index)
            )
              @do_quote.call(field)
            else
              field  # unquoted field
            end
          end
        end

        output = row.map.with_index(&@my_quote_lambda).join(@col_sep) + @row_sep  # quote and separate
        if (
          @io.is_a?(StringIO) and
          output.encoding != raw_encoding and
          (compatible_encoding = Encoding.compatible?(@io.string, output))
        )
          @io = StringIO.new(@io.string.force_encoding(compatible_encoding))
          @io.seek(0, IO::SEEK_END)
        end
        @io << output

        self  # for chaining
      end
      alias_method :add_row, :<<
      alias_method :puts,    :<<

      def forced_quote_fields=(indexes=[])
        @forced_quote_fields = indexes
      end
    end

That's the code. Calling it:

    data = [
      %w[1 2 3],
      [ 2, 'two too',  3 ],
      [ 3, 'two, too', 3 ]
    ]

    quote_fields = [1]

    puts "Ruby version:   #{ RUBY_VERSION }"
    puts "Quoting fields: #{ quote_fields.join(', ') }", "\n"

    csv = MyCSV.generate do |_csv|
      _csv.forced_quote_fields = quote_fields
      data.each do |d|
        _csv << d
      end
    end

    puts csv

results in:

    # >> Ruby version:   1.9.2
    # >> Quoting fields: 1
    # >>
    # >> 1,"2",3
    # >> 2,"two too",3
    # >> 3,"two, too",3

This post is old, but I can't believe no one thought of this.

Why not do:

    csv = CSV.generate :quote_char => "\0" do |csv|

where \0 is a null character, then just add the quotes to each field that needs them:

    csv << [product.upc, "\"" + product.name + "\"" # ...

Then at the end you can do a

    csv.gsub!(/\0/, '')
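Put together, the fragments above might look like the following sketch. The helper name generate_with_null_quote and the sample rows are hypothetical (the product object from the fragment is not shown here), and the sketch assumes the data itself never contains a null character.

```ruby
require 'csv'

# Hypothetical helper: writes rows with "\0" as the quote character,
# adds literal double quotes around the field at quoted_index, and
# strips the null characters from the finished output.
def generate_with_null_quote(rows, quoted_index)
  csv = CSV.generate(quote_char: "\0") do |out|
    rows.each do |row|
      fields = row.dup
      fields[quoted_index] = %("#{fields[quoted_index]}")
      out << fields
    end
  end
  csv.gsub("\0", '')
end

rows = [
  [1, '1.1.1.1', 'Firstname Lastname',      'more', 'fields'],
  [2, '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields']
]
puts generate_with_null_quote(rows, 2)
```

Note the trade-off: once the null characters are stripped, CSV no longer protects the other fields, so a comma or a double quote in any field that was not wrapped by hand ends up unquoted and unescaped in the output.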

I doubt this will help the customer feel warm and fuzzy after all this time, but this seems to work:

    require 'csv'

    # prepare a lambda which converts the field with index 2
    quote_col2 = lambda do |field, fieldinfo|
      # fieldinfo has a line-, header- and index-method
      if fieldinfo.index == 2 && !field.start_with?('"') then
        '"' + field + '"'
      else
        field
      end
    end

    # specify the above lambda as one of the converters
    csv = CSV.read("test1.csv", :converters => [quote_col2])
    p csv
    # => [["aaa", "bbb", "\"ccc\"", "ddd"], ["fff", "ggg", "\"hhh\"", "iii"]]
    File.open("test1.txt", "w") { |out| csv.each { |line| out.puts line.join(",") } }
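The same idea can be tried without any files on disk. The sketch below assumes CSV.parse on an in-memory string instead of CSV.read; the helper name requote_third_column and the sample text are made up for illustration, and the converter skips non-String fields so an empty column does not raise.

```ruby
require 'csv'

# Converter that wraps the field at index 2 in literal double quotes.
QUOTE_COL2 = lambda do |field, fieldinfo|
  if fieldinfo.index == 2 && field.is_a?(String) && !field.start_with?('"')
    %("#{field}")
  else
    field
  end
end

# Hypothetical helper: parses the text, then joins each row by hand,
# because writing the rows back through CSV would escape the added quotes.
def requote_third_column(text)
  rows = CSV.parse(text, converters: [QUOTE_COL2])
  rows.map { |row| row.join(',') }.join("\n") + "\n"
end

puts requote_third_column("aaa,bbb,ccc,ddd\nfff,ggg,hhh,iii\n")
```

Because the rows are joined by hand, a comma in any other column is written without quoting, so this only suits data where the remaining columns are known to be safe.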

Short of monkey-patching or rewriting the existing CSV implementation, it doesn't look like there is a way to do this.

However, assuming you have full control over the source data, you could do the following:

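  1. Append a custom string containing a comma (one that would never occur naturally in the data) to the end of the field in question on every row; maybe something like "FORCE_COMMAS,".
  2. Generate the CSV output.
  3. Now that the CSV output has quotes around your field on every row, remove the custom string: csv.gsub!(/FORCE_COMMAS,/, "")
  4. Customer feels warm and fuzzy.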
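The steps above can be sketched roughly as follows. The helper name generate_with_sentinel, the SENTINEL constant, and the sample rows are assumptions made for illustration; the only requirement taken from the steps is that the marker contains a comma and never appears in real data.

```ruby
require 'csv'

# Marker containing a comma, so CSV is forced to quote the field it is in.
# It must never occur naturally in the data.
SENTINEL = 'FORCE_COMMAS,'

# Hypothetical helper: appends the marker to one field per row,
# generates the CSV, then removes the marker from the output.
def generate_with_sentinel(rows, quoted_index)
  csv = CSV.generate do |out|
    rows.each do |row|
      fields = row.dup
      fields[quoted_index] = "#{fields[quoted_index]}#{SENTINEL}"
      out << fields
    end
  end
  csv.gsub(SENTINEL, '')
end

rows = [
  [1, '1.1.1.1', 'Firstname Lastname',      'more', 'fields'],
  [2, '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields']
]
puts generate_with_sentinel(rows, 2)
```

Unlike replacing the quote character, this leaves CSV's normal quoting and escaping in place for every other field.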

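CSV has a force_quotes option that forces it to quote all fields (it may not have existed when you originally posted this). I realize this isn't exactly what you were asking for, but it involves no monkey-patching.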

    2.1.0 :008 > puts CSV.generate_line [1,'1.1.1.1','Firstname Lastname','more','fields']
    1,1.1.1.1,Firstname Lastname,more,fields
    2.1.0 :009 > puts CSV.generate_line [1,'1.1.1.1','Firstname Lastname','more','fields'], force_quotes: true
    "1","1.1.1.1","Firstname Lastname","more","fields"

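The drawback is that the leading integer value ends up written as a quoted string, which changes how it is treated when the file is imported into Excel.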
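One possible way around that drawback, sketched here as an assumption rather than anything built into CSV, is to generate each field on its own and pass force_quotes only for the chosen columns. The helper name generate_line_quoting is hypothetical, and the sketch assumes the default comma column separator.

```ruby
require 'csv'

# Hypothetical helper: renders every field as a one-field CSV line,
# forcing quotes only for the indexes listed in forced_indexes,
# then joins the pieces with the default column separator.
def generate_line_quoting(row, forced_indexes)
  cells = row.each_with_index.map do |field, index|
    CSV.generate_line([field], force_quotes: forced_indexes.include?(index)).chomp
  end
  cells.join(',') + "\n"
end

puts generate_line_quoting([1, '1.1.1.1', 'Firstname Lastname', 'more', 'fields'], [2])
puts generate_line_quoting([2, '2.2.2.2', 'Firstname Lastname, Jr.', 'more', 'fields'], [2])
```

Fields outside forced_indexes still go through CSV's normal rules, so the integer stays unquoted while an embedded comma is still quoted where needed.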

As @jwadsack mentioned, CSV changed in Ruby 2.1, but here is a working version of @the-tin-man's MyCSV. It is slightly modified: you now set forced_quote_fields through the options.

    MyCSV.generate(forced_quote_fields: [1]) do |_csv|...

Modified code:

    require 'csv'

    class MyCSV < CSV
      def <<(row)
        # make sure headers have been assigned
        if header_row? and [Array, String].include? @use_headers.class
          parse_headers  # won't read data for Array or String
          self << @headers if @write_headers
        end

        # handle CSV::Row objects and Hashes
        row = case row
              when self.class::Row then row.fields
              when Hash            then @headers.map { |header| row[header] }
              else                      row
              end

        @headers = row if header_row?
        @lineno += 1

        output = row.map.with_index(&@quote).join(@col_sep) + @row_sep  # quote and separate
        if @io.is_a?(StringIO) and
           output.encoding != (encoding = raw_encoding)
          if @force_encoding
            output = output.encode(encoding)
          elsif (compatible_encoding = Encoding.compatible?(@io.string, output))
            @io.set_encoding(compatible_encoding)
            @io.seek(0, IO::SEEK_END)
          end
        end
        @io << output

        self  # for chaining
      end

      def init_separators(options)
        # store the selected separators
        @col_sep    = options.delete(:col_sep).to_s.encode(@encoding)
        @row_sep    = options.delete(:row_sep)  # encode after resolving :auto
        @quote_char = options.delete(:quote_char).to_s.encode(@encoding)
        @forced_quote_fields = options.delete(:forced_quote_fields) || []

        if @quote_char.length != 1
          raise ArgumentError, ":quote_char has to be a single character String"
        end

        #
        # automatically discover row separator when requested
        # (not fully encoding safe)
        #
        if @row_sep == :auto
          if [ARGF, STDIN, STDOUT, STDERR].include?(@io) or
             (defined?(Zlib) and @io.class == Zlib::GzipWriter)
            @row_sep = $INPUT_RECORD_SEPARATOR
          else
            begin
              #
              # remember where we were (pos() will raise an exception if @io is pipe
              # or not opened for reading)
              #
              saved_pos = @io.pos
              while @row_sep == :auto
                #
                # if we run out of data, it's probably a single line
                # (ensure will set default value)
                #
                break unless sample = @io.gets(nil, 1024)
                # extend sample if we're unsure of the line ending
                if sample.end_with? encode_str("\r")
                  sample << (@io.gets(nil, 1) || "")
                end

                # try to find a standard separator
                if sample =~ encode_re("\r\n?|\n")
                  @row_sep = $&
                  break
                end
              end

              # tricky seek() clone to work around GzipReader's lack of seek()
              @io.rewind
              # reset back to the remembered position
              while saved_pos > 1024  # avoid loading a lot of data into memory
                @io.read(1024)
                saved_pos -= 1024
              end
              @io.read(saved_pos) if saved_pos.nonzero?
            rescue IOError         # not opened for reading
              # do nothing: ensure will set default
            rescue NoMethodError   # Zlib::GzipWriter doesn't have some IO methods
              # do nothing: ensure will set default
            rescue SystemCallError # pipe
              # do nothing: ensure will set default
            ensure
              #
              # set default if we failed to detect
              # (stream not opened for reading, a pipe, or a single line of data)
              #
              @row_sep = $INPUT_RECORD_SEPARATOR if @row_sep == :auto
            end
          end
        end
        @row_sep = @row_sep.to_s.encode(@encoding)

        # establish quoting rules
        @force_quotes = options.delete(:force_quotes)
        do_quote = lambda do |field|
          field         = String(field)
          encoded_quote = @quote_char.encode(field.encoding)
          encoded_quote + field.gsub(encoded_quote, encoded_quote * 2) + encoded_quote
        end
        quotable_chars = encode_str("\r\n", @col_sep, @quote_char)
        @quote = if @force_quotes
          do_quote
        else
          lambda do |field, index|
            if field.nil?  # represent +nil+ fields as empty unquoted fields
              ""
            else
              field = String(field)  # Stringify fields
              # represent empty fields as empty quoted fields
              if field.empty? or
                 field.count(quotable_chars).nonzero? or
                 @forced_quote_fields.include?(index)
                do_quote.call(field)
              else
                field  # unquoted field
              end
            end
          end
        end
      end
    end