Split a string in Ruby into fixed-length strings, ignoring (not considering) newline or space characters

I have a string that contains many newlines and spaces. I need to split it into fixed-length substrings. For example:

a = "This is some\nText\nThis is some text" 

Now I want to split it into strings of length 17, so the result should be:

 ["This is some\nText", "\nThis is some tex", "t"] 

Note: my string may contain any characters (whitespace, words, etc.).

 "This is some\nText\nThis is some text".scan(/.{1,17}/m) # => ["This is some\nText", "\nThis is some tex", "t"] 

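As a small illustration of why the `m` flag matters in the regex above: in Ruby, `.` does not match a newline unless the regex is in multiline mode. The snippet below is only a sketch, and the variable names `without_m` and `with_m` are made up for the example.

```ruby
text = "This is some\nText\nThis is some text"

# Without /m, "." stops at each "\n", so chunks never span lines
# and the newline characters themselves are dropped.
without_m = text.scan(/.{1,17}/)

# With /m, "." also matches "\n", so every chunk is exactly 17
# characters long (except possibly the last one).
with_m = text.scan(/.{1,17}/m)

p without_m # => ["This is some", "Text", "This is some text"]
p with_m    # => ["This is some\nText", "\nThis is some tex", "t"]
```

Joining the `with_m` chunks gives back the original string, which is not true of `without_m`.
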
Another way:

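 # Using ceil with an exclusive range avoids an empty trailing chunk
 # when a.length is an exact multiple of 17.
 (0...(a.length / 17.0).ceil).map { |i| a[i * 17, 17] } #=> ["This is some\nText", "\nThis is some tex", "t"]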

Update

And a benchmark:

 require 'benchmark'

 a = "This is some\nText\nThis is some text" * 1000
 n = 100

 Benchmark.bm do |x|
   x.report("slice") { n.times { (0..(a.length / 17)).map { |i| a[i * 17, 17] } } }
   x.report("regex") { n.times { a.scan(/.{1,17}/m) } }
   x.report("eachc") { n.times { a.each_char.each_slice(17).map(&:join) } }
 end

Results:

            user     system      total        real
 slice  0.090000   0.000000   0.090000 (  0.091065)
 regex  0.230000   0.000000   0.230000 (  0.233831)
 eachc  1.420000   0.010000   1.430000 (  1.442033)

A solution with Enumerable: split the string into single characters with each_char, then use each_slice to do the partitioning, and join the results:

 # Intermediate values are shown as arrays; each_char and each_slice
 # return Enumerators when called without a block.
 "This is some\nText\nThis is some text"
   .each_char
   # => ["T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", "\n",
   #     "T", "e", "x", "t", "\n", "T", "h", "i", "s", " ", "i", "s", " ",
   #     "s", "o", "m", "e", " ", "t", "e", "x", "t"]
   .each_slice(17)
   # => [["T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", "\n", "T", "e", "x", "t"],
   #     ["\n", "T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", " ", "t", "e", "x"],
   #     ["t"]]
   .map(&:join)
   # => ["This is some\nText", "\nThis is some tex", "t"]

Another solution: unpack.

You need to construct a format string for it, such as a17a17a17a17a8 (the last directive needs to be shorter if the string's length is not a multiple of 17).

 # Note: the "a" directive counts bytes, so this assumes single-byte characters.
 a = "This is some\nText\nThis is some text\nThis is some more text"
 a.unpack(('a17' * (a.length / 17)) + (a.size % 17 == 0 ? "" : "a#{a.length - (a.length / 17) * 17}"))
 # => ["This is some\nText", "\nThis is some tex", "t\nThis is some mo", "re text"]

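This appears to be the fastest of the suggestions so far. Of course, if the input string is huge, the unpack format string will be huge as well. In that case you would want a buffered reader that reads the input in chunks of x * 17 and does something like the above for each chunk.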
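The buffered approach could be sketched roughly as follows. This is an assumption-laden illustration rather than a tested production routine: `each_fixed_chunk`, `size` and `per_read` are hypothetical names, `StringIO` stands in for a real file or socket, and the input is assumed to be single-byte text because both `IO#read(n)` and the `a` directive count bytes.

```ruby
require 'stringio'

# Hypothetical helper (not from the answer): yields +size+-byte pieces of
# everything readable from +io+, reading size * per_read bytes at a time
# so the unpack format string stays small however large the input is.
def each_fixed_chunk(io, size: 17, per_read: 4)
  # read(n) returns nil at end of input, which ends the loop.
  while (buffer = io.read(size * per_read))
    full, rest = buffer.bytesize.divmod(size)
    directives = "a#{size}" * full
    directives += "a#{rest}" unless rest.zero? # shorter final piece
    buffer.unpack(directives).each { |piece| yield piece }
  end
end

input = StringIO.new("This is some\nText\nThis is some text\nThis is some more text")
chunks = []
each_fixed_chunk(input) { |piece| chunks << piece }
p chunks
# => ["This is some\nText", "\nThis is some tex", "t\nThis is some mo", "re text"]
```

Because every read except the last returns a multiple of `size` bytes, the pieces stay aligned across buffer boundaries. A larger `per_read` means fewer reads at the cost of a longer format string.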

 require 'benchmark'

 a = "This is some\nText\nThis is some text" * 1000
 n = 100

 Benchmark.bm do |x|
   x.report("slice ") { n.times { (0..(a.length / 17)).map { |i| a[i * 17, 17] } } }
   x.report("regex ") { n.times { a.scan(/.{1,17}/m) } }
   x.report("eachc ") { n.times { a.each_char.each_slice(17).map(&:join) } }
   x.report("unpack") { n.times { a.unpack(('a17' * (a.length / 17)) + (a.size % 17 == 0 ? "" : "a#{a.length - (a.length / 17) * 17}")) } }
 end

Results:

             user     system      total        real
 slice   0.120000   0.000000   0.120000 (  0.130709)
 regex   0.190000   0.000000   0.190000 (  0.186407)
 eachc   1.430000   0.000000   1.430000 (  1.427662)
 unpack  0.030000   0.000000   0.030000 (  0.032807)
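
Before reading too much into the timings, it may be worth confirming that the four approaches return the same chunks for the benchmark input. A minimal check, assuming an ASCII-only string (the `a` directive of unpack counts bytes, not characters); the variable names are only for this example:

```ruby
a = "This is some\nText\nThis is some text" * 1000

# The four expressions from the benchmark, each evaluated once.
by_slice  = (0..(a.length / 17)).map { |i| a[i * 17, 17] }
by_regex  = a.scan(/.{1,17}/m)
by_eachc  = a.each_char.each_slice(17).map(&:join)
by_unpack = a.unpack(('a17' * (a.length / 17)) +
                     (a.size % 17 == 0 ? '' : "a#{a.length % 17}"))

all_equal = [by_slice, by_regex, by_eachc].all? { |chunks| chunks == by_unpack }
p all_equal # => true
```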