File.readlines: invalid byte sequence in UTF-8 (ArgumentError)

I'm processing a file that contains data from the web, and on some log files I run into the error "invalid byte sequence in UTF-8 (ArgumentError)".

    require 'csv'  # provides String#parse_csv

    a = File.readlines('log.csv').grep(/watch\?v=/).map do |s|
      s = s.parse_csv
      { timestamp: s[0], url: s[1], ip: s[3] }
    end
    puts a

I'd like to get this solution working. I've seen people doing

    .encode!('UTF-8', 'UTF-8', :invalid => :replace)

but it doesn't appear to work with File.readlines.

    File.readlines('log.csv').encode!('UTF-8', 'UTF-8', :invalid => :replace).grep(/watch\?v=/)

    undefined method `encode!' for #<Array:...> (NoMethodError)

What is the most straightforward way to filter or convert invalid UTF-8 characters while reading a file?

~~Attempt 1~~

I tried this, but it failed with the same invalid byte sequence error.

    IO.foreach('test.csv', 'r:bom|UTF-8').grep(/watch\?v=/).map do |s|
      # extract three columns: time stamp, url, ip
      s = s.parse_csv
      { timestamp: s[0], url: s[1], ip: s[3] }
    end

Solution

This seems to work for me.

    a = File.readlines('log.csv', :encoding => 'ISO-8859-1').grep(/watch\?v=/).map do |s|
      s = s.parse_csv
      { timestamp: s[0], url: s[1], ip: s[3] }
    end
    puts a

Does Ruby provide a way to do File.read() with a specified encoding?

I'd like to get this solution working. I've seen people doing

    .encode!('UTF-8', 'UTF-8', :invalid => :replace)

but it doesn't appear to work with File.readlines.

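File.readlines returns an Array. Arrays don't have an encode method. Strings, on the other hand, do have an encode method, so the call has to be applied to each line rather than to the array as a whole.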
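To illustrate the point about arrays versus strings, here is a minimal sketch that cleans each line before filtering. It swaps in String#scrub (available from Ruby 2.1) rather than the encode! call from the question, and the helper name watch_rows and the sample data are assumptions made up for the example.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper (not from the original post): read every line, then
# scrub each String so invalid UTF-8 bytes become U+FFFD before grep runs.
def watch_rows(path)
  File.readlines(path, encoding: 'UTF-8')
      .map(&:scrub)                 # String#scrub requires Ruby 2.1+
      .grep(/watch\?v=/)
      .map do |line|
        cols = line.parse_csv
        { timestamp: cols[0], url: cols[1], ip: cols[3] }
      end
end

# Demo with a throwaway file containing one stray 0xFF byte (made-up data).
Tempfile.create(['log', '.csv']) do |f|
  f.binmode
  f.write("2013-01-01,http://example.com/watch?v=abc,x\xFF,10.0.0.1\n".b)
  f.write("2013-01-02,http://example.com/other,y,10.0.0.2\n".b)
  f.flush
  puts watch_rows(f.path)
end
```

The scrub call runs on each String in the array, which is why it succeeds where calling encode! on the array itself raises NoMethodError.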

Can you provide an example of the alternative above?

    require 'csv'

    CSV.foreach("log.csv", encoding: "utf-8") do |row|
      md = row[0].match(/watch\?v=/)
      puts row[0], row[1], row[3] if md
    end

Or,

    CSV.foreach("log.csv", 'rb:utf-8') do |row|

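If you need more speed on Ruby 1.8, use the fastercsv gem. From Ruby 1.9 on, the standard csv library is FasterCSV, so no extra gem is needed.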

This seems to work for me.

    File.readlines('log.csv', :encoding => 'ISO-8859-1')

Yes, in order to read a file, you have to know its encoding.
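As a sketch of what knowing the encoding buys you: the encoding option also accepts an "external:internal" pair, so Ruby can transcode to UTF-8 while reading. The example assumes the file really is ISO-8859-1 on disk; the helper name read_latin1_as_utf8 and the sample data are hypothetical. Note that ISO-8859-1 assigns a character to every byte, so reading with it never raises, but if the file is actually UTF-8 with a few bad bytes, its multi-byte characters will come out garbled.

```ruby
require 'tempfile'

# Hypothetical helper: the file is assumed to be ISO-8859-1 on disk.
# "ISO-8859-1:UTF-8" means external:internal, so Ruby transcodes each
# line to UTF-8 while reading.
def read_latin1_as_utf8(path)
  File.readlines(path, encoding: 'ISO-8859-1:UTF-8')
end

# Demo with a throwaway file (made-up data).
Tempfile.create('latin1') do |f|
  f.binmode
  f.write("caf\xE9,watch?v=1\n".b)   # 0xE9 is "é" in ISO-8859-1
  f.flush
  lines = read_latin1_as_utf8(f.path)
  puts lines.first.encoding          # encoding of the strings handed back
  puts lines.grep(/watch\?v=/).size  # grep no longer raises ArgumentError
end
```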

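In my case, the script defaults to US-ASCII, and I'm not allowed to change that on the server, to avoid other conflicts.

I did

    File.readlines(email, :encoding => 'UTF-8').each do |line|

but this did not work for some Japanese characters, so I added this on the next line, and it worked fine.

    line = line.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')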
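One thing worth checking before adopting the binary-to-UTF-8 conversion: because every non-ASCII byte is undefined when converting from binary, it strips valid multi-byte characters as well as broken ones. The sketch below compares it with String#scrub (Ruby 2.1+), which is a swapped-in alternative and not part of the original answer; the sample string is made up.

```ruby
# Sample line: valid Japanese text followed by one stray, invalid byte.
line = ("日本語".b + "\xFF ok".b).force_encoding(Encoding::UTF_8)

# The approach from the answer: treating the bytes as binary makes every
# non-ASCII byte "undefined" in UTF-8, so all of them are replaced with ''.
via_binary = line.encode('UTF-8', 'binary',
                         invalid: :replace, undef: :replace, replace: '')

# Alternative (Ruby 2.1+): String#scrub removes only the invalid bytes
# and leaves the valid Japanese characters in place.
via_scrub = line.scrub('')

puts via_binary
puts via_scrub
```

If the Japanese text needs to survive, the scrub variant is probably the better fit; if plain ASCII output is all that matters, the original one-liner does the job.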
