Ruby Tempfile: match lines and remove duplicate lines

This is a follow-up to the Stack Overflow question linked below:

ruby match string or space/tab at the beginning of a line and insert uniq lines to a file 


I have this file, ./files_tmp/60416.log:

    AAAAAA555
    AAAAAA555
    BBBBBB
    CCCCC
    AAAAAA434343
    AAAAAA434343

./files_tmp/60417.log

    AAAAAA55544
    AAAAAA55544
    BBBBBB
    CCCCC
    AAAAAA434343
    AAAAAA434343

I have this code:

    require "tempfile"

    files = Dir["./files_tmp/*.log"]
    files.each do |file_name|
      puts file_name
      if !File.directory? file_name
        Tempfile.open do |temp|
          File.open(file_name) do |input|
            input.each_line do |line|
              if line.match(/AAAAAA/) || (line.match(/^\t/) and tabs)
                puts "found a line #{line}"
                temp.write(line.lstrip!)
              end
            end
          end
          File.open("./temp.log", "a") do |file|
            temp.rewind
            file.write(temp.readlines.uniq.join(""))
          end
        end
      end
    end
  1. The output of puts "found a line #{line}" is shown below, but I expected it to print only the lines containing AAAAAA:
    ./files_tmp/60416.log
    found a line AAAAAA555
    found a line AAAAAA555
    found a line BBBBBB
    found a line CCCCC
    found a line AAAAAA434343
    found a line AAAAAA434343
    ./files_tmp/60417.log
    found a line AAAAAA55544
    found a line AAAAAA55544
    found a line BBBBBB
    found a line CCCCC
    found a line AAAAAA434343
    found a line AAAAAA434343

  2. In the output file ./temp.log I see duplicated lines, and not all of the lines containing AAAAAA:
    AAAAAA434343
    AAAAAA434343

I expected:

    AAAAAA555
    AAAAAA434343
    AAAAAA55544

I'd like to know why.

  1. I use file.write(temp.readlines.uniq.join("")) instead of file.write(temp.readlines.uniq) because otherwise the result would be:

    ["AAAAAA434343\n"]

  2. It would be great to understand the purpose of rewind. What is it for?

Thanks for your help!

You don't need to mess with Tempfile. Just collect the lines you want, then write everything to the target file:

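    result = Dir["./files_tmp/*.log"].each_with_object([]) do |file_name, lines|
      next if File.directory?(file_name) # skip dirs
      # File.readlines ignores a block, so iterate with File.foreach instead
      File.foreach(file_name) do |line|
        next unless line =~ /AAAAAA/
        puts "found a line #{line}"
        # lstrip! returns nil when there is nothing to strip, so use lstrip
        entry = line.lstrip.chomp
        # `lines |= [...]` would only rebind the block-local variable;
        # mutate the accumulator so the result is unique across all files
        lines << entry unless lines.include?(entry)
      end
    end
    # entries were chomped above, so put one newline back after each of them
    File.write("./temp.log", result.map { |entry| "#{entry}\n" }.join)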
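
As for the rewind question: after writing to a file, the read/write position sits at the end of the file, so reading straight away returns nothing. rewind moves the position back to the beginning. Here is a minimal sketch of that behaviour; the helper name read_before_and_after_rewind and the "rewind_demo" basename are made up for illustration and are not part of the original code.

```ruby
require "tempfile"

# Hypothetical helper: writes the given lines to a temporary file, then reads
# it twice, once without rewinding and once after rewinding.
def read_before_and_after_rewind(lines)
  Tempfile.create("rewind_demo") do |temp|
    lines.each { |line| temp.write(line) }
    before = temp.readlines # position is at end of file, nothing left to read
    temp.rewind             # move the position back to the start of the file
    after = temp.readlines  # now every written line is read back
    [before, after]
  end
end

before, after = read_before_and_after_rewind(["AAAAAA555\n", "AAAAAA555\n"])
p before # => []
p after  # => ["AAAAAA555\n", "AAAAAA555\n"]
```

Without the temp.rewind call in the question's code, temp.readlines would return an empty array and nothing would be appended to ./temp.log.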