Ruby Tempfile: match lines and remove duplicate lines
This is a follow-up to the Stack Overflow question linked below:
ruby match string or space/tab at the beginning of a line and insert uniq lines to a file
I have these files.

./files_tmp/60416.log:

    AAAAAA555
    AAAAAA555
    BBBBBB
    CCCCC
    AAAAAA434343
    AAAAAA434343

./files_tmp/60417.log:

    AAAAAA55544
    AAAAAA55544
    BBBBBB
    CCCCC
    AAAAAA434343
    AAAAAA434343
I have this code:

    files = Dir["./files_tmp/*.log"]
    files.each do |file_name|
      puts file_name if !File.directory? file_name
      Tempfile.open do |temp|
        File.open(file_name) do |input|
          input.each_line do |line|
            if line.match(/AAAAAA/) || (line.match(/^\t/) and tabs)
              puts "found a line #{line}"
              temp.write(line.lstrip!)
            end
          end
        end
        File.open("./temp.log", "a") do |file|
          temp.rewind
          file.write(temp.readlines.uniq.join(""))
        end
      end
    end
- The output of puts "found a line #{line}" is shown below, but I expected it to print only the lines containing AAAAAA:

    ./files_tmp/60416.log
    found a line AAAAAA555
    found a line AAAAAA555
    found a line BBBBBB
    found a line CCCCC
    found a line AAAAAA434343
    found a line AAAAAA434343
    ./files_tmp/60417.log
    found a line AAAAAA55544
    found a line AAAAAA55544
    found a line BBBBBB
    found a line CCCCC
    found a line AAAAAA434343
    found a line AAAAAA434343
- In the output file ./temp.log I see duplicated lines, and not all of the lines containing AAAAAA:

    AAAAAA434343
    AAAAAA434343
I expected:

    AAAAAA555
    AAAAAA434343
    AAAAAA55544

Why does this happen?
- I use file.write(temp.readlines.uniq.join("")) instead of file.write(temp.readlines.uniq) because otherwise the result would be: ["AAAAAA434343\n"]
- It would also be great to understand the purpose of rewind. What is it for?

Thanks for your help!
You don't need to mess with Tempfile. Just collect the lines you want, then write everything to the target file in one go:
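
    result = Dir["./files_tmp/*.log"].each_with_object([]) do |file_name, lines|
      next if File.directory?(file_name) # skip directories
      # File.readlines ignores a block, so iterate with File.foreach instead
      File.foreach(file_name) do |line|
        next unless line =~ /AAAAAA/
        puts "found a line #{line}"
        stripped = line.lstrip # lstrip! returns nil when nothing was stripped
        # mutate the accumulator; `lines |= [...]` would only rebind the block variable
        lines << stripped unless lines.include?(stripped)
      end
    end
    File.write("./temp.log", result.join) # each line already ends with "\n"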
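
Two details from the question can be illustrated with a small sketch: what rewind does, and why lstrip! is risky inside write. This is only an illustration, not the asker's original setup. The sample strings, the "demo" basename and the variable names are made up, and the leading tab on one sample line is an assumption. Writing to a file leaves the read/write position at the end, so a read immediately afterwards finds nothing, and rewind moves the position back to the start. Separately, String#lstrip! returns nil when the string has no leading whitespace, and write(nil) writes an empty string, which may be why some matching lines never reached ./temp.log.

```ruby
require "tempfile"

# String#lstrip! modifies the receiver in place and returns nil when there
# was nothing to strip; String#lstrip always returns a string.
untouched = "AAAAAA555\n".dup.lstrip!       # no leading whitespace -> nil
stripped  = "\tAAAAAA434343\n".dup.lstrip!  # leading tab removed
safe      = "AAAAAA555\n".lstrip            # unchanged copy, never nil

before_rewind = nil
after_rewind  = nil

Tempfile.create("demo") do |temp|
  temp.write("AAAAAA555\n")
  temp.write("AAAAAA555\n")
  temp.write(nil)                    # writes "", i.e. nothing at all

  before_rewind = temp.readlines     # position is at EOF, nothing to read
  temp.rewind                        # move the position back to the start
  after_rewind = temp.readlines.uniq # now the written lines are visible
end

p untouched, stripped, safe, before_rewind, after_rewind
```

Note also that the question's code calls uniq once per input file and then appends to ./temp.log, so a line that occurs in both input files is still written twice. Collecting all lines first and deduplicating once avoids that.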