How to flatten a hash, making each key a unique value?
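I want to take a hash containing nested hashes and arrays and flatten it into a single hash with unique values. I keep trying to attack this from different angles, but then I make it far more complicated than it needs to be and lose track of what's going on.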
Example source hash:
    {
      "Name" => "Kim Kones",
      "License Number" => "54321",
      "Details" => {
        "Name" => "Kones, Kim",
        "Licenses" => [
          { "License Type" => "PT", "License Number" => "54321" },
          { "License Type" => "Temp", "License Number" => "T123" },
          { "License Type" => "AP", "License Number" => "A666", "Expiration Date" => "12/31/2020" }
        ]
      }
    }
Example desired hash:
    {
      "Name" => "Kim Kones",
      "License Number" => "54321",
      "Details_Name" => "Kones, Kim",
      "Details_Licenses_1_License Type" => "PT",
      "Details_Licenses_1_License Number" => "54321",
      "Details_Licenses_2_License Type" => "Temp",
      "Details_Licenses_2_License Number" => "T123",
      "Details_Licenses_3_License Type" => "AP",
      "Details_Licenses_3_License Number" => "A666",
      "Details_Licenses_3_Expiration Date" => "12/31/2020"
    }
For what it's worth, here is my most recent attempt before I gave up.
    def flattify(hashy)
      temp = {}
      hashy.each do |key, val|
        if val.is_a? String
          temp["#{key}"] = val
        elsif val.is_a? Hash
          temp.merge(rename val, key, "")
        elsif val.is_a? Array
          temp["#{key}"] = enumerate val, key
        else
        end
        print "=> #{temp}\n"
      end
      return temp
    end

    def rename (hashy, str, n)
      temp = {}
      hashy.each do |key, val|
        if val.is_a? String
          temp["#{key}#{n}"] = val
        elsif val.is_a? Hash
          val.each do |k, v|
            temp["#{key}_#{k}#{n}"] = v
          end
        elsif val.is_a? Array
          temp["#{key}"] = enumerate val, key
        else
        end
      end
      return flattify temp
    end

    def enumerate (ary, str)
      temp = {}
      i = 1
      ary.each do |x|
        temp["#{str}#{i}"] = x
        i += 1
      end
      return flattify temp
    end
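One likely reason the attempt above loses the nested entries: in Ruby, Hash#merge returns a new hash and leaves the receiver unchanged, so the result of temp.merge(...) in the Hash branch is simply discarded. Hash#merge! (also available as update) modifies the receiver in place. A minimal illustration, using a couple of keys from the sample data:

```ruby
# Hash#merge returns a NEW hash; the receiver itself is not modified.
temp = { "Name" => "Kim Kones" }
temp.merge({ "Details_Name" => "Kones, Kim" })
puts temp.size   # prints 1: the merged copy was thrown away

# Hash#merge! adds the entries to the receiver in place.
temp.merge!({ "Details_Name" => "Kones, Kim" })
puts temp.size   # prints 2
```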
Interesting problem!
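Theory
Here's a recursive method to parse your data.
- It keeps track of the keys and indices it has found so far.
- It appends them to the tmp array.
- Once it reaches a leaf object, it writes that object into a hash as the value, with the joined tmp as the key.
- This small hash is then merged back into the main hash, recursively.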
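A small illustration of the key-building step, assuming the path the method would accumulate on its way down to the first license's type in the sample data. Array#join calls to_s on each element, so integer indices and string keys combine directly, and tmp + [key] builds a new array, so sibling branches never share a path.

```ruby
# Keys and indices collected on the way down to one leaf value.
tmp = ["Details", "Licenses", 1, "License Type"]

# At the leaf, the joined path becomes the key of a one-entry hash.
leaf = { tmp.join('_') => "PT" }
puts tmp.join('_')   # prints Details_Licenses_1_License Type

# tmp + [x] returns a new array and leaves the original untouched,
# so each branch of the recursion gets its own independent path.
base   = ["Details", "Licenses"]
first  = base + [1]
second = base + [2]
```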
Code
    def recursive_parsing(object, tmp = [])
      case object
      when Array
        object.each.with_index(1).with_object({}) do |(element, i), result|
          result.merge! recursive_parsing(element, tmp + [i])
        end
      when Hash
        object.each_with_object({}) do |(key, value), result|
          result.merge! recursive_parsing(value, tmp + [key])
        end
      else
        { tmp.join('_') => object }
      end
    end
For example, with data set to the example source hash from the question:
    require 'pp'
    pp recursive_parsing(data)
    # {"Name"=>"Kim Kones",
    #  "License Number"=>"54321",
    #  "Details_Name"=>"Kones, Kim",
    #  "Details_Licenses_1_License Type"=>"PT",
    #  "Details_Licenses_1_License Number"=>"54321",
    #  "Details_Licenses_2_License Type"=>"Temp",
    #  "Details_Licenses_2_License Number"=>"T123",
    #  "Details_Licenses_3_License Type"=>"AP",
    #  "Details_Licenses_3_License Number"=>"A666",
    #  "Details_Licenses_3_Expiration Date"=>"12/31/2020"}
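To check the result end to end, here is a self-contained script that puts the question's sample data, the recursive_parsing method from this answer, and the desired hash in one file. Hash#== ignores insertion order, so the comparison only checks that the keys and values match.

```ruby
# recursive_parsing as given in this answer.
def recursive_parsing(object, tmp = [])
  case object
  when Array
    object.each.with_index(1).with_object({}) do |(element, i), result|
      result.merge! recursive_parsing(element, tmp + [i])
    end
  when Hash
    object.each_with_object({}) do |(key, value), result|
      result.merge! recursive_parsing(value, tmp + [key])
    end
  else
    { tmp.join('_') => object }
  end
end

# Sample input from the question.
data = {
  "Name" => "Kim Kones",
  "License Number" => "54321",
  "Details" => {
    "Name" => "Kones, Kim",
    "Licenses" => [
      { "License Type" => "PT", "License Number" => "54321" },
      { "License Type" => "Temp", "License Number" => "T123" },
      { "License Type" => "AP", "License Number" => "A666", "Expiration Date" => "12/31/2020" }
    ]
  }
}

# Desired output from the question.
expected = {
  "Name" => "Kim Kones",
  "License Number" => "54321",
  "Details_Name" => "Kones, Kim",
  "Details_Licenses_1_License Type" => "PT",
  "Details_Licenses_1_License Number" => "54321",
  "Details_Licenses_2_License Type" => "Temp",
  "Details_Licenses_2_License Number" => "T123",
  "Details_Licenses_3_License Type" => "AP",
  "Details_Licenses_3_License Number" => "A666",
  "Details_Licenses_3_Expiration Date" => "12/31/2020"
}

flat = recursive_parsing(data)
puts flat == expected   # prints true
```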
Debugging
Here's a modified version with old-school debugging output. It might help you understand what's going on:
    def recursive_parsing(object, tmp = [], indent = "")
      puts "#{indent}Parsing #{object.inspect}, with tmp=#{tmp.inspect}"
      result = case object
               when Array
                 puts "#{indent}  It's an array! Let's parse every element:"
                 object.each_with_object({}).with_index(1) do |(element, result), i|
                   result.merge! recursive_parsing(element, tmp + [i], indent + "  ")
                 end
               when Hash
                 puts "#{indent}  It's a hash! Let's parse every key,value pair:"
                 object.each_with_object({}) do |(key, value), result|
                   result.merge! recursive_parsing(value, tmp + [key], indent + "  ")
                 end
               else
                 puts "#{indent}  It's a leaf! Let's return a hash"
                 { tmp.join('_') => object }
               end
      puts "#{indent}  Returning #{result.inspect}\n"
      result
    end
When called with recursive_parsing([{a: 'foo', b: 'bar'}, {c: 'baz'}]), it displays:
    Parsing [{:a=>"foo", :b=>"bar"}, {:c=>"baz"}], with tmp=[]
      It's an array! Let's parse every element:
      Parsing {:a=>"foo", :b=>"bar"}, with tmp=[1]
        It's a hash! Let's parse every key,value pair:
        Parsing "foo", with tmp=[1, :a]
          It's a leaf! Let's return a hash
          Returning {"1_a"=>"foo"}
        Parsing "bar", with tmp=[1, :b]
          It's a leaf! Let's return a hash
          Returning {"1_b"=>"bar"}
        Returning {"1_a"=>"foo", "1_b"=>"bar"}
      Parsing {:c=>"baz"}, with tmp=[2]
        It's a hash! Let's parse every key,value pair:
        Parsing "baz", with tmp=[2, :c]
          It's a leaf! Let's return a hash
          Returning {"2_c"=>"baz"}
        Returning {"2_c"=>"baz"}
      Returning {"1_a"=>"foo", "1_b"=>"bar", "2_c"=>"baz"}
Unlike the others, I don't like each_with_object :-). But I do like passing a single result hash around, so I don't have to merge and rehash hashes over and over again.
    def flattify(value, result = {}, path = [])
      case value
      when Array
        value.each.with_index(1) do |v, i|
          flattify(v, result, path + [i])
        end
      when Hash
        value.each do |k, v|
          flattify(v, result, path + [k])
        end
      else
        result[path.join("_")] = value
      end
      result
    end
(Some details adopted from Eric, see comments)
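A short usage sketch of the accumulator-passing flattify above, on a trimmed-down slice of the sample data. One design point worth noting: Ruby evaluates a default argument such as result = {} afresh on every call, so two top-level calls do not share one result hash (unlike, say, Python's mutable defaults).

```ruby
# flattify as given in this answer: one result hash is threaded through.
def flattify(value, result = {}, path = [])
  case value
  when Array
    value.each.with_index(1) do |v, i|
      flattify(v, result, path + [i])
    end
  when Hash
    value.each do |k, v|
      flattify(v, result, path + [k])
    end
  else
    result[path.join("_")] = value
  end
  result
end

licenses = [
  { "License Type" => "PT", "License Number" => "54321" },
  { "License Type" => "Temp", "License Number" => "T123" }
]
flat = flattify({ "Licenses" => licenses })
puts flat["Licenses_2_License Number"]   # prints T123

# A second call starts from a fresh {} because defaults are re-evaluated.
other = flattify({ "Name" => "Kim Kones" })
puts other.size   # prints 1
```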
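A non-recursive approach, using BFS with an array as the queue. I keep the key-value pairs whose value isn't an array or hash, and push the contents of arrays and hashes onto the queue (with combined keys). I convert arrays to hashes (["a", "b"] ↦ {1=>"a", 2=>"b"}) because that felt neat.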
    def flattify(hash)
      (q = hash.to_a).select { |key, value|
        value = (1..value.size).zip(value).to_h if value.is_a? Array
        !value.is_a?(Hash) || !value.each { |k, v| q << ["#{key}_#{k}", v] }
      }.to_h
    end
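A small demonstration of the BFS version on a made-up input (the "Tags" key is hypothetical, chosen only to show ordering). It relies on Array#select in MRI re-checking the array length on every step, so pairs pushed onto q during the scan are visited too. Because the traversal is breadth-first, the output keys come out level by level: "Details_Name" lands before "Details_Tags_1" even though "Tags" is listed first in the input. Hash#== ignores order, so this does not affect equality checks.

```ruby
# BFS flattify as given in this answer; q doubles as the work queue.
def flattify(hash)
  (q = hash.to_a).select { |key, value|
    # Arrays become {1=>first, 2=>second, ...} so they share the Hash path.
    value = (1..value.size).zip(value).to_h if value.is_a? Array
    # Keep leaves; for hashes, enqueue children and drop the parent pair.
    !value.is_a?(Hash) || !value.each { |k, v| q << ["#{key}_#{k}", v] }
  }.to_h
end

# Hypothetical input with an array listed before a scalar sibling.
data = {
  "Name" => "Kim Kones",
  "Details" => { "Tags" => ["a", "b"], "Name" => "Kones, Kim" }
}
flat = flattify(data)
p flat.keys   # prints ["Name", "Details_Name", "Details_Tags_1", "Details_Tags_2"]
```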
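One thing I like about this BFS version is that it combines keys as "#{key}_#{k}". In my other solution I could also use a string path = '' and extend it with path + "_" + k, but that would produce a leading underscore that I'd have to avoid, or trim with extra code.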
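For comparison, a sketch of what the string-path alternative might look like. The names flattify_str and join_key are hypothetical, not from either answer. Always writing "#{path}_#{k}" starting from path = '' would yield keys like "_Name", so join_key special-cases the first segment; that is the "extra code" mentioned above. Interpolation is used instead of String#+, because "Tags" + 1 raises a TypeError for integer array indices.

```ruby
# Hypothetical variant: the path is a String instead of an Array.
def flattify_str(value, result = {}, path = '')
  case value
  when Array
    value.each.with_index(1) { |v, i| flattify_str(v, result, join_key(path, i)) }
  when Hash
    value.each { |k, v| flattify_str(v, result, join_key(path, k)) }
  else
    result[path] = value
  end
  result
end

# Hypothetical helper: skips the separator for the first segment,
# which would otherwise produce a leading underscore such as "_Name".
def join_key(path, k)
  path.empty? ? k.to_s : "#{path}_#{k}"
end

flat = flattify_str({ "Name" => "Kim Kones", "Tags" => ["x", "y"] })
p flat.keys   # prints ["Name", "Tags_1", "Tags_2"]
```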