Word parser script and implementing memoization

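Description

Given a dictionary, my program generates two output files, 'sequences.txt' and 'words.txt'.

'sequences' contains every sequence of four letters (A-z) that appears in exactly one word of the dictionary, one sequence per line.
'words' contains the corresponding words that contain those sequences, in the same order, again one per line.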

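For example, given the spec/fixtures/sample_words.txt dictionary containing only

    arrows
    carrots
    give
    me

the output should be:

    'sequences'     'words'

    carr            carrots
    give            give
    rots            carrots
    rows            arrows
    rrot            carrots
    rrow            arrows

Of course, 'arro' will not appear in the output because it is found in more than one word.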

What I've come up with so far

Project structure:

    ├── Gemfile
    ├── Gemfile.lock
    ├── examples
    │   └── dictionary.txt
    ├── lib
    │   └── word_sequence_parser.rb
    ├── main.rb
    ├── output
    ├── readme.md
    └── spec
        ├── fixtures
        │   └── sample_words.txt
        └── word_sequence_parser_spec.rb

To run the script: ruby main.rb examples/dictionary.txt

main.rb

    require_relative 'lib/word_sequence_parser.rb'

    dict_path = ARGV.shift

    if dict_path.nil?
      dict_path = 'spec/fixtures/sample_words.txt'
    end

    parser = WordSequenceParser.new(dict_path)

    # step 1 - Opens dictionary file and generates a new set of words
    parser.set

    # step 2 - Parses word sequences
    parser.sequence

    # step 3 - Prints to files in ./output
    parser.dump_text

The script that works

word_sequence_parser.rb

    require 'set'

    class WordSequenceParser
      def initialize(path)
        @path = path
      end

      def set
        set = Set.new

        File.open(@path) do |f|
          f.each_line do |line|
            set.add(line.chomp.downcase)
          end
        end
        set
      end

      def sequence
        sequences = Set.new
        words = Set.new
        to_remove = Set.new

        set.each do |w|
          letters = w.split(//)
          letters.each_cons(4) do |seq|
            s = seq.join
            if !words.add?(s)
              to_remove.add(s)
            end
            sequences.add( {seq: s, word: w} )
          end
        end
        sequences.delete_if { |hash| to_remove.include?(hash[:seq]) }
      end

      def dump_text
        output_s = File.open( 'output/sequences.txt', 'w' )
        output_w = File.open( 'output/words.txt', 'w' )

        sequence.each do |hash|
          output_s.puts("#{hash[:seq]}")
          output_w.puts("#{hash[:word]}")
        end

        output_s.close
        output_w.close
      end
    end

My shot at memoizing the script, which doesn't work

    require 'set'

    class WordSequenceParser
      def initialize(path)
        @path = path
      end

      def set
        set = Set.new

        File.open(@path) do |f|
          f.each_line do |line|
            set.add(line.chomp.downcase)
          end
        end
        set
      end

      def memoize
        @set = set
      end

      def sequence
        sequences = Set.new
        words = Set.new
        to_remove = Set.new

        @set.each do |w|
          letters = w.split(//)
          letters.each_cons(4) do |seq|
            s = seq.join
            if !words.add?(s)
              to_remove.add(s)
            end
            sequences.add( {seq: s, word: w} )
          end
        end
        sequences.delete_if { |hash| to_remove.include?(hash[:seq]) }
      end

      def dump_text
        output_s = File.open( 'output/sequences.txt', 'w' )
        output_w = File.open( 'output/words.txt', 'w' )

        sequence.each do |hash|
          output_s.puts("#{hash[:seq]}")
          output_w.puts("#{hash[:word]}")
        end

        output_s.close
        output_w.close
      end
    end

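I get this error message when I try to run the script.

    ../word_sequence_parser.rb:29:in `sequence': undefined method `each' for nil:NilClass (NoMethodError)
        from main.rb:15:in `<main>'

I have read Justin Weiss' article on memoization and, for the most part, I get it. I'm just having a hard time applying the technique to something I've already written.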

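It doesn't work because you never call memoize, so @set is never initialized.

However, the real issue here is that there is nothing worth memoizing.

Your original code looks pretty good, and if you think about how it works, none of the code is executed redundantly. Every line that runs, whether once or many times, returns a different value.

So memoization serves no purpose here.

Let's say you wanted to call dump_text (or just sequence) multiple times. Then you would definitely want to memoize sequence, as follows: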
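To see that failure in isolation, here is a minimal sketch. `TinyParser` and `word_count` are hypothetical names invented for this illustration, not part of the question's code; the class only mimics the relevant shape of WordSequenceParser. An instance variable that has never been assigned evaluates to nil, so reading @set before memoize has run raises the same NoMethodError.

```ruby
require 'set'

# Hypothetical, trimmed-down stand-in for WordSequenceParser.
class TinyParser
  def initialize(words)
    @words = words
  end

  # Builds a fresh Set on every call, like the question's `set` method.
  def set
    Set.new(@words.map(&:downcase))
  end

  # @set is only assigned if somebody actually calls this method.
  def memoize
    @set = set
  end

  # Reads @set directly, like the failing `sequence` method does.
  def word_count
    @set.each.count
  end
end

parser = TinyParser.new(%w[arrows carrots])

begin
  parser.word_count            # @set is still nil here
rescue NoMethodError => e
  puts "before memoize: #{e.class}"
end

parser.memoize                 # now @set holds the Set
puts "after memoize: #{parser.word_count}"
```

Calling memoize explicitly makes the error go away, but it leaves the caller responsible for remembering that extra step.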

    def sequence
      @sequence ||= begin
        sequences = Set.new
        words = Set.new
        to_remove = Set.new

        set.each do |w|
          letters = w.split(//)
          letters.each_cons(4) do |seq|
            s = seq.join
            if !words.add?(s)
              to_remove.add(s)
            end
            sequences.add( {seq: s, word: w} )
          end
        end
        sequences.delete_if { |hash| to_remove.include?(hash[:seq]) }
      end
    end

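This executes the original sequence calculation code only once and then assigns the result to @sequence. Every other call to sequence reuses the already calculated value of @sequence (since it is no longer nil).

I love this question because this was one of the first things I ran into when my company started using Ruby. We had a consultant who redid a lot of old asp.net code, and he had these @foo ||= … expressions in methods, which I had never seen before.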
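The "only once" behaviour described above can be checked with a small counter. This is a sketch under stated assumptions: `Report`, `expensive_value` and `runs` are hypothetical names made up for the illustration, and the array literal merely stands in for the real sequence calculation.

```ruby
# Hypothetical class; `expensive_value` stands in for `sequence`.
class Report
  attr_reader :runs

  def initialize
    @runs = 0
  end

  def expensive_value
    @expensive_value ||= begin
      @runs += 1                 # counts how often the block really runs
      %w[carr give rots].sort    # placeholder for the costly calculation
    end
  end
end

report = Report.new
3.times { report.expensive_value }
puts report.runs                 # the block ran once despite three calls
```

One caveat with this idiom: `||=` assigns again whenever the stored value is nil or false, so a calculation that can legitimately return either of those would be repeated on every call. That does not affect sequence, because a Set is always truthy, even when it is empty.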
