Indentation-sensitive parser using Parslet in Ruby?

I'm trying to parse a simple indentation-sensitive syntax using the Parslet library in Ruby.

Here is an example of the syntax I'm trying to parse:

```
level0child0
level0child1
  level1child0
  level1child1
    level2child0
  level1child2
```

The resulting tree should look like this:

```ruby
[
  { :identifier => "level0child0", :children => [] },
  { :identifier => "level0child1", :children => [
    { :identifier => "level1child0", :children => [] },
    { :identifier => "level1child1", :children => [
      { :identifier => "level2child0", :children => [] }
    ] },
    { :identifier => "level1child2", :children => [] },
  ] }
]
```

The parser I have now can handle nodes at nesting levels 0 and 1, but cannot parse anything deeper:

```ruby
require 'parslet'

class IndentationSensitiveParser < Parslet::Parser
  rule(:indent)     { str('  ') }
  rule(:newline)    { str("\n") }
  rule(:identifier) { match['A-Za-z0-9'].repeat(1).as(:identifier) }

  # Only one level of nesting: a node followed by its indented children.
  rule(:node)     { identifier >> newline >> (indent >> identifier >> newline.maybe).repeat.as(:children) }
  rule(:document) { node.repeat }

  root :document
end

require 'ap'
require 'pp'

begin
  input = DATA.read

  puts '', '----- input ----------------------------------------------------------------------', ''
  ap input

  tree = IndentationSensitiveParser.new.parse(input)

  puts '', '----- tree -----------------------------------------------------------------------', ''
  ap tree
rescue IndentationSensitiveParser::ParseFailed => failure
  puts '', '----- error ----------------------------------------------------------------------', ''
  puts failure.cause.ascii_tree
end

__END__
user
  name
  age
recipe
  name
foo
bar
```

Clearly I need some kind of dynamic counter, so that an identifier at nesting level 3 requires 3 indent nodes in front of it.

How can I implement an indentation-sensitive parser this way with Parslet? Is it even possible?

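There are a few approaches.

  1. Parse the document by recognizing each line as a collection of indents plus an identifier, then apply a transformation afterwards to rebuild the hierarchy from the number of indents on each line.

  2. Use captures to store the current indentation, and expect the next node to include that indentation plus more in order to match it as a child. (I didn't dig very far into this approach, because the next one occurred to me.)

  3. Rules are just methods. So you can define `node` as a method, which means you can pass it parameters! (See below.)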
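Approach 1 needs no special Parslet features, so its hierarchy-rebuilding step can be sketched in plain Ruby. This is a minimal sketch, not code from either answer: it assumes two spaces per level (`INDENT_WIDTH` is a name introduced here), and `scan_lines` and `build_tree` are hypothetical helpers. With Parslet, the flat `[depth, identifier]` list would instead come from a line-oriented grammar, with the rebuilding done as a post-processing step.

```ruby
# Approach 1 as a two-step sketch: flatten lines, then rebuild the hierarchy.
# Assumption (not from the question): every level is indented by INDENT_WIDTH spaces.
INDENT_WIDTH = 2

# Step 1: turn each non-blank line into [depth, identifier].
def scan_lines(text)
  text.each_line.reject { |line| line.strip.empty? }.map do |line|
    spaces = line[/\A */].length
    unless (spaces % INDENT_WIDTH).zero?
      raise ArgumentError, "indentation is not a multiple of #{INDENT_WIDTH}: #{line.inspect}"
    end
    [spaces / INDENT_WIDTH, line.strip]
  end
end

# Step 2: rebuild the tree. stack[d] is the children array that a node at
# depth d is appended to, so stack.last is the innermost open node's children.
def build_tree(pairs)
  root = []
  stack = [root]
  pairs.each do |depth, name|
    raise ArgumentError, "#{name} is indented too deeply" if depth > stack.length - 1
    stack.pop while stack.length - 1 > depth # dedent: close deeper nodes
    node = { identifier: name, children: [] }
    stack.last << node
    stack.push(node[:children]) # the next line may be a child of this node
  end
  root
end

SAMPLE = <<~TEXT
  level0child0
  level0child1
    level1child0
    level1child1
      level2child0
    level1child2
TEXT

p build_tree(scan_lines(SAMPLE))
```

A stack is enough here because a line can only attach to the most recent open node one level above it; a dedent just pops back to that level.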

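Defining `node` as a method (approach 3) lets you define `node(depth)` in terms of `node(depth+1)`. The problem with this approach, however, is that the `node` method doesn't match a string; it builds a parser. So the recursive call never finishes: building `node(0)` builds `node(1)`, which builds `node(2)`, and so on forever.

That's why `dynamic` exists. Its block isn't evaluated until Parslet actually tries to match at that point in the input, so the child parser is only built when it is needed, and now you can recurse without any problem.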
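The same deferral can be shown without Parslet. In this plain-Ruby sketch a parser is just a lambda taking `(input, pos)` and returning `[result, new_pos]` or `nil`, and `node(depth)` builds `node(depth + 1)` inside the lambda body, that is, only when it actually runs. That plays the role `dynamic` plays in Parslet. The helpers `node`, `many` and `parse_document` are hypothetical, and two spaces per level is an assumption.

```ruby
# Approach 3 without Parslet: a "parser" here is a lambda (input, pos) that
# returns [result, new_pos] on success or nil on failure.
# Assumption: two spaces per nesting level, one identifier per line.
IDENTIFIER = /[A-Za-z0-9]+/

# Repeat a parser zero or more times, collecting results (like repeat(0)).
def many(parser)
  lambda do |input, pos|
    results = []
    while (r = parser.call(input, pos))
      results << r[0]
      pos = r[1]
    end
    [results, pos]
  end
end

# node(depth) only *builds* a lambda; nothing recurses at construction time.
def node(depth)
  prefix = "  " * depth
  lambda do |input, pos|
    return nil unless input[pos, prefix.length] == prefix
    pos += prefix.length
    m = IDENTIFIER.match(input, pos)
    return nil unless m && m.begin(0) == pos # identifier must start right here
    pos = m.end(0)
    pos += 1 if input[pos] == "\n" # newline.maybe
    # node(depth + 1) is constructed here, when this lambda runs. Calling it
    # outside the lambda would build node(depth + 2), node(depth + 3), ... forever.
    children, pos = many(node(depth + 1)).call(input, pos)
    [{ identifier: m[0], children: children }, pos]
  end
end

def parse_document(input)
  tree, pos = many(node(0)).call(input, 0)
  raise ArgumentError, "could not parse input at offset #{pos}" unless pos == input.length
  tree
end

SAMPLE = <<~TEXT
  level0child0
  level0child1
    level1child0
    level1child1
      level2child0
    level1child2
TEXT

p parse_document(SAMPLE)
```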

Here is the code using Parslet:

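```ruby
require 'parslet'

class IndentationSensitiveParser < Parslet::Parser
  # Two spaces per nesting level, as in the sample input.
  def indent(depth)
    str('  ' * depth)
  end

  rule(:newline)    { str("\n") }
  rule(:identifier) { match['A-Za-z0-9'].repeat(1).as(:identifier) }

  # A plain method rather than a rule, so it can take the depth as an argument.
  # `dynamic` defers building node(depth + 1) until Parslet tries to match the
  # children; without it, building node(0) would recurse forever.
  def node(depth)
    indent(depth) >>
      identifier >>
      newline.maybe >>
      (dynamic { |s, c| node(depth + 1).repeat(0) }).as(:children)
  end

  rule(:document) { node(0).repeat }

  root :document
end
```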

This is my favorite solution.

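I don't like the idea of weaving knowledge of the indentation process through the whole grammar. I would rather just produce INDENT and DEDENT tokens that other rules can use, much like matching "{" and "}" characters. So here is my solution: an `IndentParser` class that any parser can extend to get `nl`, `indent` and `dedent` tokens.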

```ruby
require 'parslet'

# Atoms returned from a dynamic that aren't meant to match anything.
class AlwaysMatch < Parslet::Atoms::Base
  def try(source, context, consume_all)
    succ("")
  end
end

class NeverMatch < Parslet::Atoms::Base
  attr_accessor :msg

  def initialize(msg = "ignore")
    self.msg = msg
  end

  def try(source, context, consume_all)
    context.err(self, source, msg)
  end
end

class ErrorMatch < Parslet::Atoms::Base
  attr_accessor :msg

  def initialize(msg)
    self.msg = msg
  end

  def try(source, context, consume_all)
    context.err(self, source, msg)
  end
end

class IndentParser < Parslet::Parser
  ##
  # Indentation handling: when matching a newline we check the following indentation.
  # If that indicates an indent token or dedent tokens (1+), we stick these in an
  # instance variable, and the high-priority indent/dedent rules will match as long
  # as these remain. The nl rule consumes the indentation itself.

  rule(:indent) {
    dynamic { |s, c|
      if @indent.nil?
        NeverMatch.new("Not an indent")
      else
        @indent = nil
        AlwaysMatch.new
      end
    }
  }

  rule(:dedent) {
    dynamic { |s, c|
      if @dedents.nil? or @dedents.length == 0
        NeverMatch.new("Not a dedent")
      else
        @dedents.pop
        AlwaysMatch.new
      end
    }
  }

  def checkIndentation(source, ctx)
    # See if the next line starts with indentation. If so, consume it and then
    # work out whether it is an indent or some number of dedents.
    indent = ""
    while source.matches?(Regexp.new("[ \t]"))
      indent += source.consume(1).to_s # consume returns a Slice
    end

    @indentStack = [""] if @indentStack.nil?

    currentInd = @indentStack[-1]
    return AlwaysMatch.new if currentInd == indent # no change, just match nl

    if indent.start_with?(currentInd)
      # Getting deeper
      @indentStack << indent
      @indent = indent # tells the indent rule to match once
      return AlwaysMatch.new
    else
      # Either some number of dedents or an error.
      # Find the first match, starting from the back.
      count = 0
      @indentStack.reverse.each do |level|
        break if indent == level # found it
        if level.start_with?(indent)
          # New indent is a prefix, so we dedented this level.
          count += 1
          next
        end
        # Not a match and not a valid prefix, so it's an error!
        return ErrorMatch.new("Mismatched indentation level")
      end

      @dedents = [] if @dedents.nil?
      count.times { @dedents << @indentStack.pop }
      return AlwaysMatch.new
    end
  end

  rule(:nl)     { anynl >> dynamic { |source, ctx| checkIndentation(source, ctx) } }
  rule(:unixnl) { str("\n") }
  rule(:macnl)  { str("\r") }
  rule(:winnl)  { str("\r\n") }
  # Try "\r\n" before "\r"; otherwise a Windows newline matches as macnl
  # and leaves the "\n" behind.
  rule(:anynl)  { winnl | unixnl | macnl }
end
```

I'm sure plenty of this can be improved, but it's what I've come up with so far.

Example usage:

```ruby
class MyParser < IndentParser
  rule(:colon)      { str(':') >> space? }

  # match[...] builds a character class: a space or a tab.
  rule(:space)      { match[' \t'].repeat(1) }
  rule(:space?)     { space.maybe }

  rule(:number)     { match['0-9'].repeat(1).as(:num) >> space? }
  rule(:identifier) { match['a-zA-Z'] >> match["a-zA-Z0-9"].repeat(0) }

  rule(:block)      { colon >> nl >> indent >> stmt.repeat.as(:stmts) >> dedent }
  rule(:stmt)       { identifier.as(:id) >> nl | number.as(:num) >> nl | testblock }
  rule(:testblock)  { identifier.as(:name) >> block }

  rule(:prgm)       { testblock >> nl.repeat }
  root :prgm
end
```
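To see the indent-stack logic from `checkIndentation` in isolation, here is a plain-Ruby sketch that runs it as a separate lexing pass instead of inside Parslet `dynamic` atoms. It is not code from the answer: `tokenize` and `parse_nodes` are hypothetical helpers, blank lines are assumed to be ignorable, and indentation is compared as raw strings (so a line mixing tabs and spaces differently from its parent is rejected). Once `:INDENT`/`:DEDENT` tokens exist, the grammar can treat them exactly like `{` and `}`.

```ruby
# INDENT/DEDENT as a separate lexing pass, reusing the stack-of-indentation-strings
# idea from checkIndentation. Assumptions: blank lines are skipped, and each
# non-blank line is a single content token.
def tokenize(text)
  tokens = []
  stack = [""] # indentation strings of the currently open levels
  text.each_line do |raw|
    line = raw.chomp
    next if line.strip.empty?
    indent = line[/\A[ \t]*/]
    if indent == stack.last
      # same level: no indentation token
    elsif indent.start_with?(stack.last)
      stack.push(indent) # deeper: open one level
      tokens << :INDENT
    else
      # shallower: close levels until we are back at a matching one
      until indent == stack.last
        unless stack.last.start_with?(indent)
          raise ArgumentError, "mismatched indentation: #{line.inspect}"
        end
        stack.pop
        tokens << :DEDENT
      end
    end
    tokens << line.strip << :NL
  end
  (stack.length - 1).times { tokens << :DEDENT } # close levels still open at EOF
  tokens
end

# A grammar over these tokens uses :INDENT/:DEDENT the way braces are used:
#   nodes := (ID NL (INDENT nodes DEDENT)?)*
def parse_nodes(tokens, i = 0)
  nodes = []
  while tokens[i].is_a?(String)
    name = tokens[i]
    raise ArgumentError, "expected NL after #{name}" unless tokens[i + 1] == :NL
    i += 2
    children = []
    if tokens[i] == :INDENT
      children, i = parse_nodes(tokens, i + 1)
      raise ArgumentError, "expected DEDENT at token #{i}" unless tokens[i] == :DEDENT
      i += 1
    end
    nodes << { identifier: name, children: children }
  end
  [nodes, i]
end

SAMPLE = <<~TEXT
  level0child0
  level0child1
    level1child0
    level1child1
      level2child0
    level1child2
TEXT

tokens = tokenize(SAMPLE)
tree, used = parse_nodes(tokens)
raise ArgumentError, "trailing tokens" unless used == tokens.length
p tree
```

Emitting the remaining `:DEDENT`s at end of input lets the last block close even without a trailing newline. As written, `IndentParser` only produces dedents when `nl` inspects the next line, so its input appears to need a final newline for the last block's `dedent` to match.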