Library to parse ERB files

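I am trying to parse, not evaluate, Rails ERB files in a Hpricot/Nokogiri type manner. The files I am trying to parse contain HTML fragments intermixed with dynamic content generated using ERB (standard Rails view files). I am looking for a library that will not only parse the surrounding content, much the way Hpricot or Nokogiri will, but will also treat the ERB symbols, <%, <%= etc., as though they were HTML/XML tags.

Ideally I would get back a DOM-like structure where the <%, <%= etc. symbols are included as their own node types.

I know it is possible to hack something together using regular expressions, but I am looking for something more reliable: I am developing a tool that needs to run on a very large view codebase where both the HTML content and the ERB content are important.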

For example, content such as:

    blah blah blah
    <div>My great text <%= ... %></div>

would return a tree structure like:

    root
      - text_node (blah blah blah)
      - element (div)
         - text_node (My great text)
             - erb_node (<%=)
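To make the idea of ERB tags as their own node type concrete, here is a minimal sketch using only Ruby's standard `strscan` library. It is not an existing library's API: `ErbNode` and `split_erb` are hypothetical names invented for this illustration. It only separates plain text from ERB tags and leaves the HTML itself untouched, so it produces a flat list rather than the full tree described above. It also assumes the template contains no `<%%` escapes and that a `%>` never appears inside the Ruby code of a tag.

```ruby
require 'strscan'

# Hypothetical node type for this sketch.
# kind   - :text or :erb
# marker - "<%" or "<%=" for ERB nodes, nil for text
# body   - raw text, or the Ruby source between the ERB delimiters
ErbNode = Struct.new(:kind, :marker, :body)

# Split an ERB template into text and ERB nodes without evaluating it.
def split_erb(source)
  scanner = StringScanner.new(source)
  nodes = []
  until scanner.eos?
    if scanner.scan(/<%(=?)(.*?)%>/m)
      # A complete ERB tag: record which opener was used and the code inside.
      nodes << ErbNode.new(:erb, "<%#{scanner[1]}", scanner[2].strip)
    elsif (text = scanner.scan(/(?:(?!<%).)+/m))
      # Everything up to the next "<%" (or end of input) is plain text.
      nodes << ErbNode.new(:text, nil, text)
    else
      # An opener with no matching "%>": keep the remainder as text.
      nodes << ErbNode.new(:text, nil, scanner.rest)
      scanner.terminate
    end
  end
  nodes
end

nodes = split_erb("blah blah blah\n<div>My great text <%= name %></div>")
nodes.each { |n| puts "#{n.kind} #{n.marker.inspect} #{n.body.inspect}" }
```

The text nodes could then be handed to an HTML parser separately, although stitching the two results back into one tree is the hard part that this sketch does not attempt.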

I eventually solved this by using RLex, http://raa.ruby-lang.org/project/ruby-lex/, the Ruby version of lex, with the following grammar:

    %{

    #define NUM 257

    #define OPTOK 258
    #define IDENT 259
    #define OPETOK 260
    #define CLSTOK 261
    #define CLTOK 262
    #define FLOAT 263
    #define FIXNUM 264
    #define WORD 265
    #define STRING_DOUBLE_QUOTE 266
    #define STRING_SINGLE_QUOTE 267

    #define TAG_START 268
    #define TAG_END 269
    #define TAG_SELF_CONTAINED 270
    #define ERB_BLOCK_START 271
    #define ERB_BLOCK_END 272
    #define ERB_STRING_START 273
    #define ERB_STRING_END 274
    #define TAG_NO_TEXT_START 275
    #define TAG_NO_TEXT_END 276
    #define WHITE_SPACE 277
    %}

    digit  [0-9]
    blank  [ ]
    letter [A-Za-z]
    name1  [A-Za-z_]
    name2  [A-Za-z_0-9]
    valid_tag_character [A-Za-z0-9"'=@_():/ ]
    ignore_tags style|script
    %%

    {blank}+"\n"          { return [WHITE_SPACE, yytext] }
    "\n"{blank}+          { return [WHITE_SPACE, yytext] }
    {blank}+"\n"{blank}+  { return [WHITE_SPACE, yytext] }

    "\r"                  { return [WHITE_SPACE, yytext] }
    "\n"                  { return [yytext[0], yytext[0..0]] };
    "\t"                  { return [yytext[0], yytext[0..0]] };

    ^{blank}+             { return [WHITE_SPACE, yytext] }

    {blank}+$             { return [WHITE_SPACE, yytext] };

    ""                    { return [TAG_NO_TEXT_START, yytext] }
    ""                    { return [TAG_NO_TEXT_END, yytext] }
    ""                    { return [TAG_SELF_CONTAINED, yytext] }
    ""                    { return [TAG_SELF_CONTAINED, yytext] }
    ""                    { return [TAG_START, yytext] }
    ""                    { return [TAG_END, yytext] }

    ""                    { return [ERB_BLOCK_END, yytext] }
    ""                    { return [ERB_STRING_END, yytext] }


    {letter}+             { return [WORD, yytext] }


    \".*\"                { return [STRING_DOUBLE_QUOTE, yytext] }
    '.*'                  { return [STRING_SINGLE_QUOTE, yytext] }
    .                     { return [yytext[0], yytext[0..0]] }

    %%

This is not a complete grammar, but for my purposes, locating and re-emitting text, it worked. I combined that grammar with this small piece of code:

    text_handler = MakeYourOwnCallbackHandler.new

    l = Erblex.new
    l.yyin = File.open(file_name, "r")

    loop do
      a, v = l.yylex
      break if a == 0

       if(a 

I recently had a similar problem. The approach I took was to write a small script (erblint.rb) that does a string substitution to convert the ERB tags (<% %> and <%= %>) to XML tags, and then parses the result using Nokogiri.

See the following code to see what I mean:

    #!/usr/bin/env ruby
    require 'rubygems'
    require 'nokogiri'

    # This is a simple program that reads in a Ruby ERB file, and parses
    # it as an XHTML file. Specifically, it makes a decent attempt at
    # converting the ERB tags (<% %> and <%= %>) to XML tags
    # (<erb-eval/> and <erb-disp/> respectively).
    # The tag names erb-eval and erb-disp are arbitrary; any names that
    # do not clash with real HTML elements will do.
    #
    # Once the document has been parsed, it will be validated and any
    # error messages will be displayed.
    #
    # More complex option and error handling is left as an exercise to the user.

    abort 'Usage: erb.rb <filename>' if ARGV.empty?

    filename = ARGV[0]

    begin
      doc = ""
      File.open(filename) do |file|
        puts "\n*** Parsing #{filename} ***\n\n"
        s = file.read

        # Substitute the standard ERB tags to convert them to XML tags
        #   <%= ... %> for <erb-disp> ... </erb-disp>
        #   <% ... %> for <erb-eval> ... </erb-eval>
        #
        # Note that this won't work for more complex expressions such as:
        #   <a href=<%= ... %> >link text</a>
        # Of course, this is not great style, anyway...
        s.gsub!(/<%=(.+?)%>/m, '<erb-disp>\1</erb-disp>')
        s.gsub!(/<%(.+?)%>/m, '<erb-eval>\1</erb-eval>')

        doc = Nokogiri::XML(s) do |config|
          # put more config options here if required
          # config.strict
        end
      end

      puts doc.to_xhtml(:indent => 2, :encoding => 'UTF-8')
      puts "Huzzah, no errors!" if doc.errors.empty?

      # Otherwise, print each error message
      doc.errors.each { |e| puts "Error at line #{e.line}: #{e}" }
    rescue
      puts "Oops! Cannot open #{filename}"
    end

I've posted this as a gist on GitHub: https://gist.github.com/787145
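One weakness of plain substitution is that Ruby code inside a tag often contains `<`, `>` or `&` (comparisons, `&&`, `<<`), which makes the converted document invalid XML. A possible refinement, sketched below with only core Ruby, is to escape the tag body while substituting. This is not part of the script above: `erb_to_xml` and `XML_ESCAPES` are hypothetical names for this illustration, and the default tag names `erb-disp` and `erb-eval` are placeholders that can be swapped for anything that does not clash with real HTML elements.

```ruby
# Characters that must be escaped inside XML text content.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Replace each ERB tag with an XML element and escape the Ruby code inside it.
# "<%= ... %>" becomes the output tag, "<% ... %>" becomes the code tag.
# The tag names are placeholders for this sketch.
def erb_to_xml(source, output_tag: 'erb-disp', code_tag: 'erb-eval')
  source.gsub(/<%(=?)(.*?)%>/m) do
    # Capture both groups first: the inner gsub below resets the last match.
    marker, code = Regexp.last_match.captures
    tag = marker == '=' ? output_tag : code_tag
    "<#{tag}>#{code.gsub(/[&<>]/, XML_ESCAPES)}</#{tag}>"
  end
end

puts erb_to_xml('<p><% if a < b && c %>yes<% end %></p>')
```

The resulting string can be passed to `Nokogiri::XML` in the same way as in the script above. ERB tags placed inside attribute values or inside an opening HTML tag would still yield malformed XML with this approach.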