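Library to parse ERB files
I am attempting to parse, not evaluate, Rails ERB files in a Hpricot/Nokogiri type manner. The files I am attempting to parse contain HTML fragments intermixed with dynamic content generated using ERB (standard Rails view files). I am looking for a library that will not only parse the surrounding content, much the way Hpricot or Nokogiri will, but will also treat the ERB symbols, <%, <%=, etc., as though they were HTML/XML tags.
Ideally I would get back a DOM-like structure where the <%, <%=, etc. symbols are included as their own node types.
I know that it is possible to hack something together using regular expressions, but I am looking for something more reliable, as I am developing a tool that I need to run on a very large view code base, where both the HTML content and the ERB content are important.
For example, content such as: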
```erb
blah blah blah
<div>My Great Text <%= ... %></div>
```
Would return a tree structure like:
```
root
  - text_node (blah blah blah)
  - element (div)
    - text_node (My Great Text)
    - erb_node (<%=)
```
I eventually ended up solving this issue by using RLex, http://raa.ruby-lang.org/project/ruby-lex/, the Ruby version of lex, with the following grammar:
```
%{

#define NUM 257

#define OPTOK 258
#define IDENT 259
#define OPETOK 260
#define CLSTOK 261
#define CLTOK 262
#define FLOAT 263
#define FIXNUM 264
#define WORD 265
#define STRING_DOUBLE_QUOTE 266
#define STRING_SINGLE_QUOTE 267

#define TAG_START 268
#define TAG_END 269
#define TAG_SELF_CONTAINED 270
#define ERB_BLOCK_START 271
#define ERB_BLOCK_END 272
#define ERB_STRING_START 273
#define ERB_STRING_END 274
#define TAG_NO_TEXT_START 275
#define TAG_NO_TEXT_END 276
#define WHITE_SPACE 277
%}

digit [0-9]
blank [ ]
letter [A-Za-z]
name1 [A-Za-z_]
name2 [A-Za-z_0-9]
valid_tag_character [A-Za-z0-9"'=@_():/ ]
ignore_tags style|script
%%

{blank}+"\n"            { return [ WHITE_SPACE, yytext ] }
"\n"{blank}+            { return [ WHITE_SPACE, yytext ] }
{blank}+"\n"{blank}+    { return [ WHITE_SPACE, yytext ] }

"\r"                    { return [ WHITE_SPACE, yytext ] }
"\n"                    { return [ yytext[0], yytext[0..0] ] };
"\t"                    { return [ yytext[0], yytext[0..0] ] };

^{blank}+               { return [ WHITE_SPACE, yytext ] }

{blank}+$               { return [ WHITE_SPACE, yytext ] };

""                      { return [ TAG_NO_TEXT_START, yytext ] }
""                      { return [ TAG_NO_TEXT_END, yytext ] }
""                      { return [ TAG_SELF_CONTAINED, yytext ] }
""                      { return [ TAG_SELF_CONTAINED, yytext ] }
""                      { return [ TAG_START, yytext ] }
""                      { return [ TAG_END, yytext ] }

""                      { return [ ERB_BLOCK_END, yytext ] }
""                      { return [ ERB_STRING_END, yytext ] }

{letter}+               { return [ WORD, yytext ] }

\".*\"                  { return [ STRING_DOUBLE_QUOTE, yytext ] }
'.*'                    { return [ STRING_SINGLE_QUOTE, yytext ] }
.                       { return [ yytext[0], yytext[0..0] ] }

%%
```
This is not a complete grammar, but for my purposes, locating and re-emitting text, it worked. I combined that grammar with this small piece of code:
```ruby
text_handler = MakeYourOwnCallbackHandler.new

l = Erblex.new
l.yyin = File.open(file_name, "r")

loop do
  a, v = l.yylex
  break if a == 0

  if(a ...
```
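The loop above breaks off partway through, so here is a rough, self-contained illustration of the same general idea: a lexer that walks the template and reports each token to a callback handler. This is only a sketch. It uses Ruby's standard `StringScanner` rather than RLex, and the names `CollectingHandler`, `ErbTokenizer`, `text`, and `erb` are invented for this example; they are not part of RLex or of the original code. It separates ERB tags from the surrounding text and does not tokenize the HTML itself.

```ruby
require 'strscan'

# Hypothetical handler (name invented for this sketch): records each
# event it receives, in order.
class CollectingHandler
  attr_reader :events

  def initialize
    @events = []
  end

  def text(value)
    @events << [:text, value]
  end

  def erb(kind, code)
    @events << [kind, code]
  end
end

# Hypothetical tokenizer (name invented for this sketch): splits a template
# into literal text and ERB tags and reports each piece to the handler.
# It ignores ERB corner cases such as "<%%" escapes and "-%>" trimming.
class ErbTokenizer
  ERB_TAG = /<%(=|#)?(.*?)%>/m

  def initialize(handler)
    @handler = handler
  end

  def run(source)
    scanner = StringScanner.new(source)
    until scanner.eos?
      if scanner.scan(ERB_TAG)
        kind = case scanner[1]
               when '=' then :erb_output
               when '#' then :erb_comment
               else :erb_code
               end
        @handler.erb(kind, scanner[2].strip)
      else
        # Take at least one character, then everything up to the next "<%".
        # Always consuming one character guarantees progress even when a
        # "<%" has no closing "%>".
        @handler.text(scanner.scan(/.(?:(?!<%).)*/m))
      end
    end
  end
end

handler = CollectingHandler.new
ErbTokenizer.new(handler).run("blah <div>My Text <%= name %></div>")
handler.events.each { |kind, value| puts "#{kind}: #{value.inspect}" }
```

A real handler would replace `CollectingHandler` and re-emit or transform each piece as it arrives, which is roughly what the callback handler in the RLex version is for.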
I recently had a similar problem. The approach I took was to write a small script (erblint.rb) that does a string substitution to convert the ERB tags (<% %> and <%= %>) to XML tags, and then parse the result using Nokogiri.
See the following code to see what I mean:
```ruby
#!/usr/bin/env ruby
require 'rubygems'
require 'nokogiri'

# This is a simple program that reads in a Ruby ERB file, and parses
# it as an XHTML file. Specifically, it makes a decent attempt at
# converting the ERB tags (<% %> and <%= %>) to XML tags (... and
# ... respectively).
#
# Once the document has been parsed, it will be validated and any
# error messages will be displayed.
#
# More complex option and error handling is left as an exercise to the user.

abort 'Usage: erb.rb <filename>' if ARGV.empty?

filename = ARGV[0]

begin
  doc = ""
  File.open(filename) do |file|
    puts "\n*** Parsing #{filename} ***\n\n"
    file.read(nil, s = "")

    # Substitute the standard ERB tags to convert them to XML tags
    #   <%= ... %> for ...
    #   <% ... %> for ...
    #
    # Note that this won't work for more complex expressions such as:
    #   ...>link text
    # Of course, this is not great style, anyway...
    s.gsub!(/<%=(.+?)%>/m, '\1 ')
    s.gsub!(/<%(.+?)%>/m, '\1 ')
    doc = Nokogiri::XML(s) do |config|
      # put more config options here if required
      # config.strict
    end
  end

  puts doc.to_xhtml(:indent => 2, :encoding => 'UTF-8')
  puts "Huzzah, no errors!" if doc.errors.empty?

  # Otherwise, print each error message
  doc.errors.each { |e| puts "Error at line #{e.line}: #{e}" }
rescue
  puts "Oops! Cannot open #{filename}"
end
```
I've posted this as a gist on GitHub: https://gist.github.com/787145
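To make the substitution idea in this answer concrete, here is a minimal standalone sketch of just that step, with no Nokogiri dependency. The element names `erb-output` and `erb-code` are placeholders chosen for this example and should not be read as the names the script uses. The sketch also escapes `&`, `<`, and `>` inside the Ruby code, which is an addition on my part, so that an expression like `a < b` does not break XML parsing later.

```ruby
# Hypothetical element names, chosen for this sketch only.
OUTPUT_TAG = 'erb-output'
CODE_TAG   = 'erb-code'

XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Escape the characters that would otherwise be read as XML markup.
def escape_xml(text)
  text.gsub(/[&<>]/, XML_ESCAPES)
end

# Replace <%= ... %> and <% ... %> with XML elements that wrap the
# (escaped) Ruby code. A single pass handles both tag types, so the
# result does not depend on the order of two separate substitutions.
def erb_to_xml(source)
  source.gsub(/<%(=)?(.*?)%>/m) do
    tag = Regexp.last_match(1) ? OUTPUT_TAG : CODE_TAG
    "<#{tag}>#{escape_xml(Regexp.last_match(2).strip)}</#{tag}>"
  end
end

puts erb_to_xml('<div><% if a < b %>Hi <%= @user.name %><% end %></div>')
```

The returned string can then be handed to `Nokogiri::XML` as in the script above. As with that script, ERB tags placed inside HTML attributes would still produce markup that an XML parser rejects.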