Using XPath with an HTML or XML fragment?

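I am new to Nokogiri and XPath, and I am trying to access all comments in an HTML or XML fragment. The XPaths .//comment() and //comment() work when I am not using the fragment function, but they do not find anything in a fragment. With a tag instead of a comment, the first XPath works.

By trial and error, I realized that in this case comment() only finds top-level comments, while .//comment() and some others only find inner comments. Am I doing something wrong? What am I missing? Can anyone explain what is going on?

What XPath should I use to get all comments in an HTML fragment parsed by Nokogiri?

This example may help to understand the problem: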

    str = "
    "

    # this works:
    Nokogiri::HTML(str).xpath("//comment()")
    => [#, #]
    Nokogiri::HTML(str).xpath(".//comment()")
    => [#, #]

    # with fragment, it does not work:
    Nokogiri::HTML.fragment(str).xpath("//comment()")
    => []
    Nokogiri::HTML.fragment(str).xpath("comment()")
    => [#]
    Nokogiri::HTML.fragment(str).xpath(".//comment()")
    => [#]
    Nokogiri::HTML.fragment(str).xpath("*//comment()")
    => [#]
    Nokogiri::HTML.fragment(str).xpath("*/comment()")
    => [#]

    # however it does if it is a tag instead of a comment:
    str = " two
    "
    Nokogiri::HTML.fragment(str).xpath(".//a")
    => [#<Nokogiri::XML::Element:0x3f8535cb44c8 name="a" attributes=[#]>, #<Nokogiri::XML::Element:0x3f8535cb4220 name="a" children=[#]>, #<Nokogiri::XML::Element:0x3f8535cb3a3c name="a" attributes=[#]>]

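PS: Without fragment it does what I want, but it also adds things like a DOCTYPE, and I really only have a fragment of an HTML file that I am editing (removing some tags, replacing others).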

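//comment() is the short form of /descendant-or-self::node()/child::comment().

Using this XPath with a fragment ignores root comments (they are selected by /descendant-or-self::node(), but they have no children).

If you use HTML(str), a document node is created as the root of all other nodes. Therefore /descendant-or-self::node()/child::comment() does not ignore top-level comments, because they are children of the document node (which is itself selected by /descendant-or-self::node()).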

I don't know why descendant::comment() works in every case; I would have said it should be descendant-or-self::comment(), but never mind.

Hope that helps.

"descendant::comment()" and "descendant::sometag" work correctly in every case, but I still do not understand the difference.
