How to get text after or before certain tags using Nokogiri

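I have an HTML document that looks like this:

title Something test # one # two # three # four something1 some random test test # first # second # third # fourth testing

I want to extract:

    # one
    # two
    # three
    # four
    # first
    # second
    # third
    # fourth

In other words, I want "all of the text after test and before the next tag that starts after it."

I can get all of the text directly inside root using '//root/text()', but how do I get all of the text before or after certain tags?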

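This seems to work:

    require 'nokogiri'

    # The tags in this sample were lost from the listing; the element names
    # and attributes below are assumptions inferred from the XPath expressions.
    xml = '<root>
      <template>title</template>
      <h>Something</h>
      <template element="1">test</template>
      # one
      # two
      # three
      # four
      <h>something1</h>
      some random test
      <template element="1">test</template>
      # first
      # second
      # third
      # fourth
      <template element="2">testing</template>
    </root>'

    doc = Nokogiri::XML(xml)

    # For each matching template, take the node right after it and tidy its text.
    text = doc.xpath('//template[@element="1"]').map { |n| n.next_sibling.text.strip.gsub(/\n +/, "\n") }
    puts text
    # >> # one
    # >> # two
    # >> # three
    # >> # four
    # >> # first
    # >> # second
    # >> # third
    # >> # fourth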

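I'm pretty sure krusty.ar is right that there is no built-in way to do this. If you like, you can remove the tags inside the root tag one by one. It's a hack, but it works:

    require 'nokogiri'
    require 'open-uri'

    # url is a placeholder for the page to fetch.
    doc = Nokogiri::HTML(URI.open(url)) # or Nokogiri::HTML.parse(File.open(file_path))
    doc.xpath('//template').remove # drop every template element
    doc.xpath('//h').remove        # drop every h element
    doc

This gives the desired result for the HTML you posted.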
