Nokogiri: selecting text and HTML between unique sets of tags

I am trying to use Nokogiri to extract the content between two unique sets of tags.

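What is the best way to get the text inside the p tag between <h2 class="point">The problem</h2> and <h2 class="point">The solution</h2>, and then all of the HTML between <h2 class="point">The solution</h2> and the div that follows it?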

Sample of the full HTML:

    <h2 class="point">The problem</h2>
    <p>TEXT I WANT</p>
    <h2 class="point">The solution</h2>
    HTML I WANT with its own set of tags (but never an <h2> or <div>)

Thanks!

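    require 'nokogiri'

    # DATA reads the markup placed after __END__ in this file.
    # That markup is a reconstruction of the sample: the p and b wrappers
    # around the wanted HTML are assumed, since the original tags were lost.
    doc = Nokogiri::HTML(DATA)

    # Print the text of every sibling after an h2 that passes the filter.
    doc.search('//h2/following-sibling::node()[name() != "h2" and name() != "div" and text() != "\n"]').each do |block|
      p block.text
    end

    __END__
    <h2 class="point">The problem</h2>
    <p>TEXT I WANT</p>
    <h2 class="point">The solution</h2>
    <div>dont capture this</div>
    <p>HTML I WANT with it's <b>own set of tags</b></p>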

Output:

    "TEXT I WANT"
    "HTML I WANT with it's own set of tags"

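This XPath selects all following siblings of each h2 that are neither an h2 nor a div. The text() != "\n" test holds only for nodes that have at least one text child, so it also filters out the whitespace-only text nodes between elements: a text node has no text() children of its own, and a comparison against an empty node-set is always false.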

Here is how to get the text of the p tags between the two h2 elements with class point:

    //h2[@class="point"][1]/following-sibling::p[./following-sibling::h2[@class="point"]]/text()

For the second part, explore the XPath material on w3schools and work it out using the first expression as a model.