Tag: xpath

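Find the last line in a node with XPath: I would like to know whether there is a way to always select the node content directly above a certain element. I have the following code to extract from: Name Some content1 Address 12345 09876 City, Country 12345 Here is the XPath that finds everything above the span: //div[@id="someDiv"]/span[@id="tel_number"]/preceding-sibling::node() What I need now is an XPath that always selects the content just above the span and nothing else (a single line). It should also work if (for some reason) the span is missing. Hope someone can help!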

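Nokogiri XPath attributes - strange results: I have a bunch of fields, and when I try to run: src.xpath('//RECORD').each do |record| tbegin = record.xpath('//FIELD/TOKEN') the tbegin array returns fields from other records. I have checked that the first line gives me the proper array of "record" subtrees, but the next call, the one that assigns tbegin, does not limit the search to just the "record" subtree. In fact, it always returns the field subtrees of record[0]. So far I have worked around this by using: tbegin = record.css('TOKEN') but I would like to know what I am doing wrong.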

How to use the Nokogiri methods .xpath and .at_xpath: I am learning how to use Nokogiri, and I ran into a few problems with the code below: require 'rubygems' require 'mechanize' post_agent = WWW::Mechanize.new post_page = post_agent.get('http://www.vbulletin.org/forum/showthread.php?t=230708') puts "\nabsolute path with tbody gives nil" puts post_page.parser.xpath('/html/body/div/div/div/div/div/table/tbody/tr/td/div[2]').xpath('text()').to_s.strip.inspect puts "\n.at_xpath gives an empty string" puts post_page.parser.at_xpath("//div[@id='posts']/div/table/tr/td/div[2]").at_xpath('text()').to_s.strip.inspect puts "\ntwo lines solution with .at_xpath gives an empty string" rows = post_page.parser.xpath("//div[@id='posts']/div/table/tr/td/div[2]") puts rows[0].at_xpath('text()').to_s.strip.inspect puts puts "two lines working code" rows = post_page.parser.xpath("//div[@id='posts']/div/table/tr/td/div[2]") puts rows[0].xpath('text()').to_s.strip puts […]

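How do I get the root element name of an XML document with Nokogiri?: Using Nokogiri, I would like to determine the name of the root element. I thought that an XPath query for / would do it, but apparently the name of that node is "document"? require 'nokogiri' doc = Nokogiri::XML('Hello') doc.xpath('/').first.name # => "document" doc.xpath('/foo').first.name # => "foo" How can I get the string "foo" without knowing the root node name in advance?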

Ruby on Rails XPath JSON image scraping: I am trying to scrape images from websites. So far I have been using Nokogiri and XPath, with little success. For a typical site whose HTML has img tags with src attributes, I can use: tmp2 = Nokogiri::HTML(open(site_url)) tmp2.xpath("//img/@src").each do |src| …do whatever end However, sites like Amazon and eBay only trigger certain images through JavaScript. If I look at the code, I can see the data in an array. For example, from Amazon (source: http://www.amazon.com/Threads-Thought-Womens-Dreams-X-Small/dp/B00T46V758/ref=sr_1_5?s=apparel&ie=UTF8&qid=1433555447&sr=1-5): P.when('jQuery', 'cf').execute(function($, cf){ P.load.js('http://z-ecx.images-amazon.com/images/G/01/browser-scripts/imageBlock-udp-airy/imageBlock-udp-airy-4060168860._V1_.js'); }); P.when('A', 'jQuery', 'ImageBlockATF', 'cf').register('ImageBlockBTF', function(A, $, imageBlockATF, cf){ var data = {"indexToColor":[],"burjImageBlock":0,"isSwatchHoverConsistent":1,"heroFocalPoint":null,"visualDimensions":["color_name"],"productGroupID":"apparel_display_on_website","newVideoMissing":0,"useIV":0,"useClickZoom":null,"useChildVideos":0,"numColors":7,"logMetrics":0,"defaultColor":"initial","airyConfig":{"enableContinuousPlay":null,"installFlashButtonText":"Install Flash Player","contentTitle":null,"autoplayCutOffTimeSeconds":null,"ageGate":{"monthNames":["January","February","March","April","May","June","July","August","September","October","November","December"],"deniedPrompt":"We're sorry. You are not old enough to watch this video.","submitText":"Submit","prompt":"This video is not intended […]

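Using XPath with HTML or XML fragments?: I am new to Nokogiri and XPath, and I am trying to access all the comments in an HTML or XML fragment. When I do not use the fragment function, the XPaths .//comment() and //comment() work, but they do not find anything in a fragment. With a tag instead of a comment, it works with the first XPath. By trial and error I realized that, in this case, comment() only finds top-level comments, while .//comment() and some others only find inner comments. Am I doing something wrong? What am I missing? Can anyone explain what is happening? What XPath should I use to get all the comments in an HTML fragment parsed by Nokogiri? This example may help to understand the problem: str = "" # this works: Nokogiri::HTML(str).xpath("//comment()") => [#, #] Nokogiri::HTML(str).xpath(".//comment()") => [#, #] # with fragment, it does not work: Nokogiri::HTML.fragment(str).xpath("//comment()") => [] Nokogiri::HTML.fragment(str).xpath("comment()") => [#] Nokogiri::HTML.fragment(str).xpath(".//comment()") => [#] Nokogiri::HTML.fragment(str).xpath("*//comment()") => [#] Nokogiri::HTML.fragment(str).xpath("*/comment()") => [#] # however it does if it is a […]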

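Tbody tag in XPath generated by Firebug: I am trying to extract some data from online HTML pages using the Ruby hpricot library. I use the Firefox extension Firebug to get the XPath of a selected item. There is always an extra tbody tag in the generated XPath expression. In some cases I have to remove the tbody tag from the expression to get a result, while in other cases I have to keep the tag to get a result. I just cannot figure out when to keep the tbody tag and when not to.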

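Check whether an element has two classes: I have two possible divs. Is there a way to check whether a div element has both classes a and b? I use Ruby, Capybara and XPath to select elements, but CSS is fine if it solves the problem.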

How to navigate the DOM with Nokogiri: I am trying to fill the variables parent_element_h1 and parent_element_h2. Can anyone help me use Nokogiri to get the information I need into these variables? require 'rubygems' require 'nokogiri' value = Nokogiri::HTML.parse(<<-HTML_END) " A Foo B C Bar D E F " HTML_END parent = value.css('body').first # start_here is given: A Nokogiri::XML::Element of the div with the id 'X2' start_here = parent.at('div.block#X2') # this should be a Nokogiri::XML::Element of the nearest, previous h1. # in this example it's […]

How do I get Nokogiri to parse and return an XML document?: Here is a strange example: #!/usr/bin/ruby require 'rubygems' require 'open-uri' require 'nokogiri' print "without read: ", Nokogiri(open('http://weblog.rubyonrails.org/')).class, "\n" print "with read: ", Nokogiri(open('http://weblog.rubyonrails.org/').read).class, "\n" Running this returns: without read: Nokogiri::XML::Document with read: Nokogiri::HTML::Document Without read it returns XML, and with read it returns HTML? The web page is declared as "XHTML transitional", so at first I thought Nokogiri must be reading OpenURI's "content-type" from the stream, but that returns 'text/html': (rdb:1) doc = open(('http://weblog.rubyonrails.org/')) (rdb:1) doc.content_type "text/html" This is what the server returns. So now I am trying to figure out why Nokogiri returns two different values. It does not seem to be parsing the text and using heuristics to decide whether the content is HTML or XML. The same thing happens with the ATOM feed that the page points to: (rdb:1) doc = Nokogiri.parse(open('http://feeds.feedburner.com/RidingRails')) (rdb:1) doc.class Nokogiri::XML::Document (rdb:1) doc = Nokogiri.parse(open('http://feeds.feedburner.com/RidingRails').read) (rdb:1) doc.class Nokogiri::HTML::Document […]
