How do I get an absolute URL when extracting links with Nokogiri?

I'm using Nokogiri to extract links from a page, but I want absolute URLs even when the links on the page are relative. How can I do this?

Nokogiri is not really involved here, other than giving you the link anchors in the first place. Use Ruby's URI library to resolve the paths:

    absolute_uri = URI.join( page_url, href ).to_s

Seen in action:

    require 'uri'

    # The URL of the page with the links
    page_url = 'http://foo.com/zee/zaw/zoom.html'

    # A variety of links to test.
    hrefs = %w[
      http://zork.com/                http://zork.com/#id
      http://zork.com/bar             http://zork.com/bar#id
      http://zork.com/bar/            http://zork.com/bar/#id
      http://zork.com/bar/jim.html    http://zork.com/bar/jim.html#id
      /bar                            /bar#id
      /bar/                           /bar/#id
      /bar/jim.html                   /bar/jim.html#id
      jim.html                        jim.html#id
      ../jim.html                     ../jim.html#id
      ../                             ../#id
      #id
    ]

    hrefs.each do |href|
      root_href = URI.join(page_url, href).to_s
      puts "%-32s -> %s" % [ href, root_href ]
    end

    #=> http://zork.com/                 -> http://zork.com/
    #=> http://zork.com/#id              -> http://zork.com/#id
    #=> http://zork.com/bar              -> http://zork.com/bar
    #=> http://zork.com/bar#id           -> http://zork.com/bar#id
    #=> http://zork.com/bar/             -> http://zork.com/bar/
    #=> http://zork.com/bar/#id          -> http://zork.com/bar/#id
    #=> http://zork.com/bar/jim.html     -> http://zork.com/bar/jim.html
    #=> http://zork.com/bar/jim.html#id  -> http://zork.com/bar/jim.html#id
    #=> /bar                             -> http://foo.com/bar
    #=> /bar#id                          -> http://foo.com/bar#id
    #=> /bar/                            -> http://foo.com/bar/
    #=> /bar/#id                         -> http://foo.com/bar/#id
    #=> /bar/jim.html                    -> http://foo.com/bar/jim.html
    #=> /bar/jim.html#id                 -> http://foo.com/bar/jim.html#id
    #=> jim.html                         -> http://foo.com/zee/zaw/jim.html
    #=> jim.html#id                      -> http://foo.com/zee/zaw/jim.html#id
    #=> ../jim.html                      -> http://foo.com/zee/jim.html
    #=> ../jim.html#id                   -> http://foo.com/zee/jim.html#id
    #=> ../                              -> http://foo.com/zee/
    #=> ../#id                           -> http://foo.com/zee/#id
    #=> #id                              -> http://foo.com/zee/zaw/zoom.html#id

This answer previously used the more convoluted URI.parse(root).merge(URI.parse(href)).to_s.
Thanks to @pguardiario for the improvement.
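
Real pages often contain anchors with no href or with an href that URI cannot parse, and URI.join raises URI::InvalidURIError on the latter. The following is a minimal sketch of one way to wrap the call, not part of the original answer; the absolute_links helper name and the sample hrefs are made up for illustration.

```ruby
require 'uri'

# Hypothetical helper (an assumption, not from the original answer):
# resolve each href against page_url, skipping nil values and anything
# that URI cannot parse.
def absolute_links(page_url, hrefs)
  resolved = hrefs.compact.map do |href|
    begin
      URI.join(page_url, href.strip).to_s
    rescue URI::InvalidURIError
      nil # for example, an href containing an unescaped space
    end
  end
  resolved.compact
end

links = absolute_links(
  'http://foo.com/zee/zaw/zoom.html',
  ['jim.html', '../jim.html#id', nil, 'bad link.html', '/bar/']
)
puts links
```

With Nokogiri, the hrefs array could come from something like doc.css('a[href]').map { |a| a['href'] }, where doc is an already parsed document.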

Phrogz's answer is fine, but this is simpler:

    URI.join(base, url).to_s
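
One detail worth keeping in mind when choosing the base: a relative link is resolved against the base's directory, so a trailing slash on the base changes the result. A small sketch, reusing the example host from the answer above (the variable names are illustrative only):

```ruby
require 'uri'

# With a trailing slash, "zaw" is treated as a directory.
with_slash = URI.join('http://foo.com/zee/zaw/', 'jim.html').to_s

# Without it, "zaw" is treated as a file and is replaced by the relative link.
without_slash = URI.join('http://foo.com/zee/zaw', 'jim.html').to_s

puts with_slash     # http://foo.com/zee/zaw/jim.html
puts without_slash  # http://foo.com/zee/jim.html
```

Passing the full URL of the page the links came from, rather than just its host, avoids this ambiguity.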

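You need to check whether each URL is absolute or relative, for example by checking whether it starts with http:. If the URL is relative, you need to add the host to it. Nokogiri cannot do this for you; you have to process every URL in the page yourself to make it absolute.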
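
A minimal sketch of the check described above, assuming the URL of the page is known. It uses URI#absolute? from Ruby's standard library instead of a string-prefix test; the to_absolute helper name is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: return href unchanged when it already has a scheme,
# otherwise resolve it against the URL of the page it came from.
def to_absolute(page_url, href)
  uri = URI.parse(href)
  return uri.to_s if uri.absolute?

  URI.parse(page_url).merge(uri).to_s
end

page_url = 'http://foo.com/zee/zaw/zoom.html'
puts to_absolute(page_url, 'http://zork.com/bar') # already absolute, returned as-is
puts to_absolute(page_url, '/bar')                # host taken from page_url
puts to_absolute(page_url, '../jim.html')         # resolved against the page's directory
```

This does by hand what URI.join does in a single call, so it is mainly useful when absolute and relative links need to be treated differently.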
