How to read someone else's forum

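My friend has a forum that is full of posts containing information. Sometimes she wants to review the posts in her forum and draw conclusions from them. At the moment she does this by clicking through the forum and building a not necessarily accurate picture of the data in her head, and she draws her conclusions from that. My thought today was that I could probably bang out a quick Ruby script to parse the necessary HTML and give her a real idea of what the data is saying.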

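I am using Ruby's net/http library for the first time today, and I have run into a problem. My browser has no trouble viewing my friend's forum, but the call Net::HTTP.new("forumname.net") seems to produce the following error: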

No connection could be made because the target machine actively refused it. - connect(2)

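Googling that error, I have learned that it has to do with MySQL (or something like that) not wanting nosy people like me poking around remotely, for security reasons. That makes sense to me, but it makes me wonder: how is it that my browser gets to poke around on my friend's forum, while my little Ruby script gets no poking rights? Is there some way for my script to tell the server that it is not a threat, and that I only want read access, not write access?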

Thanks, guys,

ž.

Scraping a website? Use Mechanize:

    #!/usr/bin/ruby1.8

    require 'rubygems'
    require 'mechanize'

    # Note: newer Mechanize releases drop the WWW namespace; use Mechanize.new there.
    agent = WWW::Mechanize.new

    # Follow the links from the front page down to the thread of interest.
    page = agent.get("http://xkcd.com")
    page = page.link_with(:text=>'Forums').click
    page = page.link_with(:text=>'Mathematics').click
    page = page.link_with(:text=>'Math Books').click
    #puts page.parser.to_html    # If you want to see the html you just got

    # Each post lives in a <div class="postbody">; pull out title, author and body.
    posts = page.parser.xpath("//div[@class='postbody']")
    for post in posts
      title = post.at_xpath('h3//text()').to_s
      author = post.at_xpath("p[@class='author']//a//text()").to_s
      body = post.xpath("div[@class='content']//text()").collect do |div|
        div.to_s
      end.join("\n")
      puts '-' * 40
      puts "title: #{title}"
      puts "author: #{author}"
      puts "body:", body
    end

The first part of the output:

    ----------------------------------------
    title: Math Books
    author: Cleverbeans
    body:
    This is now the official thread for questions about math books at any level, fr\
    om high school through advanced college courses.
    I'm looking for a good vector calculus text to brush up on what I've forgotten.\
     We used Stewart's Multivariable Calculus as a baseline but I was unable to pur\
    chase the text for financial reasons at the time. I figured some things may hav\
    e changed in the last 12 years, so if anyone can suggest some good texts on thi\
    s subject I'd appreciate it.
    ----------------------------------------
    title: Re: Multivariable Calculus Text?
    author: ThomasS
    body:
    The textbooks go up in price and new pretty pictures appear. However, Calculus \
    really hasn't changed all that much.
    If you don't mind a certain lack of pretty pictures, you might try something li\
    ke Widder's Advanced Calculus from Dover. it is much easier to carry around tha\
    n Stewart. It is also written in a style that a mathematician might consider no\
    rmal. If you think that you might want to move on to real math at some point, i\
    t might serve as an introduction to the associated style of writing.

Some sites can only be reached through the "www" subdomain, so that may be what is causing the problem.
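One way to check the "www" theory is to try both spellings of the host and see which one accepts a connection. The following is only a rough sketch: the helper names candidate_hosts and first_reachable_host are made up for illustration, port 80 and the 5-second timeout are assumptions, and other network errors may need rescuing on your system.

```ruby
require 'net/http'

# Hypothetical helper: returns the host as given, followed by the same
# host with the "www." prefix added (or removed, if it was already there).
def candidate_hosts(host)
  bare = host.sub(/\Awww\./, '')
  [host, host == bare ? "www.#{bare}" : bare]
end

# Hypothetical helper: returns the first candidate that accepts a TCP
# connection on the given port, or nil if none of them does.
# Assumes plain HTTP on port 80 unless told otherwise.
def first_reachable_host(host, port = 80)
  candidate_hosts(host).find do |candidate|
    begin
      # The block only runs once the connection has been opened.
      Net::HTTP.start(candidate, port, open_timeout: 5) { true }
    rescue Errno::ECONNREFUSED, SocketError, Net::OpenTimeout
      false
    end
  end
end

# Example (opens real network connections):
#   puts first_reachable_host('forumname.net').inspect
```

If this prints the "www." variant, pass that host name to Net::HTTP instead of the bare domain.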

To make a GET request, you would want to use the Get class:

    require 'net/http'

    url = URI.parse('http://www.forum.site/')
    req = Net::HTTP::Get.new(url.path)
    res = Net::HTTP.start(url.host, url.port) {|http|
      http.request(req)
    }
    puts res.body

At some point you might also need to set the user agent, passed as an option:

    {'User-Agent' => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1'})
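The header hash above is only a fragment, so here is a sketch of how it might slot into a complete request. Net::HTTP::Get.new takes an optional hash of headers as its second argument; the helper names build_get and fetch are hypothetical, and the URL is the placeholder from the earlier snippet, not a real site.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds a GET request carrying the given User-Agent.
# Nothing is sent until the request is handed to Net::HTTP#request.
def build_get(url_string, user_agent)
  uri = URI.parse(url_string)
  # request_uri keeps any query string and is "/" when the path is empty,
  # whereas an empty uri.path would make Net::HTTP::Get.new raise.
  req = Net::HTTP::Get.new(uri.request_uri, { 'User-Agent' => user_agent })
  [uri, req]
end

# Hypothetical helper: sends the request and returns the response object.
def fetch(url_string, user_agent)
  uri, req = build_get(url_string, user_agent)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(req)
  end
end

# Example (performs a real network request):
#   ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1'
#   puts fetch('http://www.forum.site/', ua).body
```

Whether a browser-like user agent makes any difference depends on how the forum's server is configured; it will not help if the connection itself is being refused.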