Tag: Mechanize

How do I get data with Mechanize and Nokogiri?: I am building an application that fetches HTML from http://www.screener.in/. I can enter a company name such as "Atul Auto Ltd" and submit it, and I want to scrape the following details from the next page: "CMP / BV" and "CMP". I am using this code:
    require 'mechanize'
    require 'rubygems'
    require 'nokogiri'
    Company_name = 'Atul Auto Ltd.'
    agent = Mechanize.new
    page = agent.get('http://www.screener.in/')
    form = agent.page.forms[0]
    print agent.page.forms[0].fields
    agent.page.forms[0]["q"] = Company_name
    button = agent.page.forms[0].button_with(:value => "Search Company")
    pages = agent.submit(form, button)
    puts pages.at('.//*[@id="top"]/div[3]/div/table/tbody/tr/td[11]') # not getting any output.
The code takes me to the right page, but I do not know how to query it to get the data I need. I have tried different things without success. If possible, could someone point me to a good tutorial that explains how to scrape a specific class from an HTML page? The XPath of the first "CMP / BV" is:
    //*[@id="top"]/div[3]/div/table/tbody/tr/td[11]
but it does not give any output.

Can Mechanize read AJAX? (Ruby): Can I use Mechanize in Ruby to get the data/text that a page displays via AJAX? Or is there another scripting gem that would let me do this?

Unable to install Mechanize for Ruby on a Mac: I am trying to install mechanize with Ruby 1.8.7 on Mac OS X 10.7.3. The problem is with one of its dependencies, nokogiri. I have seen other posts about installing Xcode, which I have done (version 4.3.2). This is the error I receive. Thanks in advance.
    sudo gem install mechanize
    Building native extensions. This could take a while...
    ERROR: Error installing mechanize:
    ERROR: Failed to build gem native extension.
    /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby extconf.rb
    mkmf.rb can't find header files for ruby at /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/ruby.h
    Gem files will remain installed in /Library/Ruby/Gems/1.8/gems/nokogiri-1.5.2 for inspection.
    Results logged to /Library/Ruby/Gems/1.8/gems/nokogiri-1.5.2/ext/nokogiri/gem_make.out

Data scraping: creating and ordering multiple arrays: We are trying to scrape the course names, qualifications and course durations, and store each of them in a separate array. The code below pulls all of them, but the results seem to come out in a random order, and some parts may be sorted by page. Could anyone help?
    require 'mechanize'
    mechanize = Mechanize.new
    @duration_array = []
    @qual_array = []
    @courses_array = []
    page = mechanize.get('http://search.ucas.com/search/results?Vac=2&AvailableIn=2016&IsFeatherProcessed=True&page=1&providerids=41')
    page.search('div.courseinfoduration').each do |x|
      puts x.text.strip
    end
    page.search('div.courseinfooutcome').each do |y|
      puts y.text.strip
    end
    while next_page_link = page.at('.pager a[text()=">"]')
      page = mechanize.get(next_page_link['href'])
      page.search('div.courseinfoduration').each do |x|
        name = x
        @duration_array.push(name)
        puts x.text.strip
      end
    end
    while next_page_link = page.at('.pager a[text()=">"]')
      page […]

Keeping a Mechanize page across request boundaries: I am writing a Ruby application that can post comments to a remote blog on behalf of a user. My problem is that I have to use the same page in the controller's post method, in order to keep the session alive and fill in the captcha: app/controllers/comment_controller.rb
    require 'mechanize'
    class CommentController < ApplicationController
      def new
        agent = Mechanize.new
        @page = agent.get('http://blog.example.com')
        @captcha_src = @page.search("//div[@id='recaptcha_image']").search("//img")[1].attribute("src")
        #etc.
      end
      def post_comment
        # insert captcha, username, password + text into the form
        agent.submit(@page.form[0], @page.form[0].buttons.submitbutton)
        # Problem: page instance variable doesn't exist anymore
      end
    end
I have already tried saving the page instance variable in Rails.cache, but a Mechanize page cannot be marshalled to a string.

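Nokogiri + Mechanize: CSS selector by text: I am new to Nokogiri and so far most comfortable with CSS selectors. I am trying to parse information from a table; below are a sample of the table and the code I am using. I am stuck on the appropriate if statement, as it seems to return the entire contents of the table. Table: … SPECIFIC TEXT What I want. My script (if SPECIFIC TEXT is found in the table, it returns every "div.c2 span.data" value, so I have misunderstood either the do loop or the if statement):
    data = []
    page.agent.get(url)
    page.search('div.row').each do |row_data|
      if (row_data.search('div.c1:contains("/SPECIFIC TEXT/")').text.strip
        temp = row_data.search('div.c2 span.data').text.strip
        data << temp
      end
    end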

Using a login form with Mechanize: I know there are similar posts on Stack Overflow, but I cannot figure out what is wrong with my attempt.
    # login to the site
    mech.get(base_URL) do |page|
      l = page.form_with(:action => "/site/login/") do |f|
        username_field = f.field_with(:name => "LoginForm[username]")
        username_field.value = userName
        password_field = f.field_with(:name => "LoginForm[password]")
        password_field.value = password
        f.submit
      end
    end
This is my error:
    rb:18:in `block (2 levels) in ': undefined method `field_with' for nil:NilClass (NoMethodError)
This is the HTML: Fields with * are required. Email […]

Ruby Mechanize Zlib::BufError: I am not sure why I am getting this error from the Mechanize gem now; I have been using it for a while without any problems. My script stops at random and throws the following error:
    /Users/username/.rvm/gems/ruby-1.9.3-p194/gems/mechanize-2.5.1/lib/mechanize/http/agent.rb:798:in `rescue in response_content_encoding': error handling content-encoding gzip: buffer error (Zlib::BufError) (Mechanize::Error)
Any ideas?

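Using Ruby, Nokogiri and Mechanize to find the table with the most rows in an array:
    @p = mechanize.get(url)
    tables = @p.search('table.someclass')
I am basically going through about 200 pages and putting the tables into an array, and the only way to sort them is to find the table with the most rows. So I want to be able to look at each item in the array and pick the first item with the most rows. I have been trying to use max_by, but that does not work because I need to search inside the table that is each array item in order to get tr.count.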