Rails: dynamic robots.txt with ERB

I'm trying to render a dynamic text file (robots.txt) in my Rails (3.0.10) app, but it keeps being rendered as HTML (according to the console).

    match 'robots.txt' => 'sites#robots'

Controller:

    class SitesController < ApplicationController
      respond_to :html, :js, :xml, :css, :txt

      def robots
        @site = Site.find_by_subdomain # blah blah
      end
    end

app/views/sites/robots.txt.erb:

    Sitemap: /sitemap.xml

But when I visit http://www.example.com/robots.txt I get a blank page/source, and the log shows:

    Started GET "/robots.txt" for 127.0.0.1 at 2011-11-21 11:22:13 -0500
      Processing by SitesController#robots as HTML
      Site Load (0.4ms) SELECT `sites`.* FROM `sites` WHERE (`sites`.`subdomain` = 'blah') ORDER BY created_at DESC LIMIT 1
    Completed 406 Not Acceptable in 828ms

Any idea what I'm doing wrong?

Note: I added this to config/initializers/mime_types, because Rails was complaining that it didn't know what the .txt MIME type was:

    Mime::Type.register_alias "text/plain", :txt

Note 2: I did remove the stock robots.txt from the public directory.
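For reference, the kind of output a dynamic robots.txt template is meant to produce can be sketched outside Rails with Ruby's standard ERB library. This is only an illustration: the template text, the `host` parameter, and the `render_robots` helper are assumptions made for this sketch and are not part of the application above.

```ruby
require 'erb'

# Hypothetical template text; in the Rails app the equivalent would live in
# app/views/sites/robots.txt.erb.
ROBOTS_TEMPLATE = <<~TEMPLATE
  User-agent: *
  Disallow:
  Sitemap: http://<%= host %>/sitemap.xml
TEMPLATE

# Renders the template for one host name. `host` is an assumed parameter;
# the template reads it through the method's binding.
def render_robots(host)
  ERB.new(ROBOTS_TEMPLATE).result(binding)
end

puts render_robots('www.example.com')
```

Inside Rails the view layer does this rendering for you; the open question in this thread is only how to get Rails to pick the plain-text template and format.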

I think the problem is that if you define respond_to in your controller, you have to use respond_with in the action:

    def robots
      @site = Site.find_by_subdomain # blah blah
      respond_with @site
    end

Also, try explicitly specifying the .erb file to be rendered:

    def robots
      @site = Site.find_by_subdomain # blah blah
      render 'sites/robots.txt.erb'
      respond_with @site
    end

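Note: this is a repost from Coderwall.

Following some advice from similar answers on Stack Overflow, I currently use the following solution to render a dynamic robots.txt based on the request's host parameter.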

Routing

    # config/routes.rb
    #
    # Dynamic robots.txt
    get 'robots.:format' => 'robots#index'

Controller

    # app/controllers/robots_controller.rb
    class RobotsController < ApplicationController
      # No layout
      layout false

      # Render a robots.txt file based on whether the request
      # is performed against a canonical url or not
      # Prevent robots from indexing content served via a CDN twice
      def index
        if canonical_host?
          render 'allow'
        else
          render 'disallow'
        end
      end

      private

      def canonical_host?
        request.host =~ /plugingeek\.com/
      end
    end

Views

Based on request.host, we render one of two different .text.erb view files.

Allowing robots

    # app/views/robots/allow.text.erb # Note the .text extension

    # Allow robots to index the entire site except some specified routes
    # rendered when site is visited with the default hostname
    # http://www.robotstxt.org/

    # ALLOW ROBOTS
    User-agent: *
    Disallow:

Banning spiders

    # app/views/robots/disallow.text.erb # Note the .text extension

    # Disallow robots to index any page on the site
    # rendered when robot is visiting the site
    # via the Cloudfront CDN URL
    # to prevent duplicate indexing
    # and search results referencing the Cloudfront URL

    # DISALLOW ROBOTS
    User-agent: *
    Disallow: /
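The host check and the choice between the two bodies can be tried outside Rails with a small plain-Ruby sketch. It makes one change that is an assumption of this sketch and not part of the controller above: the pattern is anchored, so a host that merely contains "plugingeek.com" (for example as a prefix of another domain) is not treated as canonical. The constant and method names are illustrative only.

```ruby
# Bodies mirroring the two views above (comments trimmed).
ALLOW_TEMPLATE = "# ALLOW ROBOTS\nUser-agent: *\nDisallow:\n"
DISALLOW_TEMPLATE = "# DISALLOW ROBOTS\nUser-agent: *\nDisallow: /\n"

# Assumption: plugingeek.com and its subdomains count as canonical.
# \A and \z anchor the match to the whole host string.
CANONICAL_HOST = /\A(?:[a-z0-9-]+\.)*plugingeek\.com\z/i

def canonical_host?(host)
  CANONICAL_HOST.match?(host.to_s)
end

# Returns the robots.txt body that would be served for the given host.
def robots_body(host)
  canonical_host?(host) ? ALLOW_TEMPLATE : DISALLOW_TEMPLATE
end

puts robots_body('www.plugingeek.com')
```

With the unanchored pattern from the controller, a host such as plugingeek.com.example.org would also match; whether that matters depends on which host names can actually reach the application.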

Specs

Testing this setup with RSpec and Capybara is quite easy, too.

    # spec/features/robots_spec.rb
    require 'spec_helper'

    feature "Robots" do
      context "canonical host" do
        scenario "allow robots to index the site" do
          Capybara.app_host = 'http://www.plugingeek.com'
          visit '/robots.txt'
          Capybara.app_host = nil

          expect(page).to have_content('# ALLOW ROBOTS')
          expect(page).to have_content('User-agent: *')
          expect(page).to have_content('Disallow:')
          expect(page).to have_no_content('Disallow: /')
        end
      end

      context "non-canonical host" do
        scenario "deny robots to index the site" do
          visit '/robots.txt'

          expect(page).to have_content('# DISALLOW ROBOTS')
          expect(page).to have_content('User-agent: *')
          expect(page).to have_content('Disallow: /')
        end
      end
    end

    # This would be the resulting docs
    # Robots
    #   canonical host
    #     allow robots to index the site
    #   non-canonical host
    #     deny robots to index the site

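As a last step, you might need to remove the static public/robots.txt in the public folder if it's still present.

I hope you find this useful. Feel free to comment and help improve this technique further.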

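One solution that works in Rails 3.2.3 (not sure about 3.0.10) is as follows:

1) Name your template file robots.text.erb (emphasis on text rather than txt)

2) Set up your route like this: match '/robots.:format' => 'sites#robots'

3) Leave your action as is (you can remove the respond_with in the controller)

    def robots
      @site = Site.find_by_subdomain # blah blah
    end

This solution also eliminates the need to explicitly specify txt.erb in the render call mentioned in the accepted answer.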

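I don't like the idea of robots.txt requests reaching my web server.

If you are using Nginx/Apache as your reverse proxy, static files are handled much faster by the proxy than requests that have to reach Rails itself.

This is cleaner, and I think it is faster as well.

Try the following setup.

nginx.conf (production)

    location /robots.txt {
      alias /path-to-your-rails-public-directory/production-robots.txt;
    }

nginx.conf (staging)

    location /robots.txt {
      alias /path-to-your-rails-public-directory/stage-robots.txt;
    }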
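The two aliases above assume that production-robots.txt and stage-robots.txt already exist in the public directory. A minimal sketch of a script that writes them is shown below. The file contents (allow everything in production, disallow everything on staging) and the `write_robots_files` helper are assumptions for illustration; the answer does not say what the files contain.

```ruby
require 'fileutils'

# Assumed policies: production is crawlable, staging is not.
ROBOTS_BY_ENV = {
  'production' => "User-agent: *\nDisallow:\n",
  'stage'      => "User-agent: *\nDisallow: /\n"
}.freeze

# Writes <env>-robots.txt for each environment into public_dir and
# returns the list of paths that were written.
def write_robots_files(public_dir)
  FileUtils.mkdir_p(public_dir)
  ROBOTS_BY_ENV.map do |env, body|
    path = File.join(public_dir, "#{env}-robots.txt")
    File.write(path, body)
    path
  end
end
```

Calling `write_robots_files` with the path of the Rails public directory, for example as a deploy step, would produce the two files the nginx configuration points at.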

For my Rails projects, I usually have a separate controller for the robots.txt response:

    class RobotsController < ApplicationController
      layout nil

      def index
        host = request.host
        if host == 'lawc.at' # live server
          render 'allow.txt', :content_type => "text/plain"
        else # test server
          render 'disallow.txt', :content_type => "text/plain"
        end
      end
    end

Then I have two views named disallow.txt.erb and allow.txt.erb.

In my routes.rb I have:

    get "robots.txt" => 'robots#index'
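The same host-based decision can also be written as a bare Rack-style callable, which makes the response shape (status, headers, body) explicit. This is a sketch in plain Ruby, not the controller above: the `ROBOTS_APP` name and the two response bodies are assumptions, since the contents of allow.txt.erb and disallow.txt.erb are not shown in the answer.

```ruby
# Assumed contents of the allow/disallow views.
ALLOW_BODY = "User-agent: *\nDisallow:\n"
DISALLOW_BODY = "User-agent: *\nDisallow: /\n"
LIVE_HOST = 'lawc.at'

# A Rack-style endpoint: takes the request environment hash and returns
# [status, headers, body]. The port, if any, is stripped from HTTP_HOST.
ROBOTS_APP = lambda do |env|
  host = env.fetch('HTTP_HOST', '').split(':').first.to_s.downcase
  body = host == LIVE_HOST ? ALLOW_BODY : DISALLOW_BODY
  [200, { 'content-type' => 'text/plain' }, [body]]
end

status, headers, body = ROBOTS_APP.call('HTTP_HOST' => 'lawc.at')
puts status
puts headers['content-type']
puts body.first
```

Any host other than the live one, including a missing Host header, falls through to the disallow body, which matches the controller's else branch.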