Different robots.txt for staging server on Heroku

I have staging and production apps on Heroku.

For crawlers, I set up a robots.txt file.

After that, I received this message from Google:

Dear Webmaster: The host name of your site, https://www.myapp.com/, does not match any of the "Subject Names" in your SSL certificate, which were:
*.herokuapp.com
herokuapp.com

Googlebot read the robots.txt on my staging app and sent this message, because I had not set up anything to prevent crawlers from reading the file.

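So what I am thinking of is changing the .gitignore file between staging and production, but I can't figure out how to do this.

What is the best practice for implementing this?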

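EDIT

I googled this and found this article: http://goo.gl/2ZHal

The article says to set up basic Rack authentication, after which you no longer need to care about robots.txt.

I didn't know that basic auth can keep Googlebot out. It seems this solution is better than manipulating the .gitignore file.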
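To illustrate why basic auth keeps crawlers out, here is a minimal plain-Ruby sketch of a Rack-style middleware that answers 401 unless valid credentials are sent. The class name `StagingBasicAuth`, the realm, and the `staging`/`secret` credentials are made up for this example; it is a hand-rolled stand-in, not the code from the linked article. In a real app, Rack's own `Rack::Auth::Basic` middleware would be the usual choice.

```ruby
# Illustrative stand-in for Rack::Auth::Basic; names and credentials are hypothetical.
class StagingBasicAuth
  def initialize(app, username:, password:)
    @app = app
    @expected = [username, password]
  end

  def call(env)
    return @app.call(env) if authorized?(env['HTTP_AUTHORIZATION'])

    # Without credentials every client, crawlers included, stops here.
    [401,
     { 'Content-Type' => 'text/plain', 'WWW-Authenticate' => 'Basic realm="Staging"' },
     ["Authentication required\n"]]
  end

  private

  def authorized?(header)
    scheme, encoded = header.to_s.split(' ', 2)
    return false unless scheme.to_s.casecmp('basic').zero? && encoded

    # 'm' is the Base64 directive; credentials arrive as "user:password".
    encoded.unpack1('m').split(':', 2) == @expected
  end
end

# Stand-in for the real application.
inner_app = ->(_env) { [200, { 'Content-Type' => 'text/plain' }, ["staging site\n"]] }
app = StagingBasicAuth.new(inner_app, username: 'staging', password: 'secret')

anonymous = app.call({})
token = ['staging:secret'].pack('m0')
signed_in = app.call({ 'HTTP_AUTHORIZATION' => "Basic #{token}" })

puts anonymous[0]
puts signed_in[0]
```

Since the crawler never gets past the 401, it cannot fetch robots.txt or any page on the staging host. A production-grade version would also compare credentials in constant time and read them from configuration rather than source code.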

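What about serving /robots.txt dynamically from a controller action instead of a static file? Depending on the environment, you can then allow or disallow search engines from indexing your application.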
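A rough sketch of that idea follows. The module name `RobotsPolicy`, the route, and the controller shown in the comments are assumptions for illustration, not part of the question; the Rails wiring is left as comments so that the selection logic itself runs as plain Ruby.

```ruby
# Hypothetical helper: picks the robots.txt body for a given environment name.
module RobotsPolicy
  ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow permits everything
  BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # "/" blocks the whole site

  def self.body_for(environment)
    environment.to_s == 'production' ? ALLOW_ALL : BLOCK_ALL
  end
end

# In a Rails 3 app, a controller action could return this body, for example:
#
#   # config/routes.rb (route and controller names are assumptions)
#   match '/robots.txt' => 'robots#show'
#
#   class RobotsController < ApplicationController
#     def show
#       render :text => RobotsPolicy.body_for(Rails.env), :content_type => 'text/plain'
#     end
#   end

puts RobotsPolicy.body_for('staging')
```

This only helps if the staging app runs under a different Rails environment than production. If both apps run with `RAILS_ENV=production`, the helper would need some other signal, such as a dedicated environment variable. A static public/robots.txt would also have to be removed so that the request reaches the route.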

A great solution with Rails 3 is to use Rack. Here is a great post that outlines the process: Serving Different Robots.txt Using Rack. To summarize, you add this to your routes.rb:

    # config/routes.rb
    require 'robots_generator' # Rails 3 does not autoload files in lib
    match "/robots.txt" => RobotsGenerator

Then create a new file at lib/robots_generator.rb:

    # lib/robots_generator.rb
    class RobotsGenerator
      # Use the config/robots.txt in production.
      # Disallow everything for all other environments.
      # http://avandamiri.com/2011/10/11/serving-different-robots-using-rack.html
      def self.call(env)
        body = if Rails.env.production?
          File.read Rails.root.join('config', 'robots.txt')
        else
          "User-agent: *\nDisallow: /"
        end

        # Heroku can cache content for free using Varnish.
        headers = { 'Cache-Control' => "public, max-age=#{1.month.seconds.to_i}" }

        [200, headers, [body]]
      rescue Errno::ENOENT
        [404, {}, ['# A robots.txt is not configured']]
      end
    end

Finally, make sure to move robots.txt into your config folder (or wherever you specify in your RobotsGenerator class).

