URL-encoding problem with curly braces

I'm running into a problem fetching data from GitHub Archive.

The main problem is encoding the {} and .. in the URL. Maybe I've misread the GitHub API, or I don't understand the encoding correctly.

    require 'open-uri'
    require 'faraday'

    conn = Faraday.new(:url => 'http://data.githubarchive.org/') do |faraday|
      faraday.request  :url_encoded             # form-encode POST params
      faraday.response :logger                  # log requests to STDOUT
      faraday.adapter  Faraday.default_adapter  # make requests with Net::HTTP
    end

    #query = '2015-01-01-15.json.gz'       # this one works!!
    query = '2015-01-01-{0..23}.json.gz'   # this one doesn't work
    encoded_query = URI.encode(query)
    response = conn.get(encoded_query)
    p response.body

The GitHub Archive example for retrieving a range of files looks like this:

 wget http://data.githubarchive.org/2015-01-01-{0..23}.json.gz 

The {0..23} part is expanded by your shell before wget is ever invoked: bash rewrites the single argument into 24 separate URLs, one per value in the range, so wget is handed a list of URLs to fetch. You can see this by executing the command with the -v flag, which returns:

    wget -v http://data.githubarchive.org/2015-01-01-{0..1}.json.gz
    --2015-06-11 13:31:07--  http://data.githubarchive.org/2015-01-01-0.json.gz
    Resolving data.githubarchive.org... 74.125.25.128, 2607:f8b0:400e:c03::80
    Connecting to data.githubarchive.org|74.125.25.128|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 2615399 (2.5M) [application/x-gzip]
    Saving to: '2015-01-01-0.json.gz'

    2015-01-01-0.json.gz  100%[==========================>]   2.49M  3.03MB/s   in 0.8s

    2015-06-11 13:31:09 (3.03 MB/s) - '2015-01-01-0.json.gz' saved [2615399/2615399]

    --2015-06-11 13:31:09--  http://data.githubarchive.org/2015-01-01-1.json.gz
    Reusing existing connection to data.githubarchive.org:80.
    HTTP request sent, awaiting response... 200 OK
    Length: 2535599 (2.4M) [application/x-gzip]
    Saving to: '2015-01-01-1.json.gz'

    2015-01-01-1.json.gz  100%[==========================>]   2.42M   867KB/s   in 2.9s

    2015-06-11 13:31:11 (867 KB/s) - '2015-01-01-1.json.gz' saved [2535599/2535599]

    FINISHED --2015-06-11 13:31:11--
    Total wall clock time: 4.3s
    Downloaded: 2 files, 4.9M in 3.7s (1.33 MB/s)

In other words, each value in the range is substituted into the URL, and wget then fetches each resulting URL in turn. This behavior isn't obvious and isn't well documented, but you can find mentions of it here and there, for example in "All the Wget Commands You Should Know":

    7. Download a list of sequentially numbered files from a server
       wget http://example.com/images/{1..20}.jpg
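Ruby has no shell-style brace expansion, but the same list of sequentially numbered URLs can be built with a Range. A minimal sketch, reusing the host and file pattern from the wget example above:

```ruby
# Build the 24 hourly URLs that bash's {0..23} expansion would hand to wget.
urls = (0..23).map { |hour| "http://data.githubarchive.org/2015-01-01-#{hour}.json.gz" }

puts urls.first  # http://data.githubarchive.org/2015-01-01-0.json.gz
puts urls.last   # http://data.githubarchive.org/2015-01-01-23.json.gz
```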

To do what you want, you need to iterate over the range in Ruby yourself, using something like this untested code:

    0.upto(23) do |i|
      response = conn.get("/2015-01-01-#{i}.json.gz")
      p response.body
    end

To get a better sense of what's going wrong, let's start with the example given in the GitHub Archive documentation:

 wget http://data.githubarchive.org/2015-01-01-{0..23}.json.gz 

The thing to note here is that {0..23} is automatically expanded by bash. You can see that by running the following command:

    echo {0..23}
    > 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
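The closest Ruby analogue to that bash expansion is a Range converted to an Array; a quick sketch:

```ruby
# A Range enumerates the same values bash produces for {0..23}.
hours = (0..23).to_a
puts hours.join(' ')  # 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
```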

This means wget isn't invoked once, but a total of 24 times. The problem you're running into is that Ruby doesn't expand {0..23} automatically the way bash does; instead, you're making a literal request to http://data.githubarchive.org/2015-01-01-{0..23}.json.gz, and that file doesn't exist.
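You can see why the encoded query in the question fails by inspecting what percent-encoding does to the braces. URI.encode is obsolete (it was removed in Ruby 3.0), so this sketch uses ERB::Util.url_encode as a stand-in; either way the braces become percent escapes rather than a range:

```ruby
require 'erb'

query = '2015-01-01-{0..23}.json.gz'
encoded = ERB::Util.url_encode(query)
puts encoded  # 2015-01-01-%7B0..23%7D.json.gz -- a literal filename the server doesn't have
```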

Instead, you need to loop over 0..23 yourself and make one request per file:

    (0..23).each do |n|
      query = "2015-01-01-#{n}.json.gz"
      response = conn.get(query)  # no escaping needed: the name contains only URL-safe characters
      p response.body
    end