How to discover a direct download URL?

Let’s say we need a link to a file hosted on some server on the Internet, and not just to copy and paste it into a browser. For example, we might want to write a program that downloads files from a service, or save the link and provide it somewhere else. Basically, our goal is to automate things and skip human interaction.

Most of the time, resources such as files are managed by a service. This means we often aren’t given direct access to a file, because access is controlled by the application. The reasons are usually simple: authorization, counting downloads, showing ads, preparing the file for download, preventing hotlinking. In general, it’s all server-side processing.


We will use GitHub as an example. Repositories hosted there can be downloaded as a ZIP file. I imagine GitHub doesn’t store ZIP archives of all repositories permanently, but generates them on demand and deletes them later to save space. With Chrome DevTools (Network tab) we can see what happens after clicking the Download ZIP button.


Actually, two requests were made. The first one was redirected (HTTP status 302) to the real file address. In the response headers of the first request we can check the redirect location (the Location header).


We can repeat the request in the terminal using curl. Right-clicking the request in DevTools (Copy -> Copy as cURL) gives us a command ready to paste into the terminal. Running it in this case prints:

<html><body>You are being <a href="">redirected</a>.</body></html>

For readability we can drop all the headers from the previous command and add the -I option at the end, which makes curl show only the response headers:

$ curl -I
HTTP/1.1 302 Found
Date: Sun, 19 Mar 2017 14:36:50 GMT
Content-Type: text/html; charset=utf-8
Status: 302 Found
Cache-Control: no-cache
Vary: X-PJAX

Adding the -L option makes curl follow the redirect:

$ curl -I -L

HTTP/1.1 200 OK
Content-Length: 518982
Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'
Strict-Transport-Security: max-age=31536000
Vary: Authorization,Accept-Encoding
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
ETag: "4e10a37c09e7f2f808c0aed1bba92b4c86d3d5fd"
Content-Type: application/zip
Content-Disposition: attachment;

Here we can see some file properties, such as name, size, and type. We can also retrieve only a specific header by piping the output to grep:

$ curl -I | grep -Fi Location

To download the file, use -L (follow redirects) and -O (write output to a file):

$ curl -L -O

Code example

To achieve the above, I will use Ruby:
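A minimal sketch of this idea, shelling out to curl with backticks. It assumes curl is installed and on the PATH; the helper names `extract_location` and `redirect_location_via_curl` are made up for illustration, and the URL is whatever the caller passes in:

```ruby
require 'shellwords'

# Returns the value of the Location header found in raw HTTP header text
# (such as the output of `curl -I`), or nil when there is none.
def extract_location(header_text)
  line = header_text.each_line.find { |l| l =~ /\Alocation:/i }
  line && line.split(':', 2).last.strip
end

# Runs `curl -s -I <url>` through backticks and returns the redirect target.
# Hypothetical helper; requires curl to be available on the system.
def redirect_location_via_curl(url)
  headers = `curl -s -I #{Shellwords.escape(url)}`
  extract_location(headers)
end

# Usage (performs a real request; the URL is a placeholder):
# puts redirect_location_via_curl('https://example.com/some/download')
```

The URL is passed through `Shellwords.escape` so that special characters in it are not interpreted by the shell.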

What I did was run the same curl commands, but from Ruby. Surrounding text with backticks (`…`) is one way to run system commands from Ruby code. It captures the command’s output as a string, which can be assigned to a variable. It’s not recommended in this case, because we can’t be sure curl will be available on every system where this code runs.

This task is simple with Net::HTTP, Ruby’s built-in HTTP client:
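A minimal sketch of how this might look, assuming a HEAD request is enough to trigger the redirect (some services may only redirect on GET). The helper names `location_from` and `redirect_location` are hypothetical, and the URL in the usage comment is a placeholder:

```ruby
require 'net/http'
require 'uri'

# Returns the Location header if the response is a redirect (3xx), else nil.
def location_from(response)
  response['Location'] if response.is_a?(Net::HTTPRedirection)
end

# Sends a HEAD request (similar to `curl -I`) without following redirects
# and returns the redirect target, or nil if the server did not redirect.
def redirect_location(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    location_from(http.head(uri.request_uri))
  end
end

# Usage (performs a real request; the URL is a placeholder):
# puts redirect_location('https://example.com/some/download')
```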

Now we can do whatever we want with the link string from response['Location']. This method isn’t guaranteed to work with every service, but it’s enough for simple cases like this one. I didn’t show how to download files from Ruby because this post isn’t about that, and I have never actually done it, so maybe there will be another post about it.
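One reason a single Location lookup may not be enough is that a service can redirect more than once. As a rough sketch under that assumption, the lookup can be repeated until a non-redirect response arrives. The names `head_request` and `resolve_final_url`, the `limit` of 5, and the injectable `fetch` parameter are all choices made for this illustration:

```ruby
require 'net/http'
require 'uri'

# Default fetcher: a single HEAD request, no redirect following.
def head_request(uri)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end
end

# Follows Location headers until a non-redirect response is reached.
# Returns the final URL as a string; raises if `limit` requests all redirect.
def resolve_final_url(url, limit: 5, fetch: method(:head_request))
  uri = URI.parse(url)
  limit.times do
    response = fetch.call(uri)
    location = response['Location']
    return uri.to_s unless response.is_a?(Net::HTTPRedirection) && location
    uri = URI.join(uri.to_s, location) # also handles relative Location values
  end
  raise "too many redirects (more than #{limit})"
end

# Usage (performs real requests; the URL is a placeholder):
# puts resolve_final_url('https://example.com/some/download')
```

Passing the fetcher in as a parameter keeps the redirect logic separate from the network call, so it can be tried out with canned responses.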

