Imagine that you go on a road trip with some friends. One of your friends has a video camera and films the best moments of the trip. After the trip, your friend edits the footage and creates a movie with the very best moments. Your friend could then burn a CD (or a DVD) containing the movie, and send it to you via post-mail. However, that solution involves:
- wasting time to go to the mail office.
- spending money on CD’s, stamps, and envelopes.
We could do better than that: why not compress the movie into one or more ZIP files and upload them to a hosting website such as Rapidshare? Bandwidth is cheap these days, and downloading involves only a few mouseclicks. Your friend uploads the ZIP files to a hosting website and sends you the list of URL’s via email, including the size of each file next to each URL. Let us assume that:
- in this hosting website, you need to wait 1 minute for each MB you downloaded in the previous file download.
- you need to download all ZIP files before you can unzip them and watch the movie your friend created.
You then think: “in which order should I download the files”? This is a decision problem under certainty. Let us try to model it!
Let us say that we have a total of files to download, where . We consider the following variables and parameters (and respective units):
- : size of the -th file (unit: Megabytes).
- : time it takes to download the -th file (unit: seconds).
- : time one needs to wait after downloading the -th file (unit: seconds).
- : downloading speed (unit: Megabytes per second)
for all , where . The time it takes to download the -th file is
and the time one needs to wait after downloading the -th file is
The total downloading time is the sum of each file’s downloading time plus the time one needs to wait after downloading each file. Note that after we download the last file, we can unzip the files and obtain the full-length movie. Therefore, when counting the total downloading time, we do not need to include the waiting time of the last downloaded file. Let us denote by the index of the last downloaded file. Thus, the total downloading time is
which can also be written as
or, then, as follows
Optimal Downloading Strategies
Suppose that our objective is to minimize the total downloading time . Let us introduce the cost function defined by
Note that our decision variable is , i.e., our goal is to find the value of that minimizes the total downloading time . Note also that:
- is the “fixed cost”.
- is the “variable cost”.
We cannot “control” the fixed cost, but we can try to minimize the variable cost. The minimal cost is thus
Hence, we can conclude that the optimal is
i.e., we should leave the biggest of all files for last! This is quite intuitive. Given that
- the downloading speed, , is assumed to be constant.
- we need to download all the files.
we can conclude that the only way to minimize the total downloading time is to minimize the total waiting time, and the way to do that is to download the biggest file at the end.
Alternatively, we could have used the following cost function
where cost equals waiting time (i.e., we ignore the fixed cost). The optimal solution will be the same as before
and the only difference will be that the minimum cost will now be the minimum waiting time, not the minimum total downloading time.
The solution we have computed only tells us which file to download in the -th download. The solution tells us nothing on which files to download on the previous downloads. We can then conclude that:
- there are distinct ways of downloading the files.
- out of those ways, of them are optimal.
If all the files have the same size (i.e., if ), then there are optimal ways of downloading them. In other words, when all files have the same size, the order in which one downloads them is irrelevant, which is somewhat obvious.
This problem could perhaps be generalized. Any ideas?
DISCLAIMER: please note that downloading copyrighted material is illegal under U.S. law. I do not advocate downloading copyrighted material.
Tags: Operations Research