wget Command: Tutorial & Examples

Download files from the web

wget is a command-line utility for downloading files from the web. It supports various protocols such as HTTP, HTTPS, and FTP, and can be used to download files from websites, servers, and other resources on the internet.

Here's the basic syntax for using wget:

wget [options] URL

options are optional flags that modify the behavior of wget. URL is the web address of the file you want to download.

Here are a few examples of how you can use wget:

To download a file from a website, you can use the following command:

wget http://www.example.com/files/example.zip

To download a file and save it with a different name, you can use the -O flag:

wget -O new_name.zip http://www.example.com/files/example.zip
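
The -O flag also accepts - as the file name, in which case wget writes the download to standard output instead of a file. The following is a minimal sketch of that usage; the function name fetch_to_stdout is a hypothetical helper introduced here for illustration, not something wget provides:

```shell
# Hypothetical helper (not part of wget): write a download to standard output.
#   -O -  send the downloaded content to standard output instead of a file
#   -q    silence the progress messages wget would otherwise print
# Argument: $1 = URL to download.
fetch_to_stdout() {
    wget -q -O - "$1"
}
```

For example, fetch_to_stdout http://www.example.com/files/example.zip > new_name.zip should leave the same file on disk as the -O new_name.zip command above, and the output can just as well be piped into another program.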

There are many other options available for wget, such as -c for continuing interrupted downloads, -t for specifying the number of retries, and -q for running wget in quiet mode.
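
These three flags can be combined in a single command. The sketch below wraps them in a hypothetical helper function named fetch_resumable; the retry count of 5 is an arbitrary value chosen for illustration:

```shell
# Hypothetical helper (not part of wget): a quiet, resumable download.
#   -c    continue a partially downloaded file instead of starting over
#   -t 5  make at most 5 attempts (5 is an arbitrary choice)
#   -q    quiet mode: print no progress output
# Argument: $1 = URL to download.
fetch_resumable() {
    wget -c -t 5 -q "$1"
}
```

Running fetch_resumable http://www.example.com/files/example.zip a second time after an interruption should pick up where the partial file ends, provided the server supports resuming downloads.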

One powerful feature of wget is the ability to mirror a complete website. A typical command looks like this:

wget --mirror -p --convert-links -P /path/to/local/directory http://www.example.com

Here's an explanation of the different flags used in this command:

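--mirror: This flag turns on the options suited to mirroring: recursive download with unlimited depth, and time-stamping so that files which have not changed are not fetched again on later runs. The downloaded files keep the directory structure of the original site.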

-p: This flag tells wget to download the required files (such as images, CSS, and JavaScript) needed to properly display the website locally.

--convert-links: This flag tells wget to convert links in the downloaded HTML files to work locally, rather than pointing to the original website.

-P: This flag specifies the local directory where wget should save the mirrored website.

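--no-parent: This flag is not part of the command above, but it is often added when mirroring a subdirectory. It tells wget not to ascend to the parent directory while downloading recursively.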
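
The same flags can be written in their long forms, which are easier to read in scripts: -p is --page-requisites and -P is --directory-prefix. The following is a sketch only; mirror_site is a hypothetical helper name, and it includes --no-parent, which is an optional addition rather than part of the original command:

```shell
# Hypothetical helper (not part of wget): mirror a site into a local directory.
# Arguments: $1 = start URL, $2 = destination directory.
#   --mirror             recursive download with time-stamping
#   --page-requisites    long form of -p: also fetch images, CSS and scripts
#   --convert-links      rewrite links so the local copy can be browsed offline
#   --no-parent          stay at or below the start URL's directory
#   --directory-prefix   long form of -P: where to save the files
mirror_site() {
    wget --mirror \
         --page-requisites \
         --convert-links \
         --no-parent \
         --directory-prefix="$2" \
         "$1"
}
```

It would be called as mirror_site http://www.example.com /path/to/local/directory.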

Mirroring a website can take a long time, especially if the site is large or has many links. To reduce the load on the server, use the -w flag to set the number of seconds wget waits between requests, and add --random-wait to vary each pause between 0.5 and 1.5 times that value. The following command waits about 2 seconds between requests:

wget --mirror -p --convert-links -w 2 --random-wait -P /path/to/local/directory https://www.example.com

If the process gets interrupted, you may want to restart it, skipping files which are already there. You can use a command like this:

wget -r --wait=1 --reject='*.bak' --no-parent -nc --compression=auto --no-check-certificate https://www.example.com/dir

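The important parameter is -nc, which skips files that already exist locally. Other useful parameters are --wait=1 to pause one second between requests, --reject to exclude files matching a pattern (quote the pattern so that the shell does not expand it), --no-parent to avoid crawling parent directories, and --compression=auto to use compressed transfers if the server supports them. Note that --no-check-certificate turns off TLS certificate verification; leave it out unless you knowingly need to download from a server with an invalid certificate.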
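
As a sketch of a more conservative variant, the hypothetical helper below (resume_crawl is a name made up for this example) keeps only the flags that matter for restarting: it quotes the --reject pattern and deliberately leaves out --no-check-certificate and --compression=auto:

```shell
# Hypothetical helper (not part of wget): restart an interrupted recursive download.
# The --reject pattern is single-quoted so the shell passes it to wget unexpanded.
# --no-check-certificate is omitted on purpose, so TLS certificates are still verified.
# Argument: $1 = URL of the directory to download.
resume_crawl() {
    wget -r --wait=1 --reject='*.bak' --no-parent -nc "$1"
}
```

Calling resume_crawl https://www.example.com/dir repeatedly should fetch only the files that are not yet present in the local copy.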

Except where otherwise noted, content on this site is licensed under a CC BY-SA 4.0 license.