wget Command: Tutorial & Examples
Download files from the web
wget is a command-line utility for downloading files from the web. It supports HTTP, HTTPS, and FTP, and can be
used to download files from websites, servers, and other resources on the internet.
Here's the basic syntax for using wget:
wget [options] URL
URL is the web address of the file you want to download, and options are optional flags that modify the
behavior of wget.
Here are a few examples of how you can use wget:
To download a file from a website, you can use the following command:
wget http://www.example.com/files/example.zip
To download a file and save it under a different name, use the -O flag:
wget -O new_name.zip http://www.example.com/files/example.zip
wget has many other options, such as -c for continuing interrupted downloads, -t for setting the number of
tries, and -q for running wget in quiet mode.
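As a rough sketch of how these three options combine, the shell function below resumes a partial download, limits the number of tries, and suppresses progress output. The function name fetch_quietly and the value of 5 tries are illustrative assumptions, not part of wget.

```shell
# Hypothetical helper: resume a partial download quietly with a limited number of tries.
# -c  continue a partially downloaded file instead of starting over
# -t  number of tries (5 is an arbitrary choice for this sketch)
# -q  quiet mode: suppress wget's output
fetch_quietly() {
    wget -c -t 5 -q "$1"
}

# Example call, using the URL from the examples above:
# fetch_quietly http://www.example.com/files/example.zip
```

Because -q hides all output, checking the exit status of the function is the only way to tell whether the download succeeded.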
One powerful feature of wget is the ability to mirror a complete website.
Here's the basic syntax:
wget --mirror -p --convert-links -P /path/to/local/directory http://www.example.com
Here's an explanation of the different flags used in this command:
--mirror: Turns on the options suited to mirroring (recursive download with unlimited depth, plus timestamp
checking), so the directory structure of the original site is replicated locally.
-p: Downloads the files needed to display each page properly offline, such as images, CSS, and JavaScript.
--convert-links: Converts links in the downloaded HTML files so they work locally, rather than pointing to the
original website.
-P: Specifies the local directory where wget should save the mirrored website.
--no-parent: Not used in the command above, but often added to it. It tells wget not to ascend to the parent
directory while downloading recursively.
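The flags above can be combined with --no-parent in a small wrapper. This is only a sketch: the function name mirror_site and its argument order are assumptions made for illustration, and the long option names --page-requisites and --directory-prefix are the spelled-out forms of -p and -P.

```shell
# Hypothetical wrapper around the mirror command described above.
# $1 = URL to mirror, $2 = local directory to save into
mirror_site() {
    wget --mirror \
         --page-requisites \
         --convert-links \
         --no-parent \
         --directory-prefix="$2" \
         "$1"
}

# Example call (placeholder path, as in the command above):
# mirror_site http://www.example.com/docs/ /path/to/local/directory
```

With --no-parent, a call such as the one in the comment stays below /docs/ instead of wandering up into the rest of the site.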
Keep in mind that mirroring a website can take a long time, especially if the site is large or has many
links. You may want to use the -w flag to set the number of seconds wget waits between requests, or add the
--random-wait flag to make wget vary that wait randomly. This helps avoid overloading the server with requests.
wget --mirror -p --convert-links -w 2 --random-wait -P /path/to/local/directory https://www.example.com
If the process gets interrupted, you may want to restart it, skipping files which are already there. You can use a command like this:
wget -r --wait=1 --reject='*.bak' --no-parent -nc --compression=auto --no-check-certificate https://www.example.com/dir
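The important parameter is -nc, which skips files that have already been downloaded. The command uses -r rather
than --mirror because --mirror enables timestamping (-N), which wget does not allow together with -nc. Other
useful parameters are --reject to exclude certain files, --no-parent to avoid crawling parent directories, and
--compression=auto to request compressed transfers if the server supports them. Note that --no-check-certificate
disables TLS certificate verification, so use it only for hosts you trust.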
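If interruptions are frequent, the restart can be automated. The following is a minimal sketch, assuming that a non-zero exit status from wget means the run should be repeated. The function name resume_download and the default of 3 attempts are hypothetical. Note that wget also exits non-zero when individual pages return errors such as 404, so a site with broken links would use up all attempts.

```shell
# Hypothetical helper: rerun a recursive download until wget exits successfully
# or the attempt limit is reached.
# $1 = URL, $2 = maximum number of attempts (defaults to 3, an arbitrary choice)
resume_download() {
    url=$1
    max_attempts=${2:-3}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # -nc skips files that already exist locally, so each rerun only
        # fetches what is still missing. The reject pattern is quoted so the
        # shell passes it to wget unchanged.
        if wget -r --wait=1 --reject='*.bak' --no-parent -nc "$url"; then
            return 0
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example call:
# resume_download https://www.example.com/dir 5
```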