Learn how to create archives using tar.
- [Instructor] Hi. Welcome to this section, The Backup Plan. In the previous section, we learned how to interact with the web using shell scripts to automate tasks, such as collecting and parsing data from web pages. In this section, we'll learn archiving with the tar and cpio commands. We'll then move on to archiving and compressing with zip, and do faster archiving with pbzip2. Also, we'll create filesystems with compression and back up snapshots with rsync. Finally, we'll look into version control-based backup with Git, and then create entire disk images with fsarchiver.
Now we move on to the first video of this section, which deals with archiving with the tar command. In this video, we're going to use tar to create archives and perform operations on existing archives. The tar command can be used to archive files. It was originally designed for storing data on tape archives. It allows you to store multiple files and directories as a single file, while retaining all the file attributes, such as owner, permissions, and so on. The file created by the tar command is often referred to as a tarball. The tar command comes, by default, with nearly all Unix-like operating systems.
It has a simple syntax, and it uses a portable file format. It supports these options: -A, -c, -d, -r, -t, -u, -x, -f, and -v. Each of these options serves a different purpose and can be used independently. We can use tar to create archives and perform operations on existing archives. Let's see how to do this. Now, to archive files with tar, use this syntax. Let's now list files in an archive, so here we use the -t option.
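A minimal sketch of the create-and-list steps just described. The file names (file1.txt, file2.txt, output.tar) and the scratch directory are made up for illustration.

```shell
#!/bin/bash
# Work in a scratch directory so nothing in the real file system is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Two sample files (hypothetical names).
echo "first" > file1.txt
echo "second" > file2.txt

# -c creates an archive; -f names the archive file that follows it.
tar -cf output.tar file1.txt file2.txt

# -t lists the archive's contents without extracting anything.
tar -tf output.tar
```

The listing should show the two file names, one per line.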
Now, in order to print more details while we're archiving or listing, use the -v or -vv flag. This feature is called verbose (v), which, for most commands, turns on printing more details on the terminal. For example, using verbose, you could print more details such as file permissions, owner, group, modification date, and so on. You should note that the file name must appear immediately after -f, so f should be the last option in the option group. For example, if you want verbose output, you should use these options.
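Here's a small sketch of the verbose flag and of the rule about where f goes. The file name notes.txt is an assumption made for the example.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir"

echo "hello" > notes.txt   # hypothetical sample file

# Correct: f is the last letter of the option group, so the next
# argument (output.tar) is taken as the archive name.
tar -cvf output.tar notes.txt

# By contrast, "tar -cfv output.tar notes.txt" would treat "v" as the
# archive name, which is why f goes last.

# With -t, the verbose flag gives a long listing:
# permissions, owner/group, size, modification date, and name.
tar -tvf output.tar
```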
So, in this command, c stands for create file, and f stands for specify file name. We can specify folders and file names as sources. Also, we can use a list of file names or wildcards, such as *.txt, to specify the sources. When finished, tar will archive these source files into a file called output.tar. We cannot pass hundreds of files or folders as command-line arguments because there's a limit on the length of a command line. So it's safer to use the append option if many files are to be archived.
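One possible way to put this into practice: a wildcard for a handful of files, and batched appends for a large set. The find and xargs pairing is an assumption here, one common way to feed names to tar in batches, and the file names are invented.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample files.
for i in 1 2 3; do echo "log $i" > "app$i.log"; done
echo "text" > readme.txt

# The shell expands *.log into a list of names before tar runs.
tar -cf logs.tar *.log

# For very large sets, let xargs split the names into batches and
# append (-r) each batch to the existing archive.
find . -name '*.txt' -print0 | xargs -0 tar -rf logs.tar

tar -tf logs.tar
```

Because each batch is appended, the archive grows correctly even when xargs has to run tar more than once.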
Let's go through additional features that are available with the tar command. So let's talk about appending files to an archive. Sometimes we may need to add files to an archive that already exists. We can use the append option, -r, for this. In order to append a file to an already existing archive, use this command. Let's create an archive with one text file in it. To list all the files present in the archive, use this command. Now, add another file to the archive, and list its contents again.
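The append walkthrough could look something like this. The names a.txt, b.txt, and archive.tar are placeholders.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir"

# Create an archive with one text file in it.
echo "one" > a.txt
tar -cf archive.tar a.txt
tar -tf archive.tar

# Append a second file with -r, then list the contents again.
echo "two" > b.txt
tar -rvf archive.tar b.txt
tar -tf archive.tar
```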
The archive now contains both files. Next, let's explore extracting files and folders from an archive. The following command extracts the contents of the archive to the current directory. The -x option stands for extract. When -x is used, the tar command extracts the contents of the tar archive to the current directory. We can also specify the directory where the files need to be extracted by using the -C flag in this way. So you need to give the path to your directory like this.
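A short sketch of extraction into a chosen directory. The directory names src and restore are assumptions, and the target directory has to exist before tar can extract into it.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir"

# Build a small archive to extract from (hypothetical contents).
mkdir src
echo "data" > src/file1.txt
tar -cf archive.tar src

# -x extracts; -C switches to the given directory first.
mkdir restore
tar -xf archive.tar -C restore

ls restore/src
```

Leaving out -C restore would extract into the current directory instead.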
The command extracts the contents of an archive to a specific directory. It extracts the entire contents of the archive. We can also extract only a few files by specifying them as command arguments. Oops, we gave a wrong filename, so we got this error. Let's change it and run the command again. This command extracts only file1 and file4 and ignores the other files in the archive. While archiving, we can specify "stdout" as the output file, so that another command further along the pipe can read the archive from "stdin" and then process or extract it.
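Both ideas can be sketched together, assuming four sample files named file1 through file4. In the second half, a local tar stands in for whatever command sits at the other end of the pipe.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir"

for n in 1 2 3 4; do echo "content $n" > "file$n"; done
tar -cf archive.tar file1 file2 file3 file4

# Extract only file1 and file4; the other members are ignored.
mkdir subset
tar -xf archive.tar -C subset file1 file4
ls subset

# "-f -" means stdout when creating and stdin when extracting,
# so the archive travels through the pipe and never touches disk.
mkdir copy
tar -cf - file1 file2 | tar -xf - -C copy
ls copy

# The reading side could just as well run on another machine, e.g.
#   tar -cf - file1 | ssh user@host 'tar -xf - -C /some/dir'
# (user@host and /some/dir are placeholders, not real values).
```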
This is very helpful for transferring data over a secure shell (SSH) connection when we're on a network. For example, look at this command. Since we have not created this directory and file, this error is generated. Here, the directory or file is added to a tar archive, which is written to stdout, as indicated by the hyphen. Next we talk about concatenating two archives. We can easily merge multiple tar files with the -A option. Let's pretend that we have two tarballs, file1.tar and file2.tar.
We can merge the contents of file2.tar into file1.tar in this way. We can now verify it by listing the contents. Let's now look into updating files in an archive with a timestamp check. The append option appends any given file to the archive. If a file with the same name is already inside the archive, append adds it again anyway, and the archive ends up containing duplicates.
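Here's a small sketch of the two behaviors just described: merging with -A, and the duplicate entry that a plain append leaves behind. It assumes GNU tar, since other tar implementations may not support -A, and the member names a.txt and b.txt are invented.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir"

echo "a" > a.txt
echo "b" > b.txt
tar -cf file1.tar a.txt
tar -cf file2.tar b.txt

# -A appends the members of file2.tar to file1.tar.
tar -Af file1.tar file2.tar
tar -tf file1.tar

# A plain append does not check what is already in the archive,
# so a.txt is now stored twice.
tar -rf file1.tar a.txt
tar -tf file1.tar
```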
We can use the update option, -u, to append only those files that are newer than the file with the same name inside the archive, as we do here. This command lists the files in the archive. We want to append hello.txt only if it has been modified since the last time it was added to archive.tar, so here we use this command. Nothing happens if the hello.txt outside the archive and the hello.txt inside archive.tar have the same timestamp. Here we use the touch command to modify the file's timestamp, and then try the tar command again in this way.
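The update check can be sketched as follows. Giving the file an old timestamp up front is an assumption of this example, not part of the demo; it just makes sure the later touch is clearly newer than the archived copy.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir"

echo "hello" > hello.txt
# Give the file an old modification time (1 Jan 2020, 00:00).
touch -t 202001010000 hello.txt
tar -cf archive.tar hello.txt

# Same timestamp as the archived copy, so nothing should be appended.
tar -uf archive.tar hello.txt
tar -tf archive.tar

# touch sets the modification time to now, which is newer,
# so this time -u appends the file.
touch hello.txt
tar -uvf archive.tar hello.txt
tar -tf archive.tar
```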
The file is appended since its timestamp is newer than the one inside the archive. Let's verify this. As you can see, a new hello.txt has been appended to the tar archive. While extracting this archive, tar will pick up the latest version of the file hello.txt. Next, we will move on to comparing files in the archive and the file system. Sometimes it's useful to know whether the files in the archive and the files with the same filename in the file system are the same or contain any differences. The -d flag can be used to print the differences, as we did here.
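A minimal sketch of the comparison, assuming GNU tar's behavior of staying silent when nothing differs. The file contents are invented.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir"

echo "version 1" > hello.txt
tar -cf archive.tar hello.txt

# Archive and file system agree, so nothing is reported.
tar -df archive.tar

# Change the file on disk and compare again. tar reports what differs
# and exits with a non-zero status, hence the "|| true".
echo "version 2 with more text" > hello.txt
tar -df archive.tar || true
```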
We now move on to the next task, that is, deleting files from the archive. We can remove files from a given archive using the --delete option. For example, see this command. Let's see an example. Now let's delete hello.txt. Let's now dive into compression with the tar archive. The tar command only archives files. It does not compress them. For this reason, most people usually add some form of compression when working with tarballs.
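Before compression comes into play, the deletion step described above can be sketched like this. It assumes GNU tar, since --delete is not available in every tar implementation, and other.txt is a made-up second file.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir"

echo "hi" > hello.txt
echo "other" > other.txt
tar -cf archive.tar hello.txt other.txt
tar -tf archive.tar

# Remove hello.txt from the archive, then list what is left.
tar -f archive.tar --delete hello.txt
tar -tf archive.tar
```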
Compression can significantly decrease the size of the files. Tarballs are often compressed into one of these formats. Different tar flags are used to specify different compression formats: -j for bzip2, -z for gzip, and --lzma for lzma. They're explained in the coming compression-specific videos. It's also possible to use compression formats without explicitly specifying special options as above.
tar can choose the compression by looking at the extension of the output or input file name. In order for tar to support compression automatically by looking at the extension, use -a or --auto-compress with tar, like this.
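As a rough sketch, here are the explicit and the automatic variants side by side. Only gzip is used so the example does not depend on bzip2 or lzma being installed, and the directory name docs is an assumption.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir"

mkdir docs
echo "some text" > docs/a.txt

# Explicit: -z selects gzip (-j would select bzip2 in the same way).
tar -czf docs.tar.gz docs

# Automatic: -a picks the compressor from the .tar.gz extension.
tar -caf docs_auto.tar.gz docs

# gzip -t checks that both files really are valid gzip data.
gzip -t docs.tar.gz
gzip -t docs_auto.tar.gz

# Listing a compressed tarball works the same way as before.
tar -tzf docs.tar.gz
```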
Next is excluding a set of files from archiving. It's possible to exclude a set of files from archiving by specifying patterns. For example, to exclude all .txt files from archiving, we can use this syntax. Note that the pattern should be enclosed within quotes to prevent the shell from expanding it. It's also possible to exclude files named in a list file, with the -X flag, as here. Now it excludes file1 and file3 from archiving.
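Both forms of exclusion might look like this. The file names are invented, and the exclude options are placed before the source names, which is the safer ordering with recent GNU tar versions.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir"

echo "a" > file1; echo "b" > file2; echo "c" > file3
echo "t" > notes.txt

# The quotes keep the shell from expanding *.txt before tar sees it.
tar -cf no_txt.tar --exclude "*.txt" file1 file2 file3 notes.txt
tar -tf no_txt.tar

# -X reads the names to exclude from a file, one per line.
printf 'file1\nfile3\n' > list
tar -cf filtered.tar -X list file1 file2 file3
tar -tf filtered.tar
```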
Let's move ahead and discuss excluding version control directories. We usually use tarballs for distributing source code. In general, most source code is maintained using version control systems, such as Subversion, Git, Mercurial, CVS, and so on. Code directories under version control contain special directories used to manage versions, like .svn or .git. However, these directories aren't needed by the code itself, and so should be left out of the source code tarball.
In order to exclude version control-related files and directories while archiving, use the --exclude-vcs option along with tar. For example, we used this syntax here, and we provided the source_code.tar.gz file. Finally, let's end with printing total bytes. It is sometimes useful to print the total bytes copied to the archive. To print the total bytes copied after archiving, use the --totals option in this way.
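A small sketch that combines the last two options, assuming GNU tar. The project layout is made up, and a bare .git directory with one file stands in for a real repository.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical source tree with a stand-in version control directory.
mkdir -p project/.git project/src
echo "ref" > project/.git/HEAD
echo "int main(void) { return 0; }" > project/src/main.c

# --exclude-vcs skips directories such as .git and .svn.
# --totals reports the byte count on stderr once archiving is done,
# so stderr is captured here to show it.
totals=$(tar --exclude-vcs --totals -czf source_code.tar.gz project 2>&1)
echo "$totals"

# The listing should contain src/main.c but nothing under .git.
tar -tzf source_code.tar.gz
```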
Superb. In this video, we've learned how to archive by using the tar command. We saw how it's used to create archives and perform operations on existing archives. In the next video, in a similar way, we'll look into archiving with cpio.
Note: This course was created by Packt Publishing. We are pleased to host this training in our library.
- Printing in the terminal
- Performing math in the Linux shell
- Getting and setting dates
- Working with functions and arguments
- Reading output
- Making comparisons
- Concatenating text
- Finding, editing, generating, and deleting files
- Running parallel processes
- Using regular expressions
- Downloading webpages
- Parsing data from a website
- Finding broken links
- Backing up and archiving
- Transferring files and data through the network
- Monitoring your Linux system
- Gathering data for system administration