How to Backup Linux Content to Amazon S3 Using s3cmd
October 21, 2012. Posted by Tournas Dimitrios in Linux, Linux admin tools.
Amazon S3 provides a simple web service that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any web developer or administrator access to the same highly scalable, reliable, secure, fast, and inexpensive (roughly $0.10 per gigabyte for the S3 service) infrastructure that Amazon uses to run its own global network of websites. In fact, Amazon offers a whole suite of services (IaaS, PaaS, SaaS) that admins and web developers can rely on. There are three ways to interact with these services:
- The web-based AWS Management Console (your web browser)
- Command line tools (used interactively in a terminal or from scripts)
- An API for programmers (libraries exist for almost every programming language)
Of all the available services, S3 is the simplest to use and is targeted at a broad range of users. Possible uses include personal photo storage, a CDN for web developers, a media library, and, for sysadmins, file-system backups. This article demonstrates a practical example of how this service can be used by Linux administrators. Just by installing a package, sysadmins (even those with no programming knowledge) can use their terminal or run a cron job to back up and restore critical file-system or application data.
S3cmd is a command line tool for uploading, retrieving, and managing data in Amazon's S3 cloud storage service. It is best suited for power users who don't fear the command line, and it is ideal for scripts, automated backups triggered from cron, and so on. The tool can be installed via your distribution's package manager (yum, apt-get, or Homebrew on Mac OS X). Although I'll use CentOS for the demonstration, the same concepts apply to all variants of the Linux operating system (and even to Mac OS X).
Before storing anything in S3 you must sign up for an AWS (Amazon Web Services) account to obtain a pair of identifiers: an Access Key and a Secret Key. You will need to give these keys to s3cmd (only once, during the initial setup). Think of them as the username and password for your S3 account.
This article is targeted at Linux sysadmins; a future article will "satisfy" web developers by demonstrating how a PHP library can be used to query Amazon's S3 web service (accomplishing the same functionality as s3cmd).
Prerequisites: The reader should already have an AWS account; the process of registering for a developer account is simple. As mentioned above, using the terminal is a must, as s3cmd is a command line backup/restore tool. Writing Bash scripts is only necessary if the administrator aims to automate backup tasks (cron jobs).
Installing the tool should be done through your package manager (yum, apt-get, or Homebrew), as the whole process is a single command. For example, on CentOS: yum install -y s3cmd .
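For reference, the equivalent one-liners on a few common platforms look like this (a sketch; package availability varies by release, and on CentOS the s3cmd package typically comes from the EPEL repository, which may need to be enabled first):

# CentOS / RHEL (package provided by the EPEL repository)
yum install -y s3cmd
# Debian / Ubuntu
apt-get install -y s3cmd
# Mac OS X (Homebrew)
brew install s3cmd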
After installing s3cmd, you must configure it with your Amazon S3 keys (Access and Secret Keys), which can be obtained from your account's web console.
Run s3cmd --configure and provide the keys to s3cmd. You'll also be prompted for an encryption password and asked whether you want to use HTTPS. The default is no, but I enable HTTPS for my use. You can optionally enter a GPG encryption key that will be used to encrypt your files before sending them to Amazon. This protects your data against being read by Amazon staff or anyone who may have access to your Amazon S3 account. I probably don't need to say this, but just in case: the two forms of encryption mentioned above are independent of each other and serve different purposes. HTTPS protects the data only while it is being transferred, whereas GPG encryption protects your data against being read while it is stored in your Amazon S3 account.
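A typical configuration session looks roughly like the following (the exact prompts vary between s3cmd versions, and the key values shown here are Amazon's documentation placeholders, not real credentials):

s3cmd --configure
Access Key: AKIAIOSFODNN7EXAMPLE
Secret Key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Encryption password: ********
Path to GPG program [/usr/bin/gpg]:
Use HTTPS protocol [No]: Yes
Test access with supplied credentials? [Y/n] Y
Save settings? [y/N] y
Configuration saved to '/home/user/.s3cfg'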
- Run s3cmd ls to list all your buckets
- Make a bucket with s3cmd mb s3://my-new-bucket-name
- Upload a file into the bucket:
s3cmd put filename.ini s3://my-new-bucket-name
- Upload only the contents of a directory (recursively):
s3cmd -r put dirname/ s3://my-new-bucket-name
- Upload the whole directory recursively (including the directory itself):
s3cmd -r put dirname s3://my-new-bucket-name
- List the bucket contents:
s3cmd ls s3://my-new-bucket-name
In the commands listed above, replacing "put" with "get" accomplishes the reverse process (put = upload, get = download). s3cmd can transfer files to and from Amazon S3 in two basic modes: unconditional transfer, where all matching files are transferred, and conditional transfer, where only files that don't already exist at the destination in the same version are transferred. The put/get commands are of the unconditional type (similar to the standard Unix cp command: they copy whatever they're told to). Another command, sync, is of the conditional type: when a file already exists at the destination, an MD5 checksum and the file size are compared before transferring (similar to the standard Unix rsync command). All three tasks (put, get, sync) can be further customized with command-line options. Run man s3cmd to read about all the possible options. Of course, the home page of s3cmd is a valuable place to read more about conditional and unconditional commands.
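To tie this together with the cron-based automation mentioned earlier, here is a minimal backup script sketch. The bucket name and paths are hypothetical; the --delete-removed option (which mirrors local deletions to S3) should be tested with --dry-run before you trust real data to it:

#!/bin/bash
# backup-to-s3.sh -- mirror a couple of directories to S3 (hypothetical bucket)
# Add --dry-run to either command to preview what would be transferred.
BUCKET="s3://my-backup-bucket"
# sync only transfers files that are new or changed (MD5/size comparison);
# --delete-removed also deletes files from S3 that no longer exist locally
s3cmd sync --delete-removed /etc/ "$BUCKET/etc/"
s3cmd sync --delete-removed /var/www/ "$BUCKET/www/"

A crontab entry such as 0 2 * * * /root/backup-to-s3.sh would then run the script every night at 02:00.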
The only concern I have with s3cmd is that it makes it really easy to start stuffing files into S3 for backup purposes, so watch your Amazon bill. While it's really cheap to store files there, the cost adds up if you get into huge amounts of data. Also remember that you are charged for requests as well ($0.01 per 1,000 PUT/LIST requests, $0.01 per 10,000 GET requests), although it does take a while for those to add up.
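As a quick sanity check on storage growth, s3cmd can report how much data a bucket holds; the -H flag prints human-readable sizes (output format may differ between versions):

s3cmd du -H s3://my-new-bucket-name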