
How to Back Up Linux Content to Amazon S3 Using s3cmd

October 21, 2012

Posted by Tournas Dimitrios in Linux, Linux admin tools.

Amazon S3 provides a simple web service that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any web developer or administrator access to the same highly scalable, reliable, secure, fast, inexpensive ($0.10 per gigabyte for the S3 service) infrastructure that Amazon uses to run its own global network of websites. Amazon actually offers a whole suite of services (IaaS, PaaS, SaaS) on which admins and web developers can rely. There are three ways to interact with these services:

  1. Web based AWS Management Console (your web-browser)
  2. Command line tools (interacting via terminal or scripts)
  3. An API for programmers (libraries are available for almost all programming languages)

Of all the available services, S3 is the simplest to use and is aimed at a broad range of users. Possible uses include personal photo storage, a CDN for web developers, a media library, and file-system backups for sysadmins. This article demonstrates a practical example of how Linux administrators can use the service. Just by installing a package, sysadmins (even those with no programming knowledge) can use their terminal or run a cron job to back up and restore critical file-system or application data.

S3cmd is a command-line tool for uploading, retrieving and managing data in Amazon’s S3 cloud storage service. It is best suited for power users who don’t fear the command line. It is also ideal for scripts, automated backups triggered from cron, and so on. The tool can be installed via your distribution’s package manager (yum, apt-get, or Homebrew on Mac OS). Although I’ll use CentOS for the demonstration, the same concepts apply to all variants of the Linux operating system (and to Mac OS as well).
Before storing anything in S3 you must sign up for an “AWS” account (where AWS = Amazon Web Services) to obtain a pair of identifiers: an Access Key and a Secret Key. You will need to give these keys to s3cmd (only once, during initial setup). Think of them as a username and password for your S3 account.

This article is aimed at Linux sysadmins; a future article will “satisfy” web developers. There I’ll demonstrate how a PHP library can be used to query Amazon’s S3 web service (accomplishing the same functionality as s3cmd).

Prerequisites: The reader should already have an AWS account; the process of registering for a developer account is simple. As mentioned above, using the terminal is a must, since s3cmd is a command-line backup and restore tool. Writing Bash scripts is only necessary if the administrator wants to automate some backup tasks (cron jobs).

Installing the tool should be done through your package manager (yum, apt-get or Homebrew), as this takes only a single command. For example, on CentOS the command is: yum -y install s3cmd
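For readers on other systems, the choice of install command can be sketched as a small shell function. This is only an illustration: the function name suggest_install_cmd is made up for this example, and it assumes the package is called s3cmd in each of the three package managers mentioned above.

```shell
#!/bin/sh
# Sketch: print the s3cmd install command for whichever package manager
# is found first. suggest_install_cmd is a hypothetical helper name.
suggest_install_cmd() {
    if command -v yum >/dev/null 2>&1; then
        echo "yum -y install s3cmd"
    elif command -v apt-get >/dev/null 2>&1; then
        echo "apt-get -y install s3cmd"
    elif command -v brew >/dev/null 2>&1; then
        echo "brew install s3cmd"
    else
        echo "no supported package manager found" >&2
        return 1
    fi
}

# Print the suggestion; do not abort if nothing was detected.
suggest_install_cmd || true
```

The function only prints the command, so it can be reviewed before being run with root privileges.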

After installing s3cmd, you must configure it with your Amazon S3 keys (the Access Key and Secret Key), which can be obtained from your account’s web console.

Run s3cmd --configure and provide the keys to s3cmd. You’ll also be prompted for an encryption password and asked whether you want to use HTTPS. The default is no, but I enable HTTPS for my use. You can optionally enter a GPG encryption key that will be used to encrypt your files before they are sent to Amazon. This protects your data from being read by Amazon staff or anyone else who may have access to your Amazon S3 account. I probably don’t need to say this, but just in case: the two forms of encryption mentioned above are independent of each other and serve different purposes. HTTPS protects the data only while it is being transferred, whereas GPG encryption protects the data while it is stored in your Amazon S3 account.

The installation process is finished, so let’s list some basic commands:

  • Run s3cmd ls to list all your buckets
  • Make a bucket with s3cmd mb s3://my-new-bucket-name
  • Upload a file into the bucket:
    s3cmd put filename.ini s3://my-new-bucket-name
  • Upload only the contents of a directory (recursively):
    s3cmd -r put dirname/ s3://my-new-bucket-name
  • Upload the whole directory recursively (including the directory itself):
    s3cmd -r put dirname s3://my-new-bucket-name
  • List the bucket contents:
    s3cmd ls s3://my-new-bucket-name

In the commands listed above, replacing “put” with “get” reverses the process (put = upload, get = download). s3cmd can transfer files to and from Amazon S3 in two basic modes: unconditional transfer, where all matching files are transferred, and conditional transfer, where only files that don’t already exist at the destination in the same version are transferred. The put and get commands are “unconditional” (similar to the standard Unix cp command, which copies whatever it’s told). Another command, sync, is “conditional”: if a file already exists at the destination, its MD5 checksum and file size are compared, and the file is transferred only when they differ (similar to the standard Unix rsync command). All three commands (put, get, sync) can be further customized with command-line options. Run man s3cmd to read about all the available options. The s3cmd home page is also a valuable place to read more about conditional and unconditional transfers.
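To make the cron use case concrete, here is a minimal sketch of a backup script built on put, sync and get. Everything specific in it is an assumption for illustration: the source directory /var/www, the archives/ and mirror/ prefixes inside the bucket, the function names, and the S3CMD variable (which only exists so the script can be tried without touching S3).

```shell
#!/bin/sh
# Sketch of a backup helper. Paths, bucket prefixes and function names
# are placeholders chosen for this example, not values from s3cmd itself.
set -eu

S3CMD="${S3CMD:-s3cmd}"                        # override to dry-run, e.g. S3CMD=echo
SRC_DIR="${SRC_DIR:-/var/www}"                 # directory to back up (assumed)
BUCKET="${BUCKET:-s3://my-new-bucket-name}"

# Build a dated archive name such as www-2012-10-21.tar.gz
archive_name() {
    printf '%s-%s.tar.gz\n' "$(basename "$1")" "$(date +%F)"
}

# Unconditional transfer: pack the directory and upload the archive with put.
backup_archive() {
    tmp="$(mktemp -d)"
    archive="$tmp/$(archive_name "$SRC_DIR")"
    tar -czf "$archive" -C "$(dirname "$SRC_DIR")" "$(basename "$SRC_DIR")"
    "$S3CMD" put "$archive" "$BUCKET/archives/"
    rm -rf "$tmp"
}

# Conditional transfer: sync only uploads files that are new or changed.
backup_sync() {
    "$S3CMD" sync "$SRC_DIR/" "$BUCKET/mirror/"
}

# Restore: download one object ($1, relative to the bucket) to a local path ($2).
restore_file() {
    "$S3CMD" get "$BUCKET/$1" "$2"
}

# Dispatch on the first argument; with no argument the script does nothing.
case "${1:-}" in
    archive) backup_archive ;;
    sync)    backup_sync ;;
    restore) restore_file "$2" "$3" ;;
esac
```

Saved under a hypothetical path such as /usr/local/bin/s3-backup.sh, it could be scheduled with a crontab entry like: 0 2 * * * /usr/local/bin/s3-backup.sh sync. Running it once by hand with S3CMD=echo prints the s3cmd commands instead of executing them.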

The only concern I have with s3cmd is that it makes it really easy to start stuffing files into S3 for backup purposes. Watch your Amazon bill. While it’s really cheap to store files there, the cost will add up once you get into huge amounts of data. Also remember that you are charged for requests as well ($0.01 per 1,000 put/list requests, $0.01 per 10,000 get requests), although it takes a while for those to add up.
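As a rough worked example, the prices quoted in this article ($0.10 per gigabyte, $0.01 per 1,000 put/list requests, $0.01 per 10,000 get requests) can be plugged into a small estimator. The function name estimate_cost is made up for this sketch, treating the storage rate as a monthly charge is an assumption, and current Amazon prices will differ.

```shell
#!/bin/sh
# Sketch: estimate a bill from the rates quoted in the article.
# $1 = gigabytes stored, $2 = put/list requests, $3 = get requests
estimate_cost() {
    awk -v gb="$1" -v p="$2" -v g="$3" 'BEGIN {
        storage  = gb * 0.10                               # $0.10 per GB
        requests = (p / 1000) * 0.01 + (g / 10000) * 0.01  # request charges
        printf "%.2f\n", storage + requests
    }'
}

# 50 GB stored, 20,000 put/list requests, 100,000 get requests
estimate_cost 50 20000 100000   # prints 5.30
```

In this example storage accounts for $5.00 and requests for only $0.30, which matches the point that request charges take a while to add up.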



1. Frank - October 23, 2012

great post.

2. Wesley - November 13, 2012

hello very cool post man!

3. rohinichoudhary - November 28, 2015

Very very informative post. I found one another well explained instalation and usage of s3cmd post. Link http://jee-appy.blogspot.com/2015/11/how-to-install-s3cmd-to-manage-aws-s.html
