Amazon S3 Sync

SprightlySoft has released an Amazon S3 synchronization application. This application allows you to take a folder on your computer and upload it to S3. You can make additions, deletions and changes to your local files and next time you run the application it will detect these changes and apply them to Amazon S3. This program allows you to create a mirror of a local folder on S3 and always keep it up to date.

Amazon Simple Storage Service (Amazon S3) is a service that allows you to store files in Amazon’s cloud computing environment. When your files are in Amazon’s system you can retrieve them from anywhere on the web. You can also use Amazon’s CloudFront service in conjunction with S3 to distribute your files to millions of people. Amazon S3 is the same highly scalable, reliable, secure, fast, inexpensive infrastructure that Amazon uses to run its own global network of web sites. Best of all, Amazon S3 is free for the first 5 GB of storage.

S3 Sync is open source and completely free to use. Visit http://sprightlysoft.com/S3Sync/ to download the application or get the source code.

The following gives you a rundown of how the source code works. This information is useful if you want to learn how the program works so you can make changes to it.

S3 Sync works by listing your files on Amazon S3, listing your files locally and comparing the differences between the two lists. Here is the logic of the program.

  1. The code starts by getting a list of all user settings. These include your Amazon AWS credentials, the S3 bucket you are syncing to, and the local folder you are syncing from.
  2. The next major call is the PopulateS3HashTable function. Here all the files in your Amazon S3 bucket are listed. The code calls the ListBucket function from the SprightlySoft AWS Component. This function returns an ArrayList of objects that represent each item in S3. Properties of each object include S3 key name, size, date modified and ETag. These objects are added to a HashTable so they can be used later on.
  3. The next call is to the PopulateLocalArrayList function. This function lists all local files and folders you want to synchronize. Each file and folder is added to an ArrayList so it can be used later on. The function has the option to only include certain files or exclude certain files. The function is recursive. That means it calls itself for each sub folder.
  4. Next the program creates a list of files that should be deleted on Amazon S3. This is done through the PopulateDeleteS3ArrayList function. This function goes through each item in the list of files on S3 and checks if it exists in the list of local items. If the S3 item does not exist locally, the item is added to the DeleteS3ArrayList.
  5. Next the program finds which local files do not exist on S3 or have changed. This is done through the PopulateUploadDictionary function. This function goes through the local ArrayList and checks if each item exists in the S3 HashTable. The item may exist on S3 and locally but the local file may have different content. To determine if a file is the same on S3 and locally, the program has an option to compare files by ETag. An ETag is an identifier based on the content of a file. If the file changes the ETag changes. Amazon stores the ETag of each file you upload. When you list files on S3 the ETag for each file is returned. If you choose to compare by ETag the program will calculate the ETag of the local file and check if it matches the ETag returned by Amazon. Any files that are missing on S3 or don’t match are added to a Dictionary of items that need to be uploaded to S3.
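
To make the ETag comparison in step 5 more concrete, here is a minimal sketch of how a local ETag could be calculated. It is not taken from the S3 Sync source. The class and function names are made up for illustration, and it assumes the file was uploaded to S3 in a single request, in which case the ETag is the hex MD5 hash of the file content (files uploaded in multiple parts get a different kind of ETag).

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, not part of S3 Sync or the SprightlySoft AWS Component.
public static class ETagSketch
{
    // Returns the lowercase hex MD5 hash of the file's content.
    public static string ComputeLocalETag(string filePath)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(filePath))
        {
            byte[] hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // S3 listings return the ETag wrapped in double quotes,
    // so the quotes are stripped before comparing.
    public static bool MatchesS3ETag(string filePath, string s3ETag)
    {
        string cleaned = s3ETag.Trim('"');
        return string.Equals(ComputeLocalETag(filePath), cleaned,
            StringComparison.OrdinalIgnoreCase);
    }
}
```

A file for which MatchesS3ETag returns false would be a candidate for upload.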

Now we have a list of files that need to be deleted on S3 and a list of files that need to be uploaded to S3. The program has an option to list these changes or go ahead and apply these changes to S3.
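
The comparison that produces these two lists can be sketched in a few lines. This is a simplified illustration, not the actual S3 Sync code. It uses generic dictionaries of key name to ETag in place of the program's HashTable and ArrayList, the names are hypothetical, and it always compares by ETag even though the program makes that optional.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the compare step; names are not from S3 Sync.
public static class SyncPlanSketch
{
    // Keys that exist on S3 but not locally should be deleted from S3.
    public static List<string> FindDeletes(
        IDictionary<string, string> s3ETagsByKey,
        IDictionary<string, string> localETagsByKey)
    {
        var deletes = new List<string>();
        foreach (string key in s3ETagsByKey.Keys)
        {
            if (!localETagsByKey.ContainsKey(key))
            {
                deletes.Add(key);
            }
        }
        return deletes;
    }

    // Local keys that are missing on S3, or whose ETag differs, need uploading.
    public static List<string> FindUploads(
        IDictionary<string, string> s3ETagsByKey,
        IDictionary<string, string> localETagsByKey)
    {
        var uploads = new List<string>();
        foreach (KeyValuePair<string, string> local in localETagsByKey)
        {
            string s3ETag;
            bool existsOnS3 = s3ETagsByKey.TryGetValue(local.Key, out s3ETag);
            if (!existsOnS3 ||
                !string.Equals(s3ETag, local.Value, StringComparison.OrdinalIgnoreCase))
            {
                uploads.Add(local.Key);
            }
        }
        return uploads;
    }
}
```

Listing the changes without applying them amounts to printing the two returned lists instead of acting on them.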

If we are deleting files on S3 the DeleteExtraOnS3 function is called. This function goes through the DeleteS3ArrayList and deletes each file in it. This is done by calling the MakeS3Request function. This function uses the SprightlySoft AWS Component to send the appropriate command to S3. For more information about using the AWS Component see the documentation included with the source code and read the Amazon S3 API Reference documentation from Amazon. The function has the ability to retry the command if it fails.
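
The retry behavior can be pictured with a small helper like the one below. This is only a sketch of the general idea. The real MakeS3Request function may handle retries differently, and the class name, parameters and fixed delay are assumptions made for the example.

```csharp
using System;
using System.Threading;

// Hypothetical retry helper; not the MakeS3Request implementation.
public static class RetrySketch
{
    // Calls attempt() until it returns true or maxAttempts is reached.
    // Waits for the given delay between failed attempts.
    public static bool Run(Func<bool> attempt, int maxAttempts, TimeSpan delay)
    {
        for (int i = 1; i <= maxAttempts; i++)
        {
            if (attempt())
            {
                return true;
            }
            if (i < maxAttempts)
            {
                Thread.Sleep(delay);
            }
        }
        return false;
    }
}
```

A caller would wrap the code that sends the delete command in the attempt function and log a failure if Run returns false.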

Finally the program calls the UploadMissingToS3 function to upload files from the UploadDictionary to S3. Here the program sets header information such as Content-MD5, Content-Type, and metadata to store the local file’s timestamp in S3. It then calls the UploadFileToS3 function which is very similar to the MakeS3Request function. The difference is that the UploadFileToS3 function has parameters that are only relevant to uploading a file. The program uses a variable called MyUpload which is a SprightlySoft AWS Component object. This object raises an event whenever the progress on an upload changes. The program hooks into this progress event and shows the progress of the upload while it is taking place. This is done in the MyUpload_ProgressChangedEvent function.
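
As an illustration of the header step, the sketch below builds a header dictionary for one file. It is not the S3 Sync code. The Content-MD5 value is the base64 encoding of the file's MD5 hash, and custom metadata headers on S3 start with x-amz-meta-. The metadata name used here, the timestamp format and the class name are assumptions for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Security.Cryptography;

// Hypothetical sketch; header names other than Content-MD5 and
// Content-Type are assumptions, not what S3 Sync actually sends.
public static class UploadHeadersSketch
{
    public static Dictionary<string, string> Build(string filePath, string contentType)
    {
        var headers = new Dictionary<string, string>();

        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(filePath))
        {
            // Content-MD5 is the base64 encoding of the raw MD5 hash bytes.
            headers["Content-MD5"] = Convert.ToBase64String(md5.ComputeHash(stream));
        }

        headers["Content-Type"] = contentType;

        // Store the local file's last write time (UTC, round-trip format)
        // as custom metadata under an assumed name.
        headers["x-amz-meta-local-file-timestamp"] =
            File.GetLastWriteTimeUtc(filePath).ToString("o", CultureInfo.InvariantCulture);

        return headers;
    }
}
```

A dictionary like this could then be passed as the extra request headers of an upload call.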

When the program completes it has an option to send log information through an email. This is useful if you run the program as a scheduled task and you want to be notified if there is a failure.

Amazon S3’s Reduced Redundancy Storage

Amazon has released a new feature that many of you will be taking advantage of.  It’s called Reduced Redundancy Storage.  Amazon typically stores your files on multiple devices at multiple data centers.  When you store a file with the reduced redundancy storage option, the file is still stored on multiple devices at multiple data centers but fewer times.  The benefit of this is a lower price to store your files.  It costs 33% less to store files with reduced redundancy.  The drawback is that Amazon may lose one of your files.  For this reason you should only use this option if the file you are storing can be regenerated or restored from another location.

A typical use for reduced redundancy would be for the storage of programmatically generated thumbnails of a picture.  Suppose a user uploads a picture to your site.  You generate a thumbnail of the picture and store the original and the thumbnail on Amazon S3.  You can choose to store the original with the standard storage class and the thumbnail with the reduced redundancy storage class.  If the thumbnail is ever lost you can regenerate it from the original.

Amazon calculates that reduced redundancy storage is 99.99% durable, as opposed to standard storage’s 99.999999999% durability.  Amazon also calculates that reduced redundancy storage provides 400 times the durability of a typical disk drive.

You can choose to make a file reduced redundancy when you upload it. Setting a special request header in the upload tells Amazon you want to use reduced redundancy for that file. The code snippet below shows how to upload a file with the SprightlySoft S3 Component for .NET and make it reduced redundancy.

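// Requires: using System; and using System.Collections.Generic;
// This request header tells Amazon to store the file with reduced redundancy.
Dictionary<String, String> ExtraRequestHeaders = new Dictionary<String, String>();
ExtraRequestHeaders.Add("x-amz-storage-class", "REDUCED_REDUNDANCY");

// Replace the placeholder strings with your own credentials, bucket, key and file path.
SprightlySoftS3.Upload MyUpload = new SprightlySoftS3.Upload();
MyUpload.UploadFile("AWSAccessKeyId", "AWSSecretAccessKey", "BucketName", "KeyName", "FilePath", ExtraRequestHeaders);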

You can tell which files are using reduced redundancy when you list a bucket.  The StorageClass value will be STANDARD or REDUCED_REDUNDANCY.
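
If you work with the raw XML that S3 returns for a bucket listing, picking out the reduced redundancy files could look something like the sketch below. It does not use the SprightlySoft component, which returns the listing in its own form. The class and function names are hypothetical, and the sketch assumes the listing uses the standard S3 XML namespace with Contents, Key and StorageClass elements.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical sketch that reads a raw bucket listing response.
public static class StorageClassSketch
{
    // XML namespace used by S3 bucket listing responses.
    static readonly XNamespace S3 = "http://s3.amazonaws.com/doc/2006-03-01/";

    // Returns the key of every listed item stored with reduced redundancy.
    public static List<string> FindReducedRedundancyKeys(string listBucketXml)
    {
        var keys = new List<string>();
        XDocument doc = XDocument.Parse(listBucketXml);
        foreach (XElement item in doc.Root.Elements(S3 + "Contents"))
        {
            string storageClass = (string)item.Element(S3 + "StorageClass");
            if (storageClass == "REDUCED_REDUNDANCY")
            {
                keys.Add((string)item.Element(S3 + "Key"));
            }
        }
        return keys;
    }
}
```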

You can configure Amazon to send you a notification through Amazon’s Simple Notification Service in the event that Amazon loses one of your reduced redundancy files.  First you set up a Simple Notification Service topic that will load a web page, send an email or add an Amazon Simple Queue Service item when it is triggered.  Next you configure your bucket to trigger the topic when the ReducedRedundancyLostObject event occurs.  Say your application stores images and thumbnails.  Create a web page that regenerates the thumbnail from the original.  Set up a Simple Notification Service topic to call your web page when it is triggered.  Then configure your bucket to trigger the topic if it loses a file.  It’s a bit more work but it can pay for itself if you store many files with the reduced redundancy storage option.
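
The bucket configuration is a small XML document that names the topic and the event. The sketch below builds that document. It is an illustration under assumptions: the class name is hypothetical, the topic ARN in the usage note is a placeholder, and you should check the element names against the Amazon S3 API Reference before relying on them. Sending the document to S3 is left out.

```csharp
using System.Xml.Linq;

// Hypothetical sketch; verify the XML shape against Amazon's documentation.
public static class LostObjectNotificationSketch
{
    // Builds a notification configuration that triggers the given
    // SNS topic when a reduced redundancy object is lost.
    public static string BuildConfiguration(string topicArn)
    {
        var config = new XElement("NotificationConfiguration",
            new XElement("TopicConfiguration",
                new XElement("Topic", topicArn),
                new XElement("Event", "s3:ReducedRedundancyLostObject")));
        return config.ToString(SaveOptions.DisableFormatting);
    }
}
```

You would call BuildConfiguration with the ARN of your own topic, for example a placeholder such as arn:aws:sns:us-east-1:123456789012:example-topic, and send the result as the body of the request that sets the bucket's notification configuration.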

To see examples of working with reduced redundancy files in C# or VB.NET try the SimpleExamples project included with the SprightlySoft S3 Component for .NET.  http://sprightlysoft.com/

Why Amazon S3 Is Great

Welcome to The SprightlySoft blog.  This article will give an overview of what the Amazon S3 service is and why it’s great. 

S3 stands for Simple Storage Service.  It’s a file repository that makes files available through the Internet.  You upload your files to Amazon and you can share them with anyone in the world.   Amazon originally created S3 for their own use.  They had some unique challenges that required them to build a system that can accommodate many users. 

Think about Oprah’s book club and how it affects Amazon.  Oprah goes on TV and promotes a book.  That day millions of people go on the Internet and search for more information on that book.  They find the book on Amazon’s site.  Thousands of people load the book’s web page at the same time.  In most situations the web site would go down because too many people are trying to access the same files.  Amazon created a system that distributes files over many computers so all the computers together can handle the load.

Amazon did something wonderful; they opened up this service to the public.  Now anyone can upload their files to the Amazon system.  You can create a web application just as powerful and reliable as the Amazon web site without building the infrastructure required to do everything Amazon’s system does.

If you were to put your files on a traditional web hosting account your files would reside on a single computer.  If your web site became popular that computer would not be able to serve all the requests and your web site would go down.  There is no automatic process to replicate your files to more computers if your web site becomes popular.

If you wanted to build your own scalable system you would need to procure several servers, house them in a facility that can support the load, and configure the network to distribute the load over the set of computers.  This would take people with specialized skills and it would take money.  There would also be the cost of maintaining and upgrading the system.

The nice thing about the Amazon system is you only pay for what you use and there are no upfront fees.  If your site only gets a few visitors you will pay a few cents a month.  If your site becomes popular your site will stay online and you will pay a reasonable price for your usage.

I highly recommend using Amazon S3 for your Internet storage needs.