SprightlySoft has released an Amazon S3 synchronization application. This application allows you to take a folder on your computer and upload it to S3. You can make additions, deletions and changes to your local files and next time you run the application it will detect these changes and apply them to Amazon S3. This program allows you to create a mirror of a local folder on S3 and always keep it up to date.
Amazon Simple Storage Service (Amazon S3) is a service that allows you to store files in Amazon’s cloud computing environment. When your files are in Amazon’s system you can retrieve the files from anywhere on the web. You can also use Amazon’s CloudFront service in conjunction with S3 to distribute your files to millions of people. Amazon S3 is the same highly scalable, reliable, secure, fast, inexpensive infrastructure that Amazon uses to run its own global network of web sites. Best of all Amazon S3 is free for the first 5 GB of storage.
S3 Sync is open source and completely free to use. Visit http://sprightlysoft.com/S3Sync/ to download the application or get the source code.
The following gives you a rundown of how the source code works. This information is useful if you want to learn how the program works so you can make changes to it.
S3 Sync works by listing your files on Amazon S3, listing your files locally and comparing the differences between the two lists. Here is the logic of the program.
- The code starts by getting a list of all user settings. These include your Amazon AWS credentials, the S3 bucket you are syncing to, and the local folder you are syncing from.
- The next major call is the PopulateS3HashTable function. Here all the files in your Amazon S3 bucket are listed. The code calls the ListBucket function from the SprightlySoft AWS Component. This function returns an ArrayList of objects, one for each item in S3. Properties of each object include the S3 key name, size, date modified and ETag. These objects are added to a HashTable so they can be used later on.
- The next call is the PopulateLocalArrayList function. This function lists all local files and folders you want to synchronize. Each file and folder is added to an ArrayList so it can be used later on. The function has options to include only certain files or to exclude certain files. The function is recursive, which means it calls itself for each subfolder.
- Next the program creates a list of files that should be deleted on Amazon S3. This is done through the PopulateDeleteS3ArrayList function. This function goes through each item in the list of files on S3 and checks if it exists in the list of local items. If the S3 item does not exist locally, the item is added to the DeleteS3ArrayList.
- Next the program finds which local files do not exist on S3 or differ from the copy on S3. This is done through the PopulateUploadDictionary function. This function goes through the local ArrayList and checks if each item exists in the S3 HashTable. An item may exist both on S3 and locally, but the local file may have different content. To determine if a file is the same on S3 and locally, the program has an option to compare files by ETag. An ETag is an identifier based on the content of a file. If the file changes, the ETag changes. Amazon stores the ETag of each file you upload. When you list files on S3 the ETag for each file is returned. If you choose to compare by ETag, the program calculates the ETag of the local file and checks if it matches the ETag returned by Amazon. Any files that are missing on S3 or don’t match are added to a Dictionary of items that need to be uploaded to S3.
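The comparison steps above can be sketched in Python. This is only an illustration, not the S3 Sync source: the names `md5_etag`, `list_local_files` and `plan_sync` are hypothetical, the S3 listing is represented as a plain dictionary of key name to ETag, and only files (not folders) are tracked. It also assumes the ETag is the hex MD5 of the file content, which holds for objects uploaded in a single, non-multipart request. `os.walk` stands in for the explicit recursion described above.

```python
import hashlib
import os


def md5_etag(path, chunk_size=1024 * 1024):
    """Hex MD5 of a file's content, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_local_files(root):
    """Map S3-style keys (forward slashes, relative to root) to local paths."""
    files = {}
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            key = os.path.relpath(path, root).replace(os.sep, "/")
            files[key] = path
    return files


def plan_sync(s3_etags, local_files, compare_by_etag=True):
    """Return (keys to delete on S3, {key: path} of files to upload)."""
    # Anything on S3 that no longer exists locally should be deleted.
    to_delete = [key for key in s3_etags if key not in local_files]
    to_upload = {}
    for key, path in local_files.items():
        if key not in s3_etags:
            to_upload[key] = path  # missing on S3
        elif compare_by_etag and md5_etag(path) != s3_etags[key].strip('"'):
            to_upload[key] = path  # present on S3 but content differs
    return to_delete, to_upload
```

S3 returns ETags wrapped in double quotes, which is why the sketch strips them before comparing.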
Now we have a list of files that need to be deleted on S3 and a list of files that need to be uploaded to S3. The program has an option to list these changes or go ahead and apply these changes to S3.
If we are deleting files on S3 the DeleteExtraOnS3 function is called. This function goes through the DeleteS3ArrayList and deletes each file in it. This is done by calling the MakeS3Request function. This function uses the SprightlySoft AWS Component to send the appropriate command to S3. For more information about using the AWS Component see the documentation included with the source code and read the Amazon S3 API Reference documentation from Amazon. The function has the ability to retry the command if it fails.
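The retry behaviour can be illustrated with a small Python sketch. The helper name `call_with_retry`, the default of three attempts and the growing delay are all assumptions made for illustration; the post does not say how many times MakeS3Request retries or whether it waits between attempts.

```python
import time


def call_with_retry(operation, max_attempts=3, delay_seconds=1.0):
    """Call operation() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait a little longer after each failed attempt.
            time.sleep(delay_seconds * attempt)
```

A delete request would be passed in as a callable, so the same wrapper could serve any S3 command.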
Finally the program calls the UploadMissingToS3 function to upload files from the UploadDictionary to S3. Here the program sets header information such as Content-MD5 and Content-Type, along with metadata that stores the local file’s timestamp in S3. It then calls the UploadFileToS3 function, which is very similar to the MakeS3Request function. The difference is that UploadFileToS3 has parameters that are only relevant to uploading a file. The program uses a variable called MyUpload, which is a SprightlySoft AWS Component object. This object raises an event whenever the progress of an upload changes. The program hooks into this progress event and shows the progress of the upload while it is taking place. This is done in the MyUpload_ProgressChangedEvent function.
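As a rough Python illustration of the headers described above, the sketch below builds them for one file. The function name `build_upload_headers` and the metadata name `x-amz-meta-local-mtime` are hypothetical; the post does not say which metadata name S3 Sync uses. What is standard is that Content-MD5 is the base64 encoding of the binary MD5 digest and that user metadata headers start with `x-amz-meta-`.

```python
import base64
import hashlib
import mimetypes
import os


def build_upload_headers(path):
    """Build the request headers for uploading one local file."""
    # Reads the whole file at once; fine for a sketch, not for huge files.
    with open(path, "rb") as handle:
        digest = hashlib.md5(handle.read()).digest()
    content_type, _encoding = mimetypes.guess_type(path)
    return {
        # Base64 of the raw 16-byte digest, not of the hex string.
        "Content-MD5": base64.b64encode(digest).decode("ascii"),
        "Content-Type": content_type or "application/octet-stream",
        # Hypothetical metadata name holding the local modification time.
        "x-amz-meta-local-mtime": str(int(os.path.getmtime(path))),
    }
```

Sending Content-MD5 lets S3 reject an upload whose content was corrupted in transit.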
When the program completes it has an option to send log information through an email. This is useful if you run the program as a scheduled task and you want to be notified if there is a failure.
2 thoughts on “Amazon S3 Sync”
Thanks for the solution. Have you done any performance tests syncing a large number of files? For example: 100K files already synced with S3, with 500 new files added and 500 deleted locally.
It should be a relatively quick process to compare 100,000 local files to 100,000 files on S3. An area to be aware of is listing files on S3. In order to list 100,000 files on S3 you will need to do 100 “Get Bucket” requests, since a single request returns at most 1,000 files. You are unable to do the requests in parallel since you need the NextMarker value from the previous request for the next request. 100 requests is not excessive, but it will prevent you from doing something like comparing all files every few seconds.
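The sequential listing can be sketched in Python as follows. Here `fetch_page` is a hypothetical stand-in for one “Get Bucket” request: it takes the marker from the previous page and returns a page of keys plus the marker for the next page, or `None` when the listing is complete. It is not part of the SprightlySoft AWS Component.

```python
def list_all_keys(fetch_page, page_size=1000):
    """Collect every key by following markers one page at a time.

    fetch_page(marker, page_size) must return (keys, next_marker), where
    next_marker is None once the last page has been returned.
    """
    keys = []
    marker = None
    requests_made = 0
    while True:
        # Each request depends on the marker from the one before it,
        # which is why the pages cannot be fetched in parallel.
        page, marker = fetch_page(marker, page_size)
        requests_made += 1
        keys.extend(page)
        if marker is None:
            break
    return keys, requests_made
```

With 100,000 keys and a page size of 1,000, the loop makes 100 requests, one after the other.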