Deploying Alfresco on Amazon EC2

Given all of the “buzz” around Cloud Computing, TSG has had the opportunity to deploy Alfresco Cloud solutions on Amazon EC2 for a number of clients, as well as to experiment with the flexibility of using Amazon EC2 for development. Amazon EC2 falls under the Infrastructure-as-a-Service (IaaS) model, which provides virtual servers on demand.

Overall, the primary benefits of deploying to an IaaS such as Amazon EC2 are ease of deployment and configuration and the ability to scale your infrastructure on demand. Over time, a typical document management solution will most likely see a slow and steady increase in content and usage. However, for scenarios that require a large initial ingestion of content or a sudden, large increase in users, Amazon EC2 provides the flexibility to scale up or down as needed.

Deploying an ECM repository to a single Amazon EC2 server should be a relatively straightforward process. As ECM vendors market their Cloud offerings, the key question will be how easily they can scale from one node to many in a Cloud environment.

Developing in the Cloud

With regard to internal development, Amazon EC2 On-Demand Instances have been an easy way to quickly bring up instances of Alfresco. While further enhancing our OpenMigrate Alfresco Target offering, we were able to run benchmarks on EC2. Need to compare migration performance on Alfresco running on a 32-bit vs. a 64-bit OS? No problem: start up a medium instance (32-bit) and a large instance (64-bit) and go to town. Want to see how much migration performance improves in a clustered Alfresco environment? No problem again: start up another instance and set up a cluster against the existing repository. The benefit is that once development tasks are complete, the instances can be shut down and brought back up later.
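To keep benchmarks like these comparable across instances, it helps to run the same timing harness on each one. Below is a minimal Python sketch, assuming a hypothetical `migrate_batch` callable that wraps whatever actually pushes a batch of documents into Alfresco (it is not part of OpenMigrate's API). The harness simply times the batches and reports documents per second, so the 32-bit, 64-bit, and clustered runs can be compared on the same terms.

```python
import time
from typing import Callable, Sequence


def measure_throughput(migrate_batch: Callable[[Sequence[str]], None],
                       documents: Sequence[str],
                       batch_size: int = 100) -> float:
    """Feed documents to migrate_batch in fixed-size batches and return
    the wall-clock throughput in documents per second."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = time.perf_counter()
    for i in range(0, len(documents), batch_size):
        migrate_batch(documents[i:i + batch_size])
    elapsed = time.perf_counter() - start
    # Guard against a zero timer reading on extremely fast dummy workloads.
    return len(documents) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Hypothetical stand-in for a real migration call; it only sleeps
    # briefly per batch so the harness has something to time.
    def fake_migrate(batch: Sequence[str]) -> None:
        time.sleep(0.001 * len(batch) / 100)

    docs = [f"doc-{n}.pdf" for n in range(1000)]
    rate = measure_throughput(fake_migrate, docs)
    print(f"{rate:.0f} documents/second on this instance")
```

Running the identical script (with the same document set) on each instance type keeps the comparison about the instance, not the test setup.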

Amazon EC2 Lessons Learned

If you are deploying a production implementation to the Cloud, the following are best practices for Amazon EC2.

  • On-Demand vs. Reserved Instances – In most cases, your production ECM repository will be expected to run 24/7, so choose Reserved Instances to reduce overall costs. With Reserved Instances, you pay upfront for a 1- or 3-year term but pay a substantially lower hourly rate. Amazon updates its pricing model frequently, so check the current Amazon EC2 pricing page before committing, and use the Amazon Simple Monthly Calculator to estimate costs.
  • Backup Procedures / Restoring EBS Stores – With a traditional in-house IT infrastructure, backup procedures are usually taken for granted. When deploying to Amazon EC2, make sure a backup process is in place. Typically, backups are configured to go to a separate S3 store for archival purposes. Test the restore to make sure it works. We have also run into a couple of scenarios where the mount to an EBS store was lost, so make sure your startup scripts check for the mount, or keep a script handy to remount the EBS stores should you lose them. Alfresco has partnered with vendors such as RightScale, which provide tools for overall Cloud management and administration.
  • Migrating Content – Depending on the amount of content, it may not make sense to migrate content directly from your local server to your EC2 instance. TSG uses OpenMigrate for complex migrations from other repositories into Alfresco, but network bandwidth may become the bottleneck. To maximize performance, use AWS Import/Export: ship a hard drive to Amazon and have its contents loaded for you. You can then either run the migration on the EC2 server against the source files, or run the migration locally and ship the resulting alf_data directory instead.
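To see why Reserved Instances pay off for an always-on repository, compare the two pricing models over a term. The Python sketch below assumes a simplified model of an upfront fee plus an hourly rate; the prices in the usage block are hypothetical placeholders, not Amazon's actual rates, so substitute current figures from the pricing page.

```python
HOURS_PER_YEAR = 24 * 365  # a 24/7 server runs 8,760 hours per year


def on_demand_cost(hourly_rate: float, years: float) -> float:
    """Total cost of running an On-Demand instance around the clock."""
    return hourly_rate * HOURS_PER_YEAR * years


def reserved_cost(upfront: float, hourly_rate: float, years: float) -> float:
    """Upfront reservation fee plus the discounted hourly charges."""
    return upfront + hourly_rate * HOURS_PER_YEAR * years


def break_even_hours(upfront: float, on_demand_rate: float,
                     reserved_rate: float) -> float:
    """Usage hours after which the reserved instance becomes cheaper."""
    savings_per_hour = on_demand_rate - reserved_rate
    if savings_per_hour <= 0:
        raise ValueError("reserved hourly rate must be below the on-demand rate")
    return upfront / savings_per_hour


if __name__ == "__main__":
    # Hypothetical placeholder prices (USD), not real EC2 rates.
    od_rate, ri_upfront, ri_rate = 0.50, 1000.0, 0.20
    print(f"1-year on-demand: ${on_demand_cost(od_rate, 1):,.2f}")
    print(f"1-year reserved:  ${reserved_cost(ri_upfront, ri_rate, 1):,.2f}")
    hours = break_even_hours(ri_upfront, od_rate, ri_rate)
    print(f"break-even after {hours:,.0f} hours")
```

With these placeholder numbers the reservation breaks even after about 3,333 hours (roughly 139 days), well under the 8,760 hours a 24/7 repository runs in a year.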
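The advice to check for lost EBS mounts can be built into a startup check. Here is a minimal Python sketch, assuming the EBS volume has an /etc/fstab entry for a hypothetical mount point `/mnt/alf_data` (adjust to your own layout) and that the check runs as root. It verifies the mount with `os.path.ismount` and, if the mount is missing, tries `mount <dir>` once before reporting failure, so Alfresco is not started against an empty directory.

```python
import os
import subprocess

# Hypothetical mount point for the EBS volume holding alf_data; it is
# assumed to have an /etc/fstab entry so `mount <dir>` can find the device.
ALF_DATA_MOUNT = "/mnt/alf_data"


def ensure_mounted(mount_point: str, remount: bool = True) -> bool:
    """Return True if mount_point is mounted, trying one remount if not."""
    if os.path.ismount(mount_point):
        return True
    if not remount:
        return False
    try:
        # `mount <dir>` looks the device up in /etc/fstab; this needs root.
        result = subprocess.run(["mount", mount_point],
                                capture_output=True, text=True, check=False)
    except FileNotFoundError:  # no `mount` binary on this system
        return False
    return result.returncode == 0 and os.path.ismount(mount_point)


def startup_check(mount_point: str = ALF_DATA_MOUNT) -> int:
    """Exit code for an init script: 0 if it is safe to start Alfresco."""
    if ensure_mounted(mount_point):
        return 0
    print(f"{mount_point} is not mounted; not starting Alfresco")
    return 1
```

An init script could call `sys.exit(startup_check())` before launching Alfresco; returning a non-zero code keeps the repository from starting on the root volume and silently writing content to the wrong disk.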
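A quick back-of-the-envelope estimate shows when shipping a drive beats the network. The Python sketch below assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits/s) and a hypothetical 80% effective link utilization; plug in your own content volume and measured upload bandwidth.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_gb: float, bandwidth_mbps: float,
                  efficiency: float = 0.8) -> float:
    """Estimate days needed to upload data_gb over a bandwidth_mbps link.

    efficiency is the assumed fraction of the nominal link rate actually
    achieved (protocol overhead, shared traffic, and so on).
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and 0 < efficiency <= 1")
    bits = data_gb * 8 * 10**9                        # decimal GB -> bits
    bits_per_second = bandwidth_mbps * 10**6 * efficiency
    return bits / bits_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical example: 2 TB of content over a 20 Mbps uplink.
    print(f"{transfer_days(2000, 20):.1f} days")
```

Under those assumptions, 2 TB over a 20 Mbps uplink takes about 11.6 days of continuous transfer, which is the kind of scenario where AWS Import/Export makes sense.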

Deciding Whether to Deploy to the Cloud

Obviously, deploying production ECM repositories to the Cloud is not for everyone. Security requirements, integrations with other applications, and similar constraints may prohibit any content from residing in the Cloud. However, when you weigh the overall cost savings, including FTEs, internal IT costs, and the time needed to procure, lease, and configure hardware, deploying to an IaaS may be a viable option.