One early discussion point and common architectural decision when deploying big data solutions in the cloud is the choice between IaaS and PaaS offerings. That choice can have a significant impact on the rate of return for cloud projects, as well as the time to value for application development. Most large cloud providers offer a combination of highly specialized PaaS capabilities and more flexible IaaS capabilities, allowing architects to pick components with the right combination of features, cost, speed, and scalability.
IaaS offerings in the big data market are often technologies regularly deployed on premises, but run in virtual machines within the cloud provider’s environment. Technologies like Hadoop, Cassandra, and MongoDB are commonly deployed on public clouds with architectures similar to those of on-premises deployments.
PaaS offerings give users a set of capabilities, often exposed through a standard set of APIs that can be programmatically leveraged to quickly develop and deploy applications. Common PaaS offerings from AWS include Redshift for data warehousing and DynamoDB for NoSQL database services. Google also offers PaaS capabilities with Bigtable, a key/value store, and BigQuery, a highly scalable data analysis engine supporting SQL-like queries over large datasets.
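To illustrate the programmatic style of consuming a PaaS service, the sketch below builds the JSON body of a DynamoDB-style PutItem request using only the standard library. The table name and record are hypothetical, and authentication, request signing, and the HTTP call itself are deliberately omitted; this shows only the shape of the API interaction.

```python
import json

def build_put_item_request(table_name: str, item: dict) -> str:
    """Build the JSON body of a DynamoDB-style PutItem request.

    DynamoDB represents each attribute as a typed value, e.g. {"S": ...}
    for strings and {"N": ...} for numbers (numbers are sent as strings).
    """
    typed_item = {}
    for key, value in item.items():
        if isinstance(value, (int, float)):
            typed_item[key] = {"N": str(value)}
        else:
            typed_item[key] = {"S": str(value)}
    return json.dumps({"TableName": table_name, "Item": typed_item})

# Hypothetical table and record, for illustration only.
body = build_put_item_request("events", {"event_id": "e-1001", "bytes": 2048})
print(body)
```

In practice a developer would hand this payload to an SDK or a signed HTTPS call, but the point stands: consuming the service is a matter of API calls rather than provisioning and administering database servers.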
While each offering has unique advantages, adoption will vary based on specific operational needs and technical feature requirements. Some common considerations for the IaaS and PaaS tradeoff are:

IaaS considerations:
- Enables cloud deployments to closely mirror the technical architecture used for on-premises big data solutions
- Aligns cloud-based architectures more closely with vendor certification programs and preferred application frameworks
- Allows for more portability of an application and workload between cloud providers
- Enables a wider range of clouds to be considered, including providers that offer more traditional colocation services
- Requires a broader skill set, including systems administration and operations
- Pricing is typically based on an hourly charge relative to the size of the compute and storage resources

PaaS considerations:
- More rapid time to value through a programmatic approach to service consumption
- Proven scalability, as PaaS capabilities typically evolved first as internal technologies for cloud providers before being exposed as public services
- Pricing is commonly based on the amount of data moved or the number of requests to a specific API
When building a big data solution in the cloud, there are several key areas to assess as you evaluate IaaS and PaaS capabilities against your specific use case and workload:
- Skill set assessment – PaaS solutions require lower operational overhead and fewer systems-level skills, but they do demand that developers have experience and familiarity with the specific APIs and methods used to call the PaaS offerings.
- Application support – Many big data deployments leverage commercial tools for visualization and predictive analytics, and these tools often require certification with the underlying data platforms used in PaaS and IaaS deployments. You should assess all PaaS offerings to ensure their APIs meet the needs of any additional tools that will be integrated.
- Cost model – The cost model is fundamentally different between IaaS and PaaS. PaaS pricing is based on usage, so unexpected loads can have a surprising effect on costs, whereas IaaS pricing is based on available capacity, so there is a risk of overprovisioning the environment and paying for unused resources.
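As a rough illustration of the two cost models, the sketch below compares a capacity-based IaaS estimate against a usage-based PaaS estimate for one month. All rates and volumes are hypothetical, chosen only to show how a load spike affects each model differently, not drawn from any provider's price list.

```python
HOURS_PER_MONTH = 730  # average hours in a month

# Hypothetical rates, not tied to any real provider's pricing.
IAAS_HOURLY_RATE = 0.50        # per provisioned instance-hour
PAAS_REQUEST_RATE = 0.0000005  # per API request

def iaas_monthly_cost(instances: int) -> float:
    """Capacity-based: you pay for what is provisioned, used or not."""
    return instances * IAAS_HOURLY_RATE * HOURS_PER_MONTH

def paas_monthly_cost(requests: int) -> float:
    """Usage-based: cost tracks the number of requests served."""
    return requests * PAAS_REQUEST_RATE

# Steady month: 4 provisioned instances vs 100 million requests.
print(iaas_monthly_cost(4))              # fixed regardless of load
print(paas_monthly_cost(100_000_000))
# Spike month: the same 4 instances, but 10x the requests.
print(paas_monthly_cost(1_000_000_000))  # PaaS cost grows with load
```

Under these assumed numbers the IaaS bill is flat while the PaaS bill scales tenfold with the spike, which is exactly the tradeoff described above: predictable capacity cost with overprovisioning risk on one side, elastic usage cost with surprise-bill risk on the other.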
For building big data platforms, the cloud provides options ranging from rapidly deployable PaaS solutions to flexible and portable IaaS deployments. As an organization, you should start with your core requirements and technical needs to determine which path is best for you.