Recently, I came across an interesting benchmark of Big Data systems based on "A Comparison of Approaches to Large-Scale Data Analysis" by Pavlo et al. (SIGMOD 2009). Building on that paper's methodology, the AMPLab team at UC Berkeley developed open-source software that lets anyone run the benchmark in a public cloud (AWS in this case). The benchmark measures response time on a handful of relational queries (scans, aggregations, and joins) across different data sizes. They have an impressive website with benchmark results comparing Amazon Redshift, Hive, Shark, Impala, and Stinger/Tez. Since I spend a lot of my time working with Google BigQuery, I was curious how BigQuery would compare with the other solutions on exactly the same dataset. Clearly, BigQuery is very different in nature, since it is a shared service rather than a dedicated, customer-deployed (and customer-maintained) solution.