The Spark Stack

• Spark SQL: This is Spark’s module for working with structured data, designed to support workloads that combine familiar SQL database queries with more complicated, algorithm-based analytics. Spark SQL supports the open source Hive project and its SQL-like HiveQL query syntax. Spark SQL also supports JDBC and ODBC connections, enabling a degree of integration with…

Speed of Spark: 100 terabytes in 23 minutes.

Spark wins Daytona Gray Sort 100TB Benchmark

We are proud to announce that Spark won the 2014 Gray Sort Benchmark (Daytona 100TB category). A team from Databricks, including Spark committers Reynold Xin, Xiangrui Meng, and Matei Zaharia, entered the benchmark using Spark. Spark won in a tie with the Themis team from UCSD, and jointly set a…
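To put the headline figure in perspective, the short sketch below converts "100 terabytes in 23 minutes" into an aggregate sort throughput. It assumes decimal terabytes (10^12 bytes) and treats 23 minutes as exact, although the headline figure is presumably rounded; the function name is purely illustrative.

```python
def sort_throughput_gb_per_s(terabytes: float, minutes: float) -> float:
    """Aggregate throughput in decimal gigabytes per second."""
    total_bytes = terabytes * 10**12   # assumption: 1 TB = 10^12 bytes
    seconds = minutes * 60
    return total_bytes / seconds / 10**9


if __name__ == "__main__":
    rate = sort_throughput_gb_per_s(100, 23)
    print(f"{rate:.1f} GB/s across the whole cluster")
```

Under these assumptions the run works out to roughly 72 GB/s sorted across the cluster as a whole.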