The Spark Stack
• Spark SQL: This is Spark’s module for working with structured data, designed to support workloads that combine familiar SQL database queries with more complicated, algorithm-based analytics. Spark SQL supports the open source Hive project and its SQL-like HiveQL query syntax. It also supports JDBC and ODBC connections, enabling a degree of integration with…
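As a rough illustration of the "SQL over structured data" idea in the excerpt above, here is a minimal PySpark sketch. It assumes PySpark 2.0 or later (for `SparkSession`) and a local master. The sample rows, the view name `orders`, and the helper names `customer_totals` and `main` are hypothetical and do not come from the original post.

```python
# Hypothetical sample data; not from the original post.
ORDERS = [("alice", 120.0), ("bob", 35.5), ("alice", 60.0), ("carol", 80.0)]

# Plain SQL, run by Spark SQL against a temporary view named "orders".
TOTALS_QUERY = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""


def customer_totals(spark, rows):
    """Register rows as a temporary view and aggregate them with SQL."""
    df = spark.createDataFrame(rows, ["customer", "amount"])
    df.createOrReplaceTempView("orders")
    return spark.sql(TOTALS_QUERY)


def main():
    # Imported here so the module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")            # assumption: single-threaded local run
        .appName("spark-sql-sketch")   # hypothetical application name
        .getOrCreate()
    )
    try:
        for row in customer_totals(spark, ORDERS).collect():
            print(row["customer"], row["total"])
    finally:
        spark.stop()
```

Calling `main()` runs the query locally. The DataFrame returned by `customer_totals` can also be handed on to further, non-SQL processing, which is the kind of mixed workload the excerpt describes.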
Category Archives: Apache Spark
Storage Options for Apache Spark
• MapR (file system and database)
• Google Cloud
• Amazon S3
• Apache Cassandra
• Apache Hadoop (HDFS)
• Apache HBase
• Apache Hive
• Berkeley’s Tachyon project
Spark Deployment Options
• Running Spark on YARN
• Running Spark on Mesos
• Running Spark on EC2
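In practice, the deployment options listed above differ mainly in the `--master` value passed to `spark-submit`. The sketch below builds that command line for each option. It makes several assumptions: `yarn` is the master string used by Spark 2.0 and later, 5050 and 7077 are the default Mesos and Spark standalone master ports, and an EC2 cluster runs Spark's standalone manager (as clusters launched by the `spark-ec2` script do). The function names and host names are hypothetical.

```python
def master_url(target, host=None):
    """Return the --master value for a deployment target (illustrative only)."""
    if target not in ("yarn", "mesos", "ec2"):
        raise ValueError(f"unknown deployment target: {target!r}")
    if target == "yarn":
        # YARN finds the cluster through the Hadoop configuration, not a host.
        return "yarn"
    if host is None:
        raise ValueError(f"a master host is required for {target!r}")
    if target == "mesos":
        return f"mesos://{host}:5050"   # assumption: default Mesos master port
    # Assumption: the EC2 cluster runs a standalone Spark master.
    return f"spark://{host}:7077"       # assumption: default standalone port


def spark_submit_command(target, app_path, host=None):
    """Build the argument list for spark-submit without running it."""
    return ["spark-submit", "--master", master_url(target, host), app_path]
```

For example, `spark_submit_command("mesos", "app.py", host="mesos-master")` yields a list that can be passed to `subprocess.run`. Keeping the command as a list avoids shell-quoting problems.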
Programming languages supported by Spark
• Java
• Python
• Scala
• SQL
• R
Spark Examples
Speed of Spark: 100 terabytes in 23 minutes.
Spark wins Daytona Gray Sort 100TB Benchmark
We are proud to announce that Spark won the 2014 Gray Sort Benchmark (Daytona 100TB category). A team from Databricks, including Spark committers Reynold Xin, Xiangrui Meng, and Matei Zaharia, entered the benchmark using Spark. Spark tied with the Themis team from UCSD, and jointly set a…