* Spark was designed to run in a clustered environment
* You can compile Spark from source or download pre-built binaries for each of the supported environments.
* With Spark Standalone mode you can set up a Spark cluster without a dependency on any other resource manager
* This is probably the best option for quick & dirty solutions, but if you want to build a cluster that other applications (like MapReduce) can also run on, it's a good idea to consider one of the other options
* Some other cluster options include YARN and Mesos; the master-URL sketch after this list shows how an application targets each manager
* YARN comprises the new ResourceManager and NodeManager daemons in the Hadoop 2.x ecosystem. The purpose of YARN is to decouple resource management and scheduling from the JobTracker/TaskTracker daemons of the traditional Hadoop (MapReduce 1) ecosystem
* Apache Mesos is similar to YARN, but goes a step further in unifying a cluster into a single pool of resources that applications can use.
* Definition from Apache Mesos website:
* "Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively."
## Apache Mesos cluster demo
### StackOverflow.com posts cached with `.cache()` using `spark-shell`
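A rough sketch of what this demo session looks like in `spark-shell`; the HDFS path and the `PostTypeId` filter are illustrative assumptions, not the exact demo code.

```scala
// spark-shell provides `sc` (the SparkContext) automatically
val posts = sc.textFile("hdfs:///user/demo/stackoverflow/Posts.xml")

// Keep the question rows in executor memory so repeated queries
// avoid re-reading the file from HDFS
val questions = posts
  .filter(_.contains("PostTypeId=\"1\""))
  .cache()

// The first action materializes the cached partitions on the executors;
// later actions read from memory and return much faster
questions.count()
questions.filter(_.contains("apache-spark")).count()
```

Once the first count completes, the cached RDD shows up under the Storage tab of the Spark web UI.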
### learning-spark project
* Google Cloud Compute (free tier)
* 1 master VM and 3 slave VMs, 4 total, `n1-standard-2` instance type
* 2 vCPU each, 8 vCPU total
* 7.5 GB RAM each, 30 GB RAM total
* HDFS, 500 GB each, 2 TB total