Amazon EMR is a managed cluster platform that simplifies running big data frameworks such as Apache Hadoop and Apache Spark on AWS to process and analyze vast amounts of data.
There are two options for connecting Zepl to Amazon EMR clusters: connect to an existing cluster, or create a new one through Zepl.
Note: In both cases, the EMR cluster must reside in the same VPC as Zepl.
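One way to confirm the VPC requirement before connecting is to look up the subnet the cluster runs in and compare its VPC with the one Zepl is deployed in. The following is a minimal sketch using boto3, not part of Zepl itself. The function names, the cluster ID, and the `zepl_vpc_id` value are illustrative placeholders, and it assumes the cluster reports a single `Ec2SubnetId`.

```python
def cluster_subnet_id(describe_cluster_response):
    """Return the EC2 subnet ID from an EMR describe_cluster response."""
    attrs = describe_cluster_response["Cluster"]["Ec2InstanceAttributes"]
    return attrs["Ec2SubnetId"]


def subnet_vpc_id(describe_subnets_response):
    """Return the VPC ID of the first subnet in an EC2 describe_subnets response."""
    return describe_subnets_response["Subnets"][0]["VpcId"]


def in_same_vpc(emr_client, ec2_client, cluster_id, zepl_vpc_id):
    """Check whether the EMR cluster's subnet belongs to the given VPC.

    emr_client and ec2_client are boto3 clients, for example
    boto3.client("emr") and boto3.client("ec2").
    zepl_vpc_id is a placeholder for the VPC that hosts your Zepl deployment.
    """
    cluster = emr_client.describe_cluster(ClusterId=cluster_id)
    subnet_id = cluster_subnet_id(cluster)
    subnets = ec2_client.describe_subnets(SubnetIds=[subnet_id])
    return subnet_vpc_id(subnets) == zepl_vpc_id


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   ok = in_same_vpc(boto3.client("emr"), boto3.client("ec2"),
#                    "j-EXAMPLECLUSTER", "vpc-0123456789abcdef0")
```

If the check returns False, the cluster would need to be relaunched in a subnet of the Zepl VPC before Zepl can reach it.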
Zepl can connect to existing EMR clusters that your team has created through the AWS console. There are two requirements:
To connect a Zepl notebook to an existing EMR cluster:
Note: Zepl currently supports only EMR release 5.14.0. More releases will be added in the future.
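Because only one EMR release is supported, it can help to check an existing cluster's release label before trying to connect it. A small sketch, assuming the cluster details come from the boto3 EMR `describe_cluster` call, where release 5.14.0 appears as the label `emr-5.14.0`. The constant and function names are illustrative.

```python
# Release labels Zepl can connect to, per the note above.
# Extend this set if Zepl adds support for more releases.
SUPPORTED_RELEASE_LABELS = {"emr-5.14.0"}


def is_supported_release(describe_cluster_response):
    """Return True if the cluster's EMR release label is in the supported set."""
    label = describe_cluster_response["Cluster"].get("ReleaseLabel")
    return label in SUPPORTED_RELEASE_LABELS


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   resp = boto3.client("emr").describe_cluster(ClusterId="j-EXAMPLECLUSTER")
#   print(is_supported_release(resp))
```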
Zepl also enables you to create a new EMR cluster through the Zepl interface.
Note: This assumes that, during the Zepl deployment, the Zepl user IAM role was granted permission to create EMR clusters.
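If you are unsure whether the role has these permissions, the IAM policy simulator can give a quick answer. The sketch below interprets the response of the boto3 IAM `simulate_principal_policy` call. The action list is an assumption about what cluster creation typically needs (`elasticmapreduce:RunJobFlow` launches a cluster, and `iam:PassRole` is commonly required alongside it); it is not a list published by Zepl, and the role ARN is a placeholder.

```python
# Assumed set of actions to check; adjust to match your deployment.
REQUIRED_ACTIONS = ["elasticmapreduce:RunJobFlow", "iam:PassRole"]


def denied_actions(simulation_response):
    """Return the action names the simulator did not evaluate as allowed."""
    return [
        result["EvalActionName"]
        for result in simulation_response["EvaluationResults"]
        if result["EvalDecision"] != "allowed"
    ]


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   resp = boto3.client("iam").simulate_principal_policy(
#       PolicySourceArn="arn:aws:iam::123456789012:role/ExampleZeplRole",
#       ActionNames=REQUIRED_ACTIONS,
#   )
#   print(denied_actions(resp))  # an empty list means every action was allowed
```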
As above, go to the Resources page and click the Clusters menu.
On the Clusters page, click Create new Cluster.
Select Launch new Zepl managed EMR cluster and click Next.
Give the cluster a name and an idle terminate value (the cluster shuts down after being idle for the specified time), add any additional configurations, select the Hardware configuration from the dropdown, and click Create.
Note: How quickly the new EMR cluster is created depends on AWS. This often takes about 5 minutes.
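If you want to track cluster startup from a script rather than the console, you can poll the cluster state through the AWS API. The following is a rough sketch, assuming you can find the new cluster's ID in the AWS console. The function names, polling interval, and timeout are illustrative choices, not Zepl settings.

```python
import time

# EMR cluster states that mean the cluster is up, or will never come up.
READY_STATES = {"RUNNING", "WAITING"}
FAILED_STATES = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}


def cluster_state(describe_cluster_response):
    """Extract the state string from an EMR describe_cluster response."""
    return describe_cluster_response["Cluster"]["Status"]["State"]


def wait_until_ready(describe, cluster_id, poll_seconds=30,
                     timeout_seconds=900, sleep=time.sleep):
    """Poll until the cluster is ready, has failed, or the timeout passes.

    describe is a callable such as boto3.client("emr").describe_cluster.
    sleep is injectable so the loop can be exercised without real waiting.
    """
    waited = 0
    while True:
        state = cluster_state(describe(ClusterId=cluster_id))
        if state in READY_STATES:
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"cluster {cluster_id} ended in state {state}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"cluster {cluster_id} still {state} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   wait_until_ready(boto3.client("emr").describe_cluster, "j-EXAMPLECLUSTER")
```

boto3 also provides a built-in `cluster_running` waiter on the EMR client, which may be simpler if you do not need custom timeout handling.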
All clusters can be managed from the Clusters console.
From here you can disconnect, shut down, and clone clusters, and control which members of your organization can access them.