This section describes how to upload data from your local filesystem into Zepl and analyze it using Spark, Python, and other interpreters. You can also download and delete uploaded data.
If you have data files on your local machine that you want to analyze with Zepl, you can upload a file by opening the right menu bar in your notebook and choosing the Upload file button. You can also simply drag and drop the relevant file into the sidebar. You can upload multiple files, but they are only accessible from the notebook they were uploaded to; to use the same file in different notebooks, upload it to each notebook separately.
Note: Currently, we allow up to 100 MB of uploaded data in total.
Once the file is uploaded to the notebook, you can access it at the following URL (where <file-name> is the name of the uploaded file):

http://zdata/<file-name>
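As a quick sketch of how the notebook-local URL is composed, assuming a hypothetical uploaded file named bank.csv:

```python
# Build the notebook-local URL for an uploaded file.
# "bank.csv" is a hypothetical example file name.
file_name = "bank.csv"
url = f"http://zdata/{file_name}"
print(url)  # http://zdata/bank.csv
```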
Here are some examples you can use:
%spark
import org.apache.spark.SparkFiles

sc.addFile("http://zdata/bank.csv")

val sparkDF = spark.read.format("csv")
  .option("delimiter", ";")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(SparkFiles.get("bank.csv"))
%spark.pyspark
from pyspark import SparkFiles

sc.addFile('http://zdata/bank.csv')

sparkDF = (spark.read.format('csv')
           .options(delimiter=';', header='true', inferSchema='true')
           .load(SparkFiles.get('bank.csv')))
%spark.r
spark.addFile("http://zdata/bank.csv")

sparkDF <- read.df(path = spark.getSparkFiles("bank.csv"),
                   source = "csv",
                   delimiter = ";",
                   header = "true",
                   inferSchema = "true")
If you want to read the data into Python directly, you can use pandas. For example:
%python
import pandas as pd

pandas_df = pd.read_csv('http://zdata/bank.csv', sep=';', header='infer')
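To see what the sep and header options do without needing the uploaded file, here is a self-contained sketch that parses a small semicolon-delimited sample from memory (the column names are illustrative, not taken from the real bank.csv):

```python
import io

import pandas as pd

# A small semicolon-delimited sample in the same shape as the
# bank.csv example above (column names here are illustrative).
csv_text = '"age";"job";"balance"\n30;"unemployed";1787\n33;"services";4789\n'

# sep=';' matches the file's delimiter; header='infer' (the default)
# takes the first row as column names.
df = pd.read_csv(io.StringIO(csv_text), sep=';', header='infer')
print(df.shape)          # (2, 3)
print(list(df.columns))  # ['age', 'job', 'balance']
```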
If the data volume is small enough, you can also download the data directly into the container:

%python
!wget http://zdata/<file-name>

Once the file is downloaded, you can access it as a local file.
For example, you can use the following code to access the data after downloading a file named bank.csv to the container:
%spark
val sparkDF = spark.read.format("csv")
  .option("delimiter", ";")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("bank.csv")
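The same local file can also be read with pandas once it is in the container. A minimal, self-contained sketch, writing a small semicolon-delimited stand-in for the downloaded bank.csv so it runs anywhere (in the notebook you would simply read the file fetched by wget):

```python
import pandas as pd

# Write a small stand-in for the downloaded file; the contents and
# column names here are illustrative, not the real bank.csv.
with open('bank.csv', 'w') as f:
    f.write('age;job;balance\n30;unemployed;1787\n33;services;4789\n')

# Once the file is local, a plain relative path works.
df = pd.read_csv('bank.csv', sep=';')
print(len(df))  # 2
```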
You cannot edit data directly within Zepl, but you can overwrite a data file by uploading a file with the same name.
Warning: Overwritten data cannot be recovered.
To delete data, click the red "x" button next to the data file in the Files tab on the notebook page.
Warning: Deleted data cannot be recovered.