spark

main usage

  • Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.

concepts

  • none

purpose

  • prepare a kind cluster with basic components
  • install spark
  • install spark-tool
  • test spark with spark-tool

installation

  1. prepare a kind cluster with basic components (see the cluster sketch after this list)
  2. download and load images to the QEMU machine (run the commands on the host of the QEMU machine)
    • run the scripts in download.and.load.function.sh to load the download_and_load function
    • TOPIC_DIRECTORY="spark.software"
      BASE_URL="https://resource.geekcity.tech/kubernetes/docker-images/x86_64"
      download_and_load $TOPIC_DIRECTORY $BASE_URL \
          "docker.io_bitnami_spark_3.2.1-debian-10-r78.dim"
      
  3. install spark
    • prepare spark.values.yaml (see the values sketch after this list)
    • prepare images
      • run the scripts in load.image.function.sh to load the load_image function
      • load_image "docker.registry.local:443" \
            "docker.io/bitnami/spark:3.2.1-debian-10-r78"
        
    • install by helm
      • helm install \
            --create-namespace --namespace application \
            my-spark \
            https://resource.geekcity.tech/kubernetes/charts/https/charts.bitnami.com/bitnami/spark-5.9.11.tgz \
            --values spark.values.yaml \
            --atomic
        
  4. install spark-tool (see the sketch after this list)
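
A minimal sketch of step 1, assuming the kind CLI is already installed on the host; the cluster name and node layout are illustrative only, and the basic components (for example the docker.registry.local:443 registry used above) still need to be installed separately:

    # hypothetical cluster name and node layout; adjust as needed
    cat <<EOF > kind.cluster.yaml
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    nodes:
      - role: control-plane
      - role: worker
    EOF
    kind create cluster --name spark-demo --config kind.cluster.yaml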
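
A minimal sketch of spark.values.yaml for step 3, assuming load_image retags the image under docker.registry.local:443 and that the bitnami chart's standard image.* and worker.replicaCount values apply; adjust the registry layout and worker count to your setup:

    # hedged example only: registry/repository must match how load_image retags the image
    cat <<EOF > spark.values.yaml
    image:
      registry: docker.registry.local:443
      repository: docker.io/bitnami/spark
      tag: 3.2.1-debian-10-r78
    worker:
      replicaCount: 2
    EOF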
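
Step 4 is assumed to apply a manifest named spark.tool.yaml, since the uninstallation section removes it with kubectl delete -f; a sketch of the apply, assuming the manifest exists in the working directory:

    # spark.tool.yaml is not shown in this article; its content is assumed
    kubectl -n application apply -f spark.tool.yaml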

test spark with spark-tool

  1. submit the SparkPi example job from a worker pod
    • kubectl -n application \
          exec -it my-spark-worker-0 -- \
              spark-submit --master spark://my-spark-master-svc:7077 \
                  --class org.apache.spark.examples.SparkPi \
                  /opt/bitnami/spark/examples/jars/spark-examples_2.12-3.2.1.jar 5
      
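
The driver output of the SparkPi job should end with a line like "Pi is roughly 3.14...". To also inspect the cluster from the Spark master web UI, a port-forward sketch, assuming the bitnami chart's default web UI port of 80 on the my-spark release's master service:

    # the web UI port of my-spark-master-svc is assumed to be 80; adjust if overridden
    kubectl -n application port-forward svc/my-spark-master-svc 8080:80
    # then open http://localhost:8080 to see registered workers and completed applications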

uninstallation

  1. uninstall spark-tool
    • kubectl -n application delete -f spark.tool.yaml
      
  2. uninstall spark
    • helm -n application uninstall my-spark \
          && kubectl -n application delete pvc -l app.kubernetes.io/instance=my-spark
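
To confirm the cleanup, a quick check that nothing from the my-spark release remains in the namespace (the namespace itself is left in place):

    # neither command should list the my-spark release or its resources anymore
    helm -n application list
    kubectl -n application get pods,svc,pvc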