Apache Spark

Kubernetes-native distributed data processing engine.

Category: Analytics
Deploy: ./uis deploy spark
Undeploy: ./uis undeploy spark
Depends on: None
Required by: None
Helm chart: spark-operator/spark-operator (unpinned)
Default namespace: spark-operator

What It Does

The Spark Kubernetes Operator enables running Apache Spark jobs natively on Kubernetes using SparkApplication custom resources. Instead of managing a standalone Spark cluster, you submit jobs as Kubernetes manifests.

Key capabilities:

  • SparkApplication CRD — declarative job definitions as YAML
  • ARM64 support — runs on Apple Silicon and ARM-based clusters
  • Multi-language — PySpark, Scala, Java, R
  • Resource management — Kubernetes-native CPU/memory requests and limits
  • Job scheduling — cron-based scheduling via ScheduledSparkApplication
  • Databricks-compatible — the same open-source Spark runtime, so jobs written for Databricks generally port with little or no change
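A minimal SparkApplication is sketched below for illustration. The job name, image tag, service account, and example script path are assumptions; the CRD group matches the one installed by this operator (`sparkapplications.sparkoperator.k8s.io`), but adjust the details to your Spark version and Helm release.

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: pyspark-pi            # illustrative name
  namespace: spark-operator
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: apache/spark:3.5.1   # assumed tag; pick one matching sparkVersion
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.5.1"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark-operator-spark  # depends on your Helm release name
  executor:
    cores: 1
    instances: 2
    memory: "512m"
```

Submit it with `kubectl apply -f` and watch progress with `kubectl get sparkapplication -n spark-operator`.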

Deploy

./uis deploy spark

No dependencies.

Verify

# Quick check
./uis verify spark

# Manual check
kubectl get pods -n spark-operator

# Check the CRD is installed
kubectl get crd sparkapplications.sparkoperator.k8s.io

Configuration

The Spark operator manages jobs via SparkApplication CRDs. No additional config files are needed for the operator itself.
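For the cron-based scheduling mentioned under key capabilities, the operator also accepts a ScheduledSparkApplication, which wraps a SparkApplication template with a cron schedule. A hedged sketch (names, schedule, and image tag are illustrative assumptions):

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: ScheduledSparkApplication
metadata:
  name: nightly-pi            # illustrative name
  namespace: spark-operator
spec:
  schedule: "0 2 * * *"       # every day at 02:00
  concurrencyPolicy: Forbid   # skip a run if the previous one is still active
  template:                   # same shape as a SparkApplication spec
    type: Python
    mode: cluster
    image: apache/spark:3.5.1 # assumed tag
    mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
    sparkVersion: "3.5.1"
    restartPolicy:
      type: Never
    driver:
      cores: 1
      memory: "512m"
    executor:
      cores: 1
      instances: 1
      memory: "512m"
```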

Key Files

File                                     Purpose
ansible/playbooks/330-setup-spark.yml    Deployment playbook
ansible/playbooks/330-remove-spark.yml   Removal playbook

Undeploy

./uis undeploy spark

Any running Spark jobs will be terminated.

Troubleshooting

Operator pod won't start:

kubectl describe pod -n spark-operator -l app.kubernetes.io/name=spark-operator
kubectl logs -n spark-operator -l app.kubernetes.io/name=spark-operator

SparkApplication stuck in PENDING: Check driver pod events:

kubectl describe sparkapplication -n spark-operator <app-name>
kubectl describe pod -n spark-operator <driver-pod>

Executor pods not launching: Verify resource quotas and available capacity:

kubectl top nodes
kubectl get resourcequota -n spark-operator

Learn More