Apache Spark

Kubernetes-native distributed data processing engine.

Category: Analytics
Deploy: ./uis deploy spark
Undeploy: ./uis undeploy spark
Depends on: None
Required by: None
Helm chart: spark-operator/spark-operator (unpinned)
Default namespace: spark-operator

What It Does

The Spark Kubernetes Operator enables running Apache Spark jobs natively on Kubernetes using SparkApplication custom resources. Instead of managing a standalone Spark cluster, you submit jobs as Kubernetes manifests.

Key capabilities:

  • SparkApplication CRD — declarative job definitions as YAML
  • ARM64 support — runs on Apple Silicon and ARM-based clusters
  • Multi-language — PySpark, Scala, Java, R
  • Resource management — Kubernetes-native CPU/memory requests and limits
  • Job scheduling — cron-based scheduling via ScheduledSparkApplication
  • Databricks-compatible — the same open-source Spark runtime, so jobs written for Databricks generally port with little or no change
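A minimal SparkApplication is sketched below for illustration. The job name, image tag, service account, and example script path are assumptions; the CRD group matches the one installed by this operator (`sparkapplications.sparkoperator.k8s.io`), but adjust the details to your Spark version and Helm release.

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: pyspark-pi            # illustrative name
  namespace: spark-operator
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: apache/spark:3.5.1   # assumed tag; pick one matching sparkVersion
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.5.1"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark-operator-spark  # depends on your Helm release name
  executor:
    cores: 1
    instances: 2
    memory: "512m"
```

Submit it with `kubectl apply -f` and watch progress with `kubectl get sparkapplication -n spark-operator`.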

Deploy

./uis deploy spark

No dependencies.

Verify

# Quick check
./uis verify spark

# Manual check
kubectl get pods -n spark-operator

# Check the CRD is installed
kubectl get crd sparkapplications.sparkoperator.k8s.io

Configuration

The Spark operator manages jobs via SparkApplication CRDs. No additional config files are needed for the operator itself.
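For the cron-based scheduling mentioned under key capabilities, the operator also accepts a ScheduledSparkApplication, which wraps a SparkApplication template with a cron schedule. A hedged sketch (names, schedule, and image tag are illustrative assumptions):

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: ScheduledSparkApplication
metadata:
  name: nightly-pi            # illustrative name
  namespace: spark-operator
spec:
  schedule: "0 2 * * *"       # every day at 02:00
  concurrencyPolicy: Forbid   # skip a run if the previous one is still active
  template:                   # same shape as a SparkApplication spec
    type: Python
    mode: cluster
    image: apache/spark:3.5.1 # assumed tag
    mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
    sparkVersion: "3.5.1"
    restartPolicy:
      type: Never
    driver:
      cores: 1
      memory: "512m"
    executor:
      cores: 1
      instances: 1
      memory: "512m"
```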

Key Files

File                                     Purpose
ansible/playbooks/330-setup-spark.yml    Deployment playbook
ansible/playbooks/330-remove-spark.yml   Removal playbook

Undeploy

./uis undeploy spark

Any running Spark jobs will be terminated.

Troubleshooting

Operator pod won't start:

kubectl describe pod -n spark-operator -l app.kubernetes.io/name=spark-operator
kubectl logs -n spark-operator -l app.kubernetes.io/name=spark-operator

SparkApplication stuck in PENDING: Check driver pod events:

kubectl describe sparkapplication -n spark-operator <app-name>
kubectl describe pod -n spark-operator <driver-pod>

Executor pods not launching: Verify resource quotas and available capacity:

kubectl top nodes
kubectl get resourcequota -n spark-operator

Learn More