Analytics

The analytics package bundles data science platforms for interactive analysis, distributed processing, and data governance.

Services

Service       | Description                                          | Deploy
--------------|------------------------------------------------------|----------------------------
JupyterHub    | Multi-user Jupyter notebooks with PySpark            | ./uis deploy jupyterhub
OpenMetadata  | Data discovery, governance, and metadata platform    | ./uis deploy openmetadata
Apache Spark  | Kubernetes-native distributed processing             | ./uis deploy spark
Unity Catalog | Data catalog and governance                          | ./uis deploy unity-catalog

Quick Start

./uis stack install analytics

This installs Spark, JupyterHub, and Unity Catalog. OpenMetadata is not part of the stack; deploy it separately, after its PostgreSQL and Elasticsearch dependencies:

./uis deploy postgresql      # Required by OpenMetadata
./uis deploy elasticsearch   # Required by OpenMetadata
./uis deploy openmetadata
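
The two steps above can be combined into one script. This is a minimal sketch, not part of the uis tooling: the `UIS` variable and the function name `install_analytics_with_openmetadata` are assumptions introduced here for illustration, and the script uses only the commands already shown in this section.

```shell
#!/bin/sh
set -eu

# Path to the uis CLI. The UIS variable is an assumption of this sketch;
# override it if the CLI lives somewhere else.
UIS="${UIS:-./uis}"

# Install the analytics stack, then deploy OpenMetadata after the two
# services it requires. The order matches the Quick Start commands.
install_analytics_with_openmetadata() {
  "$UIS" stack install analytics
  for service in postgresql elasticsearch openmetadata; do
    "$UIS" deploy "$service"
  done
}
```

Call `install_analytics_with_openmetadata` from the directory that contains `uis`. Because of `set -eu`, the script stops at the first failing command, so OpenMetadata is not deployed if one of its dependencies fails.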

How It Works

  • JupyterHub gives users interactive notebooks with PySpark pre-configured
  • Spark Operator runs batch jobs as Kubernetes-native SparkApplication resources
  • Unity Catalog provides a three-level namespace (catalog.schema.table) for governed data access
  • OpenMetadata provides data discovery, lineage tracking, and governance across all data assets
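
To illustrate what a SparkApplication resource can look like, the sketch below writes a minimal manifest to a file. It assumes the Kubeflow Spark Operator `sparkoperator.k8s.io/v1beta2` API; the job name, namespace, container image, service account, and resource sizes are all hypothetical values and may not match what this package deploys.

```shell
#!/bin/sh
set -eu

# Output file for the manifest (the MANIFEST variable is an assumption of this sketch).
MANIFEST="${MANIFEST:-spark-pi.yaml}"

# Write a minimal SparkApplication that runs the SparkPi example job.
# Every value marked "assumption" or "hypothetical" should be adapted to the cluster.
cat > "$MANIFEST" <<'EOF'
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi              # hypothetical job name
  namespace: default          # assumption: a namespace the operator watches
spec:
  type: Scala
  mode: cluster
  image: apache/spark:3.5.0   # assumption: an image that ships the Spark examples
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
  sparkVersion: "3.5.0"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark     # hypothetical service account
  executor:
    instances: 1
    cores: 1
    memory: 512m
EOF

echo "Wrote $MANIFEST; submit it with: kubectl apply -f $MANIFEST"
```

The script only generates the file. Submitting it with `kubectl apply` hands the job to the operator, which creates the driver and executor pods; check the operator's installed version and namespace before relying on these values.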