Hadoop + Spark + Python Docker Container

If you want to learn Hadoop, Spark, and Python (PySpark), we have published a Docker container to support your learning. The source code is available on GitHub, and the container is published on Docker Hub. An example notebook is also provided to help you get started (see below).

Customized data science runtime containers for Databricks

Databricks has an experimental feature that lets you customize your runtime container using Docker. In this article, we show how to quickly build a Docker container for data science purposes. In particular, we will create a Docker container for use with Databricks with some Natural Language Processing (NLP) […]