One Stop Big Data Service Shop

Big Data Consulting & Training Services

Jumping Bean can equip your company with Big Data infrastructure, assist with the development of data extraction, loading and transformation processes, perform data analysis and build reports. Our training division can also assist in upskilling your knowledge workers and technical staff if required.

Big Data Infrastructure

Whether it's on-premises, multi-cloud or hybrid cloud, we can assist in the setup and configuration of big data infrastructure that fits your strategic objectives and risk profile. From the Hadoop ecosystem with Hive, Spark and Pig and standard services such as Kafka clusters and HDFS or Ceph storage clusters, to cloud-hosted solutions with Google Bigtable, BigQuery and Dataflow or AWS's Redshift and Elastic MapReduce, we can build an infrastructure solution that will scale with the growth of your data and analytical requirements.

Big Data Processing

Need assistance with the creation of your Big Data pipeline? Our engineers can assist in the creation of efficient processes that scale with your data growth. Need to integrate and consolidate data sources into your data lake or specialist data-warehouse-like stores? Whether it's on-premises or in the cloud, we will help you select the best components to meet your budget and policy requirements. Our developers can assist with the writing of map/reduce jobs and the development of Hive, Spark or Pig scripts to transform and enrich your data.
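
As a rough illustration of the map/reduce pattern mentioned above, the sketch below counts words in plain Python. It is a simplified, single-process stand-in and not a Hadoop job; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample data are assumptions made for this example only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, mimicking the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    print(word_count(["big data big value", "data lake"]))
```

On a real cluster the map and reduce steps run in parallel across many machines and the framework handles the shuffle; the structure of the logic, however, stays much the same.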

Data Scientists and Data Analysis

Looking for data scientists to assist with the interrogation of your data? Our pool of expert resources can be made available to you. Part mathematician, part computer scientist and part trend-spotter, our data scientists can sift through your data, panning for the gold nuggets that it contains.

Big Data Training

After performing a skills-gap analysis, our training division will sit with your management team and develop a unique training plan for your staff. Whether it's data analysts, infrastructure and cloud engineers, or developers requiring retraining or refresher courses, our training team is up to the task.

Continual training and skill development is the new norm, and any corporate looking to keep up with the latest technological developments and with its competitors will need to integrate a long-term training and skill acquisition programme into its yearly planning cycles. Jumping Bean is the team to speak to for the best results.

Big Data Training Courses

We have a range of courses covering Big Data topics. These range from 1 day to 4 days and cater for different audiences: business end-users, technical staff, data scientists or architects. We are constantly adding to, and adapting, our courses to suit the ever-changing landscape. If you are looking for a course and don't see it here, please ask; we may just not have added it yet!


Hive Overview

  • Architecture and design
  • Data types
  • SQL support in Hive
  • Creating Hive tables and querying
  • Partitions
  • Joins
  • Text processing
  • Labs: various labs on processing data with Hive

DQL (Data Query Language) in Detail

  • SELECT clause
  • Column aliases
  • Table aliases
  • Date types and Date functions
  • Group function
  • Table joins
  • JOIN clause
  • UNION operator
  • Nested queries
  • Correlated sub-queries
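
To give a feel for several of the topics above (column and table aliases, a JOIN clause, a group function and a correlated sub-query), here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in database rather than Hive, and the `customers` and `orders` tables, their columns and their rows are hypothetical sample data.

```python
import sqlite3


def run_dql_examples():
    """Build a tiny in-memory database and run two illustrative queries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL
        );
        INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0);
    """)

    # Table aliases (c, o), column aliases, a JOIN and a group function.
    totals = conn.execute("""
        SELECT c.name AS customer, SUM(o.amount) AS total
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
    """).fetchall()

    # Correlated sub-query: orders above that customer's own average.
    above_average = conn.execute("""
        SELECT o.id
        FROM orders AS o
        WHERE o.amount > (
            SELECT AVG(o2.amount)
            FROM orders AS o2
            WHERE o2.customer_id = o.customer_id
        )
        ORDER BY o.id
    """).fetchall()

    conn.close()
    return totals, above_average


if __name__ == "__main__":
    totals, above_average = run_dql_examples()
    print(totals)
    print(above_average)
```

The same SELECT statements carry over to most SQL engines with little or no change, although dialect details differ from one engine to another.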

Hadoop & Spark


  • Introduction to Apache Hadoop and the Hadoop Ecosystem
  • Apache Hadoop File Storage
  • Distributed Processing on an Apache Hadoop Cluster
  • Apache Spark Basics
  • Working with DataFrames and Schemas
  • Analyzing Data with DataFrame Queries
  • RDD Overview
  • Transforming Data with RDDs
  • Aggregating Data with Pair RDDs
  • Querying Tables and Views with SQL
  • Working with Datasets in Scala
  • Writing, Configuring, and Running Spark Applications
  • Spark Distributed Processing
  • Distributed Data Persistence
  • Common Patterns in Spark Data Processing
  • Introduction to Structured Streaming
  • Structured Streaming with Apache Kafka
  • Aggregating and Joining Streaming DataFrames


Kafka Admin

  • Kafka Fundamentals
  • Managing, configuring and optimizing a cluster for performance
  • Kafka Security
  • Designing, troubleshooting and Integrating Systems

Kafka Developer

Application Design

  • Using Kafka’s command line tools
  • Pub/sub and streaming, and overall Apache Kafka architecture and design
  • Apache Kafka API, configuration and metrics
  • Metadata design
  • Systems metrics


  • Programmatically Accessing Kafka
  • Writing a Producer in Java
  • Using the REST API to Write a Producer
  • Writing a Consumer in Java
  • Using the REST API to Write a Consumer


  • Recognize & implement secure procedures for deployment & testing
  • Monitor and troubleshoot clients
  • Tune clients as necessary (e.g. performance, throughput, latency)
  • Developing and testing Kafka Streams applications
  • Developing and testing Confluent KSQL applications

What and Why Of Big Data

Gold In Them Big Data Hills - What and Why of Big Data

Big Data is an all-embracing label for the evolution of business intelligence practices, methodologies and technical infrastructure that has qualitatively transformed business reporting and data analysis. Powerful tools and algorithms are now within the reach of even small and medium enterprises.

Big Data Democratization

The amount of technical and analytical skill required to configure complex infrastructure and perform advanced data manipulation and analysis has been reduced to such an extent that users of reports can now generate them themselves and gain valuable insights.

The cloud revolution means that niche experts can spin up the required infrastructure in minutes and apply machine learning algorithms at the click of a button. All you need to bring is the data: the data that is accumulating at a rapid rate on your storage systems.

Big Data = Big Value

Data is the new gold rush. "There is gold in them mountains of data". This is what drives the value of Google, Facebook and Amazon. We are here to assist you in your big data strategy, implementation and analysis.

Our Business Partners

About Us

Jumping Bean is an open source integration and training company that has been delivering solutions to customers for over 17 years.

Our services include:

  • Support
    • 24x7 SLA-based
    • Ad-hoc support
  • Security consulting
    • Vulnerability scans
    • Server hardening
    • Penetration tests
  • Training
    • Linux
    • Java
    • DevOps
    • Cloud

Long-Term Partnerships

We build long-term relationships with our customers, which helps us better understand their needs and offer customised solutions and training to meet their business requirements.

Our clients include large and small businesses in South Africa and across the globe. We offer both remote and on-site support.

Passion for Technology

We are passionate about open source and love living on the bleeding edge of technology innovation. Our customers lean on our practical experience with emerging technologies to ensure they get the benefits of early adoption and avoid the pitfalls.

Training 100% Money Back Guarantee

We are so confident of the quality of our training that our courses carry a 100% money back guarantee. If, at the end of the first day of training, you are unsatisfied with the course, we will refund 100% of your spend, no questions asked!

Contact Form

Jumping Bean Contact Form!