Blog Posts

Streaming Apache Hop and MQTT

A quick walkthrough on how to use Apache Hop with MQTT

  February 8, 2021   


Automated Data Ingestion with AWS S3, EventBridge, Batch and Apache Hop

A tutorial focusing on building a data ingestion solution on AWS

  February 7, 2021   


Apache Hop: AWS ECS and AWS Batch

A quick walkthrough on how to get Apache Hop running on AWS ECS and AWS Batch

  February 5, 2021   


Apache Hop: Customising the Docker Image

A quick walkthrough on how to customise the Hop Docker Image

  February 1, 2021   


Slimming down Apache Hop

A quick walkthrough on how to make Hop as small as possible

  January 28, 2021   


Visual Side-by-Side Diffs with Git

Steps on how to set up git with external diff tools

  January 7, 2021   


Getting started with Apache Hop on AWS: Part 2 - CloudFormation

We will use AWS CloudFormation to automatically create our environment

  December 29, 2020   


Getting started with Apache Hop on AWS: Part 1

We will install Hop on an EC2 instance and source data from S3 and write it to Redshift.

  December 14, 2020   


PDI and the H2 Database

A brief overview on how to use the H2 database with PDI

  October 16, 2020   


Project Hop: Project and Environment Configuration

A brief overview of how to create environment definitions for Hop projects

  July 1, 2020   


Project Hop: A graphical way to build Apache Beam Pipelines

This article explains how to get started with creating Beam pipelines in Project Hop

  May 17, 2020   


Project Hop: Testing as Part of your Development

A few ideas on how to improve your development workflow

  May 12, 2020   


Project Hop: Create Environment Definitions

A brief overview of how to create environment definitions for Hop projects

  May 5, 2020   


Project Hop: Hop on Kubernetes

A brief overview on how to deploy Hop with Kubernetes

  April 29, 2020   


Project Hop: Hop on Docker

A brief overview on how to create a Docker image for Hop

  April 27, 2020   


Project Hop: Hop into a new World

A brief overview of what's new in Hop

  April 15, 2020   


Scheduling a PDI job on Apache Airflow

A tutorial on how to schedule a PDI job via Apache Airflow

  April 1, 2020   


Converting PDI Repositories to PDI Standalone Files

This article explains how to convert PDI repos to PDI standalone files

  March 2, 2020   


Automated testing of data processes: Part 2

This article explains how to set up an environment for automatically testing data integration processes

  January 20, 2020   


Automated testing of data processes: Part 1

This article explains how to set up an environment for automatically testing data integration processes

  January 18, 2020   


Installing JDK on MacOS

This article covers the basics of installing the Java Development Kit

  June 1, 2019   


Pentaho Data Integration: Unit Testing

This article explains how to use the Pentaho PDI Datasets plugin for unit testing

  January 9, 2019   


Pentaho Data Integration/Kettle: Environment Plugin

This article explains how to get started with a dynamic environment setup

  December 16, 2018   


Pentaho Data Integration/Kettle: The easy way to create Beam Pipelines

This article explains how to get started with creating Beam pipelines in PDI

  December 1, 2018   


Adding non-Marketplace plugins to PDI WebSpoon

This article explains how to add a custom plugin to WebSpoon

  November 10, 2018   


Hello Node-RED!

Introduction to Node-RED

  July 23, 2018   


Documentation is a Journey

This article explains how to encourage developers to write documentation

  July 21, 2018   


Best Practices: Generate Synthetic Data

This tutorial explains an easy way to generate synthetic data

  June 3, 2018   


Kubernetes: Scaling Pentaho Server

This article explains how to scale Pentaho Server with Kubernetes on Google Cloud Platform

  April 1, 2018   


Kubernetes: Manual and Automatic Volume Provisioning

This article explains how to easily deploy a stateful web app with Kubernetes on Google Cloud Platform

  February 11, 2018   


Kubernetes: Deployment Example WebSpoon

This article explains how to easily deploy WebSpoon with Kubernetes on Google Cloud Platform

  February 3, 2018   


Pentaho Data Integration: WebSpoon on AWS Elastic Beanstalk and adding EBS or EFS Storage Volumes

This article explains how to get WebSpoon running on AWS and how to add storage volumes

  December 30, 2017   


Pentaho Data Integration v8: Getting started with the Spark Execution Engine

This article explains how to configure PDI to run with Spark

  November 16, 2017   


Pentaho Standardised Git Repo Setup First Release

A few comments on the first release

  November 14, 2017   


Pentaho Community Meeting 2017 Impressions

A short write-up of my impressions of this year's PCM

  October 12, 2017   


Pentaho VizAPI: Create a custom visualization once and use it everywhere

This article explains how to create custom visualizations with Pentaho's VizAPI

  October 1, 2017   


Mondrian: Averages in Aggregate Tables

This article explains how to make averages work with Mondrian aggregated tables

  October 1, 2017   


Pentaho Data Integration: Restartable Job

This article explains how to easily add restartability to Pentaho jobs

  July 21, 2017   


Big Data Geospatial Analysis with Apache Spark, GeoMesa and Accumulo - Part 4: Ingesting Data with Spark SQL

This article walks you through practical GeoMesa examples.

  July 1, 2017   


Big Data Geospatial Analysis with Apache Spark, GeoMesa and Accumulo - Part 3: Practical Examples

This article walks you through practical GeoMesa examples.

  June 20, 2017   


Big Data Geospatial Analysis with Apache Spark, GeoMesa and Accumulo - Part 2: Basics

This article walks you through the basics of Accumulo and GeoMesa.

  June 19, 2017   


Big Data Geospatial Analysis with Apache Spark, GeoMesa and Accumulo - Part 1: Installation

This article walks you through the installation procedure for GeoMesa.

  June 18, 2017   


Pentaho Data Integration 7.1: Getting started with the Spark Execution Engine

This article explains how to configure PDI to run with Spark

  May 22, 2017   


Apache Spark: Mapping Scala Date to Spark SQL Date

  May 2, 2017   


20 Seconds for Embedding a CDE Dashboard

This article discusses how to embed a CDE dashboard in an external site

  April 27, 2017   


Pentaho Data Integration: Automatically source Metadata for ETL Metadata Injection

This article discusses various approaches on how to automatically source metadata from files and database tables to inject it later on into transformation templates

  April 21, 2017   


Log4J and Pentaho

  April 19, 2017   


MDX ToggleDrillState

We discuss an elegant use case for ToggleDrillState

  April 17, 2017   


MDX DRILLDOWNMEMBER and CDE Dashboard Tables

We discuss an elegant use case for DRILLDOWNMEMBER within a CDE table

  April 9, 2017   


Apache Spark: Retrieving Data from a REST API and converting JSON to a Spark Dataset

This is a very short article explaining how to retrieve data from a REST API and convert the retrieved JSON data to a Spark Dataset

  April 4, 2017   


Mondrian: Modeling a Multivalued Dimension Attribute

This article explains how to model multivalued dimensions with Mondrian

  March 20, 2017   


Adventures with Apache Spark: Creating a Snapshot Table

This article discusses how to create a snapshot table for OLAP analysis with Apache Spark.

  March 5, 2017   


Adventures with Apache Spark: How to clone a record

This article discusses how to clone a row

  March 4, 2017   


PDI Password Encryption

This article provides a short intro into PDI password encryption

  March 3, 2017   


Agile Data Integration: Continuous Integration with Jenkins and PDI

This article provides a short intro into using Jenkins with PDI

  February 17, 2017   


Real Time Streaming with Apache Flink and Kafka: Simple Example

This article provides a short intro into the fascinating world of Apache Flink

  December 8, 2016   


Pentaho Data Integration: Streamlined Data Refinery

This article discusses what is behind the Pentaho Streamlined Data Refinery marketing buzz

  November 26, 2016   


Pentaho Data Integration: Advances in Real Time Streaming - Real Time SQL

This article discusses the latest developments on real time streaming with PDI

  October 30, 2016   


Rethinking the Snapshot Strategy

This article discusses a different strategy for creating a snapshot

  October 23, 2016   


Apache Flink Streaming: Using Case Classes

This article explains how to use case classes to properly type the data sets

  October 1, 2016   


Real Time Streaming with Apache Flink, ElasticSearch and Kibana: Simple Twitter Example

This article provides a short intro into the fascinating world of Apache Flink

  September 18, 2016   


Getting Started With Flink Streaming API

This article provides a short intro into the fascinating world of Apache Flink

  September 9, 2016   


A short exercise in Apache Spark REPL: Joining fact and dimension data

This article provides a short intro into the fascinating world of Apache Spark

  September 4, 2016   


Pentaho Data Integration: Flexible Parameter Setup for Big Projects

This provides detailed instructions on how to set up PDI Data Services

  June 10, 2016   


MDX: Totals of Ascendants have to reflect sum of the filtered leaf members

This article explains how to write a context sensitive MDX query

  June 4, 2016   


Announcing the MDX Maestro Challenge Series

MDX Challenges every 3 weeks

  May 17, 2016   


PDI Data Services

This provides detailed instructions on how to set up PDI Data Services

  May 2, 2016   


Using Pentaho Data Integration with Docker: Part 1

This provides detailed instructions on how to use PDI with Docker

  April 21, 2016   


Pentaho Data Integration: Reading from Named Pipes

This article takes a look at how easy it is to read from named pipes

  March 11, 2016   


Pentaho Data Integration: The Parameter Object and replacing Parameter Values with Variable Values

This article looks at a more complex setup where a unique approach has to be chosen to cater for various parameter and variable needs

  February 20, 2016   


Presto

  February 14, 2016   


Creating DI Execution Logs for Pentaho Data Integration on Hadoop

This article explains how to set up a simple custom PDI logging framework for Hadoop

  February 4, 2016   


Unit testing Pentaho Data Integration jobs and transformations

This article explains how to unit test Pentaho Data Integration jobs and transformations

  January 30, 2016   


Database Version Management (DVM) Process Powered By Pentaho Data Integration

  January 25, 2016   


Generate XML Documents with Pentaho Data Integration

  January 6, 2016   


Pentaho Community Meeting 2015 - Recap

  November 8, 2015   


Pentaho Data Integration - Dynamically Injecting the Metadata Injection - Metadata Driven ETL

  October 31, 2015   


Defining a table layout in Pentaho Report Designer

  October 16, 2015   


Modular ETL with Pentaho Data Integration

  October 15, 2015   


Pentaho Mondrian: Custom Formatting with Cell Formatter

This article covers the basics of using the Mondrian Cell Formatter feature.

  July 29, 2015   


D3 Maps: Part 3

D3 Maps

  July 28, 2015   


D3 Maps: Publishing on the Pentaho BA Server

D3 Maps

  July 28, 2015   


Big Data Snapshot and Accumulating Snapshot

This article explains how to implement an accumulating snapshot on Hadoop

  July 27, 2015   


Geo Data - The fast lane to publishing a Map

  July 8, 2015   


Pentaho Data Integration: How to fix the GLib-CRITICAL problem

This article explains how to fix the GLib-CRITICAL problem in a very easy fashion

  June 7, 2015   


Setting up a Hadoop Dev Environment for Pentaho Data Integration

This article explains how to set up a vanilla Hadoop Distribution and configure Pentaho Data Integration to access it

  June 6, 2015   


Pentaho Community Meetup 2015

This article gives a brief overview of PCM15

  June 6, 2015   


D3 Maps: Getting Started

D3 Maps

  June 6, 2015   


Pentaho Data Integration: Parallelism and Partitioning

In this article we will discuss how to implement parallelism and partitioning of data streams in PDI

  April 28, 2015   


Pentaho Data Integration: DB Rollback on Error

In this article we will discuss how to implement transactional behaviour in PDI

  April 28, 2015   


Cascading Parameters in Pentaho Report Designer

In this article we will discuss how to implement cascading parameters in PRD

  April 23, 2015   


Pentaho Mondrian: The MDX Generate Function

In this article we will discuss how to use the MDX Generate function

  April 23, 2015   


Pentaho CDE: MDX Parameterization

In this article we will discuss how to parameterize an MDX query in your Pentaho dashboard.

  April 22, 2015   


Pentaho CCC Chart Label Formatting

This article explains various options on how to format a chart label

  April 15, 2015   


Creating a Pentaho CDE Table Add-In

This article explains how to create an add-in for the CDE Table Component to achieve a custom cell presentation behaviour

  March 31, 2015   


Pentaho CCC Core Concepts

This article explains readers, dimensions and visual roles.

  March 29, 2015   


The ultimate guide to Pentaho CCC Context Charts (aka Viewfinder or Sub-Charts)

In this article we will take a look at how to create context charts with Pentaho CCC.

  March 23, 2015   


Pentaho CDE: Global Properties

This article explains how to create global properties for Charts.

  March 23, 2015   


CDE Prototyping: Scriptable JavaScript JSON Data Source

  March 18, 2015   


Mondrian MDX: Intrinsic Member Properties

In this very short article I want to shed some light on intrinsic member properties

  February 27, 2015   


Using CSS to create PDF Reports

This article explains how to create a PDF report using CSS, Pentaho BA-Server and WeasyPrint

  February 17, 2015   


The ultimate guide to using CCC and CGG with PRD

This article explains how you can use CCC charts with Pentaho Reporting solutions.

  February 10, 2015   


Mondrian 4: Hanger Dimensions (Actual VS Budget and other Scenarios)

Hanger dimensions in Mondrian 4 open up a new world of possibilities. Before we dive into this topic, we will first have a look at calculated members and then discuss a few use cases for hanger dimensions.

  January 15, 2015   


OLAP Cube Member Properties

Have you ever wondered what Member Properties in a multidimensional cube are really good for?

  December 29, 2014   


Advanced Data Modeling Techniques

In this article we will focus on some advanced data modeling techniques to cover many-to-many and parent-to-child relationships

  December 26, 2014   


Migrating from Pentaho BI Server v4 to v5: Path and API changes

This article is not meant to be a migration guide, but more a collection of various aspects of this migration that I thought might be worth noting and which I have not seen mentioned elsewhere.

  November 13, 2014   


Pentaho Data Integration: Rows to Json Output

A short article on how to output a flattened Json structure.

  November 11, 2014   


The Ultimate Guide to Configuring LDAP Security for Pentaho BI Server v5

Setting up LDAP Security can be quite challenging. This article tries to shed some light on the process.

  November 8, 2014   


Pentaho Kettle: Implementing Error Handling for Job and Transformation Executor Steps

Learn how to implement correct error handling for job and transformation executor steps

  October 16, 2014   


D3js: Supercharge your charts with related information

Learn how to create charts with event information

  October 15, 2014   


Mondrian semi-additive measures

This article explains a workaround you can use to implement semi-additive measures in Mondrian

  August 12, 2014   


Mondrian: The meaning of column, columnName and captionColumn Level Attributes

This article explains the various level attributes in detail.

  August 12, 2014   


Tips for using Pentaho Data Integration on Mac OS X

This is a short article on some of the problems you might encounter when working with Pentaho Data Integration on Mac OS X

  August 10, 2014   


Creating nested JSON structures in Pentaho Data Integration

This article discusses creating nested JSON structures with Pentaho Kettle.

  August 10, 2014   


Pentaho Business Analytics Cookbook Review

This is a brief review of the latest book on Pentaho.

  August 10, 2014   


Pentaho Sparkl Tips

This article is a list of various important points when working with Sparkl

  August 10, 2014   


Pentaho London User Meetup 22nd of July

Join us at this meetup, listen to interesting talks and exchange ideas

  July 12, 2014   


Installing Columnar DB MonetDB

Very brief instructions on how to install MonetDB

  July 1, 2014   


Pentaho Dashboards (CDE): Bootstrap styled custom selects

This article explains how to create custom Bootstrap-styled cascading selectors using jQuery.

  June 28, 2014   


Pentaho Dashboard Framework Basics

This article explains how to set up Pentaho CDF and describes the folder structure as well as how to create a basic dashboard.

  June 25, 2014   


Setting a variable value dynamically in a Pentaho Data Integration job

This article explains an easy way to define a variable in a Pentaho Kettle job

  June 18, 2014   


Pentaho Dashboards CDE: Create your custom Bootstrap table

This article covers the basics of using Bootstrap with Pentaho Dashboards.

  May 21, 2014