Streaming Apache Hop and MQTT
A quick walkthrough on how to use Apache Hop with MQTT
Automated Data Ingestion with AWS S3, EventBridge, Batch and Apache Hop
A tutorial focusing on building a data ingestion solution on AWS
Apache Hop: AWS ECS and AWS Batch
A quick walkthrough on how to get Apache Hop running on AWS ECS and AWS Batch
Apache Hop: Customising the Docker Image
A quick walkthrough on how to customise the Hop Docker Image
Slimming down Apache Hop
A quick walkthrough on how to make Hop as small as possible
Visual Side-by-Side Diffs with Git
Steps on how to set up git with external diff tools
Getting started with Apache Hop on AWS: Part 2 - CloudFormation
We will use AWS CloudFormation to automatically create our environment
Getting started with Apache Hop on AWS: Part 1
We will install Hop on an EC2 instance and source data from S3 and write it to Redshift.
PDI and the H2 Database
A brief overview of how to use the H2 database with PDI
Project Hop: Project and Environment Configuration
A brief overview of how to create environment definitions for Hop projects
Project Hop: A graphical way to build Apache Beam Pipelines
This article explains how to get started with creating Beam pipelines in Project Hop
Project Hop: Testing as Part of your Development
A few ideas on how to improve your development workflow
Project Hop: Create Environment Definitions
A brief overview of how to create environment definitions for Hop projects
Project Hop: Hop on Kubernetes
A brief overview of how to deploy Hop with Kubernetes
Project Hop: Hop on Docker
A brief overview of how to create a Docker image for Hop
Project Hop: Hop into a new World
A brief overview of what's new in Hop
Scheduling a PDI job on Apache Airflow
A tutorial on how to schedule a PDI job via Apache Airflow
Converting PDI Repositories to PDI Standalone Files
This article explains how to convert PDI repos to PDI standalone files
Automated testing of data processes: Part 2
This article explains how to set up an environment for automatically testing data integration processes
Automated testing of data processes: Part 1
This article explains how to set up an environment for automatically testing data integration processes
Installing JDK on MacOS
This article covers the basics of installing the Java Development Kit
Pentaho Data Integration: Unit Testing
This article explains how to use the Pentaho PDI Datasets plugin for unit testing
Pentaho Data Integration/Kettle: Environment Plugin
This article explains how to get started with a dynamic environment setup
Pentaho Data Integration/Kettle: The easy way to create Beam Pipelines
This article explains how to get started with creating Beam pipelines in PDI
Adding non-Marketplace plugins to PDI WebSpoon
This article explains how to add a custom plugin to WebSpoon
Hello Node-RED!
Introduction to Node-RED
Documentation is a Journey
This article explains how to encourage developers to write documentation
Best Practices: Generate Synthetic Data
This tutorial explains an easy way to generate synthetic data
Kubernetes: Scaling Pentaho Server
This article explains how to scale Pentaho Server with Kubernetes on Google Cloud Platform
Kubernetes: Manual and Automatic Volume Provisioning
This article explains how to easily deploy a stateful web app with Kubernetes on Google Cloud Platform
Kubernetes: Deployment Example WebSpoon
This article explains how to easily deploy WebSpoon with Kubernetes on Google Cloud Platform
Pentaho Data Integration: WebSpoon on AWS Elastic Beanstalk and adding EBS or EFS Storage Volumes
This article explains how to get WebSpoon running on AWS and how to add storage volumes
Pentaho Data Integration v8: Getting started with the Spark Execution Engine
This article explains how to configure PDI to run with Spark
Pentaho Standardised Git Repo Setup First Release
A few comments on the first release
Pentaho Community Meeting 2017 Impressions
A short write-up of my impressions of this year's PCM
Pentaho VizAPI: Create a custom visualization once and use it everywhere
This article explains how to create custom visualizations with Pentaho's VizAPI
Mondrian: Averages in Aggregate Tables
This article explains how to make averages work with Mondrian aggregate tables
Pentaho Data Integration: Restartable Job
This article explains how to easily add restartability to Pentaho jobs
Big Data Geospatial Analysis with Apache Spark, GeoMesa and Accumulo - Part 4: Ingesting Data with Spark SQL
This article walks you through practical GeoMesa examples.
Big Data Geospatial Analysis with Apache Spark, GeoMesa and Accumulo - Part 3: Practical Examples
This article walks you through practical GeoMesa examples.
Big Data Geospatial Analysis with Apache Spark, GeoMesa and Accumulo - Part 2: Basics
This article walks you through the basics of Accumulo and GeoMesa.
Big Data Geospatial Analysis with Apache Spark, GeoMesa and Accumulo - Part 1: Installation
This article walks you through the installation procedure for GeoMesa.
Pentaho Data Integration 7.1: Getting started with the Spark Execution Engine
This article explains how to configure PDI to run with Spark
Apache Spark: Mapping Scala Date to Spark SQL Date
20 Seconds for Embedding a CDE Dashboard
This article discusses how to embed a CDE dashboard in an external site
Pentaho Data Integration: Automatically source Metadata for ETL Metadata Injection
This article discusses various approaches to automatically sourcing metadata from files and database tables so that it can later be injected into transformation templates
Log4J and Pentaho
MDX ToggleDrillState
We discuss an elegant use case for ToggleDrillState
MDX DRILLDOWNMEMBER and CDE Dashboard Tables
We discuss an elegant use case for DRILLDOWNMEMBER within a CDE table
Apache Spark: Retrieving Data from a REST API and converting JSON to a Spark Dataset
This is a very short article explaining how to retrieve data from a REST API and convert the retrieved JSON data to a Spark Dataset
Mondrian: Modeling a Multivalued Dimension Attribute
This article explains how to model multivalued dimensions with Mondrian
Adventures with Apache Spark: Creating a Snapshot Table
This article discusses how to create a snapshot table for OLAP analysis with Apache Spark.
Adventures with Apache Spark: How to clone a record
This article discusses how to clone a row
PDI Password Encryption
This article provides a short intro to PDI password encryption
Agile Data Integration: Continuous Integration with Jenkins and PDI
This article provides a short intro to using Jenkins with PDI
Real Time Streaming with Apache Flink and Kafka: Simple Example
This article provides a short intro into the fascinating world of Apache Flink
Pentaho Data Integration: Streamlined Data Refinery
This article discusses what is behind the Pentaho Streamlined Data Refinery marketing buzz
Pentaho Data Integration: Advances in Real Time Streaming - Real Time SQL
This article discusses the latest developments on real time streaming with PDI
Rethinking the Snapshot Strategy
This article discusses a different strategy for creating a snapshot
Apache Flink Streaming: Using Case Classes
This article explains how to use case classes to properly type the data sets
Real Time Streaming with Apache Flink, ElasticSearch and Kibana: Simple Twitter Example
This article provides a short intro into the fascinating world of Apache Flink
Getting Started With Flink Streaming API
This article provides a short intro into the fascinating world of Apache Flink
A short exercise in Apache Spark REPL: Joining fact and dimension data
This article provides a short intro into the fascinating world of Apache Spark
Pentaho Data Integration: Flexible Parameter Setup for Big Projects
This article provides detailed instructions on how to set up PDI Data Services
MDX: Totals of Ascendants have to reflect sum of the filtered leaf members
This article explains how to write a context sensitive MDX query
Announcing the MDX Maestro Challenge Series
MDX Challenges every 3 weeks
PDI Data Services
This article provides detailed instructions on how to set up PDI Data Services
Using Pentaho Data Integration with Docker: Part 1
This article provides detailed instructions on how to use PDI with Docker
Pentaho Data Integration: Reading from Named Pipes
This article takes a look at how easy it is to read from named pipes
Pentaho Data Integration: The Parameter Object and replacing Parameter Values with Variable Values
This article looks at a more complex setup in which a specific approach is needed to cater for various parameter and variable requirements
Presto
Creating DI Execution Logs for Pentaho Data Integration on Hadoop
This article explains how to set up a simple custom PDI logging framework for Hadoop
Unit testing Pentaho Data Integration jobs and transformations
This article explains how to unit test Pentaho Data Integration jobs and transformations
Database Version Management (DVM) Process Powered By Pentaho Data Integration
Generate XML Documents with Pentaho Data Integration
Pentaho Community Meeting 2015 - Recap
Pentaho Data Integration - Dynamically Injecting the Metadata Injection - Metadata Driven ETL
Defining a table layout in Pentaho Report Designer
Modular ETL with Pentaho Data Integration
Pentaho Mondrian: Custom Formatting with Cell Formatter
This article covers the basics of using the Mondrian Cell Formatter feature.
D3 Maps: Part 3
D3 Maps
D3 Maps: Publishing on the Pentaho BA Server
D3 Maps
Big Data Snapshot and Accumulating Snapshot
This article explains how to implement an accumulating snapshot on Hadoop
Geo Data - The fast lane to publishing a Map
Pentaho Data Integration: How to fix the GLib-CRITICAL problem
This article explains how to fix the GLib-CRITICAL problem in a very easy fashion
Setting up a Hadoop Dev Environment for Pentaho Data Integration
This article explains how to set up a vanilla Hadoop Distribution and configure Pentaho Data Integration to access it
Pentaho Community Meetup 2015
This article gives a brief overview of PCM15
D3 Maps: Getting Started
D3 Maps
Pentaho Data Integration: Parallelism and Partitioning
In this article we will discuss how to implement parallelism and partitioning of data streams in PDI
Pentaho Data Integration: DB Rollback on Error
In this article we will discuss how to implement transactional behaviour in PDI
Cascading Parameters in Pentaho Report Designer
In this article we will discuss how to implement cascading parameters in PRD
Pentaho Mondrian: The MDX Generate Function
In this article we will discuss how to use the MDX Generate function
Pentaho CDE: MDX Parameterization
In this article we will discuss how to parameterize an MDX query in your Pentaho dashboard.
Pentaho CCC Chart Label Formatting
This article explains various options on how to format a chart label
Creating a Pentaho CDE Table Add-In
This article explains how to create an add-in for the CDE Table Component to achieve a custom cell presentation behaviour
Pentaho CCC Core Concepts
This article explains readers, dimensions and visual roles.
The ultimate guide to Pentaho CCC Context Charts (aka Viewfinder or Sub-Charts)
In this article we will take a look at how to create context charts with Pentaho CCC.
Pentaho CDE: Global Properties
This article explains how to create global properties for Charts.
CDE Prototyping: Scriptable JavaScript JSON Data Source
Mondrian MDX: Intrinsic Member Properties
In this very short article I want to shed some light on intrinsic member properties
Using CSS to create PDF Reports
This article explains how to create a PDF report using CSS, Pentaho BA-Server and WeasyPrint
The ultimate guide to using CCC and CGG with PRD
This article explains how you can use CCC charts with Pentaho Reporting solutions.
Mondrian 4: Hanger Dimensions (Actual VS Budget and other Scenarios)
Hanger dimensions in Mondrian 4 open up a new world of possibilities. Before we dive into this topic, we will first have a look at calculated members and then discuss a few use cases for hanger dimensions.
OLAP Cube Member Properties
Have you ever wondered what Member Properties in a multidimensional cube are really good for?
Advanced Data Modeling Techniques
In this article we will focus on some advanced data modeling techniques to cover many-to-many and parent-to-child relationships
Migrating from Pentaho BI Server v4 to v5: Path and API changes
This article is not meant to be a migration guide, but more a collection of various aspects of this migration that I thought might be worth noting and which I have not seen mentioned elsewhere.
Pentaho Data Integration: Rows to Json Output
A short article on how to output a flattened Json structure.
The Ultimate Guide to Configuring LDAP Security for Pentaho BI Server v5
Setting up LDAP Security can be quite challenging. This article tries to shed some light on the process.
Pentaho Kettle: Implementing Error Handling for Job and Transformation Executor Steps
Learn how to implement correct error handling for job and transformation executor steps
D3js: Supercharge your charts with related information
Learn how to create charts with event information
Mondrian semi-additive measures
This article explains a workaround you can use to implement semi-additive measures in Mondrian
Mondrian: The meaning of column, columnName and captionColumn Level Attributes
This article explains the various level attributes in detail.
Tips for using Pentaho Data Integration on Mac OS X
This is a short article on some of the problems you might encounter when working with Pentaho Data Integration on Mac OS X
Creating nested JSON structures in Pentaho Data Integration
This article discusses creating nested JSON structures with Pentaho Kettle.
Pentaho Business Analytics Cookbook Review
This is a brief review of the latest book on Pentaho.
Pentaho Sparkl Tips
This article is a list of various important points when working with Sparkl
Pentaho London User Meetup 22nd of July
Join us at this meetup to listen to interesting talks and exchange ideas
Installing Columnar DB MonetDB
Very brief instructions on how to install MonetDB
Pentaho Dashboards (CDE): Bootstrap styled custom selects
This article explains how to create custom Bootstrap-styled cascading selectors using jQuery.
Pentaho Dashboard Framework Basics
This article explains how to set up Pentaho CDF and describes the folder structure as well as how to create a basic dashboard.
Setting a variable value dynamically in a Pentaho Data Integration job
This article explains an easy way to define a variable in a Pentaho Kettle job
Pentaho Dashboards CDE: Create your custom Bootstrap table
This article covers the basics of using Bootstrap with Pentaho Dashboards.