PDI and the H2 Database

  16 Oct 2020


Intro

Pentaho Data Integration ships with a H2 Database Driver. H2 is quite often used as an embedded database and has a lot of features. In tandem with PDI it is particular useful for prototyping and this is what I’ll be focusing on in this very brief article. I recommend going through this H2 tutorial since I won’t cover any basics on H2 here.

H2 can be used as an embedded DB as well as standalone server, but it can also read from and write to CSV files.

H2 CSV

As previously mentioned, H2 can read from and write to CSV files. For this to work you have to create one database connection per CSV file:

Connection details:

Type Value
Connection URL File jdbc:h2:file://path/to/file
Driver Class org.h2.Driver
User H2

As part of the connection URL you can define a SQL statement ( via the INIT parameter) which will be executed when the connection is initiated. Example:

jdbc:h2:file://tmp/h2/sales;DB_CLOSE_DELAY=-1;MVCC=TRUE;FILE_LOCK=NO;INIT=CREATE TABLE IF NOT EXISTS sales (sale_date DATE, department VARCHAR(50), product_number INT, quantitiy_sold INT)

So within PDI just create a new Generic JDBC connection with these details and then simply use it with a Table Input or Table Output step.

Note: For the curious reader, it is also worth right clicking on the connection in the PDI View Panel and clicking on the Explore option.

H2 Server

Connection details:

Type Value
Connection URL In-Memory DB Example: jdbc:h2:tcp://localhost/~/test. This will create a H2 db file at ~/test.
Driver Class org.h2.Driver

Navigate to the lib folder within your PDI installation and locate the H2 driver:

lib % ls *h2*
h2-1.2.131.jar

Since we know the exact version of the driver now, we can issue following command (adjust to your version):

java -jar h2-1.2.131.jar

This should automatically open following URL in your web browser:

http://localhost:8082

image-20201016112907963

As shown in the screenshot above, pick the Generic H2 (Server) setting, provide a name, specify the driver class and a JDBC connection URL similar to this one:

jdbc:h2:tcp://localhost/~/test;INIT=CREATE SCHEMA IF NOT EXISTS myschema

Click on Test Connection.

Then you can also click on Connect which will open a nice interface to run SQL statements:

image-20201016113026130

You will notice that our schema got already created.

At this point you are ready to go and you can define a new Generic JDBC connection in PDI with fairly the same settings:

image-20201016113253984

And from here you just interact as usual with the DB.

There are many more options available with H2, in example you can even make it run in a mode that simulates other DBs like MySQL or PostgreSQL. Take a look at their documentation to find out more.

comments powered by Disqus