Intro
Pentaho Data Integration ships with a H2 Database Driver. H2 is quite often used as an embedded database and has a lot of features. In tandem with PDI it is particular useful for prototyping and this is what I’ll be focusing on in this very brief article. I recommend going through this H2 tutorial since I won’t cover any basics on H2 here.
H2 can be used as an embedded DB as well as standalone server, but it can also read from and write to CSV files.
H2 CSV
As previously mentioned, H2 can read from and write to CSV files. For this to work you have to create one database connection per CSV file:
Connection details:
Type | Value |
---|---|
Connection URL File | jdbc:h2:file://path/to/file |
Driver Class | org.h2.Driver |
User | H2 |
As part of the connection URL you can define a SQL statement ( via the INIT
parameter) which will be executed when the connection is initiated. Example:
jdbc:h2:file://tmp/h2/sales;DB_CLOSE_DELAY=-1;MVCC=TRUE;FILE_LOCK=NO;INIT=CREATE TABLE IF NOT EXISTS sales (sale_date DATE, department VARCHAR(50), product_number INT, quantitiy_sold INT)
So within PDI just create a new Generic JDBC connection with these details and then simply use it with a Table Input or Table Output step.
Note: For the curious reader, it is also worth right clicking on the connection in the PDI View Panel and clicking on the Explore option.
H2 Server
Connection details:
Type | Value |
---|---|
Connection URL In-Memory DB | Example: jdbc:h2:tcp://localhost/~/test . This will create a H2 db file at ~/test . |
Driver Class | org.h2.Driver |
Navigate to the lib
folder within your PDI installation and locate the H2 driver:
lib % ls *h2*
h2-1.2.131.jar
Since we know the exact version of the driver now, we can issue following command (adjust to your version):
java -jar h2-1.2.131.jar
This should automatically open following URL in your web browser:
http://localhost:8082
As shown in the screenshot above, pick the Generic H2 (Server) setting, provide a name, specify the driver class and a JDBC connection URL similar to this one:
jdbc:h2:tcp://localhost/~/test;INIT=CREATE SCHEMA IF NOT EXISTS myschema
Click on Test Connection.
Then you can also click on Connect which will open a nice interface to run SQL statements:
You will notice that our schema got already created.
At this point you are ready to go and you can define a new Generic JDBC connection in PDI with fairly the same settings:
And from here you just interact as usual with the DB.
There are many more options available with H2, in example you can even make it run in a mode that simulates other DBs like MySQL or PostgreSQL. Take a look at their documentation to find out more.