Using Regular Expressions to solve Date Format Issue in Pentaho Kettle

One of the most interesting and confusing data types that SQL Server supports is datetime. SQL Server comes with the following data types for storing a date or a date/time value in the database:

  • DATE – format YYYY-MM-DD
  • DATETIME – format: YYYY-MM-DD HH:MI:SS
  • SMALLDATETIME – format: YYYY-MM-DD HH:MI:SS
  • TIMESTAMP – format: a unique number

But it also brings with it some formatting issues that comes across in our data-ridden lives. One such issue of formatting is the date-month-year format which creates a number of inconsistencies during data migration or similar related workflow.

The way out is by using Regular Expressions to solve Date Format Issue in Pentaho Kettle

1.In our excel file all the LastUpdatedDt column’s values are in one format i.e dd-mm-yyyy except for one value which is in mm/dd/yyyy format.

pentaho_screenshot1

2. Now we have added a data validator step in which we refined our data according to the regular expression you can see underneath. According to the expression only the date format with the year ranging from 1800 to 2099 is eligible to fit the criteria with the format being dd-mm-yyyy . Other format or date values which are outside this range will be thrown as an error.

pentaho_screenshot2

3. Finally we throw the perfect date value to truedata step and the false one to the falsedata step as shown below.

pentaho_screenshot3

 

pentaho_screenshot4

 

You must be logged in to post a comment.