DEFINING NUMBER RANGE IN PENTAHO DATA INTEGRATION
In arithmetic, the range of a set of data is the difference between the largest and smallest values.
In descriptive statistics the range is read a little more broadly: it is the size of the smallest interval that contains all the data, and it gives an indication of statistical dispersion. It is measured in the same units as the data. Because it depends on only two of the observations, the largest and the smallest, it is most useful for describing the dispersion of small data sets.
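As a quick illustration of the definition above, here is a minimal Python sketch that computes the range of a small data set. The function name `data_range` and the sample values are made up for this example and are not part of the Pentaho transformation.

```python
def data_range(values):
    """Return the difference between the largest and smallest value."""
    if not values:
        raise ValueError("range is undefined for an empty data set")
    return max(values) - min(values)


# Illustrative values only (for example, project durations in days).
durations_days = [12, 45, 30, 90]
print(data_range(durations_days))  # 90 - 12 = 78
```

Only the two extreme values matter here, which is why a single outlier can change the range completely.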
In the following example we take the difference between two date values and sort it into ranges, so that we can rate a project’s performance in Pentaho.
1. First we add a Data Grid step in which we define three columns, project_name, start_date and end_date, and store some sample values in them.
2. Next we add a Calculator step that creates a new field holding the difference between the two dates.
If we preview the data at this point, the output contains the three original columns plus the new date-difference field.
3. Now we connect a Number Range step to the Calculator step. It checks each date difference against ranges defined by a lower and an upper bound, and we have labelled these ranges excellent, good and average.
4. Finally we send the output to a Dummy step, where you can see the final result by previewing. One thing to notice in the output is that a row with no end_date gets the range value Unknown.
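The four steps above can be sketched outside Pentaho in a few lines of Python, which may help to see what each step contributes. This is only a rough equivalent under stated assumptions: the project rows, the bounds (0 to 30, 30 to 90, 90 to 365 days) and the helper names `days_between`, `classify` and `run` are hypothetical, and the rule "lower bound inclusive, upper bound exclusive" is an assumption about how the ranges are evaluated.

```python
from datetime import date

# Hypothetical rows standing in for the Data Grid step.
projects = [
    {"project_name": "Alpha", "start_date": date(2023, 1, 1), "end_date": date(2023, 1, 21)},
    {"project_name": "Beta", "start_date": date(2023, 1, 1), "end_date": date(2023, 3, 2)},
    {"project_name": "Gamma", "start_date": date(2023, 1, 1), "end_date": date(2023, 5, 1)},
    {"project_name": "Delta", "start_date": date(2023, 1, 1), "end_date": None},
]

# Assumed rules: (lower bound inclusive, upper bound exclusive, label).
RANGES = [
    (0, 30, "excellent"),
    (30, 90, "good"),
    (90, 365, "average"),
]
DEFAULT_LABEL = "Unknown"  # used when no rule matches or the value is missing


def days_between(start, end):
    """Stand-in for the Calculator step: date difference in days, or None."""
    if start is None or end is None:
        return None
    return (end - start).days


def classify(value, ranges=RANGES, default=DEFAULT_LABEL):
    """Stand-in for the Number Range step: map a number to a range label."""
    if value is None:
        return default
    for lower, upper, label in ranges:
        if lower <= value < upper:
            return label
    return default


def run(rows):
    """Add the date difference and the range label to every row."""
    result = []
    for row in rows:
        duration = days_between(row["start_date"], row["end_date"])
        result.append({**row, "duration_days": duration, "performance": classify(duration)})
    return result


for row in run(projects):
    print(row["project_name"], row["duration_days"], row["performance"])
```

With these sample rows, the project without an end_date has no date difference to classify, so it falls through to the default label, which matches the Unknown value seen in the preview.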