Wednesday, December 18, 2013

Pentaho overview using MongoDB (Oporto MUG #2)

 
I'm writing this post to share the presentation and resources from my talk at the second Oporto MongoDB User Group meetup. The talk demonstrates the integration between Pentaho and MongoDB in three areas: ETL, Reporting and Dashboarding.
Here is the presentation:


The data I used is just for demonstration; I didn't worry about performance, content or data quality. The goal is simply to demonstrate, with simple examples, how you can integrate Pentaho with MongoDB, and to show the potential and how easy it is to integrate with other systems, in this case the Google Maps API.

In summary, if you want to integrate Pentaho with MongoDB, there are two ways to do it:
  • ETL: With Pentaho Data Integration (aka Kettle) you can use the "MongoDB input" and "MongoDB output" steps to get and save data. The ETL transformations can then be used in Pentaho Reporting or in CDE through CDA.
  • Programming: This is the best choice for me because I'm familiar with Java development. I know using the ETL steps is more intuitive, but as with any platform or product, you get better performance if you design exactly what you want. So, for programming in ETL you can use the "User Defined Java Class" step. In Report Designer you can use the Scriptable option or a MongoDB datasource. Finally, in dashboards built with CDE, which uses CDA for data access, you can use the "scriptable over scripting" datasource with the beanshell language, for example (see the sketch after this list).
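Just to make the programming option more concrete, here is a minimal sketch using the MongoDB Java driver (the host, database, collection and query below are only placeholders, not the data from the talk). The same body, without the class boilerplate, is the kind of code you would put inside a "User Defined Java Class" step or a beanshell scriptable datasource:

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.MongoClient;

public class MongoQuerySample {
    public static void main(String[] args) throws Exception {
        // Connect to a local MongoDB instance (host, port, database and collection are placeholders).
        MongoClient client = new MongoClient("localhost", 27017);
        DB db = client.getDB("demo");
        DBCollection sales = db.getCollection("sales");

        // Find the documents where country = "Portugal" and print them.
        DBCursor cursor = sales.find(new BasicDBObject("country", "Portugal"));
        try {
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }
        } finally {
            cursor.close();
        }
        client.close();
    }
}

In a real UDJC step you would map each document field to an output row field instead of printing it, but the driver calls are the same.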

Here is the link with the resources from the presentation. Note: you need to be familiar with configuring those resources to get them working properly.

Thursday, September 26, 2013

How to publish Saiku Analytics on OpenShift


Saiku Analytics is a great open source server (or Pentaho plugin) to explore and visualize data!


About Saiku: "Saiku was founded in 2008 by Tom Barber and Paul Stoellberger. Originally called the Pentaho Analysis Tool, it started life as a basic GWT based wrapper around the OLAP4J library. Over the years it has evolved, and after a complete rewrite in 2010, it was reborn as Saiku.
Saiku offers a user friendly, web based analytics solution that lets users quickly and easily analyse corporate data and create and share reports. The solution connects to a range of OLAP Servers including Mondrian, Microsoft Analysis Services, SAP BW and Oracle Hyperion and can be deployed rapidly and cost effectively to allow users to explore data in real time." - Meteorite

About OpenShift: "OpenShift is a cloud computing platform as a service product from Red Hat. A version for private cloud is named OpenShift Enterprise.
The software that runs the service is open-sourced under the name OpenShift Origin, and is available on GitHub. Developers can use Git to deploy web applications in different languages on the platform.
OpenShift also supports binary programs that are web applications, so long as they can run on Red Hat Enterprise Linux. This allows the use of arbitrary languages and frameworks. OpenShift takes care of maintaining the services underlying the application and scaling the application as needed." - Wikipedia


In this post I'll demonstrate how you can publish the Saiku Analytics server on the OpenShift platform using a free account that provides 1GB of storage per gear. In other words (commercial words), it means putting your business analytics in the sky (ok, in the cloud :) ) for free (or at low cost, depending on your business).

After creating your account on the OpenShift website, you need to install and configure the OpenShift RHC Client Tools. I'll describe the steps to install and configure them on Linux, but you can also follow the instructions at this link.
The steps on Ubuntu Linux are:
  1. sudo apt-get install ruby-full rubygems git-core
  2. sudo gem install rhc
  3. rhc setup (enter your credentials)

The next step is to create your application on OpenShift; there are three ways to do that: through the OpenShift website with your account (as I'll demonstrate), with the RHC client tool, or with JBoss Developer Studio (you can find out more about these options at this link).
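For example, if you go the RHC client tool route instead, creating the same gear should be a single command (the cartridge name below is what I'd expect for Tomcat 6 / JBoss EWS 1.0; run rhc cartridge list on your account to confirm it):

rhc app create saiku jbossews-1.0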
The steps using the website are:

  1. Click on the ADD APPLICATION button;
  2. Choose the application type; in this case I chose Tomcat 6 (JBoss EWS 1.0);
  3. Write the application name; in this case I wrote "saiku".


The next step is to download the Saiku WAR files (UI and backend) to deploy on OpenShift. After that, you need to run the following steps on your command line:

  1. rhc git-clone saiku
  2. cd saiku
  3. git rm -r src pom.xml
  4. cp <war-path-file>/saiku-ui-2.5.war webapps/ROOT.war
  5. cp <war-path-file>/saiku-webapp-2.5.war webapps/saiku.war
  6. git add .
  7. git commit -m 'Deploy Saiku 2.5'
  8. git push

And it's done! You can check it at your URL: http://<application-name>-<username>.rhcloud.com. You should have something like: http://saiku-latinojoel.rhcloud.com/ .

Enjoy. ;)





Saturday, July 27, 2013

A new Pentaho book

A new book about Pentaho is out! It's called Instant Pentaho Data Integration Kitchen.



"The book is about kitchen and how to use the PDI's command line tools efficiently to help people in their day to day operations with Pentaho Data Integration. It is a practical book and it seemed relatively easy to write.Sergio Ramazzina

I was the technical reviewer for this book, and I was very happy that the publisher chose me; it was another experience that I really enjoyed.

I also want to congratulate the author, Sergio Ramazzina, for writing this book and contributing to the open source community.


Monday, July 1, 2013

PDI Apple Push Notifications Plugin is available on Pentaho Marketplace

Since June 14th, the PDI Apple Push Notifications plugin has been available on the Pentaho Marketplace. Now PDI can send push notifications to the two most popular smartphone platforms.

Check out what Matt Casters (Chief of Data Integration at Pentaho) says about the plugins available on the Pentaho Marketplace in the Pentaho Big Data Architecture presentation from the Pentaho London User Group event (at the 18 minute mark): http://skillsmatter.com/podcast/ajax-ria/pentaho-hadoop-user-stories-beer-pizza-and-more

Interesting Links:
Pentaho Data Integration Marketplace Wiki: Link
Pentaho Data Integration Marketplace Source: Link
PDI Apple Push Notifications Plugin Wiki: Link
PDI Apple Push Notifications Plugin Source: Link
PDI Apple Push Notifications Plugin Artifacts: Link
PDI Manager Android App: Link

Thursday, June 6, 2013

Run ETL with Pentaho Data Integration using package files

Kettle has a nice little feature for running ETL from packaged files: it uses Apache VFS, which lets you access a set of files inside an archive and use them directly in your processes. You can even use this to execute ETL stored somewhere on the web.


Run in file system

I created this little sample (a job that executes a transformation) and compressed the two files into a zip file.
So, I have the zip file at this path on my own computer: C:\Users\latinojoel\Desktop\sample.zip
The command line I need to execute the ETL looks like this:
~\data-integration>Kitchen.bat -file=zip:\\"C:\Users\latinojoel\Desktop\sample.zip!/job.kjb" -level=Detail -param:MSG="Wow, It's works. Very funny! :-)"


Run from web resource

With Apache VFS, you can run a zip file from the web too.
For example, you can access my zip file using this URI.
The command line you need is this:
~\data-integration>Kitchen.bat -file="zip:https://dl.dropboxusercontent.com/u/54031846/sample.zip!/job.kjb" -level=Detail -param:MSG="Wow, It's works. Very funny! :-)"


It's available for Pan too. A sample:
~\data-integration>Pan.bat -file="zip:https://dl.dropboxusercontent.com/u/54031846/sample.zip!/transf.ktr" -level=Detail -param:MSG="Wow, It's works. Very funny! :-)"

Sunday, May 26, 2013

Pentaho Data Integration notifier job state

This is a little sample of how to create a notifier job that reports whether another job has terminated with error or success.
The first step is to create a sample job that represents the job of the ETL process (for example, a job that is responsible for populating the DW).

Create a job example

The first step is to create a transformation that reads the ${VALUE} variable: if the value matches 'Y', the transformation executes successfully; otherwise (if ${VALUE} has a value different from 'Y'), the transformation fails with an error.
See the following transformation workflow:

The second step is to create a job that executes the above transformation. See the following job workflow:

Create the main job

Create a transformation notifier

In this transformation you need to define how you want to be notified. In this sample you'll be notified by email, Android push notification (using PDI Manager) and Apple push notification (a new plugin that will be available in a few days).
The transformation receives a parameter that indicates whether the ETL executed successfully or not, and based on that the notification message changes.
See the following ETL workflow:


Create a job notifier

This job is responsible for executing any job in the same folder; you just pass, as a parameter, the name of the job you want to execute. If that job runs successfully, the notifier executes the above transformation passing 'Success' in the STATE parameter.
See the following job workflow:

The following shows how to execute the job:
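As a rough command-line equivalent (the file and parameter names here are only illustrative; check the downloaded files below for the real ones), running the notifier job would look something like this:

~\data-integration>Kitchen.bat -file="notifier_job.kjb" -param:JOB_NAME="job_sample.kjb" -param:VALUE="Y" -level=Basic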



You can download all the files for this post from this link (note: you need to configure the kettle.properties file).



Monday, March 18, 2013

PDI Android Push Notifications Plugin is available on Pentaho Marketplace

Android Push Notifications Step
It's been a week since the PDI Android Push Notifications Plugin became available on the Pentaho Marketplace. The current version is 0.9-GA.

More surprises are coming.


Interesting Links:
Pentaho Data Integration Marketplace Wiki: Link
Pentaho Data Integration Marketplace Source: Link
PDI Android Push Notifications Plugin Wiki: Link
PDI Android Push Notifications Plugin Source: Link
PDI Android Push Notifications Plugin Artifacts: Link
PDI Manager Android App: Link

Wednesday, February 6, 2013

Now it is possible to receive push notifications from Pentaho Data Integration


As I said in my first post on this blog, there are two things I want to share with you:
  • PDI Android Push Notifications Plugin: This is a plugin that allows you to send push notifications from Pentaho Data Integration to any Android application that enables the GCM service. The plugin is written in Scala!!! Check it out: https://github.com/latinojoel/pdi-android-pushnotifications. Note: this plugin is still a release candidate.
  • PDI Manager Android App: This is an Android app that enables you to receive push notifications from PDI.

About PDI Android Push Notification Plugin


 Installation
  1. You need Pentaho Data Integration installed;
  2. Download the plugin from SourceForge; you can find it at this link;
  3. Stop your Pentaho Data Integration if it's running;
  4. Uncompress the AndroidPushNotification file;
  5. Copy the AndroidPushNotification folder to <pdi-folder-installation>/plugins/steps;
  6. Start your Pentaho Data Integration and enjoy :-).
 Configuration
You need to be familiar with GCM (Google Cloud Messaging for Android); you can read more about this service at this link. As I said, this plugin is designed to send push notifications to any Android application because all the most important parameters are configurable.


Main Options screen

Properties screen

For sending push notifications to the PDI Manager Android App
At this moment, PDI Manager is very young :-), so it's only configured to receive four properties inside the push: Status, Project, Date and Data.
Use the following API Key:
AIzaSyAh7Nf-N7bE4xIwsVb7nk4mmls_yEQwZQA
See the image example below:
A little sample.
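To give an idea of what happens under the hood when the step fires (this is not the plugin's code, just a rough Java sketch of a plain GCM send over the HTTP endpoint, with made-up payload values), it boils down to an authenticated POST with a JSON body:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class GcmSendSample {
    public static void main(String[] args) throws Exception {
        // Placeholders: your GCM API key and the registration id of the target device.
        String apiKey = "YOUR_GCM_API_KEY";
        String registrationId = "DEVICE_REGISTRATION_ID";

        // The four properties PDI Manager expects inside the push payload (example values).
        String json = "{\"registration_ids\":[\"" + registrationId + "\"],"
                + "\"data\":{\"Status\":\"Success\",\"Project\":\"DW load\","
                + "\"Date\":\"2013-02-06\",\"Data\":\"1000 rows processed\"}}";

        // POST the message to the GCM HTTP endpoint, authenticated with the API key.
        HttpURLConnection conn = (HttpURLConnection) new URL(
                "https://android.googleapis.com/gcm/send").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Authorization", "key=" + apiKey);
        conn.setDoOutput(true);

        OutputStream out = conn.getOutputStream();
        out.write(json.getBytes("UTF-8"));
        out.close();

        // HTTP 200 means GCM accepted the message; the response body reports per-device results.
        System.out.println("GCM response code: " + conn.getResponseCode());
    }
}

The plugin step does all of this for you; you only fill in the API key, the registration ids and the properties in the dialogs shown above.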

In the next few days I will provide better documentation. For now, I think this post will help you test this new solution.

PS: Feel free to comment, and if you find a bug or have an idea for this solution, you can create an issue or pull request on GitHub.


Tuesday, February 5, 2013

How to create zip files with Maven

In some Maven projects it is important to generate a compressed file, for example to send it by email.
So, in this post I will show you how you can generate compressed files using Maven.
In your pom.xml file you need to include the maven-assembly-plugin, something like this:


<plugin>
 <groupId>org.apache.maven.plugins</groupId>
 <artifactId>maven-assembly-plugin</artifactId>
 <configuration>
  <descriptor>src/assembly/bin.xml</descriptor>
  <finalName>${plugin.name}</finalName>
 </configuration>
 <executions>
  <execution>
   <phase>package</phase>
   <goals>
    <goal>single</goal>
   </goals>
  </execution>
 </executions>
</plugin>

So, as you can see, you need to define a bin.xml file in the src/assembly/ path; you can use whatever name and path you want, it is only mandatory to follow a specific content structure. You can see an example below:

<assembly
 xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
 <formats>
  <format>zip</format>
  <format>tar</format>
  <format>tar.gz</format>
  <format>tar.bz2</format>
 </formats>
 <includeBaseDirectory>false</includeBaseDirectory>
 <dependencySets>
  <dependencySet>
   <includes>
    <include>groupId.example:artifactId.example</include>
   </includes>
   <unpack>false</unpack>
   <scope>runtime</scope>
   <outputDirectory>${plugin.name}</outputDirectory>
  </dependencySet>
 </dependencySets>
 <fileSets>
  <fileSet>
   <directory>resources</directory>
   <outputDirectory>${plugin.name}</outputDirectory>
   <includes>
    <include>file-example.xml</include>
    <include>file-example.png</include>
   </includes>
  </fileSet>
  <fileSet>
   <directory>target</directory>
   <outputDirectory>${plugin.name}</outputDirectory>
   <includes>
    <include>build-*.jar</include>
   </includes>
  </fileSet>
 </fileSets>
</assembly>

That example permits creating zip, tar, tar.gz and tar.bz2 archives. If you want to put all dependencies into the compressed file, you don't need to define the includes tag. Note that ${plugin.name} is expected to be defined as a property in your pom.xml (or you can replace it with a fixed name).
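Since the assembly execution above is bound to the package phase, a normal build is enough to produce the archives in the target folder:

mvn clean package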

More details about the maven-assembly-plugin: http://maven.apache.org/plugins/maven-assembly-plugin



I hope this information is useful for you. Feel free to comment on this post.

Saturday, February 2, 2013

Something is coming to Pentaho Data Integration


This is my first post on my personal blog, so you may be asking why. The answer is that in the last few days I developed something that I think will be very useful to the Pentaho community. At this moment I don't want to talk too much about it, but I can say it's meant to improve the quality of work of ETL developers.
Coming coming…. :-)