April 13, 2011

Touring Pentaho CBF part 1

What is it?

The Pentaho Community Build Framework (CBF) is an ant script that can compile, setup and deploy the Pentaho BI-server. The CBF allows the central management of multiple server configurations, such as development and production, for multiple projects. The local copy of the Pentaho bi-server source files are not edited or changed in any way and each bi-server project has its own directory of changes for the bi-server, such as the web.xml or context.xml files, and its own solution directory. When run, the CBF apples the project specific changes (called "patches") to the bi-server, and deploys the project. The CBF was first created by Pedro Alves.

How does it work?

The root of the CBF directory needs a copy of the Pentaho sources file, a copy of the Tomcat source files, and one or more directories for each bi-server project. Each of these projects must be in a directory called project-XXXX where XXXX is the name of the project. For example, if there are three projects named "client1", "client2", and "testing" there should be three directories called project-client1, project-client2, and project-testing. When the CBF is run, two additional directories are created: a build directory called target-build for the compiled bi-server, and a directory where the final bi-server is prepared for distribution called target-dist. Here is the directory structure of the CBF looks like:


cbf

├── pentaho
├── project-test
│   ├── config
│   ├── patches
│   └── solution
├── target-build
├── target-dist
│   ├── administration-console
│   ├── licenses
│   └── server
└── tomcat


Every project directory has three sub directions: one for the CBF configuration (config), one for the project solutions (solutions), and one for any changes to be made to the bi-server (patches) when it is deployed.

When the ant script runs, the patches get applied by copying the entire patches directory on top of the root CBF directory. Therefore, in order for the patches to get applied correctly, the patches directory must mirror the structure of the root CBF directory; changes to the final bi-server need to be in a folder called target-dist and changes to the project solution directory (such as changes in the solution's system or admin directory) need to be a solution directory under a directory with the same name as the project. For example:


project-test

├── config
├── patches
│   ├── project-test
│   │   └── solution
│   │       └── system
│   └── target-dist
│       └── server
│           ├── conf
│           └── webapps
│               └── pentaho
│                   ├── META-INF
│                   └── WEB-INF
└── solution

Setup

To download the Pentaho BI-server sources, you can use subversion:

svn co svn://source.pentaho.org/svnroot/bi-platform-v2/tags/3.8.0-stable/

I renamed the folder to "pentaho":

mv 3.8.0-stable pentaho

Download Tomcat, the CBF, and make the project directories.

The main configuration for a project is located in the config direcory and is called build.properties. Here is an example:

#######################################
## GENERIC PROPERTIES
######################################

javac.path = /usr/bin/javac

# Solution path. Required for compile time
solution.path = ~/cbf/project-test/solution

# Solution path required for runtime. Defaults to the above value but can be different if we're
# deploying to another place. Should be used in WEB-INF/web.xml in the solution-path parameter
solution.deploy.path = NOT SET

#accepted values: tomcat or jboss
server.name = tomcat

# Tomcat 5.5 path:
tomcat.path = ~/cbf/tomcat

# Copy the sample solutions to our project directory? true | false
copy.pentaho.samples = true

# Java options for the run command
java.opts = -Xmx512m -XX:MaxPermSize=512m -Xrunjdwp:transport=dt_socket,address=8765,server=y,suspend=n

# to allow 3.8 to build correctly
project.revision=3.8.0-stable
dependency.pentaho-reporting.revision=3.8.0-stable

#####################################
## PROJECT DIRECTORIES - The defaults usually work fine
######################################
pentaho.dir = pentaho/
pentaho.build.dir = target-build/

#####################################
## DEPLOY OPTIONS
######################################
deploy.mode = rsync
deploy.args = -avz --exclude '.svn/' --exclude '*.log' --exclude 'work/' --exclude 'temp_user/' --exclude 'temp/' --exclude 'tmp/'
deploy.dest.server = user@localhost:/Path/To/Deploy/
deploy.dest.solution = user@localhost:/Path/To/Solution/

#####################################
## DYNAMIC TOKENS TO BE USED AS FILTERS
#####################################

DB_LOCATION = 127.0.0.1
DB_USER = someone
DB_PASSWORD = bar
BASE_URL = http://127.0.0.1:8080/pentaho

Finally, there can be different configurations for the project. Each configuration needs its own file in teh config directory called build-XXX.properites. For example, build-testing.properties, or build-production.properties.

An Example

After making the appropriate changes to the context.xml for a project, it should go in this direcory:

project-test/patches/target-dist/server/webapps/pentaho/META-INF

And the web.xml should go here:

project-test/patches/target-dist/server/webapps/pentaho/WEB-INF

A Second Example

The Community Dashboard Editor (CDE) uses files with the .wcdf  extension. The bi-server cannot apply ACLs on those files without making a change to the pentaho.xml file. The pentaho.xml file is located in the solution's system folder, but when the CBF compiles the bi-server, the solution system folder is written over. Any project specific changes needed in pentaho.xml must be put in the patches directory. So make the diretory, make a copy of the pentaho.xml file:

mkdir patches/project-test/solution/system/
cp ./solution/system/pentaho.xml patches/project-test/solution/system/


and make the following change in patches/project-test/solution/system/pentaho.xml:

<acl-files> xaction,url,prpt,xdash,xcdf,wcdf</acl-files>

Next time: Running it

April 8, 2011

Playing with LucidDB: namespace and users

One of the challenges (and part of the fun) of using a relatively new open source project, is useing the software despite the lack of complete documentation. The nice thing about open source is that if you have a question, there usually is a discussion board or list-serve where you can ask the creator directly. In the spirit of openness, I am posting some of little pointers about using the column based open source database LucidDB especially comparing to to MySQL. There is nothing new here, but if you come to LucidDB from MySQL there are a few (minor) gotchas that is would be nice to avoid. So here is a post about the LucidDB namespace and users. (BTW, LucidDB's approach to namespace and users should be applicable to many databases)

LucidDB is a column-store database designed for data warehousing and business intelligence. It is open source, and has a growing community, including a new corporate sponsor: Dynamo BI. Column-store databases are faster than row-store databases when executing the kinds of queries that are common in data warehouses. There are a number of speed comparisons of LucidDB on the web, and it always does rather well. Additionally, Lucid as a number of interesting features such as the ability to connect to other databases (and even Hadoop!) and run queries on the remote database.

Navigation and basic interaction:

Installing LucidDB is pretty easy, and there are directions on the LudicDB site, so let's start with some basic navigation, and compare it to MySQL.

In MySQL, the highest level of a namespace is the "database." So a single project might have a single database, but there might be other databases running on the same MySQL instance. Any table can be referred to by its qualified name as in:

select * from databaseName.tableName

When you refer to a table without a database name, MySQL will assume you are referring to the presently used database. This is equivalent to the above query.

use databaseName;
select * from tableName;


The highest level of the LucidDB namespace is "catalog" and under catalog is "schema." (I think this is closer to the SQL standard than MySQL.) Here is an example of a LucidDB query with a qualified name:

select * from catalogName.schamaName.tableName

Like MySQL, LucidDB assumes the present catalog and schema when you refer to a table without a catalog and schema. You can change catalog or schema in LucidDB by using the "SET" command (Note the single quotes around the catalog and schema names!):

set catalog 'catalogName';
set schema 'schemaName';
select * from tableName;


If you get lost, MySQL has a function to tell you what database you are in:

select database();

LucidDB does not have an equivalent function (though you could write one if you want). Instead, you must query one of the special systems views. These views expose information about the state of the system. For the current catalog and current schema, you can query the user_session_parameters view to get both:

select * from LOCALDB.sys_root.user_session_parameters where param_name in ('schemaName', 'catalogName');

You can use the same system view to get a lot of information about your current session as well.

In Mysql, the SHOW command is useful to see structure of the database (and a list of available databases). For example:

SHOW databases;
SHOW TABLES;
SHOW VIEWS;


(PostgresQL has its own commands: \dt to show tables). To get similar information from LucidDB you must again go to the systems views. The view contain all the information about tables, views, triggers, databases, etc. Refer to this link for details about all the information you can see, but here are some examples:

SELECT * FROM sys_root.DBA_SCHEMAS;
SELECT * FROM sys_root.DBA_TABLES;
SELECT * FROM sys_root.DBA_VIEWS;


Creating users

When you install LucidDB, there is a default administrative user called "sa" with no password. You can log into LucidDB and create a new user, and set the user's default schema:

create or replace user "sa" identified by 'aGoodPassword';

Or create a new user:

create user "userName" identified by 'aBetterPassword' default schema test_schema;

Lastly, there are a two useful systems views that have information about users, DBA_USERS and DBA_ROLES. However, they are only available in the (yet) unreleased version 0.9.4.

As a side note, it appears that you can set up LudidDB to use JAAS, but the documentation is unclear (to me), and I have not tried it.