Monday, January 10, 2011

Building Report Server

Although, there are many ways to setup a reporting server depending on the requirements. Sharing one of those.

Reporting Server provides reports i.e. daily, weekly, monthlty and quarterly reports to the clients. Data flow from many different servers (services) to one system called as Report Server. Client might be using more than one services. Therefore, it is necessary to collect the whole data at one place, analyze and generate reports for clients and billing purpose. This makes the data huge in TBs. Many times this server is used for billing clients for the services provided by the company. This makes this server very very critical and the data should be accurate.

Setup process has been breakup into few steps and explained below.

1. Storage Engine Selection:
The first and the most important is storage engine selection. Since, the data can be used for billing purpose. Therefore, it should be accurate and loosing data by crash is not acceptable at all. Mostly select, insert and delete (data purging discussed below) is required in this kind of setup. No update is required. Below command should be used for inserting data.

LOAD DATA INFILE ...

Since, this command is 30% faster than normal inserts.

Concurrent operations (select, insert) is required. As, innodb uses MVCC to achieve high concurrency and it's crash safe. Therefore, I've used innodb storage engine and recommend other's to use for critical activity like billing. Inoodb is a great Storage Engine !!!

2. Table Design:
The table must hold, all the important columns required for all the reports. Some times, it can have 15-20 columns. Partitioning is one of the great feature of MySQL 5.1 It provides good performace with data management. Many types of partitioning is avaliable. It depends which one fits into your senario. I have used range based partitioning on date column.

CREATE TABLE mt      
(  c1 bigint unsigned not null,
c2 varchar(30),
c3 datetime,
.
.
c9 date default NULL) engine=innodb
partition by range (to_days(c9))
(PARTITION p0 VALUES LESS THAN (to_days('2011-01-01')),
PARTITION p1 VALUES LESS THAN (to_days('2011-01-05')),
PARTITION p2 VALUES LESS THAN (to_days('2011-01-10')),
PARTITION p3 VALUES LESS THAN (to_days('2011-01-15')),
PARTITION p2 VALUES LESS THAN (to_days('2011-01-20')),
.
.
PARTITION p25 VALUES LESS THAN (to_days('2011-03-25')),
PARTITION p26 VALUES LESS THAN (to_days('2011-03-30')),
);

3. Purging Data: In order to make query operation faster, Old data should be purged on a regular basis. Dropping partiton is the best and fastest way to purge old data. But, make sure that this operation is performed at off peak hours. Otherwise, it can block the whole table. While dropping partition from the top and adding partition from the bottom, we can maintain some specific duration data for eg 3 months.

In most cases alter table command rebuild the whole table. But, not in below cases.
ALTER TABLE mt DROP PARTITION ...
ALTER TABLE mt ADD PARTITION ...
The above task can be automated (details can be found in information_schema) using procedure and cronjob or mysql event.

4. Replication: Mysql doesn't support many to one replication. Therfore, script is required to transfer the data from many servers (services) to one servers (Report Server)

5: Intelligent Application: As, we know that count is slow in innodb. Therefore, application can be designed in such a way that report for more than a month can be scheduled. The result can be send to client using sms or email.

I've build a reporting server with 2TB+ data and still growing. The ratio of data and index is 6:4

4 comments:

  1. Count without a where clause is slow. Is that frequent? You can also get estimates for count using SHOW TABLE STATUS and EXPLAIN. I suspect that count with a where clause is much more likely, and MyISAM doesn't make that fast.

    ReplyDelete
  2. No, count without where clause is not required. All count is based on secondary index with where clause and faster with InnoDB as compared to MyISAM.

    ReplyDelete
  3. Hi Krishna, Check out Tungsten. It can replicate many to one.

    ReplyDelete