2011/03/13

I/O matters

There are no large banks or processors in the part of the world I live in. Most banks see a few million transactions per month processed through their switch and credit card systems. A couple of banks and a large processor may reach some tens of millions of transactions per month.

Naturally, a single figure such as "X transactions per month" does not tell the whole story. Other considerations matter too. External factors and timing usually combine to drive transactions-per-second numbers to sometimes unexpected heights. Christmas is a very good example; all banks and processors see a sharp increase in EFT traffic at that time of the year.

So we get terms like "15-minute average TPS rate" and "peak TPS rate". These terms indicate that an EFT system can get busy in several different ways, each of which matters. 20 million transactions per month translate to an average of about 7.7 TPS. That may not sound like much. But what if half a million of those transactions occur on December 24, between 11:00 and 13:00? For those two hours, that translates to an average load of almost 70 TPS, to say nothing of the per-minute peak TPS numbers. Needless to say, you need to plan for at least that 70 TPS situation, with lots of capacity to spare for the peaks.
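The arithmetic behind those two averages can be checked with a few lines of Python. This is only a sketch: it assumes a 30-day month and an even spread of transactions within each window, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_tps(transactions, seconds):
    """Average transactions per second over a window of the given length."""
    return transactions / seconds


# 20 million transactions spread evenly over an assumed 30-day month.
monthly_avg = average_tps(20_000_000, 30 * SECONDS_PER_DAY)

# Half a million of those arriving within a two-hour window.
rush_avg = average_tps(500_000, 2 * 60 * 60)

print(f"monthly average: {monthly_avg:.1f} TPS")
print(f"two-hour rush average: {rush_avg:.1f} TPS")
print(f"rush is {rush_avg / monthly_avg:.0f}x the monthly average")
```

Under these assumptions the two-hour window runs at roughly nine times the monthly average, and that is still an average; shorter bursts inside the window will be higher.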

What's the most important differentiating factor that determines the mileage of an EFT system? It's I/O capacity. This usually comes as a surprise to the uninitiated, but it is only logical. An EFT system does a lot of its magic through the database. The database is stored on disk. The disk is slow; in fact, it's the slowest subsystem in a modern computer, and that's why there's so much investment in caching strategies.

Whenever I'm asked about system upgrade paths, I always point out that simple fact and ask customers to examine their options. Have a single server hosting the EFT system? Split the transaction log and the database files across two different disks. Is that not enough? Have a look at solid-state appliances for single-server use. Too isolated and compartmentalized? Get one or more Fibre Channel cards, move the database to the enterprise storage and allocate a LUN with premium priority to it. These upgrade steps work wonders for TPS throughput. The CPU is usually the last component eligible for an upgrade, unless you're consistently seeing more than 50% usage sustained for several minutes at a time.
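The CPU rule of thumb can be turned into a simple check over monitoring data. The sketch below is hypothetical: it assumes one CPU usage sample per minute, reads "several minutes" as five consecutive samples, and uses made-up sample values; adjust all three to your own monitoring setup.

```python
def cpu_is_sustained(samples, threshold=50.0, min_consecutive=5):
    """Return True if min_consecutive samples in a row exceed threshold."""
    run = 0
    for usage in samples:
        # Extend the current run on a busy sample, reset it otherwise.
        run = run + 1 if usage > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical one-minute CPU usage samples, in percent.
brief_spike = [20, 35, 90, 95, 30, 25, 40, 30]
sustained_load = [45, 55, 60, 72, 68, 64, 58, 40]

print(cpu_is_sustained(brief_spike))
print(cpu_is_sustained(sustained_load))
```

A short spike does not trip the check, while a stretch of moderately high usage does, which matches the idea that only sustained load makes the CPU a candidate for an upgrade.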
