Unwise Practice #1 - HARDWAREIt is a common and unwise practice to think that your development, test, and QA hardware should have a hardware configuration similar to your production environment. In fact, I assert and suggest that this practice is a really BAD idea. When development and test machines have memory and CPU resources on par with the production environment, it is far too easy for physical design problems to hide behind all of that memory and horsepower. Let's please stop this insanity and instead use hardware configurations that will assure success in production.
Hardware and Configuration Recommendations
Development and Test machines
- 1/2 to 1 CPU maximum. Your organization might also save on database and software license costs.
- Use SMALL memory allocations. The default bufferpool size is 1,000 4K pages, or about 3.96MB. This is TOO LARGE. Depending on the page size you are using, set your bufferpool sizes to 1-2MB maximum. Set SORTHEAP to 25% of the 256 default, or only 64. Other memory areas like Package Cache, Catalog Cache, and Locking memory can be set to about 50% of production sizes.
Tip: Save money on DB2 licenses and hardware by developing on DB2 Express-C (See FreeDB2.com ). DB2 Express-C allows up to 2 CPUs, 2GB memory (or 4GB if you purchase support), and unlimited database storage, and it runs on Windows or Linux. You won't need the 2 CPUs or the 2GB of memory; in fact, recycle your hardware by running your development and test databases on the slowest, oldest junk you can find. I've got a circa 2002 desktop machine with 1.3Ghz CPU and 512MB of memory and I can make it blast out 100's of SQL/Sec when the physical design (indexes, mostly) supports the workload.
I realize that I probably just enraged at least 50% of the developers on the planet. I'm sorry, but I also just saved your company thousands and thousands of dollars in license costs, hardware costs, and tens of thousands of dollars in production failure costs. You can send me hate mail now if you want, but be sure to send a thank you letter later when your job is saved and you win the company award for "most brilliant developer".
QA Environment Machines
- 50% of the number of Production CPUs (or less, see Tip below)
- 50% of the production environment memory allocations
Tip: Your organization may be able to save money on DB2 license costs by using DB2 Workgroup Edition for your QA servers. Workgroup Edition allows up to 4CPUs and more generous memory allocations. Also, do not be trapped into thinking the QA CPU count and memory allocation needs to match production. Production should have greater resources available to handle occasional peaks and rogue elephants (unplanned decision support type queries).
Well, the QA people were ROFL (Rolling On the Floor Laughing) with developers getting junk 2002 hardware for their data servers until I rained on their parade with dumbed down hardware for QA as well. Sorry, but I am trying to help you be successful in production at minimal hardware and licensing cost.
Unwise Practice #2 - ACTIVITY RATESRates provide great entertainment value. We hear about and see too many organizations that have costly tools that can simulate a billion users performing transaction rates in excess of the US federal deficit. Good grief. Instead of trying to achieve record setting activity rates, focus on transaction costs instead. Some of DBI's competitors provide tools which assert that rates are KPIs (Key Performance Indicators). We don't agree (actually, we can be seen ROFL whenever we hear about these tools) because rates can vary by time of day, day of the week, end of the quarter, or other business cycles. Costs, on the other hand, in the absence of any configuration or physical design changes, will be level over time independent of rates.
After you've successfully measured and managed transaction and SQL costs (more details below), then go ahead and run your activity simulators at 100%, 150%, and 200% of anticipated production environment activity - but leave your fantasies of a billion users and eleven trillion transactions per second at home as this is an exercise in futility that will likely do more harm, and cause more frustration, than good.
Unwise Practice #3 - Betting your business on EXPLAINFortunately, the majority of the DB2 community has a healthy respect for the value and importance of EXPLAIN. Explaining your statements (SQL and XML) in development, test, and QA can help prevent forest fires in production. If you see a timeron cost in excess of a few hundred, and certainly in excess of 1,000, you've likely got some SQL/XML or physical design opportunities for improvement to work through before QA and production.
However, please be advised that an Explain plan timeron cost is just one valid cost metric that you should be evaluating.
To get a driver's license to operate a motor vehicle (truck, car (or "cage" in Harley speak), or motorcycle), you have to pass both written and road tests to assure your government that you are ready. DB2 is no different. We have to answer lots of cost metric questions successfully and pass road tests in Development, Test, and QA environments.
The risk or problem with relying on Explain costs alone is that Explains do not consider frequency of execution. A simple statement having only a moderate timeron cost (250-1,000) can sink a great ship, or crush the CPUs in your production server, if it is executed with great frequency. We have seen this many, many, many times over the years. If you haven't read it yet, please read this blog post on SQL Cost Aggregation.
The real litmus test for Production Readiness - COSTSMaybe you've been asked to write a new application with several different transactions, or maybe you've been asked to enhance an existing application with some new features and functionality. The application or enhancements will be developed in bits and pieces - a new screen, a new form, a new Ctrl-Shift-Right-Click on this thing function at a time. You (hopefully) work with a competent development DBA to build the tables, and then you sling the code. Some number of hours later, you've created a thing of beauty and shown your work to the business user. It's just what he or she was hoping for! After some back slapping and high-fiving, now the business people want it in production yesterday.
What does it COST to run this masterpiece?!?!?!?!?
Performance Quality from the Start - Measuring CostsYou can achieve quality, efficient, and fast performance at the beginning of your project if you understand what it costs to execute workloads of transactions and work to reduce processing costs upfront. Here's what you should do:
- Turn on all of the database monitoring switches
- db2 "update monitor switches using bufferpool ON sort ON table ON lock ON uow ON statement ON timestamp ON"
- Exercise the functionality of your application or enhancement with one or two users. This is simple unit testing - don't get yourself in a knot over this.
- Get Snapshots for the database, tables, and statements
- db2 "get snapshot for tables on DBNAME" > table-snapshot1.txt
- db2 "get snapshot for database on DBNAME" > database-snapshot1.txt
- db2 "get snapshot for dynamic sql on DBNAME > sql-snapshot1.txt
- Exercise the functionality of your application or enhancement again, but this time try to exercise each key function two times using only one human
- Get Snapshots for the database, tables, and statements
- db2 "get snapshot for tables on DBNAME" > table-snapshot2.txt
- db2 "get snapshot for database on DBNAME" > database-snapshot2.txt
- db2 "get snapshot for dynamic sql on DBNAME > sql-snapshot2.txt
- Now let's do the math. Unless otherwise noted, delta calculations between the first and second set of snapshots should be used. Please note that we are NOT going to look at any rates. We will look at cost and ratio KPIs that will tell us about the health and efficiency of the transactions.
- What is the Average Result Set Size (ARSS)? If your application is like 99.9% of OLTP applications around the world, the ARSS value should be 10 or less.
- What is the Database Index Read Efficiency (IREF)? Normally, it should be less than 10.
- What is the Database Synchronous Read Percentage (SRP)? The SRP could be influenced by bufferpool sizes which should be tiny in your development/test databases, but we'd still like to see the SRP value be at least 80% or higher. If so, give yourself another "gold star" towards being ready for production. If SRP is 50-80%, one or more indexes are probably missing - coordinate efforts with the DBA. If SRP is less than 50%, don't you dare put this application in production!!!
- Read the blog on key cost metrics.
- How many transactions were completed?
- How many Selects per transaction (SELTX)?
- How many inserts, updates, and deletes per transaction (DMLTX)?
- How many Sorts per Transaction (SRTTX)?
- How many rows are fetched by the average transaction?
- How many rows are read by the average transaction?
- What is the maximum value for NLEVELS across all indexes in the database?
- db2 "Select max(nlevels) as MAXNLEVELS from syscat.indexes"
- Read about THE MOST IMPORTANT COST, then compute Bufferpool Logical Reads per transaction (BPLRTX) for your database.
- Compute ((#SELTX + #DMLTX) x (MAXNLEVELS + 2)) x 2. Is your BPLRTX greater than this value? IF SO, then this application/enhancement is NOT ready for production - indexes are either missing or sub-optimal. Please work with your DBA.
- Read about Detecting Index Leaf Page Scans and compute Bufferpool Logical Index Reads per Transaction (BPLITX). Based on this formula, is it likely that you have leaf page scans occurring? With small bufferpools, scans are less likely, so if this cost metric is indicating a leaf page scan problem then you really need to pay attention to it. Please work with your DBA to define better indexes that will avoid costly leaf page scans.
- Don't waste your time computing Bufferpool Hit Ratios - BP hit ratios with small memory pools should be rather low, and, in production, high hit ratios are often as deceiving as a cheating spouse - and can be equally as costly.
- Refer to your database and table snapshots, and compute the average number of Rows Read per Transaction (TBRRTX) for each table. I'm sorry - it's a lot of math, but if you find any TBRRTX > 10 then it is unlikely this application is ready for production. If TBRRTX > 100, then you've definitely got performance problems that need to be fixed. If TBRRTX > 1000, putting this application into production would be career suicide. Either fix the physical design/indexes to support the SQL running your transactions or update your resume. Make a list of Tables having TBRRTX > 10. For each table, plan on taking your DBA friend to lunch.
- Next, use the SQL Snapshots and perform a complete SQL Workload Aggregate Cost Analysis. This analysis will help you isolate the specific statements that may cause your application/enhancement to fail in production. If you skipped the prior hyperlink, read this blog post now for instructions.
- Across your SQL workload, do any of the statements consume more than 25% of the CPU in the aggregate?
- Across your SQL workload, do any of the statements consume more than 25% of the total Rows Read?
- Across your SQL workload, did any of the statements consume more than 25% of the total elapsed Execution time?
- Compute the IREF for each statement by dividing Rows Read/Rows Selected. Do you find any ratio values greater than 10? If you don't see Rows Selected in the output, complain to your favorite IBM representative - you'll need to work with an event monitor instead.
- Across your SQL workload, do any of the statements consume more than 40% of the total sort time? If so, give earnest consideration to creating a CLUSTER index to mitigate the sort cost.
- What is the average execution time for each statement in the workload? Do any of these averages exceed 1 second? If so, you probably won't meet a one second service level agreement in production. A well tuned statement with high quality supporting indexes should be able to achieve 1/2 second or better average response time - even on a circa 2002 piece of junk hardware and small bufferpools.
- For each statement, what is the aggregate BPLRTX across statement patterns ((Bufferpool Data Logical Reads + Bufferpool Index Logical Reads ) / Number of Executions)? If you find a statement BPLRTX that exceeds (MAXNLEVELS + 2), this statement will have excessive or prohibitive cost to run in production.
- For each statement, what is the aggregate BPLITX across statement patterns (Bufferpool Index Logical Reads / Number of Executions)? BPLITX values that exceed (MAXNLEVELS + 1) are danger signs of index leaf page scans.
- Across your SQL workload, do any of the statements consume more than 25% of the total Bufferpool Logical Data Reads? Or more than 25% of the total Bufferpool Logical Index Reads?
If, on the other hand, the aforementioned SQL statement cost metrics all look fairly good, if none of your tables have TBRRTX > 10, and all of the database KPI measurements check out within reason, then you might want to additionally complete a comprehensive response time analysis to learn how much time is inside the database and out. If response times are slow, and the percentage of Inside Database time is less than 25%, then you'll need to evaluate your application code, application logic, and possibly get your networking people involved as well.
Quality AssuranceQA professionals should also be completing these same analyses but on larger hardware with heavier workloads (as previously described). When you look at ratios and costs, and NOT rates, you can assure efficient, effective, low cost, high quality performance before the application/enhancement arrives in production.
If you fail to identify, isolate, and enforce the cost reduction of high cost statements, the operational production database DBA team will end up hunting elephants and trying to exterminate mosquito swarms that will suck the life, productivity, and profitability out of your business.
- For development and test, use junkie old (maybe recycled) hardware with minimal memory and allocate small memory resources. Do not let performance problems hide behind large memory heaps.
- For QA, more computing resources are allowed but not as much as production.
- Pay no attention to rates. Rates are entertaining but not the way to tune applications. Instead, look at cost and ratio KPI metrics that are valid at any activity rate on any hardware.
- Remember to buy DBAs lunch often. They like that.
A shameless marketing moment
If you have the right suite of DB2 LUW Performance Monitoring, Tuning, and Management tools, all of the aforementioned analysis can be automated with just a few mouse clicks in seconds.
DBI's Brother-Panther® and Brother-Thoroughbred® should be used liberally throughout the application life cycle in development, test, QA, and production environments. For operational production environments, we suggest adding Brother-Eagle® for "lights-on" real-time database monitoring and Brother-Hawk™ for very sophisticated, advanced, flexible, and extensible "lights-out" alerting.
To see how the DBI DB2 LUW Performance Tools compare to our closest competitors, visit our DB2 LUW Performance Tools Comparison page.
If being proactive about performance is important, if you require alert capability, or if you loathe the IBM DB2 Health Monitor like 95% of the DB2 LUW Community, please visit our Brother-Hawk versus DB2 Health Monitor comparison chart.
Finally, if your organization suffers from hardware addiction (you know, throwing CPUs at performance problems), if your organization is migrating from Oracle to DB2 LUW 9.7, if your organization would like to reduce hardware costs, lower energy costs, and save on database licensing costs, or if you'd just like to get an independent, objective, expert opinion on the state of DB2 LUW performance in your shop, simply Contact DBI and we'll help you.
Your Performance IS Our Business.
President & CEO, DBI
IBM GOLD Consultant and Data Management Champion
DBI is an IBM Advanced, Industry Optimized, Business Partner
DBI Customers and Partners span the globe in every industy.