By implementing an effective data maintenance strategy when a site goes live, you can
avoid database issues such as performance degradation, exceeding available storage, and increased
backup and restore times.
Creating an effective data maintenance strategy involves the following tasks:
1. Identify the data to maintain
Monitor
database table size and growth rate and collect growth statistics from the production database on
either a weekly or monthly basis. You are recommended to collect this information during the weekend
to prevent disruptions to work that is being completed during the week. The method that you use to
track database growth over time depends on your database type.
Option
Description
dbgrowth utility
Set up the dbgrowth database
table:
CREATE TABLE dbgrowth (
rundate DATE DEFAULT CURRENT DATE NOT NULL,
tabschema VARCHAR(128) NOT NULL,
tabname VARCHAR(128) NOT NULL,
card BIGINT DEFAULT 0 NOT NULL
) IN USERSPACE1 INDEX IN USERSPACE1;
ALTER TABLE dbgrowth ADD PRIMARY KEY (rundate,tabschema,tabname);
Create a cron job to populate the dbgrowth table every Sunday. Create a
script that is called dbgrowth.sql with the following
contents:
SELECT 'WARNING: The last statistics for table ' || rtrim(tabschema) || '.' || tabname || ' ' || ' were taken on '
|| char(date(stats_time)) || '. The dbgrowth script requires the stats be up to date.' warning FROM syscat.tables where date(stats_time) < current date;
-- in case it is run twice
DELETE FROM dbgrowth WHERE rundate = CURRENT DATE;
-- insert
INSERT INTO dbgrowth ( tabschema, tabname, card ) SELECT tabschema, tabname, card FROM syscat.tables WHERE card >= 0;
Use the following SQL to list the fastest growing
tables:
SELECT CAST( RTRIM(a.tabschema) || '.' || a.tabname AS VARCHAR(40)) tabname, b.card begincard, a.card endcard, DAYS(a.rundate)
- DAYS(b.rundate) numdays, a.card - b.card growth, (a.card-b.card)/(DAYS(a.rundate)-DAYS(b.rundate)) dailyavg
FROM dbgrowth a,
dbgrowth b
WHERE a.tabschema = b.tabschema
AND b.tabname = a.tabname
AND a.tabschema IN ( 'DB2INST1' )
AND a.rundate = ( SELECT MAX(rundate) FROM dbgrowth )
AND b.rundate = ( SELECT MAX(rundate) FROM dbgrowth WHERE rundate < (SELECT MAX(rundate) FROM dbgrowth ))
AND a.card - b.card > 1000
ORDER BY a.card - b.card DESC
FETCH FIRST 50 ROWS ONLY WITH UR;
To compare with the previous one use:
AND b.rundate = ( SELECT MAX(rundate) FROM dbgrowth WHERE rundate < (SELECT MAX(rundate) FROM dbgrowth ))
To compare with a specific date use:
AND b.rundate = ( SELECT MAX(rundate) - 7 DAYS FROM dbgrowth )
The dbgrowth utility saves the current row count for each database table
that is found on the card column of the syscat.tables table. This number is
accurate only if the statistics of this database table are current.
dbgrowth utility
Set up the dbgrowth database
table:
CREATE TABLE dbgrowth (
rundate DATE NOT NULL,
tabschema VARCHAR2 (30) NOT NULL,
tabname VARCHAR2 (50) NOT NULL,
card NUMBER DEFAULT 0 NOT NULL
);
ALTER TABLE dbgrowth ADD PRIMARY KEY (rundate,tabschema,tabname);
Create a cron job to populate the dbgrowth table every Sunday. Create a
script that is called dbgrowth.sql with the following
contents:
SELECT 'WARNING: The last statistics for schema ' || rtrim(owner) || ' were taken on ' || to_char(LAST_ANALYZED,'dd-mon-yyyy')
|| '. The dbgrowth script requires the stats be up to date.' warning
FROM dba_tables where round(LAST_ANALYZED) < round(sysdate) and owner = 'WCS' and rownum < 2;
-- in case it is run twice
DELETE FROM dbgrowth WHERE rundate = round(sysdate);
-- insert
INSERT INTO dbgrowth ( rundate, tabschema, tabname, card ) SELECT round(sysdate), owner, table_name, nvl(num_rows,0)
FROM dba_tables WHERE OWNER = 'WCS';
where
WCS is the schema name for the WebSphere Commerce repository.
Use the following SQL to list the fastest growing
tables:
SELECT RTRIM(a.tabschema) || '.' || a.tabname tabname, b.card begincard, a.card endcard, a.rundate
- b.rundate numdays, a.card - b.card growth, (a.card-b.card)/(a.rundate-b.rundate) dailyavg
FROM dbgrowth a,
dbgrowth b
WHERE a.tabschema = b.tabschema
AND b.tabname = a.tabname
AND a.tabschema IN ( 'WCS' )
AND a.rundate = ( SELECT MAX(rundate) FROM dbgrowth )
AND b.rundate = ( SELECT MAX(rundate) FROM dbgrowth WHERE rundate < (SELECT MAX(rundate) FROM dbgrowth ))
AND a.card - b.card > 1000
AND ROWNUM < 50
ORDER BY a.card - b.card DESC;
To compare with the previous one use:
AND b.rundate = ( SELECT MAX(rundate) FROM dbgrowth WHERE rundate < (SELECT MAX(rundate) FROM dbgrowth ))
To compare with a specific date use:
AND b.rundate = ( SELECT MAX(rundate) - 7 DAYS FROM dbgrowth )
The dbgrowth utility saves the current row count for each database table
that is found on the card column of the dba_tables table. This number is accurate
only if the statistics of this database table are current.
Oracle Flashback Technology
Oracle Flashback Technology is a group of Oracle database features that you
can use to view the past states of database objects. You can also
return database objects to a previous state without using
point-in-time media recovery.
Use Oracle Flashback Technology to
track historical table data. For more information about using
Oracle Flashback Technology, see Oracle Technology Network.
Search for Oracle Flashback Technology.
tips:
Use the growth statistics to identify data that can be deleted from fast growing tables or that
can be moved to a different database. The following database tables are typically the fastest
growing tables:
ORDERS, which includes completed, pending, and discarded orders
If you frequently delete folders in a production environment, run dbclean on
the FOLDER table regularly.
Review the list of default delete scenarios for particular object and type combinations to
determine which database tables and which rows to delete for a particular object and object type.
For more information, see Database Cleanup utility objects.
2. Define policies
A policy definition describes the
length of time to keep information in the database before it is moved to a Decisions Support System
(DSS) or cleaned up. Policies that reflect your current business practices help to maintain a
consistent database. For example, the following sample policies define how to long to keep order and
guest user information:
The length of time to store completed orders in the database.
The number of days before a guest user with no completed orders is deleted.
The length of time to store pending orders for guest and registered users.
3. Enforce policies with scripting
The
Database Cleanup utility is a WebSphere Commerce utility that is designed to help you clean up
the database. This utility contains a predefined list of delete statements to delete common WebSphere Commerce objects such as users, orders, and catalog data. Tune the default statements that are
available for use with the script to correspond to your customized settings. This customized script
removes records that are marked for deletion, as well as any information that is defined as outdated
in your database maintenance policies. For more information, see Database Cleanup utility.
Testing
Use both functional and nonfunctional testing to verify your data
maintenance strategy.
The following table lists and describes the testing options.
Type of testing
Description
Functional
Use functional testing to verify that the deletions performed when
dbclean runs are correct:
After testing is complete, closely verify cascade deletes on foreign key relationships since
deleting a row from one database table might result in records in other tables being affected. For
example, deleting a row from the OFFER database table might affect related records in the ORDERITEMS
database table.
After you run the dbclean utility script, request that business users
validate the state of the database to ensure that no data was unexpectedly deleted.
Nonfunctional
Use nonfunctional testing to determine the system resources that are required
to support the dbclean utility. To perform this type of testing, you must have a
performance environment and the ability to simulate load.
Running the dbclean
utility might add the following types of stress to the system:
Higher than normal CPU on the database server.
A decrease in overall site response time.
A lock contention, such as a deadlock, timeout, or lock-wait.