Implementing a data maintenance strategy

By implementing an effective data maintenance strategy when a site goes live, you can avoid database issues such as performance degradation, exceeding available storage, and increased backup and restore times.

Creating an effective data maintenance strategy involves the following tasks:
Diagram showing database maintenance strategy.

1. Identify the data to maintain

Monitor database table size and growth rate and collect growth statistics from the production database on either a weekly or monthly basis. You are recommended to collect this information during the weekend to prevent disruptions to work that is being completed during the week. The method that you use to track database growth over time depends on your database type.
Option Description
dbgrowth utility
  1. Set up the dbgrowth database table:
    CREATE TABLE dbgrowth (
          rundate        DATE DEFAULT CURRENT DATE NOT NULL,
          tabschema      VARCHAR(128) NOT NULL,
          tabname        VARCHAR(128) NOT NULL,
          card           BIGINT DEFAULT 0 NOT NULL
    ) IN USERSPACE1 INDEX IN USERSPACE1;
    
    ALTER TABLE dbgrowth ADD PRIMARY KEY (rundate,tabschema,tabname);
  2. Create a cron job to populate the dbgrowth table every Sunday. Create a script that is called dbgrowth.sql with the following contents:
    SELECT 'WARNING: The last statistics for table ' || rtrim(tabschema) || '.' || tabname || ' ' || ' were taken on '  
    || char(date(stats_time)) || '. The dbgrowth script requires the stats be up to date.' warning FROM syscat.tables where date(stats_time) < current date;
    
    -- in case it is run twice
    DELETE FROM dbgrowth WHERE rundate = CURRENT DATE;
    
    -- insert
    INSERT INTO dbgrowth ( tabschema, tabname, card ) SELECT tabschema, tabname, card FROM syscat.tables WHERE card >= 0;
  3. Use the following SQL to list the fastest growing tables:
    SELECT CAST( RTRIM(a.tabschema) || '.' || a.tabname AS VARCHAR(40)) tabname, b.card begincard, a.card endcard, DAYS(a.rundate) 
    - DAYS(b.rundate) numdays, a.card - b.card growth, (a.card-b.card)/(DAYS(a.rundate)-DAYS(b.rundate)) dailyavg
      FROM dbgrowth a,
           dbgrowth b
     WHERE a.tabschema  = b.tabschema
       AND b.tabname    = a.tabname
       AND a.tabschema IN ( 'DB2INST1' )
       AND a.rundate    = ( SELECT MAX(rundate) FROM dbgrowth )
       AND b.rundate    = ( SELECT MAX(rundate) FROM dbgrowth WHERE rundate < (SELECT MAX(rundate) FROM dbgrowth ))
       AND a.card - b.card > 1000
     ORDER BY a.card - b.card DESC
     FETCH FIRST 50 ROWS ONLY WITH UR;
    
    To compare with the previous one use:
        AND b.rundate    = ( SELECT MAX(rundate) FROM dbgrowth WHERE rundate < (SELECT MAX(rundate) FROM dbgrowth ))
    
    To compare with a specific date use:
        AND b.rundate    = ( SELECT MAX(rundate) - 7 DAYS FROM dbgrowth )
The dbgrowth utility saves the current row count for each database table that is found on the card column of the syscat.tables table. This number is accurate only if the statistics of this database table are current.
dbgrowth utility
  1. Set up the dbgrowth database table:
    CREATE TABLE dbgrowth (
              rundate        DATE NOT NULL,
              tabschema      VARCHAR2 (30) NOT NULL,
              tabname        VARCHAR2 (50) NOT NULL,
              card           NUMBER DEFAULT 0 NOT NULL
        );
    
        ALTER TABLE dbgrowth ADD PRIMARY KEY (rundate,tabschema,tabname);
  2. Create a cron job to populate the dbgrowth table every Sunday. Create a script that is called dbgrowth.sql with the following contents:
    SELECT 'WARNING: The last statistics for schema ' || rtrim(owner) || ' were taken on '  || to_char(LAST_ANALYZED,'dd-mon-yyyy') 
    || '. The dbgrowth script requires the stats be up to date.' warning 
    FROM dba_tables where round(LAST_ANALYZED) < round(sysdate) and owner = 'WCS' and rownum < 2;
    
    -- in case it is run twice
        DELETE FROM dbgrowth WHERE rundate = round(sysdate);
    
    -- insert
        INSERT INTO dbgrowth ( rundate, tabschema, tabname, card ) SELECT round(sysdate), owner, table_name, nvl(num_rows,0) 
    FROM dba_tables WHERE OWNER = 'WCS';
    where WCS is the schema name for the WebSphere Commerce repository.
  3. Use the following SQL to list the fastest growing tables:
    SELECT RTRIM(a.tabschema) || '.' || a.tabname tabname, b.card begincard, a.card endcard, a.rundate 
    - b.rundate numdays, a.card - b.card growth, (a.card-b.card)/(a.rundate-b.rundate) dailyavg
          FROM dbgrowth a,
               dbgrowth b
         WHERE a.tabschema  = b.tabschema
           AND b.tabname    = a.tabname
           AND a.tabschema IN ( 'WCS' )
           AND a.rundate    = ( SELECT MAX(rundate) FROM dbgrowth )
           AND b.rundate    = ( SELECT MAX(rundate) FROM dbgrowth WHERE rundate < (SELECT MAX(rundate) FROM dbgrowth ))
           AND a.card - b.card > 1000
           AND ROWNUM < 50
         ORDER BY a.card - b.card DESC;
    
    To compare with the previous one use:
            AND b.rundate    = ( SELECT MAX(rundate) FROM dbgrowth WHERE rundate < (SELECT MAX(rundate) FROM dbgrowth ))
    
    To compare with a specific date use:
            AND b.rundate    = ( SELECT MAX(rundate) - 7 DAYS FROM dbgrowth )
The dbgrowth utility saves the current row count for each database table that is found on the card column of the dba_tables table. This number is accurate only if the statistics of this database table are current.
Oracle Flashback Technology Oracle Flashback Technology is a group of Oracle database features that you can use to view the past states of database objects. You can also return database objects to a previous state without using point-in-time media recovery.

Use Oracle Flashback Technology to track historical table data. For more information about using Oracle Flashback Technology, see Oracle Technology Network. Search for Oracle Flashback Technology.

tips:
  • Use the growth statistics to identify data that can be deleted from fast growing tables or that can be moved to a different database. The following database tables are typically the fastest growing tables:
    • ORDERS, which includes completed, pending, and discarded orders
    • MEMBER, which includes guest and registered users
    • ADDRESS, which includes permanent and temporary addresses
    • CACHEIVL
    • STAGLOG
  • If you frequently delete folders in a production environment, run dbclean on the FOLDER table regularly.
  • Review the list of default delete scenarios for particular object and type combinations to determine which database tables and which rows to delete for a particular object and object type. For more information, see Database Cleanup utility objects.

2. Define policies

A policy definition describes the length of time to keep information in the database before it is moved to a Decisions Support System (DSS) or cleaned up. Policies that reflect your current business practices help to maintain a consistent database. For example, the following sample policies define how to long to keep order and guest user information:
  • The length of time to store completed orders in the database.
  • The number of days before a guest user with no completed orders is deleted.
  • The length of time to store pending orders for guest and registered users.

3. Enforce policies with scripting

The Database Cleanup utility is a WebSphere Commerce utility that is designed to help you clean up the database. This utility contains a predefined list of delete statements to delete common WebSphere Commerce objects such as users, orders, and catalog data. Tune the default statements that are available for use with the script to correspond to your customized settings. This customized script removes records that are marked for deletion, as well as any information that is defined as outdated in your database maintenance policies. For more information, see Database Cleanup utility.

Testing

Use both functional and nonfunctional testing to verify your data maintenance strategy.
The following table lists and describes the testing options.
Type of testing Description
Functional Use functional testing to verify that the deletions performed when dbclean runs are correct:
  1. After testing is complete, closely verify cascade deletes on foreign key relationships since deleting a row from one database table might result in records in other tables being affected. For example, deleting a row from the OFFER database table might affect related records in the ORDERITEMS database table.
  2. After you run the dbclean utility script, request that business users validate the state of the database to ensure that no data was unexpectedly deleted.
Nonfunctional Use nonfunctional testing to determine the system resources that are required to support the dbclean utility. To perform this type of testing, you must have a performance environment and the ability to simulate load.
Running the dbclean utility might add the following types of stress to the system:
  • Higher than normal CPU on the database server.
  • A decrease in overall site response time.
  • A lock contention, such as a deadlock, timeout, or lock-wait.