Oracle Notes

Sharing Oracle technology knowledge and experiences.

Oracle GoldenGate Best Practices and Tips

Posted by sdefilip on July 3, 2010

Lately I’ve been working, once again, with GoldenGate (now Oracle GoldenGate) data integration software. GoldenGate offers tremendously useful capabilities which include CDC (Change Data Capture), Data Warehouse ETL, efficient/low impact data replication from diverse database management systems, real time standby database maintenance (for high-availability, upgrades and patches, feeding Oracle Data Integrator (ODI) and data distribution. So, I thought I’d offer some GoldenGate Best Practices and Tips that I’ve learned largely by making mistakes:

I. Best Practices

PARALLEL PROCESSING

Ensure the system has enough shared memory. GoldenGate runs as an Oracle process. Each Extract or Replicat process requires upwards of 25-50 MB of system shared memory. This means less memory for the Oracle DBMS, especially the SGA.

Use parallel Replicat groups on the target system to reduce latency thru parallelism. Consider parallel Extract groups for tables that are fetch intensive (e.g., those that trigger SQL procedures).

Group tables that have R.I. to each other in the same Extract-Replicat pair.

Pair each Replicat with its own trail and corresponding Extract process.

When using parallel Replicats, configure each one to process a different portion of the overall data.

PASSTHRU PARAMETER

Consider using this parameter if there is no filtering, conversion or mapping required and you’re using DATAPUMP.

In pass-through mode, the Extract process does not look up table definitions, either from the database or from a data definitions file. Pass-through mode increases the throughput of the data pump, because all of the functionality that looks up object definitions is bypassed.

This saves database fetches to improve performance.

INSERTAPPEND

A new GoldenGate 10.4 feature.

Use for large transactions .

Puts records at end of table rather than doing a more costly insert into other areas of table.

DATAPUMP (not the Oracle DB utility)

1. Primary Extract group writes to a trail on the source system.

2. Reads this trail and sends the data across the network to a remote

trail on the target.

3. Adds storage flexibility and also serves to isolate the primary

Extract process from TCP/IP activity.

4. Can be configured for online or batch.

5. Can perform data filtering, mapping, and conversion, or it can be

configured in pass-through mode, where data is passively

transferred as-is, without manipulation.

6. Use to perform filtering thereby removing that processing overhead

from the primary extract group.

7. Use one or more pumps for each source and each target for

parallelism.

In most business cases, it is best practice to use a data pump. Some reasons for using a data pump include the following:

● Protection against network and target failures:

In a basic GoldenGate configuration, with only a trail on the target system, there is nowhere on the source system to store data that the Extract process continuously extracts into memory. If the network or the target system becomes unavailable, the primary Extract could run out of memory and abend. However, with a trail and data pump on the source system, captured data can be moved to disk, preventing the abend. When connectivity is restored, the data pump extracts the data from the source trail and sends it to the target system(s).

● You are implementing several phases of data filtering or transformation. When using complex filtering or data transformation configurations, you can configure a data pump to perform the first transformation either on the source system or on the target system,

and then use another data pump or the Replicat group to perform the second transformation.

● Consolidating data from many sources to a central target. When synchronizing multiple source databases with a central target database, you can store extracted data on each source system and use data pumps on each of those systems to send the data to a trail

on the target system. Dividing the storage load between the source and target systems reduces the need for massive amounts of space on the target system to accommodate data arriving from multiple sources.

● Synchronizing one source with multiple targets. When sending data to multiple target systems, you can configure data pumps on the source system for each target. If network connectivity to any of the targets fails, data can still be sent to the other targets

————————————————————————————————-

STEP BY STEP Datapump Configuration

ON THE SOURCE SYSTEM

To configure the Manager process . (Reference: Oracle GoldenGate Administration Guide, Version 10.4):

1. On the source, configure the Manager process according to the instructions in Chapter 2.

2. In the Manager parameter file, use the PURGEOLDEXTRACTS parameter to control the purging of files from the local trail.

To configure the primary Extract group:

3. On the source, use the ADD EXTRACT command to create a primary Extract group. For documentation purposes, this group is called ext.

ADD EXTRACT , TRANLOG, BEGIN [, THREADS ]

❍ Use TRANLOG as the data source option.

4. On the source, use the ADD EXTTRAIL command to create a local trail. The primary Extract writes to this trail, and the data-pump Extract reads it.

ADD EXTTRAIL , EXTRACT

❍ Use the EXTRACT argument to link this trail to the primary Extract group. The primary Extract group writes to this trail, and the data pump group reads it.

5. On the source, use the EDIT PARAMS command to create a parameter file for the primary Extract group. Include the following parameters plus any others that apply to your database environment:

— Identify the Extract group:

EXTRACT

— Specify database login information as needed for the database:

[SOURCEDB ,][USERID [, PASSWORD ]]

— Specify the local trail that this Extract writes to:

EXTTRAIL

— Specify tables to be captured:

TABLE .
;

To configure the data pump Extract group:

6. On the source, use the ADD EXTRACT command to create a data pump group. For documentation purposes, this group is called pump. In reality, I suggest naming this to match the extract name e.g., dpext1.

ADD EXTRACT , EXTTRAILSOURCE , BEGIN

❍ Use EXTTRAILSOURCE as the data source option, and specify the name of the local trail.

7. On the source, use the ADD RMTTRAIL command to specify a remote trail that will be created on the target system.

ADD RMTTRAIL , EXTRACT

❍ Use the EXTRACT argument to link the remote trail to the data pump group. The linked data pump writes to this trail.

8. On the source, use the EDIT PARAMS command to create a parameter file for the data pump. Include the following parameters plus any others that apply to your database environment.

————————————————————————————————

— Identify the data pump group:

EXTRACT

— Specify database login information as needed for the database:

[SOURCEDB ,][USERID [, PASSWORD ]]

— Specify the name or IP address of the target system:

RMTHOST , MGRPORT

— Specify the remote trail on the target system:

RMTTRAIL

— Allow mapping, filtering, conversion or pass data through as-is:

[PASSTHRU | NOPASSTHRU]

— Specify tables to be captured:

TABLE ;

NOTE – To use PASSTHRU mode, the names of the source and target objects must be identical. No column mapping, filtering, SQLEXEC functions, transformation, or other functions that require data manipulation can be specified in the parameter file. You can combine normal processing with pass-through processing by pairing PASSTHRU and NOPASSTHRU with different TABLE statements.

ON THE TARGET SYSTEM

To configure the Manager process:

9. On the target, configure the Manager process according to the instructions in the Administration Guide, Chapter 2.

10. In the Manager parameter file, use the PURGEOLDEXTRACTS parameter to control the purging of files from the local trail.

To configure the Replicat group:

11. On the target, create a Replicat checkpoint table. This is a best practice. For instructions, see “Creating a checkpoint table” on page 121 of Administration Guide.

12. On the target, use the ADD REPLICAT command to create a Replicat group. For documentation purposes, this group is called rep.

ADD REPLICAT , EXTTRAIL , BEGIN

❍ Use the EXTTRAIL argument to link the Replicat group to the remote trail.

13. On the target, use the EDIT PARAMS command to create a parameter file for the Replicat group. Include the following parameters plus any others that apply to your database environment.

— Identify the Replicat group:

REPLICAT

— State whether or not source and target definitions are identical:

SOURCEDEFS | ASSUMETARGETDEFS

— Specify database login information as needed for the database:

[TARGETDB ,] [USERID [, PASSWORD ]]

— Specify error handling rules:

REPERROR (, )

— Specify tables for delivery:

MAP .

, TARGET .

[, DEF <template name>];

(Reference: Oracle GoldenGate Administration Guide, Version 10.4)

REPLICAT CHECKPOINT TABLE

Create on target.

Can be used by all Replicat groups.

Prevents loss of data in case Replicat needs to be restarted.

PRIMARY KEYS

Must have a unique, non-null identifier, even if not the primary key.

Option is to use KEYCOLS option of MAP and TABLE parameters to

create a substitute key. Otherwise, GG uses multiple columns from

the table; will usually degrade performance. For data integrity,

the key used must contain the same values in corresponding rows of

all subject databases and contain the same columns in databases

where key resides

ADD TRACETABLE

Use default name of “GGS_TRACE”.

FAULT TOLERANCE

In a data distribution configuration, include a primary Extract

group and a data-pump Extract group in the source configuration, one

for each target.

If network connectivity fails, data can still be sent to the target.

————————————————————————————————-

II. PERFORMANCE

This section will cover anything that is performance related that was not also in section “I. Best Practices”.

A. MONITORING

Run Status of Particular Process

To find the run status of a particular process:

GGSCI (development) 23> status manager

Manager is running (IP port development.7809).

GGSCI (development) 24> status extract ext1

EXTRACT EXT1: RUNNING

Detailed information of a particular process:

GGSCI (development) 6> info extract ext1, detail

EXTRACT EXT1 Last Started 2010-01-23 11:19 Status RUNNING

Checkpoint Lag 00:00:00 (updated 00:00:02 ago)

Log Read Checkpoint Oracle Redo Logs

2010-01-23 10:45:18 Seqno 786, RBA 44710400

Target Extract Trails:

Remote Trail Name Seqno RBA Max MB

/u01/oracle/software/goldengate/dirdat/lt 55644 2 55644 10

Extract Source Begin End

/u02/oradata/acme/redo03.log 2010-01-23 11:13 2010-01-23 10:45

/u02/oradata/acme/redo02.log 2010-01-23 11:04 2010-01-22 11:13

/u02/oradata/acme/redo02.log 2010-01-23 10:42 2010-01-21 11:04

Not Available * Initialized * 2010-01-20 10:42

Current directory /u01/oracle/software/goldengate

Report file /u01/oracle/software/goldengate/dirrpt/EXT1.rpt

Parameter file /u01/oracle/software/goldengate/dirprm/ext1.prm

Checkpoint file /u01/oracle/software/goldengate/dirchk/EXT1.cpe

Process file /u01/oracle/software/goldengate/dirpcs/EXT1.pce

Stdout file /u01/oracle/software/goldengate/dirout/EXT1.out

Error log /u01/oracle/software/goldengate/ggserr.log

Detailed information of a particular process:

GGSCI (devu007) 6> info extract ext1, detail

EXTRACT EXT1 Last Started 2010-02-19 11:19 Status RUNNING

Checkpoint Lag 00:00:00 (updated 00:00:02 ago)

Log Read Checkpoint Oracle Redo Logs

2010-02-19 10:45:18 Seqno 786, RBA 44710400

Target Extract Trails:

Remote Trail Name Seqno

/u01/oracle/software/goldengate/dirdat/lt 2

RBA Max MB

55644 10

Extract Source Begin End

/u02/oradata/acme/redo03.log 2010-02-19 11:13 2010-02-26 10:45

/u02/oradata/acme/redo02.log 2010-02-19 11:04 2010-02-19 11:13

/u02/oradata/acme/redo02.log 2010-02-18 10:42 2010-02-19 11:04

Not Available * Initialized * 2010-02-18 10:42

Current directory /u01/oracle/software/goldengate

Report file /u01/oracle/software/goldengate/dirrpt/EXT1.rpt

Parameter file /u01/oracle/software/goldengate/dirprm/ext1.prm

Checkpoint file /u01/oracle/software/goldengate/dirchk/EXT1.cpe

Process file /u01/oracle/software/goldengate/dirpcs/EXT1.pce

Stdout file /u01/oracle/software/goldengate/dirout/EXT1.out

Error log /u01/oracle/software/goldengate/ggserr.log

View processing rate – can use ‘hr’,’min’ or ‘sec’ as a parameter:

GGSCI (devu007) 37> stats extract ext2 reportrate hr

Sending STATS request to EXTRACT EXT2 …

Start of Statistics at 2010-02-19 10:04:46.

Output to /u01/oracle/ggs/dirdat/cc:

Extracting from SH.CUSTOMERS to SH.CUSTOMERS:

*** Total statistics since 2010-02-19 09:29:48 ***

Total inserts/hour: 0.00

Total updates/hour: 95258.62

Total deletes/hour: 0.00

Total discards/hour: 0.00

Total operations/hour: 95258.62

*** Daily statistics since 2010-02-19 09:29:48 ***

Total inserts/hour: 0.00

Total updates/hour: 95258.62

Total deletes/hour: 0.00

Total discards/hour: 0.00

Total operations/hour: 95258.62

*** Hourly statistics since 2010-02-19 10:00:00 ***

No database operations have been performed.

*** Latest statistics since 2010-02-19 09:29:48 ***

Total inserts/hour: 0.00

Total updates/hour: 95258.62

Total deletes/hour: 0.00

Total discards/hour: 0.00

Total operations/hour: 95258.62

End of Statistics.

View latency between the records processed by Goldengate and the timestamp in the data source:

GGSCI (development) 13> send extract ext2, getlag

Sending GETLAG request to EXTRACT EXT2 …

Last record lag: 3 seconds.

At EOF, no more records to process.

GGSCI (development) 15> lag extract ext*

Sending GETLAG request to EXTRACT EXT1 …

Last record lag: 1 seconds.

At EOF, no more records to process.

Sending GETLAG request to EXTRACT EXT2 …

Last record lag: 1 seconds.

At EOF, no more records to process.

Viewing the GoldenGate error log as well as history of commands executed and other events:

For UNIX:

vi ggserr.log

Or

GGSCI command: VIEW GGSEVT

View the process report:

GGSCI (development) 2> view report ext1

GGSCI (development) 2> view report rep1

B. PERFORMANCE TIPS

To reduce bandwidth requirements:

Use compression options of the RMTHOST parameter to compress data

before it is sent across the network. Weigh the benefits of

compression against the CPU resources that are required to perform

the compression.

To increase the TCP/IP packet size:

a.  Use the TCPBUFSIZE option of the RMTHOST parameter to increase the

size of the TCP socket buffer that Extract maintains. By increasing

the size of the buffer, you can send larger packets to the target

system. Consult with Network Support before setting TCPBUFSIZE.

b.  Use SQL Arrays

The BATCHSQL parameter will increase the performance of Replicat. BATCHSQL causes Replicat to create arrays for similar SQL statements and apply them at an accelerated rate. Normally, Replicat applies one SQL statement at a time.

• At 100 bytes of data per row change, BATCHSQL can

improve Replicat’s performance by up to 300 percent; actual

performance benefits will vary, depending on the oeverall

demand on system resources.

• At around 5,000 bytes of data per row change, the benefits of

using BATCHSQL diminish.

. Improve I/O within the system configuration

• Place trail files on the fastest disk controller.

• Use RAID 0+1 disk configuration because GoldenGate performs

sequential writes.

* Be careful with the following points. Increasing the values of

these parameters will improve performance but take longer to

reprocess your data in case of process failure.

• Use the CHECKPOINTSECS in Extract or Replicat; if increased,

less frequent checkpoints; increases data to be reprocessed if

process fails; keep transaction logs available in case of

reprocessing

• Use the GROUPTRANSOPS; increases number of SQL operations in a Replicat ;

reduces I/O to checkpoint file and checkpoint table

Data Filtering and Conversion:

Use primary Extract for data capture only.

Use a data pump on the source to perform filtering and thereby send

less data over the network.

Alternatively, use Replicat for conversion and, if the network can

handle large amounts of data, also for filtering.

**(Upcoming Post – “Troubleshooting GoldenGate Extracts”)

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: