Lately I’ve been working, once again, with GoldenGate (now Oracle GoldenGate) data integration software. GoldenGate offers tremendously useful capabilities which include CDC (Change Data Capture), Data Warehouse ETL, efficient/low impact data replication from diverse database management systems, real time standby database maintenance (for high-availability, upgrades and patches, feeding Oracle Data Integrator (ODI) and data distribution. So, I thought I’d offer some GoldenGate Best Practices and Tips that I’ve learned largely by making mistakes:
I. Best Practices
PARALLEL PROCESSING
Ensure the system has enough shared memory. GoldenGate runs as an Oracle process. Each Extract or Replicat process requires upwards of 25-50 MB of system shared memory. This means less memory for the Oracle DBMS, especially the SGA.
Use parallel Replicat groups on the target system to reduce latency thru parallelism. Consider parallel Extract groups for tables that are fetch intensive (e.g., those that trigger SQL procedures).
Group tables that have R.I. to each other in the same Extract-Replicat pair.
Pair each Replicat with its own trail and corresponding Extract process.
When using parallel Replicats, configure each one to process a different portion of the overall data.
PASSTHRU PARAMETER
Consider using this parameter if there is no filtering, conversion or mapping required and you’re using DATAPUMP.
In pass-through mode, the Extract process does not look up table definitions, either from the database or from a data definitions file. Pass-through mode increases the throughput of the data pump, because all of the functionality that looks up object definitions is bypassed.
This saves database fetches to improve performance.
INSERTAPPEND
A new GoldenGate 10.4 feature.
Use for large transactions .
Puts records at end of table rather than doing a more costly insert into other areas of table.
DATAPUMP (not the Oracle DB utility)
1. Primary Extract group writes to a trail on the source system.
2. Reads this trail and sends the data across the network to a remote
trail on the target.
3. Adds storage flexibility and also serves to isolate the primary
Extract process from TCP/IP activity.
4. Can be configured for online or batch.
5. Can perform data filtering, mapping, and conversion, or it can be
configured in pass-through mode, where data is passively
transferred as-is, without manipulation.
6. Use to perform filtering thereby removing that processing overhead
from the primary extract group.
7. Use one or more pumps for each source and each target for
parallelism.
In most business cases, it is best practice to use a data pump. Some reasons for using a data pump include the following:
● Protection against network and target failures:
In a basic GoldenGate configuration, with only a trail on the target system, there is nowhere on the source system to store data that the Extract process continuously extracts into memory. If the network or the target system becomes unavailable, the primary Extract could run out of memory and abend. However, with a trail and data pump on the source system, captured data can be moved to disk, preventing the abend. When connectivity is restored, the data pump extracts the data from the source trail and sends it to the target system(s).
● You are implementing several phases of data filtering or transformation. When using complex filtering or data transformation configurations, you can configure a data pump to perform the first transformation either on the source system or on the target system,
and then use another data pump or the Replicat group to perform the second transformation.
● Consolidating data from many sources to a central target. When synchronizing multiple source databases with a central target database, you can store extracted data on each source system and use data pumps on each of those systems to send the data to a trail
on the target system. Dividing the storage load between the source and target systems reduces the need for massive amounts of space on the target system to accommodate data arriving from multiple sources.
● Synchronizing one source with multiple targets. When sending data to multiple target systems, you can configure data pumps on the source system for each target. If network connectivity to any of the targets fails, data can still be sent to the other targets
————————————————————————————————-
STEP BY STEP Datapump Configuration
ON THE SOURCE SYSTEM
To configure the Manager process . (Reference: Oracle GoldenGate Administration Guide, Version 10.4):
1. On the source, configure the Manager process according to the instructions in Chapter 2.
2. In the Manager parameter file, use the PURGEOLDEXTRACTS parameter to control the purging of files from the local trail.
To configure the primary Extract group:
3. On the source, use the ADD EXTRACT command to create a primary Extract group. For documentation purposes, this group is called ext.
ADD EXTRACT , TRANLOG, BEGIN [, THREADS ]
❍ Use TRANLOG as the data source option.
4. On the source, use the ADD EXTTRAIL command to create a local trail. The primary Extract writes to this trail, and the data-pump Extract reads it.
ADD EXTTRAIL , EXTRACT
❍ Use the EXTRACT argument to link this trail to the primary Extract group. The primary Extract group writes to this trail, and the data pump group reads it.
5. On the source, use the EDIT PARAMS command to create a parameter file for the primary Extract group. Include the following parameters plus any others that apply to your database environment:
– Identify the Extract group:
EXTRACT
– Specify database login information as needed for the database:
[SOURCEDB ,][USERID [, PASSWORD ]]
– Specify the local trail that this Extract writes to:
EXTTRAIL
– Specify tables to be captured:
TABLE .
;
To configure the data pump Extract group:
6. On the source, use the ADD EXTRACT command to create a data pump group. For documentation purposes, this group is called pump. In reality, I suggest naming this to match the extract name e.g., dpext1.
ADD EXTRACT , EXTTRAILSOURCE , BEGIN
❍ Use EXTTRAILSOURCE as the data source option, and specify the name of the local trail.
7. On the source, use the ADD RMTTRAIL command to specify a remote trail that will be created on the target system.
ADD RMTTRAIL , EXTRACT
❍ Use the EXTRACT argument to link the remote trail to the data pump group. The linked data pump writes to this trail.
8. On the source, use the EDIT PARAMS command to create a parameter file for the data pump. Include the following parameters plus any others that apply to your database environment.
————————————————————————————————
– Identify the data pump group:
EXTRACT
– Specify database login information as needed for the database:
[SOURCEDB ,][USERID [, PASSWORD ]]
– Specify the name or IP address of the target system:
RMTHOST , MGRPORT
– Specify the remote trail on the target system:
RMTTRAIL
– Allow mapping, filtering, conversion or pass data through as-is:
[PASSTHRU | NOPASSTHRU]
– Specify tables to be captured:
TABLE ;
NOTE – To use PASSTHRU mode, the names of the source and target objects must be identical. No column mapping, filtering, SQLEXEC functions, transformation, or other functions that require data manipulation can be specified in the parameter file. You can combine normal processing with pass-through processing by pairing PASSTHRU and NOPASSTHRU with different TABLE statements.
ON THE TARGET SYSTEM
To configure the Manager process:
9. On the target, configure the Manager process according to the instructions in the Administration Guide, Chapter 2.
10. In the Manager parameter file, use the PURGEOLDEXTRACTS parameter to control the purging of files from the local trail.
To configure the Replicat group:
11. On the target, create a Replicat checkpoint table. This is a best practice. For instructions, see “Creating a checkpoint table” on page 121 of Administration Guide.
12. On the target, use the ADD REPLICAT command to create a Replicat group. For documentation purposes, this group is called rep.
ADD REPLICAT , EXTTRAIL , BEGIN
❍ Use the EXTTRAIL argument to link the Replicat group to the remote trail.
13. On the target, use the EDIT PARAMS command to create a parameter file for the Replicat group. Include the following parameters plus any others that apply to your database environment.
– Identify the Replicat group:
REPLICAT
– State whether or not source and target definitions are identical:
SOURCEDEFS | ASSUMETARGETDEFS
– Specify database login information as needed for the database:
[TARGETDB ,] [USERID [, PASSWORD ]]
– Specify error handling rules:
REPERROR (, )
– Specify tables for delivery:
MAP .
, TARGET .
[, DEF <template name>];
(Reference: Oracle GoldenGate Administration Guide, Version 10.4)
REPLICAT CHECKPOINT TABLE
Create on target.
Can be used by all Replicat groups.
Prevents loss of data in case Replicat needs to be restarted.
PRIMARY KEYS
Must have a unique, non-null identifier, even if not the primary key.
Option is to use KEYCOLS option of MAP and TABLE parameters to
create a substitute key. Otherwise, GG uses multiple columns from
the table; will usually degrade performance. For data integrity,
the key used must contain the same values in corresponding rows of
all subject databases and contain the same columns in databases
where key resides
ADD TRACETABLE
Use default name of “GGS_TRACE”.
FAULT TOLERANCE
In a data distribution configuration, include a primary Extract
group and a data-pump Extract group in the source configuration, one
for each target.
If network connectivity fails, data can still be sent to the target.
————————————————————————————————-
II. PERFORMANCE
This section will cover anything that is performance related that was not also in section “I. Best Practices”.
A. MONITORING
Run Status of Particular Process
To find the run status of a particular process:
GGSCI (development) 23> status manager
Manager is running (IP port development.7809).
GGSCI (development) 24> status extract ext1
EXTRACT EXT1: RUNNING
Detailed information of a particular process:
GGSCI (development) 6> info extract ext1, detail
EXTRACT EXT1 Last Started 2010-01-23 11:19 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:02 ago)
Log Read Checkpoint Oracle Redo Logs
2010-01-23 10:45:18 Seqno 786, RBA 44710400
Target Extract Trails:
Remote Trail Name Seqno RBA Max MB
/u01/oracle/software/goldengate/dirdat/lt 55644 2 55644 10
Extract Source Begin End
/u02/oradata/acme/redo03.log 2010-01-23 11:13 2010-01-23 10:45
/u02/oradata/acme/redo02.log 2010-01-23 11:04 2010-01-22 11:13
/u02/oradata/acme/redo02.log 2010-01-23 10:42 2010-01-21 11:04
Not Available * Initialized * 2010-01-20 10:42
Current directory /u01/oracle/software/goldengate
Report file /u01/oracle/software/goldengate/dirrpt/EXT1.rpt
Parameter file /u01/oracle/software/goldengate/dirprm/ext1.prm
Checkpoint file /u01/oracle/software/goldengate/dirchk/EXT1.cpe
Process file /u01/oracle/software/goldengate/dirpcs/EXT1.pce
Stdout file /u01/oracle/software/goldengate/dirout/EXT1.out
Error log /u01/oracle/software/goldengate/ggserr.log
Detailed information of a particular process:
GGSCI (devu007) 6> info extract ext1, detail
EXTRACT EXT1 Last Started 2010-02-19 11:19 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:02 ago)
Log Read Checkpoint Oracle Redo Logs
2010-02-19 10:45:18 Seqno 786, RBA 44710400
Target Extract Trails:
Remote Trail Name Seqno
/u01/oracle/software/goldengate/dirdat/lt 2
RBA Max MB
55644 10
Extract Source Begin End
/u02/oradata/acme/redo03.log 2010-02-19 11:13 2010-02-26 10:45
/u02/oradata/acme/redo02.log 2010-02-19 11:04 2010-02-19 11:13
/u02/oradata/acme/redo02.log 2010-02-18 10:42 2010-02-19 11:04
Not Available * Initialized * 2010-02-18 10:42
Current directory /u01/oracle/software/goldengate
Report file /u01/oracle/software/goldengate/dirrpt/EXT1.rpt
Parameter file /u01/oracle/software/goldengate/dirprm/ext1.prm
Checkpoint file /u01/oracle/software/goldengate/dirchk/EXT1.cpe
Process file /u01/oracle/software/goldengate/dirpcs/EXT1.pce
Stdout file /u01/oracle/software/goldengate/dirout/EXT1.out
Error log /u01/oracle/software/goldengate/ggserr.log
View processing rate – can use ‘hr’,'min’ or ‘sec’ as a parameter:
GGSCI (devu007) 37> stats extract ext2 reportrate hr
Sending STATS request to EXTRACT EXT2 …
Start of Statistics at 2010-02-19 10:04:46.
Output to /u01/oracle/ggs/dirdat/cc:
Extracting from SH.CUSTOMERS to SH.CUSTOMERS:
*** Total statistics since 2010-02-19 09:29:48 ***
Total inserts/hour: 0.00
Total updates/hour: 95258.62
Total deletes/hour: 0.00
Total discards/hour: 0.00
Total operations/hour: 95258.62
*** Daily statistics since 2010-02-19 09:29:48 ***
Total inserts/hour: 0.00
Total updates/hour: 95258.62
Total deletes/hour: 0.00
Total discards/hour: 0.00
Total operations/hour: 95258.62
*** Hourly statistics since 2010-02-19 10:00:00 ***
No database operations have been performed.
*** Latest statistics since 2010-02-19 09:29:48 ***
Total inserts/hour: 0.00
Total updates/hour: 95258.62
Total deletes/hour: 0.00
Total discards/hour: 0.00
Total operations/hour: 95258.62
End of Statistics.
View latency between the records processed by Goldengate and the timestamp in the data source:
GGSCI (development) 13> send extract ext2, getlag
Sending GETLAG request to EXTRACT EXT2 …
Last record lag: 3 seconds.
At EOF, no more records to process.
GGSCI (development) 15> lag extract ext*
Sending GETLAG request to EXTRACT EXT1 …
Last record lag: 1 seconds.
At EOF, no more records to process.
Sending GETLAG request to EXTRACT EXT2 …
Last record lag: 1 seconds.
At EOF, no more records to process.
Viewing the GoldenGate error log as well as history of commands executed and other events:
For UNIX:
vi ggserr.log
Or
GGSCI command: VIEW GGSEVT
View the process report:
GGSCI (development) 2> view report ext1
GGSCI (development) 2> view report rep1
B. PERFORMANCE TIPS
To reduce bandwidth requirements:
Use compression options of the RMTHOST parameter to compress data
before it is sent across the network. Weigh the benefits of
compression against the CPU resources that are required to perform
the compression.
To increase the TCP/IP packet size:
a. Use the TCPBUFSIZE option of the RMTHOST parameter to increase the
size of the TCP socket buffer that Extract maintains. By increasing
the size of the buffer, you can send larger packets to the target
system. Consult with Network Support before setting TCPBUFSIZE.
b. Use SQL Arrays
The BATCHSQL parameter will increase the performance of Replicat. BATCHSQL causes Replicat to create arrays for similar SQL statements and apply them at an accelerated rate. Normally, Replicat applies one SQL statement at a time.
• At 100 bytes of data per row change, BATCHSQL can
improve Replicat’s performance by up to 300 percent; actual
performance benefits will vary, depending on the oeverall
demand on system resources.
• At around 5,000 bytes of data per row change, the benefits of
using BATCHSQL diminish.
. Improve I/O within the system configuration
• Place trail files on the fastest disk controller.
• Use RAID 0+1 disk configuration because GoldenGate performs
sequential writes.
* Be careful with the following points. Increasing the values of
these parameters will improve performance but take longer to
reprocess your data in case of process failure.
• Use the CHECKPOINTSECS in Extract or Replicat; if increased,
less frequent checkpoints; increases data to be reprocessed if
process fails; keep transaction logs available in case of
reprocessing
• Use the GROUPTRANSOPS; increases number of SQL operations in a Replicat ;
reduces I/O to checkpoint file and checkpoint table
Data Filtering and Conversion:
Use primary Extract for data capture only.
Use a data pump on the source to perform filtering and thereby send
less data over the network.
Alternatively, use Replicat for conversion and, if the network can
handle large amounts of data, also for filtering.
**(Upcoming Post - “Troubleshooting GoldenGate Extracts”)