Bobby Durrett's DBA Blog

Subscribe to Bobby Durrett's DBA Blog feed
Oracle database performance
Updated: 4 hours 35 min ago

SymPy Tutorial Repository

Fri, 2021-04-09 11:02

I have been playing with the Python SymPy package and created a repository with my test scripts and notes:

Might be helpful to someone. I just got started.

I had used Maxima before. SymPy and Maxima are both what Wikipedia calls “Computer Algebra Systems.” They have a nice list here:

I got a lot of use out of Maxima but I think it makes sense to switch the SymPy because it is written in Python and works well with other mainstream Python packages that I use like Matplotlib. They both fall under the SciPy umbrella of related tools so for me if I need some computer algebra I probably should stick with SymPy.

Maxima and SymPy are both free.


Categories: DBA Blogs

ORA-14767 when day of month > 28 with interval partitioning month interval

Wed, 2021-04-07 18:07
  4  )
  7  (
  8    PARTITION data_p1 VALUES LESS THAN (TO_DATE('01/29/2017', 'MM/DD/YYYY'))
  9  );
ERROR at line 1:
ORA-14767: Cannot specify this interval with existing high bounds

  4  )
  7  (
  8    PARTITION data_p1 VALUES LESS THAN (TO_DATE('01/28/2017', 'MM/DD/YYYY'))
  9  );

Table created.

Creating a range partitioned table with a date type partitioning column and a month interval must have a starting partition that has a day < 29 or it gets an ORA-14767 error.

The error message “Cannot specify this interval with existing high bounds” is not helpful. How about something like “Need a day of the month that exists in every month”? February only has 28 days most years so 28 is the max.


Categories: DBA Blogs

Free Machine Learning Class from MIT

Thu, 2021-04-01 14:05

I noticed this new class from MIT:

It is about machine learning and is free. I think it has some built in exercises with automatic grading but no instructor to interact with.

Since ML is such a hot topic I thought I would share it. I have not taken the class.


Categories: DBA Blogs

60,000 sessions in 64 gigabyte VM using shared servers

Tue, 2021-03-30 11:40

Just a quick note. I have an application that is generating thousands of inactive sessions and with the default dedicated server configuration we are having to add more and more memory to our virtual host to support the connections. We estimate that the application may need 45,000 mostly inactive sessions once the application is fully rolled out. So, I thought about how much memory would be required to support 45,000 sessions using shared servers. In an earlier post I mentioned how I got the sessions up to about 11,000 so I just took the Java program from that post and tried to adjust memory parameters to support over 45,000. I got it up to 0ver 60,000 so the test was essentially successful. I don’t think I would want to run a system with 60,000 sessions on a single node, but it is nice to see that it is to some degree possible.

I used a 64 gigabyte Linux VM and set these parameters:


Pretty sure that the large pool grew dynamically to fill the sga space not taken up by the shared pool. 52-36=16 gigabyte large pool.

Anyway, I don’t have time to write this up carefully now, but I wanted to publish the parameters.

Here is the previous post with the Java program I used to open 1000 connections:

I ended up running 30 of these on 3 servers for a total of 90,000 potential logins and got up to over 63,000.


Categories: DBA Blogs


Thu, 2021-03-18 18:06

I ran this query with a hint:

SQL> select /*+ full(my_tables) */ blocks
  2  from my_tables
  3  where
  4  owner = 'SYS' and
  5  table_name = 'TAB$';


I ran this select to get the plan:

select * from table(dbms_xplan.display_cursor(null,null,'ALL'));

I was getting this error:

Column Projection Information (identified by operation id):
 1 - "BLOCKS"[NUMBER,22]
 ORA-00904: : invalid identifier

I found that my user or public needed an execute grant for DBMS_LOB to fix this:

SQL> grant execute on DBMS_LOB to PUBLIC;

Grant succeeded.

I am not sure why this grant was not in place on this database but it took a while to figure this out so I thought I would put it out there. I found the error in a trace and I suspected the issue was due to permissions. The trace was like:

PARSE ERROR ... err=904

So that gave me the idea that I needed an execute grant on DBMS_LOB. EXECUTE ANY PROCEDURE did not do it.

After the grant it shows the hint report. This is on 19c:

Column Projection Information (identified by operation id):
 1 - "BLOCKS"[NUMBER,22]
 Hint Report (identified by...
 Total hints for statement: 1
 1 -  SEL$1 / MY_TABLES@SEL$1
            -  full(my_tables)


P.S. Full log of the script that got the error:

Full log of the working script:

Full length trace lines:

Categories: DBA Blogs

Finding queries with bad plans from an AWR report of a load test

Fri, 2021-03-12 16:10

I want to document some recent steps that I have been taking to support new development on a transactional system. Every time the system has a new release, if that release includes Oracle SQL changes, I am asked to create and review an AWR report covering the time of a load test (usually several tests) and to see if I see any problems. In the past I looked for longer running application SQL but recently I changed to look at anything that averages over .1 seconds and that has been helpful. So, that is what this post is about. Obviously, if you have faster hardware or different workloads this rule of thumb will not help you. But maybe other higher-volume transactional systems will follow similar patterns.

Here is the top elapsed SQL from a recent load test:

SQL Ordered by Elapsed Time

I have only shown some of the columns to hide some details that I do not want to make public. Every SQL here whose “Elapsed Time per Exec (s)” value is above .1 seconds is not part of the application except the last one, 6kmnq0uj99a3c. This was a load test on a non-production system that ramped up a bunch of activity over several hours. This problem query only ran 664 times so if that is representative of how often it runs in production it may not really matter that it is inefficient. But you never know for sure, so I reviewed it anyway. All the queries listed that run in .03, .01, .02, and .00 seconds are representative of the typical queries with good plans on this system. So, that is why .1 ended up standing out. Also, not long ago I found two or three running in production with more than .1 seconds average runtime and they needed new indexes to bring them back under the .1 second threshold. So, for me .1 seconds is the current magical number.

To test it I used two of my scripts.

  • bind2.sql – to extract values used in one of the executions
  • test2.sql – to find out which table the execution spent the most time on

I replaced the bind variables with constants and ran the query in my test2.sql script and found that most of the time was on a certain range scan on what looked like the correct index. But on closer inspection I realized that a type conversion had prevented the last column of the index from being used. Here is what it looked like with the columns renamed to hide the production names.


One table has the “MY_NUMBER” column as a character string and the other as a number. So, it was doing a range scan and not a unique scan. I changed the query to convert the number to a character string and the plan used a unique scan.


Table ABC was the one that was doing a range scan on three columns and not on MY_NUMBER, the last column in the index. MY_NUMBER is a character column on ABC. XYZ was the other table with MY_NUMBER as a NUMBER type column. I am forcing the conversion of XYZ.MY_NUMBER to a character for the comparison instead of letting the optimizer choose to convert ABC.MY_NUMBER to a number which would suppress the use of the last column of the index on table ABC.

My point was not to talk about implicit type conversions preventing indexes from being used although that is very interesting. My point is that a plan like this that is pretty efficient could run less than .1 seconds if the index was used correctly. And if the application users end up scaling the use of this query way up to thousands or tens of thousands of executions per hour that unique scan could make a huge difference over the range scan without the last column of the index. Your CPUs might be 10 times faster than mine so your threshold might be lower than .1 seconds, but I think the idea is the same. There is some threshold that indicates a simple, frequently used, transactional SQL may not be using the right indexes. Does not apply in all cases but at the moment this is a useful rule of thumb for me.

I had just written the previous paragraphs before getting an email that our QA team had run another load test with the to_char explicit type conversion in place. It did not make as great of an improvement as I expected. Here are some edited outputs from my sqlstat.sql script:

Original query 6kmnq0uj99a3c:

 09-MAR-21 11.00.08 AM               79         171.306696
 09-MAR-21 12.00.35 PM               84         176.152667
 09-MAR-21 01.00.03 PM               80         178.420588
 09-MAR-21 02.00.32 PM               80         171.877913
 09-MAR-21 03.00.01 PM               81         174.509975
 09-MAR-21 04.00.29 PM               83         180.367157

New query 2ndfgypwp3qf0 with the to_char to allow the unique index scan:

 12-MAR-21 09.00.15 AM               80         107.822088
 12-MAR-21 10.00.44 AM               83         104.453446
 12-MAR-21 11.00.12 AM               81          105.34042
 12-MAR-21 12.00.42 PM               80          103.05625
 12-MAR-21 01.00.12 PM               79         106.738557
 12-MAR-21 02.00.42 PM               82         101.285183
 12-MAR-21 03.00.12 PM               81         105.172531

Kind of disappointing. I expected a greater improvement based on my testing. Still, .1 seconds per execution is better than .17. Maybe if the tables grow with more data over time this improvement will be greater.

Even though this query did not turn out to have a dramatic improvement I did find a way to improve the plan. My .1 seconds cutoff pointed me to a query that did not have the ideal use of indexes and lead to an improvement in performance. In other cases, in the past I have seen 20x improvements so it is worth reviewing the ones over .1 seconds.


Categories: DBA Blogs

Simple tools I use

Thu, 2021-03-11 17:48

Just a quick note about two tools I use:

TextPad – my favorite text editor. I know everyone has their own, but this is mine.

UnixUtls – Unix like tools on a Windows command line.


Categories: DBA Blogs

Wrapped Lines and Squished Pictures

Wed, 2021-03-10 17:11

I have been having trouble using WordPress on this blog. I keep having long lines wrapped instead of having a slider that readers can use to see the end of the lines. Also, pictures that looked fine when I posted them later look squished together. Yuck.

Long Lines

First, I will try to put some longer lines of output here using the preformatted type of block:

SQL_ID        PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA Elapsed Average ms CPU Average ms IO Average ms Cluster Average ms Application Average ms Concurrency Average ms Average buffer gets Average disk reads Average disk write megabytes Average rows processed
 6kmnq0uj99a3c        65249283 09-MAR-21 PM               80         178.420588        163.875             0                  0                      0                      0          13345.9375                  0                            0                  829.6
 6kmnq0uj99a3c        65249283 09-MAR-21 PM               80         171.877913        159.875             0                  0                      0                      0          13122.1375                  0                            0               816.0125
 6kmnq0uj99a3c        65249283 09-MAR-21 PM               81         174.509975     159.876543             0                  0                      0                      0          13145.2346                  0                            0             818.111111
 6kmnq0uj99a3c        65249283 09-MAR-21 PM               83         180.367157     164.939759             0                  0                      0                      0          13286.4337                  0                            0             825.843373
 6kmnq0uj99a3c        65249283 09-MAR-21 PM               40           26.11575           21.5        1.9689                  0                      0                      0               915.7              3.425                            0                     51

Notice how it wraps around and looks unreadable. I could swear that either a preformatted or a code block did not wrap in the recent past. Here is the same text in a code block:

SQL_ID        PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA Elapsed Average ms CPU Average ms IO Average ms Cluster Average ms Application Average ms Concurrency Average ms Average buffer gets Average disk reads Average disk write megabytes Average rows processed
------------- --------------- ------------------------- ---------------- ------------------ -------------- ------------- ------------------ ---------------------- ---------------------- ------------------- ------------------ ---------------------------- ----------------------
6kmnq0uj99a3c        65249283 09-MAR-21 PM               80         178.420588        163.875             0                  0                      0                      0          13345.9375                  0                            0                  829.6
6kmnq0uj99a3c        65249283 09-MAR-21 PM               80         171.877913        159.875             0                  0                      0                      0          13122.1375                  0                            0               816.0125
6kmnq0uj99a3c        65249283 09-MAR-21 PM               81         174.509975     159.876543             0                  0                      0                      0          13145.2346                  0                            0             818.111111
6kmnq0uj99a3c        65249283 09-MAR-21 PM               83         180.367157     164.939759             0                  0                      0                      0          13286.4337                  0                            0             825.843373
6kmnq0uj99a3c        65249283 09-MAR-21 PM               40           26.11575           21.5        1.9689                  0                      0                      0               915.7              3.425                            0                     51

Basically, the same problem although font and background are different. One thing I have done in the past is use a GitHub Gist. I would past the text into a gist and put the URL inline like this:

This no longer seems to work. I had to go back and change a bunch of posts with links like this to embed the gist in the posts. To do that I had an amusing set of steps:

  1. Create a new Paragraph block
  2. Add one space
  3. Choose Edit as HTML
  4. Paste in embedded gist between the <p> and </p>

Example of what I have to past in:

<p><script src=""></script></p>

Here are the long lines as an embedded gist:

It would be great if there were a simpler way to do this. Maybe there is.


The second challenge is that when I paste in screenshots, they get all squished. Here is a graphical version of the same type data:

The picture is not square, so it gets squished in. It is nice that you can click on it and see the big version, but I would like it to not be so ugly beforehand.

Thumbnail is 150 x 150 and very small.

75% is still squished

50% is not squished but the text is small. At least you can click on it and the big version pops up.

As I am writing this I realize there is a guide that you can click on to manually size the picture and it shows you have far to the right you can size it before it starts getting squished.

So, I guess for now I am stuck with either making my text lines short enough to fit or sticking them in a Gist. For images I just need to size them with the little tool to keep them within the margins, so they do not get pushed in to fit.


Categories: DBA Blogs

DBVERIFY (dbv) outputs block_id for bigfiles

Wed, 2021-03-03 11:50

I posted a question about this and didn’t find an answer. I searched both Oracle’s support site and the internet in general. I ran the DBVERIFY utility dbv against a bigfile datafile that had corrupt blocks and wanted to relate the blocks back to a table using the DBA_EXTENTS view. For smallfile datafiles I could have used dbms_utility.data_block_address_block. But the manual says that it does not work with bigfiles. I did a small test and found that with bigfiles the address output by dbv is just the block_id within the data file. With a smallfile tablespace it was some combination of block_id and file_id. Really, it is more helpful for dbv to spit out the block_id if you are running it against a datafile because you already know which datafile you have. I will include some of the output of the test below.

Steps of my test:

  1. create a small bigfile tablespace
  2. create empty table nologging
  3. take a rman backup
  4. do a nologging update
  5. delete tablespace/datafile
  6. restore and recover it
  7. verify corruption exists
  8. run dbv to get DBA – block address
  9. run rman backup validate to get file id and block id
  10. select from dba_extents to get block locations for table
1 - create a small bigfile tablespace

create bigfile tablespace big datafile '/home/oracle/product/oradata/ORCL/big.dbf' size 10M;

[oracle@ora19 bigfiletests]$ sqlplus / as sysdba

SQL*Plus: Release - Production on Wed Mar 3 07:46:15 2021

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

Connected to:
Oracle Database 19c Enterprise Edition Release - Production

SQL> create bigfile tablespace big datafile '/home/oracle/product/oradata/ORCL/big.dbf' size 10M;

Tablespace created.

This was on a small 19c test database on a Linux VM.

2 - create empty table - maybe ctas and truncate select * from dba_tables;

ORCL:SYSTEM>create table bigfiletest tablespace big nologging as select * from dba_tables where 1=2;

Table created.

Had to create table with NOLOGGING to make the insert append below unrecoverable.

3 - take a rman backup
[oracle@ora19 ORCL]$ rman target /

Recovery Manager: Release - Production on Wed Mar 3 08:11:29 2021

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL (DBID=1534990846)

RMAN> backup database;

Starting backup at 03-MAR-21
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=37 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/home/oracle/product/oradata/ORCL/system01.dbf
input datafile file number=00003 name=/home/oracle/product/oradata/ORCL/sysaux01.dbf
input datafile file number=00004 name=/home/oracle/product/oradata/ORCL/undotbs01.dbf
input datafile file number=00005 name=/home/oracle/product/oradata/ORCL/big.dbf
input datafile file number=00007 name=/home/oracle/product/oradata/ORCL/users01.dbf
channel ORA_DISK_1: starting piece 1 at 03-MAR-21
channel ORA_DISK_1: finished piece 1 at 03-MAR-21
piece handle=/home/oracle/product/db/19.0.0/dbs/04voq09o_4_1_1 tag=TAG20210303T081136 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:02:05
Finished backup at 03-MAR-21

Starting Control File and SPFILE Autobackup at 03-MAR-21
piece handle=/home/oracle/product/db/19.0.0/dbs/c-1534990846-20210303-02 comment=NONE
Finished Control File and SPFILE Autobackup at 03-MAR-21

Just a regular backup before doing the unrecoverable insert append into nologging table.

4 - do a nologging update - insert append select * from dba_tables commit

ORCL:SYSTEM>insert /*+ append */ into bigfiletest
  2  select * from dba_tables where rownum < 2;

1 row created.


Commit complete.

Just one row – should be one corrupt block.

5 - delete tablespace/datafile

[oracle@ora19 ORCL]$ cd /home/oracle/product/oradata/ORCL
[oracle@ora19 ORCL]$ ls -altr
total 2813440
drwxr-x---. 3 oracle oinstall         17 Jul 30  2019 ..
-rw-r-----. 1 oracle oinstall  209715712 Mar  3 07:50 redo02.log
-rw-r-----. 1 oracle oinstall  209715712 Mar  3 07:50 redo03.log
-rw-r-----. 1 oracle oinstall   57679872 Mar  3 07:51 temp01.dbf
drwxr-x---. 2 oracle oinstall       4096 Mar  3 08:02 .
-rw-r-----. 1 oracle oinstall 1142956032 Mar  3 08:11 system01.dbf
-rw-r-----. 1 oracle oinstall  692068352 Mar  3 08:11 sysaux01.dbf
-rw-r-----. 1 oracle oinstall  356524032 Mar  3 08:11 undotbs01.dbf
-rw-r-----. 1 oracle oinstall    5251072 Mar  3 08:11 users01.dbf
-rw-r-----. 1 oracle oinstall   10493952 Mar  3 08:14 big.dbf
-rw-r-----. 1 oracle oinstall  209715712 Mar  3 08:15 redo01.log
-rw-r-----. 1 oracle oinstall   10600448 Mar  3 08:15 control01.ctl
-rw-r-----. 1 oracle oinstall   10600448 Mar  3 08:15 control02.ctl
[oracle@ora19 ORCL]$ rm big.dbf

Now the unrecoverable nologging insert append change is lost. It is not on the backup and not on the redo or archived redo logs.

6 - restore and recover it

[oracle@ora19 ORCL]$ rman target /

Recovery Manager: Release - Production on Wed Mar 3 08:16:07 2021

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL (DBID=1534990846)

RMAN> alter tablespace big offline immediate;

using target database control file instead of recovery catalog
Statement processed

RMAN> restore tablespace big;

Starting restore at 03-MAR-21
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=37 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00005 to /home/oracle/product/oradata/ORCL/big.dbf
channel ORA_DISK_1: reading from backup piece /home/oracle/product/db/19.0.0/dbs/04voq09o_4_1_1
channel ORA_DISK_1: piece handle=/home/oracle/product/db/19.0.0/dbs/04voq09o_4_1_1 tag=TAG20210303T081136
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 03-MAR-21

RMAN> recover tablespace big;

Starting recover at 03-MAR-21
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:00

Finished recover at 03-MAR-21

RMAN> alter tablespace big online;

Statement processed

Simple tablespace restore and recovery. Had to alter tablespace offline immediate because the file was not there.

7 - verify corruption exists

ORCL:SYSTEM>select * from bigfiletest;
select * from bigfiletest
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 5, block # 787)
ORA-01110: data file 5: '/home/oracle/product/oradata/ORCL/big.dbf'
ORA-26040: Data block was loaded using the NOLOGGING option

This just shows that the block is corrupt. It also gives us the file number (which we already knew) and the block id which would relate back to DBA_EXTENTS.

8 - run dbv to get DBAs - block addresses

dbv file=/home/oracle/product/oradata/ORCL/big.dbf blocksize=8192

[oracle@ora19 ORCL]$ dbv file=/home/oracle/product/oradata/ORCL/big.dbf blocksize=8192

DBVERIFY: Release - Production on Wed Mar 3 08:21:45 2021

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /home/oracle/product/oradata/ORCL/big.dbf

DBV-00201: Block, DBA 787, marked corrupt for invalid redo application

This was kind of the moment of truth. The DBA from the DBVERIFY utility dbv was 787 which is the same as the block number in the error from the select.

9 - run rman backup validate to get file id and block ids

[oracle@ora19 ORCL]$ rman target /

Recovery Manager: Release - Production on Wed Mar 3 08:30:47 2021

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL (DBID=1534990846)



File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
5    OK     1              53           1280            2503068
  File Name: /home/oracle/product/oradata/ORCL/big.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              1
  Index      0              0
  Other      0              1226


no rows selected


---------- ---------- ---------- ----------------------- --------- --------------------- --------- ----------------- --------- ---------------------------------------- --------- ----------
         5        787          1                 2502865                         2502865                     1920977 02-MAY-19 74043                                    UNKNOWN            0

RMAN VALIDATE has the same block number – 787.

10 - select from dba_extents to get block locations for table

ORCL:SYSTEM>select * from dba_extents
  2  where segment_name='BIGFILETEST';

OWNER                                                                                                                            SEGMENT_NAME                                                                                                                     PARTITION_NAME                                                                                                                   SEGMENT_TYPE       TABLESPACE_NAME                 EXTENT_ID    FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
-------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- ------------------ ------------------------------ ---------- ---------- ---------- ---------- ---------- ------------
SYSTEM                                                                                                                           BIGFILETEST                                                                                                                                                                                                                                                       TABLE              BIG                                     0          5        784      65536          8         1024

Seems to be the forth block. The extent starts at block 784 but block 787 is corrupt.

I had a larger test database with many corrupt blocks due to the way we had populated it with an RMAN restore and recover. I knew which table was corrupt because I ran select count(*) queries against every table on the database and only found one corrupt. Using the DBA value from dbv against the DBA_EXTENTS view for over 300 sample corrupt blocks that all pointed back to the table I knew was corrupt. I queried it like this:

tablespace_name, segment_type, owner, segment_name 
FROM my_extents 
WHERE file_id = 29 and 15340893 between block_id AND block_id + blocks - 1;

I created the my_extents table from dba_extents to speed up these queries:

create table my_extents as
select * from dba_extents;
create index my_extents_i1 on my_extents(block_id);

execute dbms_stats.gather_table_stats('MYOWNER','MY_EXTENTS');

Anyway, I do not know if this holds true in every situation, but it appears that the DBA values from dbv for bigfiles correspond to the block_id values in DBA_EXTENTS.


Categories: DBA Blogs

$6 per month for blog on AWS

Tue, 2021-03-02 17:59

It looks like this blog is costing me about $6 per month on AWS which is cool. I was paying about $200/year or about $17 per month on iPage. I am not sure what I am missing. This blog is much faster on AWS even with a minimal size virtual machine.

I did pay a fixed, up-front $12 to switch my domain registrar to AWS and that is not included in the $6/month. The compute was $123.97 including tax for 3 years or 36 months. That buys use one 2.5 gigahertz processor and 1 gig of memory. Disk is extra as is a couple of other things. The extras were $2.27 for February which of course only has 28 days. If the site were to get busy, then the monthly price would go up but that seems unlikely.

Not sure what I am missing. iPage did more for you and could be used by a non-technical person but if you can handle the technical part AWS seems better and cheaper.


Categories: DBA Blogs

Created MySQL Scripts Repository

Fri, 2021-02-26 12:30

Most of my work is with Oracle databases but I have been starting to learn a little bit about MySQL. Where I work, we have a bunch of on premises Oracle databases including large and active ones with various challenging performance tuning problems that I enjoy working on. But for the last couple or so years we have been using MySQL on RDS within Amazon’s cloud (AWS). I have not had many chances to work on a MySQL performance issue, but I want to be ready when the time comes. Back in 2019 I read several chapters of the MySQL 5.7 reference manual and started looking at tuning. Last week I picked it up again. I was mainly looking at Amazon’s Performance Insights that we have running on our RDS databases and seeing what information it gave us and how to link that back to other tables/views that MySQL uses for performance. Anyway, I have made notes on our internal web pages, but I also decided to create a public GitHub repository with some of the SQL and Python scripts that I have created so far. It is very basic, and I am sure that other people have more sophisticated tools, but I think it is helpful to me to create the repository and to update it as I learn more. So, if you are interested here it is:


Categories: DBA Blogs

Updated Centos VM to Oracle Linux

Sun, 2021-01-31 12:59

I was using Centos Linux for Nethack programming and playing on my home laptop. Evidently Centos is going away at the end of 2021 and Tim Hall had a post about moving Centos to Oracle Linux. So, I did it. The theory was that I would still be able to get updates from Oracle in 2022 and beyond.

I followed the instructions from here:

I did this while logged in as root:

git clone
cd centos2ol

I got this error:

Error: Package: tkinter-2.7.5-88.el7.x86_64 (@base/7)
           Requires: python = 2.7.5-88.el7

I do not need tkinter so I just removed it:

yum remove tkinter-2.7.5-88.el7.x86_64

Then I just did a yum update and it seemed fine. The only odd thing is that when I reboot, I get an error like this:

integrity: Unable to open file: /etc/keys/x509_ima.der (-2)

I looked up this message and it seems like I cannot fix it but can just ignore it. Not a big deal but would be cleaner without it.

It was easy to switch but I just thought I would document the two error messages, even though they did not cause any problems.


Categories: DBA Blogs

Moved my blog to AWS

Thu, 2021-01-28 09:07

It has been two years or more since I first thought about moving my blog to Amazon Web Services (AWS) from iPage, the hosting company that this blog has always been on. My company uses AWS for a number of cloud initiatives, and I need to learn more about it. I thought it made sense to make moving my blog a little cloud training project for me. Plus, there were a couple of things that I wanted to improve over my experience with iPage. My point is not to criticize them. It has been great having a blog, and iPage made it easy to get started. But performance of my blog has really gone down. Often my site was so slow that updates would time out and often the blog would just hang. Also, I wanted to get full control over the environment and have a Linux host that I could ssh into. So, it made sense to move to AWS and put my blog on an EC2 Linux virtual machine. But I put it off until now because it seemed like too big of a project. It turned out to not be that bad. I felt like I had a little extra time at the beginning of the year before we get deep into new projects for 2021 so it made sense to do this simple cloud project. I am glad that I did it because now my blog is more responsive, and I learned a lot about AWS and hosting a web site.

My move to AWS revolved around me reading a number of informative web pages – mostly AWS tutorial pages – so I want to put the links to them in this post and describe things I learned from them or ways that I deviated from them. First off, I created an AWS account for myself two years ago and let it sit. I do not recommend this approach. I probably could have saved myself a little money on free tier pricing if I created the account right before I started on the blog migration. But I do not care. The money is not very much, and the training is worth far more. I do not have the link to whatever page I used to setup my AWS account but it was not hard. You need a credit card pretty much. I ended up deciding to use the same Oregon region that my company uses since it is out West near where I live in Arizona.

Since I already had an AWS account my first step was to see how to setup an EC2 instance that would hold a WordPress blog. Since I want to save money I chose the inexpensive t2.micro size instance which has 1 core and 1 gigabyte of memory with 8 gigabytes of storage. Hopefully I could run this simple blog on an affordable instance. I knew from iPage support that I was running in around 128-256 megabytes of memory on their shared server so hopefully a 1 GB memory host would be enough. The first tutorial I used showed me how to setup a “LAMP” server which would support WordPress on an EC2:

Tutorial: Install a LAMP web server on Amazon Linux 2

It seems that “Amazon Linux 2” is the latest version so I went with that. The biggest change I ended up making from this tutorial is that I wanted to use PHP 7.4 which WordPress recommends. So, I changed this line from the tutorial:

sudo amazon-linux-extras install -y lamp-mariadb10.2-php7.2 php7.2


sudo amazon-linux-extras install php7.4

Much later on when I was trying to get Jetpack to work I ended up installing php-xml as well with:

sudo yum install php-xml

I thought about trying to install the latest version of MySQL but got bogged down trying to figure out the right yum command so I just stuck with the MariaDB install as listed in the tutorial. The only other thing from this tutorial is that after migrating my blog’s files over I had to rerun the chmod commands listed to get the directories and files to have the group permissions needed to run updates. So, I ran it once to get the server setup but much later I ran it again with all my files in place.

Now that I had the LAMP server setup on my small EC2 instance I needed to install WordPress just to get a basic install working. Later I would overwrite all the WordPress files with ones from my iPage host. I followed this tutorial:

Tutorial: Host a WordPress blog on Amazon Linux 2

One problem with the tutorial is that it left it where I could not do WordPress updates without an FTP password. But with my EC2 I could only use a key file to login so I followed the instructions in this URL to fix it where I could run updates without a password:

I did:

chmod -R 775 ./wp-content

vi wp-config.php

# add the next line:

define('FS_METHOD', 'direct');

The only other thing I could say about this tutorial is that it does not include things like how to deal with plugins like Wordfence and Jetpack that I had to deal with later in the migration.

After getting a basic WordPress instance working on an EC2, I had to figure out how the internet works. I really have not messed that much with domain registration and DNS records prior to this. In the end it was much simpler than it seemed at first. After looking at tutorials on moving domains to AWS I figured out what seemed like a safe path. Leave the domain registration with iPage and keep using the iPage name servers but point the web server DNS records to my EC2 instance. If something went wrong, I could just point the DNS records back to iPage, delete everything from AWS, and I would be no worse off than when I started. I ended up doing the following steps to get my blog working and up on AWS but still with the domain and DNS at iPage:

  1. Put an under-construction message on my current blog
  2. Backup current down EC2 to snapshot
  3. Setup Elastic IP and bring up EC2
  4. Copy database from iPage to EC2 with edits
  5. Copy files from iPage to EC2 with edits
  6. Move DNS entries from iPage host (for web only) to EC2 and test
  7. Setup certificate with Certbot

I edited this list for clarity because I did some wrong things that failed but these are the steps that really worked. I had to setup the certificate after switching over the DNS to point to the EC2. One key thing I learned was that you really needed to get a domain pointed to your site before you could finish setting it up. That is a bit of a pain when you are moving from one site to another.

I put a post out there letting people know this blog would be up and down. Then I posted an update about how things are going and kept updating it with PS’s.

Backing up an EC2 is a lot like taking a snapshot of a virtual machine in VirtualBox. You just stop the EC2 instance and take a snapshot of its volume. My small instance only has one volume which is the root filesystem /. The only thing I had to figure out was that the device name for the root filesystem was /dev/xvda. You must know that when you restore a volume from a snapshot. Works well. Nice to be able to take snapshots and restore from them.

An Elastic IP is Amazon’s term for an IP address that is on the internet. If you create an Elastic IP address and associate it with an EC2 instance, then it will always have that address after you stop and start it. Otherwise your instance’s public IP address changes with every stop and start. For me I was using Putty and WinSCP to access the host and it was nice to finally get a consistent IP address. Also, I could setup my DNS entries to point to an IP address that would not change which is good since I am stopping and starting the EC2 instance all the time. Documentation for Elastic IP:

Elastic IP addresses

Copying the database was not hard. I had played with this years ago. You download the SQL script to rebuild and load the MySQL tables for the blog database through phpMyAdmin in iPage’s GUI control panel. Was only about a 27-megabyte file. I just had to edit it to have my database name. IPage had a bunch of letters and numbers as the database name and I made up one that was more friendly. I just used WinSCP to copy the edited SQL dump up to my EC2 and ran it against the MariaDB database I had already created when I installed WordPress. The only minor trick was that I did a custom export of the database from iPage telling it to include drop table commands. That way the script dropped the tables created by the initial WordPress install and created new ones.

I ended up copying the files using SCP commands from the EC2 host. They looked like this:

scp -pr -P 2222* .
scp -p -P 2222* .

I was worried that I might miss an error copying a file and not notice it, but I just ran these commands manually and they seemed to run fine so I did not worry about it.

The most interesting thing was how simple it was to move the DNS records from my iPage host to the new one. There were two “A” records for the web site. One was for and the other for I just made a note of the current IP address of my host on iPage and replaced it with the Elastic IP address from AWS.

Originally, I tried to use the certificate that I had already paid for through next year. IPage lets you download the certificate and its private key from its GUI control panel. This worked in terms of letting the site be visible, but Jetpack kept throwing these errors:

cURL error 60: SSL certificate problem: unable to get local issuer certificate

I tried everything I could find on the internet and finally concluded that there must be some file that iPage uses with the “chain” of certificates or certificate authorities. Something like that. I didn’t delve that deep into it. I just figured that there was a file that I didn’t have. So, I gave up on the certificate I paid for and installed the free Certbot instead. Here are the instructions I used to setup SSL:

Tutorial: Configure SSL/TLS on Amazon Linux 2

I did not do step 3 to harden it, but I did do the steps at the end called “Certificate automation: Let’s Encrypt with Certbot on Amazon Linux 2”. Those steps gave my site an A rating on so they must do the hardening steps automatically. I tried to do step 2 with my certificate from iPage but that did not work with Jetpack. I think I was missing the file for SSLCACertificateFile in /etc/httpd/conf.d/ssl.conf and that was why I got the certificate error. In any case the Let’s Encrypt with Certbot steps seemed to cure a lot of ills.

I ran into a funny problem at one point in this setup. I ran out of memory. I don’t think I kept the error message. WordPress just told me something was wrong, and I had to update this line in /var/www/html/wp-config.php to see the error:

define('WP_DEBUG', true);

This showed the out of memory error on the PHP web pages. I think you can also send these errors to a log on the filesystem which is probably what I will configure it to do later. But this WP_DEBUG true setting showed me that various PHP scripts were out of memory. The t2.micro instance has 1 gigabyte of memory and no swap. So, when you run out of memory you are out. You cannot swap processes out to disk. I found a bunch of processes named php-fpm using the memory. I had to edit the file /etc/php-fpm.conf to limit my system to 5 php-fhm processes with this setting:

; The maximum number of processes FPM will fork. This has been designed to control
; the global number of processes when using dynamic PM within a lot of pools.
; Use it with caution.
; Note: A value of 0 indicates no limit
; Default Value: 0
;process.max = 128
process.max = 5

So far, the limit of 5 processes has not hurt the site. It is much more responsive than it was before. I may have to increase this later if it gets to be a problem. Right now, I am running with plenty of free memory.

The last steps were to move my DNS entries from iPage and then the domain registration. This tutorial describes moving the DNS entries:

Making Route 53 the DNS service for a domain that’s in use

This was a lot easier than I thought it would be. I only had 5 DNS entries to move. Two were the A records for the web site and 3 were for the email forwarding company ImprovMX. I thought email forwarding would be done within AWS, but it was a lot easier to use the free ImprovMX service. I only get a handful of emails a month. Anyway, you have to setup two MX records and one TXT record for the email forwarding. So, I manually added 5 entries on Route 53 in AWS and moved my DNS from iPage to Amazon. This site shows you which name servers your domain is using and who it is registered by:

Here are the steps to transfer the domain registration:

Transferring registration for a domain to Amazon Route 53

Pretty cool. The domain registration transfer finally finished after 7 days. I shut down my iPage account and I am fully on AWS. I was on iPage for over 8 years and it has been a great value to me, but it was worth it to move my blog to AWS at this time both for the better performance and for the experience messing with AWS.


P.S. While waiting for the domain registration transfer to finish I realized that I was not getting any email from my blog site. Evidently AWS blocks email sent from an EC2 instance by default. You have to ask AWS to remove their email block as documented here: How do I remove the restriction on port 25 from my Amazon EC2 instance? They approved it easily, so it was no big deal, but I am still having issues.

I added a couple of yum packages:

yum install sendmail

yum install mailx

Not sure if these were needed. Now I am trying to get “reverse DNS” setup which I think means that when I run nslookup on my IP address it returns instead of the Amazon public DNS name. ImprovMX may require this to prevent outgoing email being blocked as spam. This is now working:


Non-authoritative answer:



Was not as hard as I thought. This was a helpful page:

Meanwhile I setup a weekly backup script and a space monitoring script. I also wrote a Python script to automate bringing down the EC2 instance and creating a snapshot of its root volume. There is more that I could say but this is enough for one post.

Categories: DBA Blogs

Up on AWS

Wed, 2021-01-20 17:10

I have more to do but I have pointing to an Amazon EC2 virtual machine. Seems faster.

I still have email forwarding done on iPage and they still have the domain registration and dns.

I plan to work on the email forwarding next so all the email addresses will be handled by AWS. Either that or just change them to my personal email address. Would be nice to use the domain since I have it.


P.S. Jetpack was down for a good bit. Had to edit the php.ini file for some reason.

P.P.S. Jetpack was down again this morning. Changed from the certificate I got from iPage to a free one from certbot. Seems to be working now.

P.P.P.S. Looks like I got email forwarding working with improvmx which is free instead of doing something in AWS which is hard to use and costs money. Kind of nice because the certificate is free and now the email is also. So it is just the cost of the hosting and domain.

PPPPS Up on Amazon’s name servers. Transferring domain. Could be a few days but it is the last step to be fully AWS.

1/26/21 8:18 AM Phoenix Arizona time

Outgoing email is not working. So, no one will get notifications of new comments or posts including my replies to comments. Also I will not get any notifications from WordPress of my plugins. I am trying to figure out how to get this going. I have email forwarding working but I cannot get an email out of the EC2 instance.

2:15 PM

This might be fixed. I got a few emails out that had been blocked. They all went into spam. I am trying to get reverse dns setup so that may keep the emails from looking like spam.

1/27/21 8:10 AM Phoenix time

AWS got reverse dns setup and now email works well from blog to world including to my alias.

Categories: DBA Blogs

Under Construction

Wed, 2021-01-20 13:48

I am moving this blog from iPage to AWS starting now. This page,, might be down, or you might see certificate errors. This is just me. I will put up an all clear post when it is done.


Categories: DBA Blogs

Column Masking Testcase

Fri, 2021-01-15 16:23

I put together a quick testcase to show myself how to use DBMS_RLS on to mask a column. If you would like to run it or see the output it is here: test case zip.

I ran the test case as SYSTEM with a tnsnames.ora entry named ORCL1124.

I based the test on the Oracle Security manual section titled: “Using Column Masking to Display Sensitive Columns as NULL Values”

The most interesting thing was that I did a 10053 trace like I did on an earlier post: earlier VPD post with trace.

Tracing the column masking I found that Oracle sticks a CASE statement into the select clause:

Final query after transformations:******* UNPARSED QUERY IS *******

The predicate that I setup just checked for


So, Oracle’s DBMS_RLS package for VPD features just adds a CASE statement that returns a NULL for the column when the predicate is true and returns the real column otherwise.

Not earth shattering but I did not want to forget this.


Categories: DBA Blogs

Configure listener for dedicated with shared servers

Thu, 2021-01-14 11:20

I did a little research yesterday and I am writing this post to document what I learned so I can remember it and since I could not easily find this information. I have a database that uses shared servers and want to connect to it using a dedicated server connection. Everything I found said to add (SERVER = DEDICATED) on your client but I was not able to do that. A coworker of mine said that I should be able to add a service that only connects as a dedicated server process so I tried to figure out how to do that. I found a way to configure a service in the listener.ora file on the database server so that anyone connecting with that service name would only get a dedicated server connection. I tested this on Oracle on Linux

I setup a test database for shared servers with these parameters:

alter system set max_shared_servers = 2 scope=both;

alter system set shared_servers=2 scope=both;
alter system set dispatchers='(PROTOCOL=TCP)(DISPATCHERS=2)' scope=both;

I set the service_name and domain to be like production:

SQL> show parameter service

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
service_names                        string      ORCL

SQL> show parameter domain

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_domain                            string      WORLD

I set the listener.ora similar to production:

        (ADDRESS = (PROTOCOL = TCP)(HOST = ora1124)(PORT = 1521))

    (SID_DESC =
      (SID_NAME = ORCL)
      (ORACLE_HOME = /oracle/product/db/

I stopped the listener and database and then started the listener and database. At this point if I connected using (SERVICE_NAME = ORCL.WORLD) or (SID = ORCL) I got a shared server process. For example: =
      (ADDRESS = (PROTOCOL = TCP)(Host = = 61020))
  ) =
      (ADDRESS = (PROTOCOL = TCP)(Host = = 61020))
      (SID = ORCL)

To this point all I was doing was matching a production database of a different name’s configuration. What I wanted to do was do something to the listener.ora to add a new service name that only connects dedicated. I changed the SID_LIST_LISTENER part of the listener.ora to this:

    (SID_DESC =
      (SID_NAME = ORCL)
      (ORACLE_HOME = /oracle/product/db/
    (SID_DESC =
      (SID_NAME = ORCL)
      (ORACLE_HOME = /oracle/product/db/

This added the service name ORCLDED that only connects dedicated. I used a tns entry like this: =
      (ADDRESS = (PROTOCOL = TCP)(Host = = 61020))

I verified this by running this query with the connection:

select username,server,count(*)
from v$session
group by username,server
order by username,server;

It would show my user’s session as either SHARED or DEDICATED.

Also, I could tell from the listener status:

[oracle@ora1124 admin]$ lsnrctl status

LSNRCTL for Linux: Version - Production on 13-JAN-2021 14:46:26

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=ora1124)(PORT=1521)))
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version - Production
Start Date                13-JAN-2021 14:45:51
Uptime                    0 days 0 hr. 0 min. 36 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /oracle/product/db/
Listener Log File         /oracle/product/diag/tnslsnr/ora1124/listener/alert/log.xml
Listening Endpoints Summary...
Services Summary...
Service "ORCL.WORLD" has 2 instance(s).
  Instance "ORCL", status UNKNOWN, has 1 handler(s) for this service...
  Instance "ORCL", status READY, has 3 handler(s) for this service...
Service "ORCLDED" has 1 instance(s).
  Instance "ORCL", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

Notice that service ORCLDED only has 1 handler which is the one to spawn dedicated server processes. ORCL.WORLD has 3 handlers which includes 2 for the 2 dispatchers.

I also experimented with adding the setting SERVICE= to the dispatchers parameter but it did not meet my needs in this case. Here is the format of the parameter as I tested it:

alter system set dispatchers='(PROTOCOL=TCP)(DISPATCHERS=2)(SERVICE=SS)' scope=both;

With this setting the listener status had this entry for service SS:

Service "SS" has 1 instance(s).
  Instance "ORCL", status READY, has 2 handler(s) for this service...

So, it looks like setting the service in the dispatchers parameter would pull out the shared server process connections on to that service but that was not what I needed in this case.

Anyway, I messed with this yesterday and thought I would put it out there for my own memory and in case others need it.


Categories: DBA Blogs

Querying Many Databases in Parallel in Python

Thu, 2020-12-03 17:21

I have some Python scripts that I use to access a bunch of databases to gather information such as the size of the database. Usually it is not that important how long they take so I have been running queries against a list of 100 databases one at a time and it can take a while depending on what I do. Coworkers of mine have worked on running Unix/Linux shell scripts in parallel to hit a bunch of systems like this more quickly. So, I thought I would look at how to do the same in Python. I am sure there are more complicated ways to do it, but I got good results based on the simplest example from this part of the Python documentation:

Here is a simple python script to get the size of the database files from four databases at the same time:

All I did was take the first example from the multiprocessing documentation and replace f() which squared a number with dbspace() which connects to a database and runs a query to get the total db size. I experimented with different Pool() sizes. It is 4 in the example.

I picked 82 non-production databases that I have access to and ran the same query sequentially in a loop and it took 375 seconds. I took the code above and increased the Pool to 32 processes and it ran in 24 seconds. That is about a 15x speedup.

I am sure that I could get a lot more complicated with all this stuff, but this works well with minimal effort.


Categories: DBA Blogs

AWR and V$ Scripts for a SQL_ID

Thu, 2020-11-05 17:00

Quick note to myself. If I am tuning a query by its SQL_ID I use these scripts:

sqlstat.sql – shows execution history of SQL_ID from AWR

getplans.sql – shows all plans for SQL_ID from AWR

vsqlarea.sql – shows current executions of SQL_ID from V$ view

extractplansqlid.sql – new script to pull current plan of SQL_ID from V$ views

I mainly mine the history from the AWR but I have been looking at V$ view information more recently, especially after putting in a fix to a slow query plan.


Categories: DBA Blogs

SQL Profile example when best plan not clear

Tue, 2020-10-20 12:54

I resolved another production performance issue with a SQL Profile yesterday. I have several posts about SQL Profiles, so I do not want to be redundant, but this case was a little different because it was not clear that I had a better plan. I want to document the challenge that I had deciding if I had the best plan and show the resolution.

On September 21 when I think I was on vacation or otherwise not in the office there was a big performance issue on an older production database. The on-call DBA identified the sql_id of the top SQL statement as 30q69rbpn7g75. But he and an application support developer together could not connect that SQL statement back to the long running reports that were impacting the business. Eventually the issue went away later that night. Here is some of the execution history from that original issue:

SQL_ID        PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA Elapsed Average ms CPU Average ms IO Average ms Cluster Average ms Application Average ms Concurrency Average ms Average buffer gets Average disk reads Average disk write megabytes Average rows processed
------------- --------------- ------------------------- ---------------- ------------------ -------------- ------------- ------------------ ---------------------- ---------------------- ------------------- ------------------ ---------------------------- ----------------------
30q69rbpn7g75      1463081877 21-SEP-20 AM                5         420798.922         144724    209959.863                  0                      0                      0           4824516.8           710748.4                            0                    870
30q69rbpn7g75      1732425686 21-SEP-20 AM               13         66.9426923     56.1538462    7.25523077                  0                      0                      0          6410.23077         249.769231                            0             414.538462
30q69rbpn7g75       592872749 21-SEP-20 AM                1           4144.437           3240       955.246                  0                      0                      0               63878              29725                            0                     89
30q69rbpn7g75      4175108531 21-SEP-20 AM               11         172328.731     92788.1818    62448.1775                  0                      0                      0             3463219         466919.273                            0             610.090909
30q69rbpn7g75      2823019537 21-SEP-20 AM               19         332354.362     212357.895     22580.778                  0                      0                      0          11145610.8         163882.105                            0             303.526316
30q69rbpn7g75      2908690533 21-SEP-20 AM                1          23377.169          13070     11680.972                  0                      0                      0             1582917              89479                            0                    258
30q69rbpn7g75       291832905 21-SEP-20 AM                2         40314.0255          15940      24490.36                  0                      0                      0             1831813             128525                            0                    248
30q69rbpn7g75      1467059382 21-SEP-20 AM                1          20179.636           5760     16155.407                  0                      0                      0              124599              84761                            0                    780
30q69rbpn7g75      1033740578 21-SEP-20 AM                1            1728.49           1570         2.906                  0                      0                      0                1525                 35                            0                     12
30q69rbpn7g75      4175108531 21-SEP-20 AM                8         147782.833        59617.5    65356.3268                  0                      0                      0             2280007          245985.25                            0                    402
30q69rbpn7g75      3938646712 21-SEP-20 AM                2         139722.393          55905     86105.482                  0                      0                      0             6080269             616766                            0                 1143.5
30q69rbpn7g75      2823019537 21-SEP-20 PM               48         238332.678     138706.875    19077.4738                  0                      0                      0          6928661.85         99573.2708                            0             145.395833
30q69rbpn7g75      2823019537 21-SEP-20 PM               64         147520.373     80835.1563    19092.0985                  0                      0                      0          4148771.28         106131.016                            0              79.890625
30q69rbpn7g75      2823019537 21-SEP-20 PM               58         180185.939     113102.931    14365.2987                  0                      0                      0          5926129.21         123920.569                            0             22.0344828
30q69rbpn7g75      2823019537 21-SEP-20 PM               37         307432.645     201436.216    22904.6901                  0                      0                      0          10204978.9         158950.973                            0             201.243243
30q69rbpn7g75      2823019537 21-SEP-20 PM               28         465140.082     326940.357    30687.9033                  0                      0                      0          16715547.3             263153                            0             460.571429
30q69rbpn7g75      2823019537 21-SEP-20 PM               14         934982.157     690958.571    41595.1995                  0                      0                      0          34940770.8         365038.357                            0             243.285714
30q69rbpn7g75      2823019537 21-SEP-20 PM               14         818768.534     640054.286    42596.9506                  0                      0                      0          33547406.9         451864.786                            0                  471.5
30q69rbpn7g75      2823019537 21-SEP-20 PM                4         2329248.39        2013515    75722.5718                  0                      0                      0           104343531         1027683.25                            0                    859
30q69rbpn7g75      2823019537 21-SEP-20 PM                1         4006478.22        3707840     88265.422                  0                      0                      0           186157328            1082000                            0                   2744
30q69rbpn7g75      2823019537 21-SEP-20 PM                1         1818375.63        1771470     20586.628                  0                      0                      0            88206433             374924                            0                   2692
30q69rbpn7g75      2823019537 21-SEP-20 PM                2            1742051        1399440     41061.122                  0                      0                      0          68750135.5             335797                            0                 1479.5
30q69rbpn7g75      2823019537 21-SEP-20 PM                0         3552963.71        3183770    142948.208                  0                      0                      0           154159601             633488                            0                      0

This output is from my sqlstat.sql script. Notice how earlier in the day there are several plans with widely varying elapsed times. Plan hash value 1732425686 averages 66 millisconds during the hour ending at 10 am. I do not know why the query has so many different plans. 2823019537 was the bad plan and it got locked in throughout the afternoon and into the night.

Later in that same week I reviewed the plans and found that the longest running plans used certain indexes on the top table and the shorter ones used another index. I looked at the “Segments by Logical Reads” section of an AWR report during the problem time and found that the top segment was index X6_WFCTOTAL. I used by getplans.sql to get all the plans for SQL id 30q69rbpn7g75 and found that the fastest ones used range scans against index X1_WFCTOTAL and the slow ones did range or skip scans against indexes X5_WFCTOTAL or X6_WFCTOTAL. So I picked one plan, 382022017, and used coe_xfr_sql_profile.sql to force 30q69rbpn7g75 to always run with plan 382022017 which used index X1_WFCTOTAL. Here is some execution history of the plan I picked:

SQL_ID        PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA Elapsed Average ms CPU Average ms IO Average ms Cluster Average ms Application Average ms Concurrency Average ms Average buffer gets Average disk reads Average disk write megabytes Average rows processed
------------- --------------- ------------------------- ---------------- ------------------ -------------- ------------- ------------------ ---------------------- ---------------------- ------------------- ------------------ ---------------------------- ----------------------
30q69rbpn7g75       382022017 24-SEP-20 AM                2          1713.2625            865         5.805                  0                      0                      0                2135               76.5                            0                    141
30q69rbpn7g75       382022017 24-SEP-20 AM                2            355.329            120      265.0765                  0                      0                      0                8183             1324.5                            0                    673
30q69rbpn7g75       382022017 24-SEP-20 AM                1            190.386            180        28.981                  0                      0                      0               17505               1759                            0                   1878
30q69rbpn7g75       382022017 24-SEP-20 PM                4          217.37625            200        20.723                  0                      0                      0            21009.25             392.25                            0                   1865
30q69rbpn7g75       382022017 24-SEP-20 PM                5           507.1578            114      432.2858                  0                      0                      0              7076.6              936.2                            0                  407.8
30q69rbpn7g75       382022017 24-SEP-20 PM                3         47.5793333             10    35.1866667                  0                      0                      0          504.333333         81.3333333                            0                     54
30q69rbpn7g75       382022017 24-SEP-20 PM                1            313.107            110       229.071                  0                      0                      0                8178                399                            0                    396
30q69rbpn7g75       382022017 25-SEP-20 AM                3         30.7433333     16.6666667    15.0446667                  0                      0                      0                 927                 89                            0             110.666667

This plan was averaging less than 1800 milliseconds. But after putting in the supposedly better plan it seemed to have worse execution times that other plans. Also, the plan hash value was different than 382022017. Here is the execution history from when I put this SQL Profile in place:

SQL_ID        PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA Elapsed Average ms CPU Average ms IO Average ms Cluster Average ms Application Average ms Concurrency Average ms Average buffer gets Average disk reads Average disk write megabytes Average rows processed
------------- --------------- ------------------------- ---------------- ------------------ -------------- ------------- ------------------ ---------------------- ---------------------- ------------------- ------------------ ---------------------------- ----------------------
30q69rbpn7g75       653445232 28-SEP-20 PM                1          92139.015          26930     61332.901                  0                 20.998                      0             2202073            2197909                            0                    171
30q69rbpn7g75       653445232 28-SEP-20 PM                9         83373.8866     25152.2222    56386.2852                  0             14.6918889                      0          2102264.11            2065530                            0             2613.88889
30q69rbpn7g75       653445232 28-SEP-20 PM                5         101830.312          29264    70157.0998                  0                 9.1714                      0           2439219.6          2432260.2                            0                    348
30q69rbpn7g75       653445232 28-SEP-20 PM                9          89705.846     26606.6667    61716.6917                  0             14.2046667                      0          2212464.67         2196829.33                            0             825.111111
30q69rbpn7g75       653445232 28-SEP-20 PM                2          91173.826          26425     63832.838                  0                22.1385                      0             2181959          2195820.5                            0                  310.5
30q69rbpn7g75       653445232 28-SEP-20 PM                4         90240.0368          26430    62587.1345                  0                 16.558                      0             2200555          2197764.5                            0                 168.75
30q69rbpn7g75       653445232 28-SEP-20 PM                3         107541.972     26816.6667     75418.071                  0                 16.164                      0             2193977         2193579.33                            0             39.3333333
30q69rbpn7g75       653445232 29-SEP-20 AM                1          92416.895          26410     66108.169                  0                 21.449                      0             2194591            2193764                            0                    158
30q69rbpn7g75       653445232 29-SEP-20 AM                1         103773.265          26510     75455.905                  0                 13.887                      0             2200242            2198725                            0                    122
30q69rbpn7g75       653445232 29-SEP-20 AM               13         74637.6784     20391.5385    51139.8206                  0             16.9292308                      0          1686133.69         1684935.15                            0             24.2307692
30q69rbpn7g75       653445232 29-SEP-20 AM               10         105894.074          28882    72971.0734                  0                20.1169                      0           2418827.8          2417314.8                            0                     56
30q69rbpn7g75       653445232 29-SEP-20 AM                4         89408.2108          26080    61537.7755                  0                10.0275                      0             2174791            2169846                            0                 421.75
30q69rbpn7g75       653445232 29-SEP-20 AM                7         71644.7906     20574.2857    48465.1234                  0                  8.331                      0             1697739         1694385.43                            0             232.857143
30q69rbpn7g75       653445232 29-SEP-20 AM                2         113993.942          32580    72589.2515                  0                 6.2465                      0             2672441            2667206                            0                    145
30q69rbpn7g75       653445232 29-SEP-20 AM                6         99793.2442          28600    69181.7687                  0             4.65783333                      0          2394135.83         2387505.17                            0                    246
30q69rbpn7g75       653445232 29-SEP-20 AM               10         94787.9044          28367    65535.8735                  0                10.6895                      0           2353904.6          2336951.3                            0                  476.8
30q69rbpn7g75       653445232 29-SEP-20 PM                5           89167.86          26462     61499.462                  0                14.4808                      0           2200557.8            2195895                            0                  201.8

Notice first that the plan hash value is 653445232 instead of 382022017. I usually see things like this when the plan has system generated table temporary table names, but I have not seen that in the plan. There must be something like that going on. Either that on the SQL Profile just locks in a slightly different plan. Anyway, 653445232 is the plan caused by the SQL Profile. Notice how the average elapsed time hangs around 90,000 to 100,000 milliseconds. But 382022017 has elapsed times under 1800 milliseconds. Seeing these results, I dropped the SQL Profile. Kind of like a doctor who swears to “do no harm” I did not want to slow down queries that were running very efficiently in my attempt to prevent another system slowdown or outage like we had on September 21. I dropped the SQL Profile on September 29.

Then yesterday, October 19, the problem recurred. This time the bad plan was 3908549093 and used a skip scan on X6_WFCTOTAL. I seem to see a lot of bad plans with skip scans. Here was the execution history before I put the SQL Profile back in:

SQL_ID        PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA Elapsed Average ms CPU Average ms IO Average ms Cluster Average ms Application Average ms Concurrency Average ms Average buffer gets Average disk reads Average disk write megabytes Average rows processed
------------- --------------- ------------------------- ---------------- ------------------ -------------- ------------- ------------------ ---------------------- ---------------------- ------------------- ------------------ ---------------------------- ----------------------
30q69rbpn7g75      3908549093 19-OCT-20 AM               16         832180.951      450528.75    68650.1103                  0                      0                      0          21204921.8           495900.5                            0                    562
30q69rbpn7g75      3908549093 19-OCT-20 PM               33         392068.144     194336.364    47412.7197                  0                      0                      0          9271475.06         221593.545                            0             309.454545
30q69rbpn7g75      3908549093 19-OCT-20 PM                4         3543778.15        1793980    261653.391                  0                      0                      0          82176276.8            1088971                            0                 1036.5

Between 12 and 1 pm the system was really bogged down with executions of 30q69rbpn7g75 taking almost an hour, 3543778 milliseconds. So, I put the SQL Profile back in. I had the script sitting there from the first time I tried it. These are all kept in the sqlt/utl directory. After putting it in things cleared up and the backlog of reports slowly emptied. I think we killed one long running session and had the user rerun the report with the SQL Profile in place.

SQL_ID        PLAN_HASH_VALUE END_INTERVAL_TIME         EXECUTIONS_DELTA Elapsed Average ms CPU Average ms IO Average ms Cluster Average ms Application Average ms Concurrency Average ms Average buffer gets Average disk reads Average disk write megabytes Average rows processed
------------- --------------- ------------------------- ---------------- ------------------ -------------- ------------- ------------------ ---------------------- ---------------------- ------------------- ------------------ ---------------------------- ----------------------
30q69rbpn7g75       653445232 19-OCT-20 PM               27         106287.519     26936.2963     61603.699                  0             7.41166667                      0          2226367.89         2204705.19                            0             1358.25926
30q69rbpn7g75      3908549093 19-OCT-20 PM                0         9380042.72        6465310    308289.185                  0                      0                      0           294469425            1581492                            0                   7067
30q69rbpn7g75       653445232 19-OCT-20 PM               80         103522.356       26259.25    67322.2418                  0             21.1012125                      0           2159835.1         2142347.26                            0               914.2625
30q69rbpn7g75       653445232 19-OCT-20 PM               31         99252.0065          26990    66069.6995                  0             24.5769355                      0          2227363.45         2213021.26                            0             885.709677
30q69rbpn7g75       653445232 19-OCT-20 PM                5         94587.0244          26988    64440.3338                  0                24.1514                      0           2223326.4          2204672.6                            0                 1553.6
30q69rbpn7g75       653445232 19-OCT-20 PM                5         93301.2074          26690    65105.9732                  0                14.0782                      0           2213653.6            2203033                            0                  736.4
30q69rbpn7g75       653445232 19-OCT-20 PM                3         101117.233     27193.3333    72020.9813                  0             15.4373333                      0          2225218.33            2207644                            0                   1623
30q69rbpn7g75       653445232 20-OCT-20 AM                1           92061.35          26550     64953.945                  0                 22.245                      0             2210157            2206170                            0                    235
30q69rbpn7g75       653445232 20-OCT-20 AM                1          92872.242          26470     66092.822                  0                 11.999                      0             2208305            2206231                            0                    158
30q69rbpn7g75       653445232 20-OCT-20 AM                2         88107.2095          26075     61670.129                  0                17.2175                      0           2205332.5            2203981                            0                    116
30q69rbpn7g75       653445232 20-OCT-20 AM                1          91007.493          26210     64276.474                  0                  9.972                      0             2208516            2206310                            0                    177
30q69rbpn7g75       653445232 20-OCT-20 AM                4         101878.314          26940     65491.475                  0                 30.476                      0          2210945.25         2204828.75                            0                  461.5
30q69rbpn7g75       653445232 20-OCT-20 AM               10         97441.3635          26496    67549.4579                  0                  3.178                      0           2197412.4          2192467.5                            0                  478.1
30q69rbpn7g75       653445232 20-OCT-20 AM                5         59362.9672          17038    41573.7714                  0                  7.767                      0           1416804.4            1416061                            0                   10.2
30q69rbpn7g75       653445232 20-OCT-20 AM                6         108681.505     30798.3333    75082.4997                  0             10.5146667                      0          2558854.67         2549531.83                            0                    405
30q69rbpn7g75       653445232 20-OCT-20 AM                1         186021.778          53440    131017.985                  0                 18.976                      0             4285650            4202806                            0                    814

But with the SQL Profile in place it still in consistently around 90,000 milliseconds even today. There are no executions under 1800 milliseconds like before. It is a puzzle, but it got us through the issue. I am planning to leave this in place to prevent another production outage, but I suspect that these reports may be running longer than normal in many cases. But at least they are consistent and the business needs are being met.

So, this is another example of a SQL Profile to the rescue in a production database performance issue. But this was different because it was not clear that the one plan was always the best one. I went so far as to put it in and take it out and then put it in again. I ended up leaving it in because it resolved a performance issue that we had twice and do not want to have in the future. Hopefully any less than optimal performance outside of these peak times will be worth it since we are preventing an outage with significant impact to the business.


Categories: DBA Blogs