Monday, September 30, 2013

Understanding Semaphores for Oracle

Semaphores can be thought of as on/off flags. A process can turn the flag on or turn it off. If the flag is already on, processes which try to turn on the flag will sleep until the flag is off.

There are three semaphore parameters which are important to Oracle.

  • SEMMSL is number of semaphores in a semaphore set
  • SEMMNI is the maximum number of semaphores sets in the system
  • SEMMNS is the number of semaphores in the system

A semaphore set is a structure that stores a group of semaphores. SEMMNS should be the result of SEMMSL multiplied by SEMMNI. So the maximum number of semaphores that can be allocated in the system will be the lesser of (SEMMSL*SEMMNI) or SEMMNS.

$> cat /etc/sysctl.conf | grep kernel.sem

kernel.sem = 250              32000           100               128
----------------SEMMSL---SEMMNS---SEMOPM---SEMMNI 

$> ipcs -ls

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767

Oracle needs to allocate a number of semaphores equal to the processes parameter in pfile or spfile. Otherwise the startup process fails. The semaphores can be in one semaphore set or multiple sets. Please check Metalink Doc ID 169706.1 for semaphore requirement during the installation for each platform. 

The following example shows the relationship between semaphores and processes in init.ora file.

SQL> show parameter processes
processes                            integer     128

SQL> host ipcs -s


------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x86608da4 917505     oracle    640        132

There is one set of 132 semaphores. Now we change the processes parameter in the init.ora file to 246 and restart the database,

SQL> show parameter processes
processes                            integer     246
SQL> host ipcs -s

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x86608da4 2457601    oracle    640        250

All 250 semaphores owned by Oracle are stored in one semaphore set (SEMMSL=250). Now we change the parameter to 247 and restart.


SQL> show parameter processes
processes                            integer     247
SQL> host ipcs -s

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x86608da4 2588673    oracle    640        125
0x86608da5 2621442    oracle    640        125
0x86608da6 2654211    oracle    640        125

Two more semaphores are created. Now change processes = 498 and restart.

SQL> show parameter processes 
processes                            integer     498
SQL> host ipcs -s

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x86608da4 3440641    oracle    640        125
0x86608da5 3473410    oracle    640        125
0x86608da6 3506179    oracle    640        125
0x86608da7 3538948    oracle    640        125
0x86608da8 3571717    oracle    640        125

If we have more than one instance on the server and PROCESSES parameters are different, we might want to make SEMMSL equal to the lowest PROCESSES so we do not allocate semaphores that will not be used. For example, instance A has PROCESSES=100 and instance B has PROCESSES=50. If SEMMSL = 50, 3 semaphore sets will be allocated, 2 for instance A and 1 for B. If SEMMSL = 100, 2 semaphore sets will be allocated, 1 for instance A and 1 for B. In this case, only 50 semaphores will be used by instance B. The rest 50 semaphores is unused and cannot be allocated for any other databases.

Reference

  • Metalink Doc ID 15566.1: Unix Semaphores and Shared Memory Explained
  • https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Tuning_and_Optimizing_Red_Hat_Enterprise_Linux_for_Oracle_9i_and_10g_Databases/chap-Oracle_9i_and_10g_Tuning_Guide-Setting_Semaphores.html


Sunday, September 29, 2013

rlwrap in SQL Plus and RMAN

The rlwrap (readline wrapper) utility is a handy tool for SQL Plus and RMAN. It provides a command history and editing of keyboard input.

The details can be found here,
http://www.oracle-base.com/articles/linux/rlwrap.php

Monday, September 9, 2013

db link does not work due to $TNS_ADMIN

Few weeks ago I run into a very weird problem. A database link works on the server but it does not work on the client.

On the database server,

SQL> select * from dual@orcl.oracle.com;


On any client,

SQL> select * from dual@orcl.oracle.com;

Later I also found the database link does not work from sqlplus on database server if I use the tnsname.


This is really interesting. After some investigation I found out the difference is environmental parameter $TNS_ADMIN.


ps -ef | grep oracle

pxargs -p

Monday, January 28, 2013

DBMS_DEBUG_VC2COLL

Today I was asked to re-process 700+ orders. A list of Order_IDs was provided and I was expected to update the order status and process date. It seems to be an easy task,
update orders
   set status = 'PAID',last_modified_dt = sysdate
 where order_id in (1234, 1235, 1239) 
But the problem is there are some orders missing. There are no such orders exist in the database for some order_IDs. I was asked to provide a list of those missing order_IDs. It is easy to find order_IDs exist in the database,
select order_ID
  from orders
 where order_id in (1234, 1235, 1239) 
For example, only order_ID 1234 and 1235 exist. Order_ID 1239 does not exist in the database. How could we find the non-existing orders? My first thought is spool the result of the select statement to a file, then compare it with the file including all order_IDs using diff command. It definitely works. Is there a neat way in SQL to do it? I found a very handy collection data type SYS.DBMS_DEBUG_VC2COLL, which can convert a list to an array. Here are some examples.
select * 
  from table(sys.dbms_debug_vc2coll('a','b','c'));
COLUMN_VALUE
------------
a
b
c
select * 
  from table(sys.dbms_debug_vc2coll(1,2,3));
COLUMN_VALUE
------------
1
2
3
Let us go back to my case. How can I achieve my goal?
select * 
  from table(sys.dbms_debug_var2coll(1234, 1235, 1239) 
 minus
select order_ID 
  from orders 
 where order_id in (1234, 1235, 1239) 
That is it. It solves the problem without using diff command. Is there any other better way?

Thursday, December 27, 2012

An interesting problem about systimestamp

Recently we scheduled a job to monitor the orders process status. The statement looks like,


SELECT *
FROM ORDERS
WHERE STATUS_LAST_UPDT_TS
BETWEEN TRUNC (SYSTIMESTAMP) AND SYSTIMESTAMP - 30 / 1440

The job is scheduled at 3PM everyday so it should get the orders which were last updated between midnight and 2:30PM. However we found orders placed at 2:30 to 3:00 PM were returned as well. After some investigations I found out the root cause is the data type of systimestamp.

First, let us go over some basic knowledge of timestamp.

Oracle timestamp is an extension of the date datatype. It can store date and time data, including fractional seconds.

Timestamp with time zone is an extension of the timestamp datatype.It includes the time zone information.


Internally the timestamp type takes 11 bytes of storage and the timestamp with time zone takes 13 bytes.

Similar to sysdate, the function systimestamp returns the current timestamp with time zone of the database. localtimestamp returns the timestamp with the time zone of current session. For example,

alter session set time_zone = '-05:00';

Session altered.

select systimestamp,localtimestamp from dual;

SYSTIMESTAMP                                         LOCALTIMESTAMP
--------------------------------------------------    --------------------------------------------------
27-DEC-12 1.47.42.257748 PM -06:00       27-DEC-12 2.47.42.257787 PM

Simiare to sysdate, we can do basic date arithmetic on systimestamp.

alter session set nls_date_format = 'DD-MON-YY HH.MI.SS AM';

Session altered.

select systimestamp, systimestamp - 30/1440 from dual;

SYSTIMESTAMP                                         SYSTIMESTAMP-30/1440
--------------------------------------------------    --------------------------------------------------
27-DEC-12 1.53.48.752908 PM -06:00        27-DEC-12 1.23.48 PM

It works most of the time, but there is some problem with it. Unfortunately we run into this problem.


Here is our story. Our database is in Eastern Time Zone (-05:00) while we are in Central Time Zone (-06:00).

SQL> ALTER SESSION SET TIME_ZONE='-6:00';

Session altered.

SQL> SELECT SYSTIMESTAMP
           FROM DUAL
           WHERE TO_TIMESTAMP_TZ('2012/12/27 14:25 -5:00','YYYY/MM/DD HH24:MI TZH:TZM')
          BETWEEN TRUNC(SYSTIMESTAMP) AND SYSTIMESTAMP - 30/1440;

SYSTIMESTAMP
---------------------------------------------------------------------------
27-DEC-12 02.33.18.399155 PM -05:00

1 row selected.

This is not correct. The current time is 2:33 PM so 2:25 PM should not fall between trunc(systimestamp) and timestamp - 30/1440.

SQL> ALTER SESSION SET TIME_ZONE='-5:00';

Session altered.

SQL> SELECT SYSTIMESTAMP
           FROM DUAL
           WHERE TO_TIMESTAMP_TZ('2012/12/27 14:25 -5:00','YYYY/MM/DD HH24:MI TZH:TZM')
          BETWEEN TRUNC(SYSTIMESTAMP) AND SYSTIMESTAMP - 30/1440;

no rows selected

This is the right answer. But why does it happen? 

The trick is. systimestamp is a timestamp with timezone but (systimestamp - 30/1440) is not longer a timestamp, it is date type. It can be easily demonstrated by,


SQL> select dump(systimestamp),dump(systimestamp-30/1440) from dual;

DUMP(SYSTIMESTAMP)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DUMP(SYSTIMESTAMP-30/1440)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Typ=188 Len=20: 7,220,12,27,19,39,0,0,59,94,163,144,251,0,5,0,0,0,0,0
Typ=13 Len=8: 7,220,12,27,14,9,0,0

Thursday, June 28, 2012

Snapshot Too Old after SAN outrage

Recently we experienced an SAN outrage. The databases was brought up after SAN was fixed. However we received an ORA-01555 error “Snapshot too old” when we run a SQL statement, e.g.
select * 
  from scott.orders 
 where order_date between trunc(sysdate - 1) and trunc(sysdate);
ORA-01555 is caused by Oracle read consistency mechanism. When the long run query is not able to find the before-image from undo segment, ORA-01555 occurs. But I was very sure it is not our case, since the application server was still down and there were sessions only from dba. And I also noticed the SQL statement was successful without ORA-01555 if
select * 
  from scott.orders 
 where order_date between trunc(sysdate - 2) and trunc(sysdate - 1); 
It makes me think there is data block corruption for the data in the time range of trunc(sysdate - 1) and trunc(sysdate).


Oracle Metalink document 787004.1 describes a very similar situation, except that we have xml data type. So I continued the investigation.

select order_id 
  from scott.orders 
 where order_date between trunc(sysdate - 1) and trunc(sysdate);
select order_xml 
  from scott.orders 
 where order_date between trunc(sysdate - 1) and trunc(sysdate);

The first SQL was successful. The second got the same ORA-01555 error. Now it is confirmed order_xml column was corrupted.


I wrote a script to find out the corrupt data block,
DECLARE
   n   NUMBER;
   s   VARCHAR2 (4000);
BEGIN
   FOR c1 IN (SELECT *
                FROM scott.orders
               WHERE empno)
   LOOP
      BEGIN
         s := XMLType.getClobVal (c1.order_xml);
      EXCEPTION
         WHEN OTHERS
         THEN
            DBMS_OUTPUT.put_line ('ORA' || SQLCODE || ': order_id = ' || c1.order_id);
      END;
   END LOOP;
END;

Tuesday, May 1, 2012

Oracle Listener Security Alert: A Possible 13 Years Old Oracle Bug?

On Monday 4/30/2012 Oracle finally released a security alert CVE-2012-1675 to address the “TNS Listener Poison Attack” in Oracle database. According to Joxean Koret, "the bug is probably available in any Oracle Database version since 1999 (Oracle 8i) to the latest one (Oracle 11g) ...  The bug was reported to Oracle in 2008 so it only took them 4 years to fix the vulnerability since reported."

A description of the security alert is available at http://www.oracle.com/technetwork/topics/security/alert-cve-2012-1675-1608180.html

A comprehensive vulnerability explanation http://seclists.org/fulldisclosure/2012/Apr/204

There is no current CPU or PSU to fix the bug. Oracle provided a solution to fix the bug. Please follow Support Note 1340831.1 for RAC databases and Support Note 1453883.1 for non-RAC databases. DO NOT blame me if links are not working. Try the links during the day or search in Oracle Metalink.