Yann Neuhaus

dbi services technical blog

Oracle 18c Grid Infrastructure on Windows Server

Sat, 2019-04-27 10:05

Oracle Grid Infrastructure can be installed on the Windows platform, and the steps are the same as on other platforms. In this blog we are going to install Oracle GI 18c on Windows Server 2016. I have two disks on my server:
Disk 0: for the system
Disk 1: for ASM
I am using a VirtualBox virtual machine.
We assume that the Grid Infrastructure software has already been downloaded and decompressed into the grid home.
Like on other platforms, we have to configure the ASM disk. In the documentation we can read:
The only partitions that OUI displays for Windows systems are logical drives that are on disks and have been marked (or stamped) with asmtoolg or by Oracle Automatic Storage Management (Oracle ASM) Filter Driver.
So Disk 1 should not be formatted and should not be assigned a drive letter.
The first step is therefore to create a logical partition using the Windows diskpart utility.

Microsoft Windows [Version 10.0.14393]
(c) 2016 Microsoft Corporation. All rights reserved.

C:\Users\Administrator>diskpart

Microsoft DiskPart version 10.0.14393.0

Copyright (C) 1999-2013 Microsoft Corporation.
On computer: RACWIN2

DISKPART> list disk

  Disk ###  Status         Size     Free     Dyn  Gpt
  --------  -------------  -------  -------  ---  ---
  Disk 0    Online           60 GB      0 B
  Disk 1    Online           20 GB    20 GB

DISKPART> select disk 1

Disk 1 is now the selected disk.

DISKPART> create partition extended

DiskPart succeeded in creating the specified partition.

DISKPART> create partition logical

DiskPart succeeded in creating the specified partition.

DISKPART>

We can then list the existing partitions for Disk 1

DISKPART> list partition

  Partition ###  Type              Size     Offset
  -------------  ----------------  -------  -------
  Partition 0    Extended            19 GB  1024 KB
* Partition 1    Logical             19 GB  2048 KB

DISKPART>

Once the logical partition is created, we can launch the asmtool or asmtoolg utility. This utility comes with the grid software

c:\app\grid\18000\bin>asmtoolg.exe

The first time we executed the asmtoolg.exe command, we got the following error

According to the Oracle Support note Windows: asmtoolg: MSVCR120.dll is missing from your computer (Doc ID 2251869.1), we have to download and install the Visual C++ 2013 Redistributable Package.
Once done, we launch the asmtoolg utility again

Clicking on next, we can choose the disk we want to stamp for ASM

Click on Next

Click on Next

And click Finish. We can then list the disks marked for ASM with the asmtool utility.

C:\Users\Administrator>cd c:\app\18000\grid\bin

c:\app\18000\grid\bin>asmtool.exe -list
NTFS                             \Device\Harddisk0\Partition1              500M
NTFS                             \Device\Harddisk0\Partition2            60938M
ORCLDISKDATA0                    \Device\Harddisk1\Partition1            20477M
c:\app\18000\grid\bin>

Now it’s time to launch the gridSetup executable

c:\app\grid\18000>gridSetup.bat






We decide to ignore the Warning


At the end, we got an error from the Cluster Verification Utility, but this is expected because we ignored some prerequisites.


We can verify that the installation went fine

c:\app\18000\grid>crsctl status resource -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       racwin2                  STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       racwin2                  STABLE
ora.asm
               ONLINE  ONLINE       racwin2                  Started,STABLE
ora.ons
               OFFLINE OFFLINE      racwin2                  STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
      1        ONLINE  ONLINE       racwin2                  STABLE
ora.evmd
      1        ONLINE  ONLINE       racwin2                  STABLE
--------------------------------------------------------------------------------

c:\app\18000\grid>

We can connect to the ASM instance

C:\Users\Administrator>set oracle_sid=+ASM

C:\Users\Administrator>sqlplus / as sysasm

SQL*Plus: Release 18.0.0.0.0 - Production on Sat Apr 27 05:49:38 2019
Version 18.3.0.0.0

Copyright (c) 1982, 2018, Oracle.  All rights reserved.


SQL> select name,state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED

SQL>
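Still connected as SYSASM, we could also check — just a quick sketch, output not shown here — that the stamped disk is visible to the ASM instance via the standard v$asm_disk view:

SQL> select name, path, header_status from v$asm_disk;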

Conclusion
Once the grid infrastructure is configured, the next step is to install the Oracle database software.

The article Oracle 18c Grid Infrastructure on Windows Server appeared first on Blog dbi services.

Creating PostgreSQL users with a PL/pgSQL function

Thu, 2019-04-25 04:07

Sometimes you might want to create users in PostgreSQL using a function. One use case for this is that you want to give other users the possibility to create users without granting them the right to do so. How is that possible? Very much like in Oracle, you can create functions in PostgreSQL that execute either with the permissions of the user who created the function or with the permissions of the user who executes it. Let's see how that works.

Here is a little PL/pgSQL function that creates a user with a given password, does some checks on the input parameters and tests if the user already exists:

create or replace function f_create_user ( pv_username name
                                         , pv_password text
                                         ) returns boolean
as $$
declare
  lb_return boolean := true;
  ln_count integer;
begin
  if ( pv_username is null )
  then
     raise warning 'Username must not be null';
     lb_return := false;
  end if;
  if ( pv_password is null )
  then
     raise warning 'Password must not be null';
     lb_return := false;
  end if;
  -- test if the user already exists
  begin
      select count(*)
        into ln_count
        from pg_user
       where usename = pv_username;
  exception
      when no_data_found then
          -- ok, no user with this name is defined
          null;
      when too_many_rows then
          -- this should really never happen
          raise exception 'You have a huge issue in your catalog';
  end;
  if ( ln_count > 0 )
  then
     raise warning 'The user "%" already exist', pv_username;
     lb_return := false;
  else
      execute 'create user '||pv_username||' with password '||''''||pv_password||'''';
  end if;
  return lb_return;
end;
$$ language plpgsql;

Once that function is created:

postgres=# \df
                                   List of functions
 Schema |     Name      | Result data type |        Argument data types         | Type 
--------+---------------+------------------+------------------------------------+------
 public | f_create_user | boolean          | pv_username name, pv_password text | func
(1 row)

… users can be created by calling this function when connected as a user with permissions to do so:

postgres=# select current_user;
 current_user 
--------------
 postgres
(1 row)

postgres=# select f_create_user('test','test');
 f_create_user 
---------------
 t
(1 row)

postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of 
-----------+------------------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 test      |                                                            | {}

Trying to execute this function with a user that does not have permissions to create other users will fail:

postgres=# create user a with password 'a';
CREATE ROLE
postgres=# grant EXECUTE on function f_create_user(name,text) to a;
GRANT
postgres=# \c postgres a
You are now connected to database "postgres" as user "a".
postgres=> select f_create_user('test2','test2');
ERROR:  permission denied to create role
CONTEXT:  SQL statement "create user test2 with password 'test2'"
PL/pgSQL function f_create_user(name,text) line 35 at EXECUTE

You can make that work by declaring the function with SECURITY DEFINER, so that it runs with the permissions of the user who created it:

create or replace function f_create_user ( pv_username name
                                         , pv_password text
                                         ) returns boolean
as $$
declare
  lb_return boolean := true;
  ln_count integer;
begin
...
end;
$$ language plpgsql security definer;

From now on our user “a” is allowed to create other users:

postgres=> select current_user;
 current_user 
--------------
 a
(1 row)

postgres=> select f_create_user('test2','test2');
 f_create_user 
---------------
 t
(1 row)

postgres=> \du
                                   List of roles
 Role name |                         Attributes                         | Member of 
-----------+------------------------------------------------------------+-----------
 a         |                                                            | {}
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 test      |                                                            | {}
 test2     |                                                            | {}

Before implementing something like this, consider the “Writing SECURITY DEFINER Functions Safely” section in the documentation. There are some points to take care of, such as this:

postgres=# revoke all on function f_create_user(name,text) from public;
REVOKE

… and correctly setting the search_path.
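For that last point, a minimal sketch could look like this (using the function from above; pg_catalog plus pg_temp is the usual safe pattern):

postgres=# alter function f_create_user(name,text) set search_path = pg_catalog, pg_temp;
ALTER FUNCTION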

The article Creating PostgreSQL users with a PL/pgSQL function appeared first on Blog dbi services.

Direct NFS, ODM 4.0 in 12.2: archiver stuck situation after a shutdown abort and restart

Wed, 2019-04-24 12:23

A customer had an interesting case recently: since Oracle 12.2 he got archiver stuck situations after a shutdown abort and restart. I reproduced the issue and it is caused by Direct NFS since it runs ODM 4.0 (i.e. since 12.2). The issue also reproduces on 18.5. When Direct NFS is enabled, the archiver process writes to a file with a leading dot in its name, e.g.:


.arch_1_90_985274359.arc

When the file has been fully copied from the online redo log, it is renamed so that it no longer contains the leading dot, i.e. using the previous example:


arch_1_90_985274359.arc

When I do a “shutdown abort” while the archiver is in the process of writing to the archive file (with the leading dot in its name) and I then restart the database, Oracle is not able to cope with that file, i.e. in the alert log I get the following errors:


2019-04-17T10:22:33.190330+02:00
ARC0 (PID:12598): Unable to create archive log file '/arch_backup/gen183/archivelog/arch_1_90_985274359.arc'
2019-04-17T10:22:33.253476+02:00
Errors in file /u01/app/oracle/diag/rdbms/gen183/gen183/trace/gen183_arc0_12598.trc:
ORA-19504: failed to create file "/arch_backup/gen183/archivelog/arch_1_90_985274359.arc"
ORA-17502: ksfdcre:8 Failed to create file /arch_backup/gen183/archivelog/arch_1_90_985274359.arc
ORA-17500: ODM err:File exists
2019-04-17T10:22:33.254078+02:00
ARC0 (PID:12598): Error 19504 Creating archive log file to '/arch_backup/gen183/archivelog/arch_1_90_985274359.arc'
ARC0 (PID:12598): Stuck archiver: inactive mandatory LAD:1
ARC0 (PID:12598): Stuck archiver condition declared

The DB continues to operate normally until it has to overwrite the online redo log file which has not been fully archived yet. At that point the archiver becomes stuck and modifications on the DB are no longer possible.

When I remove the incomplete archive file, the DB continues to operate normally:


rm .arch_1_90_985274359.arc

Using a 12.1 database with ODM 3.0 I didn't see that behavior, i.e. I could also see an archived redo log file with a leading dot in its name, but after a shutdown abort and restart Oracle removed the file itself and there was no archiver problem.

Testcase:

1.) make sure you have direct NFS enabled


cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk dnfs_on
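Once the instance has been restarted with the relinked binary and has accessed the NFS mount, dNFS usage can be confirmed — just a quick sanity-check sketch — via the alert log entry shown further below (“Oracle Direct NFS ODM Library Version 4.0”) and via v$dnfs_servers:

SQL> select svrname, dirname from v$dnfs_servers;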

2.) configure a mandatory log archive destination pointing to a NFS-mounted filesystem. E.g.


[root]# mount -t nfs -o rw,bg,hard,rsize=32768,wsize=32768,vers=3,nointr,timeo=600,proto=tcp,suid,nolock,noac nfs_server:/arch_backup /arch_backup
 
SQL> alter system set log_archive_dest_1='location=/arch_backup/gen183/archivelog mandatory reopen=30';

3.) Produce some DML-load on the DB

I created 2 tables t3 and t4 as a copy of all_objects with approx 600’000 rows:


SQL> create table t3 as select * from all_objects;
SQL> insert into t3 select * from t3;
SQL> -- repeat above insert until you have 600K rows in t3
SQL> commit;
SQL> create table t4 as select * from t3;

Run the following PLSQL-block to produce redo:


begin
for i in 1..20 loop
delete from t3;
commit;
insert into t3 select * from t4;
commit;
end loop;
end;
/

4.) While the PLSQL-block of 3.) is running check the archive-files produced in your log archive destination


ls -ltra /arch_backup/gen183/archivelog

Once you see a file created with a preceding dot in its name then shutdown abort the database:


oracle@18cR0:/arch_backup/gen183/archivelog/ [gen183] ls -ltra /arch_backup/gen183/archivelog
total 2308988
drwxr-xr-x. 3 oracle oinstall 23 Apr 17 10:13 ..
-r--r-----. 1 oracle oinstall 2136861184 Apr 24 18:24 arch_1_104_985274359.arc
drwxr-xr-x. 2 oracle oinstall 69 Apr 24 18:59 .
-rw-r-----. 1 oracle oinstall 2090587648 Apr 24 18:59 .arch_1_105_985274359.arc
 
SQL> shutdown abort

5.) If the file with the preceding dot is still there after the shutdown then you reproduced the issue. Just startup the DB and “tail -f” your alert-log-file.


oracle@18cR0:/arch_backup/gen183/archivelog/ [gen183] cdal
oracle@18cR0:/u01/app/oracle/diag/rdbms/gen183/gen183/trace/ [gen183] tail -f alert_gen183.log
...
2019-04-24T19:01:24.775991+02:00
Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 4.0
...
2019-04-24T19:01:43.770196+02:00
ARC0 (PID:8876): Unable to create archive log file '/arch_backup/gen183/archivelog/arch_1_105_985274359.arc'
2019-04-24T19:01:43.790546+02:00
Errors in file /u01/app/oracle/diag/rdbms/gen183/gen183/trace/gen183_arc0_8876.trc:
ORA-19504: failed to create file "/arch_backup/gen183/archivelog/arch_1_105_985274359.arc"
ORA-17502: ksfdcre:8 Failed to create file /arch_backup/gen183/archivelog/arch_1_105_985274359.arc
ORA-17500: ODM err:File exists
ARC0 (PID:8876): Error 19504 Creating archive log file to '/arch_backup/gen183/archivelog/arch_1_105_985274359.arc'
ARC0 (PID:8876): Stuck archiver: inactive mandatory LAD:1
ARC0 (PID:8876): Stuck archiver condition declared
...

This is a serious problem, because it may cause an archiver stuck situation after a crash. I opened a Service Request with Oracle. The SR has been assigned to the ODM team now. Once I get a resolution I'll update this blog.

The article Direct NFS, ODM 4.0 in 12.2: archiver stuck situation after a shutdown abort and restart appeared first on Blog dbi services.

Bringing up an OpenShift playground in AWS

Wed, 2019-04-17 13:24

Before we begin: this is in no way production ready, as the title states. In a production setup you would put the internal registry on persistent storage, you would probably have more than one master node and you would probably have more than one compute node. Security is not covered at all here. This post is intended to quickly bring up something you can play with, that's it. In future posts we will explore more details of OpenShift. So, let's start.

What I used as a starting point are three t2.xlarge instances:

One of them will be the master, one will be the infrastructure node and one the compute node. All of them are based on the Red Hat Enterprise Linux 7.5 (HVM) AMI:

Once these three instances are running, the most important thing is to set persistent hostnames (if you do not do this, the OpenShift installation will fail):

[root@master ec2-user]$ hostnamectl set-hostname --static master.it.dbi-services.com
[root@master ec2-user]$ echo "preserve_hostname: true" >> /etc/cloud/cloud.cfg

Of course you need to do that on all three hosts. Once that is done, because I have no DNS in my setup, /etc/hosts should be adjusted on all the machines, in my case:

[root@master ec2-user]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.1.167  master master.it.dbi-services.com
10.0.1.110  node1 node1.it.dbi-services.com
10.0.1.13   node2 node2.it.dbi-services.com

As everything is based on RedHat you need to register all the machines:

[root@master ec2-user]$ subscription-manager register
Registering to: subscription.rhsm.redhat.com:443/subscription
Username: xxxxxx
Password: 
The system has been registered with ID: xxxxxxx
The registered system name is: master

Once done, refresh and then list the available subscriptions. There should be at least one named something like “Red Hat OpenShift”. Having identified the “Pool ID” for that one, attach it (on all machines):

[root@master ec2-user]$ subscription-manager refresh
[root@master ec2-user]$ subscription-manager list --available
[root@master ec2-user]$ subscription-manager attach --pool=xxxxxxxxxxxxxxxxxxxxxxxxx

Now you are ready to enable the required repositories (on all machines):

[root@master ec2-user]$ subscription-manager repos --enable="rhel-7-server-rpms" \
    --enable="rhel-7-server-extras-rpms" \
     --enable="rhel-7-server-ose-3.11-rpms" \
     --enable="rhel-7-server-ansible-2.6-rpms"

Repository 'rhel-7-server-rpms' is enabled for this system.
Repository 'rhel-7-server-extras-rpms' is enabled for this system.
Repository 'rhel-7-server-ansible-2.6-rpms' is enabled for this system.
Repository 'rhel-7-server-ose-3.11-rpms' is enabled for this system.

Having the repos enabled the required packages can be installed (on all machines):

[root@master ec2-user]$ yum -y install wget git net-tools bind-utils iptables-services bridge-utils bash-completion kexec-tools sos psacct

Updating all packages to the latest release and rebooting to the potentially new kernel is recommended. As we will be using Docker for this deployment we will install that as well (on all machines):

[root@master ec2-user]$ yum install -y docker
[root@master ec2-user]$ yum update -y
[root@master ec2-user]$ systemctl reboot

Now that we are up to date and the prerequisites are met, we create a new group and a new user. Why that? The complete OpenShift installation is driven by Ansible. You could run all of the installation directly as root, but a better way is to use a dedicated user that has sudo permissions to perform the tasks (on all machines):

[root@master ec2-user]$ groupadd dbi
[root@master ec2-user]$ useradd -g dbi dbi

As Ansible needs to log in to all the machines, you will need to set up password-less ssh connections for this user. I am assuming that you know how to do that.
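If not, here is a minimal sketch (run as the dbi user on the master, assuming the dbi user has a password set and password authentication is enabled; the hostnames are the ones from /etc/hosts above):

[dbi@master ~]$ ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa
[dbi@master ~]$ for host in master.it.dbi-services.com node1.it.dbi-services.com node2.it.dbi-services.com; do ssh-copy-id dbi@${host}; done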

Several tasks of the OpenShift Ansible playbooks need to be executed as root so the “dbi” user needs permissions to do that (on all machines):

[root@master ec2-user]$ cat /etc/sudoers | grep dbi
dbi	ALL=(ALL)	NOPASSWD: ALL

There is one last preparation step to be executed on the master only: Installing the Ansible playbooks required to bring up OpenShift:

[root@master ec2-user]$ yum -y install openshift-ansible

That’s all the preparation required for this playground setup. As all the installation is Ansible based we need an inventory file on the master:

[dbi@master ~]$ id -a
uid=1001(dbi) gid=1001(dbi) groups=1001(dbi),994(dockerroot) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[dbi@master ~]$ pwd
/home/dbi
[dbi@master ~]$ cat inventory 
# Create an OSEv3 group that contains the masters, nodes, and etcd groups
[OSEv3:children]
masters
nodes
etcd

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=dbi
# If ansible_ssh_user is not root, ansible_become must be set to true
ansible_become=true
become_method = sudo
openshift_deployment_type=openshift-enterprise
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
openshift_master_htpasswd_users={'admin': '$apr1$4ZbKL26l$3eKL/6AQM8O94lRwTAu611', 'developer': '$apr1$4ZbKL26l$3eKL/6AQM8O94lRwTAu611'}
# Registry settings
oreg_url=registry.redhat.io/openshift3/ose-${component}:${version}
oreg_auth_user=dbiservices2800
oreg_auth_password=eIJAy7LsyA
# disable checks
openshift_disable_check=disk_availability,docker_storage,memory_availability

openshift_master_default_subdomain=apps.it.dbi-services.com

# host group for masters
[masters]
master.it.dbi-services.com

# host group for etcd
[etcd]
master.it.dbi-services.com

# host group for nodes, includes region info
[nodes]
master.it.dbi-services.com openshift_node_group_name='node-config-master'
node1.it.dbi-services.com openshift_node_group_name='node-config-compute'
node2.it.dbi-services.com openshift_node_group_name='node-config-infra'

If you need more details about all the variables and host groups used here, please check the OpenShift documentation.
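Before running any playbook, a quick connectivity check — just a sketch, using the inventory and the OSEv3 group defined above — can save some time:

[dbi@master ~]$ ansible OSEv3 -i inventory -m ping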

In any case please execute the prerequisites playbook before starting with the installation. If it does not run to the end or shows any “failed” tasks, you need to fix something before proceeding:

[dbi@master ~]$ ansible-playbook -i inventory /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml 

PLAY [Fail openshift_kubelet_name_override for new hosts] **********************************************

TASK [Gathering Facts] *********************************************************************************
ok: [master.it.dbi-services.com]
ok: [node1.it.dbi-services.com]

...

PLAY RECAP *********************************************************************************************
localhost                  : ok=11   changed=0    unreachable=0    failed=0   
master.it.dbi-services.com : ok=80   changed=17   unreachable=0    failed=0   
node1.it.dbi-services.com  : ok=56   changed=16   unreachable=0    failed=0   


INSTALLER STATUS ***************************************************************************************
Initialization  : Complete (0:01:40)

When it is fine, install OpenShift:

[dbi@master ~]$ ansible-playbook -i inventory /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml 

That will take some time but at the end your OpenShift cluster should be up and running:

[dbi@master ~]$ oc login -u system:admin
Logged into "https://master:8443" as "system:admin" using existing credentials.

You have access to the following projects and can switch between them with 'oc project <projectname>':

  * default
    kube-public
    kube-service-catalog
    kube-system
    management-infra
    openshift
    openshift-ansible-service-broker
    openshift-console
    openshift-infra
    openshift-logging
    openshift-monitoring
    openshift-node
    openshift-sdn
    openshift-template-service-broker
    openshift-web-console

Using project "default".

[dbi@master ~]$ oc get nodes 
NAME                         STATUS    ROLES     AGE       VERSION
master.it.dbi-services.com   Ready     master    1h        v1.11.0+d4cacc0
node1.it.dbi-services.com    Ready     compute   1h        v1.11.0+d4cacc0
node2.it.dbi-services.com    Ready     infra     1h        v1.11.0+d4cacc0

As expected there is one master, one infrastructure and one compute node. All the pods in the default namespace should be running fine:

[dbi@master ~]$ oc get pods -n default
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-lmjzs    1/1       Running   0          1h
registry-console-1-n4z5j   1/1       Running   0          1h
router-1-5wl27             1/1       Running   0          1h

All the default Image Streams are there as well:

[dbi@master ~]$ oc get is -n openshift
NAME                                           DOCKER REPO                                                                               TAGS                          UPDATED
apicurito-ui                                   docker-registry.default.svc:5000/openshift/apicurito-ui                                   1.2                           2 hours ago
dotnet                                         docker-registry.default.svc:5000/openshift/dotnet                                         latest,1.0,1.1 + 3 more...    2 hours ago
dotnet-runtime                                 docker-registry.default.svc:5000/openshift/dotnet-runtime                                 2.2,latest,2.0 + 1 more...    2 hours ago
eap-cd-openshift                               docker-registry.default.svc:5000/openshift/eap-cd-openshift                               14.0,15.0,13 + 6 more...      2 hours ago
fis-java-openshift                             docker-registry.default.svc:5000/openshift/fis-java-openshift                             1.0,2.0                       2 hours ago
fis-karaf-openshift                            docker-registry.default.svc:5000/openshift/fis-karaf-openshift                            1.0,2.0                       2 hours ago
fuse-apicurito-generator                       docker-registry.default.svc:5000/openshift/fuse-apicurito-generator                       1.2                           2 hours ago
fuse7-console                                  docker-registry.default.svc:5000/openshift/fuse7-console                                  1.0,1.1,1.2                   2 hours ago
fuse7-eap-openshift                            docker-registry.default.svc:5000/openshift/fuse7-eap-openshift                            1.0,1.1,1.2                   2 hours ago
fuse7-java-openshift                           docker-registry.default.svc:5000/openshift/fuse7-java-openshift                           1.0,1.1,1.2                   2 hours ago
fuse7-karaf-openshift                          docker-registry.default.svc:5000/openshift/fuse7-karaf-openshift                          1.0,1.1,1.2                   2 hours ago
httpd                                          docker-registry.default.svc:5000/openshift/httpd                                          2.4,latest                    2 hours ago
java                                           docker-registry.default.svc:5000/openshift/java                                           8,latest                      2 hours ago
jboss-amq-62                                   docker-registry.default.svc:5000/openshift/jboss-amq-62                                   1.3,1.4,1.5 + 4 more...       2 hours ago
jboss-amq-63                                   docker-registry.default.svc:5000/openshift/jboss-amq-63                                   1.0,1.1,1.2 + 1 more...       2 hours ago
jboss-datagrid73-openshift                     docker-registry.default.svc:5000/openshift/jboss-datagrid73-openshift                     1.0                           
jboss-datavirt63-driver-openshift              docker-registry.default.svc:5000/openshift/jboss-datavirt63-driver-openshift              1.0,1.1                       2 hours ago
jboss-datavirt63-openshift                     docker-registry.default.svc:5000/openshift/jboss-datavirt63-openshift                     1.0,1.1,1.2 + 2 more...       2 hours ago
jboss-decisionserver62-openshift               docker-registry.default.svc:5000/openshift/jboss-decisionserver62-openshift               1.2                           2 hours ago
jboss-decisionserver63-openshift               docker-registry.default.svc:5000/openshift/jboss-decisionserver63-openshift               1.3,1.4                       2 hours ago
jboss-decisionserver64-openshift               docker-registry.default.svc:5000/openshift/jboss-decisionserver64-openshift               1.0,1.1,1.2 + 1 more...       2 hours ago
jboss-eap64-openshift                          docker-registry.default.svc:5000/openshift/jboss-eap64-openshift                          1.7,1.3,1.4 + 6 more...       2 hours ago
jboss-eap70-openshift                          docker-registry.default.svc:5000/openshift/jboss-eap70-openshift                          1.5,1.6,1.7 + 2 more...       2 hours ago
jboss-eap71-openshift                          docker-registry.default.svc:5000/openshift/jboss-eap71-openshift                          1.1,1.2,1.3 + 1 more...       2 hours ago
jboss-eap72-openshift                          docker-registry.default.svc:5000/openshift/jboss-eap72-openshift                          1.0,latest                    2 hours ago
jboss-fuse70-console                           docker-registry.default.svc:5000/openshift/jboss-fuse70-console                           1.0                           2 hours ago
jboss-fuse70-eap-openshift                     docker-registry.default.svc:5000/openshift/jboss-fuse70-eap-openshift                     1.0                           
jboss-fuse70-java-openshift                    docker-registry.default.svc:5000/openshift/jboss-fuse70-java-openshift                    1.0                           2 hours ago
jboss-fuse70-karaf-openshift                   docker-registry.default.svc:5000/openshift/jboss-fuse70-karaf-openshift                   1.0                           2 hours ago
jboss-processserver63-openshift                docker-registry.default.svc:5000/openshift/jboss-processserver63-openshift                1.3,1.4                       2 hours ago
jboss-processserver64-openshift                docker-registry.default.svc:5000/openshift/jboss-processserver64-openshift                1.2,1.3,1.0 + 1 more...       2 hours ago
jboss-webserver30-tomcat7-openshift            docker-registry.default.svc:5000/openshift/jboss-webserver30-tomcat7-openshift            1.1,1.2,1.3                   2 hours ago
jboss-webserver30-tomcat8-openshift            docker-registry.default.svc:5000/openshift/jboss-webserver30-tomcat8-openshift            1.2,1.3,1.1                   2 hours ago
jboss-webserver31-tomcat7-openshift            docker-registry.default.svc:5000/openshift/jboss-webserver31-tomcat7-openshift            1.0,1.1,1.2                   2 hours ago
jboss-webserver31-tomcat8-openshift            docker-registry.default.svc:5000/openshift/jboss-webserver31-tomcat8-openshift            1.0,1.1,1.2                   2 hours ago
jenkins                                        docker-registry.default.svc:5000/openshift/jenkins                                        2,latest,1                    2 hours ago
mariadb                                        docker-registry.default.svc:5000/openshift/mariadb                                        10.1,10.2,latest              2 hours ago
mongodb                                        docker-registry.default.svc:5000/openshift/mongodb                                        2.4,3.2,3.6 + 3 more...       2 hours ago
mysql                                          docker-registry.default.svc:5000/openshift/mysql                                          5.7,latest,5.6 + 1 more...    2 hours ago
nginx                                          docker-registry.default.svc:5000/openshift/nginx                                          1.8,latest,1.10 + 1 more...   2 hours ago
nodejs                                         docker-registry.default.svc:5000/openshift/nodejs                                         8-RHOAR,0.10,6 + 3 more...    2 hours ago
perl                                           docker-registry.default.svc:5000/openshift/perl                                           5.20,5.24,5.16 + 1 more...    2 hours ago
php                                            docker-registry.default.svc:5000/openshift/php                                            5.6,5.5,7.0 + 1 more...       2 hours ago
postgresql                                     docker-registry.default.svc:5000/openshift/postgresql                                     latest,10,9.2 + 3 more...     2 hours ago
python                                         docker-registry.default.svc:5000/openshift/python                                         2.7,3.3,3.4 + 3 more...       2 hours ago
redhat-openjdk18-openshift                     docker-registry.default.svc:5000/openshift/redhat-openjdk18-openshift                     1.0,1.1,1.2 + 2 more...       2 hours ago
redhat-sso70-openshift                         docker-registry.default.svc:5000/openshift/redhat-sso70-openshift                         1.3,1.4                       2 hours ago
redhat-sso71-openshift                         docker-registry.default.svc:5000/openshift/redhat-sso71-openshift                         1.1,1.2,1.3 + 1 more...       2 hours ago
redhat-sso72-openshift                         docker-registry.default.svc:5000/openshift/redhat-sso72-openshift                         1.0,1.1,1.2                   2 hours ago
redis                                          docker-registry.default.svc:5000/openshift/redis                                          3.2,latest                    2 hours ago
rhdm70-decisioncentral-openshift               docker-registry.default.svc:5000/openshift/rhdm70-decisioncentral-openshift               1.0,1.1                       2 hours ago
rhdm70-kieserver-openshift                     docker-registry.default.svc:5000/openshift/rhdm70-kieserver-openshift                     1.0,1.1                       2 hours ago
rhdm71-controller-openshift                    docker-registry.default.svc:5000/openshift/rhdm71-controller-openshift                    1.0,1.1                       2 hours ago
rhdm71-decisioncentral-indexing-openshift      docker-registry.default.svc:5000/openshift/rhdm71-decisioncentral-indexing-openshift      1.0,1.1                       2 hours ago
rhdm71-decisioncentral-openshift               docker-registry.default.svc:5000/openshift/rhdm71-decisioncentral-openshift               1.1,1.0                       2 hours ago
rhdm71-kieserver-openshift                     docker-registry.default.svc:5000/openshift/rhdm71-kieserver-openshift                     1.0,1.1                       2 hours ago
rhdm71-optaweb-employee-rostering-openshift    docker-registry.default.svc:5000/openshift/rhdm71-optaweb-employee-rostering-openshift    1.0,1.1                       2 hours ago
rhdm72-controller-openshift                    docker-registry.default.svc:5000/openshift/rhdm72-controller-openshift                    1.0,1.1                       2 hours ago
rhdm72-decisioncentral-indexing-openshift      docker-registry.default.svc:5000/openshift/rhdm72-decisioncentral-indexing-openshift      1.0,1.1                       2 hours ago
rhdm72-decisioncentral-openshift               docker-registry.default.svc:5000/openshift/rhdm72-decisioncentral-openshift               1.1,1.0                       2 hours ago
rhdm72-kieserver-openshift                     docker-registry.default.svc:5000/openshift/rhdm72-kieserver-openshift                     1.0,1.1                       2 hours ago
rhdm72-optaweb-employee-rostering-openshift    docker-registry.default.svc:5000/openshift/rhdm72-optaweb-employee-rostering-openshift    1.0,1.1                       2 hours ago
rhpam70-businesscentral-indexing-openshift     docker-registry.default.svc:5000/openshift/rhpam70-businesscentral-indexing-openshift     1.0,1.1,1.2                   2 hours ago
rhpam70-businesscentral-monitoring-openshift   docker-registry.default.svc:5000/openshift/rhpam70-businesscentral-monitoring-openshift   1.1,1.2,1.0                   2 hours ago
rhpam70-businesscentral-openshift              docker-registry.default.svc:5000/openshift/rhpam70-businesscentral-openshift              1.0,1.1,1.2                   2 hours ago
rhpam70-controller-openshift                   docker-registry.default.svc:5000/openshift/rhpam70-controller-openshift                   1.0,1.1,1.2                   2 hours ago
rhpam70-kieserver-openshift                    docker-registry.default.svc:5000/openshift/rhpam70-kieserver-openshift                    1.0,1.1,1.2                   2 hours ago
rhpam70-smartrouter-openshift                  docker-registry.default.svc:5000/openshift/rhpam70-smartrouter-openshift                  1.0,1.1,1.2                   2 hours ago
rhpam71-businesscentral-indexing-openshift     docker-registry.default.svc:5000/openshift/rhpam71-businesscentral-indexing-openshift     1.0,1.1                       2 hours ago
rhpam71-businesscentral-monitoring-openshift   docker-registry.default.svc:5000/openshift/rhpam71-businesscentral-monitoring-openshift   1.0,1.1                       2 hours ago
rhpam71-businesscentral-openshift              docker-registry.default.svc:5000/openshift/rhpam71-businesscentral-openshift              1.0,1.1                       2 hours ago
rhpam71-controller-openshift                   docker-registry.default.svc:5000/openshift/rhpam71-controller-openshift                   1.0,1.1                       2 hours ago
rhpam71-kieserver-openshift                    docker-registry.default.svc:5000/openshift/rhpam71-kieserver-openshift                    1.0,1.1                       2 hours ago
rhpam71-smartrouter-openshift                  docker-registry.default.svc:5000/openshift/rhpam71-smartrouter-openshift                  1.0,1.1                       2 hours ago
rhpam72-businesscentral-indexing-openshift     docker-registry.default.svc:5000/openshift/rhpam72-businesscentral-indexing-openshift     1.1,1.0                       2 hours ago
rhpam72-businesscentral-monitoring-openshift   docker-registry.default.svc:5000/openshift/rhpam72-businesscentral-monitoring-openshift   1.0,1.1                       2 hours ago
rhpam72-businesscentral-openshift              docker-registry.default.svc:5000/openshift/rhpam72-businesscentral-openshift              1.0,1.1                       2 hours ago
rhpam72-controller-openshift                   docker-registry.default.svc:5000/openshift/rhpam72-controller-openshift                   1.0,1.1                       2 hours ago
rhpam72-kieserver-openshift                    docker-registry.default.svc:5000/openshift/rhpam72-kieserver-openshift                    1.0,1.1                       2 hours ago
rhpam72-smartrouter-openshift                  docker-registry.default.svc:5000/openshift/rhpam72-smartrouter-openshift                  1.0,1.1                       2 hours ago
ruby                                           docker-registry.default.svc:5000/openshift/ruby                                           2.2,2.3,2.4 + 3 more...       2 hours ago

Happy playing …

The article Bringing up an OpenShift playground in AWS appeared first on Blog dbi services.

WebLogic – Update on the WLST monitoring

Sun, 2019-04-14 11:05

A few years ago, I wrote this blog about a WLST script to monitor a WebLogic Server. At that time, we were managing a Documentum Platform with 115 servers and now, it’s more than 700 servers so I wanted to come back in this blog with an update on the WLST script.

1. Update of the WLST script needed

Over the past two years, we installed a lot of new servers with a lot of new components. Some of these components required us to slightly adapt our monitoring solution to be able to handle the monitoring in the same, efficient way for all servers of our Platform: we want to have a single solution which fits all cases. The new cases we came across were WebLogic Clustering as well as EAR Applications.

In the past, we only had WAR files related to Documentum: D2.war, da.war, D2-REST.war, aso… All these WAR files are quite simple to monitor because one “ApplicationRuntimes” equals one “ComponentRuntimes” (I’m talking here about the WLST script from the previous blog). So basically if you want to check the number of open sessions [get(‘OpenSessionsCurrentCount’)] or the total amount of sessions [get(‘SessionsOpenedTotalCount’)], then it’s just one value. EAR files often contain WAR file(s) as well as other components, so in this case you potentially have a lot of “ComponentRuntimes” for each “ApplicationRuntimes”. Therefore, the best way I found to keep having a single monitoring solution for all WebLogic Servers, no matter what application is deployed on them, was to loop on each component, cumulate the number of open (respectively total) sessions for each component and then return that for the application.

In addition to that, we also started to deploy some WebLogic Servers in Cluster, so the monitoring script also needed to take that into account. The previous version of the WLST script assumed that the deployment target was a single local Managed Server (local to the AdminServer). In case of a WLS Cluster, the deployment target can be a cluster and the WLST script then wouldn’t find the correct monitoring value, so I had to introduce a check on whether or not the Application is deployed on a cluster; if it is, I’m selecting the deployment on the local Managed Server that is part of this cluster. We are using the NodeManager Listen Address to know if a Managed Server is a local one, so it expects both the NodeManager and the Managed Server to use the same Listen Address.

As a side note, in case you have a WebLogic Cluster that is deploying an Application only on certain machines of the WebLogic Domain (so for example you have 3 machines but a cluster only targets 2 of them), then on the machine(s) where the Application isn’t deployed by the WebLogic Cluster, the monitoring will still try to find the Application on a local Managed Server and it will not succeed. This will still create a log file for this Application with the following content: “CRITICAL – The Managed Server ‘ + appTargetName + ‘ or the Application ‘ + app.getName() + ‘ is not started”. This is expected since the Application isn’t deployed there but it’s then your job to either set the monitoring tool to expect a CRITICAL or just not check this specific log file for this machine.

Finally, the last modification I did was using a properties file instead of embedded properties, because we are now deploying more and more WebLogic Servers with our silent scripts (it takes a few minutes to have a WLS fully installed, configured, with clustering, with SSL, aso…) and it is easier to have a properties file per WebLogic Domain that is used by our WebLogic Servers as well as by the Monitoring System to know what’s installed, if it’s a cluster, where the AdminServer is, if it’s using t3 or t3s, aso…

2. WebLogic Domain properties file

As mentioned above, we started to use a properties file with our silent scripts to describe what is installed on the local server, aso… This is an extract of a domain.properties file that we are using:

[weblogic@weblogic_server_01 ~]$ cat /app/weblogic/wlst/domain.properties
...
NM_HOST=weblogic_server_01.dbi-services.com
ADMIN_URL=t3s://weblogic_server_01.dbi-services.com:8443
DOMAIN_NAME=MyDomain
...
CLUSTERS=clusterWS-01:msWS-011,machine-01,weblogic_server_01.dbi-services.com,8080,8081:msWS-012,machine-02,weblogic_server_02.dbi-services.com,8080,8081|clusterWS-02:msWS-021,machine-01,weblogic_server_01.dbi-services.com,8082,8083:msWS-022,machine-02,weblogic_server_02.dbi-services.com,8082,8083
...
[weblogic@weblogic_server_01 ~]$

The parameter “CLUSTERS” in this properties file is composed in the following way:

  • If it’s a WebLogic Domain with Clustering: CLUSTERS=cluster1:ms11,machine11,listen11,http11,https11:ms12,machine12,…|cluster2:ms21,machine21,…:ms22,machine22,…:ms23,machine23,…
    • ms11 and ms12 being 2 Managed Servers part of the cluster cluster1
    • ms21, ms22 and ms23 being 3 Managed Servers part of the cluster cluster2
  • If it’s not a WebLogic Domain with Clustering: CLUSTERS= (equal nothing, it’s empty, not needed)

There are other properties in this domain.properties of ours like the config and key secure files that WebLogic is using (different from the Nagios ones), the NodeManager configuration (port, type, config & key secure files as well) and a few other things about the AdminServer, the list of Managed Servers, aso… But all these properties aren’t needed for the monitoring topic so I’m only showing the ones that make sense.

3. New version of the WLST script

Enough talk, I assume you came here for the WLST script so here it is. I highlighted below what changed compared to the previous version so you can spot easily how the customization was done:

[nagios@weblogic_server_01 ~]$ cat /app/nagios/etc/objects/scripts/MyDomain_check_weblogic.wls
# WLST
# Identification: check_weblogic.wls  v1.2  15/08/2018
#
# File: check_weblogic.wls
# Purpose: check if a WebLogic Server is running properly
# Author: dbi services (Morgan Patou)
# Version: 1.0 23/03/2016
# Version: 1.1 14/06/2018 - re-formatting
# Version: 1.2 15/08/2018 - including cluster & EAR support
#
###################################################

from java.io import File
from java.io import FileOutputStream

import re

properties='/app/weblogic/wlst/domain.properties'

try:
  loadProperties(properties)
except:
  exit()

directory='/app/nagios/etc/objects/scripts'
userConfig=directory + '/' + DOMAIN_NAME + '_configfile.secure'
userKey=directory + '/' + DOMAIN_NAME + '_keyfile.secure'

try:
  connect(userConfigFile=userConfig, userKeyFile=userKey, url=ADMIN_URL)
except:
  exit()

def setOutputToFile(fileName):
  outputFile=File(fileName)
  fos=FileOutputStream(outputFile)
  theInterpreter.setOut(fos)

def setOutputToNull():
  outputFile=File('/dev/null')
  fos=FileOutputStream(outputFile)
  theInterpreter.setOut(fos)

def getLocalServerName(clustername):
  localServerName=""
  for clusterList in CLUSTERS.split('|'):
    found=0
    for clusterMember in clusterList.split(':'):
      if found == 1:
        clusterMemberDetails=clusterMember.split(',')
        if clusterMemberDetails[2] == NM_HOST:
          localServerName=clusterMemberDetails[0]
      if clusterMember == clustername:
        found=1
  return localServerName

while 1:
  domainRuntime()
  for server in domainRuntimeService.getServerRuntimes():
    setOutputToFile(directory + '/wl_threadpool_' + domainName + '_' + server.getName() + '.out')
    cd('/ServerRuntimes/' + server.getName() + '/ThreadPoolRuntime/ThreadPoolRuntime')
    print 'threadpool_' + domainName + '_' + server.getName() + '_OUT',get('ExecuteThreadTotalCount'),get('HoggingThreadCount'),get('PendingUserRequestCount'),get('CompletedRequestCount'),get('Throughput'),get('HealthState')
    setOutputToNull()
    setOutputToFile(directory + '/wl_heapfree_' + domainName + '_' + server.getName() + '.out')
    cd('/ServerRuntimes/' + server.getName() + '/JVMRuntime/' + server.getName())
    print 'heapfree_' + domainName + '_' + server.getName() + '_OUT',get('HeapFreeCurrent'),get('HeapSizeCurrent'),get('HeapFreePercent')
    setOutputToNull()

  try:
    setOutputToFile(directory + '/wl_sessions_' + domainName + '_console.out')
    cd('/ServerRuntimes/AdminServer/ApplicationRuntimes/consoleapp/ComponentRuntimes/AdminServer_/console')
    print 'sessions_' + domainName + '_console_OUT',get('OpenSessionsCurrentCount'),get('SessionsOpenedTotalCount')
    setOutputToNull()
  except WLSTException,e:
    setOutputToFile(directory + '/wl_sessions_' + domainName + '_console.out')
    print 'CRITICAL - The Server AdminServer or the Administrator Console is not started'
    setOutputToNull()

  domainConfig()
  for app in cmo.getAppDeployments():
    domainConfig()
    cd('/AppDeployments/' + app.getName())
    for appTarget in cmo.getTargets():
      if appTarget.getType() == "Cluster":
        appTargetName=getLocalServerName(appTarget.getName())
      else:
        appTargetName=appTarget.getName()
      print appTargetName
      domainRuntime()
      try:
        setOutputToFile(directory + '/wl_sessions_' + domainName + '_' + app.getName() + '.out')
        cd('/ServerRuntimes/' + appTargetName + '/ApplicationRuntimes/' + app.getName())
        openSessions=0
        totalSessions=0
        for appComponent in cmo.getComponentRuntimes():
          result=re.search(appTargetName,appComponent.getName())
          if result != None:
            cd('ComponentRuntimes/' + appComponent.getName())
            try:
              openSessions+=get('OpenSessionsCurrentCount')
              totalSessions+=get('SessionsOpenedTotalCount')
            except WLSTException,e:
              cd('/ServerRuntimes/' + appTargetName + '/ApplicationRuntimes/' + app.getName())
            cd('/ServerRuntimes/' + appTargetName + '/ApplicationRuntimes/' + app.getName())
        print 'sessions_' + domainName + '_' + app.getName() + '_OUT',openSessions,totalSessions
        setOutputToNull()
      except WLSTException,e:
        setOutputToFile(directory + '/wl_sessions_' + domainName + '_' + app.getName() + '.out')
        print 'CRITICAL - The Managed Server ' + appTargetName + ' or the Application ' + app.getName() + ' is not started'
        setOutputToNull()

  java.lang.Thread.sleep(120000)

[nagios@weblogic_server_01 ~]$

 

For all our WAR files, even if the WLST script changed, the outcome is the same since there is only one component, and for the EAR files it will just add all of the open sessions into a global count. Obviously, this doesn’t necessarily represent the real number of “user” sessions but it’s an estimation of the load. We do not really care about a specific number, we want to see how the load evolves during the day, and we can adjust our thresholds to take into account that it’s not just a single component’s sessions but a global count.

You can obviously tweak the script to match your needs but this is working pretty well for us on all our environments. If you have ideas about what could be updated to make it even better, don’t hesitate to share!
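If the two encrypted files referenced at the top of the script do not exist yet (MyDomain_configfile.secure and MyDomain_keyfile.secure), they can be generated once from a WLST session. This is only a sketch: the wlst.sh location, the admin user name and the password placeholder are assumptions to adapt to your environment, and the generated files must end up readable by the monitoring user:

[weblogic@weblogic_server_01 ~]$ $WL_HOME/common/bin/wlst.sh
wls:/offline> connect('weblogic','<admin_password>','t3s://weblogic_server_01.dbi-services.com:8443')
wls:/MyDomain/serverConfig/> storeUserConfig('/app/nagios/etc/objects/scripts/MyDomain_configfile.secure','/app/nagios/etc/objects/scripts/MyDomain_keyfile.secure')
wls:/MyDomain/serverConfig/> disconnect()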

 

The article WebLogic – Update on the WLST monitoring appeared first on Blog dbi services.

Documentum – RCS/CFS installation failure

Sun, 2019-04-14 11:00

A few weeks ago, I had a task to add a new CS into already existing HA environments (DEV/TEST/PROD) to better support the load on these environments, as well as adding a new repository on all Content Servers. These environments were installed nearly two years ago already, so it was really just adding something new into the picture. When doing so, the installation of a new repository on the existing Content Servers (CS1 / CS2) was successful and without much trouble (installation in silent mode obviously, so it’s fast & reliable for the CS and RCS) but then the new Remote Content Server (RCS/CFS – CS3) installation, using the same silent scripts, failed for the two existing/old repositories while it succeeded for the new one.

Well actually, the CFS installation didn’t completely fail. The silent installer returned the prompt properly, the repository start/stop scripts were present, the config folder was present, the dm_server_config object was there, aso… So it looked like the installation was successful but, as a best practice, it is really important to always check the log file of a silent installation because it doesn’t show anything on the prompt, even if there are errors. So while checking the log file after the silent installer returned the prompt, I saw the following:

[dmadmin@content_server_03 ~]$ cd $DM_HOME/install/logs/
[dmadmin@content_server_03 logs]$ cat install.log
15:12:31,830  INFO [main] com.documentum.install.shared.installanywhere.actions.InitializeSharedLibrary - Done InitializeSharedLibrary ...
15:12:31,870  INFO [main] com.documentum.install.multinode.cfs.installanywhere.actions.DiWAServerCfsInitializeImportantServerVariables - The installer is gathering system configuration information.
15:12:31,883  INFO [main] com.documentum.install.server.installanywhere.actions.DiWASilentRemoteServerValidation - Start to verify the password
15:12:33,259  INFO [main] com.documentum.fc.client.security.impl.JKSKeystoreUtilForDfc - keystore file name is /tmp/655905.tmp/dfc.keystore
15:12:33,635  INFO [main] com.documentum.fc.client.security.internal.CreateIdentityCredential$MultiFormatPKIKeyPair - generated RSA (2,048-bit strength) mutiformat key pair in 352 ms
15:12:33,667  INFO [main] com.documentum.fc.client.security.internal.CreateIdentityCredential - certificate created for DFC <CN=dfc_UnYQdYTP6pV6zRn7tQMIavqlcrAa,O=EMC,OU=Documentum> valid from Fri Feb 01 15:07:33 UTC 2019 to Mon Jan 29 15:12:33 UTC 2029:

15:12:33,668  INFO [main] com.documentum.fc.client.security.impl.JKSKeystoreUtilForDfc - keystore file name is /tmp/655905.tmp/dfc.keystore
15:12:33,681  INFO [main] com.documentum.fc.client.security.impl.InitializeKeystoreForDfc - [DFC_SECURITY_IDENTITY_INITIALIZED] Initialized new identity in keystore, DFC alias=dfc, identity=dfc_UnYQdYTP6pV6zRn7tQMIavqlcrAa
15:12:33,682  INFO [main] com.documentum.fc.client.security.impl.AuthenticationMgrForDfc - identity for authentication is dfc_UnYQdYTP6pV6zRn7tQMIavqlcrAa
15:12:33,687  INFO [main] com.documentum.fc.impl.RuntimeContext - DFC Version is 7.3.0040.0025
15:12:33,939  INFO [Timer-2] com.documentum.fc.client.impl.bof.cache.ClassCacheManager$CacheCleanupTask - [DFC_BOF_RUNNING_CLEANUP] Running class cache cleanup task
15:12:34,717  INFO [main] com.documentum.fc.client.impl.connection.docbase.DocbaseConnection - Object protocol version 2
15:12:34,758  INFO [main] com.documentum.fc.client.security.internal.AuthenticationMgr - new identity bundle <dfc_UnYQdYTP6pV6zRn7tQMIavqlcrAa   1549033954      content_server_03.dbi-services.com         hicAAvU7QX3VNvDft2PwmnW4SIFX+5Snx7PlA5hryuOpo2eWLcEANYAEwYBbU6F3hEBAMenRR/lXFrHFqlrxTZl54whGL+9VnH6CCEu4x8dxdQ+QLRE3EtLlO31SPNhqkzjyVwhktNuivhiZkxweDNynvk+pDleTPvzUvF0YSoggcoiEq+kGr6/c9vUPOMuuv1k7PR1AO05JHmu7vea9/UBaV+TFA6/cGRwVh5i5D2s1Ws7qiDlBl4R+Wp3+TbNLPjbn/SeOz5ZSjAmXThK0H0RXwbcwHo9bVm0Hzu/1n7silII4ZzjAW7dd5Jvbxb66mxC8NWaNabPksus2mTIBhg==>
15:12:35,002  INFO [main] com.documentum.fc.client.security.impl.JKSKeystoreUtilForDfc - keystore file name is /tmp/655905.tmp/dfc.keystore
15:12:35,119  INFO [main] com.documentum.fc.client.security.impl.DfcIdentityPublisher - found client registration: false
15:12:36,317  INFO [main] com.documentum.fc.client.privilege.impl.PublicKeyCertificate - stored certificate for CN
15:12:36,353  INFO [main] com.documentum.fc.client.security.impl.IpAndRcHelper - filling in GR_DocBase a new record with this persistent certificate:
-----BEGIN CERTIFICATE-----
MIIDHzCCAgcCELGIh8FYcycggMmImLESjEYwDQYJKoZIhvcNAQELBQAwTjETMBEG
YXZxbFJuN1lRZFlUTXRQNnBWNnpRY3JBYTAeFw0xOTAyMDExNTA3MzNaFw0yOTAx
MjkxNTEyMzNaME4xEzARBgNVBAsMCkRvY3VtZW50dW0xDDAKBgNVBAoMA0VNQzEp
hKnQmaMo/wCv+QXZTCsitrBNvoomcT82mYzwIxV5/7cPCIHHMcJijsJCtunjiucV
MCcGA1UEAwwgZGZjX1VuSWF2cWxSbjdZUWRZVE10UDZwVjZ6UWNyQWEwggEiMA0G
HcL0KUImSV7owDqKzV3lEYCGdomX4gYTI5bMKAiTEuGyWRKw2YTQGhfp5y0mU0hV
ORTYyRoGjpRUuXWpdrsrbX8g8gD9l6ijWTSIWfTGO/7//mTHp2zwp/TiIEuAS+RA
eFw1pBLSCKneYgquMuiyFfuCfBVNY5Q0MzyPHYxrDAp4CtjasIrNT5h3AgMBAAEw
CSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQC4Hli+niUAD0ksVVWocPnvzV10ZOj2
DQYJKoZIhvcNAQELBQADggEBAEAre45NEpqzGMMYX1zpjgib9wldSmiPVDZbhj17
KnUCgDy7FhFQ5U5w6wf2iO9UxGV42AYQe2TjED0EbYwpYB8DC970J2ZrjZRFMy/Y
A1UECwwKRG9jdW1lbnR1bTEMMAoGA1UECgwDRU1DMSkwJwYDVQQDDCBkZmNfVW5J
gwKynVf9O10GQP0a8Z6Fr3jrtCEzfLjOXN0VxEcgwOEKRWHM4auxjevqGCPegD+y
FVWwylyIsMRsC9hOxoNHZPrbhk3N9Syhqsbl+Z9WXG0Sp4uh1z5R1NwVhR7YjZkF
19cfN8uEHqedJo26lq7oFF2KLJ+/8sWrh2a6lrb4fNXYZIAaYKjAjsUzcejij8en
Rd8yvghCc4iwWvpiRg9CW0VF+dXg6KkQmaFjiGrVosskUjuACHncatiYC5lDNJy+
TDdnNWYlctfWcT8WL/hX6FRGedT9S30GShWJNobM9vECoNg=
-----END CERTIFICATE-----
15:12:36,355  INFO [main] com.documentum.fc.client.security.impl.DfcIdentityPublisher - found client registration: false
15:12:36,535  INFO [main] com.documentum.fc.client.security.impl.IpAndRcHelper - filling a new registration record for dfc_UnYQdYTP6pV6zRn7tQMIavqlcrAa
15:12:36,563  INFO [main] com.documentum.fc.client.security.impl.DfcIdentityPublisher - [DFC_SECURITY_GR_REGISTRATION_PUBLISH] this dfc instance is now published in the global registry GR_DocBase
15:12:37,513  INFO [main] com.documentum.fc.client.impl.connection.docbase.DocbaseConnection - Object protocol version 2
15:12:38,773  INFO [main] com.documentum.fc.client.impl.connection.docbase.DocbaseConnection - Object protocol version 2
15:12:39,314  INFO [main] com.documentum.install.shared.common.services.dfc.DiDfcProperties - Installer is adding it as primary connection broker and moves existing primary as backup.
15:12:41,643  INFO [main]  - The installer updates dfc.properties file.
15:12:41,644  INFO [main] com.documentum.install.shared.common.services.dfc.DiDfcProperties - Installer is adding it as primary connection broker and moves existing primary as backup.
15:12:41,649  INFO [main] com.documentum.install.server.installanywhere.actions.DiWAServerEnableLockBoxValidation - The installer will validate AEK/Lockbox fileds.
15:12:41,656  INFO [main] com.documentum.install.shared.common.services.dfc.DiDfcProperties - Installer is changing primary as backup and backup as primary.
15:12:43,874  INFO [main]  - The installer updates dfc.properties file.
15:12:43,874  INFO [main] com.documentum.install.shared.common.services.dfc.DiDfcProperties - Installer is changing primary as backup and backup as primary.
15:12:43,876  INFO [main]  - The installer is creating folders for the selected repository.
15:12:43,876  INFO [main]  - Checking if cfs is being installed on the primary server...
15:12:43,877  INFO [main]  - CFS is not being installed on the primary server
15:12:43,877  INFO [main]  - Installer creates necessary directory structure.
15:12:43,879  INFO [main]  - Installer copies aek.key, server.ini, dbpasswd.txt and webcache.ini files from primary server.
15:12:43,881  INFO [main]  - Installer executes dm_rcs_copyfiles.ebs to get files from primary server
15:12:56,295  INFO [main]  - $DOCUMENTUM/dba/config/DocBase1/dbpasswd.txt has been created successfully
15:12:56,302  INFO [main]  - $DOCUMENTUM/dba/config/DocBase1/webcache.ini has been created successfully
15:12:56,305  INFO [main]  - Installer found exising file $DOCUMENTUM/dba/secure/lockbox.lb
15:12:56,305  INFO [main]  - Installer renamed exising file $DOCUMENTUM/dba/secure/lockbox.lb to $DOCUMENTUM/dba/secure/lockbox.lb.bak.3
15:12:56,306  INFO [main]  - $DOCUMENTUM/dba/secure/lockbox.lb has been created successfully
15:12:56,927  INFO [main]  - $DOCUMENTUM/dba/config/DocBase1/server_content_server_03_DocBase1.ini has been created successfully
15:12:56,928  INFO [main]  - Installer found exising file $DOCUMENTUM/dba/castore_license
15:12:56,928  INFO [main]  - Installer renamed exising file $DOCUMENTUM/dba/castore_license to $DOCUMENTUM/dba/castore_license.bak.3
15:12:56,928  INFO [main]  - $DOCUMENTUM/dba/castore_license has been created successfully
15:12:56,931  INFO [main]  - $DOCUMENTUM/dba/config/DocBase1/ldap_080f123450006deb.cnt has been created successfully
15:12:56,934  INFO [main]  - Installer updates server.ini
15:12:56,940  INFO [main]  - The installer tests database connection.
15:12:57,675  INFO [main]  - Database successfully opened.
Test table successfully created.
Test view successfully created.
Test index successfully created.
Insert into table successfully done.
Index successfully dropped.
View successfully dropped.
Database case sensitivity test successfully past.
Table successfully dropped.
15:13:00,675  INFO [main]  - The installer creates server config object.
15:13:00,853  INFO [main]  - The installer is starting a process for the repository.
15:13:01,993  INFO [main] com.documentum.install.multinode.cfs.installanywhere.actions.DiWAServerCreateContentFileServerPostSeq - logPath is $DOCUMENTUM/dba/log/content_server_03_DocBase1.log
15:13:03,079  INFO [main] com.documentum.install.multinode.cfs.installanywhere.actions.DiWAServerCreateContentFileServerPostSeq - logPath is $DOCUMENTUM/dba/log/content_server_03_DocBase1.log
15:13:04,149  INFO [main] com.documentum.install.multinode.cfs.installanywhere.actions.DiWAServerCreateContentFileServerPostSeq - logPath is $DOCUMENTUM/dba/log/content_server_03_DocBase1.log
15:13:05,187  INFO [main] com.documentum.install.multinode.cfs.installanywhere.actions.DiWAServerCreateContentFileServerPostSeq - logPath is $DOCUMENTUM/dba/log/content_server_03_DocBase1.log
15:13:06,256  INFO [main] com.documentum.install.multinode.cfs.installanywhere.actions.DiWAServerCreateContentFileServerPostSeq - logPath is $DOCUMENTUM/dba/log/content_server_03_DocBase1.log
15:14:06,352  INFO [main]  - Waiting for repository DocBase1.content_server_03_DocBase1 to start up.
15:14:25,003  INFO [main] com.documentum.fc.client.impl.connection.docbase.DocbaseConnection - Object protocol version 2
15:14:25,495  INFO [main] com.documentum.fc.client.security.impl.JKSKeystoreUtilForDfc - keystore file name is /tmp/655905.tmp/dfc.keystore
15:14:25,498  INFO [main] com.documentum.fc.client.security.impl.JKSKeystoreUtilForDfc - keystore file name is /tmp/655905.tmp/dfc.keystore
15:14:25,513  INFO [main] com.documentum.fc.client.security.impl.DfcIdentityPublisher - found client registration: true
15:14:25,672  INFO [main] com.documentum.fc.client.security.impl.DfcRightsCreator - assigning rights to all roles for this client on DocBase1
15:14:25,682  INFO [main] com.documentum.fc.client.security.impl.DfcRightsCreator - found client rights: false
15:14:25,736  INFO [main] com.documentum.fc.client.privilege.impl.PublicKeyCertificate - stored certificate for CN
15:14:25,785  INFO [main] com.documentum.fc.client.security.impl.IpAndRcHelper - filling in DocBase1 a new record with this persistent certificate:
-----BEGIN CERTIFICATE-----
MIIDHzCCAgcCELGIh8FYcycggMmImLESjEYwDQYJKoZIhvcNAQELBQAwTjETMBEG
YXZxbFJuN1lRZFlUTXRQNnBWNnpRY3JBYTAeFw0xOTAyMDExNTA3MzNaFw0yOTAx
MjkxNTEyMzNaME4xEzARBgNVBAsMCkRvY3VtZW50dW0xDDAKBgNVBAoMA0VNQzEp
hKnQmaMo/wCv+QXZTCsitrBNvoomcT82mYzwIxV5/7cPCIHHMcJijsJCtunjiucV
MCcGA1UEAwwgZGZjX1VuSWF2cWxSbjdZUWRZVE10UDZwVjZ6UWNyQWEwggEiMA0G
HcL0KUImSV7owDqKzV3lEYCGdomX4gYTI5bMKAiTEuGyWRKw2YTQGhfp5y0mU0hV
ORTYyRoGjpRUuXWpdrsrbX8g8gD9l6ijWTSIWfTGO/7//mTHp2zwp/TiIEuAS+RA
eFw1pBLSCKneYgquMuiyFfuCfBVNY5Q0MzyPHYxrDAp4CtjasIrNT5h3AgMBAAEw
CSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQC4Hli+niUAD0ksVVWocPnvzV10ZOj2
DQYJKoZIhvcNAQELBQADggEBAEAre45NEpqzGMMYX1zpjgib9wldSmiPVDZbhj17
KnUCgDy7FhFQ5U5w6wf2iO9UxGV42AYQe2TjED0EbYwpYB8DC970J2ZrjZRFMy/Y
A1UECwwKRG9jdW1lbnR1bTEMMAoGA1UECgwDRU1DMSkwJwYDVQQDDCBkZmNfVW5J
gwKynVf9O10GQP0a8Z6Fr3jrtCEzfLjOXN0VxEcgwOEKRWHM4auxjevqGCPegD+y
FVWwylyIsMRsC9hOxoNHZPrbhk3N9Syhqsbl+Z9WXG0Sp4uh1z5R1NwVhR7YjZkF
19cfN8uEHqedJo26lq7oFF2KLJ+/8sWrh2a6lrb4fNXYZIAaYKjAjsUzcejij8en
Rd8yvghCc4iwWvpiRg9CW0VF+dXg6KkQmaFjiGrVosskUjuACHncatiYC5lDNJy+
TDdnNWYlctfWcT8WL/hX6FRGedT9S30GShWJNobM9vECoNg=
-----END CERTIFICATE-----
15:14:25,789  INFO [main] com.documentum.fc.client.security.impl.DfcIdentityPublisher - found client registration: true
15:14:25,802  INFO [main] com.documentum.fc.client.security.impl.DfcRightsCreator - found client rights: false
15:14:25,981  INFO [main] com.documentum.fc.client.security.impl.IpAndRcHelper - filling a new rights record for dfc_UnYQdYTP6pV6zRn7tQMIavqlcrAa
15:14:26,032  INFO [main] com.documentum.fc.client.security.impl.DfcRightsCreator - [DFC_SECURITY_DOCBASE_RIGHTS_REGISTER] this dfc instance has now escalation rights registered with docbase DocBase1
15:14:26,052  INFO [main] com.documentum.install.appserver.jboss.JbossApplicationServer - setApplicationServer sharedDfcLibDir is:$DOCUMENTUM/shared/dfc
15:14:26,052  INFO [main] com.documentum.install.appserver.jboss.JbossApplicationServer - getFileFromResource for templates/appserver.properties
15:14:26,059  INFO [main] com.documentum.install.server.installanywhere.actions.DiWAServerAddDocbaseEntryToWebXML - BPM webapp does not exist.
15:14:26,191  INFO [main] com.documentum.install.server.installanywhere.actions.cfs.DiWAServerProcessingScripts2 - Executing the Docbase HeadStart script.
15:14:36,202  INFO [main] com.documentum.install.server.installanywhere.actions.cfs.DiWAServerProcessingScripts2 - Executing the Creates ACS config object script.
15:14:46,688  INFO [main] com.documentum.install.server.installanywhere.actions.cfs.DiWAServerProcessingScripts2 - Executing the This script does miscellaneous setup tasks for remote content servers script.
15:14:56,840 ERROR [main] com.documentum.install.server.installanywhere.actions.cfs.DiWAServerProcessingScripts2 - The installer failed to execute the This script does miscellaneous setup tasks for remote content servers script. For more information, please read output file: $DOCUMENTUM/dba/config/DocBase1/dm_rcs_setup.out.
com.documentum.install.shared.common.error.DiException: The installer failed to execute the This script does miscellaneous setup tasks for remote content servers script. For more information, please read output file: $DOCUMENTUM/dba/config/DocBase1/dm_rcs_setup.out.
        at com.documentum.install.server.installanywhere.actions.cfs.DiWAServerProcessingScripts2.setup(DiWAServerProcessingScripts2.java:98)
        at com.documentum.install.shared.installanywhere.actions.InstallWizardAction.install(InstallWizardAction.java:75)
        at com.zerog.ia.installer.actions.CustomAction.installSelf(Unknown Source)
        at com.zerog.ia.installer.AAMgrBase.an(Unknown Source)
        at com.zerog.ia.installer.ConsoleBasedAAMgr.ac(Unknown Source)
        at com.zerog.ia.installer.AAMgrBase.am(Unknown Source)
        at com.zerog.ia.installer.AAMgrBase.runNextInstallPiece(Unknown Source)
        ...
        at com.zerog.ia.installer.ConsoleBasedAAMgr.ac(Unknown Source)
        at com.zerog.ia.installer.AAMgrBase.runPreInstall(Unknown Source)
        at com.zerog.ia.installer.LifeCycleManager.consoleInstallMain(Unknown Source)
        at com.zerog.ia.installer.LifeCycleManager.executeApplication(Unknown Source)
        at com.zerog.ia.installer.Main.main(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.zerog.lax.LAX.launch(Unknown Source)
        at com.zerog.lax.LAX.main(Unknown Source)
15:14:56,843  INFO [main]  - The INSTALLER_UI value is SILENT
15:14:56,843  INFO [main]  - The KEEP_TEMP_FILE value is true
15:14:56,843  INFO [main]  - The common.installOwner.password value is ******
15:14:56,843  INFO [main]  - The SERVER.SECURE.ROOT_PASSWORD value is ******
15:14:56,843  INFO [main]  - The common.upgrade.aek.lockbox value is null
15:14:56,843  INFO [main]  - The common.old.aek.passphrase.password value is null
15:14:56,843  INFO [main]  - The common.aek.algorithm value is AES_256_CBC
15:14:56,843  INFO [main]  - The common.aek.passphrase.password value is ******
15:14:56,843  INFO [main]  - The common.aek.key.name value is CSaek
15:14:56,843  INFO [main]  - The common.use.existing.aek.lockbox value is null
15:14:56,843  INFO [main]  - The SERVER.ENABLE_LOCKBOX value is true
15:14:56,844  INFO [main]  - The SERVER.LOCKBOX_FILE_NAME value is lockbox.lb
15:14:56,844  INFO [main]  - The SERVER.LOCKBOX_PASSPHRASE.PASSWORD value is ******
15:14:56,844  INFO [main]  - The SERVER.COMPONENT_ACTION value is CREATE
15:14:56,844  INFO [main]  - The SERVER.DOCBROKER_ACTION value is null
15:14:56,844  INFO [main]  - The SERVER.PRIMARY_CONNECTION_BROKER_HOST value is content_server_01.dbi-services.com
15:14:56,844  INFO [main]  - The SERVER.PRIMARY_CONNECTION_BROKER_PORT value is 1489
15:14:56,844  INFO [main]  - The SERVER.PROJECTED_CONNECTION_BROKER_HOST value is content_server_03.dbi-services.com
15:14:56,844  INFO [main]  - The SERVER.PROJECTED_CONNECTION_BROKER_PORT value is 1489
15:14:56,844  INFO [main]  - The SERVER.FQDN value is content_server_03.dbi-services.com
15:14:56,845  INFO [main]  - The SERVER.DOCBASE_NAME value is DocBase1
15:14:56,845  INFO [main]  - The SERVER.PRIMARY_SERVER_CONFIG_NAME value is DocBase1
15:14:56,845  INFO [main]  - The SERVER.REPOSITORY_USERNAME value is dmadmin
15:14:56,845  INFO [main]  - The SERVER.SECURE.REPOSITORY_PASSWORD value is ******
15:14:56,845  INFO [main]  - The SERVER.REPOSITORY_USER_DOMAIN value is
15:14:56,845  INFO [main]  - The SERVER.REPOSITORY_USERNAME_WITH_DOMAIN value is dmadmin
15:14:56,845  INFO [main]  - The SERVER.REPOSITORY_HOSTNAME value is content_server_01.dbi-services.com
15:14:56,845  INFO [main]  - The SERVER.CONNECTION_BROKER_NAME value is null
15:14:56,845  INFO [main]  - The SERVER.CONNECTION_BROKER_PORT value is null
15:14:56,846  INFO [main]  - The SERVER.DOCBROKER_NAME value is
15:14:56,846  INFO [main]  - The SERVER.DOCBROKER_PORT value is
15:14:56,846  INFO [main]  - The SERVER.DOCBROKER_CONNECT_MODE value is null
15:14:56,846  INFO [main]  - The SERVER.USE_CERTIFICATES value is false
15:14:56,846  INFO [main]  - The SERVER.DOCBROKER_KEYSTORE_FILE_NAME value is null
15:14:56,846  INFO [main]  - The SERVER.DOCBROKER_KEYSTORE_PASSWORD_FILE_NAME value is null
15:14:56,846  INFO [main]  - The SERVER.DOCBROKER_CIPHER_LIST value is null
15:14:56,853  INFO [main]  - The SERVER.DFC_SSL_TRUSTSTORE value is null
15:14:56,853  INFO [main]  - The SERVER.DFC_SSL_TRUSTSTORE_PASSWORD value is ******
15:14:56,853  INFO [main]  - The SERVER.DFC_SSL_USE_EXISTING_TRUSTSTORE value is null
15:14:56,853  INFO [main]  - The SERVER.CONNECTION_BROKER_SERVICE_STARTUP_TYPE value is null
15:14:56,854  INFO [main]  - The SERVER.DOCUMENTUM_DATA value is $DATA
15:14:56,854  INFO [main]  - The SERVER.DOCUMENTUM_SHARE value is $DOCUMENTUM/share
15:14:56,854  INFO [main]  - The CFS_SERVER_CONFIG_NAME value is content_server_03_DocBase1
15:14:56,854  INFO [main]  - The SERVER.DOCBASE_SERVICE_NAME value is DocBase1
15:14:56,854  INFO [main]  - The CLIENT_CERTIFICATE value is null
15:14:56,854  INFO [main]  - The RKM_PASSWORD value is ******
15:14:56,854  INFO [main]  - The SERVER.DFC_BOF_GLOBAL_REGISTRY_VALIDATE_OPTION_IS_SELECTED value is null
15:14:56,854  INFO [main]  - The SERVER.PROJECTED_DOCBROKER_PORT_OTHER value is null
15:14:56,854  INFO [main]  - The SERVER.PROJECTED_DOCBROKER_HOST_OTHER value is null
15:14:56,854  INFO [main]  - The SERVER.GLOBAL_REGISTRY_REPOSITORY value is null
15:14:56,854  INFO [main]  - The SERVER.BOF_REGISTRY_USER_LOGIN_NAME value is null
15:14:56,855  INFO [main]  - The SERVER.SECURE.BOF_REGISTRY_USER_PASSWORD value is ******
15:14:56,855  INFO [main]  - The SERVER.COMPONENT_ACTION value is CREATE
15:14:56,855  INFO [main]  - The SERVER.COMPONENT_NAME value is null
15:14:56,855  INFO [main]  - The SERVER.DOCBASE_NAME value is DocBase1
15:14:56,855  INFO [main]  - The SERVER.CONNECTION_BROKER_NAME value is null
15:14:56,855  INFO [main]  - The SERVER.CONNECTION_BROKER_PORT value is null
15:14:56,855  INFO [main]  - The SERVER.PROJECTED_CONNECTION_BROKER_HOST value is content_server_03.dbi-services.com
15:14:56,855  INFO [main]  - The SERVER.PROJECTED_CONNECTION_BROKER_PORT value is 1489
15:14:56,855  INFO [main]  - The SERVER.PRIMARY_SERVER_CONFIG_NAME value is DocBase1
15:14:56,855  INFO [main]  - The SERVER.DOCBROKER_NAME value is
15:14:56,856  INFO [main]  - The SERVER.DOCBROKER_PORT value is
15:14:56,856  INFO [main]  - The SERVER.CONNECTION_BROKER_SERVICE_STARTUP_TYPE value is null
15:14:56,856  INFO [main]  - The SERVER.REPOSITORY_USERNAME value is dmadmin
15:14:56,856  INFO [main]  - The SERVER.REPOSITORY_PASSWORD value is ******
15:14:56,856  INFO [main]  - The SERVER.REPOSITORY_USER_DOMAIN value is
15:14:56,856  INFO [main]  - The SERVER.REPOSITORY_USERNAME_WITH_DOMAIN value is dmadmin
15:14:56,856  INFO [main]  - The SERVER.DFC_BOF_GLOBAL_REGISTRY_VALIDATE_OPTION_IS_SELECTED_KEY value is null
15:14:56,856  INFO [main]  - The SERVER.PROJECTED_DOCBROKER_PORT_OTHER value is null
15:14:56,856  INFO [main]  - The SERVER.PROJECTED_DOCBROKER_HOST_OTHER value is null
15:14:56,856  INFO [main]  - The SERVER.GLOBAL_REGISTRY_REPOSITORY value is null
15:14:56,856  INFO [main]  - The SERVER.BOF_REGISTRY_USER_LOGIN_NAME value is null
15:14:56,856  INFO [main]  - The SERVER.SECURE.BOF_REGISTRY_USER_PASSWORD value is ******
15:14:56,856  INFO [main]  - The SERVER.COMPONENT_ACTION value is CREATE
15:14:56,857  INFO [main]  - The SERVER.COMPONENT_NAME value is null
15:14:56,857  INFO [main]  - The SERVER.PRIMARY_SERVER_CONFIG_NAME value is DocBase1
15:14:56,857  INFO [main]  - The SERVER.DOCBASE_NAME value is DocBase1
15:14:56,857  INFO [main]  - The SERVER.REPOSITORY_USERNAME value is dmadmin
15:14:56,857  INFO [main]  - The SERVER.REPOSITORY_PASSWORD value is ******
15:14:56,857  INFO [main]  - The SERVER.REPOSITORY_USER_DOMAIN value is
15:14:56,857  INFO [main]  - The SERVER.REPOSITORY_USERNAME_WITH_DOMAIN value is dmadmin
15:14:56,857  INFO [main]  - The env PATH value is: /usr/xpg4/bin:$DOCUMENTUM/shared/java64/JAVA_LINK/bin:$DM_HOME/bin:$DOCUMENTUM/dba:$ORACLE_HOME/bin:$DOCUMENTUM/shared/java64/JAVA_LINK/bin:$DM_HOME/bin:$DOCUMENTUM/dba:$ORACLE_HOME/bin:$DM_HOME/bin:$ORACLE_HOME/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/dmadmin/bin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin
[dmadmin@content_server_03 logs]$

 

As you can see above, everything was going well until the script “This script does miscellaneous setup tasks for remote content servers” was executed. Yes, that is a hell of a description, isn't it? What this script actually does is run the “dm_rcs_setup.ebs” script (you can find it under $DM_HOME/install/admin/) against the repository to set up the remote jobs, project the RCS/CFS repository to the local docbroker, create the log folder and a few other things. Here was the content of the output file for the execution of this EBS:

[dmadmin@content_server_03 logs]$ cat $DOCUMENTUM/dba/config/DocBase1/dm_rcs_setup.out
Running dm_rcs_setup.ebs script on docbase DocBase1.content_server_03_DocBase1 to set up jobs for a remote content server.
docbaseNameOnly = DocBase1
Connected To DocBase1.content_server_03_DocBase1
$DOCUMENTUM/dba/log/000f1234/sysadmin was created.
Duplicating distributed jobs.
Creating job object for dm_ContentWarningcontent_server_03_DocBase1
Successfully created job object for dm_ContentWarningcontent_server_03_DocBase1
Creating job object for dm_LogPurgecontent_server_03_DocBase1
Successfully created job object for dm_LogPurgecontent_server_03_DocBase1
Creating job object for dm_ContentReplicationcontent_server_03_DocBase1
Successfully created job object for dm_ContentReplicationcontent_server_03_DocBase1
Creating job object for dm_DMCleancontent_server_03_DocBase1
The dm_DMClean job does not exist at the primary server so we will not create it at the remote site, either.
Failed to create job object for dm_DMCleancontent_server_03_DocBase1
[DM_API_E_BADID]error:  "Bad ID given: 0000000000000000"

[DM_API_E_BADID]error:  "Bad ID given: 0000000000000000"

[DM_API_E_BADID]error:  "Bad ID given: 0000000000000000"

[DM_API_E_BADID]error:  "Bad ID given: 0000000000000000"

[DM_API_E_NO_MATCH]error:  "There was no match in the docbase for the qualification: dm_job where object_name = 'dm_DMClean' and lower(target_server) like lower('DocBase1.DocBase1@%')"


Exiting with return code (-1)
[dmadmin@content_server_03 logs]$
[dmadmin@content_server_03 logs]$

 

The RCS/CFS installation fails because the creation of a remote job cannot complete successfully. It works properly for 3 of the 5 remote jobs but not for the 2 remaining ones. Only one of them shows up in the log file because, having already failed, the installer stopped right there and never tried to process the 2nd one. That's why the start/stop scripts were there, the log folder was there and the dm_server_config was OK, yet some pieces were still missing.

The issue here is that the RCS/CFS installation isn't able to find the r_object_id of the “dm_DMClean” job (hence the “Bad ID given: 0000000000000000” errors) and therefore it isn't able to create the remote job. The last message is actually more interesting: “There was no match in the docbase for the qualification: dm_job where object_name = 'dm_DMClean' and lower(target_server) like lower('DocBase1.DocBase1@%')”.

The RCS/CFS installation is looking for the job named 'dm_DMClean', which is fine, but it also filters on a target_server equal to 'docbase_name.server_config_name@…' and, here, that query returns no result.

 

So what happened? As mentioned in the introduction, this environment had already been installed in HA several years ago. As a result, the jobs had already been configured by us the way we expect them to be. We usually configure the jobs as follows (I'm only talking about the distributed jobs here):

Job Name on CS1          Job Status on CS1    Job Name on RCS%          Job Status on RCS%
dm_ContentWarning        Active               dm_ContentWarning%        Inactive
dm_LogPurge              Active               dm_LogPurge%              Active
dm_DMClean               Active               dm_DMClean%               Inactive
dm_DMFilescan            Active               dm_DMFilescan%            Inactive
dm_ContentReplication    Inactive             dm_ContentReplication%    Inactive

Based on this, we usually disable dm_ContentReplication completely (if it's not needed) and we obviously leave all the dm_LogPurge jobs enabled, each with its target_server set to the local CS it is supposed to run on (so one job per CS). For the three remaining jobs, it depends on the load of the environment: they can be bound to CS1 by setting their target_server to 'DocBase1.DocBase1@content_server_01.dbi-services.com', or they can be set to run on ANY Content Server by using an empty target_server (a single space: ' '). It doesn't matter where these jobs run, but it does matter that they run, so the ANY setting is usually better since it doesn't tie them to a single point of failure.
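As a hedged illustration only (the password is a placeholder and the job list is to adapt; the statement is a plain DQL UPDATE run through idql), setting those jobs to run on ANY Content Server boils down to giving them a single-space target_server:

idql DocBase1 -Udmadmin -Pxxxx <<'EOF'
UPDATE dm_job OBJECTS
SET target_server = ' '
WHERE object_name IN ('dm_ContentWarning', 'dm_DMClean', 'dm_DMFilescan')
go
EOF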

So the RCS/CFS installation failed because we had configured our jobs properly… Funny, right? As you could see in the logs, the dm_ContentWarning remote job was created properly, but only because someone had been testing this job and had temporarily set it to run on CS1 only; when the installer checked it, it was pure luck that it could find it.

After this failure, there isn't much left to do normally except creating the JMS config object, checking the ACS URLs and finally restarting the JMS. Still, it is cleaner to simply remove the RCS/CFS, clean up the repository objects that remain (the distributed jobs that were created) and then reinstall the RCS/CFS after setting the jobs the way the installer expects them to be…
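Before that cleanup, a quick way to see which remote jobs the failed run already created is to list them by their naming pattern (hedged sketch; the pattern is the one visible in the log above and the password is a placeholder):

idql DocBase1 -Udmadmin -Pxxxx <<'EOF'
SELECT object_name, target_server, is_inactive
FROM dm_job
WHERE object_name LIKE '%content_server_03_DocBase1'
go
EOF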

 

The article Documentum – RCS/CFS installation failure appeared first on Blog dbi services.

How to stop Documentum processes in a docker container, and more (part I)

Sat, 2019-04-13 07:00
How to stop Documentum processes in a docker container, and more

Ideally, but not mandatorily, the management of Documentum processes is performed at the service level, e.g. by systemd. In my blog here, I showed how to configure init files for Documentum under systemd. But containers don't have systemd, yet. They just run processes, often only one, sometimes more if they are closely related (e.g. the docbroker, the method server and the content servers). So how can we replicate the same functionality with containers?
The topic of stopping processes in a docker container is abundantly discussed on-line (see for example the excellent article here). O/S signals are the magic solution, so much so that I should really have entitled this blog “Fun with the Signals”!
Here, I'll simply check whether the presented approach can be applied to the particular case of a dockerized Documentum server. However, in order to keep things simple and quick, I won't test a real dockerized Documentum installation but rather use a script that simulates the Documentum processes, or any other processes for that matter, since it is completely generic.
But first, why bother with this matter? During all the years I have been administrating repositories, I've never noticed anything going wrong after restarting a suddenly stopped server, be it after an intentional kill, a pesky crash or an unplanned database unavailability. Evidently, the content server (CS henceforth) is quite robust in this respect. Or maybe we were simply lucky so far. Personally, I don't feel confident if I don't cleanly shut down a process or service that must be stopped; some data might still be buffered in the CS' memory and not flushing it properly might introduce inconsistencies or even corruption. The same goes when a multi-step operation is under way and gets aborted abruptly in the middle; ideally, transactions, if they are used, exist for this purpose, but anything can go wrong during a rollback. Killing a process is like slamming a door: it produces a lot of noise, vibrations in the walls, even damage in the long run, and it always leaves a bad impression behind. Isn't it more comforting to clean up and shut the door gently? Even then something can go wrong, but at least it will be through no fault of our own.

A few Reminders

When a “docker container stop” is issued, docker sends the SIGTERM signal to the process with PID == 1 running inside the container. That process, if programmed to do so, can then react to the signal and do whatever is seen fit, typically shutting the running processes down cleanly. After a 10-second grace period, the container is stopped manu militari. Documentum processes, to put it politely, don't give a hoot about signals, except of course the well-known, unceremonious SIGKILL. Thus, a proxy process must be introduced which will accept the signal and invoke the proper shutdown scripts to stop the CS processes, usually the dm_shutdown_* and dm_stop_* scripts, or a generic one that takes care of everything at start up and shut down time.
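To give an idea, and only as a hedged sketch (the repository and docbroker script names below are the usual ones but depend on the actual installation), the SIGTERM handler of such a proxy for a real CS could boil down to:

# hedged sketch of a proxy's SIGTERM handler for a real content server (names are illustrative)
shutdown_all() {
   $DOCUMENTUM/dba/dm_shutdown_DocBase1     # stop the repository
   $DOCUMENTUM/dba/dm_stop_Docbroker        # stop the docbroker
   exit 0                                   # PID 1 exits, and so does the container
}
trap 'shutdown_all' SIGTERM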
Said proxy must run with PID == 1, i.e. it must be the first process started in the container. Sort of: even if it is not the very first, its PID 1 parent can hand control over to it by using one of the exec() family of functions; unlike forking, exec lets a process replace its own image with another program while keeping its PID, kind of like the agents Smith in the Matrix movies injecting themselves into someone else's persona, if you will ;-). The main thing is that at some point the proxy becomes PID 1. Luckily for us, we don't have to bother with this complexity since the dockerfile's ENTRYPOINT clause takes care of everything.
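For example, if some preparation must happen first, a tiny wrapper declared as the entrypoint can do its work and then hand PID 1 over to the proxy via exec (a sketch; the preparation step is purely hypothetical):

#!/bin/bash
# illustrative entrypoint wrapper: one-off preparation, then become the proxy
chown -R dmadmin:dmadmin /app/dctm 2>/dev/null   # hypothetical preparation step
exec /root/dctm.sh start                         # the wrapper's PID (1) now belongs to the proxy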
The proxy will also be the one that starts the CS. In addition, since it must wait for the SIGTERM signal, it must never exit. It can wait indefinitely on a fake input (e.g. tail -f /dev/null), or wait for an illusory input in a detached container (e.g. while true; do read; done) or, better yet, do something useful like some light-weight monitoring.
While at it, the proxy process can listen to several conventional signals and react accordingly. For instance, a SIGUSR1 could mean “give me a docbroker docbase map” and a SIGUSR2 “restart the method server”. Admittedly, these actions could be performed directly by executing the relevant commands inside the container or from the outside command-line, but the signal way is cheaper and, OK, funnier. So, let's see how we can set all this up!

The implementation

As said, in order to focus on our topic, i.e. signal trapping, we’ve replaced the CS part with a simple simulation script, dctm.sh, that starts, stops and queries the status of dummy processes. It uses the bash shell and has been written under linux. Here it is:

#!/bin/bash
# launches in the background, or stops or queries the status of, a command with a conventional identification string in the prompt;
# the id is a random number determined during the start;
# it should be passed to enquiry the status of the started process or to stop it;
# Usage:
#   ./dctm.sh [start]
# the script starts a few dummy processes, displays a help screen and then waits for signals;
# stopping or querying individual processes is done by the internal helper func() through
# the start | stop <id> | status <id> actions invoked from the signal handlers;
# e.g.:
# $ ./dctm.sh start
# | process started with pid 13562 and random value 33699963
# $ ps -ef | grep 33699963
# | docker   13562     1  0 23:39 pts/0    00:00:00 I am number 33699963
#
# cec - dbi-services - April 2019
#
trap 'help'         SIGURG
trap 'start_all'    SIGPWR
trap 'start_one'    SIGUSR1
trap 'status_all'   SIGUSR2
trap 'stop_all'     SIGINT SIGABRT
trap 'shutdown_all' SIGHUP SIGQUIT SIGTERM

verb="sleep"
export id_prefix="I am number"

func() {
   cmd="$1"
   case $cmd in
      start)
         # do something that sticks forever-ish, min ca. 20mn;
         (( params = 1111 * $RANDOM ))
         exec -a "$id_prefix" $verb $params &
         echo "process started with pid $! and random value $params"
         ;;
      stop)
         params=" $2"
         pid=$(ps -ajxf | gawk -v params="$params" '{if (match($0, " " ENVIRON["id_prefix"] params "$")) pid = $2} END {print (pid ? pid : "")}')
         if [[ ! -z $pid ]]; then
            kill -9 ${pid} &> /dev/null
            wait ${pid} &> /dev/null
         fi
         ;;
      status)
         params=" $2"
         read pid gid < <(ps -ajxf | gawk -v params="$params" '{if (match($0, " " ENVIRON["id_prefix"] params "$")) pid = $2 " " $3} END {print (pid ? pid : "")}')
         if [[ ! -z $pid ]]; then
            echo "random value${params} is used by process with pid $pid and pgid $gid"
         else
            echo "no such process running"
         fi
         ;;
   esac
}

help() {
   echo
   echo "send signal SIGURG for help"
   echo "send signal SIGPWR to start a few processes"
   echo "send signal SIGUSR1 to start a new process"
   echo "send signal SIGUSR2 for the list of started processes"
   echo "send signal SIGINT | SIGABRT  to stop all the processes"
   echo "send signal SIGHUP | SIGQUIT | SIGTERM to shutdown the processes and exit the container"
}

start_all() {
   echo; echo "starting a few processes at $(date +"%Y/%m/%d %H:%M:%S")"
   for loop in $(seq 5); do
      func start
   done

   # show them;
   echo; echo "started processes"
   ps -ajxf | grep "$id_prefix" | grep -v grep
}

start_one() {
   echo; echo "starting a new process at $(date +"%Y/%m/%d %H:%M:%S")"
   func start
}

status_all() {
   echo; echo "status of running processes at $(date +"%Y/%m/%d %H:%M:%S")"
   nb_processes=0
   for no in $(ps -ef | grep "I am number " | grep -v grep | gawk '{print $NF}'); do
      echo "showing $no"
      func status $no
      (( nb_processes++ ))
   done
   echo "$nb_processes processes found"
}

stop_all() {
   echo; echo "shutting down the processes at $(date +"%Y/%m/%d %H:%M:%S")"
   nb_processes=0
   for no in $(ps -ef | grep "I am number " | grep -v grep | gawk '{print $NF}'); do
      echo "stopping $no"
      func stop $no
      (( nb_processes++ ))
   done
   echo "$nb_processes processes stopped"
}

shutdown_all() {
   echo; echo "shutting down the container at $(date +"%Y/%m/%d %H:%M:%S")"
   stop_all
   exit 0
}

# -----------
# main;
# -----------

# starts a few dummy processes;
start_all

# display some usage explanation;
help

# make sure the container stays up and waits for signals;
while true; do read; done

The main part of the script starts a few processes, displays a help screen and then waits for input from stdin.
The script can be first tested outside a container as follows.
Run the script:

./dctm.sh

It will start a few easily distinguishable processes and display a help screen:

starting a few processes at 2019/04/06 16:05:35
process started with pid 17621 and random value 19580264
process started with pid 17622 and random value 19094757
process started with pid 17623 and random value 18211512
process started with pid 17624 and random value 3680743
process started with pid 17625 and random value 18198180
 
started processes
17619 17621 17619 1994 pts/0 17619 S+ 1000 0:00 | \_ I am number 19580264
17619 17622 17619 1994 pts/0 17619 S+ 1000 0:00 | \_ I am number 19094757
17619 17623 17619 1994 pts/0 17619 S+ 1000 0:00 | \_ I am number 18211512
17619 17624 17619 1994 pts/0 17619 S+ 1000 0:00 | \_ I am number 3680743
17619 17625 17619 1994 pts/0 17619 S+ 1000 0:00 | \_ I am number 18198180
 
send signal SIGURG for help
send signal SIGPWR to start a few processes
send signal SIGUSR1 to start a new process
send signal SIGUSR2 for the list of started processes
send signal SIGINT | SIGABRT to stop all the processes
send signal SIGHUP | SIGQUIT | SIGTERM to shutdown the processes and exit the container

Then, it will simply sit there and wait until it is asked to quit.
From another terminal, let’s check the started processes:

ps -ef | grep "I am number " | grep -v grep
docker 17621 17619 0 14:40 pts/0 00:00:00 I am number 19580264
docker 17622 17619 0 14:40 pts/0 00:00:00 I am number 19094757
docker 17623 17619 0 14:40 pts/0 00:00:00 I am number 18211512
docker 17624 17619 0 14:40 pts/0 00:00:00 I am number 3680743
docker 17625 17619 0 14:40 pts/0 00:00:00 I am number 18198180

Those processes could be Documentum ones or anything else; the point here is to control them from the outside, e.g. from another terminal session, in or out of a docker container. We will do that through O/S signals. The bash shell lets a script listen and react to signals through the trap command. At the top of the script, we have listed all the signals we'd like the script to react upon:

trap 'help'         SIGURG
trap 'start_all'    SIGPWR
trap 'start_one'    SIGUSR1
trap 'status_all'   SIGUSR2
trap 'stop_all'     SIGINT SIGABRT
trap 'shutdown_all' SIGHUP SIGQUIT SIGTERM

It's really a feast of traps!
The first line, for example, says that on receiving the SIGURG signal, the script's function help() should be executed, no matter what the script was doing at that time, which in our case is just waiting for input from stdin.
The SIGPWR signal is interpreted as a request to start another batch of five background processes with the same naming convention, “I am number ” followed by a random number. The function start_all() is called on receiving this signal.
The SIGUSR1 signal starts one new process in the background. Function start_one() does just that.
The SIGUSR2 signal displays all the processes started so far by invoking function status_all().
The SIGINT and SIGABRT signals shut down all the processes started so far. Function stop_all() is called for this purpose.
Finally, the signals SIGHUP, SIGQUIT and SIGTERM all invoke function shutdown_all() to stop all the processes and exit the script.
Admittedly, this choice of signals is a bit of a stretch, but it is for the sake of the demonstration, so bear with us. Feel free to remap the signals to the functions any way you prefer.
Now, how do we send those signals? The ill-named kill command is here for this. Despite its name, nobody will be killed here, fortunately; signals are merely sent and the processes decide how to react to them. Here, of course, we do react opportunely.
Here is its syntax (let's use the --long-options for clarity):

/bin/kill --signal signal_name pid

Since bash has a built-in kill command that behaves differently, make sure to call the right program by specifying its full path name, /bin/kill.
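As a quick sanity check, type -a lists both versions (typical output shown; the external binary may live under another path such as /usr/bin/kill depending on the distribution):

type -a kill
kill is a shell builtin
kill is /bin/kill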
Example of use:

/bin/kill --signal SIGURG $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
# or shorter:
/bin/kill --signal SIGURG $(pgrep ^dctm.sh$)

The signal's target is our test program dctm.sh, which, as far as kill is concerned, is identified by its PID.
Signals can be specified by their full name, e.g. SIGURG, SIGPWR, etc., without the SIG prefix, such as URG, PWR, etc., or even through their numeric value, as shown below:

/bin/kill -L
1 HUP 2 INT 3 QUIT 4 ILL 5 TRAP 6 ABRT 7 BUS
8 FPE 9 KILL 10 USR1 11 SEGV 12 USR2 13 PIPE 14 ALRM
15 TERM 16 STKFLT 17 CHLD 18 CONT 19 STOP 20 TSTP 21 TTIN
22 TTOU 23 URG 24 XCPU 25 XFSZ 26 VTALRM 27 PROF 28 WINCH
29 POLL 30 PWR 31 SYS
 
or:
 
kill -L
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL 5) SIGTRAP
6) SIGABRT 7) SIGBUS 8) SIGFPE 9) SIGKILL 10) SIGUSR1
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM
16) SIGSTKFLT 17) SIGCHLD 18) SIGCONT 19) SIGSTOP 20) SIGTSTP
21) SIGTTIN 22) SIGTTOU 23) SIGURG 24) SIGXCPU 25) SIGXFSZ
26) SIGVTALRM 27) SIGPROF 28) SIGWINCH 29) SIGIO 30) SIGPWR
31) SIGSYS 34) SIGRTMIN 35) SIGRTMIN+1 36) SIGRTMIN+2 37) SIGRTMIN+3
38) SIGRTMIN+4 39) SIGRTMIN+5 40) SIGRTMIN+6 41) SIGRTMIN+7 42) SIGRTMIN+8
43) SIGRTMIN+9 44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9 56) SIGRTMAX-8 57) SIGRTMAX-7
58) SIGRTMAX-6 59) SIGRTMAX-5 60) SIGRTMAX-4 61) SIGRTMAX-3 62) SIGRTMAX-2
63) SIGRTMAX-1 64) SIGRTMAX

Thus, the following incantations are equivalent:

/bin/kill --signal SIGURG $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal URG $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal 23 $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')

On receiving one of the supported signals, the related function is invoked and the script thereafter returns to its former activity, namely the loop that waits for a fake input. The loop is needed because otherwise the script would exit on returning from a trap handler. In effect, the trap is processed like a function call and, on returning, control goes to the next statement after the point where the trap occurred. If there is none, the script terminates. Hence the loop.
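As a tiny stand-alone illustration of this behavior (not part of dctm.sh), the following script exits right after the first trapped signal has been handled, because nothing follows the interrupted command:

#!/bin/bash
# run it, then send it a SIGUSR1 from another terminal: kill -USR1 <pid>
trap 'echo "handler ran"' SIGUSR1
echo "waiting for one SIGUSR1 (pid $$) ..."
read                         # the trapped signal interrupts this read
echo "back after the trap; no loop, so the script ends here"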
Here is the output after sending a few signals; for clarity, the signals sent from another terminal have been manually inserted as comments right before the output they caused.
Output terminal:

# SIGUSR2:
status of running processes at 2019/04/06 16:12:46
showing 28046084
random value 28046084 is used by process with pid 29248 and pgid 29245
showing 977680
random value 977680 is used by process with pid 29249 and pgid 29245
showing 26299592
random value 26299592 is used by process with pid 29250 and pgid 29245
showing 25982957
random value 25982957 is used by process with pid 29251 and pgid 29245
showing 27830550
random value 27830550 is used by process with pid 29252 and pgid 29245
5 processes found
 
# SIGUSR1:
starting a new process at 2019/04/06 16:18:56
process started with pid 29618 and random value 22120010
 
# SIGUSR2:
status of running processes at 2019/04/06 16:18:56
showing 28046084
random value 28046084 is used by process with pid 29248 and pgid 29245
showing 977680
random value 977680 is used by process with pid 29249 and pgid 29245
showing 26299592
random value 26299592 is used by process with pid 29250 and pgid 29245
showing 25982957
random value 25982957 is used by process with pid 29251 and pgid 29245
showing 27830550
random value 27830550 is used by process with pid 29252 and pgid 29245
showing 22120010
random value 22120010 is used by process with pid 29618 and pgid 29245
6 processes found
 
# SIGURG:
send signal SIGURG for help
send signal SIGPWR to start a few processes
send signal SIGUSR1 to start a new process
send signal SIGUSR2 for the list of started processes
send signal SIGINT | SIGABRT to stop all the processes
send signal SIGHUP | SIGQUIT | SIGTERM to shutdown the processes and exit the container
 
# SIGINT:
shutting down the processes at 2019/04/06 16:20:17
stopping 28046084
stopping 977680
stopping 26299592
stopping 25982957
stopping 27830550
stopping 22120010
6 processes stopped
 
# SIGUSR2:
status of running processes at 2019/04/06 16:20:18
0 processes found
 
# SIGPWR:
starting a few processes at 2019/04/06 16:20:50
process started with pid 29959 and random value 2649735
process started with pid 29960 and random value 14971836
process started with pid 29961 and random value 14339677
process started with pid 29962 and random value 4460665
process started with pid 29963 and random value 12688731
5 processes started
 
started processes:
29245 29959 29245 1994 pts/0 29245 S+ 1000 0:00 | \_ I am number 2649735
29245 29960 29245 1994 pts/0 29245 S+ 1000 0:00 | \_ I am number 14971836
29245 29961 29245 1994 pts/0 29245 S+ 1000 0:00 | \_ I am number 14339677
29245 29962 29245 1994 pts/0 29245 S+ 1000 0:00 | \_ I am number 4460665
29245 29963 29245 1994 pts/0 29245 S+ 1000 0:00 | \_ I am number 12688731
 
# SIGUSR2:
status of running processes at 2019/04/06 16:20:53
showing 2649735
random value 2649735 is used by process with pid 29959 and pgid 29245
showing 14971836
random value 14971836 is used by process with pid 29960 and pgid 29245
showing 14339677
random value 14339677 is used by process with pid 29961 and pgid 29245
showing 4460665
random value 4460665 is used by process with pid 29962 and pgid 29245
showing 12688731
random value 12688731 is used by process with pid 29963 and pgid 29245
5 processes found
 
# SIGTERM:
shutting down the container at 2019/04/06 16:21:42
 
shutting down the processes at 2019/04/06 16:21:42
stopping 2649735
stopping 14971836
stopping 14339677
stopping 4460665
stopping 12688731
5 processes stopped

In the command terminal:

/bin/kill --signal SIGUSR2 $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal SIGUSR1 $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal SIGUSR2 $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal SIGURG $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal SIGINT $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal SIGUSR2 $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal SIGPWR $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal SIGUSR2 $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
/bin/kill --signal SIGTERM $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')

Of course, sending the untrappable SIGKILL signal will abort the process that executes dctm.sh. However, its child processes will survive and be reparented to the init process (PPID == 1):

...
status of running processes at 2019/04/10 22:38:25
showing 19996889
random value 19996889 is used by process with pid 24520 and pgid 24398
showing 5022831
random value 5022831 is used by process with pid 24521 and pgid 24398
showing 1363197
random value 1363197 is used by process with pid 24522 and pgid 24398
showing 18185959
random value 18185959 is used by process with pid 24523 and pgid 24398
showing 10996678
random value 10996678 is used by process with pid 24524 and pgid 24398
5 processes found
# /bin/kill --signal SIGKILL $(ps -ef | grep dctm.sh | grep -v grep | gawk '{print $2}')
Killed
 
ps -ef | grep number | grep -v grep
docker 24520 1 0 22:38 pts/1 00:00:00 I am number 19996889
docker 24521 1 0 22:38 pts/1 00:00:00 I am number 5022831
docker 24522 1 0 22:38 pts/1 00:00:00 I am number 1363197
docker 24523 1 0 22:38 pts/1 00:00:00 I am number 18185959
docker 24524 1 0 22:38 pts/1 00:00:00 I am number 10996678
 
# manually kill those processes;
ps -ef | grep number | grep -v grep | gawk '{print $2}' | xargs kill -9

 
ps -ef | grep number | grep -v grep
<empty>
 
# this works too:
kill -9 $(pgrep -f "I am number [0-9]+$")
# or, shorter:
pkill -f "I am number [0-9]+$"

Note that there is a simpler way to kill those related processes: by using their PGID, or process group id:

ps -axjf | grep number | grep -v grep
1 25248 25221 24997 pts/1 24997 S 1000 0:00 I am number 3489651
1 25249 25221 24997 pts/1 24997 S 1000 0:00 I am number 6789321
1 25250 25221 24997 pts/1 24997 S 1000 0:00 I am number 15840638
1 25251 25221 24997 pts/1 24997 S 1000 0:00 I am number 19059205
1 25252 25221 24997 pts/1 24997 S 1000 0:00 I am number 12857603
# processes have been reparented to PPID == 1;
# column 3 is the PGID;
# kill them using negative-PGID;
kill -9 -25221
ps -axjf | grep number | grep -v grep
<empty>

This is why the status command of func() displays the PGID.
In order to tell kill that the given PID is actually a PGID, it has to be prefixed with a minus sign. Alternatively, the command:

pkill -g pgid

does that too (note that pkill sends SIGTERM by default; add -9 to send SIGKILL instead).
All this looks quite promising so far!
Please join me now in part II of this article for the dockerization of the test script.

The article How to stop Documentum processes in a docker container, and more (part I) appeared first on Blog dbi services.

How to stop Documentum processes in a docker container, and more (part II)

Sat, 2019-04-13 07:00
ok, Ok, OK, and the docker part?

In a minute.
In part I of this 2-part article, we showed how traps could be used to control a running executable from the outside. We also presented a bash test script to try out and play with traps. Now that we are confident about that simulation script, let’s dockerize it and try it out in this new environment. We use the dockerfile Dockerfile-dctm to create the CS image and so we include an ENTRYPOINT clause as follows:

FROM ubuntu:latest
RUN apt-get update &&      \
    apt-get install -y gawk
COPY dctm.sh /root/.
ENTRYPOINT ["/root/dctm.sh", "start"]

With the exec form of the ENTRYPOINT clause used above, dctm.sh is started directly as the container's first process, so the script runs with PID 1 without being wrapped in a shell. To keep the dockerfile simple, the script will run as root. In the real world, CS processes run under an account such as dmadmin, so this account would have to be set up in the dockerfile (or through some orchestration software).
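As a hedged sketch of what that could look like (the account name and home directory are illustrative, and COPY --chown requires a reasonably recent docker), the dockerfile could create the account and switch to it before declaring the entrypoint:

FROM ubuntu:latest
RUN apt-get update &&                            \
    apt-get install -y gawk &&                   \
    useradd --create-home --shell /bin/bash dmadmin
COPY --chown=dmadmin:dmadmin dctm.sh /home/dmadmin/
USER dmadmin
ENTRYPOINT ["/home/dmadmin/dctm.sh", "start"]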
When the docker image is run or the container is started, the dctm.sh script gets executed with PID 1; as the script is invoked with the start option, it starts the processes. Afterwards, it just sits there waiting for the SIGTERM signal from the docker stop command; once received, it shuts down all the running processes under its control and exits, which also stops the container. Additionally, it can listen and react to some other signals, just like when it runs outside of a container.

Testing

Let’s test this approach with a container built using the above simple Dockerfile-dctm. Since the container is started in interactive mode, its output is visible on the screen and the commands to test it have to be sent from another terminal session; as before, for clarity, the commands have been inserted in the transcript as comments right before their result.

docker build -f Dockerfile-dctm --tag=dctm .
Sending build context to Docker daemon 6.656kB
Step 1/5 : FROM ubuntu:latest
---> 1d9c17228a9e
Step 2/5 : RUN apt-get update && apt-get install -y gawk
---> Using cache
---> f550d88161b6
Step 3/5 : COPY dctm.sh /root/.
---> e15e3f4ea93c
Step 4/5 : HEALTHCHECK --interval=5s --timeout=2s --retries=1 CMD grep -q OK /tmp/status || exit 1
---> Running in 0cea23cec09e
Removing intermediate container 0cea23cec09e
---> f9bf4138eb83
Step 5/5 : ENTRYPOINT ["/root/dctm.sh", "start"] ---> Running in 670c5231d5d8
Removing intermediate container 670c5231d5d8
---> 27991672905e
Successfully built 27991672905e
Successfully tagged dctm:latest
 
# docker run -i --name=dctm dctm
process started with pid 9 and random value 32760057
process started with pid 10 and random value 10364519
process started with pid 11 and random value 2915264
process started with pid 12 and random value 3744070
process started with pid 13 and random value 23787621
5 processes started
 
started processes:
1 9 1 1 ? -1 S 0 0:00 I am number 32760057
1 10 1 1 ? -1 S 0 0:00 I am number 10364519
1 11 1 1 ? -1 S 0 0:00 I am number 2915264
1 12 1 1 ? -1 S 0 0:00 I am number 3744070
1 13 1 1 ? -1 S 0 0:00 I am number 23787621
 
send signal SIGURG for help
send signal SIGPWR to start a few processes
send signal SIGUSR1 to start a new process
send signal SIGUSR2 for the list of started processes
send signal SIGINT | SIGABRT to stop all the processes
send signal SIGHUP | SIGQUIT | SIGTERM to shutdown the processes and exit the container
 
# docker kill --signal=SIGUSR2 dctm
status of running processes at 2019/04/06 14:56:14
showing 32760057
random value 32760057 is used by process with pid 9 and pgid 1
showing 10364519
random value 10364519 is used by process with pid 10 and pgid 1
showing 2915264
random value 2915264 is used by process with pid 11 and pgid 1
showing 3744070
random value 3744070 is used by process with pid 12 and pgid 1
showing 23787621
random value 23787621 is used by process with pid 13 and pgid 1
5 processes found
 
# docker kill --signal=SIGURG dctm
send signal SIGURG for help
send signal SIGPWR to start a few processes
send signal SIGUSR1 to start a new process
send signal SIGUSR2 for the list of started processes
send signal SIGINT | SIGABRT to stop all the processes
send signal SIGHUP | SIGQUIT | SIGTERM to shutdown the processes and exit the container
 
# docker kill --signal=SIGUSR1 dctm
starting a new process at 2019/04/06 14:57:30
process started with pid 14607 and random value 10066771
 
# docker kill --signal=SIGABRT dctm
shutting down the processes at 2019/04/06 14:58:12
stopping 32760057
stopping 10364519
stopping 2915264
stopping 3744070
stopping 23787621
stopping 10066771
6 processes stopped
 
# docker kill --signal=SIGUSR2 dctm
status of running processes at 2019/04/06 14:59:01
0 processes found
 
# docker kill --signal=SIGTERM dctm
shutting down the container at 2019/04/06 14:59:19
 
shutting down the processes at 2019/04/06 14:59:19
0 processes stopped
 
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

We observe exactly the same behavior as with the stand-alone dctm.sh, which is comforting.
Moreover, when the container is stopped, the signal is trapped correctly by the proxy:

...
random value 14725194 is used by process with pid 29 and pgid 1
showing 12554300
random value 12554300 is used by process with pid 30 and pgid 1
5 processes found
 
# date -u +"%Y/%m/%d %H:%M:%S"; docker stop dctm
# 2019/04/10 22:51:47
# dctm
shutting down the container at 2019/04/10 22:51:47
 
shutting down the processes at 2019/04/10 22:51:47
stopping 36164161
stopping 6693775
stopping 11404415
stopping 14725194
stopping 12554300
5 processes stopped

The good thing is that if the docker daemon is stopped at the host level, either interactively or at system shut down, the daemon first sends a SIGTERM to every running container:

date --utc +"%Y/%m/%d %H-%M-%S"; sudo systemctl stop docker
2019/04/06 15-02-18
[sudo] password for docker:

and on the other terminal:

shutting down the container at 2019/04/06 15:02:39
 
shutting down the processes at 2019/04/06 15:02:39
stopping 17422702
stopping 30251419
stopping 14451888
stopping 14890733
stopping 1105445
5 processes stopped

so each container can process the signal according to its needs. Our future Documentum container is now ready for a clean shutdown.

Doing something useful instead of sitting idle: light monitoring

As said, the proxy script waits for a signal from within a loop; the action performed inside the loop is waiting for input from stdin, which is not particularly useful. Why not take advantage of this slot to make it do something useful, like monitoring the running processes? Such a function already exists in the script: status_all(). Thus, let's set this up:

# while true; do read; done
# do something useful instead;
while true; do
   status_all
   sleep 30
done

We quickly notice that signals are no longer processed as briskly. In effect, bash waits for the currently executing command to complete before processing a signal, here whichever command inside the loop is running, so a slight delay is perceptible before our signals are taken care of, especially if we are in the middle of a 'sleep 600' command. Moreover, incoming signals are not queued up: if the same signal is sent several times while the script is busy, it stays pending only once and is processed only once. In practical conditions this is not a problem, for it is still possible to send signals and have them processed, just not in burst mode. If better reactivity to signals is needed, the sleep duration should be shortened and/or a separate scheduling of the monitoring should be introduced (started asynchronously from a loop in the entrypoint, or from a crontab inside the container?).
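Another possibility, shown below as a sketch, is to keep a long sleep but run it in the background and wait on it: the wait builtin is interrupted as soon as a trapped signal arrives, so the handlers run almost immediately while the loop still only wakes up every 30 seconds otherwise:

# a more signal-reactive variant of the wait loop (sketch)
while true; do
   status_all
   sleep 30 & wait $!     # wait returns at once when a trapped signal comes in, unlike a foreground sleep
done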
Note that the status sent to stdout from within a detached container (i.e. one started without the -i interactive option, which is generally the case) is not visible outside the container. Fortunately, and even better, the docker logs command makes it possible to view the status output on demand:

docker logs --follow container_name

In our case:

docker logs --follow dctm
status of running processes at 2019/04/06 15:21:21
showing 8235843
random value 8235843 is used by process with pid 8 and pgid 1
showing 16052839
random value 16052839 is used by process with pid 9 and pgid 1
showing 1097668
random value 1097668 is used by process with pid 10 and pgid 1
showing 5113933
random value 5113933 is used by process with pid 11 and pgid 1
showing 1122110
random value 1122110 is used by process with pid 12 and pgid 1
5 processes found

Note too that the logs command also has a --timestamps option for prefixing the output lines with the time they were produced, as illustrated below:

docker logs --timestamps --since 2019-04-06T18:06:23 dctm
2019-04-06T18:06:23.607796640Z status of running processes at 2019/04/06 18:06:23
2019-04-06T18:06:23.613666475Z showing 7037074
2019-04-06T18:06:23.616334029Z random value 7037074 is used by process with pid 8 and pgid 1
2019-04-06T18:06:23.616355592Z showing 33446655
2019-04-06T18:06:23.623719975Z random value 33446655 is used by process with pid 9 and pgid 1
2019-04-06T18:06:23.623785755Z showing 17309380
2019-04-06T18:06:23.627050839Z random value 17309380 is used by process with pid 10 and pgid 1
2019-04-06T18:06:23.627094599Z showing 13859725
2019-04-06T18:06:23.630436025Z random value 13859725 is used by process with pid 11 and pgid 1
2019-04-06T18:06:23.630472176Z showing 26767323
2019-04-06T18:06:23.633304616Z random value 26767323 is used by process with pid 12 and pgid 1
2019-04-06T18:06:23.635900480Z 5 processes found
2019-04-06T18:06:26.640490424Z

This is handy, but still not perfect, for those cases where lazy programmers neglect to date their log entries.
Now, since we have light-weight monitoring in place, we can use it in the dockerfile's HEALTHCHECK clause so that the container's health shows up in the ps command. As the processes' status is already determined in the wait loop of the dctm.sh script, it is pointless to compute it again. Instead, we can modify status_all() to print the overall status into a file, say /tmp/status, so that HEALTHCHECK can read it every $INTERVAL period. If status_all() is invoked every $STATUS_PERIOD, a race condition can occur every LeastCommonMultiple($INTERVAL, $STATUS_PERIOD), i.e. whenever the two processes access the file simultaneously, the former in reading mode and the latter in writing mode. To avoid this nasty situation, status_all() first writes into /tmp/tmp_status and then renames this file to /tmp/status. For the sake of our example, let's decide that the container is unhealthy if no dummy processes are running, and healthy if at least one is running (in real conditions, the container would be healthy if ALL the processes were responding and unhealthy if ANY of them were not, but it also depends on the definition of health). Here is the new dctm.sh's status_all() function:

status_all() {
   echo; echo "status of running processes at $(date +"%Y/%m/%d %H:%M:%S")"
   nb_processes=0
   for no in $(ps -ef | grep "I am number " | grep -v grep | gawk '{print $NF}'); do
      echo "showing $no"
      func status $no
      (( nb_processes++ ))
   done
   echo "$nb_processes processes found"
   if [[ $nb_processes -eq 0 ]]; then
      printf "status: bad\n" > /tmp/tmp_status
   else
      printf "status: OK\n" > /tmp/tmp_status
   fi
   mv /tmp/tmp_status /tmp/status
}

Here is the new dockerfile:

FROM ubuntu:latest
RUN apt-get update &&      \
    apt-get install -y gawk
COPY dctm.sh /root/.
HEALTHCHECK --interval=10s --timeout=2s --retries=2 CMD grep -q OK /tmp/status || exit 1
ENTRYPOINT ["/root/dctm.sh", "start"]

Here is what the ps command shows now:

# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
64e22a8f75cd dctm "/root/dctm.sh start" 38 minutes ago Up 2 seconds (health: starting) dctm
 
...
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
64e22a8f75cd dctm "/root/dctm.sh start" 38 minutes ago Up 6 seconds (healthy) dctm

The STATUS column now also shows the container's current health status.
If a new build is unwanted, the equivalent of the clause can be specified when running the image:

docker run --name dctm --health-cmd "grep -q OK /tmp/status || exit 1" --health-interval=10s --health-timeout=2s --health-retries=1 dctm

Note how these parameters are now prefixed with “health-” so they can be related to the HEALTHCHECK clause.
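As a side note, if only the current health status is of interest, it can be queried directly with the usual inspect format path for health-checked containers:

docker inspect --format '{{.State.Health.Status}}' dctm
healthy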
Now, in order to observe how the status is updated, let’s play with the signals INT and PWR to respectively stop and launch processes inside the container:

# current situation:
docker logs dctm
status of running processes at 2019/04/12 14:05:00
showing 29040429
random value 29040429 is used by process with pid 1294 and pgid 1
showing 34302125
random value 34302125 is used by process with pid 1295 and pgid 1
showing 2979702
random value 2979702 is used by process with pid 1296 and pgid 1
showing 4661756
random value 4661756 is used by process with pid 1297 and pgid 1
showing 7169283
random value 7169283 is used by process with pid 1298 and pgid 1
5 processes found
 
# show status:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ff25beae71f0 dctm "/root/dctm.sh start" 55 minutes ago Up 55 minutes (healthy) dctm
 
# stop the processes:
docker kill --signal=SIGINT dctm
# wait up to the given health-interval and check again:
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ff25beae71f0 dctm "/root/dctm.sh start" 57 minutes ago Up 57 minutes (unhealthy) dctm
 
# restart the processes:
docker kill --signal=SIGPWR dctm
 
# wait up to the health-interval and check again:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ff25beae71f0 dctm "/root/dctm.sh start" About an hour ago Up About an hour (healthy) dctm

The health status reporting works as expected.
Note that the above HEALTHCHECK successful tests were done under Centos Linux release 7.6.1810 (Core) with docker Client Version 1.13.1 and API version 1.26, Server Version 1.13.1 and API version 1.26 (minimum version 1.12).
The HEALTHCHECK clause looks broken under Ubuntu 18.04.1 LTS with docker Client Version 18.09.1 and API version 1.39, Server Engine – Community Engine Version 18.09.1 and API version 1.39 (minimum version 1.12). After a change of status, HEALTHCHECK sticks to the unhealthy state and “docker ps” keeps showing “unhealthy” no matter the subsequent changes in the running processes inside the container. It looks like the monitoring cycles until an unhealthy condition occurs, then it stops cycling and stays in the unhealthy state, as is also visible in the timestamps when inspecting the container’s status:

docker inspect --format='{{json .State.Health}}' dctm
{"Status":"unhealthy","FailingStreak":0,"Log":[{"Start":"2019-04-12T16:04:18.995957081+02:00","End":"2019-04-12T16:04:19.095540448+02:00","ExitCode":0,"Output":""},
{"Start":"2019-04-12T16:04:21.102151004+02:00","End":"2019-04-12T16:04:21.252025292+02:00","ExitCode":0,"Output":""},
{"Start":"2019-04-12T16:04:23.265929424+02:00","End":"2019-04-12T16:04:23.363387974+02:00","ExitCode":0,"Output":""},
{"Start":"2019-04-12T16:04:25.372757042+02:00","End":"2019-04-12T16:04:25.471229004+02:00","ExitCode":0,"Output":""},
{"Start":"2019-04-12T16:04:27.47692396+02:00","End":"2019-04-12T16:04:27.580458001+02:00","ExitCode":0,"Output":""}]}

The last 5 entries stop being updated.
While we are mentioning bugs, “docker logs --tail 0 dctm” under Centos displays the whole log available so far, so specify at least 1 to reduce the log history output to a minimum. Under Ubuntu, it works as expected though. However, the “--follow” option works under Centos but not under Ubuntu. So, there is some instability here; be prepared to comprehensively test every docker feature you plan to use.
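For reference, these are the two command variants we compared (dctm being the container from the examples above; as just described, the exact behaviour may differ between docker versions):

# limit the displayed log history to the most recent line:
docker logs --tail 1 dctm

# stream new log entries as they arrive (worked for us on Centos, not on Ubuntu):
docker logs --follow --tail 1 dctm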

Using docker’s built-in init process

As said above, docker does not have a full-fledged init process like systemd but still offers something vaguely related, tini, which stands for “tiny init”, see here. It won’t solve the inability of Documentum’s processes to respond to signals and therefore the proxy script is still needed. However, in addition to forwarding signals to its child process, tini has the advantage of taking care of defunct processes, or zombies, by reaping them regularly. Documentum produces a lot of them and they eventually disappear in the long run. Still, tini could speed this up a little bit.
tini can be invoked from the command-line as follows:

docker run -i --name=dctm --init dctm

But it is also possible to integrate it directly in the dockerfile so the --init option won’t be needed any longer (and shouldn’t be used, otherwise tini will not be PID 1 and its reaping feature won’t work anymore, making it useless for us):

FROM ubuntu:latest
COPY dctm.sh /root/.
ENV TINI_VERSION v0.18.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN apt-get update &&      \
    apt-get install -y gawk &&      \
    chmod +x /tini
HEALTHCHECK --interval=10s --timeout=2s --retries=2 CMD grep -q OK /tmp/status || exit 1
ENTRYPOINT ["/tini", "--"]
# let tini launch the proxy;
CMD ["/root/dctm.sh", "start"]

Let’s build the image with tini:

docker build -f Dockerfile-dctm --tag=dctm:with-tini .
Sending build context to Docker daemon 6.656kB
Step 1/8 : FROM ubuntu:latest
---> 1d9c17228a9e
Step 2/8 : COPY dctm.sh /root/.
---> a724637581fe
Step 3/8 : ENV TINI_VERSION v0.18.0
---> Running in b7727fc065e9
Removing intermediate container b7727fc065e9
---> d1e1a17d7255
Step 4/8 : ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
Downloading [==================================================>] 24.06kB/24.06kB
---> 47b1fc9f82c7
Step 5/8 : RUN apt-get update && apt-get install -y gawk && chmod +x /tini
---> Running in 4543b6f627f3
Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB] Get:2 http://archive.ubuntu.com/ubuntu bionic InRelease [242 kB] ...
Step 6/8 : HEALTHCHECK --interval=5s --timeout=2s --retries=1 CMD grep -q OK /tmp/status || exit 1
---> Running in d2025cbde647
Removing intermediate container d2025cbde647
---> a17fd24c4819
Step 7/8 : ENTRYPOINT ["/tini", "--"] ---> Running in ee1e10062f22
Removing intermediate container ee1e10062f22
---> f343d21175d9
Step 8/8 : CMD ["/root/dctm.sh", "start"] ---> Running in 6d41f591e122
Removing intermediate container 6d41f591e122
---> 66541b8c7b37
Successfully built 66541b8c7b37
Successfully tagged dctm:with-tini

Let’s run the image:

docker run -i --name=dctm dctm:with-tini
 
starting a few processes at 2019/04/07 11:55:30
process started with pid 9 and random value 23970936
process started with pid 10 and random value 35538668
process started with pid 11 and random value 12039907
process started with pid 12 and random value 21444522
process started with pid 13 and random value 7681454
5 processes started
...

And let’s see how the container’s processes look like with tini from another terminal:

docker exec -it dctm /bin/bash
ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 11:55 ? 00:00:00 /tini -- /root/dctm.sh start
root 6 1 0 11:55 ? 00:00:00 /bin/bash /root/dctm.sh start
root 9 6 0 11:55 ? 00:00:00 I am number 23970936
root 10 6 0 11:55 ? 00:00:00 I am number 35538668
root 11 6 0 11:55 ? 00:00:00 I am number 12039907
root 12 6 0 11:55 ? 00:00:00 I am number 21444522
root 13 6 0 11:55 ? 00:00:00 I am number 7681454
root 174 0 0 11:55 ? 00:00:00 /bin/bash
root 201 6 0 11:55 ? 00:00:00 sleep 3
root 208 174 0 11:55 ? 00:00:00 ps -ef
...

So tini is really running with PID == 1 and has started the proxy as its child process as expected.
Let’s test the container by sending a few signals:

docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5f745485a907 dctm:with-tini "/tini -- /root/dctm…" 8 seconds ago Up 7 seconds (healthy) dctm
 
# docker kill --signal=SIGINT dctm
shutting down the processes at 2019/04/07 11:59:42
stopping 23970936
stopping 35538668
stopping 12039907
stopping 21444522
stopping 7681454
5 processes stopped
 
status of running processes at 2019/04/07 11:59:42
0 processes found
 
status of running processes at 2019/04/07 11:59:45
0 processes found
 
# docker kill --signal=SIGTERM dctm
shutting down the processes at 2019/04/07 12:00:00
0 processes stopped

and then the container gets stopped. So, the signals are correctly transmitted to tini’s child process.
If one prefers to use the run command’s --init option instead of modifying the dockerfile and introducing tini as the ENTRYPOINT, it is even better because we will have only one version of the dockerfile to maintain. Here is the invocation and what the processes look like:

docker run --name=dctm --init dctm
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1d9fa0d98817 dctm "/root/dctm.sh start" 4 seconds ago Up 3 seconds (health: starting) dctm
docker exec dctm /bin/bash -c "ps -ef"
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 12:11 ? 00:00:00 /dev/init -- /root/dctm.sh start
root 6 1 0 12:11 ? 00:00:00 /bin/bash /root/dctm.sh start
root 9 6 0 12:11 ? 00:00:00 I am number 23850948
root 10 6 0 12:11 ? 00:00:00 I am number 19493606
root 11 6 0 12:11 ? 00:00:00 I am number 34535435
root 12 6 0 12:11 ? 00:00:00 I am number 32571187
root 13 6 0 12:11 ? 00:00:00 I am number 35596440
root 116 0 1 12:11 ? 00:00:00 /bin/bash
root 143 6 0 12:11 ? 00:00:00 sleep 3
root 144 116 0 12:11 ? 00:00:00 ps -ef

It looks even better; tini is still there – presumably – but hidden behind /dev/init so the container will be immune to any future change in the default init process.

Adapting dctm.sh for Documentum

Adapting the proxy script to a real Documentum installation with its own central stop/start/status script, let’s name it dctm_start_stop.sh, is easy. The main changes are limited to the func() function; now, it just relays the commands to the script dctm_start_stop.sh:

#!/bin/bash
# launches in the background, or stops or queries the status of, the Documentum dctm_start_stop.sh script;
# Usage:
#   ./dctm.sh stop | start | status
# e.g.:
# $ ./dctm.sh start
# cec - dbi-services - April 2019
#
trap 'help'         SIGURG
trap 'start_all'    SIGPWR
trap 'status_all'   SIGUSR2
trap 'stop_all'     SIGINT SIGABRT
trap 'shutdown_all' SIGHUP SIGQUIT SIGTERM

verb="sleep"
export id_prefix="I am number"

func() {
   cmd="$1"
   case $cmd in
      start)
         ./dctm_start_stop.sh start &
         ;;
      stop)
         ./dctm_start_stop.sh stop &
         ;;
      status)
         ./dctm_start_stop.sh status
         return $?
         ;;
   esac
}

help() {
   echo
   echo "send signal SIGURG for help"
   echo "send signal SIGPWR to start the Documentum processes"
   echo "send signal SIGUSR2 for the list of Documentum started processes"
   echo "send signal SIGINT | SIGABRT to stop all the processes"
   echo "send signal SIGHUP | SIGQUIT | SIGTERM to shutdown the Documentum processes and exit the container"
}

start_all() {
   echo; echo "starting the Documentum processes at $(date +"%Y/%m/%d %H:%M:%S")"
   func start
}

status_all() {
   echo; echo "status of Documentum processes at $(date +"%Y/%m/%d %H:%M:%S")"
   func status
   if [[ $? -eq 0 ]]; then
      printf "status: OK\n" > /tmp/tmp_status
   else
      printf "status: bad\n" > /tmp/tmp_status
   fi
   mv /tmp/tmp_status /tmp/status
}

stop_all() {
   echo; echo "shutting down the Documentum processes at $(date +"%Y/%m/%d %H:%M:%S")"
   func stop
}

shutdown_all() {
   echo; echo "shutting down the container at $(date +"%Y/%m/%d %H:%M:%S")"
   stop_all
   exit 0
}

# -----------
# main;
# -----------

# start the Documentum processes if "start" was passed as first argument;
[[ "$1" = "start" ]] && start_all

# make sure the container stays up and waits for signals;
while true; do status_all; sleep 3; done

Here is a skeleton of the script dctm_start_stop.sh:

#!/bin/bash
   cmd="$1"
   case $cmd in
      start)
         # insert here your Documentum installation's start scripts, e.g.
         # /app/dctm/dba/dm_launch_Docbroker
         # /app/dctm/dba/dm_start_testdb
         # /app/dctm/shared/wildfly9.0.1/server/startMethodServer.sh &
         echo "started"
         ;;
      stop)
         # insert here your Documentum installation's stop scripts, e.g.
         # /app/dctm/shared/wildfly9.0.1/server/stopMethodServer.sh
         # /app/dctm/dba/dm_shutdown_testdb
         # /app/dctm/dba/dm_stop_Docbroker
         echo "stopped"
         ;;
      status)
         # insert here your statements to test the Documentum processes's health;
         # e.g. dmqdocbroker -c -p 1489 ....
         # e.g. idql testdb -Udmadmin -Pxxx to try to connect to the docbase;
         # e.g. wget http://localhost:9080/... to test the method server;
         # 0: OK, 1: NOK;
         exit 0
         ;;
   esac

Let’s introduce a slight modification in the dockerfile’s entrypoint clause: instead of having the Documentum processes start at container startup, the container will start with only the proxy running inside. Only upon receiving the signal SIGPWR will the proxy start all the Documentum processes:

ENTRYPOINT ["/root/dctm.sh", ""]

If the light-weight monitoring is in action, the container will be flagged unhealthy, but this can be a useful reminder.
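A sketch of that on-demand workflow, reusing the dctm image and container names from above (the -d flag only detaches the container so the commands can be issued from the same terminal):

# start the container with only the proxy running; it will be reported unhealthy;
docker run -d --name dctm dctm

# start the Documentum processes on demand;
docker kill --signal=SIGPWR dctm

# after the next health-check interval the container should be reported healthy;
docker ps --filter name=dctm
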
Note that the monitoring could be activated or deactivated through a signal as shown in the diff output below:

diff dctm-no-monitoring.sh dctm.sh
21a22
> trap 'status_on_off' SIGCONT
24a26
> bStatus=1
119a122,125
> status_on_off() {
>    (( bStatus = (bStatus + 1) % 2 ))
> }
> 
132c138
<    status_all
---
>    [[ $bStatus -eq 1 ]] && status_all

This is more flexible and better matches the reality.

Shortening docker commands

We have thrown a lot of docker commands at you. If they are used often, their verbosity can be alleviated through aliases, e.g.:

alias di='docker images'
alias dpsas='docker ps -as'
alias dps='docker ps'
alias dstatus='docker kill --signal=SIGUSR2'
alias dterm='docker kill --signal=SIGTERM'
alias dabort='docker kill --signal=SIGABRT'
alias dlogs='docker logs --follow'
alias dstart='docker start'
alias dstarti='docker start -i'
alias dstop="docker stop"
alias drm='docker container rm'

or even bash functions for the most complicated ones (to be appended into e.g. your ~/.bashrc):

function drun {
image="$1"
docker run -i --name=$image $image
}
 
function dkill {
signal=$1
container=$2
docker kill --signal=$signal $container
}
 
function dbuild {
docker build -f Dockerfile-dctm --tag=$1 .
}

The typical sequence for testing the dockerfile Dockerfile-dctm to produce the image dctm and run it as the dctm container is:

dbuild dctm
drm dctm
drun dctm

Much less typing.

Conclusion

At the end of the day, it is not such a big deal that the Documentum CS does not process signals sent to it, for it is easy to work around this omission and even go beyond basic stops and starts. As always, missing features or shortcomings become a source of inspiration and enhancements!
Containerization has lots of advantages but we have noticed that docker’s implementations vary between versions and platforms so some features don’t always work as expected, if at all.
In a future blog, I’ll show a containerization of the out-of-the-box CS that includes signal trapping. In the meantime, live long and don’t despair.

Cet article How to stop Documentum processes in a docker container, and more (part II) est apparu en premier sur Blog dbi services.

PostgreSQL 12: Explain will display custom settings, if instructed

Wed, 2019-04-10 13:35

How many times did you try to solve a performance issue but were not able to reproduce the explain plan? Whatever you tried, you always got a different result. Let’s say you managed to get a dump of the database in question, set all the PostgreSQL parameters to the same values and gathered statistics, but you still do not get the same plan as the person who reported the issue. What could be the cause here? Let’s do a short demo:

Imagine someone is sending you this plan for a simple count(*) against pg_class:

postgres=# explain (analyze) select count(*) from pg_class;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=23.10..23.11 rows=1 width=8) (actual time=0.293..0.293 rows=1 loops=1)
   ->  Index Only Scan using pg_class_oid_index on pg_class  (cost=0.27..22.12 rows=390 width=0) (actual time=0.103..0.214 rows=390 loops=1)
         Heap Fetches: 0
 Planning Time: 0.155 ms
 Execution Time: 0.384 ms
(5 rows)

When you try the same on your environment the plan always looks like this (sequential scan, but not an index only scan):

postgres=# explain (analyze) select count(*) from pg_class;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=17.88..17.89 rows=1 width=8) (actual time=0.322..0.323 rows=1 loops=1)
   ->  Seq Scan on pg_class  (cost=0.00..16.90 rows=390 width=0) (actual time=0.017..0.220 rows=390 loops=1)
 Planning Time: 1.623 ms
 Execution Time: 0.688 ms
(4 rows)

In this case the index only scan is even faster, but usually you get a sequential scan because its costs are lower. Whatever you try, you cannot reproduce it. What you cannot know: the person reporting the issue didn’t tell you about this:

postgres=# set enable_seqscan = off;
SET
postgres=# explain (analyze) select count(*) from pg_class;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=23.10..23.11 rows=1 width=8) (actual time=0.230..0.230 rows=1 loops=1)
   ->  Index Only Scan using pg_class_oid_index on pg_class  (cost=0.27..22.12 rows=390 width=0) (actual time=0.032..0.147 rows=390 loops=1)
         Heap Fetches: 0
 Planning Time: 0.130 ms
 Execution Time: 0.281 ms

Just before executing the statement a parameter was changed which influences PostgreSQL’s choice of the best plan. And this is where the new feature of PostgreSQL 12 comes in handy:

postgres=# explain (analyze,settings) select count(*) from pg_class;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=23.10..23.11 rows=1 width=8) (actual time=0.309..0.310 rows=1 loops=1)
   ->  Index Only Scan using pg_class_oid_index on pg_class  (cost=0.27..22.12 rows=390 width=0) (actual time=0.045..0.202 rows=390 loops=1)
         Heap Fetches: 0
 Settings: enable_seqscan = 'off'
 Planning Time: 0.198 ms
 Execution Time: 0.395 ms
(6 rows)

postgres=# 

From PostgreSQL 12 on you can ask explain to display any setting that has been changed and influenced the decision on which plan to choose. This might be optimizer parameters as here, but this might also be others when they differ from the global setting:

postgres=# explain (analyze,settings) select count(*) from pg_class;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=17.88..17.89 rows=1 width=8) (actual time=0.197..0.198 rows=1 loops=1)
   ->  Seq Scan on pg_class  (cost=0.00..16.90 rows=390 width=0) (actual time=0.016..0.121 rows=390 loops=1)
 Settings: work_mem = '64MB'
 Planning Time: 0.162 ms
 Execution Time: 0.418 ms
(5 rows)

… or:

postgres=# set from_collapse_limit = 13;
SET
postgres=# explain (analyze,settings) select count(*) from pg_class;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=17.88..17.89 rows=1 width=8) (actual time=0.190..0.190 rows=1 loops=1)
   ->  Seq Scan on pg_class  (cost=0.00..16.90 rows=390 width=0) (actual time=0.012..0.115 rows=390 loops=1)
 Settings: from_collapse_limit = '13', work_mem = '64MB'
 Planning Time: 0.185 ms
 Execution Time: 0.263 ms
(5 rows)

Nice addition. By asking people to use the “settings” switch with analyze, you can be sure about what was changed from the global settings, so it is much easier to reproduce the issue and to see what’s going on.
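As a small complementary cross-check (a sketch, not part of the new explain feature itself): the non-default settings of the session that produced the plan can also be listed from pg_settings, although this shows everything changed from the built-in defaults, not only the planner-relevant parameters:

postgres=# select name, setting, source from pg_settings where source not in ('default', 'override');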

Parameters that do not influence the plan do not pop up:

postgres=# set log_statement='all';
SET
postgres=# explain (analyze,settings) select count(*) from pg_class;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=17.88..17.89 rows=1 width=8) (actual time=0.199..0.200 rows=1 loops=1)
   ->  Seq Scan on pg_class  (cost=0.00..16.90 rows=390 width=0) (actual time=0.018..0.124 rows=390 loops=1)
 Settings: from_collapse_limit = '13', work_mem = '64MB'
 Planning Time: 0.161 ms
 Execution Time: 0.391 ms
(5 rows)

Cet article PostgreSQL 12: Explain will display custom settings, if instructed est apparu en premier sur Blog dbi services.

PostgreSQL 12: Copying replication slots

Tue, 2019-04-09 04:44

The concept of replication slots was introduced in PostgreSQL 9.4 and was created to prevent a primary instance from deleting WAL that a replica still needs to apply. That could happen when you have a network interruption or the replica is down for another reason. With replication slots you can prevent that, with the downside that your master could fill up your disk if the interruption lasts too long. The concept of a “physical replication slot” was then extended so you can also create “logical replication slots”, which are used in logical replication, which made it into PostgreSQL 10. Now, with PostgreSQL 12 being in active development, another great feature made it into PostgreSQL core: copying replication slots.

What might that be good for? Let’s assume the following scenario:

  • You want to attach two replicas to your master instance
  • You want both replicas to use a physical replication slot
  • You want to build both replicas from the same basebackup and to start at the same position

What you can do in PostgreSQL is to create a base backup that also creates a physical replication slot:

postgres@pgbox:/home/postgres/ [PGDEV] psql -X -c "select * from pg_replication_slots" postgres
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
(0 rows)

postgres@pgbox:/home/postgres/ [PGDEV] mkdir -p /var/tmp/basebackup
postgres@pgbox:/home/postgres/ [PGDEV] pg_basebackup --create-slot --slot myslot --write-recovery-conf -D /var/tmp/basebackup/
postgres@pgbox:/home/postgres/ [PGDEV] psql -X -c "select * from pg_replication_slots" postgres
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 myslot    |        | physical  |        |          | f         | f      |            |      |              | 0/2000000   | 
(1 row)

(Please note that there is no more recovery.conf in PostgreSQL 12 so the recovery parameters have been added to postgresql.auto.conf)

The replication slot will not be dropped after pg_basebackup has finished and you can use it to attach a new replica. But before doing that: as of PostgreSQL 12 you can copy the slot and then attach the second replica to the copied slot, so both replicas will start at the same position:

postgres@pgbox:/home/postgres/ [PGDEV] psql -X -c "select pg_copy_physical_replication_slot('myslot','myslot2')" postgres
 pg_copy_physical_replication_slot 
-----------------------------------
 (myslot2,)
(1 row)

postgres@pgbox:/home/postgres/ [PGDEV] psql -X -c "select * from pg_replication_slots" postgres
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 myslot    |        | physical  |        |          | f         | f      |            |      |              | 0/8000000   | 
 myslot2   |        | physical  |        |          | f         | f      |            |      |              | 0/8000000   | 
(2 rows)

As you can see both replication slots have the same value for “restart_lsn”. This will make it very easy to use the basebackup for the two replicas and start them from the same position:

postgres@pgbox:/home/postgres/ [PGDEV] mkdir -p /var/tmp/replica1
postgres@pgbox:/home/postgres/ [PGDEV] mkdir -p /var/tmp/replica2
postgres@pgbox:/home/postgres/ [PGDEV] cp -pr /var/tmp/basebackup/* /var/tmp/replica1/
postgres@pgbox:/home/postgres/ [PGDEV] cp -pr /var/tmp/basebackup/* /var/tmp/replica2/
postgres@pgbox:/home/postgres/ [PGDEV] sed -i 's/myslot/myslot2/g' /var/tmp/replica2/postgresql.auto.conf 
postgres@pgbox:/home/postgres/ [PGDEV] echo "port=8888" >> /var/tmp/replica1/postgresql.auto.conf 
postgres@pgbox:/home/postgres/ [PGDEV] echo "port=8889" >> /var/tmp/replica2/postgresql.auto.conf 
postgres@pgbox:/home/postgres/ [PGDEV] chmod o-rwx /var/tmp/replica1
postgres@pgbox:/home/postgres/ [PGDEV] chmod o-rwx /var/tmp/replica2

What happened here:

  • Restore the same basebackup to the new replica locations
  • Change the slot to use for the second replica to our copied slot name
  • Change the ports of both replicas because we are running on the same host
  • Fix the permissions so pg_ctl will not complain

That’s it. We can start up both replicas:

postgres@pgbox:/home/postgres/ [PGDEV] pg_ctl -D /var/tmp/replica1/ start
postgres@pgbox:/home/postgres/ [PGDEV] pg_ctl -D /var/tmp/replica2/ start
postgres@pgbox:/home/postgres/ [PGDEV] psql -X -p 8888 -c "select pg_is_in_recovery()" postgres
 pg_is_in_recovery 
-------------------
 t
(1 row)

postgres@pgbox:/home/postgres/ [PGDEV] psql -X -p 8889 -c "select pg_is_in_recovery()" postgres
 pg_is_in_recovery 
-------------------
 t
(1 row)

Quite easy and we can confirm that both replicas are at the same location as previously:

postgres@pgbox:/home/postgres/ [PGDEV] psql -X -c "select * from pg_replication_slots" postgres
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn 
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 myslot    |        | physical  |        |          | f         | t      |      15622 |      |              | 0/9000148   | 
 myslot2   |        | physical  |        |          | f         | t      |      15632 |      |              | 0/9000148   | 
(2 rows)

You can also copy logical replication slots, of course. Nice, thanks to all involved.
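For completeness, a minimal sketch of the logical case (the slot names and the test_decoding plugin are only examples, and wal_level must be set to logical for this to work):

postgres@pgbox:/home/postgres/ [PGDEV] psql -X -c "select pg_create_logical_replication_slot('logslot','test_decoding')" postgres
postgres@pgbox:/home/postgres/ [PGDEV] psql -X -c "select pg_copy_logical_replication_slot('logslot','logslot2')" postgres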

Cet article PostgreSQL 12: Copying replication slots est apparu en premier sur Blog dbi services.

PostgreSQL 12: generated columns

Sat, 2019-04-06 04:17

PostgreSQL 12 will finally bring a feature other database systems have already had for quite some time: generated columns. What exactly is that and how does it look in PostgreSQL? As usual, let’s start with a simple test setup.

We begin with a simple table containing two columns:

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values (1,'aaa');
INSERT 0 1
postgres=# select * from t1;
 a |  b  
---+-----
 1 | aaa
(1 row)

postgres=# 

A generated column is not a “real” column because its value is computed:

postgres=# alter table t1 add column c int generated always as (a*2) stored;
ALTER TABLE
postgres=# select * from t1;
 a |  b  | c 
---+-----+---
 1 | aaa | 2
(1 row)

postgres=# \d t1
                              Table "public.t1"
 Column |  Type   | Collation | Nullable |              Default               
--------+---------+-----------+----------+------------------------------------
 a      | integer |           |          | 
 b      | text    |           |          | 
 c      | integer |           |          | generated always as (a * 2) stored

The keyword “stored” means that the column is stored on disk. In a future version there will probably also be a “virtual” keyword which instructs PostgreSQL not to store the data on disk but to always compute it when it is read rather than when it is written.

What you can also see here is that you can refer to other columns of the same table in the computation of the generated column. But this is not a requirement:

postgres=# alter table t1 add column d int generated always as (3*2) stored;
ALTER TABLE
postgres=# \d t1
                              Table "public.t1"
 Column |  Type   | Collation | Nullable |              Default               
--------+---------+-----------+----------+------------------------------------
 a      | integer |           |          | 
 b      | text    |           |          | 
 c      | integer |           |          | generated always as (a * 2) stored
 d      | integer |           |          | generated always as (3 * 2) stored

Referencing columns of other tables is not possible and it is not possible to reference another generated column:

postgres=# alter table t1 add column d int generated always as (c*2) stored;
ERROR:  cannot use generated column "c" in column generation expression
DETAIL:  A generated column cannot reference another generated column.

Directly updating such a column does of course not work either:

postgres=# update t1 set d=5;
psql: ERROR:  column "d" can only be updated to DEFAULT
DETAIL:  Column "d" is a generated column.
postgres=# update t1 set d=default;
UPDATE 1

What will happen when we create a generated column that uses a volatile function?

postgres=# alter table t1 add column e int generated always as (random()) stored;
psql: ERROR:  generation expression is not immutable
postgres=# 

That does not work either. Only immutable expressions can be used here. This would work:

postgres=# alter table t1 add column e text generated always as (md5(b)) stored;
ALTER TABLE
postgres=# \d t1
                               Table "public.t1"
 Column |  Type   | Collation | Nullable |               Default               
--------+---------+-----------+----------+-------------------------------------
 a      | integer |           |          | 
 b      | text    |           |          | 
 c      | integer |           |          | generated always as (a * 2) stored
 d      | integer |           |          | generated always as (3 * 2) stored
 e      | text    |           |          | generated always as (md5(b)) stored
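
Since a stored generated column behaves like an ordinary column for reads, it can be indexed and filtered on; a small sketch using the table from above (the index name and predicate are only examples):

postgres=# create index i_c on t1(c);
CREATE INDEX
postgres=# select a, c from t1 where c = 2;
 a | c 
---+---
 1 | 2
(1 row)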

Nice feature. Documentation is here

Cet article PostgreSQL 12: generated columns est apparu en premier sur Blog dbi services.

RCSI with foreign keys, NULL values and parameter sniffing behavior

Thu, 2019-04-04 08:58

In this blog post let’s go back to the roots (a DBA concern) with a discussion I had with one of my friends about a weird transaction locking issue. In fact, this discussion was specifically around two questions. The first one was why SQL Server continues to use shared locks in RCSI mode, leading to blocking scenarios, and the second one was about compiled objects with a weird NULL value parameter sniffing behavior. This discussion was a lot of fun for me because it included very interesting topics that we had to go through to figure out what happened in his case, and I think it is interesting enough to share with you.

Let’s set the context: 2 tables (dbo.t1 and dbo.t2) in a parent-child relationship with a foreign key that allows NULL values. Transactions performed against these tables are performed in RCSI mode.

USE master;
GO

CREATE DATABASE test;
GO

-- Change default transaction isolation level to RCSI
ALTER DATABASE test SET READ_COMMITTED_SNAPSHOT ON;
GO

USE test;
GO

CREATE TABLE dbo.t1 
(
	id INT NOT NULL PRIMARY KEY,
	col1 CHAR(2) NOT NULL
);

-- Create table t2 with FK (id_parent) that references primary key on t1 (id)
CREATE TABLE dbo.t2 
(
	id INT NOT NULL PRIMARY KEY,
	id_parent INT NULL FOREIGN KEY REFERENCES dbo.t1 (id),
	col1 CHAR(2) NOT NULL
);
GO

-- Insert values in parent table t1 
INSERT INTO dbo.t1 VALUES (1, 'TT');

 

Let’s insert 2 rows in the child table dbo.t2 in different scenarios.

The first one concerns insertion to the dbo.t2 table with a non-empty value in the FK column. The second one concerns insertion to the same table and same FK column with an empty / NULL value:

-- Insert values in child table t2 (non NULL value in FK column) 
INSERT INTO dbo.t2 VALUES (1, 1, 'TT');

-- Insert values in child table t2 (non NULL value in FK column) 
INSERT INTO dbo.t2 VALUES (2, NULL, 'TT');

 

And here are their respective execution plans:

  • Insert into dbo.t2 with a non-empty value in the FK column

In this first scenario, insert is performed by checking first any existing reference in the parent table (dbo.t1). This action is materialized by the clustered index seek operator and Nested Loop in the execution plan.

  • Insert into dbo.t2 with a NULL value in the FK column

In the second scenario, there is no need to check values in the dbo.t1 parent table due to the empty value in the FK column.

In both cases, this is the expected behavior. But let’s now consider the locks that are supposed to be taken in the first scenario. Two different structures must be accessed (and locked) in different modes: an X lock is needed to access and update the clustered index of the dbo.t2 table. But what about the dbo.t1 table here? Its clustered index structure must be accessed as part of the FK validation. As we are running in RCSI, we may at first suppose that no shared lock (S lock) should be held by the lock manager.

Let’s configure an extended event to track locks acquired in this specific scenario:

USE test;
GO

SELECT DB_ID('test')
SELECT OBJECT_ID('dbo.t1') -- 1205579333
SELECT OBJECT_ID('dbo.t2') -- 1237579447

DROP EVENT SESSION [locks_check] ON SERVER 

CREATE EVENT SESSION [locks_check] 
ON SERVER 
ADD EVENT sqlserver.lock_acquired
(
	SET collect_resource_description=(1)
    WHERE 
	(
		[package0].[equal_uint64]([database_id],(10)) 
		AND [package0].[not_equal_boolean]([sqlserver].[is_system],(1)) 
		AND ([package0].[greater_than_equal_uint64]([associated_object_id],(1205579333)) 
		     OR [package0].[greater_than_equal_uint64]([associated_object_id],(1237579447))
			) 
		AND [sqlserver].[session_id]=(54)
	)
)
ADD TARGET package0.ring_buffer(SET max_memory=(65536))
WITH 
(
	MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
	MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,
	MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF
)
GO

ALTER EVENT SESSION [locks_check] 
ON SERVER STATE = START;
GO

 

Here is how to read the XE output generated for the first scenario:

;WITH target_data_xml
AS
(
	SELECT 
		CAST(t.target_data AS XML) AS target_data
	FROM sys.dm_xe_session_targets AS t
	JOIN sys.dm_xe_sessions AS s ON t.event_session_address = s.address
	WHERE s.name = 'locks_check'
),
target_data_output
AS
(
	SELECT 
		DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), T.X.value('(./@timestamp)', 'DATETIME')) AS [timestamp],
		T.X.value('(./data[@name="resource_type"]/text)[1]', 'sysname') AS lock_resource_type,
		T.X.value('(./data[@name="mode"]/text)[1]', 'sysname') AS lock_mode,
		T.X.value('(./data[@name="resource_0"]/value)[1]', 'INT') AS resource_0,
		T.X.value('(./data[@name="object_id"]/value)[1]', 'BIGINT') AS [object_id],
		T.X.value('(./data[@name="associated_object_id"]/value)[1]', 'BIGINT') AS [associated_object_id],
		T.X.value('(./data[@name="resource_description"]/value)[1]', 'sysname') AS [resource_description]
	FROM target_data_xml AS x
	CROSS APPLY target_data.nodes('//event') AS T(X)
)
SELECT 
	t.timestamp,
	t.lock_resource_type,
	t.lock_mode,
	CASE t.lock_resource_type
		WHEN 'OBJECT' THEN OBJECT_NAME(t.associated_object_id)
		ELSE (SELECT OBJECT_NAME(p.object_ID) FROM sys.partitions AS p WHERE hobt_id = t.associated_object_id)
	END AS [object_name],
	t.resource_description
FROM target_data_output AS t
GO

 

Well, this is not really what we might expect, because a shared lock (S) was taken on the parent table (dbo.t1) despite being in RCSI mode. In fact, this behavior is expected: for two tables in an FK relationship, SQL Server automatically switches to locking read committed (shared locks) so that the constraint cannot be validated against potentially stale versioned reads. In other words, you may expect to face some potential blocking issues if other sessions try to concurrently modify the parent table while you are running in RCSI mode.
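To make the consequence concrete, here is a sketch of the blocking scenario (it reuses the dbo.t1 / dbo.t2 tables from above; the session split is only illustrative):

-- session 1: keep an exclusive (X) key lock on the parent row
BEGIN TRAN;
UPDATE dbo.t1 SET col1 = 'UU' WHERE id = 1;

-- session 2: despite RCSI, the FK validation requests an S lock on the dbo.t1 key and blocks
INSERT INTO dbo.t2 VALUES (3, 1, 'TT');

-- session 1: committing releases the X lock and unblocks session 2
COMMIT;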

For the second scenario there is no ambiguity because only the clustered index of the dbo.t2 table is accessed to insert data, in line with what we saw in the related execution plan above.

So now let’s move on to the second weird issue (the NULL value is not “sniffed” correctly by the stored procedure) and wrap the ad-hoc query into a stored procedure as follows:

CREATE PROCEDURE dbo.pr_test
(
	@id_parent INT = NULL
)
AS

INSERT INTO dbo.t2 VALUES (ABS(CAST(CAST(CHECKSUM(NEWID()) AS bigint) / 1000 % 2100000000 AS int)), @id_parent, 'TT');
GO

 

Let’s execute the procedure without specifying a parameter. In this case a NULL value will be inserted in the FK column of the dbo.t2 table.

EXEC dbo.pr_test;

 

The corresponding execution plan:

First of all, the plan differs from what we saw in the previous example with the ad-hoc query. The plan was compiled with a NULL value and we still see operators related to the FK constraint check. At first glance this plan shape looks more like the first scenario, where we inserted a non-empty value in the FK column. It is not obvious, but we may notice some differences compared to the first scenario. With SQL Sentry Plan Explorer (v3 build 18.4.0.0) the relevant information is not displayed when you highlight the nested loop operator, in contrast to the SSMS execution plan, but you may rely on the plan tree section and add the Pass Thru column to retrieve the same information.

So, the question here is why SQL Server behaves differently in this case. Well, when using a variable or parameter, SQL Server needs to build a plan shape that will work correctly when reused for different values. However, the semi join operator has a pass-through predicate that skips the lookup if the runtime value is NULL (with SQL Sentry Plan Explorer we can easily see that the lookup part of the plan is not used at runtime in this case). With a constant NULL value (the ad-hoc query scenario) the game changes because the optimizer is able to simplify the query and remove the join accordingly. In a nutshell, this is an expected behavior by design and really related to parameter sniffing. Thanks to @SQL_Kiwi for helping to clarify this last point and thanks to my friend for this fun troubleshooting game.
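As a side note that goes beyond the original discussion: if the simplified plan really matters for NULL inputs, adding OPTION (RECOMPILE) to the insert should let the optimizer embed the runtime parameter value and simplify the plan as in the ad-hoc case, at the cost of a compilation on every execution. A sketch only:

ALTER PROCEDURE dbo.pr_test
(
	@id_parent INT = NULL
)
AS
INSERT INTO dbo.t2
VALUES (ABS(CAST(CAST(CHECKSUM(NEWID()) AS bigint) / 1000 % 2100000000 AS int)), @id_parent, 'TT')
OPTION (RECOMPILE);
GO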

See you!

 

 

Cet article RCSI with foreign keys, NULL values and parameter sniffing behavior est apparu en premier sur Blog dbi services.

PostgreSQL 12: More progress reporting

Thu, 2019-04-04 03:51

PostgreSQL 9.6 introduced a new view called pg_stat_progress_vacuum which gives information about currently running vacuum processes. That was a great addition because since then you can easily estimate how long a specific vacuum process will need to complete. PostgreSQL 12 will use the same infrastructure to extend that to more operations.

The first operation that can now be tracked is cluster. For that a new view is available which gives the following information:

postgres=# \d pg_stat_progress_cluster
           View "pg_catalog.pg_stat_progress_cluster"
       Column        |  Type   | Collation | Nullable | Default 
---------------------+---------+-----------+----------+---------
 pid                 | integer |           |          | 
 datid               | oid     |           |          | 
 datname             | name    |           |          | 
 relid               | oid     |           |          | 
 command             | text    |           |          | 
 phase               | text    |           |          | 
 cluster_index_relid | bigint  |           |          | 
 heap_tuples_scanned | bigint  |           |          | 
 heap_tuples_written | bigint  |           |          | 
 heap_blks_total     | bigint  |           |          | 
 heap_blks_scanned   | bigint  |           |          | 
 index_rebuild_count | bigint  |           |          | 

As always, let’s generate a sample table and some indexes so we have something cluster can work on:

postgres=# create table t1 as select a,md5(a::text) as txt, now() as date from generate_series(1,3000000) a;
SELECT 3000000
postgres=# create index i1 on t1(a);
CREATE INDEX
postgres=# create index i2 on t1(txt);
CREATE INDEX
postgres=# create index i3 on t1(date);
CREATE INDEX
postgres=# \d t1
                         Table "public.t1"
 Column |           Type           | Collation | Nullable | Default 
--------+--------------------------+-----------+----------+---------
 a      | integer                  |           |          | 
 txt    | text                     |           |          | 
 date   | timestamp with time zone |           |          | 
Indexes:
    "i1" btree (a)
    "i2" btree (txt)
    "i3" btree (date)

Once we cluster that table we should see the progress in pg_stat_progress_cluster, so in the first session:

postgres=# cluster verbose t1 using i1;
psql: INFO:  clustering "public.t1" using index scan on "i1"
psql: INFO:  "t1": found 0 removable, 3000000 nonremovable row versions in 28038 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 0.82 s, system: 0.55 s, elapsed: 1.87 s.
CLUSTER
postgres=# 

… and in a second session:

postgres=# select * from pg_stat_progress_cluster;
 pid  | datid | datname  | relid | command |      phase       | cluster_index_relid | heap_tuples_scanned | heap_tuples_written | heap_blks_total | heap_blks_scanned | index_rebuild_count 
------+-------+----------+-------+---------+------------------+---------------------+---------------------+---------------------+-----------------+-------------------+---------------------
 1669 | 13586 | postgres | 16384 | CLUSTER | rebuilding index |               16390 |             3000000 |             3000000 |               0 |                 0 |                   2
(1 row)

Nice. And the same is now available when indexes get created:

postgres=# \d pg_stat_progress_create_index 
        View "pg_catalog.pg_stat_progress_create_index"
       Column       |  Type   | Collation | Nullable | Default 
--------------------+---------+-----------+----------+---------
 pid                | integer |           |          | 
 datid              | oid     |           |          | 
 datname            | name    |           |          | 
 relid              | oid     |           |          | 
 phase              | text    |           |          | 
 lockers_total      | bigint  |           |          | 
 lockers_done       | bigint  |           |          | 
 current_locker_pid | bigint  |           |          | 
 blocks_total       | bigint  |           |          | 
 blocks_done        | bigint  |           |          | 
 tuples_total       | bigint  |           |          | 
 tuples_done        | bigint  |           |          | 
 partitions_total   | bigint  |           |          | 
 partitions_done    | bigint  |           |          | 
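
The usage pattern is the same as for cluster; a minimal sketch (the index definition is only an example), with the second statement run from another session while the build is in progress:

-- session 1:
postgres=# create index i4 on t1(a,txt);

-- session 2, while the index is being built:
postgres=# select pid, phase, blocks_total, blocks_done, tuples_total, tuples_done from pg_stat_progress_create_index;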

Cet article PostgreSQL 12: More progress reporting est apparu en premier sur Blog dbi services.

Second Meetup SOUG / DOAG /AOUG Region Lake of Constance

Wed, 2019-04-03 07:37

Tuesday last week I attended the second meeting of the SOUG region Lake of Constance, which took place at Robotron Switzerland in Wil SG. Eleven people attended this event, and Georg Russ held a quite interesting presentation about “Hacked Biometric Data”.
He talked about how to fake fingerprints, iris images, face IDs and hand vein patterns.
After the presentation, a general discussion about security and other Oracle-related topics took place.

It was a good event and I am looking forward to attend the next one.

Further information can be found at https://www.meetup.com/OracleBeerRegioBodensee/events/258060976/

Cet article Second Meetup SOUG / DOAG /AOUG Region Lake of Constance est apparu en premier sur Blog dbi services.

Cause for looping sssd

Wed, 2019-04-03 00:49

In RedHat Enterprise Linux 7, the sssd daemons can connect to Active Directory servers. The default behaviour is to update DNS entries dynamically.
If a static DNS entry already exists, this can lead to a CPU-consuming sssd_nss daemon.
To prevent this behaviour, dynamic DNS updates should be switched off with this setting in every domain section of the config file /etc/sssd/sssd.conf:


dyndns_update = False

After that, sssd should be restarted for this change to take effect.
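A minimal sketch of what the relevant part of /etc/sssd/sssd.conf could then look like (the domain name is only an example):

[domain/example.com]
# existing Active Directory provider settings stay as they are;
# only the dynamic DNS update gets switched off:
dyndns_update = False

The restart itself can be done with systemctl restart sssd.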

Cet article Cause for looping sssd est apparu en premier sur Blog dbi services.

The EDB filter log extension

Mon, 2019-04-01 13:20

This is another post dedicated to EnterpriseDB Postgres. Sometimes you may want to prevent specific messages from getting logged to the server’s logfile or the audit records. That might be specific error codes or, even more important, the passwords you specify when you create users. EDB comes with a solution for that by providing an extension which is called EDB Filter Log. Let’s see how you can install it and, even more important, how to use that extension.

The first thing I usually do when I want to check what extensions are available is looking at pg_available_extensions. I was quite surprised that this extension is not listed there:

edb=# select * from pg_available_extensions where name like '%filter%';
 name | default_version | installed_version | comment 
------+-----------------+-------------------+---------
(0 rows)

Anyway you can load it by adjusting the shared_preload_libraries parameter:

edb=# show shared_preload_libraries;
             shared_preload_libraries              
---------------------------------------------------
 $libdir/dbms_pipe,$libdir/edb_gen,$libdir/dbms_aq
(1 row)
edb=# alter system set shared_preload_libraries='$libdir/dbms_pipe,$libdir/edb_gen,$libdir/dbms_aq,$libdir/edb_filter_log';
ALTER SYSTEM
edb=# \q
enterprisedb@edb1:/var/lib/edb/ [pg1] pg_ctl -D $PGDATA restart -m fast
enterprisedb@edb1:/var/lib/edb/ [pg1] psql edb
psql.bin (11.2.9)
Type "help" for help.


edb=# show shared_preload_libraries ;
                         shared_preload_libraries                         
--------------------------------------------------------------------------
 $libdir/dbms_pipe,$libdir/edb_gen,$libdir/dbms_aq,$libdir/edb_filter_log
(1 row)

But even then the extension does not show up in pg_available_extensions:

edb=# select * from pg_available_extensions where name like '%filter%';
 name | default_version | installed_version | comment 
------+-----------------+-------------------+---------
(0 rows)

Let’s assume you do not want violations of unique constraints to get logged in the server’s logfile. Usually you get this in the log file once a constraint is violated:

edb=# create table t1 ( a int );
CREATE TABLE
edb=# create unique index i1 on t1(a);
CREATE INDEX
edb=# insert into t1 values(1);
INSERT 0 1
edb=# insert into t1 values(1);
ERROR:  duplicate key value violates unique constraint "i1"
DETAIL:  Key (a)=(1) already exists.
edb=# select pg_current_logfile();
      pg_current_logfile       
-------------------------------
 log/edb-2019-03-24_162021.log
(1 row)
edb=# \! tail -20 $PGDATA/log/edb-2019-03-24_162021.log
...
2019-03-24 16:35:32 CET ERROR:  duplicate key value violates unique constraint "i1"
2019-03-24 16:35:32 CET DETAIL:  Key (a)=(1) already exists.
...

Using the extension you can do it like this (23505 is the SQLSTATE for unique constraint violations):

edb=# show edb_filter_log.errcode;
 edb_filter_log.errcode 
------------------------
 
(1 row)

edb=# alter system set edb_filter_log.errcode='23505';
ALTER SYSTEM
edb=# select context from pg_settings where name = 'edb_filter_log.errcode';
 context 
---------
 user
(1 row)
edb=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

edb=# show edb_filter_log.errcode;
 edb_filter_log.errcode 
------------------------
 23505
(1 row)

edb=# insert into t1 values(1);
ERROR:  duplicate key value violates unique constraint "i1"
DETAIL:  Key (a)=(1) already exists.
edb=# \! tail -20 $PGDATA/log/edb-2019-03-24_162021.log
...
2019-03-24 16:39:05 CET LOG:  received SIGHUP, reloading configuration files
2019-03-24 16:39:05 CET LOG:  parameter "edb_filter_log.errcode" changed to "23505"
edb=# 

This specific error is no longer reported in the logfile. Of course you can use multiple codes for edb_filter_log.errcode by separating them with a comma. The complete list of codes is documented here.
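For instance, a sketch that additionally filters foreign key violations (SQLSTATE 23503) and division by zero (SQLSTATE 22012):

edb=# alter system set edb_filter_log.errcode='23505,23503,22012';
ALTER SYSTEM
edb=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)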

This is for suppressing messages in the log file. What about passwords? Imagine you are logging all statements:

edb=# alter system set log_statement='all';
ALTER SYSTEM
edb=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

edb=# show log_statement;
 log_statement 
---------------
 all
(1 row)

In this configuration this will be captured as well and you will find the password in the logfile:

edb=# create user u1 with login password 'password';
CREATE ROLE
edb=# select pg_current_logfile();
      pg_current_logfile       
-------------------------------
 log/edb-2019-03-24_162021.log
edb=# \! tail -20 $PGDATA/log/edb-2019-03-24_162021.log | grep password

2019-03-24 16:46:59 CET LOG:  statement: create user u1 with login password 'password';

This is what you usually do not want to see there and exactly this is what “edb_filter_log.redact_password_commands” is for:

edb=# alter system set edb_filter_log.redact_password_commands = true;
ALTER SYSTEM
edb=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

edb=# show edb_filter_log.redact_password_commands;
 edb_filter_log.redact_password_commands 
-----------------------------------------
 on
(1 row)

When this is set to on, the plain text password will no longer be written to the log file when you create or alter users:

edb=# \! tail -20 $PGDATA/log/edb-2019-03-24_162021.log | grep secret
2019-03-24 16:51:19 CET STATEMENT:  create user u2 login with password 'secret';
2019-03-24 16:51:28 CET LOG:  statement: create user u2 with login password 'secret';

… and it is still there. Maybe a restart is required for this to become active?

enterprisedb@edb1:/var/lib/edb/as11/data/ [pg1] pg_ctl -D $PGDATA restart -m fast
enterprisedb@edb1:/var/lib/edb/as11/data/ [pg1] psql -X edb
psql.bin (11.2.9)
Type "help" for help.

edb=# create user u3 with login password 'topsecret';
CREATE ROLE
edb=# select pg_current_logfile();
      pg_current_logfile       
-------------------------------
 log/edb-2019-03-24_165229.log
(1 row)

edb=# \! tail -20 $PGDATA/log/edb-2019-03-24_165229.log | grep topsecret
2019-03-24 16:54:22 CET LOG:  statement: create user u3 with login password 'topsecret';

And we still see it in the log file. Why is that? The issue is with the syntax. Consider this:

edb=# create user u4 with login password 'text';
CREATE ROLE
edb=# create user u5 login password 'text2';
CREATE ROLE
edb=# create user u6 password 'text3';
CREATE ROLE
edb=# 

Only the last command gets its password replaced in the log file:

2019-03-24 17:03:31 CET LOG:  statement: create user u4 with login password 'text';
2019-03-24 17:03:45 CET LOG:  statement: create user u5 login password 'text2';
2019-03-24 17:04:12 CET LOG:  statement: create user u6 password 'x';

You have to follow exactly this syntax:

{CREATE|ALTER} {USER|ROLE|GROUP} identifier { [WITH] [ENCRYPTED]
PASSWORD 'nonempty_string_literal' | IDENTIFIED BY {
'nonempty_string_literal' | bareword } } [ REPLACE {
'nonempty_string_literal' | bareword } ]

…otherwise it will not work.

Cet article The EDB filter log extension est apparu en premier sur Blog dbi services.

Auditing with EDB Postgres Enterprise

Sun, 2019-03-31 14:07

It might be that there is a requirement to audit operations in the database, maybe because of legal requirements, maybe because of security requirements or whatever. I’ve already written a post in the past describing what you can do in community PostgreSQL; this post is specific to EDB Postgres. The auditing features come by default with EDB Postgres and you do not need to install any extension such as pgaudit.

I am using EDB Postgres Enterprise version 11.2 for this post but it should work the same in previous versions:

enterprisedb@edb1:/var/lib/edb/ [pg1] psql -X postgres
psql.bin (11.2.9)
Type "help" for help.

postgres=# select version();
                                                                    version                                                                    
-----------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.2 (EnterpriseDB Advanced Server 11.2.9) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
(1 row)

The parameter which controls if auditing is enabled or not is “edb_audit”:

postgres=# show edb_audit;
 edb_audit 
-----------
 
(1 row)

postgres=# 

When it is not set (the default), auditing is disabled. You have two options to enable it:

  • csv: Enables auditing and writes the audit records to a CSV file
  • xml: Enables auditing and writes the audit records to an XML file

Before enabling auditing you should think about where you want to store the audit files. It should be a location that only the operating system user which runs EDB Postgres has access to. You might think of $PGDATA, but do you really want to have all the audit files included in every base backup you will be doing in the future? A better location is outside $PGDATA so you can keep the audit files separated. Let’s go with “/var/lib/edb/audit” for the scope of this post:

postgres=# \! mkdir /var/lib/edb/audit
postgres=# \! chmod 700 /var/lib/edb/audit
postgres=# alter system set edb_audit_directory = '/var/lib/edb/audit';
ALTER SYSTEM
postgres=# alter system set edb_audit='csv';
ALTER SYSTEM
postgres=# select name,context from pg_settings where name in ('edb_audit_directory','edb_audit');
        name         | context 
---------------------+---------
 edb_audit           | sighup
 edb_audit_directory | sighup
(2 rows)

Both parameter changes can be made active by reloading the server, a restart is not required:

postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

The default file name that will be used for the audit logs is:

postgres=# show edb_audit_filename;
 edb_audit_filename  
---------------------
 audit-%Y%m%d_%H%M%S
(1 row)

Let’s keep that as it is, which is sufficient for the scope of this post. Now you need to think about what you want to audit. There are several options available:

  • edb_audit_connect: Logs connections to the instance; you can log failed connections, all connections, or none
  • edb_audit_disconnect: The opposite of edb_audit_connect; logs disconnections
  • edb_audit_statement: Here you have several options to log SQL statements such as insert, truncate and so on; more on that later
  • edb_audit_tag: When set, adds a string value to all audit log files

We start with logging connections and disconnections. When we set edb_audit_connect to all, we should see all connections to a database, no matter if successful or failed:

postgres=# alter system set edb_audit_connect = 'all';
ALTER SYSTEM
postgres=# select context from pg_settings where name = 'edb_audit_connect';
 context 
---------
 sighup
(1 row)
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

From now on we should have the audit information in the log file for every successful connection and every connection attempt that failed. Is it true?

postgres=# \! psql edb
psql.bin (11.2.9)
Type "help" for help.

[local]:5444 enterprisedb@edb=# \q
postgres=# \! psql -U dummy edb
psql.bin: FATAL:  role "dummy" does not exist
postgres=# 

That should have produced two lines in the latest audit file:

enterprisedb@edb1:/var/lib/edb/audit/ [pg1] pwd
/var/lib/edb/audit
enterprisedb@edb1:/var/lib/edb/audit/ [pg1] ls -latr
total 8
drwx------. 5 enterprisedb enterprisedb 183 Mar 24 14:24 ..
-rw-------. 1 enterprisedb enterprisedb 611 Mar 24 14:38 audit-20190324_143640.csv
drwx------. 2 enterprisedb enterprisedb  72 Mar 24 14:38 .
-rw-------. 1 enterprisedb enterprisedb 412 Mar 24 14:41 audit-20190324_143805.csv
enterprisedb@edb1:/var/lib/edb/audit/ [pg1] cat audit-20190324_143805.csv
2019-03-24 14:40:54.683 CET,"enterprisedb","edb",1534,"[local]",5c9788e6.5fe,1,"authentication",2019-03-24 14:40:54 CET,5/133,0,AUDIT,00000,"connection authorized: user=enterprisedb database=edb",,,,,,,,,"","",""
2019-03-24 14:41:16.617 CET,"dummy","edb",1563,"[local]",5c9788fc.61b,1,"authentication",2019-03-24 14:41:16 CET,5/136,0,AUDIT,00000,"connection authorized: user=dummy database=edb",,,,,,,,,"","",""

As expected, we can see the successful connection request and in addition the one that failed. When we want to log disconnections as well, we can do so:

postgres=# alter system set edb_audit_disconnect = 'all';
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)
postgres=# \! psql edb
psql.bin (11.2.9)
Type "help" for help.

[local]:5444 enterprisedb@edb=# \q
postgres=# 

In the same audit file as before:

2019-03-24 14:47:42.447 CET,"enterprisedb","edb",1929,"[local]",5c978a7a.789,2,"idle",2019-03-24 14:47:38 CET,,0,AUDIT,00000,"disconnection: session time: 0:00:03.708 user=enterprisedb database=edb host=[local]",,,,,,,,,"psql.bin","",""

The duration of the session is logged as well. So much for the basic auditing features. Logging connections and disconnections is a good start but probably not enough. You might soon come to a point where you want more information, such as what exactly a user was doing in the database. This is where “edb_audit_statement” comes into play. You can set it to something simple like “all inserts” or “all updates”:

postgres=# alter system set edb_audit_statement = 'insert';
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

postgres=# create table t1 ( a int );
CREATE TABLE
postgres=# insert into t1 values(1);
INSERT 0 1

Looking at the audit file:

2019-03-24 14:55:36.744 CET,,,9004,,5c977540.232c,3,,2019-03-24 13:17:04 CET,,0,LOG,00000,"received SIGHUP, reloading configuration files",,,,,,,,,"","",""
2019-03-24 14:55:53.460 CET,"enterprisedb","postgres",31899,"[local]",5c97827a.7c9b,6,"idle",2019-03-24 14:13:30 CET,4/477,0,AUDIT,00000,"statement: insert into t1 values(1);",,,,,,,,,"psql.bin","INSERT",""

The insert is logged. You may also spot a potential issue here: depending on how the statement is written, the actual values (1 in this case) are written to the log. This might open another security hole if the audit files are not handled with care. You cannot prevent that by using prepared statements; in fact, the “prepare” part is logged as well:

postgres=# prepare stmt as insert into t1 values($1);
PREPARE
postgres=# execute stmt(2);
INSERT 0 1
postgres=# select * from t1;
 a 
---
 1
 2
(2 rows)

The entries in the audit log:

2019-03-24 14:58:50.395 CET,"enterprisedb","postgres",31899,"[local]",5c97827a.7c9b,7,"idle",2019-03-24 14:13:30 CET,4/478,0,AUDIT,00000,"statement: prepare stmt as insert into t1 values($1);",,,,,,,,,"psql.bin","PREPARE",""
2019-03-24 14:59:02.952 CET,"enterprisedb","postgres",31899,"[local]",5c97827a.7c9b,8,"idle",2019-03-24 14:13:30 CET,4/479,0,AUDIT,00000,"statement: execute stmt(2);","prepare: prepare stmt as insert into t1 values($1);",,,,,,,,"psql.bin","EXECUTE",""

Although we only asked to log “inserts”, the prepare and execute statements are logged as well. If we prepare an update, it is not logged (which is correct):

postgres=# prepare stmt2 as update t1 set a = $1;
PREPARE
postgres=# execute stmt2(2);
UPDATE 5

The last line in the audit file is still this one:

2019-03-24 15:02:33.502 CET,"enterprisedb","postgres",31899,"[local]",5c97827a.7c9b,9,"idle",2019-03-24 14:13:30 CET,4/487,0,AUDIT,00000,"statement: execute stmt(5);","prepare: prepare stmt as insert into t1 values($1);",,,,,,,,"psql.bin","EXECUTE",""

The power of edb_audit_statement comes when you want to audit multiple kinds of statements but do not want to set it to “all” (this would log all the statements):

postgres=# alter system set edb_audit_statement='insert,update,delete,create table,drop view';
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

This should log all inserts, updates and deletes and in addition every create table or drop view:

postgres=# create table t2 ( a int );
CREATE TABLE
postgres=# insert into t2 values(1);
INSERT 0 1
postgres=# update t2 set a = 2;
UPDATE 1
postgres=# delete from t2 where a = 2;
DELETE 1
postgres=# truncate t2;
TRUNCATE TABLE
postgres=# create view v1 as select * from t2;
CREATE VIEW
postgres=# drop view v1;
DROP VIEW

We should see entries for the create table, the insert, the update and the delete, but not for the truncate or the create view. The drop view should be logged as well:

2019-03-24 15:08:46.245 CET,"enterprisedb","postgres",31899,"[local]",5c97827a.7c9b,10,"idle",2019-03-24 14:13:30 CET,4/496,0,AUDIT,00000,"statement: create table t2 ( a int );",,,,,,,,,"psql.bin","CREATE TABLE",""
2019-03-24 15:08:59.713 CET,"enterprisedb","postgres",31899,"[local]",5c97827a.7c9b,12,"idle",2019-03-24 14:13:30 CET,4/498,0,AUDIT,00000,"statement: insert into t2 values(1);",,,,,,,,,"psql.bin","INSERT",""
2019-03-24 15:09:21.299 CET,"enterprisedb","postgres",31899,"[local]",5c97827a.7c9b,13,"idle",2019-03-24 14:13:30 CET,4/499,0,AUDIT,00000,"statement: update t2 set a = 2;",,,,,,,,,"psql.bin","UPDATE",""
2019-03-24 15:09:29.614 CET,"enterprisedb","postgres",31899,"[local]",5c97827a.7c9b,14,"idle",2019-03-24 14:13:30 CET,4/500,0,AUDIT,00000,"statement: delete from t2 where a = 2;",,,,,,,,,"psql.bin","DELETE",""
2019-03-24 15:12:51.652 CET,"enterprisedb","postgres",31899,"[local]",5c97827a.7c9b,15,"idle",2019-03-24 14:13:30 CET,4/503,0,AUDIT,00000,"statement: drop view v1;",,,,,,,,,"psql.bin","DROP VIEW",""

Fine. Using edb_audit_statement we have control over what exactly we want to log. What we did so far was valid for the whole instance; can we scope auditing to a specific role? Yes, this is possible:

edb=# alter user enterprisedb set edb_audit_statement = 'truncate';
ALTER ROLE
edb=# create role test;
CREATE ROLE
edb=# alter role test set edb_audit_statement = 'truncate';
ALTER ROLE

The same is true on the database level:

edb=# alter database edb set edb_audit_statement = 'truncate';
ALTER DATABASE
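As a side note, per-role and per-database overrides like these are stored in the pg_db_role_setting catalog, so a quick way to double-check what is currently configured is a query along these lines (a sketch, run as a superuser):

psql -d edb -c "select d.datname, r.rolname, s.setconfig from pg_db_role_setting s left join pg_database d on d.oid = s.setdatabase left join pg_roles r on r.oid = s.setrole"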

Let's do a small test: create a user, set edb_audit_statement at the user level, and reset it at the instance level:

edb=# create user u1 with login password 'u1';
CREATE ROLE
edb=# alter user u1 set edb_audit_statement = 'create table';
ALTER ROLE
edb=# alter system set edb_audit_statement = 'none';
ALTER SYSTEM
edb=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

Create table statements from that user should now be logged:

edb=# \c edb u1
You are now connected to database "edb" as user "u1".
edb=> create table t1 ( a int );
CREATE TABLE

The statement is indeed logged:

2019-03-24 15:44:19.793 CET,"u1","edb",6243,"[local]",5c9797b7.1863,1,"idle",2019-03-24 15:44:07 CET,5/30177,0,AUDIT,00000,"statement: create table t1 ( a int );",,,,,,,,,"psql.bin","CREATE TABLE",""

Does the same work for a role?

edb=> \c edb enterprisedb
You are now connected to database "edb" as user "enterprisedb".
edb=# create role r1;
CREATE ROLE
edb=# alter role r1 set edb_audit_statement = 'drop table';
ALTER ROLE
edb=# grant r1 to u1;
GRANT ROLE
edb=# \c edb u1
You are now connected to database "edb" as user "u1".
edb=> drop table t1;
DROP TABLE
edb=> 

No, in this case the drop statement is not logged. You can set the parameter for a role, but it does not have any effect.

The last test for today: What happens when the directory we configured for the audit files is removed?

enterprisedb@edb1:/var/lib/edb/ [pg1] pwd
/var/lib/edb
enterprisedb@edb1:/var/lib/edb/ [pg1] ls -l
total 0
drwx------. 4 enterprisedb enterprisedb 51 Mar 24 13:16 as11
drwx------. 2 enterprisedb enterprisedb 72 Mar 24 14:38 audit
drwxrwxr-x. 3 enterprisedb enterprisedb 17 Mar 24 13:09 local
enterprisedb@edb1:/var/lib/edb/ [pg1] mv audit/ audit_org
enterprisedb@edb1:/var/lib/edb/ [pg1] ls -l
total 0
drwx------. 4 enterprisedb enterprisedb 51 Mar 24 13:16 as11
drwx------. 2 enterprisedb enterprisedb 72 Mar 24 14:38 audit_org
drwxrwxr-x. 3 enterprisedb enterprisedb 17 Mar 24 13:09 local

These two create table statements should generate audit records:

edb=> create table t2 ( a int );
CREATE TABLE
edb=> create table t3 ( a int );
CREATE TABLE
edb=> 

Nothing happens, not even a log entry in the server log file. I would have at least expected a warning that the directory does not exist. Let's restart the instance:

enterprisedb@edb1:/var/lib/edb/as11/data/log/ [pg1] pg_ctl -D /var/lib/edb/as11/data/ restart -m fast
waiting for server to shut down.... done
server stopped
waiting for server to start....2019-03-24 15:51:59 CET LOG:  listening on IPv4 address "0.0.0.0", port 5444
2019-03-24 15:51:59 CET LOG:  listening on IPv6 address "::", port 5444
2019-03-24 15:51:59 CET LOG:  listening on Unix socket "/tmp/.s.PGSQL.5444"
2019-03-24 15:51:59 CET LOG:  redirecting log output to logging collector process
2019-03-24 15:51:59 CET HINT:  Future log output will appear in directory "log".
 done
server started

And again: nothing. But the audit directory is recreated once the server starts:

enterprisedb@edb1:/var/lib/edb/ [pg1] ls -l
total 0
drwx------. 4 enterprisedb enterprisedb 51 Mar 24 13:16 as11
drwx------. 2 enterprisedb enterprisedb 39 Mar 24 15:51 audit
drwx------. 2 enterprisedb enterprisedb 72 Mar 24 14:38 audit_org
drwxrwxr-x. 3 enterprisedb enterprisedb 17 Mar 24 13:09 local

Changing the permissions so that the enterprisedb user can no longer write to that directory will prevent the server from restarting:

enterprisedb@edb1:/var/lib/edb/ [pg1] sudo chown root:root audit
enterprisedb@edb1:/var/lib/edb/ [pg1] ls -l
total 0
drwx------. 4 enterprisedb enterprisedb 51 Mar 24 13:16 as11
drwx------. 2 root         root         39 Mar 24 15:51 audit
drwx------. 2 enterprisedb enterprisedb 72 Mar 24 14:38 audit_org
drwxrwxr-x. 3 enterprisedb enterprisedb 17 Mar 24 13:09 local
enterprisedb@edb1:/var/lib/edb/ [pg1] pg_ctl -D /var/lib/edb/as11/data/ restart -m fast
waiting for server to shut down.... done
server stopped
waiting for server to start....2019-03-24 15:55:44 CET LOG:  listening on IPv4 address "0.0.0.0", port 5444
2019-03-24 15:55:44 CET LOG:  listening on IPv6 address "::", port 5444
2019-03-24 15:55:44 CET LOG:  listening on Unix socket "/tmp/.s.PGSQL.5444"
2019-03-24 15:55:44 CET FATAL:  could not open log file "/var/lib/edb/audit/audit-20190324_155544.csv": Permission denied
2019-03-24 15:55:44 CET LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.
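To recover from that, give the ownership of the audit directory back to the enterprisedb user and start the server again (assuming the same paths as above):

sudo chown enterprisedb:enterprisedb /var/lib/edb/audit
pg_ctl -D /var/lib/edb/as11/data/ start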

Hope that helps…

This article, Auditing with EDB Postgres Enterprise, first appeared on the dbi services blog.

Windocks, SQL Server and Azure in our internal CI pipeline

Sun, 2019-03-31 12:50

During the last DevOps Day in Geneva, I presented a sample of our CI implementation related to our MSSQL DMK maintenance product. It was definitely a very good experience for me and a good opportunity to get feedback from the DevOps community as well.

During the session I explained that our CI pipeline includes SQL Server containers both on AKS (K8s as a managed service in Azure) and on Windocks. Some attendees asked me why we are using Windocks as the container solution for SQL Server on the Windows side in our specific context. As promised, here are some explanations in this blog post, but let's start with the quick answer: we are using Windocks to address challenges that exist with SQL Server containers on the Windows side. The long answer will follow, but let's first set the context with a high-level overview of our continuous integration pipeline architecture:

We are using a hybrid scenario where the development tools (SSDT and GitLab) are located in dbi's on-premises internal infrastructure, whereas the CI pipeline runs entirely on Azure. The pipeline breaks down into two main areas: CI testing performed on SQL Server containers that run on Windows through Windocks, and Microsoft SQL Server containers that run on Linux and AKS. AKS (the managed K8s service in Azure) hosts SQL Server availability groups (in a beta release), and Windocks (surrounded in green in the above picture) is also part of this Azure architecture in IaaS mode, within an Azure virtual machine Standard D4s v3 (4 vCPUs, 16 GB memory and 512GB of disk space). As an aside, we chose this machine size because nested virtualization is required by Windocks and by the cloned database feature, which uses Hyper-V differencing disk capabilities in the background.

 

  • Maintaining docker SQL Server images on Windows may be cumbersome

The DMK maintenance tool performs database maintenance tasks, basically the tasks you usually find in SQL Server environments, including database backups, database integrity checks and maintenance of indexes and statistics. We obviously brought our added value and best practices into the tool and we provide it to customers who want to use it. The main challenge here consists in supporting a wide range of versions, from 2008R2 to 2017 at the moment of this write-up (both on Windows and Linux obviously), and most of the issues encountered with Docker images came from SQL Server docker images on Windows. First, if you refer to the Docker Hub (and the new Microsoft Container Registry), there are no real official images for SQL Server versions prior to 2016. Thus, maintaining such images is at your own responsibility and risk, and we were not confident going this way. However, I stayed motivated and decided to perform further tests to check the feasibility with Docker images. I quickly figured out that going with a Docker-native based solution would lead to some tedious challenges. Indeed, having no official images from Microsoft for older versions of SQL Server, I had to build mine, but I was disappointed by the image size, which was far too large compared to the official Linux images – more than 10GB for a SQL Server docker image on Windows versus ~ 1.4GB on Linux.

SQL Server Docker image size on Windows after building the custom image

The total size includes the SQL Server binaries, but even if we exclude them from the calculation, the final size leads to the same conclusion.

SQL Server image size on Linux

In addition, building a basic image of SQL Server on Windows remains tedious and can be time consuming, because you need to write some code to install the optional prerequisites and SQL Server itself, meaning you first have to copy the binaries (CUs and/or SPs according to the version) and then run the command file to install it. A lot of work and no real added value (and no warranties) at the end. That is definitely the opposite of what I expect from a DevOps process when I want to be fast and simply use a SQL Server Docker-based image. In this case, I would like to just pick the right Docker image version and corresponding tag and then focus on my work.

Windocks fills the gap that exists with older versions (and probably newer ones) of SQL Server on Windows by providing a different way to create base images compared to the Docker-native solution. The first step consists in installing SQL Server instances as we would in a traditional approach. The interesting point is that these instances then serve as base images when spinning up containers. This approach provides several advantages, but here I would like to point out the ability to apply configuration settings directly at the SQL Server instance level, which are then propagated automatically to newly created containers. From my point of view, it is an interesting way to apply segregation of duties without compromising the architecture's agility: DBAs (Ops) still work on providing a well-configured template from an architecture point of view, whereas developers focus on their work, and both interact with the same tool.
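To make this more concrete, here is a minimal sketch of how this could look from the developer side. The host name, port and image name below are assumptions for illustration only; the real names depend on how the Windocks host was set up and which SQL Server instances were installed on it. The point is simply that the standard docker client talks to the Windocks endpoint:

# list the SQL Server base images published by the Windocks host (names are illustrative)
docker -H tcp://windocks-host:2375 images

# create and start a container from one of these base images
docker -H tcp://windocks-host:2375 create mssql-2017
docker -H tcp://windocks-host:2375 start <container_id_returned_above>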

 

  • Storage concerns may exist even in DEV environments

Usually, in DEV environments, storage is not a big concern for DBAs. From my experience, they usually provide developers with a shared environment hosting different SQL Server instances and application databases. Most of the time developers get high privileges on those environments – db_owner or sysadmin according to the context – because it is a DEV environment after all, and DBAs often apply this dirty fix to make these environments more “agile”. But this approach implies a static environment that is in fact not as flexible for developers as we may think. For instance, how do you reinitialize an environment for a specific developer without impacting the work of the other ones? The ideal situation would be that each developer is able to quickly create an isolated and ephemeral environment on demand. But in turn this new approach comes with its own challenges: how do you deal with the total disk space consumption in this case? Let’s say each developer wants to spin up a new SQL Server container environment; the total storage footprint would then include the SQL Server docker image and the space consumed by the user databases as well, right? Let’s take a real customer example who wants to provide fresh data from production databases to the developers every week (after applying sanitized data scripts or not). This is a common scenario by the way, and let’s say the final storage size of the databases is roughly 500GB for this customer. Adding ~ 20 developers to the game, I’m pretty sure you already guessed the potential storage concern which may result if all developers want to spin up their own environment at the same time. Let’s do the quick math: 20 [developers] x (10GB [Docker image size] + 500GB [user databases]) ~= 10 TB.
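Just to make the back-of-the-envelope calculation explicit, a quick check in the shell with the numbers from the example above:

# 20 developers, each needing the 10GB Windows image plus 500GB of user databases
echo "$(( 20 * (10 + 500) )) GB"    # prints 10200 GB, roughly 10 TB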

Going back to my specific context (our DMK maintenance tool), the storage footprint is not as severe because we have at most 7 developers at the same time, with a total storage footprint of 770GB (10GB for the Docker image + 100GB of user databases per developer). It is still too much for us, even though we provisioned 512GB of premium SSD and could increase it easily… storage also has a cost on Azure, right? Furthermore, we know that for each developer the ratio between the provisioned disk space and the disk space actually consumed is low for most of the developed features. We needed a way to improve this ratio, and Windocks provides a simple one through Hyper-V differencing disk capabilities directly integrated with containerization.

  • Security

How to secure our environment was a question that came up at the end of our CI implementation. As in many DevOps projects, security was not the first concern, but moving to the cloud helped us treat security as an important topic in our architecture.

First, we need to ensure that the images used by our team are secure. Insecure images are one of the new issues that come with container environments, and an image checking process requires a more complex infrastructure, often with EE capabilities and extra components on the container side (at least if you don’t want to put your images on a public repository). Using a private registry on Azure is another option, but after some investigation we were in favor of the Windocks capabilities in our context. Windocks takes a different approach to creating SQL Server images by using a native SQL Server instance installation as the base template, rather than relying on Docker-native images and a docker registry. This built-in approach, which prevents compromising the container infrastructure with potentially malicious code without further complexifying the architecture, was a good argument for us because it helps DBAs keep security concerns under control.

Windocks also provides other features that help us secure the container environment in an easy way, such as basic authentication to prevent an unauthorized user from spinning up a Windocks container. The native support of Windows authentication was another good argument because it simplified the security management of admin users. We are using a mix of Windows sysadmin accounts and SQL Server logins for applications.

The bottom line: as a small DEV team we are really satisfied with Windocks, which addressed the challenges we faced on the operational side. It is worth noting that our needs and challenges are close to what we see with some of our customers, albeit on a different order of magnitude, when SQL Server is introduced into their CI/CD pipeline. In our context we are running the standard edition of Windocks, but EE capabilities are also available and are more suitable for enterprise-class environments.

See you

This article, Windocks, SQL Server and Azure in our internal CI pipeline, first appeared on the dbi services blog.

Using operating system users to connect to PostgreSQL

Thu, 2019-03-28 14:39

PostgreSQL supports many authentication methods by default and one of them is Ident authentication. Using that method you can use the users defined in the operating system and map them to users in PostgreSQL. So how does that work?

To start, let's create a new operating system user that we want to use for connecting to the database:

postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] sudo groupadd user1
postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] sudo useradd -g user1 -m user1

The next step is to create a so-called user name map. A user name map contains the name of the map, the operating system user and the user in PostgreSQL:

postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] echo "my-map       user1         user1" >> $PGDATA/pg_ident.conf
postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] tail -5 $PGDATA/pg_ident.conf
# Put your actual configuration here
# ----------------------------------

# MAPNAME       SYSTEM-USERNAME         PG-USERNAME
my-map       user1         user1

In our case the name of the PostgreSQL user and the name of the operating system user are the same. You might well map the operating system user to another user in PostgreSQL, e.g. user2.

Obviously our user needs to exist in PostgreSQL, so:

postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] psql -c "create user user1 with login" postgres
CREATE ROLE

Finally we need to add an entry to pg_hba.conf that matches our map and authentication method:

postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] echo "host    all    all    192.168.22.0/24    ident map=my-map" >> $PGDATA/pg_hba.conf
postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] pg_ctl -D $PGDATA reload
server signaled

Let's try to connect to the database with our new user:

postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] sudo su - user1
[user1@pgbox ~]$ /u01/app/postgres/product/DEV/db_1/bin/psql -h 192.168.22.99 -p 5433 -U user1 postgres
psql: FATAL:  Ident authentication failed for user "user1"

… and that fails. When we check the PostgreSQL log file this is reported:

2019-03-19 18:33:26.724 CET - 1 - 8174 - 192.168.22.99 - user1@postgres LOG:  could not connect to Ident server at address "192.168.22.99", port 113: Connection refused
2019-03-19 18:33:26.724 CET - 2 - 8174 - 192.168.22.99 - user1@postgres FATAL:  Ident authentication failed for user "user1"
2019-03-19 18:33:26.724 CET - 3 - 8174 - 192.168.22.99 - user1@postgres DETAIL:  Connection matched pg_hba.conf line 94: "host    all    all    192.168.22.0/24    ident map=my-map"

Our entry in pg_hba.conf matches, at least that is fine. But PostgreSQL is not able to connect to the Ident server and this confirms that nothing is listening on that port:

postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] sudo netstat -tulpen | grep 113

I am running CentOS 7 so the procedure for installing and starting an ident server is this:

postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] sudo yum search oident
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: pkg.adfinis-sygroup.ch
 * epel: pkg.adfinis-sygroup.ch
 * extras: mirror1.hs-esslingen.de
 * updates: mirror.softaculous.com
=============================================================================================== N/S matched: oident ===============================================================================================
oidentd.x86_64 : Implementation of the RFC1413 identification server

  Name and summary matches only, use "search all" for everything.

postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] sudo yum install oidentd
postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] systemctl list-unit-files | grep -i ident
oidentd.service                               disabled
postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] sudo systemctl enable oidentd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/oidentd.service to /usr/lib/systemd/system/oidentd.service.
postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] sudo systemctl start oidentd.service
postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] sudo netstat -tulpen | grep 113
tcp        0      0 0.0.0.0:113             0.0.0.0:*               LISTEN      0          48553      8978/oidentd        

Let's try again:

postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] sudo su - user1
Last login: Tue Mar 19 18:33:25 CET 2019 on pts/1
[user1@pgbox ~]$ /u01/app/postgres/product/DEV/db_1/bin/psql -h 192.168.22.99 -p 5433 -U user1 postgres
psql (12devel)
Type "help" for help.

postgres=> 

… and now it works. We can connect using the operating system user without specifying a password. To complete this post, let's create another operating system user and map it to a different account in PostgreSQL:

postgres@pgbox:/home/postgres/ [PGDEV] sudo groupadd user2
postgres@pgbox:/home/postgres/ [PGDEV] sudo useradd -g user2 -m user2
postgres@pgbox:/home/postgres/ [PGDEV] echo "my-map       user2         user1" >> $PGDATA/pg_ident.conf
postgres@pgbox:/home/postgres/ [PGDEV] tail $PGDATA/pg_ident.conf
# a SIGHUP signal.  If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect.  You can
# use "pg_ctl reload" to do that.

# Put your actual configuration here
# ----------------------------------

# MAPNAME       SYSTEM-USERNAME         PG-USERNAME
my-map       user1         user1
my-map       user2         user1
postgres@pgbox:/home/postgres/ [PGDEV] pg_ctl -D $PGDATA reload
server signaled

user2 should now be able to connect as user1 in PostgreSQL as well:

postgres@pgbox:/u02/pgdata/DEV/ [PGDEV] sudo su - user2
Last login: Tue Mar 19 18:55:06 CET 2019 on pts/1
[user2@pgbox ~]$ /u01/app/postgres/product/DEV/db_1/bin/psql -h 192.168.22.99 -p 5433 -U user1 postgres
psql (12devel)
Type "help" for help.

postgres=> 

Finally, be careful with this authentication method. The documentation is very clear about that: “The drawback of this procedure is that it depends on the integrity of the client: if the client machine is untrusted or compromised, an attacker could run just about any program on port 113 and return any user name they choose. This authentication method is therefore only appropriate for closed networks where each client machine is under tight control and where the database and system administrators operate in close contact. In other words, you must trust the machine running the ident server. Heed the warning: The Identification Protocol is not intended as an authorization or access control protocol.”
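As a side note, if clients connect locally over a Unix socket on the same host anyway, peer authentication gives you operating-system-based logins without depending on an external ident daemon, and it supports user name maps as well. A minimal sketch, reusing the map from above (keep in mind that pg_hba.conf is evaluated top-down, so the line may need to go before an existing matching “local” entry):

echo "local   all   all   peer map=my-map" >> $PGDATA/pg_hba.conf
pg_ctl -D $PGDATA reload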

This article, Using operating system users to connect to PostgreSQL, first appeared on the dbi services blog.

How To Push An Image Into Amazon ECR With Docker

Tue, 2019-03-19 01:59
8 Steps To Push An Image Into Amazon ECR With Docker

Please bear in mind that Amazon Elastic Container Registry (ECR) is a managed AWS Docker registry service. In this topic, we will use the Docker CLI to push a CentOS image into Amazon ECR.

1. Install Docker desktop for Windows and AWS CLI

Verify and confirm that each version has been installed properly (see below):

  • docker --version
  • aws --version
2. Authentication to AWS

Open a PowerShell window with administrative privileges and enter the following commands:

  • aws configure
  • Access key: ****
  • Secret key: ****

The region name and output format information are not mandatory.
The data above can be found in the AWS management console under the IAM service.

3. Log in to AWS elastic container registry

Use the get-login command to obtain the docker login command for AWS Elastic Container Registry and save it to a text file (see below):

  • aws ecr get-login --region eu-west-3 > text.txt
4. Authenticate Docker to AWS elastic container registry

Replace aws_account_id below with the account id found in the text file saved previously and specify the password taken from the same file (a note on newer AWS CLI versions follows this step):

  • docker login -u AWS https://aws_account_id.dkr.ecr.eu-west-3.amazonaws.com
  • Password: *****
  • Login_AWS
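As a side note, the get-login command has been removed in AWS CLI version 2; there the equivalent is get-login-password piped straight into docker login (again, replace aws_account_id with your account id):

aws ecr get-login-password --region eu-west-3 | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.eu-west-3.amazonaws.com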
5. Download the CentOS image

Use the pull command to download the CentOS image:

  • docker pull centos:6.6
  • Docker_Pull_image
6. Create a repository
  • aws ecr create-repository --repository-name centos

The repository has been created successfully in Amazon Elastic Container Registry (see below):

AWS_ECR_Repository

Before proceeding to the next step, make sure that the following requirements are met:

  • Docker version must be greater than or equal to 1.7
  • The repository is created and the user has sufficient privileges to access it
  • The Docker authentication is successful
7. List the images stored in Docker and tag them
  • docker images

Docker_images

  • docker tag centos:6.6 aws_account_id.dkr.ecr.eu-west-3.amazonaws.com/centos:6.6 (replace the aws_account_id by your account id)

Verify that the image has been tagged:

  • docker images

Docker_images2

8. Push the CentOS image into Amazon ECR

Use the push command to move the centos image into Amazon elastic container registry:

  • docker push aws_account_id.dkr.ecr.eu-west-3.amazonaws.com/centos:6.6 (replace the aws_account_id by your account id)

From the Amazon management console, verify that the image has been pushed properly into Amazon elastic container registry (see below):

ECR_Push_image

If you are in a test environment, to avoid extra costs, make sure to delete the image and the repository from Amazon elastic container registry.

Use the following command to delete the image:

  • aws ecr batch-delete-image --repository-name centos --image-ids imageTag=6.6

Use the following command to delete the repository (an alternative using the --force flag is shown below):

  • aws ecr delete-repository --repository-name centos
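Alternatively, if the repository still contains images, the --force flag removes the repository together with its images in one go:

aws ecr delete-repository --repository-name centos --force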

If you need further details about Docker basics for Amazon ECR, click here.

This article, How To Push An Image Into Amazon ECR With Docker, first appeared on the dbi services blog.
