BI & Warehousing

Is ETL still necessary?

Dylan's BI Notes - Tue, 2019-01-22 23:10
ETL stands for Extract, Transform, and Load. Extract and Load, their existence itself implies that the source data and target data are stored separately, so you need to extract from source and load the data into the target data store. Extract and Load won’t go away if the data used for reporting is not stored […]
Categories: BI & Warehousing

ORA-04080: trigger ‘PRICE_HISTORY_TRIGGERV1’ does not exist

Amardeep Sidhu - Tue, 2019-01-22 07:45

It is actually a dumb one. I was disabling triggers in a schema and ran this SQL to generate the disable statements. (Example from here)

HR@test> select 'alter trigger '||trigger_name|| ' disable;' from user_triggers where table_name='PRODUCT';

'ALTERTRIGGER'||TRIGGER_NAME||'DISABLE;'
--------------------------------------------------------------------------------
alter trigger PRICE_HISTORY_TRIGGERv1 disable;

HR@test> alter trigger PRICE_HISTORY_TRIGGERv1 disable;
alter trigger PRICE_HISTORY_TRIGGERv1 disable
*
ERROR at line 1:
ORA-04080: trigger 'PRICE_HISTORY_TRIGGERV1' does not exist


HR@test>

WTF ? It is there but the disable didn’t work. I was in hurry, tried to connect through SQL developer and disable and it worked ! Double WTF ! Then i spotted the problem. Someone created it with one letter in the name in small. So to make it work, we need to use double quotes.

HR@test> alter trigger "PRICE_HISTORY_TRIGGERv1" disable;

Trigger altered.

HR@test>

One of the reasons why you shouldn’t use case sensitive names in Oracle. That is stupid.

Categories: BI & Warehousing

Oracle OpenWorld Europe : London 2019

Rittman Mead Consulting - Tue, 2019-01-22 07:39

Some eleven thousand people descended on Oracle OpenWorld Europe in London last week for two days of business and technical sessions delivered by a mixture of members of Oracle’s product team and end users giving real-world case studies of adoption of Oracle’s Cloud offerings and product roadmaps.

Screen-Shot-2019-01-22-at-13.16.24

Something that may not surprise anyone is that at OpenWorld, to speak of anything other than Cloud or Autonomous would be somewhat blasphemous.

It’s a shrewd move this by Oracle to branch outside of their flagship annual conference held in Redwood Shores in October and the attendance backed up the rationale that offering free entry was the right thing to do.

Some of the observations that I made after attending were:

The future is Autonomous

Oracle’s Autonomous Database offering is being heavily pushed despite being a relatively immature product with very few real-world examples yet. The concept is certainly valid and it’s worth new and existing customers of Oracle seriously considering trialling.

There are two autonomous offerings. The autonomous data warehouse (ADW) and autonomous transaction processing (ATP).

Both are fully cloud managed by Oracle, are elastic so that they can be scaled up and down on demand, and most importantly - are autonomous. So the marketing spiel goes, they are self driving, self securing, self repairing. You’ll see this a lot but basically it means that the manual tasks that a DBA would normally perform are taken care of by Oracle. Think patching etc…

AI & ML

You can tell that Oracle are really getting behind the latest trends in the technology market. AI will be a feature of all of their Cloud applications with Mark Hurd (Oracle CEO) predicting that by 2025 all applications on the market with have AI factored in (fair prediction)

Further more Oracle's 2018 acquisiton of DataScience.com show's the strategic vision of the companies board.

Blockchain

Also picking up on the cyber security side of things, Oracle spoke a lot about the role that Blockchain will play in enterprises going forwards. Oracle’s Blockchain cloud platform offering gives enterprises a rapid and simplified deployment of blockchain networks.

Final Thoughts

In summary, this was a really good event for Oracle to run and I really hope they continue to do so. It gave a chance for the Oracle community to come together again and in a growingly competitive market for Cloud, Oracle needs to keep investing in its community going forwards.

Conceptually Oracle has some very timely cloud offerings in their armoury and it will be interesting to come back in 12 months time and see how the adoption of these applications & platforms is going.

Categories: BI & Warehousing

Is Star Schema necessary?

Dylan's BI Notes - Fri, 2019-01-18 12:30
A star schema describes the data by fact and dimension. From one angle, it is a data modeling technique for designing the data warehouse based on relational database technology.  In the old OLAP world, even though a cube is also links to the dimensions that describe the measure, we typically won’t call them Star Schema. […]
Categories: BI & Warehousing

Error while running ggsci

Amardeep Sidhu - Sat, 2019-01-12 08:21

This was another issue that I faced while trying to configure GoldenGate in HA mode. ggsci was working fine after normal installation but after configuring it in HA mode and trying to run ggsci, it resulted in this:

[oragg@node2 product]$ ggsci
Oracle GoldenGate Command Interpreter for Oracle
Version 12.3.0.1.4 OGGCORE_12.3.0.1.0_PLATFORMS_180415.0359_FBO
Linux, x64, 64bit (optimized), Oracle 12c on Apr 16 2018 00:53:30
Operating system character set identified as UTF-8.
Copyright (C) 1995, 2018, Oracle and/or its affiliates. All rights reserved.
2019-01-08 16:28:37.913
CLSD: An error occurred while attempting to generate a full name. Logging may not be active for this process
Additional diagnostics: CLSU-00100: operating system function: sclsdgcwd failed with error data: -1
CLSU-00103: error location: sclsdgcwd2
(:CLSD00183:)
GGSCI (exadatadb02.industowers.com) 1>

No obvious clues in the error message but little searching revealed that it had something to do with permissions. It was on Exadata so i tried to do a strace of ggsci and see if it could give some clues. There we go:

[oragg@node2 product]$ strace ggsci
.
.
mkdir("/u01/app/oracle/product/12.1.0.2/dbhome_4/log/exadatadb02", 01777) = -1 EACCES (Permission denied)

That is the Oracle database home, GoldenGate is supposed to use. It is trying to create a directory at the mentioned path and not able to do it. There was another directory called client needed inside this. I created both of them and set the needed permissions & the sticky bit and it worked fine. It was working fine on the other node, so i could check the permissions over there and do the same on this node.

Categories: BI & Warehousing

Failed to execute the command “”/u01/app/xag/bin/clsecho”

Amardeep Sidhu - Tue, 2019-01-08 11:22

I was configuring GoldenGate in HA mode by following this document. Everything worked ok but in the end while running agctl config goldengate to view the configuration of GoldenGate resource, it was failing with the following error:

[oracle@exadatadb02 ~]$ agctl config goldengate GG_TARGET
Failed to execute the command ""/u01/app/xag/bin/clsecho" -p xag -f xag -m 5080 "GG_TARGET"" (rc=134), with the message:
Oracle Clusterware infrastructure fatal error in clsecho.bin (OS PID 126367_140570897783808): Internal error (ID (:CLSB00107:)) - Error -1 (ORA-08275) determining Oracle base
/u01/app/xag/bin/clsecho: line 45: 126367 Aborted (core dumped) ${CRS_HOME}/bin/clsecho.bin "$@"
Failed to execute the command ""/u01/app/xag/bin/clsecho" -p xag -f xag -m 5081 "/u01/app/oragg/product"" (rc=134), with the message:

If you look at the error in bold it sounds kinda obvious that it is not able to figure our where the ORACLE_BASE is. But somehow it didn’t strike me at that moment. So started looking around. If we look at the command it is running, it runs clsecho. This is simply a shell script which in turn calls $CRS_HOME/bin/clsecho.bin . In the script, it sets various environment variables and that is where the problem was. There are lines like:

ORACLE_BASE=
export ORACLE_BASE

Nowhere in the script, it is setting the value of ORACLE_BASE. That was causing an issue. I changed the first line to set the ORACLE_BASE location and it worked fine after that. There was another issue i faced with ggsci after doing xag configuration. Will do another blog post on that.

Categories: BI & Warehousing

dbca doesn’t list diskgroups

Amardeep Sidhu - Wed, 2018-12-26 09:31

This is an Exadata machine running GI version 18.3.0.0.180717 and DB version 12.1.0.2.180717. On one of the DB nodes while running dbca, it doesn’t list the diskgroups. it works fine on the other node.

I cheked the dbca trace and found that the kfod command was failing. I tried to run it manually and got the same error:

[oracle@exadb01 ~]$ /u01/app/18.0.0.0/grid/bin/kfod op=groups verbose=true
KFOD-00300: OCI error [-1] [OCI error] [Could not fetch details] [-105777048]

KFOD-00105: Could not open pfile 'init@.ora'
[oracle@exadb01 ~]$

I ran it with strace then:

[oracle@exadb01 ~]$ strace /u01/app/18.0.0.0/grid/bin/kfod op=groups verbose=true
execve("/u01/app/18.0.0.0/grid/bin/kfod", ["/u01/app/18.0.0.0/grid/bin/kfod", "op=groups", "verbose=true"], [/* 18 vars */]) = 0
brk(0) = 0x2641000
.
.
.
.
.
open("/u01/app/18.0.0.0/grid/dbs/ab_+ASM1.dat", O_RDONLY) = -1 EACCES (Permission denied)
geteuid() = 1003
open("/u01/app/18.0.0.0/grid/rdbms/mesg/kfodus.msb", O_RDONLY) = 13
fcntl(13, F_SETFD, FD_CLOEXEC) = 0
lseek(13, 0, SEEK_SET) = 0
read(13, "\25\23\"\1\23\3\t\t\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"…, 280) = 280
lseek(13, 512, SEEK_SET) = 512
read(13, "\352\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"…, 512) = 512
lseek(13, 1024, SEEK_SET) = 1024
read(13, ".\1=\1E\1M\1X\1\352\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"…, 512) = 512
lseek(13, 1536, SEEK_SET) = 1536
read(13, "\n\0d\0\0\0D\0e\0\1\0e\0f\0\1\0\230\0g\0\1\0\306\0h\0\2\0\325\0"…, 512) = 512
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), …}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f43f85f2000
write(1, "KFOD-00300: OCI error [-1] [OCI "…, 78KFOD-00300: OCI error [-1] [OCI error] [Could not fetch details] [-132605848]
) = 78

The text in bold just before the kfod error caught my attention. When I checked actually oracle user wasn’t able to read the file. The permissions looked like this:

[root@exadb01 dbs]# ls -ltr
total 20
-rw-r--r-- 1 oragrid oinstall 3079 May 14 2015 init.ora
-rw-r--r-- 1 oragrid oinstall 587 Dec 12 15:33 initbackuppfile.ora
-rw-rw---- 1 oragrid asmadmin 1656 Dec 20 14:26 ab_+ASM1.dat
-rw-rw---- 1 oragrid oinstall 1544 Dec 20 14:26 hc_+APX1.dat
-rw-rw---- 1 oragrid oinstall 1544 Dec 21 16:57 hc_+ASM1.dat
[root@exadb01 dbs]#

Whereas on node2 they were like:

[oracle@exadb02 dbs]$ ls -ltr 
total 16
-rwxrwxrwx 1 oragrid oinstall 3079 Dec 12 14:52 init.ora
-rwxrwxrwx 1 oragrid oinstall 1544 Dec 21 16:57 hc_+ASM2.dat
-rw-rw---- 1 oragrid oinstall 1720 Dec 21 16:57 ab_+ASM2.dat
-rwxrwxrwx 1 oragrid oinstall 1544 Dec 21 16:57 hc_+APX2.dat
[oracle@exadb02 dbs]$

Since oracle user isn’t member of asmadmin group, it is not able to read the meniotned file. Changing the owner to oracle:oinstall fixed the issue.

Categories: BI & Warehousing

New web based OEDA for Exadata

Amardeep Sidhu - Wed, 2018-11-21 03:17

It started with an xls sheet (that was called dbm configurator) . Then OEDA (Oracle Exadata Deployment Assistant) was introduced that was a Java based GUI tool to enter all the information needed to configure an Exadata machine. Now with the latest patch released in Oct, OEDA has changed again; to become a web based tool. It is deployed on WebLogic and comes with some new features as well. SuperCluster deployments will continue to use the Java based OEDA tool.  The new interface has support for Exadata, ZDLRA and ExaCC. It is backward compatible and can import the XMLs generated by older versions of OEDA. Some of the new features include the ability to configure single instance homes, create more than 2 diskgroups, create more than 1 database homes and databases, allow ILOMs to have a different subnet etc.

To configure the OEDA application you need to unzip the contents and run the installWls script with -p switch (that mentions the port). It will deploy the application on WebLogic and give you the URL to access the OEDA. The interface is similar to the older version. Just that it runs in a browser and there are some new features added. MOS note 2460104.1 and the Exadata documentation has more details:

Using Oracle Exadata Deployment Assistant

 

 

Categories: BI & Warehousing

How Are My Users Connecting? Analyzing OAC and OBIEE entry points

Rittman Mead Consulting - Mon, 2018-11-12 05:18
How Are My Users Connecting? Analyzing OAC and OBIEE entry points

Are you managing an OAC or OBIEE instance and your life is nice and easy since you feel like having everything in control: your users browse existing dashboards, create content via Analysis, Data Visualization or SmartView and deliver data via Agents or download dashboard content to use in Excel. You feel safe since you designed your platform to provide aggregated data and track every query via Usage Tracking.

How Are My Users Connecting? Analyzing OAC and OBIEE entry points

But one day you start noticing new BI tools appearing in your company that provide similar KPIs to the ones you are already exposing and you start questioning where those data are coming from. Then suddently realize they are automagically sourcing data from your platform in ways you don't think you can control or manage.
Well, you're not alone, let me introduce you on how to monitor OAC/OBIEE connections via network sniffing and usage tracking in this new world of self-service BI platforms.

A Bit of History

How Are My Users Connecting? Analyzing OAC and OBIEE entry points

Anybody who has been for some time in the Analytics market will be able to recognise the situation described in the image above as a direct experience: multiple people having different views on a KPI calculation and therefore results. Back in the days, that problem was strictly related to the usage of Excel as BI tool and the fact that everybody was directly accessing raw data to build up their own KPIs.

Centralised BI Solutions

The landscape started to change when Centralised Enterprise BI Solutions (like OBIEE or in more recent times OAC ) started appearing and being developed in the market. The Key point of those solutions was to provide a unique source of truth for a certain set of KPIs across the organization.

However, the fact that those tools were centralised in the hands of the IT department, meant most of the times a lack of agility for the Business Departments: every new KPI had to be well defined, understood, documented, implemented by IT, validated and delivered in a process that could take months. Even when the development phase was optimised, via DevOps practices for example, time was still burned due to the communication and coordination efforts which are necessary between Business and IT teams.

Self Service BI Platforms

In order to solve the agility problem, in the last few years a new bottom-up approach has been suggested by the latest set of self-service Analytics tools: a certain set of KPIs is developed locally directly by the Business Department and then, once the KPI has been validated and accepted, its definition and the related data model is certified to allow a broader audience to use it.

Oracle has historically been a leader on the Centralised BI platform space with OBIEE being the perfect tool for this kind of reporting. In recent years, Data Visualization closed the gap of the Self-Service Analytics, providing tools for data preparation, visualization and machine learning directly in the hands of Business Users. Oracle Analytics Cloud (OAC) combines in a unique tool both the traditional centralised BI as well as the self-service analytics providing the best option for each use case.

What we have seen at various customer is a proliferation of BI tools being acquired from various departments: most of the time a centralised BI tool is used side by side with one or more self-service with little or no control over data source usage or KPI calculation.

The transition from old-school centralised BI platform to the new bottom-up certified systems is not immediate and there is no automated solution for it. Moreover, centralised BI platforms are still key in most corporates with big investments associated with them in order to get fully automated KPI management. A complete rewrite of the well-working legacy BI solutions following the latest BI trends and tools is not a doable/affordable on short-term and definitively not a priority for the business.

A Mix of The Two

So, how can we make the old and the new world coexist in a solution which is efficient, agile, and doesn't waste all well defined KPIs that are already produced? The solution that we are suggesting more and more is the re-usage of the central BI solution as a curated data source for the self-service tools.

Just imagine the case where we have a very complex Churn Prediction formula, based on a series of fields in a star schema that has been already validated and approved by the Business. Instead of forcing a new user to rewrite the whole formula from the base tables we could just offer, based on the centralised BI system, something like:

Select "Dim Account"."Account Code", "Fact Churn"."Churn Prediction" from "Churn"

There are various benefits to this:

  • No mistakes in formula recalculation
  • No prior knowledge of joining Condition, filtering, aggregation needed
  • Security system inheritance if specific filters or security-sensitive fields were defined, those settings will still be valid.
  • No duplication of code, with different people accessing various versions of the same KPIs.

Using the centralised BI system to query existing KPIs and mashing-up with new datasources is the optimal way of giving agility to the business but at the same time certifying the validity of the core KPIs.

OBIEE as a datasource

A lot of our customers have OBIEE as their own centralised BI reporting tool and are now looking into expanding the BI footprint with a self-service tool. If the chosen tool is Oracle Data Visualization then all the hard work is already done: it natively interfaces with OBIEE's RPD and all the Subject Areas are available together with the related security constraints since the security system is shared.

But what if the self-service tool is not Oracle Data Visualization? How can you expose OBIEE's Data to an external system? Well, there are three main ways:

The first one is by using web-services: OAC (OBIEE) provides a set of SOAP web-services that can be called via python for example, with one of them being executeSQLQuery. After passing the SQL in a string the results are returned in XML format. This is the method used for example by Rittman Mead Insights. SOAP Web-services, however, can't directly be queried by BI tools this is why we created Unify to allow OBIEE connections from Tableau (which is now available for FREE!).
If you aren't using Tableau, a more generic connection method that can is accessible by most of BI tools is via ODBC: OBIEE's BIServer (the component managing the RPD) can be exposed via ODBC by installing the AdminTool Drivers and creating an ODBC connection.
How Are My Users Connecting? Analyzing OAC and OBIEE entry points

Please note that the ODBC method is only available if the BIServer port is not blocked by firewalls. Once the port is open, the ODBC datasource can be queried by any tool having ODBC querying capabilities.

How Are My Users Connecting? Analyzing OAC and OBIEE entry points

The last method is obviously Smartview, that allows sourcing from pre-existing or the creation of new Analysis with the option of refreshing the data on demand. Smartview is the perfect choice if your target Analytical tool is one of the two supported: Excel or Powerpoint.

Good for all use-cases?

Are the above connection methods good in every situation?

via GIPHY

The solutions described above work really well if you let OBIEE do its job: KPI calculations, aggregations, group by and joins or, in other terms, if your aim is to extract aggregated data. OBIEE is not a massive data exporting tool, if your plan is to export 100k rows (just a random number) every time then you may need to rethink about the solution since you:

  • will experience poor performances since you're adding a layer (OAC) between where the data resides (DB) and yourself
  • put the OBIEE environment under pressure since it has to run the query and transform the resultset in XML before pushing it to you

If that's the use case you're looking for then you should think about alternative solutions like sourcing the data directly from the database and possibly moving your security settings there.

How Can You Monitor Who is Connecting?

Let's face the reality, in our days everyone tries to make his work as easy as it can. Business Analysts are tech savvy and configurations and connection options are just a google search away. Stopping people from finding alternative solutions to accelerate their work is counterproductive: there will be tension since the analyst work is slowed down thus the usage of the centralized BI platform will decline quickly since analysts will just move to other platforms giving them the required flexibility.

Blocking ports and access methods is not the correct way of providing a (BI) service that should be centrally controlled but used by the maximum amount of people in an organization. Therefore monitoring solutions should be created in order to:

  • Understand how users are interacting with the platform
  • Provide specific workarounds in cases when there is a misuse of the platform

But how can you monitor user's access? Well, you really have two options: network sniffing or usage tracking.

Network Sniffing

Let's take the example of ODBC connections directly to BI Server (RPD). Those connections can be of three main types:

  • From/To the Presentation Service in order to execute queries in the front-end (e.g. via analysis) and to retrieve the data
  • From OBI administrators Admin Tool to modify OAC/OBIEE's metadata but this shouldn't happen in Production systems
  • From End Users ODBC connections to query OAC/OBIEE data with other BI tools

In the type one connection both the sender and receiver (Presentation and BI server) share the same IP (or IPs in case of cluster), while in the second and third type (the one we are interested) the IP address of the packet sender/receiver is different from the IP of the OBIEE server.
We can then simply use a Linux network analysis tool like tcpdump to check the traffic. With the following command, we are able to listen on port 9516 (the BI Server one) and exclude all the traffic generated from the Presentation Server (IP 192.168.1.30)

sudo tcpdump  -i eth0 -ennA 'port 9516' | grep -v "IP 192.168.1.30" 

The following is a representation of the traffic

How Are My Users Connecting? Analyzing OAC and OBIEE entry points

We can clearly see the traffic passing between the user's machine (IP ending with 161 and the BI Server port (IP ending with 30 and port 56639).
This is the first tracking effort and it already provides us with some information (like users IP address) however is limited to ODBC and doesn't tell us the username. Let's see now what can we get from Usage Tracking.

Usage Tracking

We wrote a lot about Usage Tracking, how to enhance and how to use it so I don't want to repeat that. A very basic description of it: is a database table containing statistics of every query generated by OBIEE.
The "every query" bit is really important: the query doesn't have to be generated by the standard front-end (analytics), but a record is created even if is coming from Smartview or with a direct ODBC access to the BIServer.

Looking into S_NQ_ACCT (the default table name) there is an interesting field named QUERY_SRC_CD that, from Oracle documentation contains

The source of the request.

Checking the values for that table we can see:
How Are My Users Connecting? Analyzing OAC and OBIEE entry points
Analysing the above data in Detail

  • DashboardPrompt and ValuePrompt are related to display values in Prompts
  • DisplayValueMap, Member Browser Display Values and Member Browser Path to Value seem related to items display when creating analysis
  • Report is an Analysis execution
  • SOAP is the webservices
  • rawSQL is the usage of Raw SQL (shouldn't be permitted)

So SOAP identifies the webservices, what about the direct ODBC connections? they don't seem to be logged! Not really, looking more in detail in a known dataset, we discovered that ODBC connections are marked with NULL value in QUERY_SRC_CD together with some other traffic.
Looking into the details of the Null QUERY_SRC_CD transactions we can see two types of logs:

How Are My Users Connecting? Analyzing OAC and OBIEE entry points

  • The ones starting with SELECT are proper queries sent via an ODBC call
  • The ones starting with CALL are requests from the Presentation Server to the BI Server

Summarizing all the findings, the following query should give you the list of users accessing OBIEE via either ODBC, SOAP or using rawSQL.

SELECT DISTINCT 
  USER_NAME,
  NVL(QUERY_SRC_CD, 'RPD ODBC') SOURCE, 
  TRUNC(START_TS) TS
FROM S_NQ_ACCT 
WHERE 
   AND 
    (
     QUERY_SRC_CD IS NULL OR 
     UPPER(QUERY_SRC_CD) IN ('SOAP', 'RAWSQL')
    ) 
   AND QUERY_TEXT NOT LIKE '{CALL%'
ORDER BY 3 DESC;

You can, of course, do more than this, like analysing query volumes (ROW_COUNT column) and Subject Areas afflicted in order to understand any potential misuse of the platform!

Real Example

Let's see an example I'll try logging in via ODBC and executing a query. For this I'm using RazorSQL a SQL query tool and OBIEE, exactly the same logs can be found in Oracle Analytics Cloud (OAC) once the Usage Tracking is enabled so, administrators, don't afraid your job is not going to extinct right now.

Small note: Usage Tracking may be available only on non-Autonomous version of Oracle Analytics Cloud, since some parts of the setup need command line access and server configuration changes which may not available on the Autonomous version

Setup

First a bit of a setup: In order to connect to OAC all you need to do is to download OBIEE's Administration Tool, install it and create an ODBC connection. After this we can open RazorSQL and add create a connection.

How Are My Users Connecting? Analyzing OAC and OBIEE entry points

Then we need to specify our connection details, by selecting Add Connection Profile, specifying OTHER as Connection Profile, then selecting ODBC as Connection Type and filling in the remaining properties. Please note that:

  • Datasource Name: Select the ODBC connection entry created with the Admin tool drivers
  • Login/Password: Enter the OAC/OBIEE credentials

How Are My Users Connecting? Analyzing OAC and OBIEE entry points

Querying and Checking the Logs

Then it's time to connect. As expected we see in RazorSQL the list of Subject Areas as datapoints which depend on the security settings configured in Weblogic and RPD.

How Are My Users Connecting? Analyzing OAC and OBIEE entry points

The Login action is not visible from Usage Tracking S_NQ_ACCT table, it should be logged in the S_NQ_INITBLOCK if you have Init Blocks associated with the login. Let's start checking the data and see what's going to happen. First of all, let's explore which Tables and Columns are part of the Usage Tracking Subject Area, by clicking on the + Icon next to it.

How Are My Users Connecting? Analyzing OAC and OBIEE entry points

The various Dims and Facts are exposed as Tables by the ODBC driver, now let's see if this action is logged in the database with the query

SELECT USER_NAME, 
  QUERY_TEXT, 
  QUERY_SRC_CD, 
  START_TS, 
  END_TS, 
  ROW_COUNT 
FROM S_NQ_ACCT

How Are My Users Connecting? Analyzing OAC and OBIEE entry points

We can clearly see that even checking the columns within the Measures table is logged as ODBC call, with the column QUERY_SRC_CD as Null as expected.
Now let's try to fire a proper SQL, we need to remember that the SQL we are writing needs to be in the Logical SQL syntax. An example can be

select `Topic`.`Repository Name` from `Usage Tracking`

Which in RazorSQL returns the row

How Are My Users Connecting? Analyzing OAC and OBIEE entry points

And in the database is logged as

How Are My Users Connecting? Analyzing OAC and OBIEE entry points

We can see the user who run the query, the execution time (START_TS and END_TS) as well as the number of rows returned (ROW_COUNT).
We demonstrated that we now have all the info neccessary to start tracking any misuse of OAC/OBIEE as a datasource via ODBC connections.

Automating the Tracking

The easiest solution to properly track this type of OBIEE usage is to have an Agent that on daily basis reports users accessing OAC/OBIEE via ODBC. This solution is very easy to implement since all the Usage Tracking tables are already part of the Repository. Creating an Agent that reports on Usage Tracking rows having QUERY_SRC_CD field as Null, SOAP or rawSQL covers all the "non traditional" use-cases we have been talking about.

As mentioned above sourcing aggregated data from OAC/OBIEE should be considered a "good practice" since it provides the unique source of truth across the company. On the other side, exporting massive amount of data should be avoided since end-user performances will be slow and there will be an impact on OAC/OBIEE server. Thus setting an upper limit on the number of rows (e.g. ROW_COUNT > 100k) reported by the Agent could also mean identifying all the specific data-exports cases that should drive an impact assessment and a possible solution redesign.

Conclusion

Tools and Options in the Analytical Market are exploding and more and more we'll see companies using a number of different solutions for specific purposes. Centralised BI solutions, built over the years, provide the significant advantage of containing the unique source of truth across the company and should be preserved. Giving agility to Analysts and at the same time keeping the centrality of well defined and calculated KPIs is a challenge we'll face more and more often in the future.
OAC (or OBIEE on-premises) offers the duality of both Centralised and Self-Service Analytics methods together with a variety (webservices, ODBC, Smartview) of connecting methods which makes it the perfect cornerstone of a company analytical system.
Tracking down usage, discovering potential misuse of the platform is very easy so inefficiencies can be addressed quickly to provide adequate agility and performance to all analytical business cases!

Categories: BI & Warehousing

Exciting News for Unify

Rittman Mead Consulting - Tue, 2018-11-06 02:37
Announcement: Unify for Free

We are excited to announce we are going to make Unify available for free. To get started send an email to unify@rittmanmead.com, we will ask you to complete a short set of qualifying questions, then we can give you a demo, provide a product key and a link to download the latest version.

The free version of Unify will come with no support obligations or SLAs. On sign up, we will give you the option to join our Unify Slack channel, through which you can raise issues and ask for help.

If you’d like a supported version, we have built a special Expert Service Desk package for Unify which covers

  • Unify support, how to, bugs and fixes
  • Assistance with configuration issues for OBIEE or Tableau
  • Assistance with user/role issues within OBIEE
  • Ad-hoc support queries relating to OBIEE, Tableau and Unify

Beyond supporting Unify, the Expert Service Desk package can also be used to provide technical support and expert services for your entire BI and analytics platform, including:

  • An agreed number of hours per month for technical support of Oracle and Tableau's BI and DI tools
  • Advisory, strategic and roadmap planning for your platform
  • Use of any other Rittman Mead accelerators including support for our other Open Source tools and DevOps Developer Toolkits
  • Access to Rittman Mead’s On Demand Training
New Release: Unify 10.0.17

10.0.17 is the new version of Unify. This release doesn’t change how Unify looks and feels, but there are some new features and improvements under the hood.

The most important feature is that now you can get more data from OBIEE using fewer resources. While we are not encouraging you to download all your data from OBIEE to Tableau all time (please use filters, aggregation etc.), we realise that downloading the large datasets is sometimes required. With the new version, you can do it. Hundreds of thousands of rows can be retrieved without causing your Unify host to grind to a halt.

The second feature we would like to highlight is that now you can use OBIEE instances configured with self-signed SSL certificates. Self-signed certificates are often used for internal systems, and now Unify supports such configurations.

The final notable change is that you can now run Unify Server as a Windows service. It wasn't impossible to run Unify Server at system startup before, but it is even easier.

And, of course, we fixed some bugs and enhanced the logging. We would like to see our software function without bugs, but sometimes they just happen, and when they do, you will get a better explanation of what happened.

On most platforms, Unify Desktop should auto update, if it doesn’t, then please download manually.

Unify is 100% owned and maintained by Rittman Mead Consulting Ltd, and while this announcement makes it available for free, all copies must be used under an End User Licence Agreement (EULA) with Rittman Mead Consulting Ltd.

Categories: BI & Warehousing

Connect to DV Datasets and explore many more new features in OAC / OAAC 18.3.3.0

Tim Dexter - Wed, 2018-10-17 05:26

Greetings !

Oracle Analytics Cloud (OAC) and Oracle Autonomous Analytics Cloud (OAAC) version 18.3.3.0 (also known as V5) got released last month. A rich set of new features have been introduced in this release across different products (with product version 12.2.5.0.0) in the suite. You can check all the new features of OAC / OAAC in the video here.

The focus for BI Publisher on OAC / OAAC in this release has been to compliment Data Visualization for pixel perfect reporting, performance optimizations and adding self service abilities. Here is a list of new features added this release:

BI Publisher New Features in OAC V5.0

New Feature Description 1. DV Datasets

Now you can leverage a variety of data sources covered by Data Visualization data sets, including Cloud based data sources such as Amazon Redshift, Autonomous Data Warehouse Cloud; Big Data sources such as Spark, Impala, Hive; and Application data sources such as Salesforce, Oracle Applications etc. BI Publisher is here to compliment DV to create pixel perfect reports using DV datasets.

Check the documentation for additional details. Also, check this video to see how this feature works.

2. Upload Center

Now upload all files for custom configuration such as fonts, ICC Profile, Private Keys, Digital Signature etc.from the Upload Center as a self service feature available in the Administration page.

Additional details can be found in the documentation here.

3. Validate Data Model

Report Authors can now validate a data model before deploying the report in a production environment. This will help during a custom data model creation where data sets, LOVs and Bursting Queries can be validated against standard guidelines to avoid any undesired performance impact to the report server. 

Details available here.

4. Skip unused data sets

When a data model contains multiple data sets for different layouts, each layout might not use all the data sets defined in the data model. Now Report Authors can select data model property to skip the execution of the unused data sets in a layout. Setting this property reduces the data extraction time, memory usage and improves overall report performance.

Additional details can be found here.

5. Apply Digital Signature to PDF Documents

Digital Signature is widely used feature in on-prem deployments and now this has been added in OAC too, where in Digital Signature can be applied to a PDF output. Digital Signatures can be uploaded from the Upload Center, required signature can be selected under security center, and then applied to PDF outputs by configuring attributes under report properties or run-time properties. 

You can find the documentation here. Also check this video for a quick demonstration.

6. Password protect MS Office Outputs - DocX, PPTX, XLSX

Now protect your MS Office output files with a password defined at report or server level.

Check the PPTX output properties, DocX output properties, Excel 2007 output properties

7. Deliver reports in compressed format

You can select this option to compress the output by including the file in a zip file before delivery via email, FTP, etc.

Additional details can be found here.

8. Request read-receipt and delivery confirmation notification 

You can opt to get delivery and read-receipt notification for scheduled job delivery via email.

Check documentation for additional details. 

9. Add scalability mode for Excel Template to handle large data size

Now you can set up scalability mode for an excel template. This can be done at system level, report level or at template level. By setting this attribute to true, the engine will flush memory after a threshold value and when the data exceeds 65K rows it will rollover data into multiple sheets.

You can find the documentation here.

 

Stay tuned to hear more updates on features and functionalities ! Happy BIP'ing ...

 

Categories: BI & Warehousing

Fixing* Baseline Validation Tool** Using Network Sniffer

Rittman Mead Consulting - Wed, 2018-10-17 05:22

* Sort of
** Not exactly

In the past, Robin Moffatt wrote a number of blogs showing how to use various Linux tools for diagnosing OBIEE and getting insights into how it works (one, two, three, ...). Some time ago I faced a task which allowed me to continue Robin's cycle of posts and show you how to use Wireshark to understand how a certain Oracle tool works and how to search for the solution of a problem more effectively.

To be clear, this blog is not about the issue itself. I could simply write a tweet like "If you faced issue A then patch B solves it". The idea of this blog is to demonstrate how you can use somewhat unexpected tools and get things done.

Obviously, my way of doing things is not the only one. If you are good in searching at My Oracle Support, you possibly can do it even faster, but what is good about my way (except for it is mine, which is enough for me) is that it doesn't involve uneducated guessing. I do an observation and get a clarified answer.

Most of my blogs have disclaimers. This one is not an exception, while its disclaimer is rather small. There is still no silver bullet. This won't work for every single problem in OBIEE. I didn't say this.

Now, let's get started.

The Task

The problem was the following: a client was upgrading its OBIEE system from 11g to 12c and obviously wanted to test for regression, making sure that the upgraded system worked exactly the same as the old one. Manual comparison wasn't an option since they have hundreds or even thousands of analyses and dashboards, so Oracle Baseline Validation Tool (usually called just BVT) was the first candidate as a solution to automate the checks.

Using BVT is quite simple:

  • Create a baseline for the old system.
  • Upgrade
  • Create a new baseline
  • Compare them
  • ???
  • Profit! Congratulations. You are ready to go live.

Right? Well, almost. The problem that we faced was that BVT Dashboards plugin for 11g (a very old 11.1.1.7.something) gave exactly what was expected. But for 12c (12.2.1.something) we got all numbers with a decimal point even while all analyses had "no decimal point" format. So the first feeling we got at this point was that BVT doesn't work well for 12c and that was somewhat disappointing.

SPOILER That wasn't true.

I made a simple dashboard demonstrating the issue.

OBIEE 11g

11g-dash-vs-bvt
Measure values in the XML produced by BVT are exactly as on the dashboard. Looks good.

OBIEE 12c

12c-dash-vs-bvt-1
Dashboard looks good, but values in the XML have decimal digits.

failed

As you can see, the analyses are the same or at least they look very similar but the XMLs produced by BVT aren't. From regression point of view this dashboard must get "DASHBOARDS PASSED" result, but it got "DASHBOARDS DIFFERENT".

Reading the documentation gave us no clear explanation for this behaviour. We had to go deeper and understand what actually caused it. Is it BVT screwing up the data it gets from 12c? Well, that is a highly improbable theory. Decimals were not simply present in the result but they were correct. Correct as in "the same as stored in the database", we had to reject this theory.
Or maybe the problem is that BVT works differently with 11g and 12c? Well, this looks more plausible. A few years have passed since 11.1.1.7 was released and it would not be too surprising if the old version and the modern one had different APIs used by BVT and causing this problem. Or maybe the problem is that 12c itself ignores formatting settings. Let's find out.

The Tool

Neither BVT, nor OBIEE logs gave us any insights. From every point of view, everything was working fine. Except that we were getting 100% mismatch between the source and the target. My hypothesis was that BVT worked differently with OBIEE 11g and 12c. How can I check this? Decompiling the tool and reading its code would possibly give me the answer, but it is not legal. And even if it was legal, the latest BVT size is more than 160 megabytes which would give an insane amount of code to read, especially considering the fact I don't actually know what I'm looking for. Not an option. But BVT talks to OBIEE via the network, right? Therefore we can intercept the network traffic and read it. Shall we?

There are a lot of ways to do it. I work with OBIEE quite a lot and Windows is the obvious choice for my platform. And hence the obvious tool for me was Wireshark.

Wireshark is the world’s foremost and widely-used network protocol analyzer. It lets you see what’s happening on your network at a microscopic level and is the de facto (and often de jure) standard across many commercial and non-profit enterprises, government agencies, and educational institutions. Wireshark development thrives thanks to the volunteer contributions of networking experts around the globe and is the continuation of a project started by Gerald Combs in 1998.

What this "About" doesn't say is that Wireshark is open-source and free. Which is quite nice I think.

Installation Details

I'm not going to go into too many details about the installation process. It is quite simple and straightforward. Keep all the defaults unless you know what you are doing, reboot if asked and you are fine.

If you've never used Wireshark or analogues, the main question would be "Where to install it?". The answer is pretty simple - install it on your workstation, the same workstation where BVT is installed. We're going to intercept our own traffic, not someone else's.

A Bit of Wireshark

Before going to the task we want to solve let's spend some time familiarizing with Wireshark. Its starting screen shows all the network adapters I have on my machine. The one I'm using to connect to the OBIEE servers is "WiFi 2".

Screenshot-2018-10-09-13.50.44

I double-click it and immediately see a constant flow of network packets flying back and forth between my computer and local network machines and the Internet. It's a bit hard to see any particular server in this stream. And "a bit hard" is quite an understatement, to be honest, it is impossible.

wireshark

I need a filter. For example, I know that my OBIEE 12c instance IP is 192.168.1.226. So I add ip.addr==192.168.1.226 filter saying that I only want to see traffic to or from this machine. Nothing to see right now, but if I open the login page in a browser, for example, I can see traffic between my machine (192.168.1.25) and the server. It is much better now but still not perfect.

Screenshot-2018-10-09-14.08.52

If I add http to the filter like this http and ip.addr==192.168.1.226, I definitely can get a much more clear view.

For example, here I opened http://192.168.1.226:9502/analytics page just like any other user would do. There are quite a lot of requests and responses. The browser asked for /analytics URL, the server after a few redirects replied what the actual address for this URL is login.jsp page, then browser requested /bi-security-login/login.jsp page using GET method and got the with HTTP code 200. Code 200 shows that there were no issues with the request.

startpage

Let's try to log in.

login

The top window is a normal browser and the bottom one is Wireshark. Note that my credentials been sent via clear text and I think that is a very good argument in defence of using HTTPS everywhere.

That is a very basic use of Wireshark: start monitoring, do something, see what was captured. I barely scratched the surface of what Wireshark can do, but that is enough for my task.

Wireshark and BVT 12c

The idea is quite simple. I should start capturing my traffic then use BVT as usual and see how it works with 12c and then how it works with 11g. This should give me the answer I need.

Let's see how it works with 12c first. To make things more simple I created a catalogue folder with just one analysis placed on a dashboard.

bvt-dashboard-1

It's time to run BVT and see what happens.

Screenshot-2018-10-11-17.49.59

Here is the dataset I got from OBIEE 12c. I slightly edited and formatted it to make easier to read, but didn't change anything important.

dataset12--1

What did BVT do to get this result? What API did it use? Let's look at Wireshark.

Screenshot-2018-10-11-19.09.27

First three lines are the same as with a browser. I don't know why it is needed for BVT, but I don't mind. Then BVT gets WSDL from OBIEE (GET /analytics-ws/saw.dll/wsdl/v6/private). There are multiple pairs of similar query-response flying back and forth because WSDL is big enough and downloaded in chunks. A purely technical thing, nothing strange or important here.
But now we know what API BVT uses to get data from OBIEE. I don't think anyone is surprised that it is Web Services API. Let's take a look at Web Services calls.

First logon method from nQSessionService. It logs into OBIEE and starts a session.

Screenshot-2018-10-11-19.36.59

Next requests get catalogue items descriptions for objects in my /shared/BVT folder. We can see a set of calls to webCatalogServce methods. These calls are reading my web catalogue structure: all folders, subfolders, dashboard and analysis. Pretty simple, nothing really interesting or unexpected here.

ws01

Then we can see how BVT uses generateReportSQLResult from reportService to get logical SQL for the analysis.

Screenshot-2018-10-11-19.42.07

And gets analysis' logical SQL as the response.

Screenshot-2018-10-11-19.45.10

And the final step - BVT executes this SQL and gets the data. Unfortunately, it is hard to show the data on a screenshot, but the line starting with [truncated] is the XML I showed before.

Screenshot-2018-10-12-12.19.58

And that's all. That's is how BVT gets data from OBIEE.

I did the same for 11g and saw absolutely the same procedure.

Screenshot-2018-10-11-21.01.35

My initial theory that BVT may have been using different APIs for 11g and 12c was busted.

From my experiment, I found out that BVT used xmlViewService to actually get the data. And also I know now that it uses logical SQL for getting the data. Looking at the documentation I can see that xmlViewService has no options related to any formatting. It is a purely data-retrieval service. It can't preserve any formatting and supposed to give only the data. But hey, I've started with the statement "11g preserves formatting", how is that possible? Well, that was a simple coincidence. It doesn't.

In the beginning, I had very little understanding of what keywords to use on MoS to solve the issue. "BVT for 12c doesn't preserve formatting"? "BVT decimal part settings"? "BVT works differently for 11g and 12c"? Now I have something much better - "executeSQLQuery decimal". 30 seconds of searching and I know the answer.

mos-1

This was fixed in 11.1.1.9, but there is a patch for 11.1.1.7.some_of_them. The patch fixes an 11g issue which prevents BVT from getting decimal parts of numbers.

pass

As you may have noticed I had no chance of finding this using my initial problem description. Nether BVT, nor 12g or 11.1.1.7 were mentioned. This thread looks completely unrelated to the issue, I had zero chances to find it.

Conlusion

OBIEE is a complex software and solving issues is not always easy. Unfortunately, no single method is enough for solving all problems. Usually, log files will help you. But when something works but not the way you expect, log files can be useless. In my case BVT was working fine, 11g was working fine, 12c was working fine too. Nothing special to write to logs was happening. That is why sometimes you may need unexpected tools. Just like this. Thanks for reading!

Categories: BI & Warehousing

ODC Appreciation Day: Oracle Cloud PSM Cli

Rittman Mead Consulting - Thu, 2018-10-11 03:11
 Oracle Cloud PSM Cli

Oracle Developer Community (ODC) Appreciation Day (previously know as OTN Appreciation Day) is a day, started from an initiative of Tim Hall, where everyone can share their Thanks to the Oracle community by writing about a favourite product, an experience, a story related to Oracle technology.

 Oracle Cloud PSM Cli

Last year I wrote about OBIEE Time Hierarchies and how they are very useful to perform time comparison, shifts, and aggregations.

This year I want to write about Oracle Paas Service Manager (PSM) Client!
I've already written a blog post about it in detail, basically Oracle PSM allows Oracle cloud administrators to manage their instances via command line instead of forcing them to use the Web-UI.

 Oracle Cloud PSM Cli

PSM Cli allows you to create an Oracle Analytics Cloud instance by just calling

psm analytics create-service -c <CONFIG_FILE> -of <OUTPUT_FORMAT>

and passing a JSON <CONFIG_FILE> which can easily be downloaded after following the creation process in the Web-UI, a bit like the response file in on-premises OBIEE can be saved and customised for future reuse after the first UI installation. Examples of the PSM JSON payloads can be found here.

OAC Instances can also easily be started/stopped/restarted with the command

psm analytics start/stop/restart -s <INSTANCE_NAME>

And the status of each command tracked with

psm analytics operation-status -j <JOB_ID>

As mentioned in my previous post, PSM Cli opens also the doors for instance management automation which is a requirement for providing cost-effective fully isolated feature-related OAC instances useful when thinking about DevOps practices. The fact that PSM Cli is command line, means that it can be integrated in any automation tool like Jenkins and thus integrated in any DevOps flow being designed in any company.

So Thank you, Oracle, for enabling such automation with PSM Cli!

Follow the #ThanksODC hashtag on Twitter to check which post have been published on the same theme!

Categories: BI & Warehousing

OAC 18.3.3: New Features

Rittman Mead Consulting - Fri, 2018-09-21 07:58
 New Features

I believe there is a hidden strategy behind Oracle's product release schedule: every time I'm either on holidays or in a business trip full of appointments a new version of Oracle Analytics Cloud is published with a huge set of new features!

 New Features

OAC 18.3.3 went live last week and contains a big set of enhancements, some of which were already described at Kscope18 during the Sunday Symposium. New features are appearing in almost all the areas covered by OAC, from Data Preparation to the main Data Flows, new Visualization types, new security and configuration options and BIP and Essbase enhancements. Let's have a look at what's there!

Data Preparation

A recurring theme in Europe since last year is GDPR, the General Data Protection Regulation which aims at protecting data and privacy of all European citizens. This is very important in our landscape since we "play" with data on daily basis and we should be aware of what data we can use and how.
Luckily for us now OAC helps to address GDPR with the Data Preparation Recommendations step: every time a dataset is added, each column is profiled and a list of recommended transformations is suggested to the user. Please note that Data Preparation Recommendations is only suggesting changes to the dataset, thus can't be considered the global solution to GDPR compliance.
The suggestion may include:

  • Complete or partial obfuscation of the data: useful when dealing with security/user sensitive data
  • Data Enrichment based on the column data can include:
    • Demographical information based on names
    • Geographical information based on locations, zip codes

 New Features

Each of the suggestion applied to the dataset is stored in a data preparation script that can easily be reapplied if the data is updated.

 New Features

Data Flows

Data Flows is the "mini-ETL" component within OAC which allows transformations, joins, aggregations, filtering, binning, machine learning model training and storing the artifacts either locally or in a database or Essbase cube.
The dataflows however had some limitations, the first one was that they had to be run manually by the user. With OAC 18.3.3 now there is the option to schedule Data Flows more or less like we were used to when scheduling Agents back in OBIEE.

 New Features

Another limitation was related to the creation of a unique Data-set per Data Flow which has been solved with the introduction of the Branch node which allows a single Data Flow to produce multiple data-sets, very useful when the same set of source data and transformations needs to be used to produce various data-sets.

 New Features

Two other new features have been introduced to make data-flows more reusable: Parametrized Sources and Outputs and Incremental Processing.
The Parametrized Sources and Outputs allows to select the data-flow source or target during runtime, allowing, for example, to create a specific and different dataset for today's load.

 New Features

The Incremental Processing, as the name says, is a way to run Data Flows only on top of the data added since the last run (Incremental loads in ETL terms). In order to have a data flow working with incremental loads we need to:

  • Define in the source dataset which is the key column that can be used to indicate new data (e.g. CUSTOMER_KEY or ORDER_DATE) since the last run
  • When including the dataset in a Data Flow enable the execution of the Data Flow with only the new data
  • In the target dataset define if the Incremental Processing replaces existing data or appends data.

Please note that the Incremental Load is available only when using Database Sources.

Another important improvement is the Function Shipping when Data Flows are used with Big Data Cloud: If the source datasets are coming from BDC and the results are stored in BDC, all the transformations like joining, adding calculation columns and filtering are shipped to BDC as well, meaning there is no additional load happening on OAC for the Data Flow.

Lastly there is a new Properties Inspector feature in Data Flow allowing to check the properties like name and description as well as accessing and modifying the scheduling of the related flow.

 New Features

Data Replication

Now is possible to use OAC to replicate data from a source system like Oracle's Fusion Apps, Talend or Eloqua directly into Big Data Cloud, Database Cloud or Data Warehouse Cloud. This function is extremely useful since allows decoupling the queries generated by the analytical tools from the source systems.
As expected the user can select which objects to replicate, the filters to apply, the destination tables and columns, and the load type between Full or Incremental.

Project Creation

New visualization capabilities have been added which include:

  • Grid HeatMap
  • Correlation Matrix
  • Discrete Shapes
  • 100% Stacked Bars and Area Charts

In the Map views, Multiple Map Layers can now be added as well as Density and Metric based HeatMaps, all on top of new background maps including Baidu and Google.

 New Features

Tooltips are now supported in all visualizations, allowing the end user to add measure columns which will be shown when over a section of any graph.

 New Features

The Explain feature is now available on metrics and not only on attributes and has been enhanced: a new anomaly detection algorithm identifies anomalies in combinations of columns working in the background in asynchronous mode, allowing the anomalies to be pushed as soon as they are found.

A new feature that many developers will appreciate is the AutoSave: we are all used to autosave when using google docs, the same applies to OAC, a project is saved automatically at every change. Of course this feature can be turn off if necessary.
Another very interesting addition is the Copy Data to Clipboard: with a right click on any graph, an option to save the underline data to clipboard is available. The data can then natively be pasted in Excel.

Did you create a new dataset and you want to repoint your existing project to it? Now with Dataset replacement it's just few clicks away: you need only to select the new dataset and re-map all the columns used in your current project!

 New Features

Data Management

The datasets/dataflows/project methodology is typical of what Gartner defined as Mode 2 analytics: analysis done by a business user whitout any involvement from the IT. The step sometimes missing or hard to be performed in self-service tools is the publishing: once a certain dataset is consistent and ready to be shared, it's rather difficult to open it to a larger audience within the same toolset.
New OAC administrative options have been addressing this problem: a dataset Certification by an administrator allows a certain dataset to be queried via Ask and DayByDay by other users. There is also a dataset Permissions tab allowing the definition of Full Control, Edit or Read Only access at user or role level. This is the way of bringing the self service dataset back to corporate visibility.

 New Features

A Search tab allows a fine control over the indexing of a certain dataset used by Ask and DayByDay. There are now options to select when then indexing is executed as well as which columns to index and how (by column name and value or by column name only).

 New Features

BIP and Essbase

BI Publisher was added to OAC in the previous version, now includes new features like a tighter integration with the datasets which can be used as datasources or features like email delivery read receipt notification and compressed output and password protection that were already available on the on-premises version.
There is also a new set of features for Essbase including new UI, REST APIs, and, very important security wise, all the external communications (like Smartview) are now over HTTPS.
For a detailed list of new features check this link

Conclusion

OAC 18.3.3 includes an incredible amount of new features which enable the whole analytics story: from self-service data discovery to corporate dashboarding and pixel-perfect formatting, all within the same tool and shared security settings. Options like the parametrized and incremental Data Flows allows content reusability and enhance the overall platform performances reducing the load on source systems.
If you are looking into OAC and want to know more don't hesitate to contact us

Categories: BI & Warehousing

Looker for OBIEE Experts: Introduction and Concepts

Rittman Mead Consulting - Thu, 2018-08-23 08:28
 Introduction and Concepts

Recently I've been doing some personal study around various areas including streaming, machine learning and data visualization and one of the tools that got my attention is Looker. I've initially heard about Looker from a Drill to Detail podcast and increasingly been hearing about it in conferences and use cases together with other cloud solutions like BigQuery, Snowflake and Fivetran.

I decided to give it a try myself and, since most of my career was based on Oracle Business Intelligence (OBI) writing down a comparison between the tools that could help others sharing my experience getting introduced to Looker.

OBIEE's Golden Feature: The Semantic Model

As you probably know if you have been working with OBIEE for some time the centrepiece of its architecture is the Semantic Model contained in the Repository (RPD)

 Introduction and Concepts

In the three layers of the RPD, we model our source data (e.g. database tables) into attributes, metrics, hierarchies which can then be easily dragged and dropped by the end-user in the analysis or data visualization.

I called the RPD "OBIEE's Golden Feature" because to me it's the main benefit of the platform: abstracting the data complexity from end-users and, at the same time, optimizing the query definition to take care of all the features that could be set in the datasource. The importance of the RPD is also its centrality: within the traditional OBIEE all Analysis and Dashboard had to be based on Subject Areas exposed by the RPD meaning that the definition of the metrics was done in a unique place in a consistent manner and then spread across all the reporting providing the unique source of truth for the important KPIs in the company typical of what Gartner calls the Mode 1 Analytics.

RPD Development Speed Limitation and Mode 2 Analytics

The RPD is a centralized binary object within the OBIEE infrastructure: in order to develop and test a full OBIEE instance is required, and the merges between different streams are natively performed via the RPD's admin tool.

This complexity unified to the deep knowledge required to correctly build a valid semantic model limits the number of people being able to create and publish new content thus slowing down the process from data to insights typical of the centralized Mode 1 Analytic platform provided centrally by IT teams. Moreover, RPD development is entirely point-and-click within the admintool which is somehow considered slow and old fashion in a world of scripting, code versioning and git merging. Several solutions are out in the market (including Rittman Mead Developer Toolkit) to enhance the agility of the development but still, the skills and the toolset required to develop new content makes it a purely IT manageable solution.

In order to overcome this limitation several tools like Tableau, QlikView or Oracle's Data Visualization (included in OAC or in the Desktop version) give all the power in the ends of the end-user: from data-sources to graphing, the tools allow an end-to-end data discovery to visualization journey. The problem with those tools (called Mode 2 Analytics by Gartner) is that there is no central definition of the KPI since it's demanded to every analyst. All those tools are addressing the problem by providing some sort of datasource certification allowing a datasource to be visible and reusable publicly only when it's validated centrally. Again, for most of those tools, the modelling is done in a visual format, which makes it difficult to debug, version control and automate. I've been speaking about this subject in my presentation "DevOps and OBIEE do it before it's too late".

What if we could provide the same centralized source of truth data modelling with an easily scriptable syntax that can be developed from business users without any deep knowledge of SQL or source tables? Well, what we just described is LookML!

LookML

LookerML takes the best part of OBIEE: the idea of a modelling layer and democratizes it in order to be available to all business user with a simple language and set of concepts. Moreover, the code versioning is embedded in the tool, so there's no need to teach git branch, commit, push or pull to non-IT people.

So, what are the concepts behing LookerML and how can you get familiar with it when comparing it to the medatada modelling in the RPD?

LookML Concepts

Let's start from the basic of the RPD modelling: a database table. In LookerML each table is represented by an object called View (naming is a bit confusing). Moreover, LookerML's Views can be used not only to map existing database tables but also to create new tables based on existing content and a SQL definition, like the opaque views in OBIEE. On top of this LookML allows the phisicalization of those objects (into a table) and the definition of a schedule for the refresh. This concept is very useful when aggregates are needed, the aggregate definition (SQL) is defined within the LookML View together with the related refresh schedule.

 Introduction and Concepts

The View itself defines only the source, a bit like the RPD's physical layer, the next step is defining how multiple Views interact within each other, or, in OBIEE terms, the Business Layer. In LookML there is an entity called Explores and is the place where we can define which Views we want to group together, and what's the linkage between them. Multiple Explores are defined in a Model, which should be unique per database. So, in OBIEE words, a Model can be compared to a Business Model with Explores being a subset of Facts and Dimensions grouped in a Subject Area.

 Introduction and Concepts

Ok, all "easy" so far, but where do we map the columns? and where do we set the aggregations? As you might expect both are mapped within a LookML View into Fields. Fields is a generic term which includes in both metrics and attributes, LookML naming is the below:

  • Dimension: in OBIEE's terms attributes of a dimension. The terminology is confusing since in LookML the Dimension is the column itself while in OBIEE terms is the table. A Dimension can be a column value or a combination of multiple values (like OBIEE's BM Logical Sources formulas). A Dimension in LookML can't have any aggregation (as in OBIEE).
  • Measures: in OBIEE's terms a metric. The definition includes, the source formula in SQL syntax, the type of aggregation (min/max/count...) and the drill fields.
    Filters: this is not something usually defined in OBIEE's RPD, filters are a way of passing a user choice based on a column value back to an RPD calculation formula, a bit like, for the OBIEE experts, overriding session variables with dashboard prompt values.
  • Parameters: again this is not something usually defined in OBIEE's RPD, you can think a Parameter as a way of setting up variables function. E.g. a Parameter with values SUM, AVG, MIN, MAX could be used to change how a certain Measure is aggregated

All good so far? Stick with me and in the future we'll explore more about LookML syntax and Looker in general!

Categories: BI & Warehousing

Parsing Badly Formatted JSON in Oracle DB with APEX_JSON

Rittman Mead Consulting - Mon, 2018-08-20 05:38
Parsing Badly Formatted JSON in Oracle DB with APEX_JSON

After some blogging silence due to project work and holidays, I thought it was a good idea to do a write-up about a problem I faced this week. One of the tasks I was assigned was to parse a set of JSON files stored in an Oracle 12.1 DB Table.

As probably all of you already know JSON (JavaScript Object Notation) is a lightweight data-interchange format and is the format used widely when talking of web-services due to its flexibility. In JSON there is no header to define (think CSV as example), every field is defined in a format like "field name":"field value", there is no "set of required columns" for a JSON object, when a new attribute needs to be defined, the related name and value can be added to the structure. On top of this "schema-free" definition, the field value can either be

  • a single value
  • an array
  • a nested JSON object

Basically, when you start parsing JSON you feel like

Parsing Badly Formatted JSON in Oracle DB with APEX_JSON

The Easy Part

The task assigned wasn't too difficult, after reading the proper documentation, I was able to parse a JSON File like

{
 "field1": "abc",
 "field2": "cde"
}

Using a simple SQL like

select * 
 from TBL_NAME d,
 JSON_TABLE(d.text, '$' COLUMNS (
   field1 VARCHAR2(10) PATH '$.field1',
   field2 VARCHAR2(10) PATH '$.field2'
   )
 )

Parsing arrays is not very complex either, a JSON file like

{
 "field1": "abc",
 "field2": "cde",
 "field3": ["fgh","ilm","nop"]
}

Can be easily parsed using the NESTED PATH call

select * 
 from TBL_NAME d, 
 JSON_TABLE(d.text, '$' COLUMNS (
   field1 VARCHAR2(10) PATH '$.field1',
   field2 VARCHAR2(10) PATH '$.field2',
   NESTED PATH '$.field3[*]' COLUMNS (
     field3 VARCHAR2(10) PATH '$'
   )
 )
)

In case the Array contains nested objects, those can be parsed using the same syntax as before, for example, field4 and field5 of the following JSON

{
 "field1": "abc",
 "field2": "cde",
 "field3": [
           {
            "field4":"fgh",
            "field5":"ilm"
           },
           {
            "field4":"nop",
            "field5":"qrs"
           }
           ] 
}

can be parsed with

NESTED PATH '$.field3[*]' COLUMNS ( 
   field4 VARCHAR2(10) PATH '$.field4',
   field5 VARCHAR2(10) PATH '$.field5' 
)
...Where things got complicated

All very very easy with well-formatted JSON files, but then I faced the following

{ 
"field1": "abc", 
"field2": "cde", 
"field3": [
    {
     "field4": "aaaa", 
     "field5":{ 
           "1234":"8881", 
           "5678":"8893" 
          }
     },
     {
      "field4": "bbbb",  
      "field5":{ 
            "9876":"8881", 
            "7654":"8945",
            "4356":"7777"
          }
      } 
      ] 
}

Basically the JSON file started including fields with names representing the Ids meaning an association like Product Id (1234) is member of Brand Id (8881). This immediately triggered my reaction:

Parsing Badly Formatted JSON in Oracle DB with APEX_JSON

After checking the documentation again, I wasn't able to find anything that could help me parsing that, since all the calls were including a predefined PATH string, that in the case of Ids I couldn't know beforehand.

I then reached out to my network on Twitter

To all my @Oracle SQL friends out there: I need to parse a JSON object which has a strange format of {“name”:”abc”, “345678”:”123456”} with the 345678 being an Id I need to extract, any suggestions? none of the ones mentioned here seems to help https://t.co/DRWdGvCVfu pic.twitter.com/PfhtUnAeR4

— Francesco Tisiot (@FTisiot) 14 agosto 2018

That generated quite a lot of responses. Initially, the discussion was related to the correctness of the JSON structure, that, from a purist point of view should be mapped as

{ 
"field1": "abc",
"field2": "cde",
"field3": [ 
     {
      "field4": "aaaa", 
      "field5":
           { 
             "association": [
                  {"productId":"1234", "brandId":"8881"},
                  {"productId":"5678", "brandId":"8893"}
                  ]
           },
      },
      {
       "field4": "bbbb", 
       "field5":    
           {
             "association": [
                  {"productId":"9876", "brandId":"8881"},
                  {"productId":"7654", "brandId":"8945"},
                  {"productId":"4356", "brandId":"7777"}
                  ]
           }
      } 
      ]
}

basically going back to standard field names like productId and brandId that could be easily parsed. In my case this wasn't possible since the JSON format was aready widely used at the client.

Possible Solutions

Since a change in the JSON format wasn't possible, I needed to find a way of parsing it, few solutions were mentioned in the twitter thread:

  • Regular Expressions
  • Bash external table preprocessor
  • Java Stored functions
  • External parsing before storing data into the database

All the above were somehow discarded since I wanted to try achieving a solution based only on existing database functions. Other suggestion included JSON_DATAGUIDE and JSON_OBJECT.GET_KEYS that unfortunately are available only from 12.2 (I was on 12.1).

But, just a second before surrendering, Alan Arentsen suggested using APEX_JSON.PARSE procedure!

The Chosen One: APEX_JSON

The APEX_JSON package offers a series of procedures to parse JSON in a PL/SQL package, in particular:

  • PARSE: Parses a JSON formatted string contained in a VARCHAR2 or CLOB storing all the members.
  • GET_COUNT: Returns the number of array elements or object members
  • GET_MEMBERS: Returns the table of members of an object

You can already imagine how a combination of those calls can parse the JSON text defined above, let's have a look at the JSON again:

{ 
"field1": "abc", 
"field2": "cde", 
"field3": [
    {
     "field4": "aaaa", 
     "field5":{ 
           "1234":"8881", 
           "5678":"8893" 
          }
     },
     {
      "field4": "bbbb",  
      "field5":{ 
            "9876":"8881", 
            "7654":"8945",
            "4356":"7777"
          }
      } 
      ] 
}

The parsing process should iterate over the field3 entries (2 in this case), and for each entry, then iterate over the fields in field5 to get both the field name as well as the field value.
The number of field3 entries can be found with

APEX_JSON.GET_COUNT(p_path=>'field3',p_values=>j);

And the list of members of field5 with

APEX_JSON.GET_MEMBERS(p_path=>'field3[%d].field5',p_values=>j,p0=>i);

Note the p_path parameter set to field3[%d].field5 meaning that we want to extract the field5 from the nth row in field3. The rownumber is defined by p0=>i with i being the variable we use in our FOR loop.

The complete code is the following

DECLARE 
   j APEX_JSON.t_values; 
   r_count number;
   field5members   WWV_FLOW_T_VARCHAR2;
   p0 number;
   BrandId VARCHAR2(10);
BEGIN
APEX_JSON.parse(j,'<INSERT_JSON_STRING>');
# Getting number of field3 elements
r_count := APEX_JSON.GET_COUNT(p_path=>'field3',p_values=>j);
dbms_output.put_line('Nr Records: ' || r_count);

# Looping for each element in field3
FOR i IN 1 .. r_count LOOP
# Getting field5 members for the ith member of field3
 field5members := APEX_JSON.GET_MEMBERS(p_path=>'field3[%d].field5',p_values=>j,p0=>i);
# Looping all field5 members
 FOR q in 1 .. field5members.COUNT LOOP
# Extracting BrandId
   BrandId := APEX_JSON.GET_VARCHAR2(p_path=>'field3[%d].field5.'||field5members(q) ,p_values=>j,p0=>i);
# Printing BrandId and Product Id
   dbms_output.put_line('Product Id ="'||field5members(q)||'" BrandId="'||BrandId ||'"');
 END LOOP;
END LOOP;
   
END;

Note that, in order to extract the BrandId we used

APEX_JSON.GET_VARCHAR2(p_path=>'field3[%d].field5.'||field5members(q) ,p_values=>j,p0=>i);

Specifically the PATH is field3[%d].field5.'||field5members(q). As you can imagine we are appending the member name (field5members(q)) to the path described previously to extract the value, forming a string like field3[1].field5.1234 that will correctly extract the value associated.

Conclusion

Three things to save from this experience. The first is the usage of JSON_TABLE: with JSON_TABLE you can parse well-constructed JSON documents and it's very easy and powerful.
The second: APEX_JSON useful package to parse "not very well" constructed JSON documents, allows iteration across elements of JSON arrays and object members.
The last, which is becoming every day more relevant in my career, is the importance of networking and knowledge sharing: blogging, speaking at conferences, helping others in various channels allows you to know other people and be known with the nice side effect of sometimes being able with a single tweet to get help solving problems you may face!

Categories: BI & Warehousing

Making our way into Dremio

Rittman Mead Consulting - Wed, 2018-07-25 04:07

In an analytics system, we typically have an Operational Data Store (ODS) or staging layer; a performance layer or some data marts; and on top, there would be an exploration or reporting tool such as Tableau or Oracle's OBIEE. This architecture can lead to latency in decision making, creating a gap between analysis and action. Data preparation tools like Dremio can address this.

Dremio is a Data-as-a-Service platform allowing users to quickly query data, directly from the source or in any layer, regardless of its size or structure. The product makes use of Apache Arrow, allowing it to virtualise data through an in-memory layer, creating what is called a Data Reflection.

The intent of this post is an introduction to Dremio; it provides a step by step guide on how to query data from Amazon's S3 platform.

I wrote this post using my MacBook Pro, Dremio is supported on MacOS. To install it, I needed to make some configuration changes due to the Java version. The latest version of Dremio uses Java 1.8. If you have a more recent Java version installed, you’ll need to make some adjustments to the Dremio configuration files.

Lets start downloading Dremio and installing it. Dremio can be found for multiple platforms and we can download it from here.

Dremio uses Java 1.8, so if you have an early version please make sure you install java 1.8 and edit /Applications/Dremio.app/Contents/Java/dremio/conf/dremio-env to point to the directory where java 1.8 home is located.

After that you should be able to start Dremio as any other MacOs application and access http://localhost:9047

Image Description

Configuring S3 Source

Dremio can connect to relational databases (both commercial and open source), NoSQL, Hadoop, cloud storage, ElasticSearch, among others. However the scope of this post is to use a well known NoSQL storage S3 bucket (more details can be found here) and show the query capabilities of Dremio against unstructured data.

For this demo we're using Garmin CSV activity data that can be easily downloaded from Garmin activity page.

Here and example of a CSV Garmin activity. If you don't have a Garmin account you can always replicate the data above.

act,runner,Split,Time,Moving Time,Distance,Elevation Gain,Elev Loss,Avg Pace,Avg Moving Paces,Best Pace,Avg Run Cadence,Max Run Cadence,Avg Stride Length,Avg HR,Max HR,Avg Temperature,Calories
1,NMG,1,00:06:08.258,00:06:06.00,1,36,--,0:06:08  ,0:06:06  ,0:04:13  ,175.390625,193.0,92.89507499768523,--,--,--,65
1,NMG,2,00:10:26.907,00:10:09.00,1,129,--,0:10:26  ,0:10:08  ,0:06:02  ,150.140625,236.0,63.74555754497759,--,--,--,55

For user information data we have used the following dataset

runner,dob,name
JM,01-01-1900,Jon Mead
NMG,01-01-1900,Nelio Guimaraes

Add your S3 credentials to access

After configuring your S3 account all buckets associated to it, will be prompted under the new source area.

For this post I’ve created two buckets : nmgbuckettest and nmgdremiouser containing data that could be interpreted as a data mart

Image Description

nmgbuckettest - contains Garmin activity data that could be seen as a fact table in CSV format :

Act,Runner,Split,Time,Moving Time,Distance,Elevation Gain,Elev Loss,Avg Pace,Avg Moving Paces,Best Pace,Avg Run Cadence,Max Run Cadence,Avg Stride Length,Avg HR,Max HR,Avg Temperature,Calories

nmgdremiouser - contains user data that could be seen as a user dimension in a CSV format:

runner,dob,name

Creating datasets

After we add the S3 buckets we need to set up the CSV format. Dremio makes most of the work for us, however we had the need to adjust some fields, for example date formats or map a field as an integer.

By clicking on the gear icon we access the following a configuration panel where we can set the following options. Our CSV's were pretty clean so I've just change the line delimiter for \n and checked the option Extract Field Name

Lets do the same for the second set of CSV's (nmgdremiouser bucket)

Click in saving will drive us to a new panel where we can start performing some queries.

However as mentioned before at this stage we might want to adjust some fields. Right here I'll adapt the dob field from the nmgdremiouser bucket to be in the dd-mm-yyyy format.

Apply the changes and save the new dataset under the desire space.

Feel free to do the same for the nmgbuckettest CSV's. As part of my plan to make I'll call D_USER for the dataset coming from nmgdremiouser bucket and F_ACTIVITY for data coming from nmgbuckettest

Querying datasets

Now that we have D_USER and F_ACTIVITY datasets created we can start querying them and do some analysis.

This first analysis will tell us which runner climbs more during his activities:

SELECT round(nested_0.avg_elev_gain) AS avg_elev_gain, round(nested_0.max_elev_gain) AS max_elev_gain, round(nested_0.sum_elev_gain) as sum_elev_gain, join_D_USER.name AS name
FROM (
  SELECT avg_elev_gain, max_elev_gain, sum_elev_gain, runner
  FROM (
    SELECT AVG(to_number("Elevation Gain",'###')) as avg_elev_gain,
    MAX(to_number("Elevation Gain",'###')) as max_elev_gain,
    SUM(to_number("Elevation Gain",'###')) as sum_elev_gain,
    runner
    FROM dremioblogpost.F_ACTIVITY
    where "Elevation Gain" != '--'
    group by runner
  ) nested_0
) nested_0
 INNER JOIN dremioblogpost.D_USER AS join_D_USER ON nested_0.runner = join_D_USER.runner    

To enrich the example lets understand who is the fastest runner with analysis based on the total climbing

 SELECT round(nested_0.km_per_hour) AS avg_speed_km_per_hour, nested_0.total_climbing AS total_climbing_in_meters, join_D_USER.name AS name
FROM ( 
  SELECT km_per_hour, total_climbing, runner
  FROM (
    select avg(cast(3600.0/((cast(substr("Avg Moving Paces",3,2) as integer)*60)+cast(substr("Avg Moving Paces",6,2) as integer)) as float)) as km_per_hour,
        sum(cast("Elevation Gain" as integer)) total_climbing,
        runner
        from dremioblogpost.F_ACTIVITY
        where "Avg Moving Paces" != '--'
        and "Elevation Gain" != '--'
        group by runner
  ) nested_0
) nested_0
 INNER JOIN dremioblogpost.D_USER AS join_D_USER ON nested_0.runner = join_D_USER.runner

Conclusions

Dremio is an interesting tool capable of unifying existing repositories of unstructured data. Is Dremio capable of working with any volume of data and complex relationships? Well, I believe that right now the tool isn't capable of this, even with the simple and small data sets used in this example the performance was not great.

Dremio does successfully provide self service access to most platforms meaning that users don't have to move data around before being able to perform any analysis. This is probably the most exciting part of Dremio. It might well be in the paradigm of a "good enough" way to access data across multiple sources. This will allow data scientists to do analysis before the data is formally structured.

Categories: BI & Warehousing

Making our way into Dremio

Rittman Mead Consulting - Wed, 2018-07-25 04:07

In an analytics system, we typically have an Operational Data Store (ODS) or staging layer; a performance layer or some data marts; and on top, there would be an exploration or reporting tool such as Tableau or Oracle's OBIEE. This architecture can lead to latency in decision making, creating a gap between analysis and action. Data preparation tools like Dremio can address this.

Dremio is a Data-as-a-Service platform allowing users to quickly query data, directly from the source or in any layer, regardless of its size or structure. The product makes use of Apache Arrow, allowing it to virtualise data through an in-memory layer, creating what is called a Data Reflection.

The intent of this post is an introduction to Dremio; it provides a step by step guide on how to query data from Amazon's S3 platform.

I wrote this post using my MacBook Pro, Dremio is supported on MacOS. To install it, I needed to make some configuration changes due to the Java version. The latest version of Dremio uses Java 1.8. If you have a more recent Java version installed, you’ll need to make some adjustments to the Dremio configuration files.

Lets start downloading Dremio and installing it. Dremio can be found for multiple platforms and we can download it from here.

Dremio uses Java 1.8, so if you have an early version please make sure you install java 1.8 and edit /Applications/Dremio.app/Contents/Java/dremio/conf/dremio-env to point to the directory where java 1.8 home is located.

After that you should be able to start Dremio as any other MacOs application and access http://localhost:9047

Image Description

Configuring S3 Source

Dremio can connect to relational databases (both commercial and open source), NoSQL, Hadoop, cloud storage, ElasticSearch, among others. However the scope of this post is to use a well known NoSQL storage S3 bucket (more details can be found here) and show the query capabilities of Dremio against unstructured data.

For this demo we're using Garmin CSV activity data that can be easily downloaded from Garmin activity page.

Here and example of a CSV Garmin activity. If you don't have a Garmin account you can always replicate the data above.

act,runner,Split,Time,Moving Time,Distance,Elevation Gain,Elev Loss,Avg Pace,Avg Moving Paces,Best Pace,Avg Run Cadence,Max Run Cadence,Avg Stride Length,Avg HR,Max HR,Avg Temperature,Calories
1,NMG,1,00:06:08.258,00:06:06.00,1,36,--,0:06:08  ,0:06:06  ,0:04:13  ,175.390625,193.0,92.89507499768523,--,--,--,65
1,NMG,2,00:10:26.907,00:10:09.00,1,129,--,0:10:26  ,0:10:08  ,0:06:02  ,150.140625,236.0,63.74555754497759,--,--,--,55

For user information data we have used the following dataset

runner,dob,name
JM,01-01-1900,Jon Mead
NMG,01-01-1900,Nelio Guimaraes

Add your S3 credentials to access

After configuring your S3 account all buckets associated to it, will be prompted under the new source area.

For this post I’ve created two buckets : nmgbuckettest and nmgdremiouser containing data that could be interpreted as a data mart

Image Description

nmgbuckettest - contains Garmin activity data that could be seen as a fact table in CSV format :

Act,Runner,Split,Time,Moving Time,Distance,Elevation Gain,Elev Loss,Avg Pace,Avg Moving Paces,Best Pace,Avg Run Cadence,Max Run Cadence,Avg Stride Length,Avg HR,Max HR,Avg Temperature,Calories

nmgdremiouser - contains user data that could be seen as a user dimension in a CSV format:

runner,dob,name

Creating datasets

After we add the S3 buckets we need to set up the CSV format. Dremio makes most of the work for us, however we had the need to adjust some fields, for example date formats or map a field as an integer.

By clicking on the gear icon we access the following a configuration panel where we can set the following options. Our CSV's were pretty clean so I've just change the line delimiter for \n and checked the option Extract Field Name

Lets do the same for the second set of CSV's (nmgdremiouser bucket)

Click in saving will drive us to a new panel where we can start performing some queries.

However as mentioned before at this stage we might want to adjust some fields. Right here I'll adapt the dob field from the nmgdremiouser bucket to be in the dd-mm-yyyy format.

Apply the changes and save the new dataset under the desire space.

Feel free to do the same for the nmgbuckettest CSV's. As part of my plan to make I'll call D_USER for the dataset coming from nmgdremiouser bucket and F_ACTIVITY for data coming from nmgbuckettest

Querying datasets

Now that we have DUSER and FACTIVITY datasets created we can start querying them and do some analysis.

This first analysis will tell us which runner climbs more during his activities:

SELECT round(nested_0.avg_elev_gain) AS avg_elev_gain, round(nested_0.max_elev_gain) AS max_elev_gain, round(nested_0.sum_elev_gain) as sum_elev_gain, join_D_USER.name AS name
FROM (
  SELECT avg_elev_gain, max_elev_gain, sum_elev_gain, runner
  FROM (
    SELECT AVG(to_number("Elevation Gain",'###')) as avg_elev_gain,
    MAX(to_number("Elevation Gain",'###')) as max_elev_gain,
    SUM(to_number("Elevation Gain",'###')) as sum_elev_gain,
    runner
    FROM dremioblogpost.F_ACTIVITY
    where "Elevation Gain" != '--'
    group by runner
  ) nested_0
) nested_0
 INNER JOIN dremioblogpost.D_USER AS join_D_USER ON nested_0.runner = join_D_USER.runner    

To enrich the example lets understand who is the fastest runner with analysis based on the total climbing

 SELECT round(nested_0.km_per_hour) AS avg_speed_km_per_hour, nested_0.total_climbing AS total_climbing_in_meters, join_D_USER.name AS name
FROM ( 
  SELECT km_per_hour, total_climbing, runner
  FROM (
    select avg(cast(3600.0/((cast(substr("Avg Moving Paces",3,2) as integer)*60)+cast(substr("Avg Moving Paces",6,2) as integer)) as float)) as km_per_hour,
        sum(cast("Elevation Gain" as integer)) total_climbing,
        runner
        from dremioblogpost.F_ACTIVITY
        where "Avg Moving Paces" != '--'
        and "Elevation Gain" != '--'
        group by runner
  ) nested_0
) nested_0
 INNER JOIN dremioblogpost.D_USER AS join_D_USER ON nested_0.runner = join_D_USER.runner

Conclusions

Dremio is an interesting tool capable of unifying existing repositories of unstructured data. Is Dremio capable of working with any volume of data and complex relationships? Well, I believe that right now the tool isn't capable of this, even with the simple and small data sets used in this example the performance was not great.

Dremio does successfully provide self service access to most platforms meaning that users don't have to move data around before being able to perform any analysis. This is probably the most exciting part of Dremio. It might well be in the paradigm of a "good enough" way to access data across multiple sources. This will allow data scientists to do analysis before the data is formally structured.

Categories: BI & Warehousing

Oracle Analytics Cloud Workshop FAQ

Rittman Mead Consulting - Mon, 2018-07-23 07:59

A few weeks ago, I had the opportunity to present the Rittman Mead Oracle Analytics Cloud workshop in Oracle's head office in London. The aim of the workshop was to educate potential OAC customers and give them the tools and knowledge to decide whether or not OAC was the right solution for them. We had a great cross section of multiple industries (although telecoms were over represented!) and OBIEE familiarity. Together we came up with a series of questions that needed to be answered to help in the decision making process. In the coming workshops we will add more FAQ style posts to the blog to help flesh out the features of the product.

If you are interested in coming along to one of the workshops to get some hands on time with OAC, send an email to training@rittmanmead.com and we can give you the details.

Do Oracle provide a feature comparison list between OBIEE on premise and OAC?

Oracle do not provide a feature comparison between on-premise and OAC. However, Rittman Mead have done an initial comparison between OAC and traditional on-premise OBIEE 12c installations:

High Level
  • Enterprise Analytics is identical to 12c Analytics
  • Only two Actions available in OAC: Navigate to BI content, Navigate to Web page
  • BI Publisher is identical in 12c and OAC
  • Data Visualiser has additional features and a slightly different UI in OAC compared to 12c
BI Developer Client Tool for OAC
  • Looks exactly the same as the OBIEE client
  • Available only for Windows, straightforward installation
  • OAC IP address and BI Server port must be provided to create an ODBC data source
  • Allows to open and edit online the OAC model
  • Allows offline development. Snapshots interface used to upload it to OAC (it will completely replace existing model)
Data Modeler
  • Alternative tool to create and manage metadata models
  • Very easy to use, but limited compared to the BI Developer Client.
Catalog
  • It's possible to archive/unarchive catalog folders from on-premise to OAC.
BAR file
  • It's possible to create OAC bar files
  • It's possible to migrate OAC bar files to OBIEE 12c
Can you ever be charged by network usage, for example connection to an on premise data source using RDC?

Oracle will not charge you for network usage as things stand. Your charges come from the following:

  • Which version of OAC you have (Standard, Data Lake or Enterprise)
  • Whether you are using Pay-as-you-go or Monthly Commitments
  • The amount of disk space you have specified during provisioning
  • The combination of OCPU and RAM currently in use (size).
  • The up-time of your environment.

So for example an environment that has 1 OCPU with 7.5 GB RAM will cost less than an environment with 24 OCPUs with 180 GB RAM if they are up for the same amount of time, everything else being equal. This being said, there is an additional charge to the analytics license as a cloud database is required to configure and launch an analytics instance which should be taken into consideration when choosing Oracle Analytics Cloud.

Do you need to restart the OAC environment when you change the RAM and OCPU settings?

Configuring the number of OCPUs and associated RAM is done from the Analytics Service Console. This can be done during up time without a service restart, however the analytics service will be unavailable:

alt

PaaS Service Manager Command Line Interface (PSM Cli), which Francesco covered here, will allow this to be scripted and scheduled. An interesting use case for this would be to allow an increase in resources during month end processing where your concurrent users are at its highest, whilst in the quieter parts of the month you can scale back down.

This is done using the 'scale' command, this command takes a json file as a parameter which contains information about what the environment should look like. You will notice in the example below that the json file refers to an object called 'shape'; this is the combination of OCPU and RAM that you want the instance to scale to. Some examples of shapes are:

  • oc3 — 1 OCPU with 7.5 GB RAM
  • oc4 — 2 OCPUs with 15 GB RAM
  • oc5 — 4 OCPUs with 30 GB RAM
  • oc6 — 8 OCPUs with 60 GB RAM
  • oc7 — 16 OCPUs with 120 GB RAM
  • oc8 — 24 OCPUs with 180 GB RAM
  • oc9 — 32 OCPUs with 240 GB RAM

For example:

The following example scales the rittmanmead-analytics-prod service to the oc9 shape.

$ psm analytics scale -s rittmanmead-analytics-prod -c ~/oac-obiee/scale-to-monthend.json
where the JSON file contains the following:

{ "components" : { "BI" : "shape" : "oc9", "hosts":["rittmanmead-prod-1"] } } }

Oracle supply documentation for the commands required here: https://docs.oracle.com/en/cloud/paas/java-cloud/pscli/analytics-scale2.html .

How is high availability provisioned in Oracle Analytics Cloud?

Building a high available infrastructure in the cloud needs to take into consideration three main areas:

Server Failure: Oracle Analytics Cloud can be clustered, additional nodes (up to 10) can be added dynamically in the Cloud 'My Services' console should they need to be:

alt

It is also possible to provision a load balancer, as you can see from the screenshot below:

alt

Zone Failure: Sometimes it more than just a single server that causes the failure. Cloud architecture is built in server farms, which themselves can be network issues, power failures and weather anomalies. Oracle Analytics Cloud allows you to create an instance in a region, much like Amazons "availability zone". A sensible precaution would be to create a disaster recover environment in different region to your main prod environment, to help reduce costs this can be provisioned on the Pay-as-you-go license model and therefore only be chargeable when its being used.

Cloud Failure: Although rare, sometimes the cloud platform can fail. For example both your data centres that you have chosen to counter the previous point could be victim to a weather anomaly. Oracle Analytics Cloud allows you to take regular backups of your reports, dashboards and metadata which can be downloaded and stored off-cloud and re implemented in another 12c Environment.

In addition to these points, its advisable to automate and test everything. Oracle supply a very handy set of scripts and API called PaaS Service Manager Command Line Interface (PSM Cli) which can be used to achieve this. For example it can be used to automate backups, set up monitoring and alerting and finally and arguably most importantly it can be used to test your DR and HA infrastructure.

Can you push the user credentials down to the database?

At this point in time there is no way to configure database authentication providers in a similar way to Weblogic providors of the past. However, Oracle IDCS does have a REST API that could be used to simulate this functionality, documentation can be found here: https://docs.oracle.com/en/cloud/paas/identity-cloud/rest-api/OATOAuthClientWebApp.html

You can store user group memberships in a database and for your service’s authentication provider to access this information when authenticating a user's identity. You can use the script configure_bi_sql_group_provider to set up the provider and create the tables that you need (GROUPS and GROUPMEMBERS). After you run the script, you must populate the tables with your group and group member (user) information.

Group memberships that you derive from the SQL provider don't show up in the Users and Roles page in Oracle Analytics Cloud Console as you might expect but the member assignments work correctly.

alt

These tables are in the Oracle Database Cloud Service you configured for Oracle Analytics Cloud and in the schema created for your service. Unlike the on-premises equivalent functionality, you can’t change the location of these tables or the SQL that retrieves the results.
The script to achieve this is stored on the analytics server itself, and can be accessed using SSH (using the user 'opc') and the private keys that you created during the instance provisioning process. They are stored in: /bi/app/public/bin/configure_bi_sql_group_provider

Can you implement SSL certificates in Oracle Analytics Cloud?

The short answer is yes.

When Oracle Analytics Cloud instances are created, similarly to on-premise OBIEE instances, a a self-signed certificate is generated. The self-signed certificate is intended to be temporary and you must replace it with a new private key and a certificate signed by a certification authority. Doc ID 2334800.1 on support.oracle.com has the full details on how to implement this, but the high level steps (take from the document itself) are:

  • Associate a custom domain name against the public ip of your OAC instance
  • Get the custom SSL certificate from a Certificate Authority
  • Specify the DNS registered host name that you want to secure with SSL in servername.conf
  • Install Intermediate certificateRun the script to Register the new private key and server certificate
Can you implement Single Sign On (SSO) in Oracle Analytics Cloud?

Oracle Identity Cloud Service (IDCS) allows administrators to create security providors for OAC, much like the providors in on premise OBIEE weblogic providors. These can be created/edited to include single sign on URLs,Certificates etc, as shown in the screenshot below:

alt

Oracle support Doc ID 2399789.1 covers this in detail between Microsoft Azure AD and OAC, and is well worth the read.

Are RPD files (BAR files) backwards compatible?

This would depend what has changed between the releases. The different version numbers of OAC doesn't necessarily include changes to the OBIEE components themselves (e.g. it could just be an improvement to the 'My Services' UI). However, if there have been changes to the way the XML is formed in reports for example, these wont be compatible with different previous versions of the catalog. This all being said, the environments look like they can be upgraded at any time so you should be able to take a snapshot of your environment and upgrade it to match the newer version and then redeploy/refresh from your snapshot

How do you connect securely to AWS?

There doesn't seem to be any documentation on how exactly Visual Analyzer connects to Amazon Redshift using the 'Create Connection' wizard. However, there is an option to create an SSL ODBC connection to the Redshift database that can then be used to connect using the Visual Analyzer ODBC connection wizard:

alt

Can you still edit instanceconfig and nqsconfig files?

Yes you can, you need to use your ssh keys to sign into the box (using the user 'opc'). They are contained in the following locations:

/bi/domain/fmw/user_projects/domains/bi/config/fmwconfig/biconfig/OBIPS/instanceconfig.xml

/bi/domain/fmw/user_projects/domains/bi/config/fmwconfig/biconfig/OBIS/NQSConfig.INI

Its also worth mentioning that there is a guide here which explains where the responsibility lies should anything break during customisations of the platform.

Who is responsible for what regarding support?

Guide to Customer vs Oracle Management Responsibilities in Oracle Infrastructure and Platform Cloud Services (Doc ID 2309936.1)

Guide to Customer vs Oracle Management Responsibilities in Oracle Infrastructure and Platform Cloud Services

Categories: BI & Warehousing

Oracle Analytics Cloud Workshop FAQ

Rittman Mead Consulting - Mon, 2018-07-23 07:59

A few weeks ago, I had the opportunity to present the Rittman Mead Oracle Analytics Cloud workshop in Oracle's head office in London. The aim of the workshop was to educate potential OAC customers and give them the tools and knowledge to decide whether or not OAC was the right solution for them. We had a great cross section of multiple industries (although telecoms were over represented!) and OBIEE familiarity. Together we came up with a series of questions that needed to be answered to help in the decision making process. In the coming workshops we will add more FAQ style posts to the blog to help flesh out the features of the product.

If you are interested in coming along to one of the workshops to get some hands on time with OAC, send an email to training@rittmanmead.com and we can give you the details.

Do Oracle provide a feature comparison list between OBIEE on premise and OAC?

Oracle do not provide a feature comparison between on-premise and OAC. However, Rittman Mead have done an initial comparison between OAC and traditional on-premise OBIEE 12c installations:

High Level
  • Enterprise Analytics is identical to 12c Analytics
  • Only two Actions available in OAC: Navigate to BI content, Navigate to Web page
  • BI Publisher is identical in 12c and OAC
  • Data Visualiser has additional features and a slightly different UI in OAC compared to 12c
BI Developer Client Tool for OAC
  • Looks exactly the same as the OBIEE client
  • Available only for Windows, straightforward installation
  • OAC IP address and BI Server port must be provided to create an ODBC data source
  • Allows to open and edit online the OAC model
  • Allows offline development. Snapshots interface used to upload it to OAC (it will completely replace existing model)
Data Modeler
  • Alternative tool to create and manage metadata models
  • Very easy to use, but limited compared to the BI Developer Client.
Catalog
  • It's possible to archive/unarchive catalog folders from on-premise to OAC.
BAR file
  • It's possible to create OAC bar files
  • It's possible to migrate OAC bar files to OBIEE 12c
Can you ever be charged by network usage, for example connection to an on premise data source using RDC?

Oracle will not charge you for network usage as things stand. Your charges come from the following:

  • Which version of OAC you have (Standard, Data Lake or Enterprise)
  • Whether you are using Pay-as-you-go or Monthly Commitments
  • The amount of disk space you have specified during provisioning
  • The combination of OCPU and RAM currently in use (size).
  • The up-time of your environment.

So for example an environment that has 1 OCPU with 7.5 GB RAM will cost less than an environment with 24 OCPUs with 180 GB RAM if they are up for the same amount of time, everything else being equal. This being said, there is an additional charge to the analytics license as a cloud database is required to configure and launch an analytics instance which should be taken into consideration when choosing Oracle Analytics Cloud.

Do you need to restart the OAC environment when you change the RAM and OCPU settings?

Configuring the number of OCPUs and associated RAM is done from the Analytics Service Console. This can be done during up time without a service restart, however the analytics service will be unavailable:

alt

PaaS Service Manager Command Line Interface (PSM Cli), which Francesco covered here, will allow this to be scripted and scheduled. An interesting use case for this would be to allow an increase in resources during month end processing where your concurrent users are at its highest, whilst in the quieter parts of the month you can scale back down.

This is done using the 'scale' command, this command takes a json file as a parameter which contains information about what the environment should look like. You will notice in the example below that the json file refers to an object called 'shape'; this is the combination of OCPU and RAM that you want the instance to scale to. Some examples of shapes are:

  • oc3 — 1 OCPU with 7.5 GB RAM
  • oc4 — 2 OCPUs with 15 GB RAM
  • oc5 — 4 OCPUs with 30 GB RAM
  • oc6 — 8 OCPUs with 60 GB RAM
  • oc7 — 16 OCPUs with 120 GB RAM
  • oc8 — 24 OCPUs with 180 GB RAM
  • oc9 — 32 OCPUs with 240 GB RAM

For example:

The following example scales the rittmanmead-analytics-prod service to the oc9 shape.

$ psm analytics scale -s rittmanmead-analytics-prod -c ~/oac-obiee/scale-to-monthend.json where the JSON file contains the following:

{ "components" : { "BI" : "shape" : "oc9", "hosts":["rittmanmead-prod-1"] } } }

Oracle supply documentation for the commands required here: https://docs.oracle.com/en/cloud/paas/java-cloud/pscli/analytics-scale2.html .

How is high availability provisioned in Oracle Analytics Cloud?

Building a high available infrastructure in the cloud needs to take into consideration three main areas:

Server Failure: Oracle Analytics Cloud can be clustered, additional nodes (up to 10) can be added dynamically in the Cloud 'My Services' console should they need to be:

alt

It is also possible to provision a load balancer, as you can see from the screenshot below: 

alt

Zone Failure: Sometimes it more than just a single server that causes the failure. Cloud architecture is built in server farms, which themselves can be network issues, power failures and weather anomalies. Oracle Analytics Cloud allows you to create an instance in a region, much like Amazons "availability zone". A sensible precaution would be to create a disaster recover environment in different region to your main prod environment, to help reduce costs this can be provisioned on the Pay-as-you-go license model and therefore only be chargeable when its being used.

Cloud Failure: Although rare, sometimes the cloud platform can fail. For example both your data centres that you have chosen to counter the previous point could be victim to a weather anomaly. Oracle Analytics Cloud allows you to take regular backups of your reports, dashboards and metadata which can be downloaded and stored off-cloud and re implemented in another 12c Environment. 

In addition to these points, its advisable to automate and test everything. Oracle supply a very handy set of scripts and API called PaaS Service Manager Command Line Interface (PSM Cli) which can be used to achieve this. For example it can be used to automate backups, set up monitoring and alerting and finally and arguably most importantly it can be used to test your DR and HA infrastructure.

Can you push the user credentials down to the database?

At this point in time there is no way to configure database authentication providers in a similar way to Weblogic providors of the past. However, Oracle IDCS does have a REST API that could be used to simulate this functionality, documentation can be found here: https://docs.oracle.com/en/cloud/paas/identity-cloud/rest-api/OATOAuthClientWebApp.html

You can store user group memberships in a database and for your service’s authentication provider to access this information when authenticating a user's identity. You can use the script configure_bi_sql_group_provider to set up the provider and create the tables that you need (GROUPS and GROUPMEMBERS). After you run the script, you must populate the tables with your group and group member (user) information.

Group memberships that you derive from the SQL provider don't show up in the Users and Roles page in Oracle Analytics Cloud Console as you might expect but the member assignments work correctly.

alt

These tables are in the Oracle Database Cloud Service you configured for Oracle Analytics Cloud and in the schema created for your service. Unlike the on-premises equivalent functionality, you can’t change the location of these tables or the SQL that retrieves the results.
The script to achieve this is stored on the analytics server itself, and can be accessed using SSH (using the user 'opc') and the private keys that you created during the instance provisioning process. They are stored in: /bi/app/public/bin/configurebisqlgroupprovider

Can you implement SSL certificates in Oracle Analytics Cloud?

The short answer is yes.

When Oracle Analytics Cloud instances are created, similarly to on-premise OBIEE instances, a a self-signed certificate is generated. The self-signed certificate is intended to be temporary and you must replace it with a new private key and a certificate signed by a certification authority. Doc ID 2334800.1 on support.oracle.com has the full details on how to implement this, but the high level steps (take from the document itself) are:

  • Associate a custom domain name against the public ip of your OAC instance
  • Get the custom SSL certificate from a Certificate Authority
  • Specify the DNS registered host name that you want to secure with SSL in servername.conf
  • Install Intermediate certificateRun the script to Register the new private key and server certificate
Can you implement Single Sign On (SSO) in Oracle Analytics Cloud?

Oracle Identity Cloud Service (IDCS) allows administrators to create security providors for OAC, much like the providors in on premise OBIEE weblogic providors. These can be created/edited to include single sign on URLs,Certificates etc, as shown in the screenshot below:

alt

Oracle support Doc ID 2399789.1 covers this in detail between Microsoft Azure AD and OAC, and is well worth the read.

Are RPD files (BAR files) backwards compatible?

This would depend what has changed between the releases. The different version numbers of OAC doesn't necessarily include changes to the OBIEE components themselves (e.g. it could just be an improvement to the 'My Services' UI). However, if there have been changes to the way the XML is formed in reports for example, these wont be compatible with different previous versions of the catalog. This all being said, the environments look like they can be upgraded at any time so you should be able to take a snapshot of your environment and upgrade it to match the newer version and then redeploy/refresh from your snapshot

How do you connect securely to AWS?

There doesn't seem to be any documentation on how exactly Visual Analyzer connects to Amazon Redshift using the 'Create Connection' wizard. However, there is an option to create an SSL ODBC connection to the Redshift database that can then be used to connect using the Visual Analyzer ODBC connection wizard:

alt

Can you still edit instanceconfig and nqsconfig files?

Yes you can, you need to use your ssh keys to sign into the box (using the user 'opc'). They are contained in the following locations:

/bi/domain/fmw/user_projects/domains/bi/config/fmwconfig/biconfig/OBIPS/instanceconfig.xml

/bi/domain/fmw/user_projects/domains/bi/config/fmwconfig/biconfig/OBIS/NQSConfig.INI

Its also worth mentioning that there is a guide here which explains where the responsibility lies should anything break during customisations of the platform.

Who is responsible for what regarding support?

Guide to Customer vs Oracle Management Responsibilities in Oracle Infrastructure and Platform Cloud Services (Doc ID 2309936.1)

Guide to Customer vs Oracle Management Responsibilities in Oracle Infrastructure and Platform Cloud Services

Categories: BI & Warehousing

Pages

Subscribe to Oracle FAQ aggregator - BI &amp; Warehousing