Rittman Mead Consulting

Rittman Mead consults, trains, and innovates within the world of Oracle Business Intelligence, data integration, and analytics.

ODC Appreciation Day: Oracle Cloud PSM Cli

Thu, 2018-10-11 03:11

Oracle Developer Community (ODC) Appreciation Day (previously known as OTN Appreciation Day) is a day, started as an initiative of Tim Hall, where everyone can share their thanks to the Oracle community by writing about a favourite product, an experience or a story related to Oracle technology.

Last year I wrote about OBIEE Time Hierarchies and how they are very useful for performing time comparisons, shifts and aggregations.

This year I want to write about the Oracle PaaS Service Manager (PSM) Cli!
I've already written a detailed blog post about it; in short, PSM Cli allows Oracle Cloud administrators to manage their instances via the command line instead of forcing them to use the Web-UI.

PSM Cli allows you to create an Oracle Analytics Cloud instance by just calling

psm analytics create-service -c <CONFIG_FILE> -of <OUTPUT_FORMAT>

and passing a JSON <CONFIG_FILE> which can easily be downloaded after following the creation process in the Web-UI, a bit like the response file in on-premises OBIEE can be saved and customised for future reuse after the first UI installation. Examples of the PSM JSON payloads can be found here.
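
For illustration only, a heavily trimmed payload might look like the sketch below; every key shown here is an assumption based on a typical create-service payload, so always start from a file downloaded via the Web-UI.

{
  "serviceName": "rm-analytics-dev",
  "serviceDescription": "Short-lived OAC instance for a feature branch",
  "edition": "SUITE",
  "vmPublicKeyText": "ssh-rsa AAAA...",
  "components": {
    "BI": {
      "shape": "oc3",
      "adminUserName": "weblogic",
      "adminUserPassword": "<PASSWORD>"
    }
  }
}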

OAC Instances can also easily be started/stopped/restarted with the command

psm analytics start/stop/restart -s <INSTANCE_NAME>

And the status of each command tracked with

psm analytics operation-status -j <JOB_ID>

As mentioned in my previous post, PSM Cli also opens the door to instance management automation, which is a requirement for providing cost-effective, fully isolated, feature-related OAC instances, useful when thinking about DevOps practices. The fact that PSM Cli is a command-line tool means it can be integrated into automation tools like Jenkins and thus into any DevOps flow being designed in any company. A minimal sketch of such a flow is below.
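
As a rough illustration (the instance name, polling interval and jq paths into the JSON output are all assumptions, not documented behaviour), a scheduled shutdown job might look like:

#!/bin/bash
# Hypothetical nightly shutdown of a development OAC instance.
INSTANCE="rm-analytics-dev"

# Issue the stop command and capture the job id from the JSON output.
JOB_ID=$(psm analytics stop -s "$INSTANCE" -of json | jq -r '.details.jobId')

# Poll the job until it leaves the RUNNING state.
STATUS="RUNNING"
while [ "$STATUS" = "RUNNING" ]; do
  sleep 30
  STATUS=$(psm analytics operation-status -j "$JOB_ID" -of json | jq -r '.status')
done

echo "Stop job $JOB_ID completed with status: $STATUS"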

So Thank you, Oracle, for enabling such automation with PSM Cli!

Follow the #ThanksODC hashtag on Twitter to check which posts have been published on the same theme!

Categories: BI & Warehousing

OAC 18.3.3: New Features

Fri, 2018-09-21 07:58

I believe there is a hidden strategy behind Oracle's product release schedule: every time I'm either on holiday or on a business trip full of appointments, a new version of Oracle Analytics Cloud is published with a huge set of new features!

OAC 18.3.3 went live last week and contains a big set of enhancements, some of which were already described at Kscope18 during the Sunday Symposium. New features are appearing in almost all the areas covered by OAC: from Data Preparation to Data Flows, new visualization types, new security and configuration options, and BIP and Essbase enhancements. Let's have a look at what's there!

Data Preparation

A recurring theme in Europe since last year is GDPR, the General Data Protection Regulation, which aims at protecting the data and privacy of all European citizens. This is very important in our landscape since we "play" with data on a daily basis and we should be aware of what data we can use and how.
Luckily for us, OAC now helps address GDPR with the Data Preparation Recommendations step: every time a dataset is added, each column is profiled and a list of recommended transformations is suggested to the user. Please note that Data Preparation Recommendations only suggests changes to the dataset, and thus can't be considered a complete solution to GDPR compliance.
The suggestions may include:

  • Complete or partial obfuscation of the data: useful when dealing with security/user sensitive data
  • Data Enrichment based on the column data can include:
    • Demographical information based on names
    • Geographical information based on locations, zip codes

Each suggestion applied to the dataset is stored in a data preparation script that can easily be reapplied if the data is updated.

Data Flows

Data Flows is the "mini-ETL" component within OAC which allows transformations, joins, aggregations, filtering, binning and machine learning model training, storing the artifacts either locally, in a database or in an Essbase cube.
Data Flows, however, had some limitations, the first being that they had to be run manually by the user. With OAC 18.3.3 there is now an option to schedule Data Flows, more or less like we were used to when scheduling Agents back in OBIEE.

Another limitation was the creation of a single dataset per Data Flow. This has been solved with the introduction of the Branch node, which allows a single Data Flow to produce multiple datasets: very useful when the same set of source data and transformations needs to be used to produce various datasets.

Two other new features have been introduced to make Data Flows more reusable: Parametrized Sources and Outputs, and Incremental Processing.
Parametrized Sources and Outputs allow the data flow source or target to be selected at runtime, allowing, for example, a specific and different dataset to be created for today's load.

Incremental Processing, as the name says, is a way to run Data Flows only on the data added since the last run (incremental loads in ETL terms). In order to have a Data Flow work with incremental loads we need to:

  • Define in the source dataset the key column that can be used to identify new data since the last run (e.g. CUSTOMER_KEY or ORDER_DATE)
  • When including the dataset in a Data Flow, enable the execution of the Data Flow with only the new data
  • In the target dataset, define whether the Incremental Processing replaces existing data or appends data.

Please note that Incremental Processing is available only when using database sources.

Another important improvement is Function Shipping when Data Flows are used with Big Data Cloud: if the source datasets come from BDC and the results are stored in BDC, all the transformations like joining, adding calculated columns and filtering are shipped to BDC as well, meaning there is no additional load on OAC for the Data Flow.

Lastly, there is a new Properties Inspector feature in Data Flows, allowing properties like name and description to be checked, and the scheduling of the related flow to be accessed and modified.

Data Replication

It is now possible to use OAC to replicate data from a source system like Oracle's Fusion Apps, Talend or Eloqua directly into Big Data Cloud, Database Cloud or Data Warehouse Cloud. This function is extremely useful since it decouples the queries generated by the analytical tools from the source systems.
As expected, the user can select which objects to replicate, the filters to apply, the destination tables and columns, and the load type, either Full or Incremental.

Project Creation

New visualization capabilities have been added which include:

  • Grid HeatMap
  • Correlation Matrix
  • Discrete Shapes
  • 100% Stacked Bars and Area Charts

In Map views, Multiple Map Layers can now be added, as well as Density and Metric-based HeatMaps, all on top of new background maps including Baidu and Google.

Tooltips are now supported in all visualizations, allowing the end user to add measure columns which will be shown when hovering over a section of any graph.

The Explain feature is now available on metrics, not only on attributes, and has been enhanced: a new anomaly detection algorithm identifies anomalies in combinations of columns, working in the background in asynchronous mode so that anomalies can be pushed to the user as soon as they are found.

A new feature that many developers will appreciate is AutoSave: we are all used to autosave when using Google Docs, and the same now applies to OAC, with a project saved automatically at every change. Of course, this feature can be turned off if necessary.
Another very interesting addition is Copy Data to Clipboard: with a right click on any graph, an option to copy the underlying data to the clipboard is available. The data can then be pasted natively into Excel.

Did you create a new dataset and want to repoint your existing project to it? With Dataset Replacement it's now just a few clicks away: you only need to select the new dataset and re-map all the columns used in your current project!

Data Management

The datasets/dataflows/projects methodology is typical of what Gartner defined as Mode 2 analytics: analysis done by a business user without any involvement from IT. The step sometimes missing, or hard to perform, in self-service tools is publishing: once a certain dataset is consistent and ready to be shared, it's rather difficult to open it up to a larger audience within the same toolset.
New OAC administrative options address this problem: dataset Certification by an administrator allows a certain dataset to be queried via Ask and DayByDay by other users. There is also a dataset Permissions tab allowing the definition of Full Control, Edit or Read Only access at user or role level. This is the way to bring self-service datasets back to corporate visibility.

A Search tab allows fine control over the indexing of a certain dataset used by Ask and DayByDay. There are now options to select when the indexing is executed, as well as which columns to index and how (by column name and value, or by column name only).

BIP and Essbase

BI Publisher was added to OAC in the previous version; it now includes new features like tighter integration with datasets, which can be used as data sources, as well as features like email delivery read-receipt notification, compressed output and password protection that were already available in the on-premises version.
There is also a new set of features for Essbase, including a new UI, REST APIs and, very importantly security-wise, all external communications (like Smartview) now happening over HTTPS.
For a detailed list of new features check this link.

Conclusion

OAC 18.3.3 includes an incredible amount of new features which enable the whole analytics story: from self-service data discovery to corporate dashboarding and pixel-perfect formatting, all within the same tool and with shared security settings. Options like parametrized and incremental Data Flows allow content reusability and enhance overall platform performance by reducing the load on source systems.
If you are looking into OAC and want to know more, don't hesitate to contact us.

Categories: BI & Warehousing

Looker for OBIEE Experts: Introduction and Concepts

Thu, 2018-08-23 08:28

Recently I've been doing some personal study around various areas including streaming, machine learning and data visualization, and one of the tools that got my attention is Looker. I first heard about Looker on a Drill to Detail podcast and have increasingly been hearing about it at conferences and in use cases together with other cloud solutions like BigQuery, Snowflake and Fivetran.

I decided to give it a try myself and, since most of my career has been based on Oracle Business Intelligence (OBI), to write down a comparison between the two tools that could help others, sharing my experience of getting introduced to Looker.

OBIEE's Golden Feature: The Semantic Model

As you probably know if you have been working with OBIEE for some time, the centrepiece of its architecture is the Semantic Model contained in the Repository (RPD).

In the three layers of the RPD, we model our source data (e.g. database tables) into attributes, metrics and hierarchies which can then easily be dragged and dropped by the end user into an analysis or data visualization.

I called the RPD "OBIEE's Golden Feature" because to me it's the main benefit of the platform: abstracting the data complexity from end users and, at the same time, optimizing the query definition to take care of all the features that could be set in the datasource. The importance of the RPD also lies in its centrality: within traditional OBIEE, all Analyses and Dashboards had to be based on Subject Areas exposed by the RPD, meaning that the definition of the metrics was done in a unique place in a consistent manner and then spread across all the reporting, providing the single source of truth for the important KPIs in the company, typical of what Gartner calls Mode 1 Analytics.

RPD Development Speed Limitation and Mode 2 Analytics

The RPD is a centralized binary object within the OBIEE infrastructure: in order to develop and test it, a full OBIEE instance is required, and merges between different development streams are natively performed via the RPD's Admin Tool.

This complexity, combined with the deep knowledge required to correctly build a valid semantic model, limits the number of people able to create and publish new content, thus slowing down the process from data to insights typical of the centralized Mode 1 Analytics platform provided by IT teams. Moreover, RPD development is entirely point-and-click within the Admin Tool, which is somewhat slow and old-fashioned in a world of scripting, code versioning and git merging. Several solutions are out in the market (including the Rittman Mead Developer Toolkit) to enhance the agility of the development but still, the skills and the toolset required to develop new content make it a purely IT-manageable solution.

In order to overcome this limitation, several tools like Tableau, QlikView or Oracle's Data Visualization (included in OAC or in the Desktop version) put all the power in the hands of the end user: from data sources to graphing, the tools allow an end-to-end data discovery to visualization journey. The problem with those tools (called Mode 2 Analytics by Gartner) is that there is no central definition of the KPIs, since this is delegated to every analyst. All those tools address the problem by providing some sort of datasource certification, allowing a datasource to be visible and reusable publicly only when it's validated centrally. Again, for most of those tools, the modelling is done in a visual format, which makes it difficult to debug, version control and automate. I've been speaking about this subject in my presentation "DevOps and OBIEE: do it before it's too late".

What if we could provide the same centralized source-of-truth data modelling with an easily scriptable syntax that business users can develop without any deep knowledge of SQL or source tables? Well, what we just described is LookML!

LookML

LookML takes the best part of OBIEE, the idea of a modelling layer, and democratizes it so it's available to all business users with a simple language and set of concepts. Moreover, code versioning is embedded in the tool, so there's no need to teach git branch, commit, push or pull to non-IT people.

So, what are the concepts behind LookML and how can you get familiar with it when comparing it to the metadata modelling in the RPD?

LookML Concepts

Let's start from the basics of RPD modelling: a database table. In LookML each table is represented by an object called a View (the naming is a bit confusing). Moreover, LookML Views can be used not only to map existing database tables but also to create new tables based on existing content and a SQL definition, like the opaque views in OBIEE. On top of this, LookML allows the physicalisation of those objects (into a table) and the definition of a schedule for the refresh. This concept is very useful when aggregates are needed: the aggregate definition (SQL) is defined within the LookML View together with the related refresh schedule.
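
As a minimal sketch (table and column names are invented for illustration), a LookML View wrapping a SQL definition together with a persistence schedule could look like:

view: customer_orders {
  derived_table: {
    # SQL definition, similar in spirit to an OBIEE opaque view
    sql: SELECT customer_id, COUNT(*) AS order_count
         FROM orders
         GROUP BY customer_id ;;
    # Physicalise the result and refresh it on a schedule
    persist_for: "24 hours"
  }

  dimension: customer_id {
    sql: ${TABLE}.customer_id ;;
  }
}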

The View itself defines only the source, a bit like the RPD's Physical layer. The next step is defining how multiple Views interact with each other or, in OBIEE terms, the Business Layer. In LookML there is an entity called an Explore, which is the place where we define which Views we want to group together and what the linkage is between them. Multiple Explores are defined in a Model, which should be unique per database. So, in OBIEE words, a Model can be compared to a Business Model, with Explores being a subset of Facts and Dimensions grouped in a Subject Area.
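
A rough sketch of a Model file (connection and View names are hypothetical) grouping Views into an Explore:

connection: "analytics_dwh"

explore: orders {
  # Joining Views is roughly analogous to joining logical tables in the BMM
  join: customers {
    sql_on: ${orders.customer_id} = ${customers.customer_id} ;;
    relationship: many_to_one
  }
}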

Ok, all "easy" so far, but where do we map the columns? And where do we set the aggregations? As you might expect, both are mapped within a LookML View into Fields. Fields is a generic term which includes both metrics and attributes; the LookML naming is as below:

  • Dimension: in OBIEE terms, an attribute of a dimension. The terminology is confusing since in LookML the Dimension is the column itself, while in OBIEE terms it is the table. A Dimension can be a column value or a combination of multiple values (like OBIEE's BM Logical Source formulas). A Dimension in LookML can't have any aggregation (as in OBIEE).
  • Measure: in OBIEE terms, a metric. The definition includes the source formula in SQL syntax, the type of aggregation (min/max/count...) and the drill fields.
  • Filter: this is not something usually defined in OBIEE's RPD; filters are a way of passing a user choice based on a column value back to a calculation formula, a bit like, for the OBIEE experts, overriding session variables with dashboard prompt values.
  • Parameter: again, this is not something usually defined in OBIEE's RPD; you can think of a Parameter as a way of setting up a variable-like input. E.g. a Parameter with values SUM, AVG, MIN, MAX could be used to change how a certain Measure is aggregated. A sketch covering all four field types follows below.
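
Putting the four field types together, a sketch of a View (all names are illustrative) could be:

view: orders {
  sql_table_name: sales.orders ;;

  # Dimension: a single attribute, no aggregation allowed
  dimension: order_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.order_id ;;
  }

  # Measure: a metric with an explicit aggregation type
  measure: total_amount {
    type: sum
    sql: ${TABLE}.amount ;;
  }

  # Filter: a user choice that can be referenced in other field formulas
  filter: order_date_filter {
    type: date
  }

  # Parameter: a variable-like input, e.g. to switch a measure's aggregation
  parameter: aggregation_type {
    type: unquoted
    allowed_value: { value: "SUM" }
    allowed_value: { value: "AVG" }
  }
}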

All good so far? Stick with me and in the future we'll explore more about LookML syntax and Looker in general!

Categories: BI & Warehousing

Parsing Badly Formatted JSON in Oracle DB with APEX_JSON

Mon, 2018-08-20 05:38

After some blogging silence due to project work and holidays, I thought it was a good idea to do a write-up about a problem I faced this week. One of the tasks I was assigned was to parse a set of JSON files stored in an Oracle 12.1 DB Table.

As probably all of you already know, JSON (JavaScript Object Notation) is a lightweight data-interchange format and is used widely in web services due to its flexibility. In JSON there is no header to define (think of CSV as an example): every field is defined in a "field name":"field value" format, there is no "set of required columns" for a JSON object, and when a new attribute needs to be defined, the related name and value can simply be added to the structure. On top of this "schema-free" definition, a field value can be either

  • a single value
  • an array
  • a nested JSON object

Basically, when you start parsing JSON you can feel a little overwhelmed.

The Easy Part

The task assigned wasn't too difficult; after reading the proper documentation, I was able to parse a JSON file like

{
 "field1": "abc",
 "field2": "cde"
}

Using a simple SQL like

select * 
 from TBL_NAME d,
 JSON_TABLE(d.text, '$' COLUMNS (
   field1 VARCHAR2(10) PATH '$.field1',
   field2 VARCHAR2(10) PATH '$.field2'
   )
 )

Parsing arrays is not very complex either, a JSON file like

{
 "field1": "abc",
 "field2": "cde",
 "field3": ["fgh","ilm","nop"]
}

Can be easily parsed using the NESTED PATH call

select * 
 from TBL_NAME d, 
 JSON_TABLE(d.text, '$' COLUMNS (
   field1 VARCHAR2(10) PATH '$.field1',
   field2 VARCHAR2(10) PATH '$.field2',
   NESTED PATH '$.field3[*]' COLUMNS (
     field3 VARCHAR2(10) PATH '$'
   )
 )
)

In case the Array contains nested objects, those can be parsed using the same syntax as before, for example, field4 and field5 of the following JSON

{
 "field1": "abc",
 "field2": "cde",
 "field3": [
           {
            "field4":"fgh",
            "field5":"ilm"
           },
           {
            "field4":"nop",
            "field5":"qrs"
           }
           ] 
}

can be parsed with

NESTED PATH '$.field3[*]' COLUMNS ( 
   field4 VARCHAR2(10) PATH '$.field4',
   field5 VARCHAR2(10) PATH '$.field5' 
)

...Where things got complicated

All very, very easy with well-formatted JSON files. But then I faced the following:

{ 
"field1": "abc", 
"field2": "cde", 
"field3": [
    {
     "field4": "aaaa", 
     "field5":{ 
           "1234":"8881", 
           "5678":"8893" 
          }
     },
     {
      "field4": "bbbb",  
      "field5":{ 
            "9876":"8881", 
            "7654":"8945",
            "4356":"7777"
          }
      } 
      ] 
}

Basically, the JSON file started including fields whose names represent Ids, meaning an association like Product Id (1234) is a member of Brand Id (8881). This immediately triggered a rather puzzled reaction on my side.

After checking the documentation again, I wasn't able to find anything that could help me parse that, since all the calls included a predefined PATH string which, in the case of the Ids, I couldn't know beforehand.

I then reached out to my network on Twitter

To all my @Oracle SQL friends out there: I need to parse a JSON object which has a strange format of {“name”:”abc”, “345678”:”123456”} with the 345678 being an Id I need to extract, any suggestions? none of the ones mentioned here seems to help https://t.co/DRWdGvCVfu pic.twitter.com/PfhtUnAeR4

— Francesco Tisiot (@FTisiot) 14 August 2018

That generated quite a lot of responses. Initially, the discussion related to the correctness of the JSON structure which, from a purist point of view, should be mapped as

{ 
"field1": "abc",
"field2": "cde",
"field3": [ 
     {
      "field4": "aaaa", 
      "field5":
           { 
             "association": [
                  {"productId":"1234", "brandId":"8881"},
                  {"productId":"5678", "brandId":"8893"}
                  ]
           }
      },
      {
       "field4": "bbbb", 
       "field5":    
           {
             "association": [
                  {"productId":"9876", "brandId":"8881"},
                  {"productId":"7654", "brandId":"8945"},
                  {"productId":"4356", "brandId":"7777"}
                  ]
           }
      } 
      ]
}

basically going back to standard field names like productId and brandId that could easily be parsed. In my case this wasn't possible since the JSON format was already widely used at the client.

Possible Solutions

Since a change in the JSON format wasn't possible, I needed to find a way of parsing it. A few solutions were mentioned in the Twitter thread:

  • Regular Expressions
  • Bash external table preprocessor
  • Java Stored functions
  • External parsing before storing data into the database

All of the above were somehow discarded since I wanted to achieve a solution based only on existing database functions. Other suggestions included JSON_DATAGUIDE and JSON_OBJECT.GET_KEYS which, unfortunately, are available only from 12.2 onwards (I was on 12.1).

But, just a second before surrendering, Alan Arentsen suggested using the APEX_JSON.PARSE procedure!

The Chosen One: APEX_JSON

The APEX_JSON package offers a series of procedures to parse JSON in PL/SQL, in particular:

  • PARSE: Parses a JSON formatted string contained in a VARCHAR2 or CLOB storing all the members.
  • GET_COUNT: Returns the number of array elements or object members
  • GET_MEMBERS: Returns the table of members of an object

You can already imagine how a combination of those calls can parse the JSON text defined above; let's have a look at the JSON again:

{ 
"field1": "abc", 
"field2": "cde", 
"field3": [
    {
     "field4": "aaaa", 
     "field5":{ 
           "1234":"8881", 
           "5678":"8893" 
          }
     },
     {
      "field4": "bbbb",  
      "field5":{ 
            "9876":"8881", 
            "7654":"8945",
            "4356":"7777"
          }
      } 
      ] 
}

The parsing process should iterate over the field3 entries (2 in this case), and for each entry, then iterate over the fields in field5 to get both the field name as well as the field value.
The number of field3 entries can be found with

APEX_JSON.GET_COUNT(p_path=>'field3',p_values=>j);

And the list of members of field5 with

APEX_JSON.GET_MEMBERS(p_path=>'field3[%d].field5',p_values=>j,p0=>i);

Note the p_path parameter set to field3[%d].field5, meaning that we want to extract field5 from the nth row in field3. The row number is defined by p0=>i, with i being the variable we use in our FOR loop.

The complete code is the following

DECLARE 
   j APEX_JSON.t_values; 
   r_count number;
   field5members   WWV_FLOW_T_VARCHAR2;
   BrandId VARCHAR2(10);
BEGIN
APEX_JSON.parse(j,'<INSERT_JSON_STRING>');
-- Getting the number of field3 elements
r_count := APEX_JSON.GET_COUNT(p_path=>'field3',p_values=>j);
dbms_output.put_line('Nr Records: ' || r_count);

-- Looping over each element in field3
FOR i IN 1 .. r_count LOOP
-- Getting the field5 members for the ith element of field3
 field5members := APEX_JSON.GET_MEMBERS(p_path=>'field3[%d].field5',p_values=>j,p0=>i);
-- Looping over all field5 members
 FOR q in 1 .. field5members.COUNT LOOP
-- Extracting the BrandId
   BrandId := APEX_JSON.GET_VARCHAR2(p_path=>'field3[%d].field5.'||field5members(q),p_values=>j,p0=>i);
-- Printing the Product Id and BrandId
   dbms_output.put_line('Product Id ="'||field5members(q)||'" BrandId="'||BrandId||'"');
 END LOOP;
END LOOP;

END;

Note that, in order to extract the BrandId we used

APEX_JSON.GET_VARCHAR2(p_path=>'field3[%d].field5.'||field5members(q) ,p_values=>j,p0=>i);

Specifically, the PATH is field3[%d].field5.'||field5members(q). As you can imagine, we are appending the member name (field5members(q)) to the path described previously in order to extract the value, forming a string like field3[1].field5.1234 that will correctly extract the associated value.

Conclusion

Three things to take away from this experience. The first is the usage of JSON_TABLE: with JSON_TABLE you can parse well-constructed JSON documents, and it's very easy and powerful.
The second is APEX_JSON: a useful package to parse "not very well" constructed JSON documents, allowing iteration across elements of JSON arrays and object members.
The last, which is becoming more relevant every day in my career, is the importance of networking and knowledge sharing: blogging, speaking at conferences and helping others in various channels allows you to know other people and be known, with the nice side effect of sometimes being able, with a single tweet, to get help solving problems you may face!

Categories: BI & Warehousing

Making our way into Dremio

Wed, 2018-07-25 04:07

In an analytics system, we typically have an Operational Data Store (ODS) or staging layer; a performance layer or some data marts; and on top, there would be an exploration or reporting tool such as Tableau or Oracle's OBIEE. This architecture can lead to latency in decision making, creating a gap between analysis and action. Data preparation tools like Dremio can address this.

Dremio is a Data-as-a-Service platform allowing users to quickly query data, directly from the source or in any layer, regardless of its size or structure. The product makes use of Apache Arrow, allowing it to virtualise data through an in-memory layer, creating what is called a Data Reflection.

The intent of this post is an introduction to Dremio; it provides a step by step guide on how to query data from Amazon's S3 platform.

I wrote this post using my MacBook Pro; Dremio is supported on MacOS. To install it, I needed to make some configuration changes due to the Java version. The latest version of Dremio uses Java 1.8, so if you have a more recent Java version installed, you'll need to make some adjustments to the Dremio configuration files.

Let's start by downloading and installing Dremio. Dremio is available for multiple platforms and can be downloaded from here.

Dremio uses Java 1.8, so if you have a different version, please make sure you install Java 1.8 and edit /Applications/Dremio.app/Contents/Java/dremio/conf/dremio-env to point to the directory where the Java 1.8 home is located.
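
On MacOS, a minimal sketch of that edit (assuming dremio-env honours a JAVA_HOME override, and using the standard macOS java_home helper) could be:

# In /Applications/Dremio.app/Contents/Java/dremio/conf/dremio-env
# Point Dremio at the Java 1.8 home rather than the system default.
JAVA_HOME=$(/usr/libexec/java_home -v 1.8)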

After that, you should be able to start Dremio like any other MacOS application and access http://localhost:9047

Configuring S3 Source

Dremio can connect to relational databases (both commercial and open source), NoSQL stores, Hadoop, cloud storage, ElasticSearch, among others. However, the scope of this post is to use a well-known cloud storage service, an S3 bucket (more details can be found here), and show the query capabilities of Dremio against unstructured data.

For this demo we're using Garmin CSV activity data that can easily be downloaded from the Garmin activity page.

Here is an example of a CSV Garmin activity. If you don't have a Garmin account you can always replicate the data below.

act,runner,Split,Time,Moving Time,Distance,Elevation Gain,Elev Loss,Avg Pace,Avg Moving Paces,Best Pace,Avg Run Cadence,Max Run Cadence,Avg Stride Length,Avg HR,Max HR,Avg Temperature,Calories
1,NMG,1,00:06:08.258,00:06:06.00,1,36,--,0:06:08  ,0:06:06  ,0:04:13  ,175.390625,193.0,92.89507499768523,--,--,--,65
1,NMG,2,00:10:26.907,00:10:09.00,1,129,--,0:10:26  ,0:10:08  ,0:06:02  ,150.140625,236.0,63.74555754497759,--,--,--,55

For user information we have used the following dataset:

runner,dob,name
JM,01-01-1900,Jon Mead
NMG,01-01-1900,Nelio Guimaraes

Add your S3 credentials to gain access.

After configuring your S3 account, all buckets associated with it will appear under the new source area.

For this post I've created two buckets: nmgbuckettest and nmgdremiouser, containing data that could be interpreted as a data mart.

nmgbuckettest - contains Garmin activity data that could be seen as a fact table in CSV format:

Act,Runner,Split,Time,Moving Time,Distance,Elevation Gain,Elev Loss,Avg Pace,Avg Moving Paces,Best Pace,Avg Run Cadence,Max Run Cadence,Avg Stride Length,Avg HR,Max HR,Avg Temperature,Calories

nmgdremiouser - contains user data that could be seen as a user dimension in a CSV format:

runner,dob,name

Creating datasets

After we add the S3 buckets, we need to set up the CSV format. Dremio does most of the work for us; however, we had to adjust some fields, for example date formats, or map a field as an integer.

By clicking on the gear icon we access a configuration panel where we can set the following options. Our CSVs were pretty clean, so I just changed the line delimiter to \n and checked the Extract Field Name option.

Let's do the same for the second set of CSVs (the nmgdremiouser bucket).

Clicking Save will take us to a new panel where we can start performing some queries.

However, as mentioned before, at this stage we might want to adjust some fields. Here I'll adapt the dob field from the nmgdremiouser bucket to be in the dd-mm-yyyy format.

Apply the changes and save the new dataset under the desired space.

Feel free to do the same for the nmgbuckettest CSVs. I'll call the dataset coming from the nmgdremiouser bucket D_USER, and the one coming from nmgbuckettest F_ACTIVITY.

Querying datasets

Now that we have D_USER and F_ACTIVITY datasets created we can start querying them and do some analysis.

This first analysis will tell us which runner climbs the most during their activities:

SELECT round(nested_0.avg_elev_gain) AS avg_elev_gain, round(nested_0.max_elev_gain) AS max_elev_gain, round(nested_0.sum_elev_gain) as sum_elev_gain, join_D_USER.name AS name
FROM (
  SELECT avg_elev_gain, max_elev_gain, sum_elev_gain, runner
  FROM (
    SELECT AVG(to_number("Elevation Gain",'###')) as avg_elev_gain,
    MAX(to_number("Elevation Gain",'###')) as max_elev_gain,
    SUM(to_number("Elevation Gain",'###')) as sum_elev_gain,
    runner
    FROM dremioblogpost.F_ACTIVITY
    where "Elevation Gain" != '--'
    group by runner
  ) nested_0
) nested_0
 INNER JOIN dremioblogpost.D_USER AS join_D_USER ON nested_0.runner = join_D_USER.runner    

To enrich the example, let's find out who the fastest runner is, with the analysis based on total climbing:

 SELECT round(nested_0.km_per_hour) AS avg_speed_km_per_hour, nested_0.total_climbing AS total_climbing_in_meters, join_D_USER.name AS name
FROM ( 
  SELECT km_per_hour, total_climbing, runner
  FROM (
    select avg(cast(3600.0/((cast(substr("Avg Moving Paces",3,2) as integer)*60)+cast(substr("Avg Moving Paces",6,2) as integer)) as float)) as km_per_hour,
        sum(cast("Elevation Gain" as integer)) total_climbing,
        runner
        from dremioblogpost.F_ACTIVITY
        where "Avg Moving Paces" != '--'
        and "Elevation Gain" != '--'
        group by runner
  ) nested_0
) nested_0
 INNER JOIN dremioblogpost.D_USER AS join_D_USER ON nested_0.runner = join_D_USER.runner

Conclusions

Dremio is an interesting tool capable of unifying existing repositories of unstructured data. Is Dremio capable of working with any volume of data and complex relationships? Well, I believe that right now the tool isn't capable of this: even with the simple and small datasets used in this example, the performance was not great.

Dremio does successfully provide self-service access to most platforms, meaning that users don't have to move data around before being able to perform any analysis. This is probably the most exciting part of Dremio. It might well fit the paradigm of a "good enough" way to access data across multiple sources, allowing data scientists to do analysis before the data is formally structured.

Categories: BI & Warehousing

Oracle Analytics Cloud Workshop FAQ

Mon, 2018-07-23 07:59

A few weeks ago, I had the opportunity to present the Rittman Mead Oracle Analytics Cloud workshop at Oracle's head office in London. The aim of the workshop was to educate potential OAC customers and give them the tools and knowledge to decide whether or not OAC was the right solution for them. We had a great cross-section of industries (although telecoms were over-represented!) and levels of OBIEE familiarity. Together we came up with a series of questions that needed to be answered to help in the decision-making process. In the coming workshops we will add more FAQ-style posts to the blog to help flesh out the features of the product.

If you are interested in coming along to one of the workshops to get some hands-on time with OAC, send an email to training@rittmanmead.com and we can give you the details.

Do Oracle provide a feature comparison list between OBIEE on premise and OAC?

Oracle do not provide a feature comparison between on-premise OBIEE and OAC. However, Rittman Mead have done an initial comparison between OAC and traditional on-premise OBIEE 12c installations:

High Level
  • Enterprise Analytics is identical to 12c Analytics
  • Only two Actions available in OAC: Navigate to BI content, Navigate to Web page
  • BI Publisher is identical in 12c and OAC
  • Data Visualiser has additional features and a slightly different UI in OAC compared to 12c
BI Developer Client Tool for OAC
  • Looks exactly the same as the OBIEE client
  • Available only for Windows, straightforward installation
  • OAC IP address and BI Server port must be provided to create an ODBC data source
  • Allows the OAC model to be opened and edited online
  • Allows offline development; the Snapshots interface is used to upload the model to OAC (it will completely replace the existing model)
Data Modeler
  • Alternative tool to create and manage metadata models
  • Very easy to use, but limited compared to the BI Developer Client.
Catalog
  • It's possible to archive/unarchive catalog folders from on-premise to OAC.
BAR file
  • It's possible to create OAC bar files
  • It's possible to migrate OAC bar files to OBIEE 12c
Can you ever be charged for network usage, for example connecting to an on-premise data source using RDC?

Oracle will not charge you for network usage as things stand. Your charges come from the following:

  • Which version of OAC you have (Standard, Data Lake or Enterprise)
  • Whether you are using Pay-as-you-go or Monthly Commitments
  • The amount of disk space you have specified during provisioning
  • The combination of OCPU and RAM currently in use (size).
  • The up-time of your environment.

So, for example, an environment that has 1 OCPU with 7.5 GB RAM will cost less than an environment with 24 OCPUs and 180 GB RAM if they are up for the same amount of time, everything else being equal. That being said, there is an additional charge on top of the analytics license, as a cloud database is required to configure and launch an analytics instance; this should be taken into consideration when choosing Oracle Analytics Cloud.

Do you need to restart the OAC environment when you change the RAM and OCPU settings?

Configuring the number of OCPUs and associated RAM is done from the Analytics Service Console. This can be done during up-time without a service restart; however, the analytics service will be unavailable while the change is applied.

PaaS Service Manager Command Line Interface (PSM Cli), which Francesco covered here, allows this to be scripted and scheduled. An interesting use case would be to increase resources during month-end processing, when your concurrent users are at their highest, whilst in the quieter parts of the month you can scale back down.

This is done using the 'scale' command, which takes a JSON file as a parameter containing information about what the environment should look like. You will notice in the example below that the JSON file refers to an object called 'shape'; this is the combination of OCPU and RAM that you want the instance to scale to. Some examples of shapes are:

  • oc3 — 1 OCPU with 7.5 GB RAM
  • oc4 — 2 OCPUs with 15 GB RAM
  • oc5 — 4 OCPUs with 30 GB RAM
  • oc6 — 8 OCPUs with 60 GB RAM
  • oc7 — 16 OCPUs with 120 GB RAM
  • oc8 — 24 OCPUs with 180 GB RAM
  • oc9 — 32 OCPUs with 240 GB RAM

For example, the following command scales the rittmanmead-analytics-prod service to the oc9 shape:

$ psm analytics scale -s rittmanmead-analytics-prod -c ~/oac-obiee/scale-to-monthend.json
where the JSON file contains the following:

{ "components" : { "BI" : "shape" : "oc9", "hosts":["rittmanmead-prod-1"] } } }

Oracle supply documentation for the commands required here: https://docs.oracle.com/en/cloud/paas/java-cloud/pscli/analytics-scale2.html

How is high availability provisioned in Oracle Analytics Cloud?

Building a highly available infrastructure in the cloud needs to take into consideration three main areas:

Server Failure: Oracle Analytics Cloud can be clustered; additional nodes (up to 10) can be added dynamically in the Cloud 'My Services' console should they be needed.

It is also possible to provision a load balancer.

Zone Failure: Sometimes it's more than just a single server that causes the failure. Cloud architecture is built in server farms, which themselves can suffer network issues, power failures and weather anomalies. Oracle Analytics Cloud allows you to create an instance in a region, much like Amazon's "availability zones". A sensible precaution would be to create a disaster recovery environment in a different region to your main production environment; to help reduce costs, this can be provisioned on the Pay-as-you-go license model and therefore only be chargeable when it's being used.

Cloud Failure: Although rare, sometimes the cloud platform itself can fail. For example, both of the data centres that you have chosen to counter the previous point could fall victim to a weather anomaly. Oracle Analytics Cloud allows you to take regular backups of your reports, dashboards and metadata, which can be downloaded, stored off-cloud and re-implemented in another 12c environment.

In addition to these points, it's advisable to automate and test everything. Oracle supply a very handy set of scripts and APIs called the PaaS Service Manager Command Line Interface (PSM Cli) which can be used to achieve this. For example, it can be used to automate backups, set up monitoring and alerting and, finally and arguably most importantly, to test your DR and HA infrastructure.

Can you push the user credentials down to the database?

At this point in time there is no way to configure database authentication providers in a similar way to the WebLogic providers of the past. However, Oracle IDCS does have a REST API that could be used to simulate this functionality; documentation can be found here: https://docs.oracle.com/en/cloud/paas/identity-cloud/rest-api/OATOAuthClientWebApp.html

You can store user group memberships in a database and allow your service's authentication provider to access this information when authenticating a user's identity. You can use the script configure_bi_sql_group_provider to set up the provider and create the tables that you need (GROUPS and GROUPMEMBERS). After you run the script, you must populate the tables with your group and group member (user) information.

Group memberships that you derive from the SQL provider don't show up in the Users and Roles page of the Oracle Analytics Cloud Console as you might expect, but the member assignments work correctly.

These tables are in the Oracle Database Cloud Service you configured for Oracle Analytics Cloud, in the schema created for your service. Unlike the on-premises equivalent functionality, you can't change the location of these tables or the SQL that retrieves the results.
The script to achieve this is stored on the analytics server itself and can be accessed using SSH (as the user 'opc') with the private keys that you created during the instance provisioning process. It is stored in: /bi/app/public/bin/configure_bi_sql_group_provider
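
As an illustrative sketch only (the table names come from the script, but the column names below are assumptions; check the tables the script actually creates before inserting anything):

-- Hypothetical population of the SQL group provider tables:
-- add a group, then add one member to it.
INSERT INTO GROUPS (G_NAME) VALUES ('BIContentAuthors');
INSERT INTO GROUPMEMBERS (G_NAME, G_MEMBER) VALUES ('BIContentAuthors', 'jsmith');
COMMIT;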

Can you implement SSL certificates in Oracle Analytics Cloud?

The short answer is yes.

When Oracle Analytics Cloud instances are created, similarly to on-premise OBIEE instances, a self-signed certificate is generated. The self-signed certificate is intended to be temporary and you must replace it with a new private key and a certificate signed by a certification authority. Doc ID 2334800.1 on support.oracle.com has the full details on how to implement this, but the high-level steps (taken from the document itself) are:

  • Associate a custom domain name against the public ip of your OAC instance
  • Get the custom SSL certificate from a Certificate Authority
  • Specify the DNS-registered host name that you want to secure with SSL in servername.conf
  • Install the intermediate certificate
  • Run the script to register the new private key and server certificate
Can you implement Single Sign On (SSO) in Oracle Analytics Cloud?

Oracle Identity Cloud Service (IDCS) allows administrators to create security providers for OAC, much like the WebLogic providers in on-premise OBIEE. These can be created/edited to include single sign-on URLs, certificates etc.

Oracle support Doc ID 2399789.1 covers this in detail between Microsoft Azure AD and OAC, and is well worth the read.

Are RPD files (BAR files) backwards compatible?

This depends on what has changed between the releases. Different OAC version numbers don't necessarily mean changes to the OBIEE components themselves (e.g. it could just be an improvement to the 'My Services' UI). However, if there have been changes to the way the XML is formed in reports, for example, these won't be compatible with previous versions of the catalog. All this being said, environments can be upgraded at any time, so you should be able to take a snapshot of your environment, upgrade it to match the newer version, and then redeploy/refresh from your snapshot.

How do you connect securely to AWS?

There doesn't seem to be any documentation on how exactly Visual Analyzer connects to Amazon Redshift using the 'Create Connection' wizard. However, there is an option to create an SSL ODBC connection to the Redshift database that can then be used in the Visual Analyzer ODBC connection wizard.

Can you still edit instanceconfig and nqsconfig files?

Yes you can; you need to use your SSH keys to sign into the box (as the user 'opc'). The files are in the following locations:

/bi/domain/fmw/user_projects/domains/bi/config/fmwconfig/biconfig/OBIPS/instanceconfig.xml

/bi/domain/fmw/user_projects/domains/bi/config/fmwconfig/biconfig/OBIS/NQSConfig.INI

It's also worth mentioning that there is a guide here which explains where the responsibility lies should anything break during customisation of the platform.

Who is responsible for what regarding support?

Guide to Customer vs Oracle Management Responsibilities in Oracle Infrastructure and Platform Cloud Services (Doc ID 2309936.1)

Categories: BI & Warehousing

Oracle Analytics Cloud Workshop FAQ

Mon, 2018-07-23 07:59

A few weeks ago, I had the opportunity to present the Rittman Mead Oracle Analytics Cloud workshop in Oracle's head office in London. The aim of the workshop was to educate potential OAC customers and give them the tools and knowledge to decide whether or not OAC was the right solution for them. We had a great cross section of multiple industries (although telecoms were over represented!) and OBIEE familiarity. Together we came up with a series of questions that needed to be answered to help in the decision making process. In the coming workshops we will add more FAQ style posts to the blog to help flesh out the features of the product.

If you are interested in coming along to one of the workshops to get some hands on time with OAC, send an email to training@rittmanmead.com and we can give you the details.

Do Oracle provide a feature comparison list between OBIEE on premise and OAC?

Oracle do not provide a feature comparison between on-premise and OAC. However, Rittman Mead have done an initial comparison between OAC and traditional on-premise OBIEE 12c installations:

High Level
  • Enterprise Analytics is identical to 12c Analytics
  • Only two Actions available in OAC: Navigate to BI content, Navigate to Web page
  • BI Publisher is identical in 12c and OAC
  • Data Visualiser has additional features and a slightly different UI in OAC compared to 12c
BI Developer Client Tool for OAC
  • Looks exactly the same as the OBIEE client
  • Available only for Windows, straightforward installation
  • OAC IP address and BI Server port must be provided to create an ODBC data source
  • Allows to open and edit online the OAC model
  • Allows offline development. Snapshots interface used to upload it to OAC (it will completely replace existing model)
Data Modeler
  • Alternative tool to create and manage metadata models
  • Very easy to use, but limited compared to the BI Developer Client.
Catalog
  • It's possible to archive/unarchive catalog folders from on-premise to OAC.
BAR file
  • It's possible to create OAC bar files
  • It's possible to migrate OAC bar files to OBIEE 12c
Can you ever be charged by network usage, for example connection to an on premise data source using RDC?

Oracle will not charge you for network usage as things stand. Your charges come from the following:

  • Which version of OAC you have (Standard, Data Lake or Enterprise)
  • Whether you are using Pay-as-you-go or Monthly Commitments
  • The amount of disk space you have specified during provisioning
  • The combination of OCPU and RAM currently in use (size).
  • The up-time of your environment.

So for example an environment that has 1 OCPU with 7.5 GB RAM will cost less than an environment with 24 OCPUs with 180 GB RAM if they are up for the same amount of time, everything else being equal. This being said, there is an additional charge to the analytics license as a cloud database is required to configure and launch an analytics instance which should be taken into consideration when choosing Oracle Analytics Cloud.

Do you need to restart the OAC environment when you change the RAM and OCPU settings?

Configuring the number of OCPUs and associated RAM is done from the Analytics Service Console. This can be done during up time without a service restart, however the analytics service will be unavailable:

alt

PaaS Service Manager Command Line Interface (PSM Cli), which Francesco covered here, will allow this to be scripted and scheduled. An interesting use case for this would be to allow an increase in resources during month end processing where your concurrent users are at its highest, whilst in the quieter parts of the month you can scale back down.

This is done using the 'scale' command, this command takes a json file as a parameter which contains information about what the environment should look like. You will notice in the example below that the json file refers to an object called 'shape'; this is the combination of OCPU and RAM that you want the instance to scale to. Some examples of shapes are:

  • oc3 — 1 OCPU with 7.5 GB RAM
  • oc4 — 2 OCPUs with 15 GB RAM
  • oc5 — 4 OCPUs with 30 GB RAM
  • oc6 — 8 OCPUs with 60 GB RAM
  • oc7 — 16 OCPUs with 120 GB RAM
  • oc8 — 24 OCPUs with 180 GB RAM
  • oc9 — 32 OCPUs with 240 GB RAM

For example:

The following example scales the rittmanmead-analytics-prod service to the oc9 shape:

$ psm analytics scale -s rittmanmead-analytics-prod -c ~/oac-obiee/scale-to-monthend.json

where the JSON file contains the following:

{ "components" : { "BI" : { "shape" : "oc9", "hosts" : ["rittmanmead-prod-1"] } } }

Oracle supply documentation for the commands required here: https://docs.oracle.com/en/cloud/paas/java-cloud/pscli/analytics-scale2.html
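
To illustrate the scheduling idea, here is a minimal Python sketch that wraps the scale command. It assumes psm is already installed and configured (psm setup), and reuses the example service, host and shape names from above; it could be triggered from cron or a Jenkins job at month end, and again with a smaller shape when the busy period is over.

import json
import subprocess
import tempfile

def scale_oac(service_name, shape, host):
    # Build the scale payload and write it to a temporary JSON file
    payload = {"components": {"BI": {"shape": shape, "hosts": [host]}}}
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(payload, f)
        config_path = f.name
    # Call PSM Cli; the call returns immediately with a job reference
    subprocess.run(["psm", "analytics", "scale", "-s", service_name,
                    "-c", config_path], check=True)

# Month-end: scale up; afterwards, call again with a smaller shape
scale_oac("rittmanmead-analytics-prod", "oc9", "rittmanmead-prod-1")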

How is high availability provisioned in Oracle Analytics Cloud?

Building a highly available infrastructure in the cloud needs to take three main areas into consideration:

Server Failure: Oracle Analytics Cloud can be clustered; additional nodes (up to 10) can be added dynamically in the Cloud 'My Services' console should they be needed.

It is also possible to provision a load balancer in front of the cluster.

Zone Failure: Sometimes it is more than just a single server that causes the failure. Cloud architecture is built on server farms, which themselves can be subject to network issues, power failures and weather anomalies. Oracle Analytics Cloud allows you to create an instance in a region, much like Amazon's availability zones. A sensible precaution would be to create a disaster recovery environment in a different region to your main production environment; to help reduce costs this can be provisioned on the Pay-as-you-go license model and therefore only be chargeable when it's being used.

Cloud Failure: Although rare, sometimes the cloud platform itself can fail; for example, both of the data centres you have chosen to counter the previous point could fall victim to a weather anomaly. Oracle Analytics Cloud allows you to take regular backups of your reports, dashboards and metadata, which can be downloaded, stored off-cloud and re-implemented in another 12c environment.

In addition to these points, it's advisable to automate and test everything. Oracle supply a very handy set of scripts and APIs called PaaS Service Manager Command Line Interface (PSM Cli) which can be used to achieve this. For example, it can be used to automate backups, set up monitoring and alerting and, finally and arguably most importantly, to test your DR and HA infrastructure.

Can you push the user credentials down to the database?

At this point in time there is no way to configure database authentication providers in a similar way to the WebLogic providers of the past. However, Oracle IDCS does have a REST API that could be used to simulate this functionality; documentation can be found here: https://docs.oracle.com/en/cloud/paas/identity-cloud/rest-api/OATOAuthClientWebApp.html

You can, however, store user group memberships in a database and allow your service's authentication provider to access this information when authenticating a user's identity. You can use the script configure_bi_sql_group_provider to set up the provider and create the tables that you need (GROUPS and GROUPMEMBERS). After you run the script, you must populate the tables with your group and group member (user) information.

Group memberships that you derive from the SQL provider don't show up in the Users and Roles page in the Oracle Analytics Cloud Console as you might expect, but the member assignments work correctly.

These tables reside in the Oracle Database Cloud Service you configured for Oracle Analytics Cloud, in the schema created for your service. Unlike the on-premises equivalent functionality, you can't change the location of these tables or the SQL that retrieves the results.
The script to achieve this is stored on the analytics server itself and can be accessed using SSH (with the user 'opc') and the private keys that you created during the instance provisioning process. It is located at: /bi/app/public/bin/configure_bi_sql_group_provider
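
As an illustration only, populating the tables might look like the sketch below. The column names (G_NAME, G_MEMBER) are assumptions made for the purpose of the example, so check the actual definitions that the script created in your schema before using anything like this.

import cx_Oracle  # assumes the Oracle client libraries are available

# Hypothetical column names -- verify against the tables created by
# configure_bi_sql_group_provider in your service schema
conn = cx_Oracle.connect("schema_user", "schema_password", "dbhost:1521/ORCLPDB1")
cur = conn.cursor()
cur.execute("INSERT INTO GROUPS (G_NAME) VALUES (:1)", ["FINANCE_ANALYSTS"])
cur.execute("INSERT INTO GROUPMEMBERS (G_NAME, G_MEMBER) VALUES (:1, :2)",
            ["FINANCE_ANALYSTS", "jsmith"])
conn.commit()
conn.close()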

Can you implement SSL certificates in Oracle Analytics Cloud?

The short answer is yes.

When Oracle Analytics Cloud instances are created, similarly to on-premise OBIEE instances, a self-signed certificate is generated. The self-signed certificate is intended to be temporary and you must replace it with a new private key and a certificate signed by a certification authority. Doc ID 2334800.1 on support.oracle.com has the full details on how to implement this, but the high-level steps (taken from the document itself) are:

  • Associate a custom domain name with the public IP of your OAC instance
  • Get the custom SSL certificate from a Certificate Authority
  • Specify the DNS-registered host name that you want to secure with SSL in servername.conf
  • Install the intermediate certificate
  • Run the script to register the new private key and server certificate
Can you implement Single Sign On (SSO) in Oracle Analytics Cloud?

Oracle Identity Cloud Service (IDCS) allows administrators to create security providers for OAC, much like the WebLogic providers in on-premise OBIEE. These can be created/edited to include single sign-on URLs, certificates, etc.

Oracle Support Doc ID 2399789.1 covers this in detail for Microsoft Azure AD and OAC, and is well worth a read.

Are RPD files (BAR files) backwards compatible?

This depends on what has changed between the releases. A new OAC version number doesn't necessarily mean changes to the OBIEE components themselves (it could, for example, just be an improvement to the 'My Services' UI). However, if there have been changes to the way the XML is formed in reports, for example, these won't be compatible with previous versions of the catalog. That being said, environments can seemingly be upgraded at any time, so you should be able to take a snapshot of your environment, upgrade it to match the newer version and then redeploy/refresh from your snapshot.

How do you connect securely to AWS?

There doesn't seem to be any documentation on how exactly Visual Analyzer connects to Amazon Redshift using the 'Create Connection' wizard. However, there is an option to create an SSL ODBC connection to the Redshift database that can then be used to connect via the Visual Analyzer ODBC connection wizard.

Can you still edit instanceconfig and nqsconfig files?

Yes you can; you need to use your SSH keys to sign in to the box (using the user 'opc'). The files are in the following locations:

/bi/domain/fmw/user_projects/domains/bi/config/fmwconfig/biconfig/OBIPS/instanceconfig.xml

/bi/domain/fmw/user_projects/domains/bi/config/fmwconfig/biconfig/OBIS/NQSConfig.INI

It's also worth mentioning that there is a guide here which explains where the responsibility lies should anything break during customisation of the platform.

Who is responsible for what regarding support?

Guide to Customer vs Oracle Management Responsibilities in Oracle Infrastructure and Platform Cloud Services (Doc ID 2309936.1)

Categories: BI & Warehousing

DevOps in OAC: Scripting Oracle Cloud Instance Management with PSM Cli

Mon, 2018-06-25 10:18

This summer we unselfish Italians decided not to participate in the World Cup to give another country the opportunity to win (good luck with that, England!). This decision, which I strongly support, gives me a lot of time for blogging!

As already written, two weeks ago while in Orlando for Kscope18, I presented a session about DevOps and OBIEE focusing on how to properly source control, promote and test for regression any component of the infrastructure.

Development Isolation

One key aspect of DevOps is providing Development Isolation: a way of allowing multiple development streams to work independently and of merging the outcome of the process into the main environment only after it has been tested and validated. This is needed to avoid the standard situation where code promotions are blocked because different work streams are out of sync: forcing a team to postpone a code release just because another team doesn't have UAT sign-off is just one example of the cost of non-isolated development platforms.

We have discussed the development isolation topic in the past, focusing mainly on concurrent repository development and how to integrate it with versioning tools like Git and SVN. The concurrent online editing option is not viable, since multiple developers would be modifying the same artifact (RPD) without a way of regression-testing the changes or verifying that what has been done is correct before merging the changes into the RPD.
Alternative solutions such as MUDE (the default multi-user development method provided by the Admin Tool) or pure offline RPD work encounter the same problems defined above: no feature or regression testing available before merging the RPD into the main development environment.

Different RPD development techniques solve the problem only partially: since almost any OAC/OBIEE development consists of both RPD and catalog work (creation of analyses/dashboards/VA projects), we need an approach which provides Development Isolation at both levels. The solution, in order to properly build a DevOps framework around OAC/OBIEE, is to provide isolated feature-related full OBIEE instances where the RPD can be edited in online mode, the catalog work can be done independently, and the overall result can be tested and validated before being merged into the common development environment.

Feature-Related Instances

The feature instances, as described above, need to be full OAC/OBIEE development instances where only one feature (or a small set of them) is worked on at a time, in order to give developers the agility to release the code as soon as it's ready and tested. In the on-premises world this can "easily" be achieved by providing a number of dedicated Virtual Machines or, more in line with recent trends, automated instance provisioning with Docker using a template image like the one built by our previous colleague Gianni Ceresa.

However, when we think about Oracle Analytics Cloud (OAC), we seem to have two problems:

  • There is a cost associated with every instance, thus minimizing the number of instances and the uptime is necessary
  • The OAC provisioning interface is point and click, thus automating the instance management seems impossible

The overall OAC instance cost can be mitigated by the Bring Your Own License (BYOL) licensing method, which allows customers to migrate on-premises licenses to the cloud and get discounted prices on the hourly/monthly instance cost (more details here). However, since the target is to minimize the cost, and thus the number of instances and their uptime, we need a way to do so that doesn't rely on a human and a point-and-click interface. Luckily the PaaS Service Manager Command Line Interface (PSM Cli) solves this problem by providing a scriptable way of creating, starting and stopping instances.

PaaS Service Manager Command Line Interface

PSM Cli is a command line interface acting as a wrapper over the PaaS REST APIs. Its usage is not limited to OAC; the same interface can be used to create and manage instances of Oracle's Database Cloud Service or Java Cloud Service, amongst others.
When talking about OAC, please keep in mind that, as of now, PSM Cli works only with the non-autonomous version, but I believe Autonomous support will be added soon.

Installing and Configuring PSM Cli

PSMCLI has two prerequisites before it can be installed:

  • cURL - a command line utility to transfer data with URLs
  • Python 3.3 or later

Once both prerequisites are installed PSM can easily be downloaded with the following cURL call

curl -X GET -u <USER>:<PWD> -H X-ID-TENANT-NAME:<IDENTITY_DOMAIN> https://<REST_SERVER>/paas/core/api/v1.1/cli/<IDENTITY_DOMAIN>/client -o psmcli.zip

Where

  • <USER> and <PWD> are the credentials
  • <IDENTITY_DOMAIN> is the Identity Domain ID specified during the account creation
  • <REST_SERVER> is the REST API server name, which is:
    • psm.us.oraclecloud.com if you are using a US datacenter
    • psm.aucom.oraclecloud.com if you are in the AuCom region
    • psm.europe.oraclecloud.com otherwise

The next step is to install PSM as a Python package with

pip3 install -U psmcli.zip

After the installation it's time for configuration

psm setup

The configuration command will request the following information:

  • Oracle Cloud Username and Password
  • Identity Domain
  • Region, which needs to be set to:
    • emea if the REST_SERVER mentioned above contains emea
    • aucom if the REST_SERVER mentioned above contains aucom
    • us otherwise
  • Output format: the choice is between short, json and html
  • OAuth: the communication between the CLI and the REST API can use basic authentication (flag n) or OAuth (flag y). If OAuth is chosen then ClientID, Secret and Access Token need to be specified

A JSON profile file can also be used to provide the same information mentioned above. The structure of the file is the following

{ 
    "username":"<USER>",
    "password":"<PASSWORD>",
    "identityDomain":"<IDENTITY_DOMAIN>",
    "region":"<REGION>",
    "outputFormat":"<OUTPUT_FORMAT>",
    "oAuth":{ 
        "clientId":"",
        "clientSecret":"",
        "accessTokenServer":""
    }
}

If the profile is stored in a file profile.json the PSM configuration can be achieved by just executing

psm setup -c profile.json

One quick note: the identity domain Id shown in the Oracle Cloud header won't work if it's not the original name (the name at the time of creation).

In order to get the correct identity domain Id to use, open an Oracle Cloud instance already created (e.g. a database one) and check its Details: you'll see the original identity domain listed there (credits to Pieter Van Puymbroeck).

Working With PSM Cli

Once PSM has been correctly configured it's time to start checking what options are available; for a detailed list of the options check the PSM documentation.

The PSM commands are product related, so each command is in the form:

psm <product> <command> <parameters>

Where

  • product is the Oracle cloud product e.g. dbcs, analytics, BigDataAppliance, for a complete list use psm help
  • command is the action to be executed against the product e.g. services, stop, start, create-service
  • parameters is the list of parameters to pass depending on the command executed

The first step is to check what instances I have already created; I can do so for the database by executing

psm dbcs services

which, as expected, will list all my active instances.

I can then start and stop it using:

psm dbcs start/stop/restart -s <INSTANCE_NAME>

Which in my example provides the Id of the Job assigned to the stop operation.

When I check the status via the service command I get Maintenance like in the web UI.

The same applies to the start and restart operations. Please keep in mind that all the calls are asynchronous: the command calls the related REST API and then returns the associated Job ID without waiting for the operation to finish. The status of a job can be checked with:

psm dbcs operation-status -j <JOB_ID>

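Since the calls are asynchronous, any script built around PSM Cli needs to poll the job until it completes. Below is a minimal sketch of that pattern; note that the exact field names in the JSON output (the job id path and the status values) are assumptions here, so inspect the output of your own psm version before relying on them.

import json
import subprocess
import time

def psm_json(*args):
    # Run a PSM Cli command with JSON output and parse the result
    out = subprocess.run(["psm", *args, "-of", "json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)

# Stop the instance; the call returns immediately with a job reference
job = psm_json("analytics", "stop", "-s", "rittmanmead-analytics-dev")
job_id = str(job["details"]["jobId"])  # field name assumed -- check your output

# Poll the job until it reaches a terminal state
while True:
    status = psm_json("analytics", "operation-status", "-j", job_id)
    if status.get("operationStatus") in ("SUCCEED", "FAILED"):
        break
    time.sleep(30)
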
The same operations described above are available on OAC with the same commands by simply changing the product from dbcs to analytics like:

psm analytics start/stop/restart -s <INSTANCE_NAME>

On top of the basic operations, PSM Cli also allows the following:

  • Service Instance: start/stop/restart and instance creation/deletion
  • Access Control: lists, creates, deletes, enables and disables access rules for a service
  • Scaling: changes the compute shape of an instance, allowing scaling up/down
  • Storage: extends the storage associated with OAC
  • Backup Configuration: updates/shows the backup configurations
  • Backups: lists, creates and deletes backups of the instance
  • Restore: restores a backup, giving detailed information about it and the history of restores
  • Patches: allows patching, rolling back, doing pre-checks, and retrieving patching history
Creating an OAC Instance

So far we have discussed maintenance on already created instances with the start/stop/restart commands, but PSM Cli also allows the creation of an instance via the command line. The call is pretty simple:

psm analytics create-service -c <CONFIG_FILE> -of <OUTPUT_FORMAT>

Where

  • CONFIG_FILE: is the file defining all OAC instance configurations
  • OUTPUT_FORMAT: is the desired output format between short, json and html

The question now is:

How do I create a Config File for OAC?

The documentation doesn't provide any help on this, but we can use the same approach as with the on-premises OBIEE response file: create the first instance with the Web-UI, save the payload for future use and change the parameters when necessary.

On the Confirm screen, there is the option to Download the REST payload in JSON format

The resulting JSON Config File is:

{
  "edition": "<EDITION>",
  "vmPublicKeyText": "<SSH_TOKEN>",
  "enableNotification": "true",
  "notificationEmail": "<EMAIL>",
  "serviceVersion": "<VERSION>",
  "isBYOL": "false",
  "components": {
    "BI": {
      "adminUserPassword": "<ADMINPWD>",
      "adminUserName": "<ADMINUSER>",
      "analyticsStoragePassword": "<PWD>",
      "shape": "oc3",
      "createAnalyticsStorageContainer": "true",
      "profile_essbase": "false",
      "dbcsPassword": "<DBCSPWD>",
      "totalAnalyticsStorage": "280.0",
      "profile_bi": "true",
      "profile_dv_forced": "true",
      "analyticsStorageUser": "<EMAIL>",
      "dbcsUserName": "<DBUSER>",
      "dbcsPDBName": "<PDBNAME>",
      "dbcsName": "<DBCSNAME>",
      "idcs_enabled": "false",
      "analyticsStorageContainerURL": "<STORAGEURL>",
      "publicStorageEnabled": "false",
      "usableAnalyticsStorage": "180"
    }
  },
  "serviceLevel": "PAAS",
  "meteringFrequency": "HOURLY",
  "subscriptionId": "<SUBSCRIPTIONID>",
  "serviceName": "<SERVICENAME>"
}

This file can be stored and the parameters changed as necessary to create new OAC instances with the command:

psm analytics create-service -c <JSON_PAYLOAD_FILE> -of short/json/html

As shown previously, the result of the call is a Job Id that can be monitored with

psm analytics operation-status -j <JOB_ID>

Once the Job has finished successfully, the OAC instance is ready to be used. If at a certain point the OAC instance is no longer needed, it can be deleted via the following command (an end-to-end scripted sketch follows the parameter list below):

psm analytics delete-service -s <SERVICE_NAME> -n <DBA_NAME> -p <DBA_PWD>

Where

  • SERVICE_NAME is the OAC instance name
  • DBA_NAME and DBA_PWD are the DBA credentials where OAC schemas are residing
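
Putting the pieces together, a sketch of the feature-instance lifecycle might look like the following. The naming convention and file paths are illustrative only, and the payload template is the JSON file saved from the Web-UI as described above.

import json
import subprocess

def create_feature_instance(template_path, feature_name):
    # Load the saved REST payload and give the instance a per-feature name
    with open(template_path) as f:
        payload = json.load(f)
    payload["serviceName"] = "oacfeature" + feature_name  # illustrative convention
    config_path = "/tmp/oac-" + feature_name + ".json"
    with open(config_path, "w") as f:
        json.dump(payload, f)
    subprocess.run(["psm", "analytics", "create-service",
                    "-c", config_path, "-of", "json"], check=True)

def drop_feature_instance(feature_name, dba_name, dba_pwd):
    # Delete the instance once the feature has been merged and tested
    subprocess.run(["psm", "analytics", "delete-service",
                    "-s", "oacfeature" + feature_name,
                    "-n", dba_name, "-p", dba_pwd], check=True)
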
Summary

Worried about providing development isolation in OAC while keeping costs down? Not anymore! With PSM Cli you now have a way of creating instances on demand and starting, stopping and scaling them with a command line tool that is easily integrated with automation tools like Jenkins.

Create OAC instances automatically only when features need to be developed or tested, stop and start the instances based on your workforce timetables, and take full benefit of the cloud, minimizing the associated cost, by using PSM Cli!

One last note: for a full DevOps OAC implementation, PSM Cli is not sufficient. Tasks like automated regression testing, code versioning and promotion can't be managed directly with PSM Cli but require external toolsets like the Rittman Mead BI Developer Toolkit. If you are interested in a full DevOps implementation on OAC and in understanding how PSM Cli can be used in conjunction with the Rittman Mead BI Developer Toolkit, don't hesitate to contact us!

Categories: BI & Warehousing

Kscope18: It's a Wrap!

Thu, 2018-06-21 08:23

As announced a few weeks back, I represented Rittman Mead at ODTUG's Kscope18, hosted in the magnificent Walt Disney World Dolphin Resort. It's always hard to be credible when telling people you are going to Disney World for work, but Kscope is a must-go event if you are in the Oracle landscape.

In the Sunday Symposium, Oracle PMs share hints about the products' latest capabilities and roadmaps; then there are three full days of presentations spanning from the traditional Database, EPM and BI tracks to new entries like Blockchain. On top of this there is the opportunity to be introduced to a network of Oracle experts, including Oracle ACEs and Directors, PMs and people willing to share their experience with Oracle (and other) tools.

Sunday Symposium and Presentations

I attended the Oracle Analytics (BI and Essbase) Sunday Symposium run by Gabby Rubin and Matt Milella from Oracle. It was interesting to see the OAC product enhancements and roadmap as well as the feature catch-up in the latest release of OBIEE on-premises (version 12.2.1.4.0).

As expected, most of the push is towards OAC (Oracle Analytics Cloud): all new features will be developed there and eventually (but with no assurance on this) ported to the on-premises version. This makes a lot of sense from Oracle's point of view since it gives them the ability to produce new features quickly: they need to be tested only against a single set of HW/SW rather than the multitude they support on-premises.

Most of the enhancements are expected in the Mode 2/Self-Service BI area covered by Oracle Analytics Cloud Standard since a) this is the overall trend of the BI industry and b) the features requested by traditional dashboard-style reporting are well covered by OBIEE.
The following are just a few of the items you could expect in future versions:

  • Recommendations during the data preparation phase like GeoLocation and Date enrichments
  • Data Flow enhancements like incremental updates or parametrized data-flows
  • New Visualizations and in general more control over the settings of the single charts.

In general Oracle's idea is to provide a single tool that meets both the needs of Mode 1 and Mode 2 Analytics (Self Service vs Centralized) rather than focusing on solving one need at a time like other vendors do.

A special mention goes to Oracle Autonomous Analytics Cloud, released a few weeks ago, which differs from traditional OAC in that backups, patching and service monitoring are managed automatically by Oracle, thus relieving the customer of those tasks.

During the main conference days (Monday to Wednesday) I attended a lot of very insightful presentations as well as the Oracle ACE Briefing, which gave me ideas for future blog posts, so stay tuned! As written previously, I had two sessions accepted for Kscope18: "Visualizing Streams" and "DevOps and OBIEE: Do it Before it's too late"; in the following paragraphs I'll share details (and links to the slides) of both.

Visualizing Streams

One of the latest trends in the data and analytics space is the transition from old-style batch-based reporting systems, which by design added a delay between an event's creation and its appearance in reports, to the concept of streaming: ingesting and delivering event information and analytics as soon as the event is created.

The session explains how the analytics space has changed in recent times, providing details on how to set up a modern analytical platform which includes streaming technologies like Apache Kafka, SQL-based enrichment tools like Confluent's KSQL, and connections to self-service BI tools like Oracle's Data Visualization via SQL-on-Hadoop technologies like Apache Drill. The slides of the session are available here.

DevOps and OBIEE: Do it Before it's Too Late

In the second session, slides here, I initially went through the motivations for applying DevOps principles to OBIEE: the self-service BI wave started as a response to the long time-to-delivery associated with old-school centralized reporting projects. Huge monolithic sets of requirements to be delivered, no easy way to provide development isolation, manual testing and manual code promotion were only a few of the blockers for fast delivery.

After an initial analysis of the default OBIEE development methods, the presentation explains how to apply DevOps principles to an OBIEE (or OAC) environment, specifically:

  • Code versioning techniques
  • Feature-driven environment creation
  • Automated promotion
  • Automated regression testing

It also provides details on how the Rittman Mead BI Developer Toolkit, partially described here, can act as an accelerator for the adoption of these practices in any custom OBIEE implementation and delivery process.

As mentioned before, the overall Kscope experience is great: plenty of technical presentations, roadmap information, networking opportunities and also much fun! Looking forward to Kscope19 in Seattle!

Categories: BI & Warehousing

ChitChat for OBIEE - Now Available as Open Source!

Fri, 2018-06-15 03:20

ChitChat is the Rittman Mead commentary tool for OBIEE. ChitChat enhances the BI experience by bridging conversational capabilities into the BI dashboard, increasing ease-of-use and seamlessly joining current workflows. From tracking the history behind analytical results to commenting on specific reports, ChitChat provides a multi-tiered platform built into the BI dashboard that creates a more collaborative and dynamic environment for discussion.

Today we're pleased to announce the release into open-source of ChitChat! You can find the github repository here: https://github.com/RittmanMead/ChitChat

Highlights of the features that ChitChat provides includes:

  • Annotate - ChitChat's multi-tiered annotation capabilities allow BI users to leave comments where they belong, at the source of the conversation inside the BI ecosystem.

  • Document - ChitChat introduces the ability to include documentation inside your BI environment for when you need more than a comment. Keeping key materials inside the dashboard gives the right people access to key information without searching.

  • Share - ChitChat allows you to bring attention to important information on the dashboard using the channel or workflow manager you prefer.

  • Verified Compatibility - ChitChat has been tested against popular browsers, operating systems, and database platforms for maximum compatibility.

Getting Started

In order to use ChitChat you will need OBIEE 11.1.1.7.x, 11.1.1.9.x or 12.2.1.x.

First, download the application and unzip it to a convenient access location in the OBIEE server, such as a home directory or the desktop.

See the Installation Guide for full detail on how to install ChitChat.

Database Setup

Build the required database tables using the installer:

cd /home/federico/ChitChatInstaller
java -jar SocializeInstaller.jar -Method:BuildDatabase -DatabasePath:/app/oracle/oradata/ORCLDB/ORCLPDB1/ -JDBC:"jdbc:oracle:thin:@192.168.0.2:1521/ORCLPDB1" -DatabaseUser:"sys as sysdba" -DatabasePassword:password -NewDBUserPassword:password1

The installer will create a new user (RMREP) and the tables required for the application to operate correctly. The -DatabasePath flag tells the installer where to place the datafiles for ChitChat on your database server. -JDBC indicates which JDBC driver to use, followed by a colon and the JDBC string to connect to your database. -DatabaseUser specifies the user to access the database with. -DatabasePassword specifies the password for the user previously given. -NewDBUserPassword indicates the password for the new user (RMREP) being created.

WebLogic Data Source Setup

Add a Data Source object to WebLogic using WLST:

cd /home/federico/ChitChatInstaller/jndiInstaller
$ORACLE_HOME/oracle_common/common/bin/wlst.sh ./create-ds.py

To use this script, modify the ds.properties file using the method of your choice. The following parameters must be updated to reflect your installation: domain.name, admin.url, admin.userName, admin.password, datasource.target, datasource.url and datasource.password.

Deploying the Application on WebLogic

Deploy the application to WebLogic using WLST:

cd /home/federico/ChitChatInstaller
$ORACLE_HOME/oracle_common/common/bin/wlst.sh ./deploySocialize.py

To use this script, modify the deploySocialize.py file using the method of your choice. The first line must be updated with the username, password and URL to connect to your WebLogic Server instance. The second parameter in the deploy command must be updated to reflect your ChitChat access location.

Configuring the Application

ChitChat requires several configuration parameters to allow the application to operate successfully. To change the configuration, you must log in to the database schema as the RMREP user and update the values manually in the APPLICATION_CONSTANT table.

See the Installation Guide for full detail on the available configuration and integration options.

Enabling the Application

To use ChitChat, you must add a small block of code on any given dashboard (in a new column on the right-side of the dashboard) where you want to have the application enabled:

<rm id="socializePageParams"
user="@{biServer.variables['NQ_SESSION.USER']}"
tab="@{dashboard.currentPage.name}"
page="@{dashboard.name}">
</rm>
<script src="/Socialize/js/dashboard.js"></script>

Congratulations! You have successfully installed the Rittman Mead commentary tool. To use the application to its fullest capabilities, please refer to the User Guide.

Problems?

Please raise any issues on the github issue tracker. This is open source, so bear in mind that it's no-one's "job" to maintain the code - it's open to the community to use, benefit from, and maintain.

If you'd like specific help with an implementation, Rittman Mead would be delighted to assist - please do get in touch with Jon Mead or DM us on Twitter @rittmanmead to get access to our Slack channel for support about ChitChat.

Please contact us on the same channels to request a demo.

Categories: BI & Warehousing

Real-time Sailing Yacht Performance - Kafka (Part 2)

Thu, 2018-06-14 08:18

In the last two blogs Getting Started (Part 1) and Stepping back a bit (Part 1.1) I looked at what data I could source from the boat's instrumentation and introduced some new hardware to the boat to support the analysis.

Just to recap, I am looking to create the yacht's polars with a view to improving our knowledge of her abilities (whether we can use this to improve our race performance is another matter).

Polars give us a plot of the boat's speed for a given true wind speed and angle. This, in turn, tells us the optimal speed the boat could achieve at any particular wind angle and wind speed.
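
As a concrete (and entirely illustrative) example, a single curve of a polar is just boat speed plotted against true wind angle for one true wind speed; a quick matplotlib sketch with made-up sample points might look like this:

import numpy as np
import matplotlib.pyplot as plt

# Made-up sample points: (true wind angle in degrees, boat speed in knots)
# for a single true wind speed, e.g. 12 knots
angles_deg = [40, 60, 80, 100, 120, 150]
boat_speed_kn = [5.2, 6.4, 6.9, 7.1, 6.8, 5.9]

ax = plt.subplot(projection="polar")
ax.set_theta_zero_location("N")   # 0 degrees = head to wind
ax.set_theta_direction(-1)        # angles increase clockwise
ax.plot(np.radians(angles_deg), boat_speed_kn, marker="o")
ax.set_title("Polar at 12kn true wind (sample data)")
plt.show()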

In the first blog I wrote a reader in Python that takes messages from a TCP/IP feed and writes the data to a file. The reader is able, using a hash key, to validate each message (see Getting Started (Part 1)). I'm also converting valid messages into a JSON format so that I can push meaningful structured data downstream. In this blog I'll cover the architecture and considerations around the setup of Kafka for this use case. I will not cover the installation of each component; a lot has been written in this area (and we have some internal IP to help with configuration). I'll discuss the process I went through to get the data displayed in real time in a Grafana dashboard.

Introducing Kafka

I have introduced Kafka into the architecture as a next step.

Why Kafka?

I would like to be able to stream this data in real time and don't want to build my own batch mechanism or create a publish/subscribe model. With Kafka I don't need to check that messages have been successfully received, and if there is a failure while consuming messages the consumers will keep track of what has been consumed. If a consumer fails it can be restarted and it will pick up where it left off (consumer offsets are stored in Kafka as a topic). In the future I could scale out the platform and introduce some resilience through clustering and replication (this shouldn't be required for a while). Kafka is therefore saving me a lot of manual engineering and will support future growth (should I come into money and be able to afford more sensors for the boat).

High level architecture

Let's look at the high-level components and how they fit together. Firstly, the instruments transmit on wireless TCP/IP, and these messages are read using the Python reader I wrote earlier in the year.

I have enhanced that Python code so that, instead of writing to a file, it streams the JSON messages to a topic in Kafka.

Once the messages are in Kafka I use Kafka Connect to stream the data into InfluxDB. The messages are written to topic-specific measurements (tables in InfluxDB).

Grafana is used to display incoming messages in real time.

Kafka components

I am running the application on a MacBook Pro: basically a single-node instance with ZooKeeper, a Kafka broker and a Kafka Connect worker. This is the minimum setup, with very little resilience.

In summary

ZooKeeper is an open-source server that enables distributed coordination of configuration information. In the Kafka architecture, ZooKeeper stores metadata about brokers, topics, partitions and their locations. ZooKeeper is configured in zookeeper.properties.

A Kafka broker is a single Kafka server.

"The broker receives messages from producers, assigns offsets to them, and commits the messages to storage on disk. It also services consumers, responding to fetch requests for partitions and responding with the messages that have been committed to disk." [1]

The broker is configured in server.properties. In this setup I have set auto.create.topics.enable=false. Setting this to false gives me control over the environment: as the name suggests, it disables the automatic creation of topics, which could otherwise lead to confusion.

A Kafka Connect worker allows us to take advantage of predefined connectors that write messages from Kafka to known external datastores. The worker is a wrapper around a Kafka consumer; a consumer reads messages from a topic partition using offsets, which keep track of what has been read by a particular consumer or consumer group. (Kafka Connect workers can also write to Kafka from datastores, but I am not using that functionality in this instance.) The Connect worker is configured in connect-distributed.properties, where I have defined the location of the plugins. Connector definitions determine how to write to an external data source.
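
As a sketch of what that worker configuration covers, the key entries in connect-distributed.properties look something like this (the values are illustrative assumptions for a single-node setup, not the exact file I use):

bootstrap.servers=localhost:9092
# workers sharing a group.id form a single Connect cluster
group.id=connect-cluster
# plain JSON payloads with no embedded schema
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
# location of the Stream reactor connector jars
plugin.path=/usr/local/share/kafka/plugins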

Producer to InfluxDB

I use kafka-python to stream the messages into Kafka. Within kafka-python there is a KafkaProducer that is intended to work in a similar way to the official Java client.

I have created a producer for each message type (parameterised code). Although each producer reads the entire stream from the TCP/IP port, it only processes its assigned message type (wind or speed), thus increasing parallelism and therefore throughput.

  from kafka import KafkaProducer
  import json

  # value_serializer JSON- and UTF-8-encodes each message before it is sent
  producer = KafkaProducer(bootstrap_servers='localhost:9092',
                           value_serializer=lambda v: json.dumps(v).encode('utf-8'))
  producer.send(topic, json_str)

I have created a topic per message type, each with a single partition. Using a single partition per topic guarantees I will consume messages in the order they arrive. There are ways to increase the number of partitions and still maintain read order, but for this use case a topic per message type seemed to make sense. I have basically optimised throughput (well enough for the number of messages I am trying to process).

kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic wind-json

kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic speed-json

kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic gps-json 
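
Once created, the same CLI can be used to sanity-check a topic definition (the output format varies slightly between Kafka versions):

kafka-topics --describe --zookeeper localhost:2181 --topic wind-json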

When defining a topic you specify the replication-factor and the number of partitions.

The topic-level configuration is replication.factor. At the broker level, you control the default.replication.factor for automatically created topics. [1] (I have turned off the automatic creation of topics.)

The messages are consumed using Stream Reactor, which has an InfluxDB sink mechanism and writes directly to the measurements within a performance database I have created. The following parameters, showing the topics and the insert mechanism (KCQL), are configured in performance.influxdb-sink.properties.

topics=wind-json,speed-json,gps-json

connect.influx.kcql=INSERT INTO wind SELECT * FROM wind-json WITHTIMESTAMP sys_time();INSERT INTO speed SELECT * FROM speed-json WITHTIMESTAMP sys_time();INSERT INTO gps SELECT * FROM gps-json WITHTIMESTAMP sys_time()

The following diagram shows the detail from producer to InfluxDB.

If we now run the producers we get data streaming through the platform.

Producer Python log showing JSON-formatted messages:

The status of the consumers shows minor lag reading from two topics; the describe also shows the current offsets for each consumer task and the partitions being consumed (if we had a cluster it would show multiple hosts):
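
This check is a one-liner with the consumer-groups CLI; the group name below is an assumption, based on Kafka Connect registering sink consumer groups as connect-<connector name>:

kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group connect-influxdb-sink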

Inspecting the InfluxDB measurements:
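
For example, via the InfluxDB CLI, assuming the performance database described above and the measurement names from the KCQL INSERT targets:

influx -database 'performance'
> SHOW MEASUREMENTS
> SELECT * FROM wind ORDER BY time DESC LIMIT 5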

When inserting into a measurement in InfluxDB, if the measurement does not exist it is created automatically, and the datatypes of the fields are determined from the JSON object being inserted. I needed to adjust the creation of the JSON message to cast the values to floats, otherwise I ended up with the wrong types, which caused reporting issues in Grafana. This would be a good case for using Avro and the Schema Registry to handle these definitions.
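
A minimal sketch of that cast, with hypothetical field names standing in for the real instrument fields:

  # hypothetical raw message parsed from the instrument feed
  raw = {'true_wind_speed': '12.4', 'true_wind_angle': '41.0'}

  # cast values to float so InfluxDB infers float fields on first
  # insert, rather than strings
  msg = {k: float(v) for k, v in raw.items()}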

The following gif shows Grafana displaying some of the wind and speed measurements using a D3 Gauge plugin with the producers running to the right of the dials.

Next Steps

I'm now ready to do some real-life testing on our next sailing passage.

In the next blog, I will look at making the setup more resilient to failure and at how to monitor and automatically recover from some of these failures. I will also introduce the WorldMap panel to Grafana so I can plot the locations where the readings were taken and overlay tidal data.

References
  1. Kafka: The Definitive Guide, O'Reilly.

Categories: BI & Warehousing

OAC - Thoughts on Moving to the Cloud

Tue, 2018-06-12 12:43

Last week, I spent a couple of days with Oracle at Thames Valley Park and this presented me with a perfect opportunity to sit down and get to grips with the full extent of the Oracle Analytics Cloud (OAC) suite...without having to worry about client requirements or project deadlines!

As a company, Rittman Mead already has solid experience of OAC, but my personal exposure has been limited to presentations, product demonstrations, reading the various postings in the blog community and my existing experience of the Data Visualisation and BI cloud services (DVCS and BICS respectively). You’ll find Francesco’s post a good starting place if you need an overview of OAC and how it differs from (or aligns with) Data Visualisation and BI Cloud Services.

So, having spent some time looking at the overall suite and, more importantly, trying to interpret what it could mean for organisations thinking about making a move to the cloud, here are my top three takeaways:

Clouds Come In Different Shapes and Flavours

Two of the main benefits that a move to the cloud offers are simplification in platform provisioning and an increase in flexibility, being able to ramp up or scale down resources at will. Both come with a potential cost benefit, depending on your given scenario and requirements. The first step is understanding the different options in the OAC licensing and feature matrix.

First, we need to draw a distinction between Analytics Cloud and the Autonomous Analytics Cloud (interestingly, both options point to the same page on cloud.oracle.com, which makes things immediately confusing!). In a nutshell though, the distinction comes down to who takes responsibility for the service management: Autonomous Analytics Cloud is managed by Oracle, whilst Analytics Cloud is managed by yourself. It’s interesting to note that the Autonomous offering is marginally cheaper.

Next, Oracle have chosen to extend their BYOL (Bring Your Own License) option from their IaaS services to now incorporate PaaS services. This means that if you have existing licenses for the on-premise software, then you are able to take advantage of what appears to be a significantly discounted cost. Clearly, this is targeted to incentivise existing Oracle customers to make the leap into the Cloud, and should be considered against your ongoing annual support fees.

Since the start of the year, Analytics Cloud now comes in three different versions, with the Standard and Enterprise editions now being separated by the new Data Lake edition. The important things to note are that (possibly confusingly) Essbase is now incorporated into the Data Lake edition of the Autonomous Analytics Cloud and that for the full enterprise capability you have with OBIEE, you will need the Enterprise edition. Each version inherits the functionality of its preceding version: Enterprise edition gives you everything in the Data Lake edition; Data Lake edition incorporates everything in the Standard edition.


Finally, it’s worth noting that OAC aligns to the Universal Credit consumption model, whereby the cost is determined based on the size and shape of the cloud that you need. Services can be purchased as Pay as You Go or Monthly Flex options (with differential costing to match). The PAYG model is based on hourly consumption and is paid for in arrears, making it the obvious choice for short term prototyping or POC activities. Conversely, the Monthly Flex model is paid in advance and requires a minimum 12 month investment and therefore makes sense for full scale implementations. Then, the final piece of the jigsaw comes with the shape of the service you consume. This is measured in OCPUs (Oracle Compute Units) and the larger your memory requirements, the more OCPUs you consume.

Where You Put Your Data Will Always Matter

Moving your analytics platform into the cloud may make a lot of sense and could therefore be a relatively simple decision to make. However, the question of where your data resides is a more challenging subject, given the sensitivities and increasing legislative constraints that exist around where your data can or should be stored. The answer to that question will influence the performance and data latency you can expect from your analytics platform.

OAC is architected to be flexible when it comes to its data sources and consequently the options available for data access are pretty broad. At a high level, your choices are similar to those you would have when implementing on-premise, namely:

  • perform ELT processing to transform and move the data (into the cloud);
  • replicate data from source to target (in the cloud) or;
  • query data sources via direct access.

These are supplemented by a fourth option: using the inbuilt Data Connectors available in OAC to connect to cloud or on-premise databases, other proprietary platforms or any other source accessible via JDBC. This is probably a decent path for exploratory data usage within DV, but I’m not sure it would always be the best long-term option.


Unsurprisingly, with the breadth of options comes a spectrum of tooling that can be used for shifting your data around and it is important to note that depending on your approach, additional cloud services may or may not be required.

For accessing data directly at its source, the preferred route seems to be to use RDC (Remote Data Connector), although it is worth noting that support is limited to Oracle (including OLAP), SQL Server, Teradata or DB2 databases. Also, be aware that RDC operates within WebLogic Server and so this will be needed within the on-premise network.

Data replication is typically achieved using Data Sync (the reincarnation of the DAC, which OBIA implementers will already be familiar with), although it is worth mentioning that there are other routes that could be taken, such as APEX or SQL Developer, depending on the data volumes and latency you have to play with.

Classic ELT processing can be achieved via Oracle Data Integrator (either the Cloud Service, a traditional on-premise implementation or a hybrid-model).

Ultimately, due care and attention needs to be taken when deciding on your data architecture as this will have a fundamental effect on the simplicity with which data can be accessed and interpreted, the query performance achieved and the data latency built into your analytics.

Data Flows Make For Modern Analytics Simplification

A while back, I wrote a post titled Enabling a Modern Analytics Platform in which I attempted to describe ways that Mode 1 (departmental) and Mode 2 (enterprise) analytics could be built out to support each other, as opposed to undermining one another. One of the key messages was the importance of having an effective mechanism for transitioning your Mode 1 outputs back into Mode 2 as seamlessly as possible. (The same is true in reverse for making enterprise data available as a Mode 1 input.)

One of the great things about OAC is how it serves to simplify this transition. Users are able to create analytic content based on data sourced from a broad range of locations: at the simplest level, Data Sets can be built from flat files or via one of the available Data Connectors to relational, NoSQL, proprietary database or Essbase sources. Moreover, enterprise-curated metadata (via RPD lift-and-shift from an on-premise implementation) or analyst-developed Subject Areas can be exposed. These sources can be ‘mashed’ together directly in a DV project or, for more complex or repeatable actions, Data Flows can be created to build Data Sets. Data Flows are pretty powerful, not only allowing users to join disparate data but also to perform some useful data preparation activities, ranging from basic filtering, aggregation and data manipulation actions to more complex sentiment analysis, forecasting and even some machine learning modelling features. Importantly, Data Flows can be set to output their results to disk, either written to a Data Set or even to a database table, and they can be scheduled for repetitive refresh.

For me, one of the most important things about the Data Flows feature is that it provides a clear and understandable interface which shows the sequencing of each of the data preparation stages, providing valuable information for any subsequent reverse engineering of the processing back into the enterprise data architecture.


In summary, there are plenty of exciting and innovative things happening with Oracle Analytics in the cloud and as time marches on, the case for moving to the cloud in one shape or form will probably get more and more compelling. However, beyond a strategic decision to ‘Go Cloud’, there are many options and complexities that need to be addressed in order to make a successful start to your journey - some technical, some procedural and some organisational. Whilst a level of planning and research will undoubtedly smooth the path, the great thing about the cloud services is that they are comparatively cheap and easy to initiate, so getting on and building a prototype is always going to be a good, exploratory starting point.

Categories: BI & Warehousing

Why DevOps Matters for Enterprise BI

Tue, 2018-06-12 09:44

Why are people frustrated with their existing enterprise BI tools such as OBIEE? My view is because it costs too much to produce relevant content. I think some of this is down to the tools themselves, and some of it is down to process.

Starting with the tools, they are not “bad” tools; the traditional licensing model can be expensive in today’s market, and traditional development methods are time-consuming and hence expensive. The vendor’s response is to move to the cloud and to highlight cost savings that can be made by having a managed platform. Oracle Analytics Cloud (OAC) is essentially OBIEE installed on Oracle’s servers in Oracle’s data centres with Oracle providing your system administration, coupled with the ability to flex your licensing on a monthly or annual basis.

Cloud does give organisations the potential for more agility. Provisioning servers can no longer hold up the start of a project, and if a system needs to increase capacity, then more CPUs or nodes can be added. This latter case is a bit murky due to the cost implications and the option to try and resolve performance issues through query efficiency on the database.

I don’t think this solves the problem. Tools that provide reports and dashboards are becoming more commoditised; up-and-coming vendors and platform providers are offering the service for a fraction of the cost of the traditional vendors. They may lack some of the enterprise features like open security models; however, this is an area that platform providers are continually improving. Over the last 10 years, Oracle's focus for OBIEE has been more on integration than innovation. Oracle DV was a significant change; however, there is a danger that Oracle lost the first-mover advantage to tools such as Tableau and QlikView. Additionally, some critical features like lineage, software development lifecycle support, versioning and process automation are not built into OBIEE and, worse still, the legacy design and architecture of the product often hinder them.

So this brings me back round to process. Defining “good” processes and having tools to support them is one of the best ways you can keep your BI tools relevant to the business by reducing the friction in generating content.

What is a “good” process? Put simply, one that reduces the time between the identification of a business need and realising it, with zero impact on existing components of the system. A “good” process should also provide visibility of any design, development and testing, plus documentation of changes, typically including lineage in a modern BI system. Continuous integration is the Holy Grail.

This is why DevOps matters. Automated migration across environments, regression tests, automatically generated documentation in the form of lineage, native support for version control systems, supported merge processes and, ideally, a scripting interface or API to automate repetitive tasks (such as changing the data type of a group of fields system-wide) can dramatically reduce the gap from idea to realisation.

So, I would recommend that when looking at your enterprise BI system, you not only consider the vendor, location and features but also focus on the potential for process optimisation and automation. Automation could be something that the vendor builds into the tool, or you may need to use accelerators or software provided by a third party. Over the next few weeks, we will be publishing some examples and case studies of how our BI and DI Developer Toolkits have helped clients and enabled them to automate some or all of the BI software development cycle, reducing the time to release new features and increasing the confidence and robustness of the system.

Categories: BI & Warehousing
