Business Intelligence Network business intelligence resources

Blog: Barry Devlin

Sunday, 28 September 2008

BI and the Financial Crisis

As the worldwide banking crisis continues to escalate, one has to wonder—where was the Business Intelligence in all of this? What happened to Data Quality and Data Management?

First, we had the interesting revelation that the individual banks and lending institutions all seemed to be blissfully unaware of the extent to which they were exposed by lending in the sub-prime mortgage market. It’s difficult to imagine how the information available to decision makers in these companies could have been so scarce or so uninformative. Most, if not all, financial institutions have had extensive and expensive data warehouses in place for many years now. Business Intelligence should easily have warned of the dangers. Was the increasing level of risk unmeasured, overlooked or simply ignored?

More recently, we’ve had the spectacle of banks being unwilling to extend short-term lending facilities to one another for fear that the borrowing institutions could go belly-up in the next few days! Could the lenders not know? Unfortunately, in this case, the answer is probably that they couldn’t. Despite the fact that the worldwide financial market is tightly and instantly interconnected at a transaction level, the truth is that the underlying data remains disconnected and dispersed. Data Management and Data Quality have simply not been considered. Proper business governance in the financial markets as a whole is impossible without a well-defined and credible data foundation.

So, assuming that we can survive the crisis without a meltdown, what has been happening should be a clarion call to Data Management professionals in the financial industry particularly but also beyond. We need to recognize the interconnected and increasingly fragile web of data dependencies that hold the business world together. It’s time to get out there and apply the principles we know and preach already. And we had better get moving quickly.

Friday, 19 September 2008

Reining in the spreadsheets... into Playmarts

Enterprise BI shops and data quality departments regard spreadsheets largely as the work of the devil. Against all the rules of information quality, data in spreadsheets is manipulated by users at will and in private. Then the resulting data and function is distributed, shared and further played around with, until it's anybody's guess whether the results presented at the end bear any relationship to the truth. Data that was pure and clean as it came out of the data warehouse, data mart or approved BI report is now potentially as contaminated as nuclear waste.

And yet, check in with the users. Indeed, check in with yourself. Why is Excel so popular? Because it makes it easy to play with the data, check out hypotheses, get answers otherwise unavailable, and so on. And once you've gotten the answer through the spreadsheet, chances are you won't get the time or the resources to recreate the process in a more auditable, quality-conscious way. It's a real and spreading problem. But, what to do?

This week I had the opportunity to preview a new product called Lyza that's due to launch on Sept. 22. In fact, you can download it and play with it already. Scott Davis, the CEO of Lyzasoft Inc., explained that they had spent a lot of time investigating how business analysts, the power users of spreadsheets, actually work. This is usually a good idea, because you find out what the users really need, and which of your assumptions are right or wrong. It will probably come as no surprise that most analysts approach their work in a highly unstructured and iterative way, pulling bits of relevant data into Excel from a variety of known sources - both official data marts and reports as well as unofficial files, spreadsheets, etc. they happen to have created before or borrowed from trusted colleagues. And they do it in Excel, because that's the only way they can.

What Lyza does is to provide an easier, more intuitive way of pulling data together from diverse sources, combining and manipulating it and creating results and reports for distribution to the business. Well, that's all fine and dandy for the business analysts you may say, but how does it help the BI and data quality departments address the data contagion? The answer is that Lyza tracks and saves an audit trail of every action and every step of the analysis process that the user is building as well as enabling snapshots of the results to be cached and preserved for posterity. Now the data quality folks are beginning to smile. And the BI department? Well, they're less sure: they like the added traceability, but this is still outside their comfortable data mart zone.

However, we could look at it in a different way. We could imagine that Lyza provides a new type of data mart - a "playmart" - a sand box where power analysts can experiment with data and perform all sorts of analyses in a safe, well-managed environment. Now, if only we could evaluate the analysts' logic and productionalize those analyses and reports that are going to be reused and built upon in the future.

Scott's initial answer was that you can certainly do all this within Lyza itself. But a bit of further probing convinced me that the metadata that Lyza stores to describe the analysis processes is probably sufficient to enable the creation of ETL scripts for your ETL engine of choice. This would certainly require further investigation and automation, but it seems like the bones of the idea are there. In this case, the playmart could address a set of business analysts' needs that have been long ignored by the BI departments and by BI vendors as well.
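To make the idea concrete, here is a minimal Python sketch of an analysis captured as metadata: each step is recorded as a small description, and the whole trail can be replayed against fresh source data. That replayability is what would make generating ETL scripts from the trail plausible. This is not how Lyza works internally; the step types (`filter`, `add_column`), the `replay` helper and the sample data are all hypothetical.

```python
# Hypothetical sketch: an analysis recorded as a list of step descriptions
# (the "audit trail") that can be replayed against fresh source data.

def apply_step(rows, step):
    """Apply one recorded step to a list of dict rows and return new rows."""
    if step["op"] == "filter":
        # Keep only rows whose column value reaches the recorded minimum.
        return [r for r in rows if r[step["column"]] >= step["min"]]
    if step["op"] == "add_column":
        # Derive a new column as the product of two existing columns.
        return [{**r, step["name"]: r[step["left"]] * r[step["right"]]} for r in rows]
    raise ValueError(f"unknown step: {step['op']}")

def replay(rows, audit_trail):
    """Re-run every recorded step, in order, on the given rows."""
    for step in audit_trail:
        rows = apply_step(rows, step)
    return rows

# The trail is plain data, so it can be stored, reviewed, or translated
# into another tool's job definition.
audit_trail = [
    {"op": "filter", "column": "units", "min": 10},
    {"op": "add_column", "name": "revenue", "left": "units", "right": "price"},
]

source = [
    {"product": "A", "units": 5, "price": 2.0},
    {"product": "B", "units": 20, "price": 3.0},
]

result = replay(source, audit_trail)
print(result)
```

Because the trail is data rather than code buried in spreadsheet cells, the BI department could inspect the analyst's logic before deciding whether to productionalize it.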

The only real fly in the ointment is whether Lyza will be able to convince the spreadsheet jockeys to get off their current Excel rocking horse and jump on the bright new Lyza pony in the playmart! (And that sentence would work so much better if only Lyza had chosen a mustang for their logo rather than a gecko.)

Sunday, 14 September 2008

Decision Intelligence or Highly Evolved Business

In my last post, I shared some thoughts inspired by the Decision Intelligence article written by Claudia Imhoff and Colin White. There, I suggested we need to really begin to consider all information as a single resource for the whole business. This entails stepping beyond our traditional IT-bounded view of our systems and looking at them with a renewed business vision. If we do this, it will also quickly become clear that our view of process needs reworking too.

Claudia and Colin have drawn a box on the left of their architecture picture that arises directly from the insight that operational BI really is a different beast from the traditional BI we've all known and loved over the past 20 years or so. When you deeply consider the implications of building an operational BI system, as Claudia and Colin clearly have, it becomes obvious that operational BI has many of the characteristics of traditional operational or transaction-processing systems. Therefore, from a systems architecture point of view, you put them in the same box, in this case called "Business process intelligence".

There are also some differences, of course. The most important is how the business users interact with these two related types of system. The value proposition of operational BI is that human decision-making skills can improve operational processes. How? Well, there are two very distinct threads here.

One is the proposition that we can apply advanced analytics technology automatically to parts of the operational process. Fraud detection is a good example. Applying advanced analytics on the fly to credit card transactions gives better detection of fraudulent transactions. Note that this type of operational BI is almost completely invisible to the business users: they see the results of more fraud detected or fewer false positives, but how that happened is both unknown and uninteresting.

The second thread brings users very directly into the loop. Here, the operational BI technology is made part of the users' visible process. Business users are presented with decision support technology that displays trends or exceptions in near real-time data, so that they can potentially choose a different course of action to that embedded in the normal flow. In effect, business users get to change the business process on the fly, rather than doing little more than data input as was previously the case.

Now, keeping this in mind, here's the million dollar question. What's the difference between an operational system and an informational system; how do you distinguish between an operational process and an informational process? In the good old days, it was easy! The operational side was nearly or actually real-time, dealt with individual transactions or data elements according to a predefined process where the users had minimal freedom to act intelligently. Informational systems, in contrast, were centered around users who were expected to make intelligent decisions based on historical data without any clear process to turn those decisions into action.

So, what is the answer today? When we in BI start building operational BI and the operational world starts implementing adaptive SOA-based systems, the distinction between operational and informational more-or-less disappears. This puts operational BI and operational systems together in one box of the architecture. But the deeper and probably longer-term implications of this bold step have not been explicitly called out. In fact, these implications are obscured by the naming of the new architecture as "Decision intelligence", because the top level of this architecture is no longer confined to the world that was formerly BI; it actually becomes the single, common process or interface through which all business users will interact with the underlying IT systems.

Is that scary? Absolutely! But it is a clear and logical consequence of the paths that BI and operational systems are currently on. It means that we in BI are no longer in total control of our destiny. But the same is true of the operational systems. And, although I've not covered it here, collaborative systems (e-mail, office support, etc) are also being drawn inexorably into the same converged path.

It's time we all started to talk to one another! And that does imply that decision intelligence may be too narrow a term for us all to agree on. May I propose again the "Highly Evolved Business"?

Monday, 1 September 2008

Decision Intelligence

Claudia Imhoff and Colin White have a lengthy history of insightful and provocative contributions to the development of Business Intelligence. Their recent article, Decision Intelligence, is no exception. Their thesis is that the IT support needed for decision-making, now known as "Business Intelligence", today extends far beyond the traditional domain of data warehousing and is in need of a new architecture and a new name - Decision Intelligence.

I fully agree. I've been using the terms "Highly Evolved Business" and "Business Insight" over the past year or so to express exactly the same thought. Indeed, Claudia, Colin and I have discussed this whole idea already at length and are very much on the same page. But I hadn't seen their architecture picture before, and it gives me the opportunity to discuss the whole topic from a higher perspective in this and the next post.

Under Decision intelligence, the architecture shows three vertical blocks called “Business process intelligence”, “Business data intelligence” and “Business content intelligence”. The meanings of these blocks are fairly obvious, but take a look at the linked article for a full explanation. My thought is that they are almost too obvious: they closely reflect our current arrangement of systems building blocks in the IT world.

Let’s first examine the data and content blocks. Today, if you look at typical enterprise implementations, you will certainly see databases and separate content stores. You’ll also notice independent systems built upon these separate stores. But, if you step back from the storage and processing issues, it’s pretty difficult to distinguish between the two categories. Try explaining the difference to a business user!

Take an example of a clinician who’s trying to make a treatment decision. She’s looking at a chest x-ray - content in our terms. And she’s also looking at the “structured data” that goes with it: this x-ray is of a 45 year old male, smoker of 20 cigarettes a day for the past 30 years who has been admitted with shortness of breath. Does she see unstructured content and structured data that must somehow be combined in her decision making? I’d argue not. She simply sees a set of information she’s using.

And some of the old barriers between the storage of structured data and unstructured content are breaking down. Where is the EXIF data (structured metadata) of a photo stored? Yes, in the JPG file along with the unstructured content. Where do e-mail systems store the structured metadata about sender, subject, date sent, etc? Sure, in the database with the unstructured e-mail body content.
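The e-mail case is easy to see for yourself. The short Python sketch below uses the standard library's `email` package to parse a single message and pull out both the structured header fields and the unstructured body. It illustrates the message format rather than any particular mail server's database, and the sample message is invented.

```python
from email import message_from_string

# An invented message: structured headers, a blank line, then free text.
raw = (
    "From: analyst@example.com\n"
    "Subject: Q3 figures\n"
    "Date: Mon, 01 Sep 2008 09:00:00 +0000\n"
    "\n"
    "Sales look soft in the north region.\n"
)

msg = message_from_string(raw)

# Structured metadata: named fields that could be columns in a table.
structured = {
    "sender": msg["From"],
    "subject": msg["Subject"],
    "date": msg["Date"],
}

# Unstructured content: the free-text body carried in the same object.
unstructured = msg.get_payload()

print(structured)
print(unstructured)
```

One parse yields both kinds of "stuff" from one source, which is the point: the split between data and content is a storage convention, not something the user sees.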

I could make a similar argument about the lack of distinction between real-time (or operational) data and historical (data warehouse) data.

My point is that if we want to create a new vision for the future, we need to start seeing the world through non-IT eyes. It’s all information. It’s a single concept; a single category of “stuff”. And we in IT need to start creating the tools and methods that allow us to create, manage and make available all information in a coherent and consistent way. At a conceptual level, that has to be the goal and that should be our first pictorial representation.

Keep that thought in mind. I’ll come back to it next time when I look at the process side of the picture.

Sunday, 24 August 2008

Instant Gratification vs. Quality Time

I was browsing through the blogs on B-eye-network.com this morning (Sunday - yeah, sad, I know) and came across two recent entries that spoiled my coffee. Given that I'm no fan of instant gratification (in IT anyway), I'm not going to give you links, so you have to work at finding them yourself. But the phrases that caught my eye were "Instant SOA", "Data marts in about an Hour" and "full EDW's with AS-IS star schemas in 2 weeks".

Now I'm as fond of a shortcut as the next guy, but I've learned that the word "Instant" is not all goodness. When I've bought some instant Spaghetti Bolognese in the local supermarket I've found that the cost is a lot higher than the individual ingredients and the taste, well, leaves a lot to be desired. Sure, I saved some time when I got home, but did I get value for money? And did I end up with what I really wanted? So, why should I expect more from an Instant DW?

"Caveat emptor" as the Romans used to say. Here are a few contra-indications for when instant gratification should not be expected in your next BI (or SOA) project:

  1. The business users are not quite sure what they want.
    Most BI projects start with a vague set of requirements from the potential users. It's going to take some time to hone these down to a usable definition of data and query needs. In the meantime, maybe it's best to let the users continue to play with their instant Excel spreadsheets and look over their shoulders to see what they're doing.
  2. Somebody forgot to document the meanings of the data in the source applications.
    This is the oldest metadata problem. If your data sources have not been properly described, an Instant DW is likely to be instantly dismissed as misleading and inaccurate. Do you want to go there?
  3. Garbage in, garbage out. Or worse...
    If your ingredients (data sources) are contaminated with erroneous data, you're going to end up with a very sick business on your hands if you just take the Instant DW approach. Understanding and fixing dirty data is time-consuming, but mandatory.

It's all about quality time... or quality vs. time. If I bring home my instant Spaghetti Bolognese, I may get it on the table within a few minutes. But, if the kids won't eat it or, worse, throw up that night, I'd argue I've made the wrong trade-off between time and quality. You need to consider the same balance in a BI or SOA project.

Now, I'm off to spend some quality time with my kids :-)

Thursday, 14 August 2008

Access to quality external data

I was at the Business Objects Summit this week in Boston, where the main emphasis was on linking strategy to execution and a seeming focus on the larger enterprises. All very SAP-inspired, I thought. And very insightful, especially if you're a large enterprise. There have been some comments in the blogs already on these topics. But it was a small conversation over lunch that caught my interest...

Information OnDemand. No, not the annual IBM Conference in Las Vegas, in October. But a rather low key effort from Business Objects with a website to allow companies to access market data and incorporate it into their BI efforts.

There's a definite growing interest these days in combining external data with the contents of the warehouse. But it does raise some concerns, not least about the reliability of the external data and how to create a valid semantic relationship between the two data sets. In the past, companies have addressed these concerns by obtaining key market and other external data from trusted sources like Dun & Bradstreet, Reuters and others and then ensuring that such data entered the warehouse via a controlled feed designed by Information Architects who could match the two data sets correctly. After all, such external data is another information source for the warehouse and should be managed like any other.

This method works well for large enterprises with a centrally-controlled approach to the warehouse. And where the value-add derived from or risks incurred by using this data are significant, this method is probably still required. But what if you are a small or medium enterprise? Or what if you really only want to do a couple of once off analyses?

Shopping at the Information OnDemand website appears to be the answer! Here you can buy prebuilt, but customizable, reports combining your data with external market and financial data. You can buy one-time snapshots or subscribe for regular updates.

For larger companies, this could provide a safe and cost-effective way of dipping their toe in the big ocean of external data. For smaller companies, it could be all they need. Sounds like a useful idea to me!

The service has been available since September 2007, but I hadn't come across it before. Maybe there are some similar services I should know about, so please feel free to comment.

Thursday, 7 August 2008

Enterprise search, Web 2.0 and BI

I came across an ad today for a Google Webcast on Universal Search for Business. It contained the phrase "As the volume of information inside enterprises explodes, most executives recognize the importance of a Google-like search solution for business content.", which set me wondering...

A Google-like search solution for business content? What exactly does that mean?

The phrase "Google-like search", of course, covers a multitude of marketing-speak, but let's assume that it includes the patented PageRank technology behind Google's Internet search success. Google itself describes PageRank as follows: "PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value." (http://en.wikipedia.org/w/index.php?title=PageRank&oldid=230400158 as of Aug. 7, 2008). A number of questions arise for me: Does an enterprise intranet usually have a vast link structure? Would business executives really consider the "democratic vote" of the organization as a valid indicator of a document's importance? Indeed, how democratic is the link structure in an intranet?
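To see why the size and shape of the link structure matters, here is a minimal Python sketch of the textbook power-iteration form of PageRank, run on a tiny, invented intranet. It is the published idea in its simplest form, not Google's production algorithm, and the page names, damping factor and iteration count are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power iteration. `links` maps each page to the pages it links to;
    every link target must also appear as a key."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page "votes" by splitting its rank across its links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Invented intranet: nobody links to "orphan-report", however valuable it is.
intranet = {
    "home": ["policy", "news"],
    "policy": ["home"],
    "news": ["home"],
    "orphan-report": ["home"],
}

ranks = pagerank(intranet)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the home page collects the most rank and the unlinked report is stuck at the baseline, whatever its business value. With a sparse intranet, the "vote" says more about who maintains the navigation pages than about what the organization considers important.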

Google, Wikipedia and many Web 2.0 systems have an underlying belief in James Surowiecki's concept of "the wisdom of crowds". Data warehousing, Business Intelligence and, indeed, all traditional IT development tend to put more faith in experts and their accumulated knowledge. In the BI world, I'm beginning to see some level of acceptance that the so-called experts do not have a monopoly on business knowledge. We see that there is a growing need to allow and, indeed, facilitate the feedback of knowledge that emerges on the fringes of the BI community (the front-line staff and first-line managers) back into the core of the warehouse for wider promulgation and reuse.

But, to what extent do Google and the Web 2.0 community recognize that some knowledge is inherently more useful or valuable (although not necessarily "right") simply based on the authority of its source? And do they recognize that, within the tighter and more closed confines of an enterprise, not all the requirements for wise crowds are met? If not, we may see the many years of careful effort by data modelers, administrators and information stewards overturned in the rush to Web 2.0. This would not be in anybody's interest.

On the other hand, if I've made the wrong assumption about what "Google-like search" means... Anybody care to comment? Or maybe I'll find time to sign up for the webinar!

Wednesday, 30 July 2008

Reviewing the reviews of the DatAllegro acquisition

I've been meaning to resurrect this blog for some time now, but, hey! life gets in the way. The recent coverage of Microsoft's acquisition of DatAllegro proved to be the trigger to get me going. It's all about feeds and speeds, bigger volumes, faster access and cheaper warehouses, with debates about how this will help Microsoft move up in the market and how it will impact the other vendors.

That's all very well and good, but, excuse me, have I missed something? Since when did data marts become data warehouses? I know that the appliance vendors tend to label themselves as data warehouse appliances, but I thought we all knew that was marketing. Of course, any appliance will be part of a data warehouse system in the broader sense. But when you look at the features and strengths that appliances have, you can see that they are really data marts. Data "hypermarts" perhaps, but marts nonetheless.

By definition, a data mart is a subset of the data in the enterprise data warehouse that has been optimized for use by a particular set of users. Such optimization includes selecting the data needed for some set of business purposes and structuring it to allow the fastest, most appropriate query access for users. It's all about how you get the data out! Sounds to me like exactly what the appliance vendors emphasize.

On the other hand, the data warehouse focus is on getting the data in. How to cleanse and reconcile the diverse data. How to ensure the cross-source timing is right. How to create a model that reflects the needs of the wider enterprise. And finally to make the consolidated view of the business available to the users - usually through data marts.
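The split between "getting the data in" and "getting the data out" can be sketched in a few lines of Python. The example below is a toy under stated assumptions: the two source feeds, the field names and the cleansing rules are invented, and a real warehouse would of course do this in a database with far richer reconciliation logic.

```python
from collections import defaultdict

# Two invented source feeds with inconsistent keys, casing and types.
crm_orders = [{"cust": " c001 ", "region": "North", "amount": "120.50"}]
web_orders = [
    {"cust": "C001", "region": "north", "amount": "30"},
    {"cust": "C002", "region": "South", "amount": "75"},
]

def cleanse(row):
    """Warehouse side: reconcile keys, casing and types across sources."""
    return {
        "cust": row["cust"].strip().upper(),
        "region": row["region"].strip().title(),
        "amount": float(row["amount"]),
    }

# "Getting the data in": one consolidated, cleansed view of the business.
warehouse = [cleanse(r) for r in crm_orders + web_orders]

def build_mart(rows, region):
    """Mart side: select a subset and pre-aggregate it for fast querying."""
    totals = defaultdict(float)
    for r in rows:
        if r["region"] == region:
            totals[r["cust"]] += r["amount"]
    return dict(totals)

# "Getting the data out": a mart optimized for one set of users.
north_mart = build_mart(warehouse, "North")
print(north_mart)
```

The mart step is short and fast because the hard work, agreeing that " c001 " and "C001" are the same customer, was already done on the way in. Speeding up `build_mart` does nothing to make `cleanse` easier.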

So, does the Microsoft acquisition disrupt the entire data warehouse market, sending the large players into a spin? I doubt it. Building a real data warehouse will continue to be as challenging as ever, requiring the same strong integration and project management skills as before as well as the deep database integration and manipulation technology that only the big relational databases possess as of now.

 
