Thursday, March 27, 2008

A Conceptual Framework for Semantic Integration

Let’s take a moment to elaborate on the core premise. What we’re really looking for is a mechanism that can tie together the ever-growing spectrum of specialized information, tools and techniques that we are forced to deal with in a typical enterprise today. The natural trend has been pushing away from unified management or control over this environment, and we have been playing catch-up for years as we attempt to impose order on enterprise entropy.

This situation applies equally to the development and production environments. The fact is that no single control mechanism has yet been successful at tying together either environment (let alone both). And as we well know, complexity costs us big – in money, in time and in failure to meet expectations. SOA held promise in that it contained control elements for both development and production. We are extending this to the next logical step – comprehensive control, or at least direction, over everything.

But ‘control’ is a frightening term for many; it may imply micromanagement, which we know from experience doesn’t work well in environments where change rules supreme. We don’t wish to stifle innovation or slow down the pace of change – any attempt to do so would surely lead to failure. What we need to do is understand or set the parameters of our known universe, consciously design it in advance at a high level, and guide all of its constituent elements along an evolutionary path that falls within those defined parameters. This could be viewed as ‘loose conformance’ rather than strict compliance.

Our enterprise universe is no longer restricted to any one organization, so the framework for accomplishing this definition must be a shared endeavor. This doesn’t necessarily imply technical standards; technical standards must all be subordinated to the true control mechanism – Semantics. Semantics allows us for the first time to design for scenarios where hundreds or thousands of applications share the same types of data and deconflict both their functions and their information output.

The conflict between apparently redundant functionality and competing data elements is the most serious and complex challenge in every enterprise today. Many solutions have tried to address this by tracking metadata across the enterprise or by imposing governance rules; however, both of those attempts still lack the most important key to success – the overall context in which all of the data and rules will operate, across not one but all potential enterprises. Semantic Integration will provide this context.

Copyright 2008, Semantech Inc.

Tuesday, March 25, 2008

Ontology Abstraction

For Ontologies to serve as a mechanism for reconciling diverse systems and as an integration engine, they must be flexible and separable from the rules (and the systems applying those rules) that would define the nature of the interactions. This is what is referred to as "Ontology Abstraction."

Ontologies are also separate and abstracted from metadata. One reason that most data standardization efforts have either failed or seriously underperformed is the assumption that someone can fully define the nature of all enterprise data and predict how it will evolve over time – an assumption that always has been, and will remain, wholly unrealistic.

What we need instead are less painful ways to capture and characterize the changing nature of our enterprise data environment. Using Ontologies for this will allow us to manage data in human-readable formats that can readily be shared with end users as well as functional experts. Those groups will define Ontologies based upon generic long-term expectations (formal sets), dynamic evolution and discovery (dynamic sets) and the business logic needed to manage both (interaction rules).
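
To make the three groupings concrete, here is a minimal Python sketch of how formal sets, dynamic sets and interaction rules might relate. Every class and function name here is a hypothetical illustration, not part of any standard or product.

```python
from dataclasses import dataclass, field

@dataclass
class FormalSet:
    """Long-term, consensus-managed terms (generic expectations)."""
    name: str
    terms: set = field(default_factory=set)

@dataclass
class DynamicSet:
    """Terms discovered or evolved at runtime, tied to a formal baseline."""
    name: str
    baseline: FormalSet
    discovered: set = field(default_factory=set)

def interaction_rule_unrecognized(dynamic: DynamicSet) -> set:
    """One interaction rule: flag discovered terms absent from the baseline,
    i.e. candidates for review and possible promotion into the formal set."""
    return dynamic.discovered - dynamic.baseline.terms

# Illustrative data: a billing vocabulary and a feed that surfaced a new term
billing = FormalSet("billing", {"invoice", "customer", "payment"})
feed = DynamicSet("q3-feed", billing, {"invoice", "chargeback"})
print(interaction_rule_unrecognized(feed))  # terms needing reconciliation
```

The point of the sketch is the separation: the formal set changes only by consensus, the dynamic set changes constantly, and the interaction rules mediate between the two.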

One of the key concepts underpinning Semantic Integration is Ontology Abstraction.


Wednesday, March 19, 2008

Taxonomy & Mind Maps

The other day someone asked me what the best method is to build a taxonomy. My initial response was a high-level overview outlining the pragmatic nature of the beast, but what I forgot to mention is the specific technique(s) I use to get started. Over the years I've tried a number of different approaches, none of them perfect. I've used PowerPoint, the Outline feature in MS Word, a pencil and paper, other EA modeling tools and so on. But about two years ago, I became acquainted with Mind Mapping. Mind maps are very well suited for the initial part of the semantic integration process - they are easy to learn, very quick to use and provide a fairly comfortable visual interface to your resulting "Brainstorm" products.

Most Mind Mapping tools have a bit of a constraint, though, in that they capture "Dual Hierarchies" - in other words, they split out trees in only two directions. There are many times when I have thought that the ability to have one tree, centered, would be preferable, but there are some advantages to the dual tree approach. With two trees you can begin to do some preliminary semantic reconciliation, or you can use the second tree (for me this is usually the left side) to capture a set of assumptions or constraints which underlie whatever is built into the right-side tree.
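
As a rough illustration of the dual-tree idea, the left branch (assumptions and constraints) and the right branch (the draft taxonomy) could be captured as plain nested structures. Everything below is a made-up example, not the output of any mind-mapping tool.

```python
# A dual-branch mind map as nested Python structures: the left branch
# holds assumptions/constraints, the right branch holds the draft taxonomy.
mind_map = {
    "root": "Enterprise Data",
    "left": {  # assumptions and constraints underlying the taxonomy
        "Assumptions": ["Single sign-on exists", "Data owners identified"],
        "Constraints": ["No schema changes this quarter"],
    },
    "right": {  # the draft taxonomy itself
        "Structured": ["Relational", "Hierarchical"],
        "Unstructured": ["Documents", "Media"],
    },
}

def leaf_count(node) -> int:
    """Count the leaf entries under a branch of the map."""
    if isinstance(node, dict):
        return sum(leaf_count(v) for v in node.values())
    if isinstance(node, list):
        return sum(leaf_count(v) for v in node)
    return 1

print(leaf_count(mind_map["right"]))  # size of the draft taxonomy branch
```

Keeping the two branches in one structure makes it easy to check, mechanically, that the taxonomy on the right never outgrows the assumptions recorded on the left.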

The tool I use for this is Freemind - because, as the name implies, it costs nothing. There are many other mind-mapping tools on the market and they have a wide variety of features - but for me this tool now fits a specific niche within my lifecycle. While having the ability to save it as data would be nice, it isn't absolutely essential this early in the process. This is a tool that was designed for brainstorming, and it doesn't necessarily need to be linked to the initial design as long as that initial design is then linked through the rest of the lifecycle.

An example of a rapidly produced "Dual Branch" Mind Map Taxonomy.

Response: Open Source CMS

Are corporate intranets generally custom coded, or are they increasingly using either Open Source CMSs / Wikis or commercial intranet or CMS packages?

Well, the USAF (several hundred thousand users) is deploying its ECM solution using a combination of IBM content management & SharePoint technology. The Army is using a portal-based COTS product which is not really mainstream software, and the Navy is using SharePoint in many instances. None of these mega-enterprises has yet fully embraced open source content management. In answer to your question, it makes no sense for a larger or global enterprise to develop its own intranet - many have done so over the years, though, because they felt it would be cheaper than paying the license fees.

However, as user expectations grow, the development team is faced with higher costs and the cost-benefit business case starts to fade away. The case for open source CMSs is compelling and will likely help move more IT shops away from custom development towards packaged applications.


Monday, March 17, 2008

What is a SUO?

A SUO is something that was talked about quite a bit a year or two ago but seems to be fading a bit of late. SUO stands for "Shared Upper Level Ontology" and represents a baseline of sorts for complex semantic mapping activities. The problem I noticed immediately with the concept was two-fold:

1 - High-level taxonomies are extremely useful in situations where an organization (or shared community) has the ability to manage them strictly. For example, there is essentially one shared interpretation of the Animal "Kingdom" originally proposed by Linnaeus in the 1700s. However, once you move from universal consensus to competing interpretations, things start to get complicated. How many official variations of English dialects could be recognized as SUOs, and how might they relate to an "Oxford" version?

2 - It is unrealistic to expect that a fairly rich understanding of the potential relationships can be captured within a SUO - which means it then becomes less of a true Ontology and more of a taxonomy. Folks working on SUOs some years back tried to take this into account:

IEEE SUO Working Group

Thus far the largest SUO project is the Suggested Upper Merged Ontology (SUMO) initiative. I'm not too sure how useful this is, though, as the standard for exchanging the Ontology data is rather narrowly focused (KIF, the Knowledge Interchange Format). For us to be able to include semantic integration in larger enterprise integration projects, a more standard XML-based approach is required.


Thursday, March 13, 2008

The Future of Semantic Technology

I had a very interesting conversation today that got me thinking. The topic revolved around where this particular segment of the larger IT domain might go - in terms of both scope and success.

In many ways, Semantic Technology has been totally defined within the context of the "Semantic Web" and the set of standards related to that W3C initiative. My contention to the colleague I was speaking with was that I had never pegged the nature or scope of Semantic Technology to the "Semantic Web" concept. I think they are quite complementary, but the Semantic Web represents a narrower view, in many respects, of how Semantics ought to be viewed in the context of Information Technology as a whole.

Before the latest round of proponents of web-based or web-focused semantic applications, there were folks like Chomsky who began to illustrate the deeper connections between meaning and representation - some used to refer to this as computational linguistics, but the scope was wide - it necessarily encompassed the entire spectrum of architectures and processes that surround any and every automation solution. I was pleased to see Steven Pinker, an author coming from that original community of linguistics-focused academics, write a book about Ontology last year. This is a good sign that the broader base of thought leaders may be converging.

The problem that I see with the current practice of Semantic Integration and the current crop of Semantic technologies is that they have been too easily shunted off into their own relatively small niches. Granted, there have been and still are exceptions to this, but for many, the notion of building vocabularies, ontologies and so forth seems more or less disconnected from the reality of their everyday challenges. Semantics is not an endeavor that serves itself; if it is viewed as the primary building block for everything in IT, the nature of the products and practices supporting it will change.

I’ll provide a concrete example – why should an enterprise architecture be managed separately from systems requirements, and why should both of those be separate from the BPEL workflow logic that drives a SOA-based portal? The short answer is that they shouldn’t and don’t have to be, and this is only part of the larger synergy possible. We may be able to begin to apply Semantic Web technology or standards by passing RDF back and forth, but the real leap we’re making here is more conceptual in nature.

The future of Semantic Technology is entirely dependent on our ability to make it relevant to this larger context – or higher calling, perhaps. I’ve spent more than a decade working mostly in larger, system-of-systems enterprise integration or transformation initiatives – many have claimed that they had the silver bullets which would simplify this arena (CORBA, J2EE, EAI, SOA etc.). Perhaps, though, the problem has always been that we focused on the bullets instead of the gun…


Context Mapping

What if, in the same organization, a word such as, let’s say, “SOA” meant one thing to the majority of stakeholders (our best criteria ought to be some sort of official or community endorsement) on May 20th, 2006, then meant something slightly different on November 11th, 2006, and then something completely different on July 10th, 2007? Which meaning is valid? In most cases, we’d simply go with the latest version of the meaning. But life is never that simple, is it? It just so happens that we’ve been given the charter to integrate with four other organizations who all have various interpretations of the same word, all of which have evolved over time. How can we reconcile this, or even study it?

Now you begin to see the value of Context and Dynamic Context. Without the ability to view the variations and their respective evolutionary paths, it will be difficult for us to determine an appropriate, Integrated reconciliation for that term – one that will be accepted by all the constituents of all groups (or most of them, anyway). Context Mapping is the ability to visualize the evolution or comparison of Vocabulary terms, Taxonomies, Ontologies or Sets within or across constituent perspectives, i.e. Contexts. Once a reconciliation has been chosen, it can be used to generate a Dynamic Set or Sets. Of course, part of the reconciliation process for Context Mapping could include generating “What If” Dynamic Sets to see how these would perform against Semantic Rules and Formal Sets. This process would be used to determine what types of logic and data will be used to integrate organizations across domains; it will determine the structure of all systems of systems involved as well as the processes which link them.

Put another way, Context Mapping represents one of two core analytical processes involving Contexts; Mapping supports the reconciliation of both Contexts and Dynamic Contexts into “Integrated Contexts.” Integrated Contexts are the building blocks for Dynamic Sets and Semantic Rules.
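
A minimal sketch of the idea, with invented organization names and meanings: each record is one Dynamic Context, i.e. one group's meaning for a term at a point in time, and the "map" groups each organization's evolutionary path so that the latest meanings can be laid side by side as inputs to reconciliation.

```python
from datetime import date
from collections import defaultdict

# Hypothetical observations: (organization, date, term, meaning)
observations = [
    ("OrgA", date(2006, 5, 20), "SOA", "web services architecture"),
    ("OrgA", date(2006, 11, 11), "SOA", "service-oriented architecture"),
    ("OrgA", date(2007, 7, 10), "SOA", "enterprise service bus strategy"),
    ("OrgB", date(2007, 1, 5), "SOA", "service-oriented architecture"),
]

def context_map(term):
    """Group each organization's evolving meanings for one term, in time order."""
    paths = defaultdict(list)
    for org, when, t, meaning in sorted(observations, key=lambda r: r[1]):
        if t == term:
            paths[org].append((when, meaning))
    return dict(paths)

def candidate_reconciliations(term):
    """The current (latest) meaning per organization - the inputs to reconciliation."""
    return {org: path[-1][1] for org, path in context_map(term).items()}

print(candidate_reconciliations("SOA"))
```

Choosing one Integrated Context from those candidates (or generating "What If" variants from them) is the human, consensus-driven step that the visualization is meant to support.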


Wednesday, March 12, 2008

Context & Dynamic Context

Semantics must be able to support analysis as well as application interoperability. Context represents our most powerful analytic mechanism. Generically, Context refers to the specific perspective of any group, individual or entity in regards to any combination of Semantic information. Dynamic Context takes this a step further by combining any given Context with a unique point in time. The reason for this is clear: Context is not a permanent state; perspectives change with time. The only way to accurately determine true Context is to capture it in its relative state. Context can also be tracked across time to illustrate the evolution of any given perspective.

The reason that most data standardization efforts have failed in IT is that, unless one is working in a highly controlled environment, differing Contexts cannot be accommodated. Ultimately, someone or some group always feels left out, and in fact they are. When dealing with information across hundreds, thousands or even millions of users or organizations, traditional data standardization methods and techniques can never hope to reconcile the differences and support interoperability on a global scale. If Context or Dynamic Context is understood, we can then determine how to construct Dynamic Sets that allow us to interact with Shared Formal Sets. The future of all integration may very well become centered around the creation of Dynamic Sets and Dynamic Semantic Rules. These tools will determine the exact information (structured and unstructured) and logic we may need to accomplish any given task, answer any given question or solve any given problem.


Monday, March 10, 2008

Levels of Semantic Information, Boundaries & Interaction

There are far too many collisions occurring on the information superhighway; information and data colliding, combining, losing identity and integrity. The traffic outlook today is grim. There is no effective traffic control or organization mechanism; routers and packet data flow are different from information flow control. Several years ago, the concept of the Semantic Web was introduced by the founders of the World Wide Web. It was understood back then that the torrent of information being made available online would soon become unmanageable unless some context was provided.

Context is shared meaning - in other words, Semantics. A lot of work has gone into developing semantic standards that enhance the ability to add meaning to resources available on the web or across any complex environment; examples include XML-based standards such as OWL and RDF. That begins to cover the realm of unstructured data or information, allowing us to build shared ontologies at multiple levels. In the realm of structured data, there is a proliferation of metadata mechanisms already in place; some of those are capable of interfacing with semantic resource standards, others aren’t.

What do not yet exist are sets of community-developed, shared information and the ability to define interactions between resources within and across their boundaries under a variety of conditions. Any realistic adoption of this type of framework would necessarily include the ability to define overlapping Boundaries or Dynamic Sets based on real-time discovery or idiosyncratic needs. The key is making sure that one understands which Sets are more or less permanent (a notion subject to community consensus) and which ones are created for semantic mining (the near-term exploitation of resources within the dynamic set). This process is something that cannot and should not be managed by any one technology or software product – it does, however, require a shared capability. That capability involves the ability to define Formal and Dynamic Sets of meaning. Sets contain Semantic Information such as Ontologies, Taxonomies and Vocabularies.
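
As a toy illustration of the Formal versus Dynamic distinction (the set names and terms below are invented): a Dynamic Set could be drawn as the overlapping Boundary between two Formal Sets, extended by whatever real-time discovery turns up.

```python
# Two community-managed Formal Sets of terms (illustrative only)
logistics = {"shipment", "manifest", "carrier", "invoice"}
finance = {"invoice", "payment", "ledger", "carrier"}

# A term found at run time by an idiosyncratic query (real-time discovery)
discovered = {"drop-ship"}

# The Dynamic Set's boundary: where the two communities already overlap,
# extended by what discovery turned up. This set exists only for the
# near-term task at hand - "semantic mining" - not as a permanent standard.
dynamic_set = (logistics & finance) | discovered

print(sorted(dynamic_set))
```

The Formal Sets change only by consensus; the Dynamic Set is cheap to create, use and discard, which is what makes the distinction workable at scale.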


Sunday, March 9, 2008

Our Semantic Terminology

As one might expect, the terminology for Semantic Integration is in itself extremely important - it represents our "meta-semantics."

Let’s also explore what these terms signify for us in their enterprise integration context:

  • Vocabulary – This is the atomic level view and is analogous to a data entity.
  • Taxonomy – This includes the vocabulary, is a straightforward hierarchy and is analogous to earlier DBMS design paradigms.
  • Ontology – This includes both of the above and represents a structure that expresses both a hierarchy and a set of relationships between vocabulary ‘elements’ within that hierarchy. This is roughly analogous to the design paradigms involved in Relational Database technology, although a schema is not necessarily an ontology and tends to be restricted to the system level.
  • Semantic Set – This is the recognition that data design (in fact all design) for the enterprise extends beyond the bounds or scope of any one system. The enterprise must deal with multiple ontologies, taxonomies, and vocabularies and reconcile them on an ongoing or evolutionary basis.
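
The four levels above can be sketched as plain data structures; this is purely illustrative, since the analogies rather than any particular encoding are the point.

```python
# Vocabulary: atomic terms (analogous to data entities)
vocabulary = {"customer", "order", "product"}

# Taxonomy: the vocabulary arranged as a straightforward hierarchy
taxonomy = {"business-objects": {"customer": {}, "order": {}, "product": {}}}

# Ontology: the hierarchy plus typed relationships between its elements
ontology = {
    "taxonomy": taxonomy,
    "relations": [("customer", "places", "order"),
                  ("order", "contains", "product")],
}

# Semantic Set: multiple ontologies, spanning systems, to be reconciled
# on an ongoing basis (two hypothetical systems sharing one ontology here)
semantic_set = {"crm": ontology, "erp": ontology}

# A simple consistency check: every related term must exist in the vocabulary
assert all(s in vocabulary and o in vocabulary
           for s, _, o in ontology["relations"])
```

Each level strictly contains the one below it, which mirrors the progression in the list: entity, hierarchy, hierarchy-plus-relationships, and finally the cross-system collection that the enterprise must reconcile.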


What is Semantics?

So, what is Semantics? It is an often misunderstood term, even more so in regards to its technical applications. In philosophy, Semantics refers to the study of meaning. The representation and dissemination of meaning, though, is what IT is all about. Every data element, every character in a string, every variable in an equation – they all express meaning in one form or another.

Furthermore, that meaning is enhanced through frameworks of syntax and grammar, as well as through countless explicit and implicit relationships. All system design is predicated upon a contract of shared understanding between stakeholders, developers and service providers; when something goes wrong, this is often the first place to look.

There are a number of specific standards and tools that have emerged over the past few years to support Semantic Integration; however first we need to examine the problem space from a philosophical and business level. To understand how Semantics can be used to facilitate enterprise integration, we must first understand how Semantics relates to the practice of IT. Semantics is heavily focused upon hierarchies of meaning and relationships and as one might expect Semantics has its own hierarchy.
