Friday, May 30, 2008

Thoughts on the Semantic Technology Conference 2008

Wow! Just wow. What a great conference! You (yes you) NEED to go next year. A high signal-to-noise ratio. Lots of smart people (about 1,000 attendees). But the most amazing part, for me, was the content. There's a major paradigm change coming to the software industry, and this is it. So, all those RDF/OWL/Semantic Web links coming and going on the link sites? You should pay more attention to them. Really. Here's why: It will change the way you think (unless you are a Lisp developer, but that's another post ;)).

Now, I'm a skeptical person. I don't believe in vampires or silver bullets and I'm not a big standards fan (useful, sure, but I tend to think they are over-hyped and over-complicated). So the quick glances I gave to the semantic web articles never had a chance to take root in my brain. Attending the Semantic Technology Conference changed things. My first thought was: wow... how could I have missed this? My suspicion is that most software developers are oblivious to the major paradigm earthquake coming. I've confirmed this (at least to myself) by sanity checking with my friends and co-workers over the past week and with some Google Trends searches. My hope here is to convince just a few more people that they need to take a serious look at what's really going on here.

A brief Semantic Web summary:
The main notion of the semantic web is best summarized (IMHO) as: inferencing over an ontology. Sure, there's a lot more than that going on, but that's where it starts. Now it's helpful to realize that (as put forth at the conference by Jim Hendler) this comes in two flavors: "Big O" Ontology and "little o" ontology. The "Big O" flavor is rather rigid and formalized. It is most useful in an enterprise setting where you have full control over its structure and reliable reasoning is critical. At the other end of the spectrum, "little o" ontology is loosely specified, most likely mixed from numerous sources without one place of central control. This variant is flexible, but it is prone to conflicts and can be difficult to reason over. Little o is the world of web mash-ups. Each has pros and cons and devoted followers.
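To make "inferencing over an ontology" a bit more concrete, here is a toy sketch in plain Python. It is not how any real reasoner works internally; it just applies two RDFS-style rules (subclass transitivity and type inheritance) over and over until nothing new turns up. The class and instance names (Dog, Mammal, Animal, rex) and the bare predicate names are made up for illustration.

```python
# Toy forward-chaining inference over a tiny ontology.
# All names here (Dog, Mammal, Animal, rex) are hypothetical examples.

def infer(triples):
    """Apply two RDFS-style rules until no new triples appear."""
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            if p == "subClassOf":
                # Rule 1: subClassOf is transitive.
                for (s2, p2, o2) in facts:
                    if p2 == "subClassOf" and s2 == o:
                        new.add((s, "subClassOf", o2))
            elif p == "type":
                # Rule 2: an instance of a class is also an
                # instance of each of that class's superclasses.
                for (s2, p2, o2) in facts:
                    if p2 == "subClassOf" and s2 == o:
                        new.add((s, "type", o2))
        if new <= facts:  # nothing new was derived, so we are done
            return facts
        facts |= new

# The ontology (structure) and the instances are both just triples.
ontology = {
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
}
instances = {("rex", "type", "Dog")}

inferred = infer(ontology | instances)
print(("rex", "type", "Animal") in inferred)  # True
```

Nobody ever stated that rex is an Animal; the engine worked it out from the class hierarchy. That derived fact is the kind of thing "inferencing" buys you.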

Okay, big deal, you say. Yeah, I agree, this is kinda interesting, maybe powerful, but hardly a paradigm shift. Dig deeper.

The big hint (for me): The ubiquitous talk of "triple stores".

To everyone at the conference, it was a given that you would persist your semantic web data in a triple store (okay, a couple were suggesting anything from 4-8 columns of information). That's one row per attribute instance, folks. In other words, every single row of your N column table (in the normal RDBMS world) would be persisted as N+1 rows, each with (at least) three columns of data: id, column name, value (though in semantic talk, these are subject, predicate, object). I can hear it now. Holy cow, that's a lot of rows. Yup. Billions. Psst... these guys are smart and they've been working on these problems for several years now. Haven't you heard of Sesame/Mulgara?! Yeah, me neither. They know what they are doing (still a bit more to go for large data sets, but they'll get there).
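Here is a rough sketch of what that flattening looks like, in plain Python rather than any real triple store API. The `row_to_triples` helper, the `person` table and the subject naming scheme (`table/id`) are all my own assumptions for illustration. The exact number of triples per row depends on modeling choices, such as whether you store a type triple and how you treat the key column.

```python
def row_to_triples(table, row, key="id"):
    """Flatten one relational row into (subject, predicate, object) triples."""
    # The primary key becomes part of the subject identifier.
    subject = f"{table}/{row[key]}"
    # One extra triple records what kind of thing the row is.
    triples = [(subject, "type", table)]
    for column, value in row.items():
        # Skip the key itself and NULLs; absent facts are simply not stored.
        if column != key and value is not None:
            triples.append((subject, column, value))
    return triples

# A hypothetical row from a hypothetical "person" table.
row = {"id": 42, "name": "Ada", "city": "London"}
triples = row_to_triples("person", row)
for triple in triples:
    print(triple)
```

Each printed tuple corresponds to one row in the triple store, which is where the "billions of rows" come from once you have real data.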

Okay, so they use triple stores. Seems silly and certainly inefficient, maybe... but think about it. They have just taken one of the big headaches of relational databases, inflexible schemas, and made them about as fluid as water. Sure, you've got your ERD and ORM tools to keep your schemas up to date when you make a change. But these guys can change things by a mere insert or delete. Orders of magnitude easier. Whether it's a simple attribute or a relation to another object, they are identical in this world (as they are in an RDBMS, I s'pose): just an insert or two. (I suspect some relationships would be persisted bi-directionally, even if that is done automatically by the inferencing engine.)
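A minimal sketch of that fluidity, using an in-memory Python set as a stand-in for a triple store. The `insert`, `delete` and `query` helpers and all the data are hypothetical; real stores have their own APIs and query languages. The point is only that a brand-new attribute and a relation to another object both arrive the same way, as an insert, with no schema migration.

```python
# An in-memory stand-in for a triple store (illustration only).
store = set()

def insert(s, p, o):
    store.add((s, p, o))

def delete(s, p, o):
    store.discard((s, p, o))

def query(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

insert("person/42", "name", "Ada")

# A new attribute nobody planned for: no ALTER TABLE, just an insert.
insert("person/42", "nickname", "Countess")

# A relation to another object looks exactly like an attribute.
insert("company/7", "name", "Analytical Engines Ltd")
insert("person/42", "employer", "company/7")

# Dropping the attribute again is a plain delete.
delete("person/42", "nickname", "Countess")

print(query(s="person/42"))
```

The trade-off, of course, is that nothing here stops you from inserting nonsense. In this world that kind of checking is the job of the ontology and the reasoner rather than the table definition.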

They separate the structure definitions from the instances of those concepts, and they are both data. That's a very powerful concept. In the end, I think it will feel about as comfortable doing this in OO languages as it was to do OO programming in C (yeah, you can, but neither is designed with that purpose in mind). Couple this with the notion of abstracting out the business rules, and I think we have the ingredients for a new programming language.

Well, that's all for now. Take a look at these things (but take off your OO glasses). Start with RDF, OWL and SPARQL. A good starter book (especially for OO developers) is "Semantic Web for the Working Ontologist" by Dean Allemang and Jim Hendler. I've been reading this a bit every day since I bought it at the conference last week. Thanks for stopping by.