RogueWolves

Rogue Wolves is the personal site of .

I'm currently a research scientist with Oculus Info Inc. in Toronto, Ontario, Canada.

My research interests include adaptive user interfaces, machine learning, Bayesian reasoning and distributed artificial intelligence.

Rise of the Machine-oriented Web

Stephen Wolfram proposes a new top-level domain (TLD) to form a data web:

But wouldn’t it be nice if there was some standard way to get access to whatever structured data any organization wants to expose? […] My concept for the .data domain is to use it to create the “data web”-in a sense a parallel construct to the ordinary web, but oriented toward structured data intended for computational use. The notion is that alongside a website like wolfram.com (http://www.wolfram.com/), there’d be wolfram.data.

Why have a top-level domain (TLD) rather than, say, a subdomain like data.roguewolves.com or a sitemap-like construct?

Now of course one could just start a convention that organizations should have a “/datamap.xml” file (or somesuch) in the root of their web domains, just like a sitemap-rather than having a whole separate .data site. But I think introducing a new .data top-level domain would give much more prominence to the creation of the data web-and would provide the kind of momentum that’d be needed to get good, widespread, standards for the various kinds of data. […] If a human went to wolfram.data, there’d be a structured summary of what data the organization behind it wanted to expose. And if a computational system went there, it’d find just what it needs to ingest the data, and begin computing with it

This is an interesting idea. Creating a TLD could help promote organizing the web into a human-oriented web and a data web. That would be a good first step toward making data available, but to support machine computation we also need standards to describe the data. Semantic web standards are complex, which has hampered their adoption. More recent proposals such as RDFa and Microformats take a different approach to the data web: rather than maintaining separate human and machine representations of web data, the data is semantically marked up inline with the human-readable content, so the same page can be consumed by both humans and machines. This is a nice approach, but it is mostly appropriate for document-oriented data that is intended for human consumption. Large databases are highly valuable but are rarely exposed in human-consumable form. There is a need for a machine-oriented data web.
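As a rough illustration of why a description standard matters, here is a minimal Python sketch of how a program might read a datamap-style file and learn what datasets a site exposes. No such standard exists as far as I know: the element names, attribute names and file paths below are all invented for this example, and a real format would need far more (licensing, units, update frequency and so on).

```python
import xml.etree.ElementTree as ET

# Hypothetical datamap document. The element and attribute names are
# invented for illustration and do not come from any published standard.
DATAMAP_XML = """\
<datamap>
  <dataset name="products" format="csv" href="/data/products.csv">
    <field name="sku" type="string"/>
    <field name="price" type="decimal"/>
  </dataset>
  <dataset name="stores" format="json" href="/data/stores.json">
    <field name="city" type="string"/>
  </dataset>
</datamap>
"""


def parse_datamap(xml_text):
    """Return a list of dataset descriptions found in a datamap document."""
    root = ET.fromstring(xml_text)
    datasets = []
    for ds in root.findall("dataset"):
        datasets.append({
            "name": ds.get("name"),
            "format": ds.get("format"),
            "href": ds.get("href"),
            # Keep each field's declared type so a consumer knows how to
            # interpret a column before downloading anything.
            "fields": {f.get("name"): f.get("type") for f in ds.findall("field")},
        })
    return datasets


if __name__ == "__main__":
    for ds in parse_datamap(DATAMAP_XML):
        print(ds["name"], ds["format"], ds["href"], ds["fields"])
```

The point of the sketch is that the program never has to scrape a human-oriented page: everything it needs to locate and interpret the data is declared up front, which is exactly what a shared standard would have to guarantee.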

It’s been interesting for me in the past few years to be involved in the emergence of the modern data community. And from what I have seen, I think we’re now just reaching a critical point, where a wide range of organizations are ready to engage in delivering large-scale structured data in standardized forms.

The rise of the machine-oriented web has tremendous potential for data mining, search, automated inference and computation. The advanced reasoning capabilities that can be built on top of it could transform our society. IBM’s Watson, Apple’s Siri and Wolfram Alpha are examples of the advanced capabilities that can be built when large-scale structured data is available.