“Text Mining is the study and practice of extracting information from text using the principles of computational linguistics.”

Sullivan, D., 2000

:: Learn more about the field of text mining at Wikipedia.

 

:: What is The Semantic Web?

The Semantic Web is currently the strongest candidate to become the leading technology of Web 3.0, and many consider it the ‘new wave’ of the internet. Its aim is to move from a network that connects documents to a network that connects information.

Structurally, the Semantic Web can be seen as a layer of semantic definitions beneath the existing internet, one that would make searches less catalog-like and more guide-like, or even provide a basis for systems that reason in a way similar to humans. That’s right: Artificial Intelligence. The Semantic Web promises machines that understand human language when retrieving information, so that users no longer need to know refined search strategies.

The objective of the Semantic Web is to organize the web according to the semantic content of each document, site and portal. On the path to this organization, XML is the de facto standard for data mark-up, and RDF, which can be serialized in XML, is increasingly the accepted standard for representing the semantic schema of web pages. RDF models the semantics of documents with triples. Each of the three parts has its own purpose, analogous to the subject, verb and object of a sentence (in RDF terms: subject, predicate and object), and each resource is identified by its own URI in order to prevent ambiguity.
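To make the triple idea concrete, here is a minimal sketch in Python that builds one statement and prints it as a line of N-Triples, one of the plain-text RDF serializations. The `Triple` class, the helper functions and the `example.org` URI are hypothetical names chosen for illustration; only the Dublin Core `creator` URI is a real vocabulary term. The escaping covers just backslashes and double quotes, so this is not a complete serializer.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement. Subject and predicate are URIs; the object is a URI or a literal."""
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False


def escape_literal(text: str) -> str:
    # Escape backslashes first, then double quotes (simplified: other
    # characters such as newlines are not handled in this sketch).
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(triple: Triple) -> str:
    """Serialize one triple as a single N-Triples line."""
    if triple.obj_is_literal:
        obj = '"' + escape_literal(triple.obj) + '"'
    else:
        obj = "<" + triple.obj + ">"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


# Hypothetical subject URI and author name, for illustration only.
statement = Triple(
    subject="http://example.org/articles/text-mining",
    predicate="http://purl.org/dc/elements/1.1/creator",
    obj="Jane Doe",
    obj_is_literal=True,
)
print(to_ntriples(statement))
```

Read aloud, the statement says "the article at this URI has creator Jane Doe": the URI plays the role of the subject, the `creator` property the role of the verb, and the name the role of the object.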

The Other Side of the Semantic Web

Several tools are already adapting to Web 3.0 by supporting and making use of RDF. However, RDF-enabled sites are few and far between. The problem is how much work it takes to build them manually. Imagine going page by page through a portal, creating documents that model the semantic content of each piece of text!

Cortex’s technology proposes to structure the unstructured data on the Web. We aim at an intelligent system that automatically converts pages and documents to a ‘semantic’ format (i.e. RDF), thereby building the bridge to the Semantic Web. This is what Cortex Intelligence is pursuing. We have an R&D team working to bridge the Semantic Web gap by automatically enriching text with semantic content, both for our own applications and for those coming from the market.
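As a rough picture of what "automatically converting text to RDF" can mean, the toy sketch below turns sentences of the form "X is a Y." into triples. It is purely illustrative and is not Cortex's method: a real system would rely on computational linguistics rather than a single regular expression. The `example.org` namespace and all function names are assumptions made for this example; the `rdf:type` URI is the standard one.

```python
import re
from urllib.parse import quote

BASE = "http://example.org/resource/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Matches sentences shaped like "Paris is a city.": a capitalized name,
# then "is a" or "is an", then a lowercase noun phrase up to the punctuation.
PATTERN = re.compile(r"\b([A-Z][\w ]*?) is an? ([a-z][\w ]*?)[.!?]")


def to_uri(label: str) -> str:
    """Turn a text label into a URI under the hypothetical namespace."""
    return BASE + quote(label.strip().replace(" ", "_"))


def extract_triples(text: str) -> list[tuple[str, str, str]]:
    """Return one (subject, predicate, object) triple per matching sentence."""
    return [(to_uri(s), RDF_TYPE, to_uri(o)) for s, o in PATTERN.findall(text)]


sample = "Paris is a city. The weather was nice. RDF is a data model."
for triple in extract_triples(sample):
    print(triple)
```

Sentences that do not fit the pattern, such as the middle one in the sample, are simply skipped, which hints at why hand-written rules alone do not scale to a whole portal.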