Text mining is generally defined as the process of applying computational methods to navigate, organize, find and discover information in collections of text written in natural language. It makes unstructured information such as news, website text, blogs and documents in general easier to work with.
Historically, text mining gained importance in the 1990s, with the growth of digital storage and of the internet (web mining). At the same time, analysts began to notice the lack of tools for dealing with all this unstructured information.
An important part of the text mining process is text preparation, whose goal is to store unstructured text in a structured database. This step is necessary before a computational algorithm can be applied.
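As a rough illustration of text preparation, the sketch below turns raw text into a structured table of term counts (a term-document matrix), one common structured form among several. The function names, the simple alphabetic tokenizer and the sample documents are assumptions made for this example, not a method prescribed by the text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_term_document_matrix(documents):
    """Return (vocabulary, rows), with one row of term counts per document."""
    # Count how often each token occurs in each document.
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocabulary = sorted(set().union(*counts))
    # Each row holds one document's counts, aligned with the vocabulary order.
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical sample documents, used only to exercise the sketch.
documents = [
    "Text mining finds information in text.",
    "Web mining grew with the internet.",
]
vocabulary, rows = build_term_document_matrix(documents)
print(vocabulary)
for row in rows:
    print(row)
```

Each row is a fixed-length numeric record, so it could be stored in a database table or passed to a statistical or machine learning algorithm. A realistic pipeline would likely add further steps such as stop-word removal or stemming.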
The challenge in the development of text mining technologies comes from the need for specific knowledge about distinct areas such as statistics, computer science, linguistics and cognitive science.
The key to a complete text mining solution is combining techniques from software engineering, machine learning, information retrieval and data mining.