TEXT CLUSTERING BASED ON THE N-GRAMS BY BIO INSPIRED METHOD (IMMUNE SYSTEMS)
Abstract
In this paper we present the results of unsupervised classification (clustering) of unstructured
data, in this case textual documents from the Reuters-21578 corpus, using a new biomimetic
approach based on immune systems. Before applying the immune systems, we converted the
documents into numerical form using the n-gram approach. The novelty lies in the hybridization
of n-grams and immune systems for unsupervised classification. Section 1 gives an introduction
and the state of the art, Section 2 presents the representation of texts based on n-grams,
Section 3 describes the immune-system approach to clustering, Section 4 presents the experiments
and comparative results, and finally Section 5 gives a conclusion and perspectives.
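
To make the n-gram representation concrete, the following is a minimal sketch in Python. It assumes character n-grams taken with a sliding window of size n = 3 and simple lowercasing and whitespace normalization. The abstract does not state whether character or word n-grams are used, nor the value of n, so these choices and the helper name `char_ngrams` are illustrative assumptions rather than the authors' procedure.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a text.

    Hypothetical helper for illustration only: the preprocessing
    (lowercasing, collapsing whitespace) and the default n=3 are assumptions.
    """
    # Lowercase and collapse runs of whitespace into single spaces.
    normalized = " ".join(text.lower().split())
    # Slide a window of width n over the string, one character at a time.
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


# Each document becomes a sparse vector of n-gram frequencies.
profile = char_ngrams("Oil prices rise", n=3)
```

Frequency profiles of this kind could then serve as the numerical input to a clustering method; a text shorter than n characters simply yields an empty profile.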