Temporal Data Clustering via Weighted Clustering Ensemble with Different Representations

Temporal data clustering provides underpinning techniques for discovering the intrinsic structure and condensing information over temporal data. In this paper, we present a temporal data clustering framework via a weighted clustering ensemble of multiple partitions produced by initial clustering analysis on different temporal data representations. In our approach, we propose a novel weighted consensus function guided by clustering validation criteria to reconcile initial partitions to candidate consensus partitions from different perspectives, and then introduce an agreement function to further reconcile those candidate consensus partitions to a final partition.

As a result, the proposed weighted clustering ensemble algorithm provides an effective enabling technique for the joint use of different representations, which reduces the information loss in a single representation and exploits various information sources underlying temporal data. In addition, our approach tends to capture the intrinsic structure of a data set, e.g., the number of clusters. Our approach has been evaluated with benchmark time series, motion trajectory, and time-series data stream clustering tasks. Simulation results demonstrate that our approach yields favorable results for a variety of temporal data clustering tasks. As our weighted clustering ensemble algorithm can combine any input partitions to generate a clustering ensemble, we also investigate its limitations by formal analysis and empirical studies.

Existing System:

Although dynamic clustering methods applied to large databases, such as web page collections, yield better clustering, they require additional computation, which increases time complexity. Moreover, when dynamic document clustering is adopted for real-world applications, it may not always yield the desired output. In addition, a dynamic algorithm behaves like a static algorithm during the initial clustering.

Proposed System:

Our objective is an approach for dynamic document clustering based on the structured MARDL technique. First, the documents are clustered statically using the bisecting K-means algorithm. Before bisecting K-means can be applied, all documents must be preprocessed. The preprocessing stage consists of stop-word removal and stemming. In stop-word removal, words that have a negative influence on clustering, such as adverbs and conjunctions, are removed; in stemming, the root of each word is found by removing its prefixes and suffixes.
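The preprocessing stage described above can be sketched in Java as follows. This is only a minimal illustration, not the project's actual code: the class name `PreprocessingSketch`, the tiny stop-word list, and the suffix list are all assumptions made for the example. The stemmer strips a few common English suffixes only; it does not remove prefixes and is not a full stemming algorithm such as Porter's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class PreprocessingSketch {

    // Hypothetical stop-word list (articles, conjunctions, a few adverbs).
    // A real system would use a much larger list.
    static final Set<String> STOP_WORDS = Set.of(
        "a", "an", "and", "are", "but", "in", "is", "of", "or", "the", "to", "very", "quickly");

    // Hypothetical suffixes, checked in this order (longest first).
    static final String[] SUFFIXES = {"ing", "ed", "es", "s"};

    /** Strips the first matching suffix if at least three characters remain. */
    static String stem(String word) {
        for (String suffix : SUFFIXES) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    /** Lowercases, tokenizes on non-letters, drops stop words, and stems the rest. */
    static List<String> preprocess(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (raw.isEmpty() || STOP_WORDS.contains(raw)) {
                continue; // skip empty fragments and stop words
            }
            tokens.add(stem(raw));
        }
        return tokens;
    }
}
```

With these assumed lists, `PreprocessingSketch.preprocess("The documents are clustered and grouped")` returns the tokens `document`, `cluster`, and `group`, which could then be passed to the weighting step.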

After preprocessing, the documents should be grouped into the desired number of clusters, for which the bisecting K-means clustering method is used. In this method, each document is assigned a weight by the term frequency and inverse document frequency (TF-IDF) method, using the cosine similarity measure. After weights are assigned, the documents are first separated into clusters using the K-means method. The largest cluster is then split into two sub-clusters, and this step is repeated until the clusters formed have high similarity.
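The weighting and splitting steps can be sketched as below. This is a hedged illustration under several assumptions that the text does not specify: the class name `BisectingKMeansSketch` is hypothetical, the inverse document frequency is taken as log(N/df), the two seeds of each split are chosen deterministically (the first document and the document least similar to it), each split runs a fixed 10 refinement passes, and splitting stops at a requested number of clusters `k` rather than at a similarity threshold.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

final class BisectingKMeansSketch {

    /** Builds unit-length TF-IDF vectors (term -> weight) for tokenized documents. */
    static List<Map<String, Double>> tfidf(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>(); // number of documents containing each term
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) df.merge(term, 1, Integer::sum);
        }
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Double> v = new HashMap<>();
            for (String term : doc) v.merge(term, 1.0, Double::sum); // raw term frequency
            v.replaceAll((t, tf) -> tf * Math.log((double) docs.size() / df.get(t)));
            vectors.add(normalize(v));
        }
        return vectors;
    }

    /** Scales a sparse vector to unit length; an all-zero vector becomes empty. */
    static Map<String, Double> normalize(Map<String, Double> v) {
        double norm = Math.sqrt(v.values().stream().mapToDouble(x -> x * x).sum());
        Map<String, Double> out = new HashMap<>();
        if (norm > 0) v.forEach((t, w) -> out.put(t, w / norm));
        return out;
    }

    /** Cosine similarity of two unit-length sparse vectors, i.e. their dot product. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        return dot;
    }

    /** Unit-length centroid of the given member documents. */
    static Map<String, Double> centroid(List<Integer> members, List<Map<String, Double>> vecs) {
        Map<String, Double> sum = new HashMap<>();
        for (int i : members) vecs.get(i).forEach((t, w) -> sum.merge(t, w, Double::sum));
        return normalize(sum);
    }

    /** Splits one cluster of document indices in two with 2-means under cosine similarity. */
    static List<List<Integer>> bisect(List<Integer> cluster, List<Map<String, Double>> vecs) {
        Map<String, Double> c1 = vecs.get(cluster.get(0));
        int far = cluster.get(0); // assumed seeding: document least similar to the first one
        for (int i : cluster) {
            if (cosine(c1, vecs.get(i)) < cosine(c1, vecs.get(far))) far = i;
        }
        Map<String, Double> c2 = vecs.get(far);
        List<Integer> left = new ArrayList<>(), right = new ArrayList<>();
        for (int iter = 0; iter < 10; iter++) { // fixed number of passes (assumption)
            left = new ArrayList<>();
            right = new ArrayList<>();
            for (int i : cluster) {
                if (cosine(vecs.get(i), c1) >= cosine(vecs.get(i), c2)) left.add(i);
                else right.add(i);
            }
            // Keep both halves non-empty so every split makes progress.
            if (right.isEmpty()) right.add(left.remove(left.size() - 1));
            if (left.isEmpty()) left.add(right.remove(right.size() - 1));
            c1 = centroid(left, vecs);
            c2 = centroid(right, vecs);
        }
        return List.of(left, right);
    }

    /** Repeatedly splits the largest cluster until k clusters exist or no split is possible. */
    static List<List<Integer>> cluster(List<Map<String, Double>> vecs, int k) {
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < vecs.size(); i++) all.add(i);
        List<List<Integer>> clusters = new ArrayList<>();
        clusters.add(all);
        while (clusters.size() < k) {
            List<Integer> largest = clusters.get(0);
            for (List<Integer> c : clusters) if (c.size() > largest.size()) largest = c;
            if (largest.size() < 2) break; // nothing left to split
            clusters.remove(largest);
            clusters.addAll(bisect(largest, vecs));
        }
        return clusters;
    }
}
```

As a small check, four toy documents with tokens `[cat, dog, pet]`, `[cat, dog, vet]`, `[stock, market, trade]`, and `[stock, market, bank]` are separated by `cluster(tfidf(docs), 2)` into the index groups `[0, 1]` and `[2, 3]`. Replacing the fixed `k` with a test on average within-cluster similarity would be closer to the stopping rule described in the text.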

Modules:

  • Preprocessing
  • Bisecting K-means
  • Proposed Dynamic Algorithm

Tools Used:

Front End: Java