English Abstract:
Linear manifold clustering in high dimensional spaces by stochastic search

Babak Tahmasebi
Babak tahmasebi@math.iut.ac.ir
March 1, 2016
Master of Science Thesis (in Farsi)
Department of Mathematical Sciences
Isfahan University of Technology, Isfahan 84156-8311, Iran

Supervisor: Dr. Sayed Gahraman Taherian, taherian@cc.iut.ac.ir
Advisor: Dr. Reyhaneh Rikhtegaran, rr ikhtehgaran@cc.iut.ac.ir

2000 MSC: 05C25, 20B25

Keywords: Clustering, Linear manifold, Subspace, Histogram thresholding, Data exploration, Distance histogram, Correlation

Abstract: This M.Sc. thesis is based on the following paper: Linear manifold clustering in high dimensional spaces by stochastic search, European Journal of Combinatorics 30 (2009) 602-616.

Clustering can be defined as the partitioning of a set of points in a multidimensional space into groups, called clusters, such that the points in each group are in some sense similar to one another. Finding these clusters is important because their points correspond to observations of different classes of objects that may have been previously unknown. Classical clustering algorithms are based on the concept that a cluster center is a single point. Clusters that are not compact around a single point are not candidates for classical clustering approaches. Conventional methods cluster data according to similarity (points close to each other) or dissimilarity (points far from each other).

One kind of latent information that may be of interest is the correlations in a data set. A correlation is a linear dependency between two or more features (attributes) of the data set. Within a dataset there may be points that share such a correlation, for example the points of a line or a plane. Knowing that a relationship exists between features may enable us to learn hidden causalities. In this thesis we cluster data according to this special kind of relationship: in this approach, the points lying on a common linear manifold are placed in the same cluster.

Let K be the maximum dimension allowed for a cluster. For k = 1, ..., K, we select a set of k + 1 points from the whole dataset such that they yield a set of k linearly independent vectors. Using the Gram-Schmidt process, we then derive an orthonormal basis for a k-dimensional manifold. Next, we calculate the distances of the remaining points in the dataset from the center of this manifold and build the distance histogram for the dataset. In the next step we employ Kittler and Illingworth's method to find a minimum error threshold in the histogram. The points of the dataset are separated according to this threshold, and the points whose distances are less than it are put into the k-dimensional cluster. If this separation is good enough, the above procedure is repeated for this cluster with k = k + 1 to find higher dimensional clusters, if any, until no more clusters can be found. Therefore, in addition to constructing new clusters, we can find clusters contained within previously found clusters. The process is then restarted for the rest of the dataset with k = 1. Finally, the outliers are collected in a single cluster, which clearly does not form a manifold.
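
The procedure described above can be illustrated with a minimal Python sketch of a single trial: build an orthonormal basis from k + 1 sampled points with Gram-Schmidt, measure each point's distance, and split the distance histogram with a Kittler-Illingworth style minimum error threshold. This is an illustration under stated assumptions, not the thesis implementation. All function names are hypothetical. The distance is taken here as the length of the component orthogonal to the manifold through the first sampled point, which is one reading of the abstract. The bin count and the small within-bin variance term are choices of this sketch. The "good enough" separation test and the repeated stochastic sampling are not specified in the abstract and are left out.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(sample):
    """Gram-Schmidt on the k difference vectors of k + 1 sampled points."""
    origin = sample[0]
    basis = []
    for p in sample[1:]:
        w = [a - o for a, o in zip(p, origin)]
        for b in basis:  # remove components along earlier basis vectors
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-10:
            raise ValueError("sampled points are not linearly independent")
        basis.append([wi / norm for wi in w])
    return origin, basis

def distance_to_manifold(x, origin, basis):
    """Length of the part of x - origin orthogonal to the manifold (assumed metric)."""
    d = [a - o for a, o in zip(x, origin)]
    for b in basis:
        c = dot(d, b)
        d = [di - c * bi for di, bi in zip(d, b)]
    return math.sqrt(dot(d, d))

def min_error_threshold(values, bins=64):
    """Minimum error threshold on a histogram of values; None if no split exists."""
    lo, hi = min(values), max(values)
    if hi <= lo:
        return None  # all values are equal, so there is nothing to split
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    p = [h / len(values) for h in hist]
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    floor = width ** 2 / 12  # assumed within-bin variance, avoids log(0)

    def stats(idx):
        mass = sum(p[i] for i in idx)
        if mass <= 0:
            return None
        mean = sum(p[i] * centers[i] for i in idx) / mass
        var = sum(p[i] * (centers[i] - mean) ** 2 for i in idx) / mass
        return mass, var + floor

    best_j, best_t = math.inf, None
    for t in range(1, bins):
        s1, s2 = stats(range(t)), stats(range(t, bins))
        if s1 is None or s2 is None:
            continue
        (p1, v1), (p2, v2) = s1, s2
        # Criterion J(T) = 1 + P1 ln(var1) + P2 ln(var2) - 2 (P1 ln P1 + P2 ln P2)
        j = (1 + p1 * math.log(v1) + p2 * math.log(v2)
             - 2 * (p1 * math.log(p1) + p2 * math.log(p2)))
        if j < best_j:
            best_j, best_t = j, lo + t * width
    return best_t

# Synthetic data (made up for illustration): a noisy line plus uniform background.
rng = random.Random(0)
s = 1 / math.sqrt(3)
line = [[(-5 + 10 * i / 199) * s + rng.gauss(0, 0.01) for _ in range(3)]
        for i in range(200)]
background = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(200)]
data = line + background

# One trial with k = 1: two sampled points define a candidate 1-dimensional manifold.
origin, basis = orthonormal_basis([line[0], line[-1]])
dist = [distance_to_manifold(x, origin, basis) for x in data]
threshold = min_error_threshold(dist)
cluster = [x for x, d in zip(data, dist) if d < threshold]
```

In this sketch the two sampled points are deliberately taken from the line, so that the trial succeeds. In the stochastic search the k + 1 points are drawn at random, and many trials are needed before a sample falls entirely within one manifold.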