Author :
Mojoudi Renani, Elnaz
Title :
Variable selection in model-based clustering with a regularization approach
Degree level :
Master's
Specialization :
Social and economic
Place of study :
Isfahan : Isfahan University of Technology
Pagination :
x, 107 p. : illustrations, tables, charts
Supervisor :
Reyhaneh Rikhtehgaran
Descriptors :
Model-based clustering , Variable selection , Regularization , Lasso , Gaussian mixture distribution , Bayesian information criterion , Model selection
Examiners :
Mohammad Mohammadi, Zahra Saberi
Date of record entry :
1399/09/29
Date of record edit :
1399/10/03
English abstract :
Variable selection in model-based clustering with a regularization approach
Elnaz Mojoudi Renani, e mojoudi@math.iut.ac.ir
16 September 2020
Master of Science Thesis (in Farsi)
Department of Mathematical Sciences, Isfahan University of Technology, Isfahan 84156-8311, Iran
Supervisor: Dr. Reyhaneh Rikhtehgaran, r rikhtehgaran@iut.ac.ir
2010 MSC: 91C20, 62H30, 68T10, 47A52
Keywords: Model-based clustering, Variable selection, Regularization, Lasso, Gaussian mixture, Bayesian information criterion, Model selection

Abstract: This M.Sc. thesis is based on the following papers:
Celeux G, Maugis-Rabusseau C, Sedki M. Variable selection in model-based clustering and discriminant analysis with a regularization approach. Advances in Data Analysis and Classification. 2019 Mar 8;13(1):259-78.
Maugis C, Celeux G, Martin-Magniette ML. Variable selection in model-based clustering: A general variable role modeling. Computational Statistics & Data Analysis. 2009 Sep 1;53(11):3872-82.
Raftery AE, Dean N. Variable selection for model-based clustering. Journal of the American Statistical Association. 2006 Mar 1;101(473):168-78.

Nowadays, with the advent of the Internet and the advancement of technology, large amounts of data and large numbers of variables are being produced and collected in many fields. Clustering is a useful method for analyzing data with a large number of variables, known as high-dimensional data. The purpose of clustering methods is to identify homogeneous structures among individuals described by variables. Usually no description of, or information about, the pattern and structure of the data is available before clustering, which is a major challenge. One of the most popular and important approaches to data clustering is multivariate model-based clustering. The presence of irrelevant variables that carry no clustering information may obscure the group structure of the data, so these variables need to be removed by applying variable selection methods. Given the importance of this issue, this thesis deals with the problem of variable selection in the framework of model-based clustering with Gaussian mixture models. For this purpose, three variable selection methods and their related algorithms are described and studied.

The first method is the SRUW variable selection method, in which the variable selection problem is treated as a model selection problem. In the SRUW method, a variable can take one of three roles: it can be a relevant clustering variable (S); a redundant variable (U), which is explained through a linear regression model by a subset R ⊆ S of the relevant variables; or an independent variable (W), which is independent of all variables. In the SRUW model, two embedded backward stepwise algorithms are used to detect the sets S, R, U and W. At each step, the algorithm compares two models using the BIC criterion to decide which variables should be removed from or added to the sets S, R, U and W.

An extension of the SRUW variable selection approach is also described. It first ranks the variables using a lasso-like regularization method, applying the lasso penalty to the Gaussian mixture parameters. It then applies the variable selection algorithm of the SRUW approach to these ranked variables to determine the role of each variable. Another variable selection method, called RD, is also described, in which only two possible roles are considered for the variables: they are either relevant clustering variables or irrelevant variables explained by the relevant variables according to a linear regression model. The RD method uses a headlong algorithm. In the first step of this algorithm using a BIC criterion the importance of each variable in clustering