Author :
Bagheri, Ayoub
Title :
Implementation and Improvement of Algorithms for the Feature Selection Problem in Text Classification
Degree level :
Master's (M.Sc.)
Field of specialization :
Artificial Intelligence and Robotics
Place of study :
Isfahan: Isfahan University of Technology, Department of Electrical and Computer Engineering
Pagination :
x, 91 pp.: illustrations, tables, charts
Note :
Title page in Persian and English
Supervisor :
Mohammad Hossein Saraee
Advisor :
Maziar Palhang
Descriptors :
Simulated annealing algorithm, modified mutual information
Indexing date :
21/11/88
Faculty :
Electrical and Computer Engineering
Persian abstract :
In Persian and English: viewable in the digital version
English abstract :
Implementation and Improvement of Some Algorithms for the Feature Selection Problem in Text Classification
Ayoub Bagheri (a.bagheri@ec.iut.ac.ir)
Date of Submission: November 2009
Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
Degree: M.Sc.
Language: Farsi
Supervisor: Mohammad H. Saraee (Saraee@cc.iut.ac.ir)
Advisor: Maziar Palhang (palhang@cc.iut.ac.ir)

Abstract
Advances in software and hardware have made it easy to store large amounts of data, and the number of text documents grows day by day: e-mail, web pages, news and articles are only part of this growth. Hence the need for text mining techniques such as automatic text classification. In automatic text classification, feature selection is one of the most important steps. Feature selection is used to reduce the dimensionality of the feature space, which contains tens of thousands of words and would otherwise make the subsequent stages of the system infeasible. Various feature selection methods for text data have been designed, each with advantages and disadvantages, but no single method that suits text classification systems in general and is also highly effective has been introduced. To improve text classification performance, this thesis presents two new feature selection methods. The first is based on the simulated annealing algorithm. Simulated annealing requires an appropriate fitness function, so document frequency is used to evaluate the solution at each iteration; as a fitness function, document frequency has low computational cost. The second method is based on mutual information, and we call it modified mutual information. Finally, the proposed methods are compared with chi-square, correlation coefficient, simple chi-square, information gain, mutual information, document frequency and term frequency variance on a collection of Persian texts. The proposed methods perform better in most cases. Among the methods studied, chi-square and correlation coefficient are comparable to the proposed methods. The results also show that the method based on simulated annealing outperforms the modified mutual information method. The proposed algorithms have their lowest performance on the political class and their highest on the sport class.

Keywords: Text Classification, Feature Selection, Simulated Annealing Algorithm, Modified Mutual Information
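
The first proposed method, simulated annealing with document frequency as the fitness function, can be illustrated with a minimal Python sketch. The abstract does not give the neighbourhood move, the cooling schedule, or how document frequency scores a candidate feature subset, so everything below is an assumption made for illustration: the fitness is taken to be the summed document frequency of the selected terms, a move swaps one selected term for one unselected term, and the temperature decays geometrically. The names `document_frequency`, `fitness` and `anneal_select` are hypothetical and do not come from the thesis.

```python
import math
import random


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    return df


def fitness(subset, df):
    """Assumed fitness: summed document frequency of the selected terms."""
    return sum(df[t] for t in subset)


def anneal_select(docs, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Pick k terms by simulated annealing over fixed-size term subsets."""
    rng = random.Random(seed)
    df = document_frequency(docs)
    vocab = sorted(df)
    if k >= len(vocab):
        return set(vocab)

    current = set(rng.sample(vocab, k))
    cur_fit = fitness(current, df)
    best, best_fit = set(current), cur_fit
    temp = t0

    for _ in range(steps):
        # Neighbour move (assumed): swap one selected term for an unselected one.
        out_term = rng.choice(sorted(current))
        in_term = rng.choice([t for t in vocab if t not in current])
        delta = df[in_term] - df[out_term]

        # Accept non-worsening moves; accept worse ones with prob. exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current.remove(out_term)
            current.add(in_term)
            cur_fit += delta
            if cur_fit > best_fit:
                best, best_fit = set(current), cur_fit

        temp *= cooling  # geometric cooling schedule (assumed)

    return best
```

Calling `anneal_select(docs, k)` on a list of tokenised documents returns a set of `k` terms. Because this assumed fitness is additive, a plain sort by document frequency would reach the same optimum; the sketch only shows the shape of the annealing loop, into which a less trivial subset-level fitness could be substituted.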