Document number :
4046
Call number :
3824
Author :
Shamsinejad Babaki, Pirooz (شمشي نژاد بابكي، پيروز)
Title :

A new unsupervised feature selection method for text categorization based on the genetic algorithm (ارايه يك روش جديد در انتخاب ويژگي بدون نظارت براي دسته بندي متون مبتني بر الگوريتم ژنتيك)

Degree level :
Master's (M.Sc.)
Field of specialization :
Computer Architecture
Place of study :
Isfahan: Isfahan University of Technology, Department of Electrical and Computer Engineering
Year of defense :
1386 (Solar Hijri)
Pagination :
ten, [121], [II] p.: illustrated, charts
Note :
Title page in: Persian and English
Supervisor :
Mohammad Hossein Saraee (محمد حسين سرائي)
Advisor :
Maziar Palhang (مازيار يالهنگ)
Descriptors :
Morphology, Grammar, Semantics, IR, IE, Fitness function
Indexing date :
18/04/87
Examiners :
Ali Mohammad Doost Hosseini (علي محمد دوست حسيني), Farid Sheikholeslam (فريد شيخ السلام)
Faculty :
Electrical and Computer Engineering
IranDoc code :
ID3824
Persian abstract :
In Persian and English: viewable in the digital version
English abstract :
In recent years, vast amounts of textual information have been collected and stored in various databases around the world, including the Internet, the largest database of all. Text has turned into an important and critical resource for discovering information. From business to the medical and security worlds, whether for catching up with competitors, treating diseases, or preventing threats against national security, the need to extract information from text or to establish textual associations is undeniable. This burgeoning growth of published text means that even the most avid reader cannot hope to keep up with all the reading in a field, and nuggets of insight or new knowledge are at risk of languishing undiscovered in the literature. Text mining offers a solution to this problem by replacing or supplementing the human reader with automatic systems undeterred by the text explosion. It involves analyzing a large collection of documents to discover previously unknown information. The information might be relationships or patterns that are buried in the document collection and that would otherwise be extremely difficult, if not impossible, to discover.
Text clustering is one of the most important areas in text mining. It comprises text preprocessing, dimension reduction by selecting some terms (features), and finally clustering using the selected terms. Feature selection appears to be the most important step in the process. Conventional unsupervised feature selection methods define a measure of the discriminating power of terms in order to select suitable terms from the corpus. The evaluation of terms in groups has not been investigated in the reported works.
In this thesis, a new and robust unsupervised feature selection approach is proposed that evaluates terms in groups. Evaluating terms in groups aims to find terms that, despite low discriminating power on their own, can be combined with other terms for use in clustering. In addition, a new Modified Term Variance measure is proposed for evaluating groups of terms. Furthermore, a genetic-based algorithm is designed and implemented for computing the new measure and finding the final feature vector used by the clustering task. To evaluate and justify the approach, the proposed method and the conventional term variance method were implemented and tested on the Reuters-21578 corpus. The comparison results are very promising and show that the proposed method produces better average accuracy and F1 measure than the conventional term variance method.
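For orientation only, the conventional term variance baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes the common definition in which a term's score is the sum over documents of the squared deviation of its frequency from its mean frequency. The function names `term_variance` and `select_top_terms` and the toy count matrix are hypothetical. The Modified Term Variance measure and the genetic search are not specified in this record, so they are not shown.

```python
from typing import List, Sequence


def term_variance(doc_term: Sequence[Sequence[float]]) -> List[float]:
    """Score each term by the summed squared deviation of its
    per-document frequency from its mean frequency (assumed definition)."""
    n_docs = len(doc_term)
    n_terms = len(doc_term[0])
    scores = []
    for t in range(n_terms):
        column = [row[t] for row in doc_term]  # frequencies of term t
        mean = sum(column) / n_docs
        scores.append(sum((f - mean) ** 2 for f in column))
    return scores


def select_top_terms(doc_term: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k highest-scoring terms, best first."""
    scores = term_variance(doc_term)
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return ranked[:k]


# Hypothetical toy corpus: 4 documents (rows) x 3 terms (columns).
counts = [
    [2, 0, 1],
    [2, 3, 1],
    [2, 0, 1],
    [2, 3, 0],
]

print(term_variance(counts))        # [0.0, 9.0, 0.75]
print(select_top_terms(counts, 2))  # [1, 2]
```

Term 0 occurs equally often in every document, so its score is zero and it is dropped first. Note that this baseline scores each term in isolation, which is the limitation the abstract says the group-based approach addresses.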