Document number:
8002
Call number:
7437
Author:
Sheydaei, Navid
Title:

Persian Text Mining: Towards Preprocessing and Categorization of Persian News Articles

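Degree level:
Master's (M.Sc.)
Field of study:
Computer Engineering (Software)
Place of study:
Isfahan: Isfahan University of Technology, Department of Electrical and Computer Engineering
Year of defense:
1391 (Solar Hijri; 2012)
Pagination:
x, 99 pp.: illustrations, tables, charts
Note:
Title page in Persian and English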
Supervisor:
Mohammad Hossein Saraee
Descriptors:
Stemming, multi-label classification
Indexing date:
9/7/92
Examiners:
Rasoul Mousavi, Abdolreza Mirzaei
Department:
Electrical and Computer Engineering
IranDoc code:
ID7437
Persian abstract:
In Persian and English: viewable in the digital version
English abstract:
Persian Text Mining: Towards Preprocessing and Categorization of Persian News Articles
Navid Sheydaei, n.sheydaei@ec.iut.ac.ir
Date of Submission: November 2012
Intelligent Databases, Data Mining and Bioinformatics Research Laboratory, Isfahan University of Technology, Isfahan, Iran
Degree: M.Sc. Language: Farsi
Supervisor: Mohammad H. Saraee, Saraee@cc.iut.ac.ir
Abstract: Nowadays the amount of textual information and documentation is growing day by day. E-mails, Web pages, news texts, scientific papers, and so on are only part of this increasing information. This broad body of information contains hidden knowledge. Providing a tool that can effectively and efficiently identify, extract, and manage this vast information and the knowledge hidden within it is essential. One important way to meet this need is to use text mining techniques such as automatic text classification. Text classification assigns documents to one or more predefined categories. Categorizing news documents, Web pages, and e-mails, and filtering, are some of its applications. Given the importance of the topic and the work already done in this field for other languages, classifying Persian documents is essential. In this thesis, different aspects of Persian document classification are investigated. First, the preprocessing and stemming problem is studied, and some solutions to improve Persian preprocessing are proposed. The proposed method examines the morphological structure of the Persian language and, with the help of lookup tables, tries to find the same root for similar words; the results are stored as a list of stemmed words. Then an algorithm for text categorization is proposed. This algorithm belongs to the family of associative classification algorithms. In the proposed method, the frequent items related to each class label are found first, which identifies the words that carry more semantic significance. Then, instead of examining all items during rule generation, only the frequent items for each class label are considered. Frequent items are found with a variant of the Apriori algorithm that first converts the database items to bit form and then applies logical operations to find the frequent items. The proposed algorithm can assign multiple labels to an unknown document, so it can also be regarded as a multi-label classification algorithm. To evaluate the performance of the proposed methods, they are compared with some well-known algorithms in each part. The evaluation results indicate that the proposed methods perform very well.
Keywords: Text Mining, Stemming, Preprocessing, Text Classification, Multi-Label Classification
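
The abstract mentions an Apriori variant that stores items in bit form and finds frequent items with logical operations. The sketch below is one possible reading of that idea, not the thesis's actual algorithm: each item gets an integer bitmap with bit i set when document i contains the item, and the support of an itemset is the number of set bits in the AND of its bitmaps. The function names, the sample tokens, and the `min_support` threshold are all hypothetical; in a per-class setting one would presumably run this on the documents of each class label separately.

```python
from itertools import combinations


def to_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    bitmaps = {}
    for i, items in enumerate(transactions):
        for item in set(items):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def support(bits):
    """Number of transactions represented by a bitmap (count of set bits)."""
    return bin(bits).count("1")


def frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for itemsets with support >= min_support."""
    bitmaps = to_bitmaps(transactions)
    # Level 1: frequent single items.
    current = {
        frozenset([item]): bits
        for item, bits in bitmaps.items()
        if support(bits) >= min_support
    }
    result = {}
    while current:
        for itemset, bits in current.items():
            result[itemset] = support(bits)
        next_level = {}
        for (a, bits_a), (b, bits_b) in combinations(current.items(), 2):
            candidate = a | b
            # Join only pairs that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori pruning: every k-subset of the candidate must be frequent.
            if any(candidate - {x} not in current for x in candidate):
                continue
            # Logical AND keeps the transactions that contain the whole candidate.
            bits = bits_a & bits_b
            if support(bits) >= min_support:
                next_level[candidate] = bits
        current = next_level
    return result


# Hypothetical stemmed tokens from four news documents of one class.
docs = [
    ["economy", "bank", "market"],
    ["economy", "bank"],
    ["sport", "football"],
    ["economy", "market"],
]
itemsets = frequent_itemsets(docs, min_support=2)
```

Because support counting reduces to an AND followed by a bit count, the documents are scanned only once, when the bitmaps are built.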