Author :
Farzanehfar, Hamed
Title :
Presenting Methods for the Stemming Problem in the Persian Language
Degree level :
Master's
Field of study :
Artificial Intelligence
Place of study :
Isfahan: Isfahan University of Technology, Department of Electrical and Computer Engineering
Pagination :
x, 95 p.: illustrations, tables, charts
Note :
Title page in Persian and English
Supervisor :
Mohammad Hossein Saraee, Mohammad Reza Ahmadzadeh
Descriptors :
Text mining, Text classification, Information retrieval, Preprocessing
Indexing date :
18/5/89
Department :
Electrical and Computer Engineering
Persian abstract :
In Persian and English: viewable in the digital version
English abstract :
Abstract: Advances in hardware and software tools have made it easy to store vast amounts of data. The number of text documents grows day by day; e-mail, web pages, news texts, and articles are only part of this growth. Hence there is a need for text mining techniques such as automated methods for text classification and information retrieval. In text mining problems, stemming is one of the most important steps. In linguistics, a stem is a form that unifies the elements of a set of morphologically similar words; stemming is therefore the operation that determines the stem of a given word. In other words, the goal of a stemming algorithm is to reduce variant word forms to a common morphological root, called the stem. Common approaches to stemming include suffix stripping, lookup tables, statistical methods, and hybrid methods. Suffix stripping depends on the morphological structure of the language: the stem is obtained by removing some morphemes from one or both sides of the word. In the lookup table approach, each word and its related stem are stored in some structured form, so the stem of any stored word can be found directly. However, this approach needs more space, and the table must be updated manually for each new word. In statistical methods, rules regarding word formation are formulated through a process of inference based on a corpus. This approach does not require any linguistic knowledge and is entirely independent of the morphological structure of the target language. Stemming is used to improve the performance of text mining techniques. It is also used to reduce the dimensionality of the feature space, because a feature space that includes tens of thousands of words makes the subsequent processing steps infeasible. Different stemming methods have been designed for various languages, each with advantages and disadvantages. Some stemming algorithms have also been proposed for Persian, each with its own advantages and disadvantages, but there is no general method for text mining in Persian that has high performance. To improve stemming in Persian, this thesis presents two new methods. The first method is based on a study of Persian morphological structure. It is a hybrid method that uses a lookup table and automata to find stems. This method is static and lacks flexibility, so it makes some stemming errors. The second approach, like the first, is a hybrid method. Its first step uses a lookup table, and its second step is implemented with a decision tree algorithm. Since learning methods are dynamic, they cover some of these disadvantages. Finally, one of the general Persian stemmers was chosen for performance comparison. This thesis also proposes a complete preprocessing method for Persian documents. Performance is examined using text classification with several common algorithms and information retrieval. The results show that the proposed algorithms perform better in most cases. The studies also conclude that the proposed preprocessing has a very positive effect on the performance of the text classifier and the information retrieval system.
Keywords: Text Mining, Text Classification, Information Retrieval, Stemming, Preprocessing
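The hybrid idea described in the abstract (consult a lookup table first, then fall back to a rule-based step) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the fallback here is plain suffix stripping rather than automata or a decision tree, and the table entries, suffix list, minimum stem length, and function names are all hypothetical. English toy data is used only for readability; no Persian morphological rules are reproduced.

```python
# Hypothetical toy data, not taken from the thesis.
LOOKUP_TABLE = {"went": "go", "mice": "mouse"}  # irregular forms stored with their stems
SUFFIXES = ("ing", "es", "ed", "s")             # illustrative suffix list
MIN_STEM_LEN = 3                                # assumed guard against over-stripping


def strip_suffix(word: str) -> str:
    """Remove the longest matching suffix, if enough of the word remains."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def stem(word: str) -> str:
    """Hybrid stemmer: lookup table first, suffix stripping as the fallback."""
    word = word.lower()
    if word in LOOKUP_TABLE:
        return LOOKUP_TABLE[word]
    return strip_suffix(word)


if __name__ == "__main__":
    for w in ["went", "walking", "boxes", "bus"]:
        print(w, "->", stem(w))
```

The lookup step handles forms that no stripping rule can recover, at the cost of storage and manual updates, while the fallback covers words missing from the table. This is the trade-off the abstract attributes to the two approaches.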