موذني قهدريجاني، فرشته

Title

Graph-Based Persian Text Summarization with a Focus on Feature Selection

Degree

Master's (M.Sc.)

Specialization

Data Science

Place of study

Isfahan: Isfahan University of Technology

Year of defense

1402

Pagination

xiii, 91 pp.: illustrated, tables, charts

Descriptors

Text mining, keyword extraction, redundancy reduction, preservation of document topics, text summarization

Date of data entry

1402/12/19

Bibliography

Field of study

Computer Engineering

Faculty

Electrical and Computer Engineering

Date of last edit

1402/12/21

IranDoc code

22997744

Persian abstract

In recent years, the volume of generated textual data has been growing at a remarkable rate. One of the main reasons is the effect of technological progress on the operation and speed of content-producing sources. Manually extracting useful information within the required time is a difficult and sometimes impossible task for humans. As a result, users as well as specialists have to devote a great deal of time to reviewing textual content to extract useful information. The fundamental goal of text summarization is to reduce content with the least redundancy while preserving and covering all the topics of the text. To address this problem, various approaches to automatic text summarization have been proposed. In recent years, several studies on Persian text summarization have been carried out, each attempting in some way to cover and preserve more of the text's topics. Among them, only one study has addressed both topic coverage and redundancy reduction, using clustering. However, tackling the two challenges of topic preservation and coverage and of redundancy reduction with a single algorithm means that neither challenge receives sufficient attention. In this research, rich information is extracted from the text by extracting diverse features and representing the document with each of them. Since the beginning and the end of a text usually state the problem and summarize its content, respectively, sentences in these parts are given higher scores, which leads to broader coverage of the text's topics. Another goal of summarization is reducing redundancy, that is, minimizing repeated information in the summary. To this end, the maximal marginal relevance (MMR) algorithm is used to prevent similar sentences from appearing in the summary. A comparison of the efficiency and accuracy of this system with other prominent work in the field shows its superiority in automatic summarization of Persian texts. The system achieved F-measure values of 0.5265, 0.3240, and 0.4911 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively, and improved the ROUGE-1 F-measure by 0.0545.
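The abstract names two sentence-selection ideas: a position boost for sentences near the start and end of the document, and MMR to keep near-duplicate sentences out of the summary. The record does not give the exact formulas, so the Python sketch below only illustrates how the two can be combined, under stated assumptions. Relevance is taken as the cosine similarity between a sentence's term-frequency vector and the whole-document vector. The boost factor (`boost=1.5`) and edge fraction (`edge=0.2`) are hypothetical values. The whitespace `tokenize` helper stands in for real Persian preprocessing. Each step picks the sentence maximizing λ·rel(s) − (1 − λ)·max sim(s, s′) over already-selected s′. This is not the thesis's implementation, which also uses graph-based features that the record does not describe.

```python
import math
from collections import Counter
from typing import List


def tokenize(sentence: str) -> List[str]:
    # Hypothetical placeholder: whitespace split. Real Persian preprocessing
    # (character normalization, stopword removal, stemming) is assumed elsewhere.
    return sentence.split()


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def position_weight(i: int, n: int, edge: float = 0.2, boost: float = 1.5) -> float:
    # Assumed scheme: sentences in the first or last `edge` fraction of the
    # document get their relevance multiplied by `boost`.
    k = max(1, round(edge * n))
    return boost if i < k or i >= n - k else 1.0


def mmr_summary(sentences: List[str], budget: int, lam: float = 0.7) -> List[str]:
    vecs = [Counter(tokenize(s)) for s in sentences]
    doc_vec = sum(vecs, Counter())  # whole-document term frequencies
    n = len(sentences)
    # Relevance = similarity to the whole document, scaled by the position weight.
    relevance = [cosine(v, doc_vec) * position_weight(i, n) for i, v in enumerate(vecs)]

    selected: List[int] = []
    candidates = list(range(n))
    while candidates and len(selected) < budget:
        best, best_score = None, -math.inf
        for i in candidates:
            # MMR: reward relevance, penalize similarity to already-chosen sentences.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            score = lam * relevance[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(selected)]
```

For example, `mmr_summary(["a b c", "a b c", "d e"], budget=2)` keeps one copy of the repeated sentence and returns `["a b c", "d e"]`. With `lam` close to 1 the selection follows relevance alone. Lowering it penalizes overlap with already-chosen sentences more heavily.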

English abstract

In recent years, the amount of textual data being generated has been increasing at a remarkable rate. One of the most important reasons is the impact of technological progress on the performance and speed of content-producing sources. Manually extracting useful information within the required time is difficult and sometimes impossible for humans. Therefore, users as well as experts have to devote a great deal of time to examining textual content to extract useful information. The main goal of text summarization is to reduce content with the least redundancy while preserving and covering all the topics of the text. To address this problem, various approaches to automatic text summarization have been proposed. In recent years, several studies have been conducted on Persian text summarization, each trying in some way to cover and preserve most of the topics of the text. Among them, only one study has addressed both maintaining topic coverage and reducing redundancy, using clustering. However, addressing the two challenges of topic preservation and coverage and of redundancy reduction with a single algorithm means that neither challenge receives sufficient attention. In this research, rich information is extracted from the text by extracting various features and representing the document with each of them. Since the beginning and the end of a text usually present the problem statement and a summary of the content, respectively, sentences in these sections are given higher scores, so that more of the text's topics are covered. Another goal of summarization is to reduce redundancy, that is, to minimize repetition of information in the summary. For this reason, the maximal marginal relevance (MMR) algorithm is used to prevent similar sentences from appearing in the summary.
A comparison of the efficiency and accuracy of this system with other prominent work in the field shows its superiority in automatic summarization of Persian texts. The system obtained F-measure values of 0.5265, 0.3240, and 0.4911 for the ROUGE-1, ROUGE-2, and ROUGE-L metrics, respectively, and improved the ROUGE-1 F-measure by 0.0545.
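The record reports ROUGE-1, ROUGE-2, and ROUGE-L F-measures but does not describe the evaluation toolkit. To illustrate what these numbers measure, the following Python sketch computes them for a single candidate/reference pair. ROUGE-N is computed as clipped n-gram overlap, and ROUGE-L from the longest common subsequence (LCS). Each is combined as the balanced F1 of precision and recall. Whitespace tokenization, a single reference, and sentence-level LCS (rather than the summary-level union-LCS variant) are simplifying assumptions. This is not the evaluation code used in the thesis.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    # Multiset of contiguous n-grams in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f_measure(overlap: int, cand_total: int, ref_total: int) -> float:
    # Balanced F1 of precision (overlap / candidate) and recall (overlap / reference).
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    p = overlap / cand_total
    r = overlap / ref_total
    return 2 * p * r / (p + r)


def rouge_n(candidate: str, reference: str, n: int) -> float:
    c = ngrams(candidate.split(), n)
    r = ngrams(reference.split(), n)
    overlap = sum((c & r).values())  # clipped counts: min of the two multisets
    return f_measure(overlap, sum(c.values()), sum(r.values()))


def lcs_len(a: List[str], b: List[str]) -> int:
    # Classic dynamic-programming longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l(candidate: str, reference: str) -> float:
    a, b = candidate.split(), reference.split()
    return f_measure(lcs_len(a, b), len(a), len(b))
```

For the pair `"the cat sat on the mat"` / `"the cat is on the mat"` the functions give ROUGE-1 ≈ 0.833, ROUGE-2 = 0.6, and ROUGE-L ≈ 0.833. Corpus-level figures such as those in the abstract are usually averages of per-document scores. The record does not state how the thesis aggregated them.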

Supervisor

ناصر قديري مدرس

Advisor

عليرضا بصيري

Examiners

حسين فلسفين , زينب زالي

Link to this record

https://library.iut.ac.ir/dl/search/default.aspx?Term=19302&Field=0&DTC=107