Sadeghi, Elham

Title

Design and Implementation of a Hardware Accelerator Based on the YOLOv3-tiny Model for Real-Time Object Detection

Degree level

Master's (M.Sc.)

Specialization

Electronic Integrated Circuits

Place of study

Isfahan: Isfahan University of Technology

Year of defense

1404

Pagination

xiii, 89 p.: illustrations, tables, charts

Descriptors

Object detection, FPGA, YOLOv3-tiny, Hardware accelerator, Real-time processing, Edge AI

Record entry date

1405/01/18

Bibliography

Field of study

Electrical Engineering

Department

Electrical and Computer Engineering

Record edit date

1405/01/19

IranDoc code

23202532

Abstract (translated from Persian)

As machine vision applications spread across real-time and edge-based systems, the need for efficient implementations of object detection algorithms with low power consumption and low latency is felt more than ever. Deep object detection networks, despite their high accuracy, are difficult to run directly on general-purpose processors in many real-time applications because of their computational complexity and heavy demand for processing resources. FPGA-based hardware accelerators have therefore attracted attention as a flexible, low-power solution. In this thesis, a real-time object detection system based on the YOLOv3-tiny network is designed in hardware and implemented on an FPGA. First, the structure of the YOLOv3-tiny network was analyzed in detail and implemented entirely in software. Simulating the software model identified the computational bottlenecks of the network. To improve the network's efficiency, ideas such as quantization of data and weights, parallelization of the convolution operations, modular design, and memory-access optimization were proposed, and their effectiveness was evaluated using the implemented software model. Once a suitable design had been obtained from the software simulations, a custom hardware architecture was designed to accelerate the network. At this stage, further solutions were proposed to overcome the limits of the available resources. The hardware accelerator was implemented on the Xilinx Kria KV260 board, and the operation of its components, including the data input, filter, processing, and output units, was fully evaluated. The accelerator was validated, and metrics such as detection accuracy, frame rate, hardware resource utilization, and power consumption were examined. The results showed that the proposed architecture reaches a processing speed of 66 frames per second at a power consumption of about 3.4 W while maintaining suitable accuracy, and performs better than comparable studies. In addition, the modular and flexible structure of the proposed architecture allows it to be extended and applied to other convolution-based networks and edge-oriented real-time applications.

English abstract

With the rapid growth of computer vision applications in real-time and edge-based systems, the need for efficient implementation of object detection algorithms with low power consumption and low latency has become increasingly important. Despite their high accuracy, deep object detection networks face significant challenges when deployed directly on general-purpose processors due to their high computational complexity and substantial processing resource requirements. In this context, FPGA-based hardware accelerators have attracted considerable attention as a flexible and energy-efficient solution. In this thesis, a real-time object detection system based on the YOLOv3-tiny network is designed and implemented as a hardware accelerator on an FPGA platform. First, the structure of the YOLOv3-tiny network is thoroughly analyzed and fully implemented in software. By simulating the software model, the computational bottlenecks of the network are identified. To improve network efficiency, techniques such as data and weight quantization, convolution operation parallelization, modular design, and memory access optimization are proposed, and their effectiveness is evaluated using the developed software model. After obtaining a suitable design based on software simulations, a custom hardware architecture is developed to accelerate the network. At this stage, additional strategies are introduced to overcome the limitations of available hardware resources. The proposed hardware accelerator is implemented on the Xilinx Kria KV260 board, and the performance of its main components, including data input, filtering, processing, and output units, is thoroughly evaluated. The accelerator is validated using standard evaluation metrics, including detection accuracy, frame rate, hardware resource utilization, and power consumption. Experimental results demonstrate that the proposed architecture achieves a processing speed of 66 frames per second with a power consumption of approximately 3.4 W, while maintaining acceptable detection accuracy, and delivers superior performance compared to similar works reported in the literature. Moreover, the modular and flexible structure of the proposed architecture enables its extension and adaptation to other convolutional neural networks and real-time edge-oriented applications.
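The two headline figures in the abstract, 66 frames per second and approximately 3.4 W, imply a per-frame time budget and a per-frame energy cost. The following is a minimal sketch of that arithmetic. It assumes the 3.4 W figure is the average power drawn while frames are processed back to back, which the abstract does not state. The function names are illustrative and do not come from the thesis.

```python
# Figures reported in the abstract.
FRAMES_PER_SECOND = 66.0
POWER_WATTS = 3.4  # reported as approximate


def frame_period_ms(fps: float) -> float:
    """Average time between completed frames, in milliseconds."""
    return 1000.0 / fps


def energy_per_frame_mj(power_w: float, fps: float) -> float:
    """Energy spent per frame in millijoules (assumes constant average power)."""
    return 1000.0 * power_w / fps


def frames_per_joule(power_w: float, fps: float) -> float:
    """Throughput normalized by power: frames processed per joule."""
    return fps / power_w


if __name__ == "__main__":
    print(f"frame period:     {frame_period_ms(FRAMES_PER_SECOND):.2f} ms")
    print(f"energy per frame: {energy_per_frame_mj(POWER_WATTS, FRAMES_PER_SECOND):.1f} mJ")
    print(f"frames per joule: {frames_per_joule(POWER_WATTS, FRAMES_PER_SECOND):.1f}")
```

Under these assumptions the budget is roughly 15 ms and 52 mJ per frame. The frame period is a throughput interval: in a pipelined design the end-to-end latency of a single frame can be longer.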

Supervisors

Vahid Ghaffarinia, Hossein Nikaein

Examiners

Hossein Farzanehfard, Masoud Seyedi

Link to this record

https://library.iut.ac.ir/dl/search/default.aspx?Term=20962&Field=0&DTC=107