Descriptors:
Unsupervised learning, anomaly detection, graph neural network, geodesic distance, autoencoder
Persian abstract:
Dimensionality reduction is one of the fundamental steps in data processing and unsupervised learning, and it is used in a variety of tasks, including anomaly detection and clustering. Its goal is to reduce the number of variables or features in the data while preserving their important, essential information. It is often carried out to simplify and preprocess data for downstream tasks such as detecting patterns and hidden structures. For high-dimensional data with a nonlinear manifold structure, dimensionality reduction can be performed more effectively with special metrics. In particular, metrics such as the geodesic distance are suitable, because they can represent nonlinear relationships between data points in a low-dimensional space. Geodesic-distance methods such as Isomap, and similar approaches, use algorithms that discover the manifold structure of the data in the high-dimensional space and represent it in a low-dimensional space. One method used for this purpose is the minimum spanning tree (MST) algorithm. This graph-based algorithm helps approximate the local neighborhood structure and is used to construct distances that preserve the structure between points. The MST-based distance metric is a suitable alternative to the Euclidean distance and can measure nonlinear distances between points. Given these analyses, graph neural networks are used to take the neighborhood of the samples into account, and a regularized graph autoencoder (GAE) is then used to detect anomalous patterns and classify data. Graph neural networks can model and analyze complex, nonlinear relationships between points and are therefore suitable for tasks such as anomaly detection and pattern recognition in high-dimensional data with a nonlinear manifold structure. In light of the above, graph neural networks attract attention as one of the main tools for dimensionality reduction and for the analysis of high-dimensional data with a nonlinear manifold structure. These methods can extract important features from data and make it possible to separate complex patterns and structures. In addition, using a regularized graph autoencoder as a mechanism for anomaly detection and for distinguishing between different patterns in data is highly important.
These methods can serve as powerful tools for preprocessing and analyzing large, complex, high-dimensional data with a nonlinear manifold structure, and they can bring meaningful improvement to tasks such as detecting patterns and anomalous patterns, as well as to prediction and decision-making problems. In this thesis, we used an autoencoder whose layers include both graph neural network layers and simple layers. The loss function of this model has two parts: the first aims to preserve structure in the low-dimensional space, and the second aims to minimize the squared difference between the input and output of the autoencoder. After training, we sorted the squared differences between inputs and outputs and took the first n values as anomalies. By combining graph neural networks and an autoencoder with a geodesic-based loss function, this approach helps improve anomaly detection in data.
English abstract:
Dimensionality reduction is a fundamental step in data processing and unsupervised learning, with applications in tasks such as anomaly detection and clustering. Its goal is to reduce the number of variables or features in the data while retaining the essential information, which simplifies and preprocesses the data for subsequent tasks such as recognizing patterns and hidden structures. For high-dimensional data with a nonlinear manifold structure, dimensionality reduction can be performed more effectively with specific metrics. Metrics such as the geodesic distance are well suited to representing nonlinear relationships between data points in a low-dimensional space. Geodesic-distance methods such as Isomap, and similar approaches, discover the manifold structure of the data in the high-dimensional space and represent it in a lower-dimensional space. One method used for this purpose is the Minimum Spanning Tree (MST) algorithm, a graph-based approach that approximates local neighborhood structure. It is used to construct distances that preserve the structure between points: the MST-based distance metric is a suitable replacement for the Euclidean distance and can measure nonlinear distances between points. Building on these analyses, Graph Neural Networks (GNNs) are used to take the neighborhood of each sample into account, and a regularized Graph Autoencoder (GAE) is then used for anomaly detection and data classification. Graph Neural Networks can model and analyze complex, nonlinear relationships between points, which makes them suitable for tasks such as anomaly detection and pattern recognition in high-dimensional data with a nonlinear manifold structure.
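As a rough illustration of an MST-based distance, the sketch below builds a minimum spanning tree over the complete Euclidean graph with Prim's algorithm and measures the distance between two points as the path length along the tree. This is only one plausible reading of the abstract, which does not specify the exact construction; the function names (`minimum_spanning_tree`, `mst_distances`) and the toy C-shaped data are hypothetical.

```python
import math
from collections import defaultdict


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns adjacency lists {node: [(neighbor, weight), ...]}.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    tree = defaultdict(list)
    for _ in range(n):
        # Pick the node outside the tree with the cheapest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            tree[u].append((parent[u], best[u]))
            tree[parent[u]].append((u, best[u]))
        # Relax edges from the newly added node.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return tree


def mst_distances(points, source):
    """Path length from `source` to every point, measured along the MST."""
    tree = minimum_spanning_tree(points)
    dist = {source: 0.0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in tree[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    return [dist[i] for i in range(len(points))]


# Hypothetical toy data: seven points along a C-shaped curve.
c_shape = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)]
along_tree = mst_distances(c_shape, 0)[6]     # follows the curve
straight = math.dist(c_shape[0], c_shape[6])  # cuts across the gap
```

On this toy curve the two endpoints are close in Euclidean terms but far apart along the tree, which is the kind of nonlinear distance the abstract refers to.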
Against this backdrop, Graph Neural Networks attract attention as one of the main tools for dimensionality reduction and for the analysis of high-dimensional data with a nonlinear manifold structure. These methods can extract important features from data and make it possible to separate complex patterns and structures. In addition, using a regularized graph autoencoder as a mechanism for anomaly detection and for distinguishing between different patterns is highly important. Such methods can serve as powerful tools for preprocessing and analyzing large, complex, high-dimensional data with a nonlinear manifold structure, and they can bring meaningful improvement to tasks such as pattern and anomaly detection, prediction, and decision-making. In this thesis, we used an autoencoder whose layers include both graph neural network layers and simple layers. The loss function of the model has two components: the first preserves structure in the low-dimensional space, and the second minimizes the squared difference between the input and output of the autoencoder. After training, we sorted the squared differences between inputs and outputs and took the top n as anomalies. By combining graph neural networks and an autoencoder with a geodesic-based loss function, this approach helps improve anomaly detection. In summary, the combination of graph neural networks, geodesic distance metrics, and regularized graph autoencoders provides a framework for dimensionality reduction and for analyzing high-dimensional data with a nonlinear manifold structure, with applications to anomaly detection, pattern recognition, and decision-making in complex datasets.
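The two-part loss and the top-n scoring step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the structure term is assumed to be the mean squared gap between embedding distances and target (for example geodesic) distances, the weight `alpha` is hypothetical, and "top n" is assumed to mean the n largest reconstruction errors. The model itself (graph layers, training loop) is omitted; the functions only take its inputs, reconstructions, and embeddings.

```python
import math


def reconstruction_errors(inputs, outputs):
    """Squared difference between each input row and its reconstruction."""
    return [sum((a - b) ** 2 for a, b in zip(x, y))
            for x, y in zip(inputs, outputs)]


def structure_loss(embeddings, target_dist):
    """Mean squared gap between embedding distances and target distances.

    `target_dist[i][j]` is assumed to hold a precomputed (e.g. geodesic)
    distance between samples i and j.
    """
    n = len(embeddings)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            gap = math.dist(embeddings[i], embeddings[j]) - target_dist[i][j]
            total += gap ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


def combined_loss(inputs, outputs, embeddings, target_dist, alpha=1.0):
    """Structure-preservation term plus mean reconstruction error.

    `alpha` is a hypothetical weighting; the abstract does not give one.
    """
    errors = reconstruction_errors(inputs, outputs)
    return alpha * structure_loss(embeddings, target_dist) + sum(errors) / len(errors)


def top_n_anomalies(inputs, outputs, n):
    """Indices of the n samples with the largest reconstruction error."""
    errors = reconstruction_errors(inputs, outputs)
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    return order[:n]


# Hypothetical toy values standing in for a trained model's outputs.
x = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
x_hat = [[0.0, 0.0], [1.0, 1.5], [2.0, 2.0]]
flagged = top_n_anomalies(x, x_hat, 1)
```

Sorting by reconstruction error relies on the usual autoencoder assumption that samples the model reconstructs poorly are the ones that deviate from the structure it learned.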