توصيفگر ها :
داده كاوي , زيرگراف كاوي , زيرگراف مكرر , رابط , مجموعه اقلام
چكيده فارسي :
ﯽﻃ ﺳﺎل ﻫﺎي ﮔﺬﺷﺘﻪ داده ﮐﺎوي، ﯾﻌﯽﻨ ﻓﺮآﯾﺪﻨ ﯾﺎﻓﺘﻦ و اﺳﺘﺨﺮاج اﻃﻼﻋتﺎ ﻣﻔﺪﯿ و اﻟﮕﻮﻫﺎي ﺟﺎﺐﻟ زا ﺣﺠﻢ
اﻧﺒﻮﯽﻫ زا داده ﻫﺎ، ﻪﺑ ﻋﻨﻮان ﮏﯾ زﻣﯿﻨﮥ ﺗﺤﻘﯿﻖ و ﺗﻮﺳﻌﻪ دﻫﻨﺪۀ ﺑﺮﻧﺎﻣﻪ ﻫﺎي ﮐﺎرﺑﺮدي رد زﻧﺪﮔﯽ واﻗﻌ،ﯽ ﻣﻮدر ﺗﻮﻪﺟ
.ﻗﺮرا ﮔﺮﻓﺘﻪ اﺳ.ﺖ رد ﻋﻠﻢ داهد ﺑﯿﺸﺘﺮ داده ﻫﺎﯽﯾ ﻪﮐ ﺎﺑ آﻧﺎﻬ ﺳﺮوﮐرﺎ دارﯾﻢ ﻣﯽ ﺗﻮاﻧﺪﻨ رد ﻗﺎﺐﻟ ﮏﯾ ﮔﺮفا ﻣﺪل ﺷﻮﺪﻧ
ﮔﺮاف ﺎﻫ رد ﻣﺪل ﺳﺎيز و ﻧﻤﺎﯾﺶ اﻃﻼﻋتﺎ ﺑﺴﯿﺎر ﻗﺎﻞﺑ اﻫﻤﯿﺖ ﻫﺴﺘﻨ.ﺪ ﮏﯾ ﮔﺮفا ﻪﺑ ﻋﻨﻮان ﮏﯾ ﺳﺎﺧﺘرﺎ ﮐﻠﯽ
زا داده ﻫﺎ، ﻣﯽ ﺗﻮاﻧﺪ ﺑﺮيا ﻣﺪل ﺳﺎيز ﺑﺴﯿﺎير زا رواﺑﻂ ﭘﯿﭽﯿهﺪ رد ﺑﯿﻦ داده ﺎﻫ اﺳﺘﻔﺎده ﺷﻮ.د ﮐﺎرﺑدﺮ اﺻﯽﻠ ﻧﻈﺮﯾﮥ
ﮔﺮفا رد داده ﮐﺎوي، ﮔﺮاف ﮐﺎيو اﺳ.ﺖ ﯾﮑﯽ زا ﻣﻬﻤﺘﺮﯾﻦ ﻣﻔﺎﻫﻢﯿ رد ﮔﺮاف ﮐﺎيو ﭘﯿاﺪ ﮐﺮند زﯾﺮﮔﺮاف ﻫﺎي ﺗﮑﺮاير ﺎﯾ
.ﻣﮑرﺮ اﺳ.ﺖ اﯾﻦ ﮔﺮاف ﺎﻫ ار ﻣﯽ ﺗﻮنا ﺎﺑ اﺳﺘﻔﺎده زا اﻟﮕﻮرﯾﺘﻢ ﺎﻫ و روش ﻫﺎي ﻣﻮﺟﻮد رد ﻧﻈﺮﯾﮥ ﮔﺮاف، اﺳﺘﺨﺮاج ﮐﺮد
ﻣﻬﻤﺘﺮﯾﻦ ﻣﺰﺖﯾ اﺳﺘﻔﺎده زا زﯾﺮﮔﺮاف ﺎﻫ ﺳﺮﺖﻋ ﺑﺨﺸﯿنﺪ ﻪﺑ ﺟﺴﺘﺠﻮ و ﯾﺎﻓﺘﻦ ﺷﺒﺎﻫﺖ ﺎﻫ و ﻣﺸﺨﺼتﺎ ﮔﺮاف ﺎﻫ و
.ﻃﺒﻘﻪ ﺑﻨيﺪ آﻧﺎﻬ اﺳﺖ
اﻟﮕﻮرﯾﺘﻢ ﻫﺎي داده ﮐﺎيو رد ﮔﺮفا ﻪﺳ روﯾﮑدﺮ ﯾﺎدﮔﯿﺮي ﻣﺨﺘﻠﻒ ار دﻧﺒﺎل ﻣﯽ ﮐﻨ.ﺪ ﯾﺎدﮔﯿﺮي ﺎﺑ ﻧﻈﺎر،ت ﻪﮐ ﺎﺑ
ﻣﺠﻤﻮﻋﻪ ﻫﺎﯽﯾ زا ﻣﺜﺎل ﺎﻫ ﺎﺑ ﺑﺮﭼﺴﺐ ﻣﺸﺺﺨ ﮐﺎر ﻣﯽ ﮐﻨ.ﺪ )ﺑﺮﭼﺴﺐ ﺎﻫ ﻣﯽ ﺗﻮاﻧﺪﻨ ارشز اﺳﯽﻤ رد ﺣﺎﺖﻟ ﻃﺒﻘﻪ ﺑﻨيﺪ
و ارشز ﻋﺪيد رد ﺣﺎﺖﻟ رﮔﺮﺳﯿﻮن داﺷﺘﻪ ﺑﺎﺷﺪ.( ﯾﺎدﮔﯿﺮي ﺑﺪنو ﻧﻈﺎر،ت ﻪﮐ ﺑﺮﭼﺴﺐ ﻫﺎي ﻧﻤﻮﻧﻪ ﺎﻫ رد ﻣﺠﻤﻮﮥﻋ
.داهد ﻧﺎﻣﺸﺺﺨ اﺳ؛ﺖ و اﻟﮕﻮرﯾﻢﺘ ﺗﻼش ﻣﯽ ﮐﻨﺪ ﻪﮐ ﻧﻤﻮﻧﻪ ﺎﻫ ار ﺮﺑ اﺳسﺎ ﺷﺒﺎﻫﺖ وﯾﮋﮔﯽ ﻫﺎﯾﺸنﺎ ﮔﺮوه ﺑﻨيﺪ ﮐﻨﺪ
رد ﻧﻬﺎﯾﺖ ﯾﺎدﮔﯿﺮي ﻧﯿﻤﻪ ﻧﻈﺎرﺗﯽ، زﻣﺎﻧﯽ اﺳﺘﻔﺎده ﻣﯽ ﺷﻮد ﻪﮐ زﯾﺮﻣﺠﻤﻮﻋﻪ ﻫﺎي ﮐﻮﭼﮑﯽ زا ﻧﻤﻮﻧﻪ ﻫﺎي ﺑﺮﭼﺴﺐ دار ﺎﺑ
.ﺗﻌﺪاد زﯾﺎدي زا ﻧﻤﻮﻧﻪ ﻫﺎي ﺑﺪنو ﺑﺮﭼﺴﺐ ﻣﻮﺟﻮد ﺑﺎﺪﺷ
ﮏﯾ ﺗﺎﻊﺑ اﺳ.ﺖ ﺑﺮيا 2 I ﻣﺠﻤﻮﻪﻋ اﻗمﻼ و I ﮔ،ﺮفا ﺑﺮيا ﺳﻪ ﺗﺎﯽﯾ
ار راﻂﺑ ﻣﯽ ﻧﺎﻣﯿﻢ اﮔﺮ )(1 زﯾﺮﮔﺮاف اﻟﻘﺎﯽﯾ ﮏﯾ زﯾﺮﻣﺠﻤﻮﻋﮥ رﺋسﻮ
. ﺎﯾ ﻧﺎﻫﻤﺒﻨﺪ ﺑﺎﺪﺷ ﺎﯾ ﻫﻤﺒﻨﺪ ﺑﺎﺪﺷ )(2 ﺑﺮيا
ﺗﻤﺮﮐﺰ اﺻﯽﻠ اﯾﻦ ﭘﺎﯾﺎن ﻧﺎﻣﻪ، ﺑﯿنﺎ اﻟﮕﻮرﯾﺘﻢ ﻫﺎﯽﯾ ﺑﺮيا ﯾﺎﻓﺘﻦ راﺑﻂ ﻫﺎﺖﺳ ﻪﮐ رد نآ زا زﯾﺮﮔﺮاف ﻣﮑرﺮ اﺳﺘﻔﺎده ﺷﺪه است.
چكيده انگليسي :
Dataminingiscomprisedofmanydataanalysistechniques. Itsbasicobjectiveistodiscoverthehiddenandusefuldata
patternfromverylargesetofdata. Graphmining,whichhasgainedmuchattentioninthelastfewdecades,isoneofthe
novelapproachesforminingthedatasetrepresentedbygraphstructure. Graphminingfindsitsapplicationsinvarious
problem domains, including: bioinformatics, chemical reactions, Program flow structures, computer networks, social
networks etc. Different data mining approaches are used for mining the graph based data and performing useful
analysis on these mined data.
Graph mining techniques have been categorized into following groups. (1) Graph clustering; is the task of grouping
the vertices of the graph into clusters taking into consideration the edge structure of the graph in such a way that there
should be many edges within each cluster and relatively few between the clusters? Graph clustering in the sense of
grouping thevertices of a given input graph into clusters graph clustering is based on unsupervised learning technique
in which the classes are not known in prior to clustering. The graph clusters are formed based on some similarities
in the underlying graph structured data graph. (2) Graph Classification; in graph classification the main task is to
classify separate, individual graphs in a graph database into two or more categories/classes. Classification is based
on supervised/semi supervised learning technique in which the classes of the data are defined in prior. (3) Sub graph
mining; sub graph is a graph whose vertices and edges are subsets of another graph. The frequent sub graph mining
problem is to produce the set of sub graphs occurring in at least some given threshold of the given n input example
graphs.
In this thesis,is considered the graph mining problem of enumerating what we call connectors. Suppose that given a
data set (G,I,σ) that consists of a graph , an item set I , and a function
we define . A vertex subset X is called a connector if (i) the subgraph G[X] induced from
G by X is connected; and (ii) for any ] is disconnected or .
To enumerate all connectors,it has been proposed a novel algorithm named COOMA (components overlaid mining
algorithm). The algorithm mines connectors by “overlaying” an already discovered connector on a certain subgraph
of G iteratively. By overlaying, it mean taking an intersection between the connector and connected components
of a certain induced subgraph. Interestingly, COOMA is a total-polynomial time algorithm, i.e., the running time is
polynomially bounded with respect to the input and output size.