Source :
Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing
Abstract :
Sequential pattern mining is an important data
mining problem with broad application. Most of the
previously developed sequential pattern mining
methods need to scan the database many times. In this
study, STMFP algorithm based on improved FP-tree is
presented for sequential pattern mining. By improving
the FP-tree structure, every node of the tree can store
a set of items instead of one item. After scanning the
sequential database once time, the tree can store all
the sequences. In addition, a novel mining method,
combining nodes from leaf to root which helps mining
sequential patterns, is proposed. The cost of mining
pattern sequence is divided into two parts. One is to
construct STMFP Tree. The cost of this part associates
with the size of sequential database. Another one is to
find random assembled nodes from leaf to root in every
path of STMFP tree. Because the maximal length of
path is bounded by the maximal length of one
transaction, and there are exiting common nodes
which help reduce the number of leaf nodes, so the cost
of this part must be much less than the size of the
database. Compared with other methods which need to
scan the sequential database many times, the cost of
our method must be less than two passes of the
database. Through the whole mining process, it only
needs scan the database once time.
Author/Authors :
Yi Sui , FengJing Shao , RenCheng Sun , JinLong Wang