Apriori Sets And Sequences - Keith's MS Thesis
performance-improvements


Description of feature:

Attempts to recover memory used during candidate generation.

After candidate generation finishes, an explicit call is made to
run garbage collection.

Requesting garbage collection explicitly is usually not recommended.
However, given the large data structures that can be used during
candidate generation, and all the candidates that may be generated but
later removed by the Apriori prune, it seemed a worthwhile idea to
ensure garbage collection was done before support counting. In
practice only about 5% of the memory in use at that point was
recovered, which had little effect on mining. The explicit call often
increased the time it took to mine, since garbage collection can be
slow.
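
The thesis code is not reproduced on this page, so the following is only a minimal sketch of the idea, assuming a Java implementation. The class name PostCandidateGc, the constant GC_AFTER_CANDIDATE_GENERATION, and the stand-in candidate list are all hypothetical and are not taken from the actual system. Only System.gc() and the Runtime memory queries are real library calls. The sketch measures memory in use before and after the explicit request, which is roughly how a recovery figure like the one above could be observed.

```java
import java.util.ArrayList;
import java.util.List;

final class PostCandidateGc {

    // Hypothetical stand-in for the compile-time option; off by default.
    static final boolean GC_AFTER_CANDIDATE_GENERATION = false;

    // Approximate heap memory currently in use, in bytes.
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Requests a collection when enabled and returns the approximate number
    // of bytes recovered. The result can be zero or negative, because
    // System.gc() is only a request and other allocation may happen meanwhile.
    static long collectIfEnabled(boolean enabled) {
        if (!enabled) {
            return 0L;
        }
        long before = usedMemory();
        System.gc();
        return before - usedMemory();
    }

    public static void main(String[] args) {
        // Stand-in for candidate generation: build many small candidates.
        List<int[]> candidates = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            candidates.add(new int[] {i, i + 1, i + 2});
        }
        // Stand-in for the prune step: most candidates are discarded and
        // become garbage.
        candidates.removeIf(c -> c[0] % 10 != 0);

        long recovered = collectIfEnabled(GC_AFTER_CANDIDATE_GENERATION);
        System.out.println("Candidates kept: " + candidates.size());
        System.out.println("Approximate bytes recovered: " + recovered);

        // Support counting would start here.
    }
}
```

Keeping the switch as a constant mirrors the idea of a compile-time option: with the flag off, the collection request is never made and mining proceeds directly to support counting.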

History of this feature:

Since the introduction of event items into Apriori, memory usage has
been a problem. Apriori Sets and Sequences often exhausts the memory
of the machine it is running on. This behavior depends on the data
set being mined, but it prevents running the system on any
significant amount of data: the system would simply run out of memory
before mining was done. There have been many improvements since this
feature was originally implemented.

Testing this feature:

There are no plans to test this feature. It is off by default. It
remains a compile-time option in case someone gets curious about it.

by: Keith A. Pray
Last Modified: July 4, 2004 8:03 AM
© 2004 - 1975 Keith A. Pray.
All rights reserved.
