Evaluation of Dynamic Clustering Algorithms

This page offers supplementary information and data for our work concerning evaluation of design and analysis of algorithms for clustering dynamic graphs.

The technical report on our studies can be found here: Technical Report Archives

A student research project has been conducted as a basis for this work, the elaboration of which can be found here: Studienarbeit Christian Staudt

Instances

All instances we used (except the email network) are contained in the large download further below. In the following we explain them.

Dynamic Clustered Random Graphs

The Generator

At this site our generator for dynamic clustered random graphs can be downloaded, alongside a technical report which acts as both a user's manual and a technical description.

Instances We Used

We used many different setups for the generator, resulting in many different graphs, the two most prominent ones used for illustrations are as follows:

G1 (Figures 5 and 6) was created with the following non-standard parameters: t=2000, n=1000, k=20, eta=10, p_omega=0.001, pin=0.1, pout=0.005. The actual instance is here.

GR01 (Figure 9) was created with the following non-standard parameters: t=10k, n=1k, k=20, pin=0.1, pout=0.002, chi=0.9, nu=0.55, beta=0.5, omega=0.001. The actual instance is here.

Dynamic arXiv Collaboration Graphs

The Collecting Spider

The arXiv API provides an interface to the arXiv database, returning query results in the format of an Atom XML feed. Our arXiv spider is a simple Python program which automates querying the database and parsing the results.

Download: arxivspider.zip

License: GPL

The Scheduler

Our tools for extracting a stream of graph events from arXiv dumps are available here. Note that for outputting intermediate graphs (can be enabled in the source code), the yfiles graph libraries are required, otherwise the corresponding functions will need to be removed from the source files.

Download: scheduler.zip

License: GPL

Dynamic email Correspondence Graphs

These instances are still subject to data privacy issues. But we are confident that soon we will be able to make available an anonymized version of our dynamic email data.

Clustering Algorithms

The source code for our actual evaluation can be obtained here, however, for actual usage, it requires the yfiles libraries. Since this is a (very good) commercial product which we license, we cannot include it for download. The full and a trial version can be obtained from yWorks - The Diagramming Company. Moreover, our evaluation frontend relies on the (commercial) software Mathematica.

Download: dynclust.zip

License: GPL

The above zip-file contains the following:

  • a framework for handling, measuring and operating on clusterings of (yFiles-)graphs (directories base, extension)
  • implementations of a number of clustering algorithms, both static and dynamic, including all algorithms used in submissions (directory dynamicClustering)
  • modules for most of the abovementioned algorithms, which can be plugged into yEd (part of the above)
  • the mathematica source-files which we used for the evaluation (directory dynamicClustering)
  • the source code of the generator for randomized preclustered dynamic graphs (directory FailSafeBranchDynamicGraphGenerator)

For having things built, apart from obtaining the yFiles library, you will have to adjust project properties and build paths in the above, and you will have to enter the Mathematica source files DynamicClustering2.nb and EvaluationFrontend2.nb, to adjust the variable $HOME which holds the above java files. This is necessary for Mathematica to find the Java classes it calls.

Once all is set up, you should compile and run src/de/uka/algo/dynclustering/test/FrontendTest2.java, which ensures that all is compiled and ready for Mathematica to use. If this runs without errors, you can load DynamicClustering2.nb into Mathematica, evaluate initialization cells, choose an input graph (either created by the generator or as a list file with a structure like the above arXiv files hav) and a batch size and hit the button which starts a new evaluation. Then start selecting participating algorithms and measures for your evaluation, and finally hit the run evaluation button. Things may then take a while, but you can observe the progress in your console. Finally, evaluate the cell PlotAll, below the selection mask, und your results are nicely plotted.

Additional Downloads

The following files contain all the material, both inputs and outputs, we used in our evaluations:

  • our generated input instances and Mathematica notebooks with results on them and on the email-graph: Download
  • arxiv graphs, snapshots and Mathematica notebooks with results on them: Download