Institut für Theoretische Informatik, Algorithmik

Skeleton-based Clustering in Big and Streaming Social Networks

Project Members

Prof. Dr. Ulrik Brandes (Universität Konstanz), Michael Hamann, M.Sc. (KIT), Mark Ortmann, M.Sc. (Universität Konstanz), Prof. Dr. Dorothea Wagner (KIT)

Funding

Subproject of the Priority Programme 1736 Algorithms for Big Data of the Deutsche Forschungsgemeinschaft (German Research Foundation).

Summary

We intend to devise novel methods to cluster large-scale static and dynamic online social networks. Our approach is based on skeleton structures that represent and amplify variation in local cohesion, and that are defined locally to facilitate efficient computation. In addition to simplifying the clustering problem, they shall also provide a novel understanding of community dynamics, capturing more directly the agency of social actors. Finding patterns in online social relationships and interactions is one of the most prominent applications of big data today, largely driven by the relative ease of data collection and its economic and political value.

While this process-generated data is massive, it stems from the actions of individuals with bounded knowledge of the entire system. Our working hypothesis is therefore that local structures capture major trends to an extent sufficient to inform algorithms for partitioning or overlapping clusters. Moreover, we expect the integration of individual attribute data to boost empirical relevance, and the definition of persistent skeleton structures to yield more reliable concepts of community dynamics. Combining sparsification with imputation and nodal attributes, we will advance recent approaches to locally-determined skeleton structures and study their utility for novel and approximate graph clustering algorithms that scale to big data. We are aiming at algorithms with nearlinear or even sub-linear worst-case running times by exploiting special characteristics of social networks or settling for approximate results. While overall algorithmic efficiency is important, we will place additional emphasis on algorithm engineering to increase and exploit locality. Consideration of streaming data will be a necessary requirement as online social networks typically generate sequences of dyadic events. Due to the pervasive nature of clustering problems, the expected outcome of this project includes tools for big data analysis in other application areas.