GitHub - NimrahMustafa/SocialCommunityBasedCaching

#Introduction

This is a prototype implementation of Bingo, a proactive content caching scheme that leverages the presence of interest groups in online social network, with reference to the following paper:

Social Groups Based Content Caching in Wireless Networks. Nimrah Mustafa, Imdad Ullah Khan, Muhammad Asad Khan, and ZartashAfzal Uzmi. ACM International Symposium on Mobility Management and Wireless Access(MobiWac ’21).

Due to lack of real data as explained in the paper, we generate social group structures which are then in turn used to simulate of a stream of request arrivals at the base station by users who may or may not be members of the generated group structures in order to evaluate the performance of the proposed caching engine.

#Group Structure Generation

We opt for the scale-free (SC) graph model in which the node degrees follow a 'long-tailed' power law distribution as it is well-known that the degree distribution of most social networks (Twitter, Facebook, Whatsapp, etc.) follows a power law distribution.

We model the social community structure among the users as a bipartite graph $G=(U,\mathcal{C},E)$. The vertex bipartitions represent the user set $U$ and community set $\mathcal{C}$ and an edge $(u_i,C_k) \in E$ represents membership of $u_i$ in $C_k$. Clearly, $deg(C_k)=|C_k|$ and $deg(u_i) = |\Lambda_i|$ i.e. the number of communities to which $u_i$ belongs. The sizes of the $K$ communities are sampled from the Pareto distribution given the minimum and maximum sizes, $|C|{min}$ and $|C|{max}$. The edge and user sets are populated using the community affiliation graph model (AGM). Alternately, we could have opted for other benchmark community generation models such as LFR, which generates {\it user-user} networks. We chose the AGM model instead since a {\it user-community} bipartite membership network suffices for our purpose as we only require the community structure and do not use the pairwise links between users. At each step, a new community node $C_k$ is added to $G$. Each of the $|C_k|$ edges is generated in either of two ways, determined by $\lambda\in(0,1)$.

With probability $\lambda$, an existing user $u_i$ in $U$ is connected to $C_k$ (the edge $(u_i,C_k)$ is added to $E$.) This $u_i$ is chosen independently from $U$ with probability proportional to $deg(u_i)$.
With probability $1-\lambda$, a new user $u'$ is added to $U$ and the edge $(u',C_k)$ is added to $E$.

#Request Arrival Simulation

With the community structure in place, the request log is generated as a time series of batches of requests. Each batch of $R$ requests ${r_1,r_2,\cdots,r_R}$, in order of arrival at the base station, is generated by the following process.

Naturally larger communities are more likely to be more 'active', as the chances of some content posted to and within such communities are higher. Each batch of requests is first viewed as a randomly ordered sequence of distinct (community, file) pairs $(C_k,f_j)$. A community $C_k$ is chosen randomly from $\mathcal{C}$ with weight proportional to $|C_k|$.

A file $f_j \in {\mathcal F}$ is chosen randomly weighted by popularity $p_j$ sampled from the Zipf distribution, which is well known for modeling file popularity. $p_j$ is defined as $p_j = \frac{j^{-\alpha}}{\sum_{l \in {\mathcal F}}; l^{-\alpha}}$ where $\alpha \geq 0$ is the steepness parameter of the popularity curve. A smaller value of $\alpha$ implies that fewer files are more popular. ${\mathcal F}$ is a static set of $1000000$ unique files for simulation purposes.

Next, each pair $(C_k,f_j)$ (community $C_k$ interested in file $f_j$) is replaced by $|C_k|$ (user, file) pairs $(u_i,f_j)$ for each $u_i \in C_k$. The first request for a file by a community member is considered to be the time when the file is 'shared' with the community. Requests by individual users, not originating as part of a community, are modelled by introducing additional $\eta \in (0,1)$ $(u_i,f_j)$ pairs where $u_i$ and $f_j$ are selected at random and by popularity, respectively.

We model user request behaviour to resemble the real-life dynamics of group-based communication as follows. Each user $u_i$ has an associated response parameter $a_i$, which models how 'active' the user is as part of a community. A more active user in a community is more likely to request a file sooner.% i.e. respond faster to the content made available to the community than a less active user.

We observe that in reality, few users are very active or inactive, and most users are moderately active. Thus, $\forall i \in [1,N]$, $a_i$ is a random numberdrawn from the normal distribution $\mathcal{N}(0.5,0.15)$. Users are classified into ordinal categories by thresholding the value of the user's response parameter by a constant $\delta$. Users with $a_i<\delta$, $\delta< a_i<1-\delta$ and $a_i >\delta$, respectively show high, moderate, and low activity. Requests by users within each category are ordered randomly.

For realistic interleaving of requests, overlap of two adjacent categories' requests is controlled by the fraction $\gamma$. Finally, the process is repeated for each batch of requests, and the batches are concatenated to form the final request log. The number of requests in each batch can be set corresponding to the time of day to simulate varying amounts of network traffic across the day.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
cachingEngineSimulation.py		cachingEngineSimulation.py
groupStructureGeneration.py		groupStructureGeneration.py
minheap.py		minheap.py
omegaIndex.py		omegaIndex.py
readme.tex.md		readme.tex.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

NimrahMustafa/SocialCommunityBasedCaching

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages