Commit

Updates
burcuku committed Oct 7, 2024
1 parent 445e82d commit ff502c6
Showing 2 changed files with 26 additions and 11 deletions.
12 changes: 6 additions & 6 deletions index.html
@@ -576,22 +576,22 @@ <h2>Contents and tentative schedule</h2>
<tr class="even">
<td align="left">6</td>
<td align="left">11/10</td>
<td align="left"><a href="streaming.html">Stream processing</a></td>
<td align="left"><a href="graphs.html">Graph Processing</a></td>
<td align="left">BKO</td>
<td align="left">Flink (02/11)</td>
<td align="left"></td>
</tr>
<tr class="odd">
<td align="left">7</td>
<td align="left">16/10</td>
<td align="left"><a href="streaming.html">Stream processing
systems</a></td>
<td align="left"><a href="streaming.html">Stream processing</a></td>
<td align="left">BKO</td>
<td align="left"></td>
<td align="left">Flink (02/11)</td>
</tr>
<tr class="even">
<td align="left">7</td>
<td align="left">18/10</td>
<td align="left"><a href="graphs.html">Graph Processing</a></td>
<td align="left"><a href="streaming.html">Stream processing
systems</a></td>
<td align="left">BKO</td>
<td align="left"></td>
</tr>
25 changes: 20 additions & 5 deletions spark.html
@@ -11,7 +11,7 @@

<meta name="author" content="Georgios Gousios and Burcu Kulahcioglu Ozkan" />

<meta name="date" content="2023-10-09" />
<meta name="date" content="2024-10-06" />

<title>Spark: RDDs and Pair RDDs</title>

@@ -501,7 +501,7 @@

<h1 class="title toc-ignore">Spark: RDDs and Pair RDDs</h1>
<h4 class="author">Georgios Gousios and Burcu Kulahcioglu Ozkan</h4>
<h4 class="date">09 October 2023</h4>
<h4 class="date">06 October 2024</h4>

</div>

@@ -541,12 +541,27 @@ <h2>Map/Reduce</h2>
<p>Map/Reduce is a general computation framework, loosely based on
functional programming<span class="citation">[1]</span>. It assumes that
data exists in a key/value (K/V) store.</p>
<p>The <code>map</code> and <code>reduce</code> functions supplied by
the user have associated types:</p>
<ul>
<li><code>map((K1, V1), f: (K1, V1) -&gt; (K2, V2)): List[(K2, V2)]</code></li>
<li><code>reduce((K2, List[V2])): List[(K3, V3)]</code></li>
<li><code>map(K1, V1) -&gt; List[(K2, V2)]</code></li>
<li><code>reduce((K2, List[V2])) -&gt; List[(K3, V3)]</code></li>
</ul>
<p>Map/Reduce as a system was proposed by Dean &amp; Ghemawat <span class="citation">[2]</span>, along with the GFS.</p>
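<p>The two signatures above can be sketched on a single machine with plain Scala collections. This is only an illustration of the programming model, not of the distributed system: the names <code>MapReduceSketch</code>, <code>mapReduce</code> and <code>wordCount</code> are hypothetical, and the grouping step stands in for the shuffle that a real implementation performs across machines.</p>

```scala
// A single-machine sketch of the Map/Reduce model using plain Scala collections.
// `MapReduceSketch`, `mapReduce` and `wordCount` are illustrative names (assumptions),
// not part of Spark or of any Map/Reduce library.
object MapReduceSketch {

  // Apply `mapper` to every input pair, group the emitted values by key
  // (the stand-in for the shuffle), then apply `reducer` to each (key, values) group.
  def mapReduce[K1, V1, K2, V2, K3, V3](
      input: List[(K1, V1)],
      mapper: (K1, V1) => List[(K2, V2)],
      reducer: (K2, List[V2]) => List[(K3, V3)]
  ): List[(K3, V3)] = {
    val mapped: List[(K2, V2)] = input.flatMap { case (k, v) => mapper(k, v) }
    val grouped: Map[K2, List[V2]] =
      mapped.groupBy(_._1).map { case (k, pairs) => (k, pairs.map(_._2)) }
    grouped.toList.flatMap { case (k, vs) => reducer(k, vs) }
  }

  // Word count: K1 = document name, V1 = document text, K2 = K3 = word, V2 = V3 = count.
  def wordCount(docs: List[(String, String)]): Map[String, Int] =
    mapReduce[String, String, String, Int, String, Int](
      docs,
      (_, text) => text.split("\\s+").toList.filter(_.nonEmpty).map(w => (w, 1)),
      (word, ones) => List((word, ones.sum))
    ).toMap
}
```

<p>Calling <code>MapReduceSketch.wordCount</code> on a list of (name, text) pairs returns one count per distinct word. The order in which groups reach the reducer is unspecified here, which mirrors the model: a reducer should not depend on the order of keys.</p>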
</div>
<div id="examples-for-mapreduce-use-cases" class="section level2">
<h2>Example Map/Reduce use cases</h2>
<ul>
<li><p>Word count: the number of occurrences of each word in a large set of
documents, producing <code>(word, N)</code></p></li>
<li><p>URL access frequency: the number of requests per URL, producing
<code>(targetURL, N)</code></p></li>
<li><p>Reverse web-link graph: the pages that link to each target URL, producing
<code>(targetURL, List&lt;sourceURLS&gt;)</code></p></li>
<li><p>Filtering records</p></li>
<li><p>Top <code>k</code> records</p></li>
</ul>
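<p>Three of the use cases listed above can be sketched with plain Scala collections, where <code>map</code> plays the role of the map phase and <code>groupBy</code> followed by a per-key aggregation plays the role of the reduce phase. The object and method names (<code>UseCaseSketch</code>, <code>reverseLinks</code>, <code>accessFrequency</code>, <code>topK</code>) and the input shapes are assumptions made for illustration only.</p>

```scala
// Single-machine sketches of three Map/Reduce use cases.
// All names and input shapes below are illustrative assumptions.
object UseCaseSketch {

  // Reverse web-link graph: input pairs are (sourceURL, targetURL).
  // Map emits (target, source); reduce collects the sources of each target.
  def reverseLinks(links: List[(String, String)]): Map[String, List[String]] =
    links
      .map { case (source, target) => (target, source) }
      .groupBy(_._1)
      .map { case (target, pairs) => (target, pairs.map(_._2).sorted) }

  // URL access frequency: map emits (url, 1) per log entry; reduce sums per URL.
  def accessFrequency(log: List[String]): Map[String, Int] =
    log
      .map(url => (url, 1))
      .groupBy(_._1)
      .map { case (url, ones) => (url, ones.map(_._2).sum) }

  // Top k records: sort by descending count (ties broken by key) and keep k.
  def topK(counts: Map[String, Int], k: Int): List[(String, Int)] =
    counts.toList.sortBy { case (key, n) => (-n, key) }.take(k)
}
```

<p>Sorting the whole collection in <code>topK</code> is the simplest formulation. In a distributed setting one would more likely keep only the best <code>k</code> records per partition and merge those, but that is outside the scope of this sketch.</p>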
</div>
<div id="mapreduce-execution-overview" class="section level2">
<h2>Map/Reduce execution overview</h2>
<ul>
@@ -889,7 +904,7 @@ <h2 class="unnumbered">Transformations on Pair RDDs</h2>
<h2 class="unnumbered">Pair RDD examples: <code>groupByKey</code> and
<code>reduceByKey</code></h2>
<div class="sourceCode" id="cb17"><pre class="sourceCode scala"><code class="sourceCode scala"><span id="cb17-1"><a href="#cb17-1" tabindex="-1"></a><span class="kw">val</span> odyssey <span class="op">=</span> sc<span class="op">.</span><span class="fu">textFile</span><span class="op">(</span><span class="st">&quot;sample.txt&quot;</span><span class="op">).</span><span class="fu">flatMap</span><span class="op">(</span>_<span class="op">.</span><span class="fu">split</span><span class="op">(</span><span class="st">&quot; &quot;</span><span class="op">))</span></span>
<span id="cb17-2"><a href="#cb17-2" tabindex="-1"></a><span class="kw">val</span> words <span class="op">=</span> odyssey<span class="op">.</span><span class="fu">flatMap</span><span class="op">(</span>_<span class="op">.</span><span class="fu">split</span><span class="op">(</span><span class="st">&quot; &quot;</span><span class="op">)).</span><span class="fu">map</span><span class="op">(</span>c <span class="op">=&gt;</span> <span class="op">(</span>c<span class="op">,</span> <span class="dv">1</span><span class="op">))</span></span></code></pre></div>
<span id="cb17-2"><a href="#cb17-2" tabindex="-1"></a><span class="kw">val</span> words <span class="op">=</span> odyssey<span class="op">.</span><span class="fu">map</span><span class="op">(</span>c <span class="op">=&gt;</span> <span class="op">(</span>c<span class="op">,</span> <span class="dv">1</span><span class="op">))</span></span></code></pre></div>
<p>Word count using <code>groupByKey</code>:</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode scala"><code class="sourceCode scala"><span id="cb18-1"><a href="#cb18-1" tabindex="-1"></a><span class="kw">val</span> counts <span class="op">=</span> words<span class="op">.</span><span class="fu">groupByKey</span><span class="op">()</span> <span class="co">// RDD[(String, Iterable[Int])]</span></span>
<span id="cb18-2"><a href="#cb18-2" tabindex="-1"></a> <span class="op">.</span><span class="fu">map</span><span class="op">(</span>row <span class="op">=&gt;</span> <span class="op">(</span>row<span class="op">.</span>_1<span class="op">,</span> row<span class="op">.</span>_2<span class="op">.</span>sum<span class="op">))</span> <span class="co">// RDD[(String, Int)]</span></span>
