VOOZH about

URL: https://www.infoq.com/news/2016/09/JavaOne-2016-parallel-streams/

โ‡ฑ JavaOne 2016 - Day 2 "Thinking in Parallel" - InfoQ


BT

InfoQ Homepage News JavaOne 2016 - Day 2 "Thinking in Parallel"

JavaOne 2016 - Day 2 "Thinking in Parallel"

Sep 26, 2016 3 min read

Write for InfoQ

Feed your curiosity. Help 550k+ global
senior developers
each month stay ahead.
Get in touch

Once again day two at JavaOne featured live coverage of four session rooms. Links to the videos for those sessions are provided at the end of this article.

InfoQ attended the session โ€˜Thinking in Parallelโ€™, which we now discuss (see video link under Ballroom 4 below). This session was presented by Stuart Marks and Brian Goetz of Oracle. The tag-team did a wonderful job of explaining parallelism with streams where Marks concentrated on why we would use streams and Goetz concentrated on why we would use parallelism.

Marks started with an example of converting an array of strings into upper case and showed the conventional approach as well as the streams approach.

The conventional approach used a for-loop as shown here:

๐Ÿ‘ Image

Here the input array is processed sequentially in a left to right order. But the computation is just the โ€œtoUpperCaseโ€ call and each computation is independent of the other. Also, the records can be processed in any sequence or in parallel. So, how about using Streams?

๐Ÿ‘ Image

In the streams case, the array input is mapped to upper case and then back into an array. The input and the output array can be in any order and the code didnโ€™t need to specify any ordering; the entire array was uppercased.

Marks claims that the Streams version is better due to the higher level of abstraction, independence of computation, and aggregation.

๐Ÿ‘ Image

The next example demonstrated splitting a list at a location specified by a predicate, as shown below:

๐Ÿ‘ Image

Marks pointed out that the subList method takes arguments that are indexes into the original list. To accomplish the requirement our algorithm must split at the location of the #'s. To emphasize, Marks put bars at the #s, which would become the edges of the sublist, and then synthesized split points at each end to create exterior bounds as shown below.

๐Ÿ‘ Image

Implementation of this algorithm using a conventional approach yields:

๐Ÿ‘ Image

There are some accidental complexities, for example: start = cur + 1; and the call to result.add outside the for-loop. Hence, Marks observed โ€“

๐Ÿ‘ Image

Marks contrasted that to the streams approach, revisiting the problem where we are only interested in indexes. So, instead of the iterative for-loop, we can stream over the indexes. Since the predicate and the resulting split computations are independent of each other, we can perform the predicate test on each element and everything else can be independent. The resulting code (after eliminating the array copying to add to the array) would look like this:

๐Ÿ‘ Image

Marks concluded with his observation that the Streams approach is clearer -

๐Ÿ‘ Image

Goetz then complemented Marks's setup by considering the advantages and disadvantages of going parallel. He started off with the slide below talking about parallelism which is a tradeoff between burning more resources and getting the result faster.

๐Ÿ‘ Image

Goetz emphasized that parallel computation will always need more task parallelism work, such as dividing the work, managing and coordinating the tasks and completion. Consequently if the problem is to succeed it must be parallelizable and there must be a good parallel framework (e.g. the fork-join library). Finally, there should be enough data to be processed in parallel. Goetz then provided examples where parallelism can be useful and showed the audience the divide-and-conquer code:

๐Ÿ‘ Image

Goetz highlighted that all parallel solutions essentially look like the above divide and conquer code. Partitioning the problem avoids synchronization overhead. For an efficient parallel algorithm, the problem should be divided up early. Goetz then provided an example of summing an array in parallel and gave some performance considerations:

๐Ÿ‘ Image

Finally, Goetz went on to consider parallel stream performance. Streams are easy to parallelize, but not all sources may be suitable for parallelization.

Array-based sources are best

According to Goetz, the "NQ model" is a simple model to calculate a rule of thumb likelihood of parallel speedup:

 ๐Ÿ‘ Image

Goetz talked about "source splitting efficiency", saying that arrays split evenly and cheaply due to their underlying representation. Also keep in mind that some operations on the pipeline are dependent on the sequence encountered; for example - a "limit of 10" operation is obviously applicable to the first 10, in order. Another consideration is that sometimes merging can be expensive.

Goetz summarized the session as follows:

๐Ÿ‘ Image

Links:

Ballroom 4

Ballroom 6

Cyrill Magnin II/III

Embarcadero

This content is in the Java topic

Related Topics:

The InfoQ Newsletter

A round-up of last weekโ€™s content on InfoQ sent out every Tuesday. Join a community of over 250,000 senior developers. View an example

BT