Sunday, May 22, 2016

Rx.NET community improvements

In recent months (early to mid 2016) there has been rising confusion and frustration with the lack of openness and direction of the Rx.NET code base. The code being open-sourced and hosted on GitHub in late 2012 was cause for celebration. The community could see the code without the need for a decompiler, could raise issues, and could even submit code changes via a Pull Request.

Bright future

Over time, community-created issues and Pull Requests accumulated, but activity in the master branch seemed to come to a halt in Oct 2015. However, anticipation was in the air: one of the members of the original team (and, I think, the lead of the Rx team inside Microsoft), Bart De Smet, announced at NDC in mid 2015 that Rx 3.0 was underway. Features seemed to include a cross-platform version of expression trees, called Bonsai Trees. "We are going to ship this later this year" (19:33 in the talk). The community was very excited.

Another key event was happening around the same time; Microsoft was making a large change to .NET. .NET Core was a massive undertaking to allow .NET code to run cross-platform: not just on Windows, Windows RT and Windows Phone, but on Linux and OS X too.

On top of these key events, there was the growing organic momentum of the Reactive movement, a movement that in my opinion the Rx.NET team had a large hand in popularizing and setting the standard for. Rx had matured in the eyes of the development community. It had been ported from the original .NET implementation to the JVM, JavaScript, C++, Ruby and others. This widespread adoption across the industry gave developers and managers alike confidence in the tool.

Bright future?

However, as time passed, new cautious developers looking to adopt Rx wanted to know that this technology wasn't at a dead end. With the recent effective end-of-life of technologies such as browser plugins (Silverlight, Flash etc.) and Parse, it is understandable why people with long term plans are looking for assurances. Even intrepid developers were looking for some sign that things were still moving. After the announcement of Rx 3.0, and the implication that we would see a change in late 2015, the questions started coming, fuelled further by the .NET Core activity.

In the Gitter chat room for Rx.NET, questions about the next release started as far back as Sept 2015. Then in Jan 2016 came more questions about the next release and whether Rx is still a supported library. And again in February, March, April and May. The tone of the chat room seemed glummer each time the question was asked, and we realized that no-one from the Rx Team was answering. Even Tamir Dresher, who was writing the next book on Rx, couldn't get any response from Microsoft. Adding to the community's feeling of rejection, we became aware of a private chat room in which members of the Rx Team were engaging with a select few people from outside of Microsoft. While I was aware of one person in this chat who I would consider a key member of the community, most of the other people I would consider key members, with a wealth of experience in Rx and relevant streaming technologies, were left in the dark, excluded.

Just because we can't get a transparent and comforting answer about the future of Rx from someone at Microsoft doesn't mean the project is dead. However, the 18 Pull Requests that have had no interaction from the Rx Team, and the lack of any commits to master in 7 months, did leave a lot of people pretty uneasy about what was happening. Not only were we getting radio silence, but this was contrasted with what appeared to be massive amounts of activity happening in the RxJS space.

Bright Future!

And then this tweet happened
The future of Rx .NET is still bright, and yes we are transitioning it to the .NET Foundation
To me it seemed a self-congratulatory tweet, and in my opinion there wasn't much worth celebrating. Some people did some good work porting some Rx code to CoreCLR, but did so on a private fork/repo; not on a feature branch on the main repo where others could watch, but somewhere else. A heated exchange followed on Twitter, for which I apologize; Twitter is not the forum for an exchange like that. But it did prompt me to write what I hope is constructive feedback below.

A more collaborative and transparent future?

The Rx code base is open source, which I understand doesn't mean a free-for-all. However, I think it is reasonable to expect a little more from a project than what we are getting. Here are some things I think are reasonable expectations:

Be clear about the status of the Repo/Project

On the front page (Readme.md), state the status of the repo. This should probably include which version is the current latest release and which is the current pre-release. Link to them on NuGet and to their tags/branches (like RxJS does). Highlight what the roadmap is (like ASP.NET does).

Instead of people having to ask in chat rooms, Twitter and forums about the status of the project, it should be right there, front and center. Something like
We're hoping to release a new batch of functionality later this year, based on internal developments that took place within the Bing organization. As part of this, we're looking at doing future developments in the area of our reactive programming cloud platform in the open here on GitHub. Currently, we're working out the details to transition parts of the technology to the .NET Foundation.
Followed up with something like "Transition to the .NET Foundation can be slow, so this may take until Q3 2016. After that we can start porting our internal work in Q4 2016. In the meantime we would love to get your help on the issues that are up for grabs." (the quote is my own invention). Now we would all know what was going on. It would take 10 minutes to update that readme.md file.

Be clear about how to contribute

Contribution is not just about signing a CLA (Contributor License Agreement). Guidance should cover how to raise an issue, link it to a PR, and create some dialogue to validate that the change is in line with the direction of the project. This should include which branch to target: 'master', 'develop' or perhaps a feature branch? It should also set the expectation of what the project custodians will do for the contributor with regards to labeling, setting milestones or closing the issue.

Documentation

The readme, the wiki and the reactivex.io documentation are sub-par. Allow the community to add more links to other helpful resources. There are 2 PRs to update documentation, but they appear to have been ignored.

If this is Microsoft's documentation for Rx, then maybe the link should be front and center. However, it suggests that you "download" the Rx SDK from the Microsoft download center instead of using NuGet, and the samples kick you straight off in the wrong direction by sticking subjects in your face as the first example. I would imagine that there would be links to external resources like the various books and websites that are out there. A good starting point for the many options could be this compilation: http://stackoverflow.com/questions/1596158/good-introduction-to-the-net-reactive-framework/. A link to ReactiveX.io would make sense. On that note, the ReactiveX.io samples for .NET are incomplete; even the basic `subscribe` and `create` methods are missing.

Clean up the branches

It appears that master is the only real branch (and gh-pages). So can 'develop' and 'BetterErrors' be merged, deleted or documented?

Creating a tag that relates to the last commit that was used to create a package is a helpful thing to do. This allows people to identify bugs for specific versions and see if a fix is already in, or if they should consider raising an issue or a PR.

Engage with issues

In the last few days there appears to have been a flurry of activity, but I think we can still improve in this area. Issues seem to fall into one of three broad categories: "Acceptable", "Maybe Later" and "Not acceptable". For things that won't get accepted into the repository, let's be honest and transparent: say this doesn't follow the direction of the project, but thanks for your contribution. Then close the issue, or give the author a few days for rebuttal or to close it themselves. For issues that match the current planned release, or would be suitable for a future release, label them as such. The CoreFx team do a great job of this in their issues.

Labels are not the only tool the team could use to communicate. Milestones are also a great way to give visibility to what is current and what is for later; they let you see how close to complete a milestone is. This was a prime opportunity to use them. A milestone for 2.2.6 could have had just two issues: CI and the CoreCLR build. These issues could have been labeled as up-for-grabs, and the community would have jumped at them. However, with 18 of the current 23 PRs having had no official interaction, you can see why the community appears aloof: the effort required is significant, but the work may never even get noticed.

As a concrete challenge to the Rx.NET team: Aim to close or label every issue in the Repo. 

Maybe create some new issues, and mark them as up for grabs. Watch the community jump to action.

Which brings me to my last ask, which is basically the same as for issues, but for PRs: engage. People that have raised a PR have probably used the library heavily, found an opportunity for change, then forked and cloned the repo. From there they figured out how to get it to build (because it doesn't build out of the box in either the Debug or Release targets), figured out the code base, made a change and ideally wrote some tests to support it. Finally they submitted the PR. Best case, I think that is 4 hours of their life they could have spent doing something else. I imagine it is quite a disappointing feeling to have zero comments on your PR.

Symbiotic Relationship

I am not hating on Rx, nor am I hating on @ReactiveX; quite the opposite. I think what the Rx (Volta) team have done is brilliant. I enjoy it so much that I poured 6 months into writing documentation for it and a Christmas holiday writing a PR for it. There are others out there that are equally passionate.

But like in a loving family, sometimes you need to be the unpopular one and point out that hey, this is not okay.
All I am asking from the Rx.NET team is a little back-and-forth, a little transparency. I would think they have far more to gain from doing this than anyone else. I have worked with hundreds of people that have used Rx. A handful have contributed to RxJava and RxJs but none to Rx.NET, because it is too hard. I think there is a great opportunity for positive change here.

RxJS is almost 100% community run at this point with Microsoft helping steer the project

Can Rx.NET take a step in this direction? With brains like Bart's at the helm and the army of the willing already out there, then yes, we would have a bright future indeed. Just tell us what you want us to do.

Friday, March 18, 2016

Measuring latency with HdrHistogram

I had the pleasure last year of meeting Gil Tene, an authority on building high performance software and specifically high performance JVM implementations. He gave a brilliant presentation at React San Francisco, and then again at YOW in Australia, on common mistakes made when measuring performance. There he explained that measuring latency is not about getting a number, but about identifying the behavior and characteristics of a system.

Often when we set out to measure the performance of our software, we are guided by NFRs (Non-Functional Requirements) that really don't make much sense. More than once I have been presented with a requirement that the system must process x requests per time-period, e.g. 5 messages per second. However, as Gil points out, this single number is either unreasonable or misleading. If the system must always operate in a state that supports these targets, it may be cost prohibitive; such a requirement also implicitly demands 100% up-time. To work around that, some requirements instead specify that the mean response time should be y. This is potentially even less useful: what we are really specifying is that around 50% of requests will see worse performance than the target.

A useful visualization for pointing out the folly of chasing a mean measurement is illustrated below.

[Figure: Anscombe's quartet. Source: https://en.wikipedia.org/wiki/Anscombe%27s_quartet]

All of these charts have the same mean value, but clearly show different shapes of data. If you were measuring latency in your application and targeting a mean value, you might hit your targets and still have unhappy customers.

When discussing single value targets, a mean target is commonly treated as if it were the 50th percentile (strictly that is the median, but the intent is the same: half of requests fare worse). In the first case, the requirement was effectively for the 100th percentile.

Perhaps what is more useful is to measure and target several values: the 99th percentile, plus targets at 99.9%, 99.99% and so on, may be what you are really looking for.
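To make the contrast concrete, here is a small sketch (my own illustrative code, using made-up latency data) showing how a comfortable-looking mean can coexist with a terrible tail; the percentile calculation is the standard nearest-rank method:

```java
import java.util.Arrays;

public class PercentileDemo {
    // Nearest-rank percentile: the smallest recorded value such that at
    // least p percent of all recordings are less than or equal to it.
    static long percentile(long[] sortedLatencies, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sortedLatencies.length);
        return sortedLatencies[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        // Hypothetical data: 95 fast responses (10ms) and 5 slow ones (900ms).
        long[] latencies = new long[100];
        Arrays.fill(latencies, 0, 95, 10L);
        Arrays.fill(latencies, 95, 100, 900L);
        Arrays.sort(latencies);

        double mean = Arrays.stream(latencies).average().orElse(0);
        System.out.println("mean = " + mean + "ms");               // 54.5ms - looks healthy
        System.out.println("p50  = " + percentile(latencies, 50)); // 10ms
        System.out.println("p99  = " + percentile(latencies, 99)); // 900ms - the real story
    }
}
```

A mean of 54.5ms could satisfy a "mean under 100ms" NFR while one in twenty users waits nearly a second.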

Measuring latency with histograms

Instead of capturing a count and a sum of all latencies recorded, then calculating a mean, you can capture latency values and assign each to a bucket. Assigning a value to a bucket simply means incrementing that bucket's count. This allows us to analyse the spread of latency recordings.
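A minimal linear-bucket histogram (my own illustrative sketch, not the HdrHistogram implementation) is little more than an array of counts:

```java
public class LinearHistogram {
    private final long[] counts;
    private final long bucketWidth;

    public LinearHistogram(long maxValue, long bucketWidth) {
        this.bucketWidth = bucketWidth;
        // One slot per bucket; the whole range must be decided up front.
        this.counts = new long[(int) (maxValue / bucketWidth) + 1];
    }

    // Recording a value is just incrementing the count of its bucket.
    public void record(long value) {
        counts[(int) (value / bucketWidth)]++;
    }

    // How many recorded values landed in the same bucket as this value.
    public long countAt(long value) {
        return counts[(int) (value / bucketWidth)];
    }
}
```

Recording is a cheap O(1) increment, but the range and bucket width must be chosen in advance, which is exactly the limitation discussed below.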

The example of a histogram from Wikipedia shows how to represent heights by grouping them into buckets of 5cm ranges. For each of the 31 Black Cherry trees measured, the height is assigned to a bucket and the count for that bucket increased. Note that the x axis is linear.

An example histogram of the heights of 31 Black Cherry trees

A naive implementation of a histogram, however, may require you to pre-plan the number and width of your buckets. Gil Tene has helped out here by creating an implementation of a histogram specifically designed for high dynamic ranges, hence its name: HdrHistogram.

When you create an instance of an HdrHistogram you simply specify
  1. a maximum value that you will support
  2. the precision you want to capture as the number of significant digits
  3. optionally, the minimum value you will support
The internal data structures of the HdrHistogram are such that you can very cheaply specify a maximum value that is an order of magnitude larger than you expect, giving you plenty of headroom for your recorded values. As the HdrHistogram is designed to measure latency, a common usage is to measure a range from the minimum supported value for the platform (nanoseconds on JVM+Linux, or ticks on .NET+Windows) up to an hour, with a fidelity of 3 significant figures.


For example, a Histogram could be configured to track the counts of observed integer values between 0 and 36,000,000,000 while maintaining a value precision of 3 significant digits across that range. Value quantization within the range will thus be no larger than 1/1,000th (or 0.1%) of any value. This example Histogram could be used to track and analyze the counts of observed response times ranging between 1 tick (100 nanoseconds) and 1 hour in magnitude, while maintaining a value resolution of 100 nanoseconds up to 100 microseconds, a resolution of 1 millisecond (or better) up to one second, and a resolution of 1 second (or better) up to 1,000 seconds. At its maximum tracked value (1 hour), it would still maintain a resolution of 3.6 seconds (or better).
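That guarantee falls out of HdrHistogram's bucketing scheme: values are grouped into power-of-two ranges, each subdivided into enough linear sub-buckets to hold the requested precision. The sketch below is my own simplification of the storage arithmetic (it follows the Java implementation's approach but is not guaranteed to match its exact numbers); what it demonstrates is that an extra order of magnitude of headroom costs only a few more buckets:

```java
public class HdrSizing {
    // Rough sketch (my own simplification) of HdrHistogram's storage cost:
    // each power-of-two "bucket" is split into enough linear sub-buckets to
    // keep quantization error below 1/10^significantDigits of the value.
    public static long countsArrayLength(long maxValue, int significantDigits) {
        // Need at least 2 * 10^digits sub-buckets, rounded up to a power of two.
        long required = 2L * (long) Math.pow(10, significantDigits);
        long subBucketCount = 1;
        while (subBucketCount < required) {
            subBucketCount <<= 1;
        }
        // Each successive bucket doubles the range covered, so the number of
        // buckets grows only with log2 of the overall range.
        int buckets = 1;
        long smallestUntrackable = subBucketCount;
        while (smallestUntrackable <= maxValue) {
            smallestUntrackable <<= 1;
            buckets++;
        }
        return (buckets + 1L) * (subBucketCount / 2);
    }

    public static void main(String[] args) {
        // One hour vs ten hours, both expressed in 100ns ticks.
        System.out.println(countsArrayLength(36_000_000_000L, 3));  // 27648
        System.out.println(countsArrayLength(360_000_000_000L, 3)); // 30720
    }
}
```

In this sketch, raising the maximum trackable value tenfold (one hour to ten hours of ticks) grows the counts array from 27,648 to 30,720 slots: roughly 11% more memory for 10x the headroom.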

Application of the HdrHistogram

When Matt (@mattbarrett) and I presented Reactive User Interfaces, we used the elements of drama and crowd reaction to illustrate the differences between various ways of conflating fast-moving data from a server into a client GUI application. To best illustrate the problems of flooding a client with too much data in a server-push system, we used a modestly powered Intel i3 laptop. This worked fairly well in showing the client application being brought to its knees when overloaded. However, it also occasionally showed Windows being brought to its knees too, which was a wee bit too much drama to have on stage during a live presentation.

Instead we thought it better to provide a static visualization of what was happening in our system when it was overloaded with data from the server. We could then contrast that with alternative implementations showing how we can perform load-shedding on the client. This also meant we could present with a single high powered laptop, instead of bringing the toy i3 along with us just to demo.

We added a port of the original Java HdrHistogram to our .NET code base. We used it to capture the latency of prices from the server to the client, and then the additional latency for the client to actually dispatch the rendering of the price. As GUI applications are single threaded, if you provide more updates than the GUI can render, one of two things can happen:

  • updates are queued
  • updates are conflated
What you do in your client application depends on your requirements. Some systems need to process every message; in that case they may choose to simply let the updates queue. Other systems may allow updates to be conflated. Conflation is the act of taking many items and reducing them to one. Some systems may be able to conflate many updates by averaging or aggregating them. For others, only the last message matters, so the conflation algorithm is simply to process the latest message and discard the rest. Matt discusses this in more detail on the Adaptive Blog.
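The "keep only the latest" style of conflation can be sketched with a single atomic slot. This is my own illustrative code, not the ReactiveTrader implementation:

```java
import java.util.concurrent.atomic.AtomicReference;

public class ConflatingSlot<T> {
    // Only the newest value survives; older unconsumed values are dropped.
    private final AtomicReference<T> latest = new AtomicReference<>();

    // Called on the (fast) producer thread for every incoming update.
    public void offer(T update) {
        latest.set(update);
    }

    // Called by the (slow) consumer, e.g. the UI thread, when it is ready
    // to render. Returns null when no new update has arrived.
    public T poll() {
        return latest.getAndSet(null);
    }
}
```

The producer overwrites the slot on every tick, while the UI thread drains it at whatever rate it can render; intermediate updates are dropped without the consumer ever accumulating a queue of stale work.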

In the demo for ReactiveTrader we show queuing all updates, plus 3 styles of conflation. When we applied the HdrHistogram to our code base, we quickly saw that we actually had a bug.



We had two problems. The first was an assumption that what worked for Silverlight would also work for WPF. As WPF has two threads dedicated to presentation (a UI thread and a dedicated render thread), we were actually only measuring how long it took us to put a price on another queue! You can see that the ObserveLatest1 and ObserveLatest2 (red and yellow) lines show worse performance than just processing all items on the dispatcher. I believe this is because we were doing more work to conflate before sending to the render thread. Unlike in Silverlight, once we send something to the render thread in WPF we can no longer measure the time taken to actually render the change, so our measurements here were not telling us the full story.

The second problem was that there was actually a bug in the code we copied from our original Silverlight (Rx v1) code. The original code (red line) accidentally used a MultipleAssignmentDisposable instead of a SerialDisposable. This simple change gave us the improvement seen in the yellow line.

We were happy to see that the Conflate and ConstantRate algorithms were measuring great results, which were clearly supported visually when using the application.



To find out more, seek out the brilliant Gil Tene's talks on measuring latency.
I am currently working on the final details of a complete port of the original Java HdrHistogram to .NET. You can see my work here: https://github.com/LeeCampbell/HdrHistogram.NET