Profiling .NET performance issues

In this post I want to talk about a frustrating problem most developers will encounter at some point in their career – performance issues.
You write some code, you test and run it locally and it works fine – but once it is deployed, bad things start to happen.
It just refuses to give you the performance you expect…
Besides doing the obvious (which is calling the server some bad names) – what else can you do?
In the latest case we encountered, one of our software engineers was working on merging the code from a few processes into a single process. We expected the performance to stay the same or improve (no need for inter-process communication) – and in all of the local tests it did.
 
However, when deployed to production, things started getting weird:
At first the performance was great, but then it started deteriorating for no apparent reason:
CPU usage started to spike, and the total throughput ended up about 25% worse than the original.
The software engineer who was assigned to investigate the issue started by digging into the process performance indicators, using ELK.
Now, we are talking about a deployment of multiple processes per server, across multiple servers – so careful consideration should go into aggregating the data.
Here is a sample of some interesting charts:
 
Analyzing the results, we realized the problem happened on all of the servers, intermittently.
We also realized that some inputs caused the problem to be more severe than others.
We used the ANTS profiler on a single process, fed it some “problematic” inputs – and the results were, surprisingly, not very promising:
 
a. There were no unexpected hotspots.
b. There were no memory leaks.
c. The generation 2 heap was not huge, but it held a lot of data – more than gen 1 (though less than gen 0).
 
Well, this got us thinking: might our problem be GC related?
We now turned to the Perfmon tool.

Analyzing the “% Time in GC” metric revealed that some processes spent as much as 50% of their time doing GC.
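Incidentally, the same counter PerfMon displays can also be read programmatically. Here is a minimal sketch, assuming a hypothetical process instance name:

using System;
using System.Diagnostics;
using System.Threading;

// Read the "% Time in GC" counter from the ".NET CLR Memory" category –
// the same one PerfMon shows. "MyWorkerProcess" is a placeholder
// instance name, not one of our actual processes.
var gcTime = new PerformanceCounter(
    ".NET CLR Memory", "% Time in GC", "MyWorkerProcess", readOnly: true);

for (int i = 0; i < 10; i++)
{
    Console.WriteLine("% Time in GC: {0:F1}", gcTime.NextValue());
    Thread.Sleep(5000); // the counter needs successive samples to be meaningful
}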

Now the pieces started falling into place:
One of our original processes used to do some bookkeeping, holding data in memory for a long duration. Another type of process was a typical worker: doing a lot of calculations using byte arrays and then quickly discarding them.
When the two processes were merged, we ended up with a lot of long-lived data in gen 2, and also with many garbage collection operations because of the byte arrays. Every collection now had a much larger heap to deal with – and that resulted in a performance hit.
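To make the pathology concrete, here is a contrived sketch (illustrative only, not our actual code): a long-lived structure keeps gen 2 populated while short-lived buffers keep the collector busy.

using System;
using System.Collections.Generic;

// "bookkeeping" survives into gen 2 and stays there, while the worker's
// short-lived byte arrays trigger frequent collections that now have a
// large gen 2 heap to contend with.
var bookkeeping = new Dictionary<int, byte[]>();

for (int i = 0; i < 10000000; i++)
{
    if (i % 1000 == 0)
        bookkeeping[i] = new byte[1024]; // long-lived bookkeeping data

    var buffer = new byte[8192];         // short-lived worker buffer
    buffer[0] = 1;                       // ...calculations, then garbage
}

Console.WriteLine("Collections – gen0: {0}, gen1: {1}, gen2: {2}",
    GC.CollectionCount(0), GC.CollectionCount(1), GC.CollectionCount(2));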
 
Well, once we knew what the problem was, we had to resolve it – but that is an entirely different blog post altogether…

Computer vision application – Challenges of learning ordinary concepts

In the last four years, convolutional neural networks (CNNs) have gained vast popularity in computer vision applications.

Basic systems can be created from off-the-shelf components, making it relatively easy to solve problems of detection (“what is the object appearing in the image?”), localization (“where in the image is a specific object?”), or a combination of both.

Above: Two images from the ILSVRC14 Challenge

Most systems capable of product-level accuracy are limited to a fixed set of predetermined concepts, and are also limited by the inherent assumption that a representative database of all possible appearances of the required concepts can be collected.
These two limitations should be considered when designing such a system, as concepts and physical objects used in everyday life may not fit them easily.

Even though CNN-based systems that perform well are quite new, the fundamental questions outlined below relate to many other computer vision systems.

 

One consideration is that some objects may have different functionality (and hence a different name) while having the same appearance.

For example, the distinction between a teaspoon, a tablespoon, a serving spoon, and a statue of a spoon is related to their size and usage context. We should note that in such cases, the existence and definition of the correct system output depend heavily on the system’s requirements.


 

In general, works of plastic art raise the philosophical question of what the shown object is (and hence what the required system output should be). For example – is there a pipe shown in the image below?


When defining a system to recognize an object, another issue is the definition of the required object itself. Even for a simple everyday object, different definitions will result in different sets of concepts. For example, considering a tomato, one may ask which appearances of a tomato are required to be identified as a tomato.

Clearly, this is a tomato:

 

But what about the following? When does the tomato cease to be a tomato and become a sauce? Does it always turn into a sauce?


Since these kinds of machine learning systems learn from examples, different systems will behave differently. One may use examples of all states of a tomato as a single concept, whereas another may split them into different concepts (that is, whole tomato, half a tomato, rotten tomato, etc.). In both cases, a tomato that has a different appearance and is not included in any of the concepts (say, a shredded tomato) will not be recognized.

Other everyday concepts have a functional meaning (i.e. defined by the question “what is it used for?”), whereas the visual cues may be limited. For example, all of the objects below are belts. Apart from the typical context (a possible location around the human body, below the hips) and/or function (can hold a garment), there is no typical visual shape. We may define different types of belts that interest us, but then we may need to handle cases of objects that are similar to two types while belonging distinctly to one.



Other concept-definition considerations that should be addressed include:

– Are we interested in the concept as an object, a location, or both? As an object (on the left) a bus can be located in the image, whereas the question “where is the bus?” is less meaningful for the image on the right.

 

These ambiguities are not always an obstacle. When dealing with cases where the concepts have vague definitions, or a smooth transition from one concept to the other, most system outputs may be considered satisfactory. For example, if an emotion detection system’s output on the image below is “surprise”, “fear” or “sadness” – it is hard to argue that it is a wrong output, no matter what the true feelings of the person were when the image was taken.
 

 

Written by Yohai Devir.

Logentries – Multi-platform Centralized Logging

Background:

We often develop websites that receive a request and perform multiple operations, including calls to other services which are not located under the same website.

The problem:

When we analyze error causes and unexpected results, we need to track the flow from the moment the request is sent, through the different processing stages, up to the point where the response is received.
This requires inspecting the different logs of the various system components.
We wanted centralized logging, where we could see the different logs in one location.
The problem was that the system components run on different platforms and are written in different languages (Windows/Linux, JS/.NET/C++).
We already use Kibana as centralized logging for our in-house applications, but here we have a website that is accessed from all over the world, and logging data to Kibana would require exposing an external endpoint.

The solution:

We chose to use Logentries.com, where log data from any environment is centralized and stored in the cloud. One can then search and monitor it.
It is very easy to use and provides two ways to achieve this: either by integrating libraries directly into the application (e.g. JavaScript, or a .NET log4net appender) or by adding an agent that listens to the application log file and sends it to the centralized logging site.
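For the library route in .NET, the calling code looks like ordinary log4net logging – the Logentries appender itself lives in the configuration file. A minimal sketch (the class and message names are made up):

using log4net;

public class OrderService
{
    // The Logentries appender is declared in App.config / Web.config,
    // per the le_log4net package documentation; nothing Logentries-specific
    // appears in the calling code.
    private static readonly ILog Log =
        LogManager.GetLogger(typeof(OrderService));

    public void Process(int orderId)
    {
        Log.InfoFormat("Processing order {0}", orderId); // shipped to the cloud
    }
}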
Logentries has a simple and user-friendly UI with some useful features, such as aggregated live-tail search, custom tags, context view and graphs.
 
This solution certainly meets our requirements.

.NET Compiler Platform “Roslyn”

At PicScout, we use the .NET Compiler Platform, better known by its codename “Roslyn” – an open-source compiler and code analysis API for C# (https://roslyn.codeplex.com/).

The compilers are available via the traditional command-line programs and also as APIs, which are available natively from within .NET code.

Roslyn exposes modules for syntactic (lexical) analysis of code, semantic analysis, dynamic compilation to CIL, and code emission.

Recently, I used Roslyn for updating hundreds of files to support an addition to our logging framework.

I needed to add a new member to each class that was using the logger, and to modify all the calls to the logger – of which there were several hundred.

We came up with two ways of handling this challenge.

After reading a class into an object representing it, one possible way of adding a member is to supply a syntactical definition of the new member, and then re-generate the source code of the class.

The problem with this approach was the relative difficulty of configuring a new member correctly.

Here is how it might look:

Generating the code:

private readonly OtherClass _b = new OtherClass(1, "abc");
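The generation code itself appeared as an image in the original post; here is a sketch of how such a member might be built with Roslyn’s SyntaxFactory (not necessarily the exact code we used):

using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Builds: private readonly OtherClass _b = new OtherClass(1, "abc");
FieldDeclarationSyntax member = SyntaxFactory.FieldDeclaration(
        SyntaxFactory.VariableDeclaration(
            SyntaxFactory.ParseTypeName("OtherClass"),
            SyntaxFactory.SingletonSeparatedList(
                SyntaxFactory.VariableDeclarator(SyntaxFactory.Identifier("_b"))
                    .WithInitializer(SyntaxFactory.EqualsValueClause(
                        SyntaxFactory.ParseExpression("new OtherClass(1, \"abc\")"))))))
    .WithModifiers(SyntaxFactory.TokenList(
        SyntaxFactory.Token(SyntaxKind.PrivateKeyword),
        SyntaxFactory.Token(SyntaxKind.ReadOnlyKeyword)));

// The class node is then rewritten with the new member appended, e.g.:
// classNode = classNode.AddMembers(member);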

Another option, which is more direct, was to simply get the properties of the class and use them.

For example, we know where the class definition ends and we can append a new line containing the member definition.

Here is how it looks:

Get the class details, then insert the new line (the new member). After that, replacing the calls to the new logger is a simple matter of search and replace.
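The accompanying screenshots did not survive; a sketch of this direct approach might look like the following (the file path and member text are placeholders):

using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

string path = @"MyClass.cs"; // placeholder path
string source = File.ReadAllText(path);

// Get class details: parse the file and locate the class declaration.
var tree = CSharpSyntaxTree.ParseText(source);
var classDecl = tree.GetRoot().DescendantNodes()
                    .OfType<ClassDeclarationSyntax>().First();

// Insert the new line (new member) just before the class's closing brace.
int insertAt = classDecl.CloseBraceToken.SpanStart;
string updated = source.Insert(insertAt,
    "    private readonly OtherClass _b = new OtherClass(1, \"abc\");\r\n");

File.WriteAllText(path, updated);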

PhantomJS

The Problem:
When given a website and an image on that website, the task is to take a screen capture of that image on the page where it appears.
 
Solution 1:
Manually: enter the website, find the given image (scrolling if needed), take the screenshot and save it to disk.
But what can you do when you have thousands of screenshots to take per hour?
You can employ hundreds of people to handle this scale, but…
that is kind of expensive – and what do you do if your scale increases or decreases?
 
Solution 2:
Automate it: if only we could write a piece of software that could do exactly what we need…
So what do we actually need? Something that can:
1) Imitate a browser
2) Find an image on a webpage
3) Take the screenshot
 
Let me introduce PhantomJS:
PhantomJS is commonly described as a headless WebKit with a JavaScript API.
Headless refers to the fact that the program can be run from the command line, without a window system.
JavaScript API means that we can easily write scripts that interact with PhantomJS – useful if one needs to find an image on a webpage, for instance.
WebKit is the open-source web browsing engine that powers popular browsers such as Safari (Chrome’s engine was also originally based on WebKit).
 
How to use it?
 
There are many ways to use PhantomJS; here at PicScout we use the Selenium WebDriver to run PhantomJS. Selenium can control PhantomJS in the same way that it does any other browser.
 
How does it help me to take a screen capture?
 
As we said before, PhantomJS can run JavaScript, so all that is left to do is write a short script that finds the image’s location on the page, and let PhantomJS run it.
After receiving the location, we can use PhantomJS to take a screen capture; despite being a headless browser, PhantomJS can still render a web page just as well as a regular web driver.
 
Code sample:
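The original sample was posted as an image; here is a minimal sketch of the idea, assuming the Selenium .NET bindings with the PhantomJS driver (URLs and file names are placeholders):

using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using OpenQA.Selenium;
using OpenQA.Selenium.PhantomJS;

using (var driver = new PhantomJSDriver())
{
    driver.Navigate().GoToUrl("http://example.com/page.html"); // placeholder

    // Locate the image element on the page (the src value is a placeholder).
    IWebElement image = driver.FindElement(
        By.CssSelector("img[src='http://example.com/image.jpg']"));

    // PhantomJS renders the page even without a window system,
    // so a regular Selenium screenshot works.
    Screenshot screenshot = ((ITakesScreenshot)driver).GetScreenshot();

    // Crop the full-page capture down to the image element.
    using (var fullPage = new Bitmap(new MemoryStream(screenshot.AsByteArray)))
    using (var cropped = fullPage.Clone(
        new Rectangle(image.Location, image.Size), fullPage.PixelFormat))
    {
        cropped.Save("capture.png", ImageFormat.Png);
    }
}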
 
 
Running several instances of PhantomJS – problems and solutions
 
Problem #1: zombie processes. In our app we create and kill PhantomJS processes, and we noticed that after some time there were many zombie instances of PhantomJS.
 
Solution #1: for unknown reasons, we are occasionally unable to create a new PhantomJS instance – an exception is thrown even though a PhantomJS process has already started. In such cases we have to find the process id manually and kill the process.
 
Problem #2: low success rate under high CPU usage – when CPU reached 100%, we received a lot of errors from PhantomJS.
 
Solution #2: the number of PhantomJS instances should be set according to the available computing power. Note that most of the time PhantomJS won’t consume much CPU, but there are websites for which this isn’t the case; take this into consideration when deciding how many PhantomJS processes to run.
 
Problem #3: sometimes the screen capture fails for no apparent reason.
 
Solution #3: we were able to increase the screen capture success rate by using a retry mechanism.

Why you should use reCAPTCHA in public websites

 
The problem:
 
We’ve developed an API which allows users to search for and upload images.
Any application that wants to query it uses an API key, which allows it to perform different actions according to its permissions.

Recently we started to expose some of the API’s abilities on public websites.
For example, see the PicScout Search Tool on www.picscout.com (and press the “Launch Tool” button).
Here’s the issue: exposing the key to unknown users can make us vulnerable to spam and abuse.

 
The solution:
 

In order to overcome this problem we decided to use Google reCaptcha.

Using this tool means that only real people can pass through the system, as opposed to malicious bots.
Reaching this solution included client-side and server-side adaptations.
On the client side, we added support for the reCaptcha widget. This widget is shown to users before their first action on the site, and afterwards only if their token has expired.
On the server side, we added a second layer of authentication. This authentication is enforced only on API keys that are public, meaning those used on public sites. When making a request, the user must send an API key as well as a token supplied by Google reCaptcha. The server verifies this token, combined with a secret agreed between the server and Google. If this information is successfully verified, the resource is returned to the user. Otherwise, the request fails.
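Server-side, the verification boils down to a single HTTPS call to Google’s siteverify endpoint. A minimal sketch (a real implementation should parse the JSON response with a proper parser):

using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class RecaptchaVerifier
{
    // "secret" is the key agreed between the server and Google;
    // "token" is the value the client received from the reCAPTCHA widget.
    public static async Task<bool> VerifyAsync(string secret, string token)
    {
        using (var client = new HttpClient())
        {
            var form = new FormUrlEncodedContent(new Dictionary<string, string>
            {
                { "secret", secret },
                { "response", token }
            });

            var response = await client.PostAsync(
                "https://www.google.com/recaptcha/api/siteverify", form);
            string json = await response.Content.ReadAsStringAsync();

            // Sketch-level check; the JSON answer contains "success": true
            // when the token is valid.
            return json.Contains("\"success\": true");
        }
    }
}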
 
That’s about it on how we use reCAPTCHA at PicScout.

Riddle me this

We recently published some code riddles.
It was really fun writing them, and we got a lot of good responses from software developers who enjoyed solving them.
The solvers of the riddles got a nice T-Shirt:
 
 
For those of you who enjoy riddles, here they are:
 
  1. Follow the bread crumb trail from here:
  2. Go to http://riddle.guru 

Enjoy 🙂

 

Managing dependent jobs in Jenkins

The problem
 
It is well known that Jenkins can handle job dependencies based on Maven dependencies.

But how can we manage dependencies between Jenkins jobs that are based on .NET code?
Of course, we can manage job dependencies manually, marking in each job what its dependent jobs are – but this takes a lot of time to maintain and is also error-prone.

We looked into some options (there weren’t many) and couldn’t find anything that quite fit our needs.
This is because here at PicScout, each of our Jenkins jobs is mapped to a single solution.
It can run MSBuild, or unit tests on each relevant project in the solution.
If, for example, we take NDepend’s powerful API and try to adjust it to our needs, we can use it to find out which assemblies are referenced by our solution’s projects.
But which solutions hold those assemblies? And in what order should we build them? This is left for us to implement.
Surely there must be a better alternative.
 
The solution
 
We decided to write something of our own.
We wanted a tool that would integrate well with Jenkins.
While Jenkins can run any type of script or executable, if you run your code within Jenkins itself (using a Groovy system script) you can use its internal API to start new jobs, wait for them to complete and analyze their build results quite effortlessly.
This, however, means you have to write your code in a JVM-based language –
that is why we opted to write our tool in Java.
 
The dependency tool we wrote can map the solution and project files in given folders and build a dependency chain for each project and solution.
Given a project, it can tell you which projects (in build order) will be affected by a change in that project, and the same is true for solutions.
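To give a feel for what the tool does – the real tool is written in Java, but here is a C# sketch of the same idea, reading the <ProjectReference> elements of old-style .csproj files (the root folder is a placeholder):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

string rootFolder = @"C:\repos"; // placeholder
XNamespace ns = "http://schemas.microsoft.com/developer/msbuild/2003";
var dependencies = new Dictionary<string, List<string>>();

// Map every .csproj under the root folder to the projects it references.
foreach (var project in Directory.EnumerateFiles(
             rootFolder, "*.csproj", SearchOption.AllDirectories))
{
    dependencies[project] = XDocument.Load(project)
        .Descendants(ns + "ProjectReference")
        .Select(r => Path.GetFullPath(Path.Combine(
            Path.GetDirectoryName(project), (string)r.Attribute("Include"))))
        .ToList();
}

// Inverting this map tells us, for a changed project, which projects
// (and hence which solutions and Jenkins jobs) are affected.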
Now, all that is left to do is to integrate it with Jenkins and Git.
 
Git/Jenkins Integration
 
The process we built is made of a few steps:
1. We use Git hooks to identify changed folders; we then find which solutions are in those folders.
2. We send the list of changed solutions to a special job we wrote in Jenkins.
3. The job runs a small Groovy system script; this script finds the dependent jobs using the dependency tool, runs them, and decides whether the git commit can be accepted into our master branch (i.e. if all the jobs passed).
 
Finally, the dependency tool can also be integrated with the Jenkins Dependency Graph View plugin to draw the dependencies:
 

(we are still working on this one)

Source/Deployment
 
The dependency tool can be found here, along with some example usage projects and Groovy scripts.
The tool is also deployed to a Maven repo.
Please note that it is licensed under the MIT license.

PicScout is hiring SW engineers – apply now

Hi all,

PicScout is looking for top-notch SW engineers who would like to join an extremely innovative team.
If you think you can fit the team, please solve the quiz below – and if your solution is well written, expect a phone call from us. You can choose any programming language you like to write the code.

Don’t forget to also attach your CV along with the solution.
Candidates who are eventually hired will win a 3K NIS prize.

So here it is:
We want to build a tic-tac-toe server.
The server will enable two users to log in. It will create a match between them and will let them play until the match is over.

The clients can run on any UI framework or even as console applications (we want to put our main focus on the server for this quiz). Extra credit is given for: good design, test coverage, clean code.


Answers can be sent to: omer.schliefer@picscout.com

Running UI tests in parallel mode

At PicScout, we use automated testing.
Running tests is an integral part of our Continuous Integration and Continuous Deployment workflow.
It allows us to implement new features quickly, since we are always confident that the product still works as we expect.
The automation tool that we use is Selenium (an open-source set of tools for automating browser-based applications).
Despite the many benefits of Selenium, there is one drawback that constitutes a bottleneck: the running time of the tests.
For example, in one of our projects, we need to run 120 UI tests, which takes 75 minutes – a long time to wait for a deployment.
To handle this situation, we developed a tool that runs the tests in parallel mode.
Using this tool, we were able to reduce the run time to 12 minutes.

How it works:
Two tests can run in parallel as long as they do not affect each other.
We avoid writing tests that use and modify the same data, since such tests cannot run in parallel.
The assumption is that two tests can run in parallel if each test has a different id.

When writing a new Selenium test, we add a “Test Case” attribute to the test.
The attribute holds the id of the test.
 
The tool reads the Selenium project dll file, separates the tests into different queues according to their test-case ids, and runs the queues in parallel.
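Here is a sketch of the queueing idea, using a hypothetical TestCaseId attribute (the attribute name and grouping code are illustrative, not our actual tool):

using System;
using System.Linq;
using System.Reflection;

// Hypothetical marker: tests sharing an id touch the same data and must
// run serially; tests with different ids may run in parallel.
[AttributeUsage(AttributeTargets.Method)]
public class TestCaseIdAttribute : Attribute
{
    public int Id { get; private set; }
    public TestCaseIdAttribute(int id) { Id = id; }
}

public static class QueueBuilder
{
    public static void PrintQueues(string dllPath)
    {
        var queues = Assembly.LoadFrom(dllPath)
            .GetTypes()
            .SelectMany(t => t.GetMethods())
            .Select(m => new
            {
                Method = m,
                Attr = m.GetCustomAttributes(typeof(TestCaseIdAttribute), false)
                        .Cast<TestCaseIdAttribute>().FirstOrDefault()
            })
            .Where(x => x.Attr != null)
            .GroupBy(x => x.Attr.Id); // one queue per id, run in parallel

        foreach (var queue in queues)
            Console.WriteLine("Queue {0}: {1} tests", queue.Key, queue.Count());
    }
}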
 
13:43:51 Parallel Nunit Console
13:43:52
13:43:52 – Number of parallel queues: 6, number of tests: 29
13:43:52
13:43:52 – Number of serial queues: 1, number of tests: 3
 
The tool runs 10-15 threads of tests, instead of one thread in serial mode (one by one).
There are two more options in the tool.
The first option is “update local DB.”
This creates or updates the local DB (a minimized DB, just for running the UI tests).
The second option tells the tool to run the tests in parallel mode.
These two features allow us to run the tests on a developer’s station before the code is pushed, and on our build machine after the code is pushed.
 


That’s how we run UI tests these days.