August 4, 2013

TFS is huge in China – the work item journey with 20 million records (Part I)

Adam Cogan · Scrum, SSW Projects, TFS

Update: Lei Xu has also posted this in Chinese.

Recently Lei Xu and I completed a hair-raising TFS 2012 project. We hit some snags trying to optimize Work Items with 20,000,000 records. Let me tell you the story…

We completed it shortly after arriving home from the MVP Summit in Redmond. Luckily, we were full of information from Brian Harry and his team. This job turned out to be one of the most challenging that I’ve ever done, pushing the performance limits of Team Foundation Server 2012 (these tips apply to TFS 2013 and 2010 as well).

**Figure: Brian Harry and Lei Xu @ MVP Summit**

First some background: the client runs one of the biggest development teams in the world. They have over 20,000 developers and a lot of experience gathering, analyzing and acting on performance metrics acquired while testing software prior to wide-scale deployment. The system we needed to implement and customize had to cope with a massive number of concurrent requests and answer each of them in a timely fashion. We used the TFS Integration Platform and the TFS Object Model to implement most of the functionality required.

Initially we thought TFS should be able to handle the load without too many problems, because Microsoft has been dogfooding TFS in their developer division for a long time, with great results.

However, as in every story, things never run as you expect. Once the coding was done, with all the data access, business logic and interface implementations on top of the TFS Object Model, it was time for the first performance tests.

The initial results were disappointing:

Test Case	Target	Initial time (before tuning)
Create Operation
No concurrency	200 ms	1200 ms
10 concurrency	500 ms	2700 ms
100 concurrency	1500 ms	>3000 ms
Query Operation
No concurrency	200 ms	234 ms
10 concurrency	500 ms	565 ms
100 concurrency	1500 ms	3000 ms

Figure: The initial performance results were disappointing – some being 5-6 times worse than the target
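To make the "5-6 times worse" figure concrete, here is a small Python sketch that divides each initial measurement by its target. The numbers are copied from the table above. Treating "No concurrency" as a concurrency of 1 and recording ">3000 ms" as 3000 are my own assumptions, so that last ratio is only a lower bound.

```python
# Targets and initial measurements in milliseconds, copied from the table above.
# Assumptions: "No concurrency" is keyed as 1, and the ">3000 ms" entry is
# recorded as 3000, so its ratio is a lower bound.
INITIAL_RESULTS = {
    ("create", 1): (200, 1200),
    ("create", 10): (500, 2700),
    ("create", 100): (1500, 3000),
    ("query", 1): (200, 234),
    ("query", 10): (500, 565),
    ("query", 100): (1500, 3000),
}


def slowdown(target_ms, measured_ms):
    """Return how many times slower than the target a measurement is."""
    return measured_ms / target_ms


# Ratio of measured time to target time for every test case.
ratios = {
    case: round(slowdown(target, measured), 2)
    for case, (target, measured) in INITIAL_RESULTS.items()
}

for (operation, concurrency), ratio in ratios.items():
    print(f"{operation:<6} x{concurrency:<3} {ratio:.2f}x target")
```

Every ratio is above 1, so every test case missed its target. The Create operation at low concurrency is where the 5-6x gap shows up.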

The system was powered by a beefy SQL Server, so even though the TFS collection database held 20 million work items, I was shocked by these numbers.

But, as in every story, there is a happy ending. Here are the results after our tuning:

Test Case	Target	After Tuning
Create Operation
No concurrency	200 ms	87 ms
10 concurrency	500 ms	399 ms
100 concurrency	1500 ms	2500 ms (single server) or 1500 ms (NLB)
Query Operation
No concurrency	200 ms	28 ms
10 concurrency	500 ms	32 ms
100 concurrency	1500 ms	200 ms

Figure: The client was happy with the results; we made our target in each case. That said, I think we were pushing TFS to its limits

More information:

There were many lessons that we learned and many people who helped. Let me summarize the lessons.

Lesson 1: Teamwork

I put teamwork at the top because great software development is never a one-person job, especially when you are dealing with a complex system. This system had many moving parts, and the work included lots of performance tuning of TFS 2012, SQL Server, Windows Server 2012, IIS 8, the majority of the TFS web services and the TFS Object Model. We were lucky enough to have experts for each of these parts, and together we achieved our goal.

During this project I got help from people at Microsoft, my colleagues at SSW and a couple of MVPs around the world. These included Brian Harry (Microsoft Technical Fellow, father of TFS), Aaron Hallberg (TFS DevTeam), Tiago Pascoal (ALM MVP), Ramesh Rajagopal (DevDiv from MS Dev Center), Julia Liuson (Manager of TFS DevTeam), Yongming Yi (MS Technical Specialist) … and more.

In short, if you want to do the job right, you need the right people. Having such a great team was essential for the end result.

Lesson 2: Performance testing should be done as early as possible

We used Scrum for this project and we built in unit tests and load tests from the very first sprint. One large impediment we had was the hardware. We didn’t get the right hardware until the end of the project.

The results from the initial performance tests were poor. Thankfully this was not too big a surprise for the client, because one benefit of an Agile methodology like Scrum is transparency. This transparency led to understanding from the client.

The other main benefit of implementing performance testing early was that we had enough time to contact helpful people to gain support.
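For readers who want to start load testing in their own first sprint, here is a minimal sketch of the idea in Python. It is not the tooling we used on this project. The `fake_create_work_item` function is a hypothetical stand-in for a real call into the system under test, and judging each level by its slowest call is an assumption of the sketch. Only the targets come from the tables in this post.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Latency targets in milliseconds per concurrency level (from the tables in
# this post; "No concurrency" is treated as a concurrency of 1).
TARGETS_MS = {1: 200, 10: 500, 100: 1500}


def fake_create_work_item():
    """Hypothetical stand-in for a real call into the system under test."""
    time.sleep(0.005)


def timed_call(operation):
    """Run one operation and return its latency in milliseconds."""
    start = time.perf_counter()
    operation()
    return (time.perf_counter() - start) * 1000


def run_load_test(operation, concurrency):
    """Issue `concurrency` simultaneous calls and return the slowest latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, [operation] * concurrency))
    return max(latencies)


def check_targets(operation, targets=TARGETS_MS):
    """Return {concurrency: (worst_ms, met_target)} for each level."""
    report = {}
    for concurrency, target_ms in targets.items():
        worst_ms = run_load_test(operation, concurrency)
        report[concurrency] = (worst_ms, worst_ms <= target_ms)
    return report
```

Calling `check_targets(fake_create_work_item)` returns one entry per concurrency level. Running something like this in every sprint surfaces a missed target months before the final hardware arrives.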

As you can see, the first two lessons are not really technical lessons. My next blog post will cover the technical lessons…

Read part II here.

Cheers,
@AdamCogan
