
Theme of day 3, don’t do the same thing over and over and expect different results


This is very much how we’re operating with our largest operational issue right now, the XBOX lag. We are constantly trying new approaches and changes to home in on the cause. This isn’t a single bug or one problem. It’s not a programmer making a spelling error somewhere or someone stumbling over cables.

The XBOX cluster is a very complicated but wonderful beast with a great many moving parts. We are fiddling with all of them: we take down the cluster, reconfigure, add hardware to different functions, do server code updates, analyze network traffic, change the load balancers, change server protocols, move gateways around, poke the sgörneebörní (that marks the end of my technical prowess).

In short, we have a lot of people constantly analyzing, monitoring and watching many different types of logs, server metrics, dials and blinking lights (I’m on the blinking lights detail for some reason). We have external professionals helping us out as well, from networking specialists to Microsoft SWAT teams. That’s why you’ll see an XBOX client patch later tonight. It’s one attempt to change something which, among other things, will remove the need for one of the load balancers. Fewer moving parts, fewer things that can blow up.

We’re incredibly sorry for this state. This was by no means foreseeable. If this were just Game Server code, it’d be easy to diagnose. It isn’t. This is inter-server protocols across the entire cluster. But we’re on it. We didn’t scale these servers up from 2 people to tens of thousands of people by luck alone.

If you’ve been following how we do things since our last blog, here is the overview of today’s focus.

  • Continued XBOX issues – Lag and server crashes. Our server operations team is throwing everything and the sink at it. We even had to stop one of them from throwing a kitchen sink at a server. Yes, we’re frustrated. Don’t ever think otherwise. We even employ “eat your own dog food” and have people play on XBOX instead of other platforms.
  • Disappearing items!? – This has started to crop up. Now, this shouldn’t be confused with the database crash last night. This is something far more complicated. It is connected to the Salvage Matrix, but the Salvage Matrix is probably not the cause. Rather, it is an underlying error which happens more often when the Salvage Matrix is used. We have seen this happening more and more and have a team investigating it. We might disable the Salvage Matrix to minimize this problem while we find a solution.
  • Game Server Crashes – We still have too many of these across all platforms. They will result in you getting disconnected. You can usually reconnect, but sometimes the servers don’t crash “gracefully” and stay in a zombie mode. Now, I know we all like zombies. They can be good entertainment but are not the best for deep conversations. The same goes for Game Server Zombies. You keep getting directed to them and get messages that say the service is down etc., and then in a little while you’re fine again.
  • Connectivity and patching – Back up top again. We’ve gotten a lot of data, so keep it coming. Some people are even finding they have to “patch” the entire client again. We’ve cleared up a lot of log-in and connectivity issues and we’ve gotten reports of much better patching speeds, but we can do more and better. Keep letting us know. It helps us help you. Special note though: if you still have a Beta client installed, uninstall it. This applies to all platforms. We’ve gotten reports of edge cases which can cause serious problems in installation and patching.

And of course, this list keeps changing based on what we solve and improve, and on new issues that may emerge. We adapt. Otherwise we will die. Not being on the list does not mean we’re not working on it. These are highlights and insight.

On that note, here is more insight into today and some indication of what’s coming up.

  • Huge client patch coming – We can very easily revert a server patch, but client patches are very risky to deploy, so we employ a more rigorous testing process before we distribute them. Depending on testing and certification, we should see this patch around the 15th, hopefully sooner, possibly later. It currently has 8 pages of patch notes, listing fixes and improvements.
  • Good batch of improvements – We’re happy that we’ve been able to keep up a good stream of fixes and improvements just through the server updates. Client updates are much more difficult, but we did manage to address a set of crashes and the 120hz. That’s all ongoing of course, so there is more to come, especially in today’s update and tomorrow’s as well.
  • Your boosts are not in vain – We’re looking at ways to make up to you for all these various server problems and launch issues. We know your boosts are running while you aren’t able to play. Not just because you said so, but because the few of us that do go home occasionally try to play from there and come back even more frustrated in the morning. We don’t feel for you. We feel with you.
  • Customer Support – They are really heavily loaded but are crunching through. Remember to include your PSN ID/Gamertag when you contact them so they can act quickly. We’ve been streamlining our setup based on incoming issues, doing systematic fixes and updates when possible. Also remember, they are one of our most vital feedback channels. They see everything coming in, parse it and report it so we can react to it.

You’re still here and read all that? I thank you for that. This feels almost like writing a diary now. I should mention that I have been feeling sad lately, then immensely happy, then frustrated but in the end I’m excited and ecstatic that I’m here. Because this is what it feels like launching this little game, an emotional rollercoaster. Maybe your feelings are similar.

I’m now going to run to deploy a client patch to the XBOX and another round of updates to the XBOX cluster, as it just crashed. We’re using the opportunity to deploy the client patch and do one reconfiguration, since we had that planned for later anyway. Then it’s back to watching the blinking lights. Hey, why did I get the blinking lights detail? “Are they blinking still?” … “Yes”. Something’s awry.

Ave Arkhunter,

Oveur

Nathan Richardsson

Executive Producer Defiance