Fred Williams
The Commish Age: 63 160 Posts
Posted Sep 09, 2019 at 11:51 AM
As many of you are aware, the site was either very slow or down between 2:30pm and 3pm on Sunday, then up and down again from 5:30pm through 7:30pm. Several other large sites similar to ours suffered the same fate. These timeframes are of course the busiest of the week, as players log in to check standings after a round of games completes. We mitigated this last year by upgrading our server plan, but that obviously wasn't enough this year to accommodate the roughly 30% growth we saw in players.
While we have several ideas for improving performance during such peak volume, we are still working with our host company to get more details and a mitigation plan. I'll provide more info in this thread as I learn more. Thanks for your understanding.
Fred
"I want to die peacefully in my sleep like my grandfather, not screaming in terror like his passengers!"
Bryan Dyer
Posted Sep 10, 2019 at 1:07 PM
I DID notice and was unable to make my picks due to it...
Matt Szeto
Posted Sep 10, 2019 at 1:45 PM
Bryan Dyer wrote:
I DID notice and was unable to make my picks due to it...
While this outage was certainly our problem, you should always keep your manager's email address handy so that you can send them your picks to enter later.
Bill Reid
Age: 61 1 Posts
Posted Sep 10, 2019 at 2:05 PM
Just to be clear, picks need to be in by Thursday at 8pm, correct? Or no?
jay K
Posted Sep 10, 2019 at 2:55 PM
9/10/2019: I have activated my league, Delcom Finest, for the football season, but it is on week 21 and did not reset to the current week.
My players are trying to pick teams, but none are loaded. Please help.
Thanks
Don McKinnon
Age: 55 1 Posts
Posted Sep 11, 2019 at 1:35 PM
Bill Reid wrote:
just to be clear, picks need to be in by thursday by 8pm correct? or no?
Depends on your league settings. I have mine set up so that picks are due before each game's kickoff.
Fred Williams
The Commish Age: 63 160 Posts
Posted Sep 14, 2019 at 2:24 PM
Here is the latest on the website slowdown/outage last Sunday, Sept 8. While we initially suspected something at our server host company, we should instead have noticed the thumb pointing back at us. :) Last year we experienced a similar slowdown, and by week 3 we had upgraded to a more powerful cloud server, which, at least for last year, solved the problem. This year, however, was a different story: the 35% growth in players was enough to push us over capacity again. But we are no longer convinced that the cause was a lack of adequate server resources. To make a long story short, today we deployed what should be significant software improvements that we expect will prevent a slowdown this coming Sunday. While we can't guarantee the site won't grind to a crawl again, we know the changes we made go a long way toward addressing the problem. If you want details, feel free to read on.
Details/Improvements
Before our changes, each time a game completed, the next player to visit the Standings page would trigger a rebuild to bring the standings up to date; later visits would skip the rebuild until the next score came in. During high traffic, however, the sheer number of players and league Standings pages being accessed made this rebuild step problematic. While the logic would hold off other database accesses for the league being rebuilt, other leagues would still kick off their own processing, creating an environment where many web server sessions were hitting the database at the same time. Access across multiple leagues was throttled, but that wasn't enough to handle the heavy traffic.
We made changes that should drastically improve this. First, we moved all league standings processing into several background tasks. Second, all Pickem/Confidence Weekly and Season Standings tables and Survivor Standings tables are now cached in memory. So when a player in a league is the first to hit the Standings page, the page loads whatever is already in the database (it no longer processes the standings, leaving that to the background tasks) and then caches the data, so the next player who hits the page does not cause a database round trip.
The background processing that updates the standings is triggered on a 10-minute cadence, and our measurements show the background tasks can take about 10 minutes to complete, so we expect standings to reflect the latest scores somewhere between 0 and 20 minutes after a game completes. During the 2:15 to 3pm window on Sunday, when traffic is extremely heavy, the time from a final score to updated Standings could stretch somewhat, but given the big savings in database hits we don't expect it to stretch much.
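For readers curious how this kind of scheme fits together, here is a minimal Python sketch, not the site's actual code. It assumes standings are stored per league as a ranked list of (player, points) pairs, and that each background pass invalidates a league's cached entry after storing fresh standings (the post above doesn't say how the cache is refreshed, so that part is an assumption). The names `StandingsCache`, `refresh_all_leagues`, `worst_case_staleness_min`, and the `load_from_db` accessor are all hypothetical.

```python
import threading
from typing import Callable, Dict, List, Tuple

# Hypothetical shape of one league's standings: (player name, points), best first.
Standings = List[Tuple[str, int]]


class StandingsCache:
    """Read path for a Standings page (a sketch; not the site's actual code).

    A page view never recomputes standings. On a cache miss it loads whatever
    standings the background task last stored and caches them, so later viewers
    of the same league skip the database round trip.
    """

    def __init__(self, load_from_db: Callable[[int], Standings]) -> None:
        self._load_from_db = load_from_db  # hypothetical database accessor
        self._cache: Dict[int, Standings] = {}
        self._lock = threading.Lock()      # web requests may arrive on many threads
        self.db_reads = 0                  # counts database round trips, for illustration

    def get(self, league_id: int) -> Standings:
        with self._lock:
            cached = self._cache.get(league_id)
        if cached is not None:
            return cached
        # Miss: read the stored standings (no recomputation), then cache them.
        standings = self._load_from_db(league_id)
        with self._lock:
            self.db_reads += 1
            self._cache[league_id] = standings
        return standings

    def invalidate(self, league_id: int) -> None:
        with self._lock:
            self._cache.pop(league_id, None)


def refresh_all_leagues(db: Dict[int, Standings],
                        points: Dict[int, Dict[str, int]],
                        cache: StandingsCache) -> None:
    """One background pass (assumed to be scheduled every 10 minutes)."""
    for league_id, league_points in points.items():
        # Recompute and store this league's standings, highest points first.
        db[league_id] = sorted(league_points.items(), key=lambda kv: kv[1], reverse=True)
        # Assumption: dropping the cached copy makes the next page view reload it.
        cache.invalidate(league_id)


def worst_case_staleness_min(cadence_min: float, pass_min: float) -> float:
    """A score landing just after a pass has handled its league waits up to one
    cadence for the next pass, plus up to one pass duration to be stored."""
    return cadence_min + pass_min
```

With the numbers above, `worst_case_staleness_min(10, 10)` gives the 20-minute upper bound. One simplification to be aware of: a page load that misses the cache while a refresh is running can re-cache the older standings until the next pass.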
We'll of course be monitoring things this Sunday and gathering data, as we want to continue fine-tuning the logic. We also do not yet have caching on the Pickem & Confidence Player Picks tab of the Standings page, and we will assess the impact of that.
Thanks everyone for your patience and understanding.
Fred
Fred Williams
The Commish Age: 63 160 Posts
Posted Sep 15, 2019 at 11:09 PM
Well, mea culpa: our database access improvements were not enough to prevent the site outages today from 2:30pm to 3:15pm and again from 5:30pm through 6:30pm. While our internal data did in fact show a significant improvement in database access, you, the players and managers, didn't see the benefit because CPU availability crashed to 1% or less during these timeframes! We had a misunderstanding with our host company and were led to believe we had plenty of CPU available. We now plan to have the host company bump us up to the next tier to get more CPU at the server. This is what ultimately "solved" the problem last year. The upgrade is a seamless operation on the host company's side and should not cause any downtime.
Some of you have asked whether we have considered switching to a different hosting provider. Given what has happened the past two Sundays, we definitely plan to do so, but probably in the offseason, since such a switch during the NFL season wouldn't be ideal (it can result in up to a day of downtime).
Fred
Fred Williams
The Commish Age: 63 160 Posts
Posted Sep 19, 2019 at 5:34 PM
Tonight at 11pm MST we will be switching to a new server that is faster and more flexible. The process can take anywhere from 1 to 24 hours. If you visit the site after 11pm MST and it is in maintenance mode, you are still on the OLD server, and a message will ask you to check back periodically. Once you no longer see the maintenance mode message, you are on the new server!
We can't guarantee that the slowdowns/outages that have occurred from 2:30 to 3pm and 5:30 to 6pm on Sundays will go away with this change, as we still need to determine how much CPU and memory we need to accommodate those time periods. We'll be monitoring things on Sunday and making adjustments on the fly if necessary.
Thanks again for your patience and understanding.
Fred
Fred Williams
The Commish Age: 63 160 Posts
Posted Sep 20, 2019 at 7:40 AM
We completed our migration to a new server! Not only is the server much more powerful, but we also installed MS SQL Server 2016, which is a significant improvement over its predecessor. We'll now be better equipped to handle the huge traffic spikes that occur after the early and afternoon games complete.
Fred
Fred Williams
The Commish Age: 63 160 Posts
Posted Sep 23, 2019 at 10:04 AM
Apologies for yet another tough Sunday with the server. For those new to this thread, the problem stems from everyone and their brother logging in after the morning games end, and again after the evening games. We've just added 3 CPUs to the current server and continue to improve areas of the code to make database access more efficient. The difficulty is that when the problem occurs, there is no quick fix; instead, we have to capture logs to make sure we understand where the problem is and where to make improvements. The old server we were on last week was limited in the number of CPUs (we were already at its max) and in our ability to monitor logs. The new server we just migrated to gives us much more headroom to meet the demands on the database, as well as better access to the logs, which will hopefully help us pin down the issue once and for all. I can't promise that next Sunday will finally get us over the hump, but we are actively working toward a resolution.
Addendum: We did not have a baseline for CPU power on the old server because it was abstracted from us, which is why it is tough to know how much is needed to accommodate the 30% traffic growth over last year. We are also aware that CPU may not be the only villain here, as the slowdown occurs on pages that rely heavily on the database. One area we found in the logs that was using more resources was the new login security (MS Identity) we upgraded to in the offseason. This and other areas of the code are getting close scrutiny in our efforts to improve database efficiency, in addition to the step we already took last night to add 3 CPUs and additional memory.
Fred
Fred Williams
The Commish Age: 63 160 Posts
Posted Sep 27, 2019 at 3:17 PM
Today (Friday 9/27) we added another 2 CPUs and 4GB of memory, bringing us to about triple the number of CPUs we started with in week 1. More importantly, the database logs from Sunday revealed several queries with significant inefficiencies in database usage, especially when accessing the home page, and we have cleaned up that code.
So to date we have tripled the number of CPUs and the amount of memory, upgraded to a private database server running a more recent version, and improved the code by caching the standings pages and significantly reducing database accesses when logging in and navigating between pages.
We will continue to closely monitor the heavy Sunday traffic that occurs as the afternoon games complete, and will keep looking for more ways to keep things running smoothly.
Fred Williams
The Commish Age: 63 160 Posts
Posted Sep 30, 2019 at 8:10 AM
Yesterday we saw significant improvements in site responsiveness during the peak times. We did have one sluggish period from 6:05 to 6:45pm MST. The logs point us to another area that, once optimized, should make a significant difference. We're getting close.
Fred Williams
The Commish Age: 63 160 Posts
Posted Oct 07, 2019 at 8:00 AM
Our last change, caching another busy database table, did the trick, and we had no outages this Sunday.
Neil Martin
Posted Oct 07, 2019 at 8:36 AM
Thank you for all of your efforts!! Very appreciated and recognized!