Infrastructure Implications of Tweeting
Posted March 19, 2009
The Twitter phenomenon, or micro-blogging, has been quite intriguing. Though not yet a regular tweeter myself, I am told that the “aha” moment will come when I start using it actively. So I started tweeting this week on Twitter and Facebook.
As I was warming up, a question popped into my mind: what are the infrastructure implications of tweeting, in terms of HTTP connection rate, rate of new storage required, and so on? I quickly looked up Twitter stats on tweetstats.com – nearly 2 million tweets per day. What if most of the world starts tweeting from smart phones (much like SMS today)? To get a better sense of the infrastructure needed for this human urge to tweet, I did some quick back-of-the-envelope calculations.
Average Tweet Size: 100 bytes
# of Tweets: 10 per tweeter per day
# of Tweeters: 1 billion worldwide (think big!)
Tweet Rate: 10 billion tweets per day
Tweet Storage: 100 Gigabytes per day (with 10:1 compression)
Each tweet is essentially an HTTP transaction (request and response). A tweet rate of 10B/day translates to ~115K HTTP transactions/sec, assuming tweets are uniformly distributed throughout the day. If the compute infrastructure (aggregate of web, application, and database servers) can process 1000 transactions/sec/server, about 115 servers are needed. Assuming a peak-to-average ratio of 3:1, about 350 servers are needed.
Storage needs appear to be quite manageable also – 100GB/day means ~37TB/year, which is no sweat in the petabyte world we live in today.
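The arithmetic above can be sketched in a few lines of Python. All figures are the assumptions stated in this post (1 billion tweeters, 10 tweets/day each, 100-byte tweets, 10:1 compression, 1000 transactions/sec/server, 3:1 peak-to-average), not measured data:

```python
# Back-of-the-envelope sizing for a global tweeting service,
# using the assumptions from the post above.

TWEET_SIZE_BYTES = 100           # average tweet size
TWEETS_PER_USER_PER_DAY = 10
TWEETERS = 1_000_000_000         # 1 billion worldwide (think big!)
COMPRESSION_RATIO = 10           # 10:1 storage compression
TPS_PER_SERVER = 1000            # transactions/sec one server can handle
PEAK_TO_AVG = 3                  # peak-to-average load ratio
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

tweets_per_day = TWEETERS * TWEETS_PER_USER_PER_DAY   # 10 billion/day
avg_tps = tweets_per_day / SECONDS_PER_DAY            # ~115,740 tweets/sec
servers_avg = avg_tps / TPS_PER_SERVER                # ~116 servers
servers_peak = servers_avg * PEAK_TO_AVG              # ~347 servers at peak

storage_per_day_gb = (tweets_per_day * TWEET_SIZE_BYTES
                      / COMPRESSION_RATIO / 1e9)      # 100 GB/day compressed
storage_per_year_tb = storage_per_day_gb * 365 / 1000 # ~36.5 TB/year

print(f"Average rate: {avg_tps:,.0f} tweets/sec")
print(f"Servers needed (average load): {servers_avg:.0f}")
print(f"Servers needed (3:1 peak):     {servers_peak:.0f}")
print(f"Storage: {storage_per_day_gb:.0f} GB/day, ~{storage_per_year_tb:.1f} TB/year")
```

Doubling or tripling any single assumption just scales the corresponding output linearly, so the conclusion is not sensitive to the exact figures.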
Net-net, setting up a tweeting service does not seem to require an onerous compute/storage infrastructure (even if people double or triple their daily tweet volume). Any techie tweeters out there who can validate/correct the above?
An interesting extension of this would be to estimate capacity of handling all new thoughts of every human being on this planet!!!