Infrastructure Implications of Tweeting
Posted on: March 19, 2009
The Twitter phenomenon, or micro-blogging, has been quite intriguing. Though not yet a regular tweeter myself, I am told that the “aha” moment will come when I start using it actively. So I started tweeting this week on Twitter and Facebook.
As I was warming up, a new tweet popped into my mind: what are the infrastructure implications of tweeting, in terms of HTTP connection rate, rate of new storage required, and so on? I quickly looked up Twitter stats on tweetstats.com: nearly 2 million tweets per day. What if most of the world starts tweeting from smartphones (very much like SMS today)? To get a better sense of the infrastructure needed for this human urge to tweet, I did some quick back-of-the-envelope calculations.
Assumptions:
Average Tweet Size: 100 bytes
# of Tweets: 10 per tweeter per day
# of Tweeters: 1 billion worldwide (think big!)
Infrastructure Requirements:
Tweet Rate: 10 billion tweets per day
Tweet Storage: 1 Terabyte per day raw, or 100 Gigabytes per day with 10:1 compression
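The daily figures above follow directly from the three assumptions. Here is a minimal Python sketch of that arithmetic; the constant and function names are illustrative only, and it uses decimal units (1 GB = 10^9 bytes) and treats the 10:1 compression ratio as an assumption applied uniformly to every tweet.

```python
# Back-of-the-envelope daily totals from the post's assumptions.
AVG_TWEET_BYTES = 100            # average tweet size in bytes
TWEETS_PER_USER_PER_DAY = 10     # tweets per tweeter per day
USERS = 1_000_000_000            # 1 billion tweeters worldwide
COMPRESSION_RATIO = 10           # assumed 10:1 compression

def daily_totals(users=USERS, tweets_per_user=TWEETS_PER_USER_PER_DAY,
                 tweet_bytes=AVG_TWEET_BYTES, compression=COMPRESSION_RATIO):
    """Return (tweets per day, raw bytes per day, compressed bytes per day)."""
    tweets = users * tweets_per_user
    raw_bytes = tweets * tweet_bytes
    return tweets, raw_bytes, raw_bytes // compression

tweets, raw, stored = daily_totals()
print(f"{tweets:,} tweets/day")             # 10,000,000,000 tweets/day
print(f"{raw / 1e12:.0f} TB/day raw")       # 1 TB/day raw
print(f"{stored / 1e9:.0f} GB/day stored")  # 100 GB/day stored
```

Keeping the assumptions as function parameters makes it easy to rerun the estimate with different user counts or tweet sizes.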
Each tweet is essentially an HTTP transaction (request and response). With tweets uniformly distributed throughout the day, 10B tweets/day translates to ~115K HTTP transactions/sec (10 billion / 86,400 seconds). Assuming that the compute infrastructure (the aggregate of web, application, and database servers) can process 1,000 transactions/sec/server, about 115 servers are needed. Assuming a peak-to-average ratio of 3:1, about 350 servers are needed.
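The rate and server estimates can be sketched the same way. This is a rough model, assuming (as the post does) one HTTP transaction per tweet, a uniform spread over 86,400 seconds, 1,000 transactions/sec per server, and a 3:1 peak-to-average ratio; the function name `servers_needed` is hypothetical. It rounds up to whole servers, so it reports 116 and 348 where the text rounds to 115 and 350.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def servers_needed(tweets_per_day, tps_per_server=1000, peak_to_avg=3.0):
    """Estimate average TPS and server counts for a uniformly spread load.

    Each tweet is treated as one HTTP request/response transaction.
    Returns (average TPS, servers at average load, servers at peak load).
    """
    avg_tps = tweets_per_day / SECONDS_PER_DAY
    avg_servers = math.ceil(avg_tps / tps_per_server)
    peak_servers = math.ceil(avg_tps * peak_to_avg / tps_per_server)
    return avg_tps, avg_servers, peak_servers

avg_tps, avg_servers, peak_servers = servers_needed(10_000_000_000)
print(f"{avg_tps:,.0f} transactions/sec")  # 115,741 transactions/sec
print(avg_servers, peak_servers)           # 116 348
```

The per-server throughput is the most uncertain input; halving `tps_per_server` doubles both server counts.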
Storage needs also appear quite manageable: 100GB/day means ~37TB/year, which is no sweat in the petabyte world we live in today.
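The yearly figure is just the daily growth multiplied out. A small sketch, assuming steady growth of 100 GB/day, decimal units (1 TB = 1,000 GB), and a 365-day year; `yearly_storage_tb` is an illustrative name, not an existing tool.

```python
DAYS_PER_YEAR = 365

def yearly_storage_tb(gb_per_day=100, days=DAYS_PER_YEAR):
    """Cumulative storage (decimal TB) after `days` days at a steady daily growth rate."""
    return gb_per_day * days / 1000

print(f"{yearly_storage_tb():.1f} TB after 1 year")                          # 36.5 TB after 1 year
print(f"{yearly_storage_tb(days=5 * DAYS_PER_YEAR):.1f} TB after 5 years")   # 182.5 TB after 5 years
```

Since tweets accumulate rather than expire, the multi-year total is the more relevant number for capacity planning, and it still stays well under a petabyte.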
Net-net, setting up a tweeting service does not seem to need an onerous compute/storage infrastructure (even if people double or triple their daily tweeting). Any techie tweeters out there who can validate or correct the above?
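To check the "double or triple" claim, the same assumptions can be scaled by a tweet-volume multiplier. This sketch reuses the post's figures (1 billion users, 100-byte tweets, 10:1 compression, 1,000 transactions/sec per server, 3:1 peak-to-average); the helper names are hypothetical and servers are rounded up.

```python
import math

SECONDS_PER_DAY = 86_400

def peak_server_count(users, tweets_per_user, tps_per_server=1000, peak_to_avg=3.0):
    """Servers needed at peak, treating each tweet as one HTTP transaction."""
    avg_tps = users * tweets_per_user / SECONDS_PER_DAY
    return math.ceil(avg_tps * peak_to_avg / tps_per_server)

def gb_per_day(users, tweets_per_user, tweet_bytes=100, compression=10):
    """Compressed storage growth in decimal gigabytes per day."""
    return users * tweets_per_user * tweet_bytes / compression / 1e9

for multiplier in (1, 2, 3):
    tweets = 10 * multiplier
    print(multiplier, peak_server_count(1_000_000_000, tweets),
          round(gb_per_day(1_000_000_000, tweets)))
# 1 348 100
# 2 695 200
# 3 1042 300
```

Both resources scale linearly with tweet volume, so even tripling usage stays around a thousand servers and 300 GB/day under these assumptions.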
An interesting extension of this would be to estimate capacity of handling all new thoughts of every human being on this planet!!!
PG.
7 Responses to "Infrastructure Implications of Tweeting"
10 tweets per day might be on the low side if you take into account one-to-one tweets, which are more like SMS/chat. If Twitter eventually becomes one of the prime mediums of communication (partially replacing SMS, scraps in social networking, passing comments, short discussions, idea sharing, etc.), the number of tweets could grow exponentially.
Also, on the storage front, I guess Twitter stores metadata associated with each tweet, and how much depends on what Twitter is interested in. If you account for geo-location information and the like, it would be much more per tweet. Given the way Twitter is trying to monetize its data, it might further process the tweets to generate more valuable information from them, which might need more storage as well.
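The metadata point can be folded into the storage estimate as an extra per-tweet overhead. The post gives no metadata size, so the values below (0, 100, and 400 bytes) are purely hypothetical; the sketch also assumes metadata compresses at the same 10:1 ratio as tweet text, and `stored_gb_per_day` is an illustrative name.

```python
def stored_gb_per_day(tweets_per_day=10_000_000_000, tweet_bytes=100,
                      metadata_bytes=0, compression=10):
    """Compressed storage growth (decimal GB/day) when each tweet also carries metadata.

    metadata_bytes is a hypothetical per-tweet overhead (timestamps, user IDs,
    geo-location, ...); no actual value is known here.
    """
    raw_bytes = tweets_per_day * (tweet_bytes + metadata_bytes)
    return raw_bytes / compression / 1e9

for meta in (0, 100, 400):  # hypothetical metadata sizes in bytes
    print(meta, stored_gb_per_day(metadata_bytes=meta))
# 0 100.0
# 100 200.0
# 400 500.0
```

Even a few hundred bytes of metadata per tweet multiplies daily storage several times over, which supports the comment's point.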
Look at this
As you say, it will take a long time before we feel the pinch of infrastructure limitations. There might also be scope for optimizing the efficiency of data storage and retrieval, which could buy even more time.
[...] informative post on YES Cloud outlines the infrastructure implications of tweeting using Twitter. Using [...]
FANTASTIC!
Interesting article. Keep it up.
1 | Infrastructure Implications of Tweeting
March 19, 2009 at 3:40 am
[...] Original post by Prashant Gandhi [...]