How to: Build a Large and Scalable ECommerce Platform

[Guest article contributed by Amitabh Misra, Vice President and Head of Engineering at Snapdeal.com. He has spent over a decade in Silicon Valley building and managing large-scale technology platforms across multiple sectors. Amitabh holds an engineering degree from IIT Kanpur and an MBA from the University of California at Berkeley, USA.]

What does it take to build a world-class ECommerce technology platform? When you start building one, many questions come to mind: What architecture should we follow? Should we use an off-the-shelf ECommerce product or build it ourselves? What hardware, equipment and services should we buy or use? What technology should we use: open source, Java, PHP, Ruby or something else? And many more. I think a handful of principles and building blocks can stand a team that sets out to build such a platform in good stead. These thoughts are based on more than a decade of my own experience building large-scale distributed platforms in Silicon Valley.

Do It Yourself

Numerous ecommerce products are available in the market. Some offer end-to-end capabilities; others focus on specific areas such as shopping cart, payments or catalogue management. Using off-the-shelf ecommerce products certainly has advantages. For one, these products are fairly mature, having incorporated best practices, processes and lessons from implementations around the world. More importantly, with an off-the-shelf product your site can be up and running quickly and cheaply with a handful of engineers.

But a cookie cutter is a cookie cutter: it is efficient, but it cuts only one shape. With off-the-shelf products, that means you will need customizations, extensions, plugins and so on to meet your frequently changing business needs. As your business scales, developing these custom solutions becomes increasingly hard and prohibitively expensive, if it is doable at all. Eventually, customizations consume more time and money than building on a home-grown platform would. These issues are even more pronounced in the Indian context, where the underlying businesses on which ECommerce critically depends, such as payment gateways, logistics, suppliers and shipping, are themselves fairly primitive and changing rapidly. Furthermore, at very large scale, it is nearly impossible to maintain high performance and availability on a platform that you don’t know inside out.

In summary, off-the-shelf ecommerce products make it easy to get off the ground in the short term. In the long term, however, they severely limit feature development and scaling. So if you expect your platform to last a few years and your development team to grow beyond 5 people, you can and should build it yourself. Just hire good engineers.

Use Classic 3 Tier + Service Oriented Architecture

A modern-day ECommerce platform is very large, with many application components. These may include catalogue management, content management, search, personalization and recommendation, order management, payments, reviews and ratings, shipping and warehouse management, pricing and inventory management, fraud and risk management, email and SMS management, and many more. We can also view a platform in terms of technology components: a user interface, a data store and a back-end data processing system, which is further split into real-time and offline, continuous and batch processing, and so on. Needless to say, a platform with such diverse needs, if built as a single monolith, would be too complex to build, evolve and scale.

The challenges of building an ECommerce platform are best met with a Service Oriented Architecture (SOA). An SOA platform is modular, distributed and decentralized, with each application component fully isolated from every other. These components, or subsystems, talk to each other only through well-defined APIs exposed as services. Note that by isolated I mean not just at the code level but also in deployment, and not just in back-end processing but also in storage. Even the teams that develop the components should be separate, typically 4 to 8 people each. With an SOA structure in place, each component evolves rapidly and independently. Production problems are isolated and fixed quickly. Scaling challenges are identified and met fast. Problems in one area do not spill over, since the interfaces and boundaries between components are strictly defined and managed.
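To make "talk only through a well-defined API" concrete, here is a minimal Java sketch built around a hypothetical inventory component. The `InventoryService` interface is the only thing other components see; the in-memory implementation behind it stands in for what would, in practice, be a separately deployed service with its own storage, reached over the network. All class names here are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch of an SOA boundary. InventoryService, InMemoryInventoryService and
// CheckoutComponent are hypothetical names used only for illustration.
class SoaBoundaryDemo {

    // The published contract: the only thing other components may depend on.
    interface InventoryService {
        Optional<Integer> availableUnits(String sku);
        boolean reserve(String sku, int units);
    }

    // Owned by the inventory team. Its storage (here a HashMap) is private to it;
    // no other component reads or writes this data directly.
    static final class InMemoryInventoryService implements InventoryService {
        private final Map<String, Integer> stock = new HashMap<>();

        InMemoryInventoryService(Map<String, Integer> initialStock) {
            stock.putAll(initialStock);
        }

        @Override
        public synchronized Optional<Integer> availableUnits(String sku) {
            return Optional.ofNullable(stock.get(sku));
        }

        @Override
        public synchronized boolean reserve(String sku, int units) {
            Integer current = stock.get(sku);
            if (current == null || units <= 0 || current < units) {
                return false; // unknown SKU, invalid request, or not enough stock
            }
            stock.put(sku, current - units);
            return true;
        }
    }

    // Owned by a different team: it knows only the interface, not the implementation.
    static final class CheckoutComponent {
        private final InventoryService inventory;

        CheckoutComponent(InventoryService inventory) {
            this.inventory = inventory;
        }

        String placeOrder(String sku, int units) {
            return inventory.reserve(sku, units) ? "ORDER_ACCEPTED" : "OUT_OF_STOCK";
        }
    }

    public static void main(String[] args) {
        InventoryService inventory = new InMemoryInventoryService(Map.of("SKU-1", 3));
        CheckoutComponent checkout = new CheckoutComponent(inventory);
        System.out.println(checkout.placeOrder("SKU-1", 2)); // ORDER_ACCEPTED
        System.out.println(checkout.placeOrder("SKU-1", 2)); // OUT_OF_STOCK (only 1 left)
    }
}
```

Replacing the in-memory class with a remote client that implements the same interface would require no change in `CheckoutComponent`, which is the point of keeping the boundary strict.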

Besides SOA, the classic 3-tier architecture should also be used. This means that the user interface layer, the middle data processing layer and the back-end database layer should each be built and deployed separately. The primary reason is that each layer has very different performance tuning requirements. The user interface layer is stateful (because it stores user sessions), handles a high number of connections, serves rich content and acts as an aggregator. The middle data processing layer is usually stateless, runs at high concurrency, executes logic and crunches numbers. Finally, the back-end layer stores and retrieves data in a structured manner. A 3-tier architecture therefore lets each layer be tuned optimally, which in turn allows fast, organic horizontal scaling.
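A minimal Java sketch of the three layers, with hypothetical class names and prices in paise: a presentation tier that holds per-user session state (the cart), a stateless middle tier that only computes, and a data tier that stores and retrieves records. In a real deployment each tier would run as a separate process on separately tuned machines; here they are plain, single-threaded classes so the division of responsibility is easy to see.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative 3-tier split; all names are hypothetical. Single-threaded sketch.
class ThreeTierDemo {

    // Data tier: structured storage and retrieval only (a stand-in for a database).
    static final class ProductRepository {
        private final Map<String, Long> pricesInPaise = new HashMap<>();

        void save(String sku, long priceInPaise) { pricesInPaise.put(sku, priceInPaise); }

        long priceOf(String sku) {
            Long price = pricesInPaise.get(sku);
            if (price == null) throw new IllegalArgumentException("Unknown SKU: " + sku);
            return price;
        }
    }

    // Middle tier: stateless. It keeps no per-user data, so any instance can serve
    // any request and more instances can be added behind a load balancer.
    static final class PricingService {
        private final ProductRepository repository;

        PricingService(ProductRepository repository) { this.repository = repository; }

        long cartTotal(List<String> skus) {
            long total = 0;
            for (String sku : skus) total += repository.priceOf(sku);
            return total;
        }
    }

    // Presentation tier: stateful, because it holds each user's session (here, the cart).
    static final class WebFrontEnd {
        private final Map<String, List<String>> cartsBySession = new HashMap<>();
        private final PricingService pricing;

        WebFrontEnd(PricingService pricing) { this.pricing = pricing; }

        void addToCart(String sessionId, String sku) {
            cartsBySession.computeIfAbsent(sessionId, id -> new ArrayList<>()).add(sku);
        }

        String renderCart(String sessionId) {
            List<String> cart = cartsBySession.getOrDefault(sessionId, List.of());
            return cart.size() + " item(s), total " + pricing.cartTotal(cart) + " paise";
        }
    }

    public static void main(String[] args) {
        ProductRepository repo = new ProductRepository();
        repo.save("BOOK", 49_900);
        repo.save("PEN", 1_500);
        WebFrontEnd web = new WebFrontEnd(new PricingService(repo));
        web.addToCart("session-42", "BOOK");
        web.addToCart("session-42", "PEN");
        System.out.println(web.renderCart("session-42")); // 2 item(s), total 51400 paise
    }
}
```

Because only `WebFrontEnd` holds session state, the pricing tier can be scaled out freely, while the front end is the tier that needs session affinity or a shared session store.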

Multi-Layered Caching and Storage

ECommerce sites these days are very rich in content. Each web page contains many images and scripts besides regular text. Further, to support a wide variety of product offerings, the databases and other data stores are huge. Moreover, consumers number in the millions and are spread out geographically. In such a scenario, bringing all that data from back-end storage to consumers’ browsers reliably and efficiently is a mighty challenge. A combination of multi-layered, distributed caching technologies can help address it. The key insight is that much of the data served is static, and much of the dynamic data does not change all the time either.

The first level of caching can be done in the browser itself. Cached data lives on the consumer’s desktop until it becomes stale or expires.
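How long a browser keeps a response is controlled by the HTTP `Cache-Control` response header. The sketch below picks a header value per request path under a hypothetical policy: fingerprinted static assets are cached for a year, ordinary images for a day, personalized pages are never stored, and everything else must be revalidated. The path conventions and lifetimes are illustrative assumptions, not recommendations.

```java
import java.util.Locale;

// Chooses a Cache-Control header value per request path. The path conventions
// (/assets/, /account/, /cart, /checkout) and lifetimes are hypothetical.
class BrowserCachePolicy {

    private static final long ONE_DAY_SECONDS = 24L * 60 * 60;
    private static final long ONE_YEAR_SECONDS = 365 * ONE_DAY_SECONDS;

    static String cacheControlFor(String path) {
        String p = path.toLowerCase(Locale.ROOT);
        if (p.startsWith("/account/") || p.equals("/cart") || p.startsWith("/cart/")
                || p.equals("/checkout") || p.startsWith("/checkout/")) {
            // Personalized content: the browser must not keep a copy.
            return "no-store";
        }
        if (p.startsWith("/assets/")) {
            // Fingerprinted scripts and stylesheets: the URL changes whenever the
            // content does, so the browser can keep them without revalidating.
            return "public, max-age=" + ONE_YEAR_SECONDS + ", immutable";
        }
        if (p.endsWith(".jpg") || p.endsWith(".png") || p.endsWith(".webp")) {
            return "public, max-age=" + ONE_DAY_SECONDS;
        }
        // Everything else may be stored but must be revalidated with the server before reuse.
        return "no-cache";
    }

    public static void main(String[] args) {
        System.out.println(cacheControlFor("/assets/app.3f9a1c.js")); // public, max-age=31536000, immutable
        System.out.println(cacheControlFor("/images/phone.jpg"));     // public, max-age=86400
        System.out.println(cacheControlFor("/cart"));                 // no-store
        System.out.println(cacheControlFor("/category/mobiles"));     // no-cache
    }
}
```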

The next level of caching uses Content Delivery Networks (CDNs), such as Akamai. CDNs typically serve static content such as images and JavaScript files. Internet traffic from a browser to a website makes several hops across multiple ISPs’ routers, and each hop adds latency; across many hops, the added delay can reach hundreds of milliseconds. CDNs reduce the number of hops by partnering with ISPs. Their servers sit right next to the ISPs’ edge routers, which are typically the first routers a consumer’s browser connects through. That way, CDNs cache content and serve it to users from the nearest possible point. As a result, website responses become faster, internet traffic is reduced, and fewer web servers can serve many more consumers.
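CDN edge servers can cache a static file for a long time only if its URL changes whenever its content changes; otherwise users keep receiving stale copies. A practice commonly paired with CDNs (not something described above) is to put a short content hash, or "fingerprint", into the file name. A minimal sketch, assuming a hypothetical CDN host `cdn.example.com`:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Builds content-fingerprinted CDN URLs. The CDN host name is hypothetical.
class CdnAssetUrls {

    private static final String CDN_BASE = "https://cdn.example.com";

    // e.g. "/js/app.js" -> "https://cdn.example.com/js/app.<8 hex digits>.js", where the
    // fingerprint is the first 4 bytes of the SHA-256 hash of the file's content.
    static String versionedUrl(String path, byte[] content) {
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        String fingerprint = shortHash(content);
        if (dot <= slash) { // the file name has no extension
            return CDN_BASE + path + "." + fingerprint;
        }
        return CDN_BASE + path.substring(0, dot) + "." + fingerprint + path.substring(dot);
    }

    private static String shortHash(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 4; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] v1 = "console.log('v1');".getBytes(StandardCharsets.UTF_8);
        byte[] v2 = "console.log('v2');".getBytes(StandardCharsets.UTF_8);
        System.out.println(versionedUrl("/js/app.js", v1));
        System.out.println(versionedUrl("/js/app.js", v2)); // a different URL for new content
    }
}
```

Because a changed file gets a new URL, the old cached copy at the edge is simply never requested again, so no explicit CDN purge is needed for that file.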

Next, use reverse proxies, such as Apache Traffic Server or Varnish, to serve dynamic page content that doesn’t change in real time. Reverse proxies sit right in front of web servers and give fine-grained control over what is cached and when. The net effect is reduced load on the web servers, which now have to build dynamic pages less frequently.
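Reverse proxies are configured in their own languages (Varnish, for example, uses VCL), so the following is not proxy configuration. It is a Java sketch of the per-request decision such a proxy makes, under hypothetical rules: only GET requests without a session cookie are cacheable, and the time-to-live depends on how often each kind of page changes. The URL prefixes and TTL values are illustrative assumptions.

```java
import java.util.Map;
import java.util.Optional;

// Models a reverse proxy's "cache or pass through?" decision. All rules are hypothetical.
class ProxyCacheRules {

    // TTLs in seconds per URL prefix. The prefixes do not overlap, so the
    // (unspecified) iteration order of Map.of does not affect the result.
    private static final Map<String, Integer> TTL_BY_PREFIX = Map.of(
            "/product/", 300,        // product pages: price and stock may change within minutes
            "/category/", 600,       // listing pages change less often
            "/static-page/", 86_400);

    // Returns how long a cached copy may be served, or empty if the request must
    // always be forwarded to the web servers.
    static Optional<Integer> cacheTtlSeconds(String method, String path, boolean hasSessionCookie) {
        if (!"GET".equals(method) || hasSessionCookie) {
            return Optional.empty(); // writes and personalized views bypass the cache
        }
        for (Map.Entry<String, Integer> rule : TTL_BY_PREFIX.entrySet()) {
            if (path.startsWith(rule.getKey())) {
                return Optional.of(rule.getValue());
            }
        }
        return Optional.empty(); // unknown pages are not cached
    }

    public static void main(String[] args) {
        System.out.println(cacheTtlSeconds("GET", "/product/123", false));       // Optional[300]
        System.out.println(cacheTtlSeconds("GET", "/product/123", true));        // Optional.empty
        System.out.println(cacheTtlSeconds("POST", "/category/mobiles", false)); // Optional.empty
    }
}
```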

A fourth level of caching can be done in the application processes themselves. Today’s 64-bit processes offer terabytes of virtual address space, which translates into seriously big caches, limited only by the RAM available on the machine. Earlier 32-bit systems typically allowed only up to 1.5GB for application use. A word of caution: don’t go overboard with heap sizes; the bigger the heap, the longer your Java process will pause during a full garbage collection (GC).
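A minimal sketch of a bounded in-process cache using only the JDK: a `LinkedHashMap` created in access order, with `removeEldestEntry` overridden, evicts the least-recently-used entry once a size limit is reached. Capping the entry count is one simple way to respect the caution about heap sizes. The capacity of 2 is only for the demonstration, and the class is not thread-safe; concurrent use would need `Collections.synchronizedMap` or a dedicated caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class InProcessCacheDemo {

    // Bounded LRU cache. Limiting the number of entries keeps heap growth,
    // and therefore full-GC pause times, under control. Not thread-safe.
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            super(16, 0.75f, true); // accessOrder = true: iteration runs least to most recently used
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries; // evict the least-recently-used entry once over capacity
        }
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("p1", "Phone");
        cache.put("p2", "Laptop");
        cache.get("p1");           // touch p1, so p2 becomes the least recently used
        cache.put("p3", "Tablet"); // evicts p2
        System.out.println(cache.keySet()); // [p1, p3]
    }
}
```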

A fifth level could be NoSQL databases. The last 5 years have seen the evolution of many NoSQL data stores: memcached, MongoDB, Solr, Hazelcast, Redis and others. NoSQL databases can be used as secondary caches or even as authoritative data sources. Compared to a relational database, they offer high performance and scalability at the cost of consistency and structure. Each NoSQL database has its own strengths on parameters such as read/write ratio, persistence and search/indexing capabilities. Pick the one that fits your needs.
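When a NoSQL store serves as a secondary cache, a common pattern is cache-aside: read from the cache first, fall back to the authoritative database on a miss and populate the cache, and invalidate the cached key whenever the data is updated. A minimal sketch, assuming a hypothetical `KeyValueStore` interface standing in for a real client such as one for Redis or memcached, with in-memory maps in place of both stores so it runs without a server:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class CacheAsideDemo {

    // Hypothetical abstraction over a key-value store such as Redis or memcached.
    interface KeyValueStore {
        Optional<String> get(String key);
        void put(String key, String value);
        void delete(String key);
    }

    // In-memory stand-in so the sketch runs without a server.
    static final class InMemoryStore implements KeyValueStore {
        private final Map<String, String> data = new HashMap<>();
        public Optional<String> get(String key) { return Optional.ofNullable(data.get(key)); }
        public void put(String key, String value) { data.put(key, value); }
        public void delete(String key) { data.remove(key); }
    }

    // Cache-aside: the cache is consulted first; the database remains the source of truth.
    static final class ProductCatalog {
        private final KeyValueStore cache;
        private final Map<String, String> database; // stand-in for the relational database
        int databaseReads = 0;                      // counts cache misses, for demonstration

        ProductCatalog(KeyValueStore cache, Map<String, String> database) {
            this.cache = cache;
            this.database = database;
        }

        Optional<String> productName(String id) {
            Optional<String> cached = cache.get(id);
            if (cached.isPresent()) return cached;  // cache hit
            databaseReads++;                        // cache miss: go to the database
            String fromDb = database.get(id);
            if (fromDb != null) cache.put(id, fromDb);
            return Optional.ofNullable(fromDb);
        }

        void rename(String id, String newName) {
            database.put(id, newName); // write to the source of truth first
            cache.delete(id);          // then invalidate; the next read repopulates the cache
        }
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>(Map.of("42", "Phone"));
        ProductCatalog catalog = new ProductCatalog(new InMemoryStore(), db);
        catalog.productName("42"); // miss: reads the database
        catalog.productName("42"); // hit: served from the cache
        catalog.rename("42", "Smartphone");
        System.out.println(catalog.productName("42").get() + ", database reads: " + catalog.databaseReads);
        // Smartphone, database reads: 2
    }
}
```

Deleting the key on update, rather than writing the new value into the cache, keeps the database as the single source of truth; the trade-off is one extra database read after each change.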

Finally, you can scale by having multiple replicas of your master database. When a database becomes very large, traffic to it can be split: all updates are directed to the master, while all queries are directed to one or more slave replicas.
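A minimal sketch of that split at the application level: a router sends every update to the master and spreads read-only queries round-robin across the replicas. The connection strings are hypothetical JDBC URLs, and a real setup would hand out connections from a pool for each target rather than strings.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Routes writes to the master and reads round-robin across replicas.
// The JDBC URLs used in main() are hypothetical.
class ReadWriteRouter {

    private final String master;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String master, List<String> replicas) {
        this.master = master;
        this.replicas = List.copyOf(replicas);
    }

    // Updates always go to the master; reads rotate through the replicas and
    // fall back to the master if no replica is configured.
    String route(boolean isWrite) {
        if (isWrite || replicas.isEmpty()) return master;
        int i = Math.floorMod(next.getAndIncrement(), replicas.size()); // safe after int overflow
        return replicas.get(i);
    }

    public static void main(String[] args) {
        ReadWriteRouter router = new ReadWriteRouter(
                "jdbc:mysql://db-master/shop",
                List.of("jdbc:mysql://db-replica-1/shop", "jdbc:mysql://db-replica-2/shop"));
        System.out.println(router.route(true));  // the master URL
        System.out.println(router.route(false)); // replica 1
        System.out.println(router.route(false)); // replica 2
        System.out.println(router.route(false)); // replica 1 again
    }
}
```

With asynchronous replication, replicas can lag slightly behind the master, so a read that must see a write just made (for example, showing an order right after it is placed) is often routed to the master as well.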

Detailed Multi-Level Monitoring

Good monitoring is absolutely critical to a top-performing ECommerce site. First, it ensures high availability and fast response times. Second, it lets you see the impact of changes quickly, which in turn allows rapid deployment of new features. Third, it lets developers troubleshoot production problems quickly. This is especially important because the really complex problems show up only in production. Finally, it allows capacity planning well in advance.

To build a good monitoring system, monitor each SOA component independently. Unless the SLAs between components are strictly controlled, overall SLAs cannot be achieved. Within each component, measure everything at all levels and collect as much data as you can. Modern applications have multiple levels of abstraction, so collect detailed data from all of them: network, operating system, JVM, application server, application, database and so on. A combination of a system monitoring tool such as Nagios, an application monitoring tool based on bytecode instrumentation such as Introscope, New Relic or Hyperic, an external site monitoring tool such as Site24x7, and internal application and JVM statistics makes a powerful monitoring setup. Finally, build an integrated view of the entire platform from the monitoring feeds of the individual SOA components.
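Per-component SLAs are usually stated as latency percentiles rather than averages, because averages hide the slow tail. A minimal sketch of checking one component against a hypothetical 99th-percentile target, using the nearest-rank percentile definition; in production, the tools named above, or a metrics library, would collect and aggregate these numbers continuously instead of sorting raw samples.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Checks one component's latencies against a p99 SLA. The component name and
// target are hypothetical. Not thread-safe; for illustration only.
class SlaMonitor {

    private final String component;
    private final long p99TargetMillis;
    private final List<Long> latenciesMillis = new ArrayList<>();

    SlaMonitor(String component, long p99TargetMillis) {
        this.component = component;
        this.p99TargetMillis = p99TargetMillis;
    }

    void record(long latencyMillis) { latenciesMillis.add(latencyMillis); }

    // Nearest-rank percentile: the smallest sample such that at least p% of all
    // samples are less than or equal to it.
    long percentile(double p) {
        if (latenciesMillis.isEmpty()) throw new IllegalStateException("no samples");
        List<Long> sorted = new ArrayList<>(latenciesMillis);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(Math.max(rank, 1) - 1);
    }

    boolean meetsSla() { return percentile(99) <= p99TargetMillis; }

    String summary() {
        return component + ": p50=" + percentile(50) + "ms p99=" + percentile(99)
                + "ms, SLA met: " + meetsSla();
    }

    public static void main(String[] args) {
        SlaMonitor search = new SlaMonitor("search", 200);
        for (int i = 1; i <= 100; i++) search.record(i * 2L); // 2, 4, ..., 200 ms
        System.out.println(search.summary()); // search: p50=100ms p99=198ms, SLA met: true
    }
}
```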

Be on the Cloud

Building an ECommerce platform on the cloud has great advantages. First, it lets the engineering team focus on feature development instead of worrying about hardware and network infrastructure. Second, as the business scales, scaling the technology platform is easiest on the cloud: bringing up new servers or adding more CPU or RAM can happen in minutes. Finally, when it comes to batch processing enormous amounts of data for analytics and personalization, the cloud’s on-demand processing capacity is the most cost-effective option.
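Servers can be added in minutes, but something still has to decide how many. One common rule, used for example by the Kubernetes Horizontal Pod Autoscaler, scales the current count by the ratio of observed to target utilization: desired = ceil(current × observed / target). A minimal Java sketch of that rule, assuming average CPU utilization in whole percent as the metric and hypothetical minimum and maximum server counts:

```java
// Proportional autoscaling rule: desired = ceil(current * observed / target),
// clamped to [minServers, maxServers]. Inputs are hypothetical; utilization is in whole percent.
class AutoscalingRule {

    static int desiredServers(int currentServers, int observedPercent, int targetPercent,
                              int minServers, int maxServers) {
        if (currentServers < 1 || observedPercent < 0 || targetPercent <= 0) {
            throw new IllegalArgumentException("invalid autoscaling inputs");
        }
        // Integer ceiling division avoids floating-point rounding surprises.
        int desired = (currentServers * observedPercent + targetPercent - 1) / targetPercent;
        return Math.max(minServers, Math.min(maxServers, desired));
    }

    public static void main(String[] args) {
        // 10 servers at 90% average CPU with a 60% target: scale out to 15.
        System.out.println(desiredServers(10, 90, 60, 2, 50)); // 15
        // 10 servers at 20% average CPU: scale in to 4.
        System.out.println(desiredServers(10, 20, 60, 2, 50)); // 4
    }
}
```

Real autoscalers add a tolerance band and stabilization periods on top of such a rule, so that small fluctuations in load do not make the server count flap up and down.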

Use Java and Open Source

When it comes to choosing programming languages, three are the most popular: PHP, Ruby and Java. In a distributed service oriented architecture, all 3 could co-exist, and each has its distinct advantages. PHP and Ruby will get you up and running much faster than Java, and code maintainability is best in Ruby. However, I would choose Java, because when it comes to scalability, development and diagnostic tools, and good engineers who know how to tackle complex performance issues, Java is a far more mature platform.

As for open source, I think the entire ECommerce platform can successfully be built on it. Linux, MySQL, Apache, Spring, Hibernate and many NoSQL databases are widely used, have seen over a decade of evolution and are now mature and stable. Vibrant, active community forums more than make up for the lack of professional support. Also, many companies provide 24/7 support for most open source software.

What are your thoughts?


7 Comments

  1. When it comes to product development, should CDNs, reverse proxies and the fourth level of caching be a top priority for a startup where a single individual is developing the e-commerce platform?

  2. Thanks, Amitabh, for the great article. It would be even more helpful if you could name the technologies to use at the UI level; otherwise it’s wonderful…

  3. Off-topic article. The title says “How to: Build a Large and Scalable ECommerce Platform”, and you write about the pros and cons of using a platform versus building it yourself, and which technology stack to choose. Total waste of time.