Flipkart's flagship technology event turns three this year. Taking forward the success of 2013 & 2014, we are building on our core belief that knowledge is meant to be shared - and we have some inspiring talks and insightful discussions to get the hive mind buzzing.
As always, slash n continues to serve as an open platform for technologists to unite.
Intelligence @ Scale
This year's slash n theme is 'Intelligence @ Scale'.
True to the theme, you will see speakers and the audience engaging on a number of technology areas. Discussions will draw from conceptual frameworks, with insights into how these can be applied to solve real-world problems.
The performance of applications backed by databases or Web services can be significantly improved by submitting queries/requests asynchronously, well ahead of the point where the results are needed, so that the results can be prefetched. However, manually writing applications to exploit asynchronous query submission is tedious and error-prone. In this talk, we address the issue of automatically transforming a program written assuming synchronous query submission into one that exploits asynchronous query submission. Our program transformation method is based on data flow analysis and is framed as a set of rules which can handle query executions within loops. We also present a novel approach that, at runtime, can combine multiple asynchronous requests into batches, thereby achieving the benefits of batching in addition to those of asynchronous submission. We have built a tool that implements our transformation techniques on Java programs that use JDBC calls; our tool can be extended to handle Web service calls. We have carried out a detailed experimental study on several real-life applications, which shows the effectiveness of the proposed rewrite techniques, both in terms of their applicability and the performance gains achieved.
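To make the idea concrete, here is a minimal hand-written sketch of the before and after shapes of such a rewrite. It is not the tool described in the talk (which targets Java/JDBC and derives the rewrite automatically); `run_query` is a hypothetical stand-in for a blocking query, and the function names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def run_query(customer_id):
    """Hypothetical stand-in for a blocking database or Web-service call."""
    time.sleep(0.01)  # simulated query latency
    return {"customer_id": customer_id, "orders": customer_id * 2}


def total_orders_synchronous(customer_ids):
    """Original shape: every loop iteration blocks on its own query."""
    total = 0
    for cid in customer_ids:
        row = run_query(cid)  # blocks until this one query returns
        total += row["orders"]
    return total


def total_orders_asynchronous(customer_ids, max_workers=8):
    """Rewritten shape: the loop is split into a submit loop and a consume loop."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Loop 1: submit every query without waiting for any result.
        handles = [pool.submit(run_query, cid) for cid in customer_ids]
        # Loop 2: block only at the point where each result is actually needed.
        total = 0
        for handle in handles:
            total += handle.result()["orders"]
    return total
```

Both functions return the same value; the second overlaps the query latencies instead of paying for them one after another. Splitting the loop this way is only safe when no query depends on the result of an earlier iteration, which is the kind of condition a data flow analysis has to establish.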
Address Classification without using Geolocation Coordinates
Online retail depends on an optimal delivery system for ordered shipments. In the Last Mile context of a Supply Chain, automatic categorization of addresses is an important problem. An automated solution to this problem reduces manual effort significantly. In the absence of geolocation information (the latitude and longitude of individual houses) and of a definitive structure in the addresses, classifying a given address as belonging to a particular locality is a challenging task. In the current work, we devised an accurate method that, despite these challenges, classifies the addresses in a region into predefined subregions.
12:20pm - 1:00pm
Large Scale Data Migration: Challenges / Solutions
During a recent capacity expansion that involved a data center migration, Flipkart moved data from disparate sources and re-connected systems with very little down-time. This includes hundreds of databases in user facing services, catalog information, business intelligence data, backup systems, persistent message queues and CDN
This talk introduces the issues involved in an exercise like this and the solutions employed to ensure a smooth internal and external
A Sneak peek into the Android internals and Exploring Android hooking
The talk starts with a sneak peek into Android internals. We look at a sample flow from booting the phone to the point when the application takes control, and take a 360-degree view of Zygote. Then we act as a kernel programmer and contrast kernel hooking with hooking in Android, and the power it provides to the module writer. Next, we look into the world of security and sandboxing in Android and how it can be exploited. We also explore Google's most recent addition to enhance security in Android (SELinux).
1:00pm - 2:00pm
2:00pm - 2:30pm
CAP Theorem: You don't need CP, you don't want AP, and you can't have CA
CAP Theorem is everywhere: "Consistency, Availability, Partition tolerance - choose any two!" But it is oversimplified and misunderstood more often than not. CAP's consistency isn't what most people think it is; CAP's availability isn't what most people think it is; what does partition-tolerance even mean?
In this talk we'll explore the CAP theorem and understand what it is really asserting. We'll understand that just calling a system out as CP or AP (or even CA) is pretty pointless, and learn to judge them beyond the simple monikers. We'll also analyse some popular databases of the world (Cassandra, MongoDB, HBase, MySQL etc.) with this framework.
Deep learning is a fairly new technique that has dominated pattern recognition in the past few years, especially in Computer Vision and Speech Recognition. In this talk I will explain the basics of Deep Learning in the context of Natural Language Processing (NLP). NLP has many applications, such as prioritising e-mails received by Customer Care, categorising Tweets directed at an Organisation, and measuring the impact of Promotions in Social Media. I will describe our foray into Deep Learning with these classes of applications in mind. Specifically, I will describe how we tamed the Deep Convolutional Neural Network, most commonly applied to Computer Vision, to help classify (short) texts, consistently attaining near-state-of-the-art results on several SemEval tasks and on a few tasks of importance to Flipkart.
2:30pm - 3:00pm
Scaling Systems using change propagation across data stores and sharding
Systems that need to handle large scale often use a combination of Data Stores to serve varied use cases. It is fairly common for products and services to use Relational Databases to store business critical data. These data stores therefore become the source of truth for such data. However, Relational Databases may not scale well for all kinds of use cases. Analytics, reporting, search indexing, historical reads and similar use cases need this data in secondary data stores. Some of these secondary data stores are non-relational and are efficient at handling such use cases. Different databases try to optimize for specific workload patterns and for particular data durability and consistency guarantees. These data stores are not operated in isolation and must share data and updates. There is, then, a need for a system to transfer data from the primary data store to these secondary data stores. In Payments, data from these secondary stores is used to feed more than just business decisions: it feeds into Real-Time use cases like Console, Fraud Detection and Monitoring Systems. We therefore needed a system that can transfer data across data stores in real time.
Aesop is a keen observer of changes that can also relay change events reliably to interested parties. It provides useful infrastructure for building Eventually Consistent data sources and systems. Aesop scales by partitioning the data stream and coordinates across subscription nodes using Zookeeper. It provides at-least-once delivery guarantees and timeline-ordered data updates.
Aesop is used at scale in business critical systems at Flipkart: the multi-tiered payments data store, the user wishlist system, and streaming facts to the data analysis platform. Aesop has been used successfully to move millions of data records between MySQL, HBase, Redis, Kafka, and Elasticsearch clusters.
Aesop shares a common design approach and technologies with the Facebook Wormhole system.
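As a rough illustration of the two properties mentioned above (a partitioned change stream and at-least-once delivery), the sketch below partitions change events by key and applies them idempotently so that redelivered events are harmless. It does not use Aesop's API; the event fields (`key`, `seq`, `value`) and all names are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    # A stable hash sends every change for the same key to the same partition,
    # which keeps per-key updates in their original order.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_stream(events, num_partitions):
    """Split an ordered change stream into per-partition ordered lists."""
    partitions = defaultdict(list)
    for event in events:
        partitions[partition_for(event["key"], num_partitions)].append(event)
    return partitions


class SecondaryStore:
    """Hypothetical secondary data store that tolerates redelivered events."""

    def __init__(self):
        self.rows = {}
        self.applied_seq = {}

    def apply(self, event):
        key, seq = event["key"], event["seq"]
        # With at-least-once delivery the same event may arrive again;
        # anything at or below the last applied sequence number is skipped.
        if seq <= self.applied_seq.get(key, -1):
            return False
        self.rows[key] = event["value"]
        self.applied_seq[key] = seq
        return True
```

Replaying a whole partition after a consumer restart leaves the store unchanged, because every replayed event is recognised as already applied. This is one common way to build an eventually consistent copy on top of an at-least-once channel.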
How smart can a smart proxy be?
A proxy is loosely understood as an entity representing some other entity. For the scope of our interaction, we will restrict ourselves to remote proxies, i.e. when two communicating entities are in different address spaces, more precisely across a network. In such a paradigm, how does the proxy manifest itself and evolve from a layman's term into a necessary, sophisticated computing pattern?
What are the different techniques for implementing a proxy: is it code running in a client/server process, or a mid-tier process with dedicated resources?
What are the essential features of modern-day proxies?
What are the popularly used proxies, and how is Flipkart solving this?
And much more. Let's flock together to explore.
Fraud Paradigms and Techniques
A discussion about building a platform in the e-commerce domain that can host custom fraud detection pipelines for various use cases of buyer and seller fraud. The platform features both batch and realtime processing. Challenges like distributed processing, scalability, fault tolerance and low latency shall be discussed. The detection pipelines can be powered by Machine Learning or Rule based models. The platform eases building pipelines and on-boarding multiple clients. The system also facilitates domain specific configurations and optimisations, and feedback loops to constantly enrich our decision making capabilities.
Imagine the frustration of users who find the perfect item while browsing, only to realize later (when they click on it) that it is out of stock, that the price has changed, or that it cannot be delivered to their location. This happens when the search index doesn’t have real-time availability, price and seller information. Hence it is a core challenge that an E-Commerce marketplace search engine has to solve. Regular document search index technologies (like Solr/Lucene) have trouble dealing with attributes that are in constant flux (like availability and price), which are typically seller/listing specific attributes. In this talk, we present these challenges and our solutions for a customized e-commerce search index that addresses them.
3:30pm - 3:45pm
3:45pm - 4:30pm
Plenary: India - at the cusp of technology led transformation
The Indian Space Research Organisation is an Indian success story: it has indigenously designed, developed and launched remote sensing, communications and scientific missions. It has put space to work for the benefit of the country and earned international recognition. This talk focuses on the mission complexities and challenges, as well as the systems that went into two missions: Chandrayaan, a mission to the Moon, and Mangalyaan, an interplanetary mission to Mars. The missions posed unprecedented challenges in project management, the design of the launch vehicle and spacecraft, the development of a ground segment that required a 32-metre-diameter antenna, a challenging space segment, complex flight dynamics and mission management. The talk takes us through this journey.
Progressive web apps are a brand new breed of web applications which give you a fast, reliable, engaging and app-like experience. In this talk, we do a deep dive into how you can power your web apps with Service Workers and other modern web capabilities to build such experiences, and how you can have an effective offline experience as well as engage your users with push notifications.
We also share our journey through architecting Flipkart Lite and explain how we built it to be fast and immersive with a highly performant user experience.
Powering Seller profitability through data insights
The Seller Insights and Recommendations Platform analyzes data-models, generates recommendations and delivers personalized insights. The platform leverages Big Data for the generation, adoption and net effect of the insights, tracks consumption and derives meta-insights to provide visualizations with cause-and-effect.
This talk will showcase the technologies utilized for the Insights Platform and the challenges faced.
Social interaction is an integral part of the shopping experience in real life, and with Ping we have brought this important dimension to the Flipkart mobile app. At its core, Ping requires reliable, ordered, bi-directional messaging between users in real-time and at scale. In this talk, we will discuss the high-level architecture of Ping, some of the key problems the team had to solve, and future directions for Ping.
Fast Distance Matrix computation using Contraction Hierarchies
Many optimization problems in logistics depend on computing the distance between all pairs of points. These pairwise distances are typically represented in a distance matrix. For example, to optimize the delivery of 1,000 shipments from a delivery hub, 1,000,000 distance pairs need to be computed. To find each single pair, about 100,000 nodes are explored on a typical road network graph. All these computations take time and memory.
This talk will provide an introduction to why the normal ways of computing big distance matrices fail (in time and memory) and how we have solved this problem using Contraction Hierarchies.
The idea of routing using Contraction Hierarchies appears in Robert Geisberger's thesis (2008), and it or its variants are used daily by people around the globe to find routes and directions on maps, while travelling or taking a cab ride.
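The scale of the problem can be checked with a short sketch. The code below is the naive baseline only (one Dijkstra search per source point), not Contraction Hierarchies; the graph format and function names are assumptions made for illustration, and the figures are the ones quoted in the abstract.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source; graph maps node -> [(neighbour, weight)]."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbour, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist


def distance_matrix(graph, points):
    """Naive baseline: run one full search per source point."""
    matrix = {}
    for src in points:
        dist = dijkstra(graph, src)
        matrix[src] = {dst: dist.get(dst, float("inf")) for dst in points}
    return matrix


# Back-of-the-envelope cost using the figures from the abstract.
shipments = 1000
pairs = shipments * shipments                     # 1,000,000 matrix entries
nodes_explored_per_pair = 100_000
total_node_visits = pairs * nodes_explored_per_pair   # 10**11 node visits
```

At roughly 10^11 node visits for a single hub, a pair-by-pair search is clearly impractical, which is the motivation for preprocessing the road graph with a technique such as Contraction Hierarchies.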
1:00pm - 2:00pm
2:00pm - 2:30pm
Spark and friends - a peek into Spark internals and capabilities
Spark is the new poster child for big data analytics but surprisingly little is actually understood about how this system works, and what its capabilities actually are versus what they are claimed to be. This talk will dive into details on various parts of the Spark ecosystem and give a deeper explanation of how Spark works as well as what the current capabilities and limitations are.
Predictable and Prioritized compute on volatile multi source data
Consider the scenario where the business asks to run a flash sale in reaction to competitors' sales. Such requests are very time sensitive, as they have huge business implications (e.g. for a “Deal of the hour”, you can’t take half an hour to complete processing). So, how do you compute the construct specified for the sale over a huge, volatile catalog and serve it to the users? We present the challenges and solutions in building such a performant and predictable processing pipeline, the journey of evaluating multiple design paradigms, and the rationale behind the functional and non-functional choices.
2:30pm - 3:00pm
Simplifying User Insight generation using PExtract
Providing a personalized online shopping experience for each user leads to better conversions through increased relevance and strengthens customer loyalty. We analyze Terabytes of browse and purchase activity data to infer user preferences and interests. Some of these insights are based on rules, some on complex algorithms and others on machine learning models. We will talk about some of the insights being generated and their usage. We will also talk about PExtract, a framework extending PMML (Predictive Model Markup Language), built to overcome the shortcomings in productionising machine learning based insights and to simplify iterative changes in rule based insights.
Log-service (LogSvc) is a managed service that aggregates, archives and indexes logs from all applications/services in Flipkart. This talk explores scaling a portion of the tech-stack that makes LogSvc tick. After a quick introduction to the LogSvc architecture, we go deeper into the sub-system that powers log-search using SolrCloud. We share our learnings from scaling this cluster from a sustained aggregate indexing throughput of 40K messages/sec to 2.2M messages/sec (a more than 50x jump) without increasing the cluster size, and we talk about the mistakes made and lessons learnt in the exercise.
3:00pm - 3:30pm
Building mobile advertising with an ecommerce flavour for Flipkart
While building out an advertising product for sellers in our marketplace, we faced some major challenges:
The traditional reporting metrics for desktop ads are not relevant on mobile.
What is the correct metric on which the pricing model should be based for Product Listing Ads? Impressions, clicks, conversions or something else?
What is the correct conversion attribution for ads shown, given that the full customer lifecycle is measurable on an ecommerce site?
The talk focusses on new measurement metrics and techniques to correctly measure the ad performance of Product Listing Ads for sellers. The talk will also briefly touch on open challenges and the way forward.
MySQL is a popular data store for the stability, convenience and familiarity it provides, along with strong guarantees around data durability and transactionality. However, traditionally, we have relied on asynchronous Master-Slave replication (which has a tendency to lag while replicating on a single thread), and any permanent failures on the Master can result in a large recovery time or even data loss. This is where a full multimaster setup comes in, providing strong data durability without sacrificing the guarantees a standalone MySQL server provides, while also scaling linearly for reads, and giving improved performance for writes. This will be a quick talk on one of the possible solutions (Galera), its methodology, use cases and pitfalls.