This website uses cookies on its adverts and sponsored links. By clicking the "Accept" button you are consenting to their use.

Read more

Accept


MobileTechTracker
≡ sections
Home

Apps

TrackerBlog

GeekHub

Tech News

About

Welcome to Mobile Tech Tracker. Our mission is to help technically-minded people to become better versions of themselves and to help ordinary people to use modern smart technologies to their own advantage.


How big data web applications are built

On the most fundamental level, a web application consists of client-side code and markup, server-side code and a database. Many small websites and small-to-medium enterprise-level browser-based applications consist of nothing other than this components. However, on its own, this basic setup is not suitable for a big data application.

Any web application that is intended to manage large quantities of data would need the following components:

High availability

It must be available at any time, regardless of how many users are using it simultaneously. This is why it is absolutely crucial that a big data application doesn't have any single points of failure. Therefore, it would not be possible to achieve high availability if either all of the code or the entire database exists on a single server as a single instance.

High performance

When you start typing a new email in Gmail, it instantly saves a draft on the server and constantly updates it while you are making further changes. When you write a post on Facebook, all of your friends can see it instantly. This level of performance is achieved despite the fact that both of the apps have millions of user interactions at any given time.

To achieve this level of performance, a web application shouldn't have any bottlenecks. Therefore, once again, having a single instance of either code base or the database is not suitable. Also, there is a limit to how much hardware you can carry on adding to a single server to increase the performance. As the usage of an app grows linearly, the response time grows logarithmically. Therefore, at some point, adding more hardware will result in diminishing returns.

Data analytics

As an organisation processes large quantities of user-generated data, it need to take full advantage of it, both to help the organisation to make effective business decisions and to deliver the most relevant content to the users of its products.


These are some of the best techniques that developers of big data applications use to achieve the goals of high availability, high performance and analytics:


Load balancing

Instead of having a single server to execute the logic within a web application, the same code may be duplicated on several servers. When a request from a user's browser is received, the system sends it to a server that is not overloaded with other requests. As the load is distributed between many servers, the system never gets into a state where response time increases to unacceptable level.

There are many ways to implement load balancing. Microsoft Azure, for example, has a service that constantly sends requests to each server within a cluster to check if the server would be able to accept an incoming request from a user. The exact parameters of a good server state can be configured. For example, a server may be set to not respond with an OK status code if its current CPU usage is above 90â„…. If a particular server does not respond with OK status code within an acceptable time frame, the system may be set to try to resolve any issues with the server, but the immediate action would be to stop sending any users' requests to that server and chose a different one instead.

Load balancing provides both performance and availability. If implemented properly, none of the servers within a load-balanced system are allowed to process more than they can handle without significantly slowing down. Likewise, if any server becomes unavailable, the system would still be able to cope with remaining servers.

Databases can also be load-balanced, but this is mainly done for availability rather than performance. Microsoft SQL Server AlwaysON Availability Group, for example, allows a .NET application with Entity Framework to implement an execution strategy, which, for example, can be set to resubmit a SQL command to the database cluster if the server that the application was previously connected to suddenly became unavailable during the initial execution of the command.


Database sharding

If a system has a very large number of users that generate a very large amount of content on constant basis, you cannot just use one huge database. Writing data to a disk and reading data from it are relatively computationally expensive operations. So is performing a scan on a huge database table. Therefore, even if you had a very powerful server, your system would grind to a halt due to the sheer number of simultaneous reads, writes and table scans.

Load balancing provides only a partial solution to this problem. Although it would reduce the number of simultaneous reads and writes, it will not eliminate the need to scan large tables or indexes. Therefore, as more and more data will be generated, the performance will still eventually decrease to unacceptable levels.

One of the best solutions for this is database sharding, which is also known as "nothing shared" architecture. This is a technique of having several databases with the same schema, but different data rows. For example, if an application has millions of users, the users will be split into subsets, each of which will be stored in a separate database on its own server, while each of the databases will be a part of the same cluster and have the same data structure. Just like a database table may have a primary key, each database in the cluster would have its own primary key to enable the system to quickly identify which cluater node needs to be queried when a particular request is submitted. For example, users my be grouped by geographic locations, so location code would form a unique identifier of a cluster. This model is known as KKV or Key Key Value and is widely implemented by database systems that have been specifically designed for sharding, such as Cassandra.

Although this architecture is referred to as "nothing shared", this doesn't accurately describe a sharded database structure. To increase performance of the queries, there would be some degree of duplication, which mainly relates to lookup tables and reference data records.

Of course, sharding doesn't eliminate the need for load balancing, as it improves the performance but has virtually no effect on the availability. Therefore software architects would need to ensure that every node within the cluster is sufficiently replicated.


Making sense of the data

Any organisation that manages big data would be aiming to take full advantage of it. Therefore, the process of big data analytics would be required to uncover hidden patterns of the user's' behaviour. This is what makes you see personalised content on various websites and ads that are relevant to what you have been searching for on the web. On a larges scale, big data analytics is what drives business decisions for the companies that manage the data.

The field of big data analysis is vast. Some of it is automated via various business intelligence tools. HBase for example, a NoSQL database that forms a part of Apache Hadoop database clustering solution, comes with its own set of analytic tools. So do more traditional relational database systems such as SQL Server and Oracle. However, as well as using these tools, many organisations develop bespoke in-house solutions with the help of specially qualified data scientists.


Further resources

This article was only a brief introduction to how big data applications are build. If you are interested to find out more, you will find the following links useful:

Facebook: Science and Social Graph (Video)

Scale at Facebook

Facebook Chat Architecture

YouTube Architecture

The challenge of scaling microservices



Written by

Posted on 25 Jan 2017

Comments (0)

Author's Name *

Email *

4 + 5 *

Comment

*


More from GeekHub


Is jQuery a library or a framework? The answer may surprise you.

Is jQuery a library or a framework? The answer may surprise you.


Where to start if you want to become a hacker

Where to start if you want to become a hacker


What to study to become a mobile app developer

What to study to become a mobile app developer


6 creative ways for a software developer to earn money on the side

6 creative ways for a software developer to earn money on the side


Software design patterns explained

Software design patterns explained


What to study to become a web developer

What to study to become a web developer


Why is there a limit on SD card capacity on mobile devices

Why is there a limit on SD card capacity on mobile devices


How traffic predictions and live traffic work

How traffic predictions and live traffic work


What is PAC and how mobile numbers are ported

What is PAC and how mobile numbers are ported


3 reasons to use strings.xml in Android application

3 reasons to use strings.xml in Android application


Share this:

Facebook Google LinkedIn Twitter

Mobile Tech Tracker - Blog Directory OnToplist.com

Software
Blogtoplist

Top Blogs

BlogrollCenter.Com

Blog Directory



More from GeekHub











Privacy Policy

© MobileTechTracker. All rights reserved. Unauthorised copying of any of this website's content is prohibited under international law.

For any queries, comments or suggestions, please write to info@mobiletechtracker.co.uk.