What is Big Data and How Facebook is Using Big Data?

Sahithreddy
8 min read · Sep 17, 2020

What is Big Data?

The term “big data” refers to data that is so large, fast, or complex that it’s difficult or impossible to process using traditional methods. The act of accessing and storing large amounts of information for analytics has been around a long time. But the concept of big data gained momentum in the early 2000s when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three V’s:

Volume: Organizations collect data from a variety of sources, including business transactions, smart (IoT) devices, industrial equipment, videos, social media, and more. In the past, storing it would have been a problem, but cheaper storage on platforms like data lakes and Hadoop has eased the burden.

Velocity: With the growth in the Internet of Things, data streams into businesses at an unprecedented speed and must be handled in a timely manner. RFID tags, sensors, and smart meters are driving the need to deal with these torrents of data in near-real-time.

Variety: Data comes in all kinds of formats, from structured, numeric data in traditional databases to unstructured text documents, emails, videos, audio, and social media posts.

Big Data at Facebook

Companies with more than 1,000 employees already had more than 200 terabytes of data about their customers’ lives stored. Add to that startling amount the rapid growth of data provided to social media platforms since then. There are trillions of tweets and billions of Facebook likes, and other social media sites like Snapchat, Instagram, and Pinterest are only adding to this social media data deluge.

Every day, we feed Facebook’s data beast with mounds of information. Every 60 seconds, 136,000 photos are uploaded, 510,000 comments are posted, and 293,000 status updates are posted. Facebook generates 4 petabytes of data per day, which is 4 million gigabytes. This information may not seem to mean very much. But with data like this, Facebook knows who our friends are, what we look like, where we are, what we are doing, our likes, our dislikes, and so much more. Some researchers even say Facebook has enough data to know us better than our therapists!
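To get a feel for these numbers, the per-minute figures can be scaled up to a full day. The short Python sketch below only does that arithmetic on the figures quoted above; it assumes the rates stay constant over the day and uses decimal units (1 petabyte = 1,000,000 gigabytes), so treat the results as rough orders of magnitude rather than official statistics.

```python
# Scale the per-minute figures quoted in the article to per-day totals.
# Assumption: the rates are constant over a full day.
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes

per_minute = {
    "photos uploaded": 136_000,
    "comments posted": 510_000,
    "status updates": 293_000,
}

per_day = {name: count * MINUTES_PER_DAY for name, count in per_minute.items()}

# Decimal (SI) units: 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000
daily_gigabytes = 4 * GB_PER_PB

for name, count in per_day.items():
    print(f"{name}: {count:,} per day")
print(f"data generated: {daily_gigabytes:,} GB per day")
```

At these rates, photo uploads alone come to roughly 196 million per day.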

Apart from Google, Facebook is probably the only company that possesses this high level of detailed customer information. The more people use Facebook, the more information it amasses. Facebook invests heavily in its ability to collect, store, and analyze data, and it does not stop there. Apart from analyzing user data, Facebook has other ways of determining user behavior.

  1. Tracking cookies: Facebook tracks its users across the web by using tracking cookies. If a user is logged into Facebook and simultaneously browses other websites, Facebook can track the sites they are visiting.
  2. Facial recognition: One of Facebook’s latest investments has been in facial recognition and image processing capabilities. Using the image data that users share, Facebook can identify its users across the internet and in other Facebook profiles.
  3. Tag suggestions: Facebook suggests who to tag in user photos through image processing and facial recognition.
  4. Analyzing the Likes: A study by researchers at Cambridge University and Microsoft Research showed that a range of highly sensitive personal attributes can be predicted just by analyzing a user’s Facebook Likes. According to the study, patterns of Likes can predict, often with high accuracy, your sexual orientation, satisfaction with life, intelligence, emotional stability, religion, alcohol use and drug use, relationship status, age, gender, race, and political views, among many others.
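To make the idea of predicting an attribute from Likes more concrete, here is a minimal sketch in Python. It is not the method used in the study or by Facebook: it swaps in a plain logistic regression trained by gradient descent, and the pages, users, and labels are all made-up toy values chosen only to show the mechanics.

```python
import math

# Hypothetical toy data, not real Facebook data. Each row marks which of four
# made-up pages a user Liked; each label is a made-up yes/no attribute.
PAGES = ["page_a", "page_b", "page_c", "page_d"]
LIKES = [
    [1, 1, 0, 0], [1, 0, 0, 1], [1, 1, 0, 1], [1, 0, 0, 0],
    [0, 0, 1, 1], [0, 1, 1, 0], [0, 0, 1, 0], [0, 1, 1, 1],
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, likes_row):
    """Return the estimated probability that the attribute is present."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, likes_row)))


def train(rows, labels, epochs=500, lr=0.5):
    """Fit a logistic regression with simple stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            error = predict(weights, bias, row) - label
            weights = [w - lr * error * x for w, x in zip(weights, row)]
            bias -= lr * error
    return weights, bias


weights, bias = train(LIKES, LABELS)
for page, weight in zip(PAGES, weights):
    print(f"{page}: weight {weight:+.2f}")
print("user who Liked only page_a:", round(predict(weights, bias, [1, 0, 0, 0]), 3))
print("user who Liked only page_c:", round(predict(weights, bias, [0, 0, 1, 0]), 3))
```

In this toy data, Liking page_a pushes the prediction up and Liking page_c pushes it down, which is the kind of pattern a model trained on real Likes would look for at a much larger scale.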

Facebook Inc. analytics chief Ken Rudin says, “Big Data is crucial to the company’s very being.” He goes on to say that, “Facebook relies on a massive installation of Hadoop, a highly scalable open-source framework that uses clusters of low-cost servers to solve problems. Facebook even designs its hardware for this purpose. Hadoop is just one of many Big Data technologies employed at Facebook.”

Examples of how Facebook uses Big Data:

Example 1: Friendversary

To mark its 10th anniversary, Facebook offered its users the option of viewing and sharing a video that traces the course of their social network activity from the date of registration until the present. Called “Look Back,” this video is a collection of the photos and posts that received the most comments and likes, set to nostalgic background music.

Other videos have been created since then, including those you can view and share in celebrating a “Friendversary,” the anniversary of two people becoming friends on Facebook. You’ll also be able to see a special video on your birthday.

Example 2: Facial recognition

Facebook uses a deep learning application called DeepFace to recognize people in photos. Facebook describes it as its most advanced image recognition tool and says it performs about as well as humans at deciding whether two different images show the same person: DeepFace was reported to score just over 97% on a standard benchmark, close to human accuracy on the same task.
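The article does not describe how DeepFace works internally, but deciding whether two images show the same person is commonly framed as comparing two numeric "embeddings" of the faces. The Python sketch below illustrates only that comparison step. The three-number embeddings and the 0.8 threshold are hypothetical values chosen for illustration; a real system would produce much longer vectors from a trained neural network.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_person(embedding_a, embedding_b, threshold=0.8):
    """Guess 'same person' when the embeddings are similar enough.

    The threshold is an assumption for this sketch, not a published value.
    """
    return cosine_similarity(embedding_a, embedding_b) >= threshold


# Hypothetical embeddings: photo_1 and photo_2 are meant to be the same face,
# photo_3 a different one.
photo_1 = [0.9, 0.1, 0.3]
photo_2 = [0.8, 0.2, 0.35]
photo_3 = [0.1, 0.9, 0.2]

print(same_person(photo_1, photo_2))  # True
print(same_person(photo_1, photo_3))  # False
```

The accuracy figures quoted for such tools come from running this kind of same-or-different decision over many labeled pairs of photos and counting how often the answer is right.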

It’s fair to say that the use of this technology has proven controversial. Privacy campaigners said it went too far, because it would allow Facebook to put names to many of the faces in a high-resolution photograph of a crowd, which is clearly an obstacle to our freedom to move in public anonymously. EU legislators agreed and persuaded Facebook to remove the functionality from European citizens’ accounts in 2013. Back then, the social media giant was using an earlier version of the facial recognition tool that did not use deep learning. Facebook has been somewhat quiet about the development of this technology since it first hit headlines, and can be assumed to be waiting on the outcome of pending privacy cases before saying more about its plans to roll it out.

The Downsides

Privacy Concerns With Facebook:

Facebook has always said that the privacy worries this causes are addressed by the fact that all information is shared with our permission and anonymized when sold on for marketing purposes. That hasn’t stopped a lot of critics from taking issue with its practices, though. For example, many say that the privacy settings are too complex or not clearly explained, meaning it is too easy for people to share things they didn’t mean to. Facebook has tried to fix this several times over the years, often confusing people who had gotten used to the way things were!

Another feature that caused concern when it was introduced was facial recognition. When you upload a picture, you might see suggestions for people you could tag in it. These suggestions are based on an analysis of the picture data, which is compared against pictures of people in your Friends list. The feature prompted an investigation by EU privacy regulators in 2011.

Big Data Analytics At Facebook

More recently, changes to the way Facebook monitors its users’ habits have caused more concerns. Its latest monitoring tools record everything from how long a user “hovers” their cursor over certain parts of the page to what websites they visit outside of Facebook. Last month it announced that this information is being used in the algorithms that determine which adverts to show us.

Facebook’s data strategy is led by its Data Science team — who have their own page, of course. They regularly post updates on insights they have gleaned from analyzing the habits of the millions who browse the site.

For example, the team has developed ways to reliably predict the intelligence of users, their political views, and even their emotional stability. They are also able to predict, weeks before it actually happens, that a user will change their relationship status from ‘single’ to ‘in a relationship’.

This is all helping Facebook to sell ever-more targeted adverts — which you could argue is not such a bad thing. At least you are seeing adverts for things you are actually interested in. However, the concern that comes to mind is how governments could potentially use this information (especially in places that haven’t got functioning democracies). Could they use Facebook to find the people opposed to their views and even manipulate their moods?

What If Data Stored By Facebook Leaked?

Facebook Cambridge Analytica Scandal:

Marketers and many users know that Facebook collects user data via apps and behavior. This data is one of the main reasons that Facebook blows its competition out of the water as far as targeting and useful, trackable digital advertising campaigns go. More than 2 billion monthly active Facebook users create a lot of data. In 2014, a researcher was collecting data through an app called “thisisyourdigitallife,” which prompted users to take personality quizzes and then harvested their data — and the data of their friends. The researcher had told Facebook he would be using the data for academic research.

The app was downloaded by 270,000 users who all agreed to allow the app to collect their data, but because the app also collected the data of their friends, it collected information on what was thought to be more than 50 million people. That number eventually grew to 87 million.
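A quick calculation shows how much the friend-data collection multiplied the app's reach. The Python sketch below uses only the figures quoted above. Because friends of different installers overlap, the ratio is the number of distinct profiles reached per installer, not an average friend count.

```python
# Figures quoted in the article.
installers = 270_000
affected_initial = 50_000_000   # early estimate
affected_revised = 87_000_000   # later, revised estimate


def profiles_per_installer(affected, installed):
    """Distinct profiles collected for every user who installed the app."""
    return affected / installed


print(round(profiles_per_installer(affected_initial, installers)))  # 185
print(round(profiles_per_installer(affected_revised, installers)))  # 322
```

In other words, each person who agreed to share their data exposed, on average, the profiles of a few hundred people who never saw the app.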

The issue here is that many users didn’t know how their information would be collected and used, and the users who had their data collected secondarily didn’t even know the app existed.

Cambridge Analytica later obtained this information through the researcher. Who is Cambridge Analytica? It’s the firm the Trump campaign used to target individuals with tailored content about the election and politics. Cambridge Analytica specifically targeted swing voters with the ads. So now, Facebook is under fire for an information breach that may have helped Trump win the election.
