Data scraping introduction

            We live in a world where we need lots of information on a daily basis.

Consider a situation where you want to know about the used car dealerships in your area, so that you can buy cars from them and resell them to individuals for a profit.

What do you need for this? You have to collect information about the dealerships and record it, so that you can use it when buying from them.

                                       How do you collect information?

You might think collecting information is easy. Yes, for a handful of records it is. But what if you want to collect lakhs of records? You cannot google each one and enter the data by hand. Collecting information manually is a tedious process, so you need it to be collected automatically.

Data scraping: extracting information from websites

The process of extracting information automatically/programmatically from websites is called data scraping. Scraping is used for extracting a large amount of data from websites. Specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for the purpose of analysis.

We need to know two terms while learning about data scraping.

1. Scraper bot
 The code that is used to extract the information from websites
2. Data scraper
 The user who designs the scraper bot to extract data

Process of scraping:

First, the data scraper designs a scraper bot, which sends an HTTP GET request to a specific website.

When the website responds, the scraper bot parses the HTML document for a specific pattern of data.

Once the data is extracted, it is converted into the specific format that the data scraper needs, such as rows in a spreadsheet or a database.
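
The three steps above can be sketched in a few lines of Python using only the standard library. This is a rough illustration, not a production scraper. The function and class names (fetch_html, DealerParser, extract_dealers, to_csv), the User-Agent string, and the <li class="dealer"> markup are all made up for this example; a real website will have its own HTML structure, and you should check its terms of use before scraping it.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url):
    """Step 1: send an HTTP GET request and return the page as text."""
    # The User-Agent value is a hypothetical name for our bot.
    request = Request(url, headers={"User-Agent": "example-scraper-bot/0.1"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class DealerParser(HTMLParser):
    """Step 2: collect the text inside every <li class="dealer"> element.

    The tag and class name are assumptions made for this example.
    """

    def __init__(self):
        super().__init__()
        self.dealers = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "dealer") in attrs:
            self._inside = True
            self.dealers.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.dealers[-1] += data


def extract_dealers(html):
    """Run the parser over an HTML string and return the dealer names."""
    parser = DealerParser()
    parser.feed(html)
    parser.close()
    return [name.strip() for name in parser.dealers]


def to_csv(dealers):
    """Step 3: convert the extracted names into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["dealer"])
    for name in dealers:
        writer.writerow([name])
    return buffer.getvalue()


# A tiny made-up page, so the example runs without any network access.
SAMPLE_HTML = """
<ul>
  <li class="dealer">City Motors</li>
  <li class="other">Not a dealer</li>
  <li class="dealer">Highway Used Cars</li>
</ul>
"""

if __name__ == "__main__":
    print(to_csv(extract_dealers(SAMPLE_HTML)))
```

For a live page, the same pipeline would be to_csv(extract_dealers(fetch_html(url))), with the parser adjusted to whatever pattern that page actually uses.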

Scraping can be a powerful tool. In the right hands, it automates the gathering of information that is then used legally. In the wrong hands, it can lead to theft of intellectual property or information that is then used illegally.

Even Facebook filed a complaint against two companies in the US for scraping data from Facebook. Those companies scraped data from Facebook and used it to sell marketing intelligence to third parties.

https://www.youtube.com/watch?v=COeMzrSYgvw

                                     How can we prevent data scraping?

* Using CAPTCHAs for high-volume requesters

* Limiting the maximum number of requests a particular IP address is able to make over a period of time

* Using an advanced bot management solution that can help websites eliminate access for scraper bots almost completely.
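
To make the second point concrete, here is a minimal sketch of a per-IP request limit using a sliding time window. The class name IpRateLimiter and its parameters are invented for this example, and the counts live in memory of a single process; a real website would more likely rely on its web server, a shared store, or a bot management product for this.

```python
import time
from collections import defaultdict, deque


class IpRateLimiter:
    """Allow at most max_requests per IP within window_seconds (hypothetical helper)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each IP address to the timestamps of its recent allowed requests.
        self._hits = defaultdict(deque)

    def allow(self, ip, now=None):
        """Return True if this request is allowed, False if the IP is over its limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Forget requests that have fallen out of the time window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = IpRateLimiter(max_requests=2, window_seconds=60)
    for second in (0, 1, 2):
        # 203.0.113.5 is an address reserved for documentation examples.
        print(second, limiter.allow("203.0.113.5", now=second))
```

Passing now explicitly just makes the example repeatable; in a running server you would call allow(ip) and let it read the clock. A blocked request would typically get an HTTP 429 (Too Many Requests) response or a CAPTCHA instead of the page.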

So data scraping is a powerful tool for business, but whether it helps or harms depends on who is using it and why they are using it.