ParseHub Tutorial: Scraping 2 eCommerce Websites in 1 Project

ParseHub
6 Jul 2018 · 08:43
Educational, Learning

TLDR: This tutorial shows how to scrape data from one website and use it as input on another. It walks through creating a project in ParseHub, selecting and extracting product names and prices from Amazon, and then using those product names as search terms on eBay. The process uses ParseHub commands such as 'Select', 'Relative Select', 'Loop', and 'Go to Template', and emphasizes testing the project to confirm it works. The tutorial is a practical guide for anyone interested in web scraping and data extraction across different e-commerce platforms.

Takeaways
  • 📂 Start by opening the ParseHub client and creating a new project with the URL of the website to be scraped.
  • 🎯 Use the interactive view within the ParseHub client to inspect the website and identify elements for data extraction.
  • 📋 The ParseHub client interface is divided into three areas: project structure and settings, interactive website view, and data preview in CSV or JSON formats.
  • 🚀 A new project starts with an empty Select command, which is placed in the command structure by default.
  • 🔍 Use the Select command to identify and select elements like product names on the website.
  • 🔄 The Relative Select command relates data from one element to another, such as associating a product with its price.
  • 🌐 Move from one website to another within the same project by using the Go to Template command and creating a new template for each different HTML structure.
  • 🔍 Use the Loop command to iterate through a list of items, such as product names from Amazon, to perform actions like searching on eBay.
  • 🔗 Browse mode lets you browse the website normally inside the client so you can reach the right page and test the data extraction process.
  • 🚦 Test runs can be conducted locally on your computer to debug and understand the project's behavior.
  • 📊 After running the project, results can be downloaded in CSV or JSON formats, or integrated with other applications using the provided API.
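
The last takeaway mentions an API for pulling results into other applications. As a rough sketch only, the snippet below builds the request URL for the data of a project's most recent finished run. The endpoint path and the `api_key` and `format` parameters follow the ParseHub v2 REST API as understood here and should be checked against the current API documentation; the token and key values are placeholders, and the helper name `last_ready_run_url` is made up for this example.

```python
from urllib.parse import urlencode

# Assumed base URL of the ParseHub REST API; verify against the official docs.
API_BASE = "https://www.parsehub.com/api/v2"


def last_ready_run_url(project_token, api_key, fmt="json"):
    """Build the URL for the data of a project's most recent finished run.

    project_token and api_key are placeholders for values taken from your
    own ParseHub account; fmt is expected to be "json" or "csv".
    """
    query = urlencode({"api_key": api_key, "format": fmt})
    return f"{API_BASE}/projects/{project_token}/last_ready_run/data?{query}"


# Placeholder credentials, for illustration only.
print(last_ready_run_url("PROJECT_TOKEN", "API_KEY"))
```

The URL can then be requested with any HTTP client. Building it separately keeps the credentials out of the scraping logic and makes the request easy to inspect before it is sent.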
Q & A
  • What is the main topic of the tutorial?

    -The main topic of the tutorial is demonstrating how to scrape data from one website and use it as input for another website using the ParseHub tool.

  • What is the recommended approach for scraping websites with ParseHub?

    -ParseHub normally recommends building separate projects for each website, but in some cases, it might be necessary to combine two different websites into one project.

  • How does one begin a new project in the ParseHub client?

    -To begin a new project, open the ParseHub client, click on 'New Project', and enter the URL of the website that you would like to scrape.

  • What are the three areas visible within the ParseHub client when a project is loaded?

    -The three areas are: the left side containing the project structure and settings, the middle containing an interactive view of the website, and the bottom section for previewing data in CSV or JSON formats.

  • How can you select and extract product names from a website using ParseHub?

    -Using the 'Select' command, click on the title of the first product to select it. ParseHub will then highlight similar elements in yellow. To select the rest, click on one of the highlighted products, and ParseHub will automatically extract the names and URLs of these products.

  • How can you rename a selection in ParseHub?

    -To rename a selection, double-click on the command and enter the new name, such as 'Amazon Product' for the extracted product names.

  • What is the purpose of the 'Relative Select' command in ParseHub?

    -The 'Relative Select' command is used to relate each extracted piece of data, such as a price, to its corresponding product in the results file.

  • How does one switch between different templates in ParseHub?

    -To switch templates, click on the 'Go to Template' command, enter the URL of the new template, and create a new template if necessary.

  • What is the 'Loop' command used for in ParseHub?

    -The 'Loop' command is used to iterate through a list of items, such as product names from Amazon, and perform actions on each item, like searching for them on eBay.

  • How can you test a project in ParseHub?

    -To test a project, click on 'Get Data' at the bottom of the page and then click 'Test Run'. This allows you to run the project locally on your computer and understand its behavior.

  • What are the different ways to run a project in ParseHub and view the results?

    -You can use the 'Step In' button to run through the project one step at a time, the 'Play' button to run it slowly, the 'Fast-Forward' button to quickly see the extracted data, or the 'Stop' button to end the test run. Once the project has finished, you can download the results in CSV or JSON formats.

  • How can users get help with their specific projects in ParseHub?

    -Users can contact ParseHub support at 'hello@parsehub.com' for assistance with any questions or issues related to their particular projects.

Outlines
00:00
🌐 Scraping Data from a Website and Using it on Another

This paragraph introduces the process of scraping data from one website and utilizing it on another. It begins by explaining the recommended practice of creating separate projects for different websites but acknowledges the need to combine data from different sources in certain cases. The tutorial then provides a step-by-step guide on how to set up a project in the Parsa client, including starting a new project, selecting the target website (Amazon in this case), and navigating the Parsa interface. It details the use of the Select command to extract product names and URLs, renaming selections for clarity, and the use of the relative Select command to associate prices with their corresponding products. The paragraph also touches on the importance of adjusting selection commands to capture complete data, such as full product prices.

05:05
🔍 Using Extracted Data for Searching on a Different E-commerce Platform

This paragraph continues the tutorial by explaining how to use the extracted data from Amazon for searching products on eBay, a different e-commerce platform with a distinct HTML structure. It guides through the process of creating a new template for eBay, using a loop command to input the list of Amazon product names as search terms, and the subsequent steps to extract relevant information from eBay's search results. The paragraph also covers the use of input commands, selection of search buttons, and the creation of new entries for eBay products. Additionally, it provides insights into testing the project locally using test runs, highlighting the different modes available for review, and the final steps to retrieve and download the extracted data in desired formats. The tutorial concludes with an offer of assistance for any project-related queries and emphasizes the versatility of the demonstrated technique.

Keywords
💡Web Scraping
Web scraping is the process of extracting data from websites. In the context of the video, it refers to the initial step of gathering information from an e-commerce site, such as Amazon, by copying the URL and using the ParseHub client to scrape for product names and prices.
💡ParseHub Client
The ParseHub Client is the desktop application used for web scraping. It allows users to create projects, input website URLs, and extract desired data from those sites. The video tutorial shows how to use the ParseHub Client to scrape data from one website and use it on another.
💡Project Structure
Project structure refers to the organization of tasks and commands within the ParseHub Client. It includes the interactive view of the website, the command structure for data extraction, and the data preview options. The video explains how to navigate and use the three main areas of the ParseHub Client to build a project for web scraping.
💡Select Command
The Select Command is a feature within the ParseHub Client that allows users to choose specific elements on a webpage to extract data from. It is used to identify and select the product names, prices, and other relevant information from the e-commerce website.
💡Relative Select Command
The Relative Select Command is used to relate or associate one piece of data with another. In the video, it is used to connect each product price with its corresponding product name. This ensures that the extracted data keeps the correct associations when it is used as input on another website or for further analysis.
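
To see why this association matters, here is a minimal sketch in plain Python. It does not use ParseHub; the product containers and values are invented for illustration. It compares two independent flat selections against a lookup made relative to each product's own container.

```python
# Each dict stands in for one product container on a results page.
# The names and prices are made up for illustration.
containers = [
    {"name": "Wireless Mouse", "price": "$19.99"},
    {"name": "USB-C Cable"},  # this product shows no price
    {"name": "Laptop Stand", "price": "$34.50"},
]

# Two independent selections: a flat list of names and a flat list of prices.
names = [c["name"] for c in containers]
prices = [c["price"] for c in containers if "price" in c]

# Pairing the flat lists by position misaligns everything after the gap.
independent = list(zip(names, prices))

# Relative selection: read the price inside each product's own container.
relative = [(c["name"], c.get("price")) for c in containers]

print(independent)
print(relative)
```

In the independent pairing, the cable picks up the stand's price and the stand disappears, while the relative pairing keeps every product and records the missing price as `None`.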
💡Template
A template in the context of the video refers to a structured format or set of commands used for web scraping. It is specific to the HTML structure of a particular website. The video shows how to create and use new templates for different websites, such as eBay, which have different HTML structures from the initial website scraped.
💡Loop Command
The Loop Command is used to iterate over a list of items or data in the ParseHub Client. It allows for the automated processing of each item in the list, such as using the product names from Amazon as search terms on eBay. The Loop Command streamlines the scraping process by repeating actions for each item in the list.
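
The idea behind the loop can be sketched outside ParseHub as well. In the video, each name is typed into eBay's search bar; the Python sketch below swaps that for building a search URL per name, which is a different technique with the same effect. The product list is invented, and the `_nkw` query parameter is assumed from eBay's public search URLs rather than taken from the tutorial.

```python
from urllib.parse import urlencode

# Assumed eBay search endpoint; the tutorial itself uses the on-page search bar.
EBAY_SEARCH = "https://www.ebay.com/sch/i.html"


def ebay_search_url(term):
    """Return an eBay search URL for one search term."""
    query = urlencode({"_nkw": term})
    return f"{EBAY_SEARCH}?{query}"


# Stand-in for the product names extracted from Amazon (made-up values).
amazon_products = [
    {"name": "Wireless Mouse", "price": "$19.99"},
    {"name": "Laptop Stand", "price": "$34.50"},
]

# The loop: repeat the same action once per extracted product name.
for product in amazon_products:
    print(product["name"], "->", ebay_search_url(product["name"]))
```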
💡Search Bar
The search bar is an input field on a website where users can enter keywords or phrases to search for specific content. In the video, the search bar is targeted as the location where the scraped product names from Amazon are to be used as search terms on eBay.
💡Data Extraction
Data extraction is the process of collecting and retrieving data from various sources. In the video, it refers to the act of pulling product names and prices from one website and using this data on another for the purpose of comparison or further analysis.
💡CSV and JSON Formats
CSV (Comma-Separated Values) and JSON (JavaScript Object Notation) are two common data formats used for storing and exchanging data. The video mentions these formats as options for users to download the extracted data from the ParseHub Client, allowing for further use and analysis in various applications.
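
The two formats differ mainly in shape: JSON can nest the eBay matches under each Amazon product, while CSV needs one flat row per match. The sketch below shows one way to flatten such a result. The field names (`amazon_product`, `ebay_product`, `name`, `price`) and the sample values are assumptions for illustration; the real keys depend on how the selections were named in the project.

```python
import csv
import io

# Hypothetical JSON result; key names depend on the selection names you chose.
results = {
    "amazon_product": [
        {
            "name": "Wireless Mouse",
            "price": "$19.99",
            "ebay_product": [
                {"name": "Wireless Mouse 2.4GHz", "price": "$12.00"},
                {"name": "Wireless Mouse Slim", "price": "$15.49"},
            ],
        },
        {"name": "Laptop Stand", "price": "$34.50", "ebay_product": []},
    ]
}

FIELDS = ["amazon_name", "amazon_price", "ebay_name", "ebay_price"]


def flatten(data):
    """Turn the nested result into one row per Amazon/eBay pair."""
    rows = []
    for item in data.get("amazon_product", []):
        # Keep products with no eBay matches as a row with empty eBay fields.
        matches = item.get("ebay_product") or [{}]
        for match in matches:
            rows.append({
                "amazon_name": item.get("name", ""),
                "amazon_price": item.get("price", ""),
                "ebay_name": match.get("name", ""),
                "ebay_price": match.get("price", ""),
            })
    return rows


def to_csv(rows):
    """Serialize the flat rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = flatten(results)
print(to_csv(rows))
```

Repeating the Amazon columns on every row is what makes the flat file usable for side-by-side price comparison in a spreadsheet.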
💡Test Run
A test run is a trial execution of a project or process to check its functionality and identify any issues. In the video, the test run allows the user to run their web scraping project locally on their computer, step by step, to ensure it works as intended before running it on ParseHub's servers.
Highlights

Demonstrating the process of scraping data from one website and using it as input for another.

Recommendation to build separate projects for each website, but acknowledging exceptions.

Starting a new project in the ParseHub client by entering the URL of the website to scrape.

Using the interactive view within the ParseHub client to inspect the website, and the preview area to view data in CSV or JSON formats.

The automatic placement of an empty selection command in the command structure.

Selecting and extracting product names from the first website using the Select command.

Renaming selections for clarity and better organization of the project.

Utilizing the relative select command to relate product prices to their corresponding products.

Adjusting the selection to capture the entire product price using zoom out functionality.

Creating a new template for eBay due to its different HTML structure compared to Amazon.

Using a loop command to iterate through the list of Amazon product names as search terms on eBay.

Extracting and organizing eBay product information based on the Amazon product names.

Selecting and extracting product names and prices on eBay following the same steps as on Amazon.

Testing the project using test runs to understand the project's behavior and functionality.

Running the project on the server and downloading the results in CSV or JSON formats.

Providing an API for integrating the scraped data with other applications.

Offering support for any project-related questions through contact with ParseHub.
