ParseHub Tutorial: Pagination (no 'next' button)

ParseHub
10 Sept 2019, 05:53
Educational, Learning
32 Likes 10 Comments

TLDR: This tutorial demonstrates how to scrape data from a website that has no 'Next' button for pagination. It guides users through setting up a project, selecting elements using the ParseHub client, and implementing pagination commands. The process includes testing the project locally and checking that the commands work in Browse Mode. Finally, it covers running the project and downloading the results in CSV or JSON format, and notes that ParseHub also offers an API for data integration.

Takeaways
  • 🔹 Start a new project by entering the URL of the desired website to scrape.
  • 🔹 After starting the project, use the interactive view to select elements of interest on the website.
  • 🔹 Utilize the 'Select' command to choose specific elements, such as product titles.
  • 🔹 Rename commands for clarity, such as labeling a command 'product' for product titles.
  • 🔹 Use the 'Relative Select' command to select related elements, like prices associated with products.
  • 🔹 Add pagination to the project by selecting the current page and using relative commands to navigate to the next page.
  • 🔹 Ensure that the 'Next Page' command is correctly identifying the button to move to the next set of results.
  • 🔹 Test the project's functionality by using the 'Test Run' feature to check if the commands work as intended.
  • 🚀 Monitor the test run step by step or use the play and fast-forward buttons to see the project's progress.
  • 📊 After confirming the project's accuracy, run it to extract data which can be downloaded in CSV or JSON format.
  • 🔗 Learn about the API for integrating the scraped data with other applications.
Q & A
  • What is the main topic of this tutorial?

    -The main topic of this tutorial is how to set up a web scraping project on a page that doesn't have a 'Next' button, using a tool like ParseHub.

  • What is the first step in starting a new project?

    -The first step in starting a new project is to click on the 'New Project' button and enter the URL of the website you wish to scrape.

  • What are the three areas visible within the ParseHub client after loading a page?

    -The three areas visible within the ParseHub client are: the project structure and settings on the left, an interactive view of the website in the middle, and a preview section for data in CSV or JSON formats at the bottom.

  • How can you select elements on the website using the select command?

    -To use the select command, click the element you want to extract, such as a product title; the selected element is highlighted in green. You can then click other similar elements to add them to the selection as well.

  • What is the purpose of the relative select command?

    -The relative select command is used to select elements that are related to or positioned near another element, such as selecting the price corresponding to a product name.

  • How do you add pagination to a project?

    -To add pagination, you add a select command to select the currently active page, then add a relative select command to select the button for the next page in the sequence. You then add a click command for the next page button.

  • What is Browse Mode and what can you do with it?

    -Browse Mode allows you to click through the website as if you're in a regular web browser. It helps you test and verify that your commands are working properly by navigating through the pages of the website.

  • How can you test run a project to ensure it works properly?

    -You can test run a project by clicking 'Get Data' at the bottom of the page and then clicking 'Test Run'. This will run the project locally on your computer, allowing you to step through the project and observe its behavior.

  • What happens when you run a project on ParseHub's servers?

    -When you run a project on ParseHub's servers, the project will execute automatically, and you can check the status of the run in the provided box. Once the project has finished running, you can download your results in CSV or JSON formats.

  • How can you integrate the data extracted with ParseHub into other applications?

    -You can use ParseHub's API to integrate the extracted data with other applications, allowing for further automation and data utilization.

  • What should you do if you have questions about a specific project?

    -If you have questions about a specific project, you can contact ParseHub's support team at hello@parsehub.com for assistance.

Outlines
00:00
🔍 Setting Up Web Scraping Without a 'Next' Button

This paragraph outlines how to start a ParseHub project for a website whose results pages have no 'Next' button. It begins by instructing the user to start a new project and enter the URL of the desired website, using the Toys R Us Canada website as an example. The user is then guided through the interface, which is divided into three sections: project structure and settings, an interactive view of the website, and a data preview section. The paragraph details the use of 'Select' commands to identify and extract data from the website, such as product names, and introduces relative selection for associated data such as prices. It also explains how to add pagination by selecting the currently active page and then the link to the next page, and emphasizes testing the pagination commands in 'Browse Mode' to ensure they work correctly. The paragraph concludes with instructions on how to run the project and view the extracted data in various formats.

05:01
🚀 Testing and Running the Web Scraping Project

The second paragraph focuses on the testing and execution of the web scraping project. It begins by explaining how to use the 'Get Data' button to run the project on the platform's servers and monitor its progress. Once the project has completed, the user is informed that they can download the results in either CSV or JSON formats. Additionally, the paragraph mentions the availability of an API for integrating the scraped data with other applications. The tutorial ends with an offer of assistance for users who may have questions about their specific projects, encouraging them to reach out for help.

Keywords
💡Web Scraping
Web scraping is the process of extracting data from websites. In the context of the video, it refers to the method used to gather information from the Toys R Us Canada website. The main theme of the video is about setting up a project to scrape data from a webpage, making web scraping the central technique around which the tutorial is built.
💡URL
URL, or Uniform Resource Locator, is a reference to the address of a webpage. In the video, entering the URL is the first step to specify the website from which data will be scraped. It is fundamental to the process as it directs the scraper to the correct location on the internet.
💡Project Structure
Project structure refers to the organization and layout of the components within a web scraping project. The video emphasizes the importance of understanding the project structure to navigate and manipulate the data effectively. It is the framework that holds the commands and settings for the scraping process.
💡Interactive View
Interactive view is a feature that allows users to see and interact with the website they are scraping in real-time. This is crucial for selecting elements on the webpage and testing the scraping commands. The video uses the interactive view to demonstrate how to select and rename commands, and how to test the scraping process.
💡CSV or JSON
CSV (Comma-Separated Values) and JSON (JavaScript Object Notation) are file formats used to store and exchange data. In the video, these formats are mentioned as the possible output formats for the scraped data. They are essential for users to understand as they determine how the scraped data will be organized and used afterwards.
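Once rows have been extracted, the two export formats differ mainly in shape: CSV flattens each row into one comma-separated line under a header, while JSON keeps the records as objects. The sketch below uses Python's standard csv and json modules on hypothetical rows named after the tutorial's 'product' and 'price' commands; it does not reproduce ParseHub's exact export layout.

```python
import csv
import io
import json

# Hypothetical rows, shaped like the output of a 'product' and a 'price' command.
rows = [
    {"product": "Lego Set", "price": "$49.99"},
    {"product": "Toy Car", "price": "$12.50"},
]

def to_csv(rows: list[dict], fieldnames=("product", "price")) -> str:
    """Serialize rows as CSV text: one header line, then one line per row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fieldnames), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_json(rows: list[dict]) -> str:
    """Serialize rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)

print(to_csv(rows))
print(to_json(rows))
```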
💡Select Command
A select command is an instruction used in web scraping to identify and select specific elements on a webpage. It is a fundamental concept in the video, as it is used to extract the desired data from the website. The video provides a step-by-step guide on how to use and rename select commands.
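ParseHub's select command is point-and-click, so the video itself contains no code. As a rough analogy only, the Python sketch below does in code what selecting one product title and then a second similar one achieves: it collects the text of every element that shares the same class. The markup and the class names product-title and price are hypothetical, and ElementTree is used only because the sample is well-formed; real pages generally need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page markup (well-formed so ElementTree can parse it).
PAGE = """
<div id="results">
  <div class="product"><a class="product-title">Lego Set</a><span class="price">$49.99</span></div>
  <div class="product"><a class="product-title">Toy Car</a><span class="price">$12.50</span></div>
</div>
"""

def select_all(markup: str, css_class: str) -> list[str]:
    """Return the text of every element whose class attribute equals css_class."""
    root = ET.fromstring(markup.strip())
    return ["".join(el.itertext()).strip()
            for el in root.findall(f".//*[@class='{css_class}']")]

print(select_all(PAGE, "product-title"))  # ['Lego Set', 'Toy Car']
```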
💡Relative Select Command
A relative select command is used to select data based on its position or relationship to another element on the webpage. It is an advanced technique shown in the video to pick up the price for each product by selecting the corresponding price relative to the product name.
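To see why a relative select matters, the sketch below (again a hypothetical analogy, not how ParseHub works internally) looks up the title and the price inside each product container rather than collecting two separate lists. The markup, class names, and the sold-out listing are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical markup: the "Puzzle" listing has no price element.
PAGE = """
<div id="results">
  <div class="product"><a class="product-title">Lego Set</a><span class="price">$49.99</span></div>
  <div class="product"><a class="product-title">Puzzle</a><span class="note">Sold out</span></div>
  <div class="product"><a class="product-title">Toy Car</a><span class="price">$12.50</span></div>
</div>
"""

def text_of(el: Optional[ET.Element]) -> Optional[str]:
    """Return an element's stripped text, or None when the element is absent."""
    return "".join(el.itertext()).strip() if el is not None else None

def select_products(markup: str) -> list[dict]:
    """For each product container, look up the title and the price inside it."""
    root = ET.fromstring(markup.strip())
    return [
        {
            "product": text_of(item.find(".//*[@class='product-title']")),
            "price": text_of(item.find(".//*[@class='price']")),
        }
        for item in root.findall(".//*[@class='product']")
    ]

for row in select_products(PAGE):
    print(row)
```

If titles and prices were collected as two independent lists and zipped together, the missing price would shift every later pairing, so "Puzzle" would wrongly be matched with "$12.50". Scoping the price lookup to each product avoids that.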
💡Pagination
Pagination refers to splitting a set of content, such as search results or product listings, across multiple pages. In the video, handling pagination is the key concept: the scraper navigates through the numbered pages of the website to collect more data, even though there is no 'Next' button present.
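The no-'Next'-button technique described in the tutorial (select the active page number, relative-select the page link right after it, then click it) can be sketched as a loop. This is a hypothetical, offline Python illustration: the three-page site, the active class, and the /toys?page=N URLs are assumptions, and a dictionary lookup stands in for loading each page.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# Hypothetical three-page site. Each page lists products and a row of numbered
# page links; the current page carries class="active" and there is no "Next" link.
PAGES = [["Lego Set", "Toy Car"], ["Puzzle", "Doll"], ["Kite"]]

def make_page(n: int) -> str:
    links = "".join(
        f'<a class="{"active" if i == n else "page"}" href="/toys?page={i}">{i}</a>'
        for i in range(1, len(PAGES) + 1)
    )
    items = "".join(f'<li class="product">{p}</li>' for p in PAGES[n - 1])
    return f'<html><ul>{items}</ul><nav class="pagination">{links}</nav></html>'

SITE = {f"/toys?page={n}": make_page(n) for n in range(1, len(PAGES) + 1)}

def next_page_url(root: ET.Element) -> Optional[str]:
    """Select the active page link, then take the link immediately after it."""
    nav = root.find(".//nav[@class='pagination']")
    links = list(nav) if nav is not None else []
    for i, link in enumerate(links):
        if link.get("class") == "active":
            return links[i + 1].get("href") if i + 1 < len(links) else None
    return None

def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 50) -> list[str]:
    """Scrape each page, then 'click' the next page number until none is left."""
    url, visited, products = start_url, set(), []
    while url and url not in visited and len(visited) < max_pages:
        visited.add(url)
        root = ET.fromstring(fetch(url))
        products += [li.text for li in root.findall(".//li[@class='product']")]
        url = next_page_url(root)
    return products

print(crawl("/toys?page=1", SITE.__getitem__))
# ['Lego Set', 'Toy Car', 'Puzzle', 'Doll', 'Kite']
```

When the active page is the last link, there is nothing after it, so the loop stops on its own; the visited set and max_pages cap are extra safeguards against a page linking back to itself.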
💡Browse Mode
Browse mode is a feature that simulates a regular web browser, allowing users to click through the website and interact with it as they would in a normal browsing session. This mode is crucial for testing and verifying that the pagination commands are working correctly.
💡Test Run
A test run is a trial execution of a web scraping project to ensure that it functions as intended and extracts the correct data. It is an essential step in the video, as it allows users to review the project's behavior and make adjustments before running it on the server.
💡API
API, or Application Programming Interface, is a set of protocols and tools that allows different software applications to communicate with each other. In the context of the video, the API is mentioned as a way for users to integrate their scraped data with other applications, providing flexibility and utility for the data collected.
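As an illustration of that integration, the sketch below builds a request for a project's most recent completed run. It assumes ParseHub's v2 REST endpoint /api/v2/projects/{project token}/last_ready_run/data with api_key and format query parameters; treat the path, the parameters, and the gzip handling as assumptions to verify against ParseHub's current API documentation. The token and key values are placeholders.

```python
import gzip
import json
import urllib.request
from urllib.parse import quote, urlencode

API_BASE = "https://www.parsehub.com/api/v2"  # assumption: check ParseHub's API docs

def last_run_data_url(project_token: str, api_key: str, fmt: str = "json") -> str:
    """Build the (assumed) URL for the data of the project's last finished run."""
    query = urlencode({"api_key": api_key, "format": fmt})
    return f"{API_BASE}/projects/{quote(project_token)}/last_ready_run/data?{query}"

def fetch_last_run(project_token: str, api_key: str):
    """Download and decode the JSON results (performs a real network request)."""
    with urllib.request.urlopen(last_run_data_url(project_token, api_key)) as resp:
        body = resp.read()
        # Decompress only if the server reports a gzip-encoded body.
        if resp.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))
```

Calling fetch_last_run("YOUR_PROJECT_TOKEN", "YOUR_API_KEY") would perform a real request, so substitute your own credentials before running it.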
Highlights

Introduction to a tutorial on web scraping without a 'Next' button.

Starting a new project and entering the URL of the desired website.

Using the ParseHub client to view the project structure, the interactive view of the website, and the data preview.

Noting the empty 'Select' command already placed in the project layout, and using the '+' sign to add further commands.

Selecting and renaming the first product title using the 'Select' command.

Utilizing the 'Relative Select' command to capture product prices.

Adding pagination to the project by selecting the currently active page.

Creating a 'Relative Select' command for the 'Next Page' button and testing it.

Activating and adjusting the 'Relative Next Page' command for proper functionality.

Using 'Browse Mode' to simulate a regular web browser experience for testing.

Testing the project with 'Get Data' and 'Test Run' to ensure proper operation.

Understanding the step-by-step process of the project through the 'Step In' button.

Observing the project's progress and extracted data using the 'Play' and 'Fast-Forward' buttons.

Pausing and stopping the test run to examine the results.

Downloading the results in CSV or JSON formats after project completion.

Mention of an API for integrating the scraped data with other applications.

Offering support for any project-specific questions through contact information.
