ParseHub Tutorial: Directories

ParseHub
3 Aug 2017, 10:15
Educational, Learning

TLDR: This tutorial demonstrates how to use ParseHub, a web scraping tool, to extract data from directory websites. It guides users through creating a new project, inputting search criteria, and selecting data fields to scrape. The process includes teaching ParseHub to navigate through multiple pages of results, extracting relevant information such as business names, addresses, phone numbers, and additional details. The tutorial concludes by showcasing a test run of the project, highlighting the tool's capability to automate the scraping process and preview data in various formats.

Takeaways
  • Start by opening your ParseHub client and initiating a new project with the target website URL.
  • The ParseHub interface consists of three main sections: commands and settings on the left, an interactive website view in the middle, and data preview on the right.
  • Enter the URL of the directory website you wish to scrape and begin a new project on it.
  • Utilize 'Select' commands to identify and input data such as business names, cities, and other relevant information.
  • Rename commands to descriptive terms for better organization and understanding of the data flow.
  • Use 'Click' commands to simulate interactions with the website, such as clicking the search button.
  • Create new templates for different pages of the website, like 'Results' and 'Details', to extract various data points.
  • Verify that ParseHub correctly identifies and highlights elements of the same type for consistency in data extraction.
  • Organize data into structured formats like CSV or JSON for easy analysis and use.
  • Train ParseHub to navigate through multiple pages of results by using 'Click' commands on 'Next' buttons.
  • Test your project by running a test data extraction, which can be done step-by-step or in full auto-play mode.
Q & A
  • What is the main topic of the video tutorial?

    -The main topic of the video tutorial is how to scrape data from a directory type website using ParseHub.

  • Which website is used as an example in the tutorial?

    -The Yellow Pages website (yellowpages.com) is used as an example in the tutorial.

  • What are the three main sections of the ParseHub tool?

    -The three main sections of the ParseHub tool are the commands and settings on the left-hand side, the interactive view of the website in the middle, and the data preview section on the right, which shows the data in either CSV or JSON format.

  • How does the tutorial guide the user to input the business they are looking for?

    -The tutorial guides the user to input the business they are looking for by using a 'Select' command to type the desired search term, in this case, 'web developers', into the input field on the directory website.

  • What is the purpose of the 'Click' command in ParseHub?

    -The 'Click' command in ParseHub is used to instruct the tool to simulate a mouse click on a specific element, such as a search button, to interact with the website and proceed to the next step of the scraping process.

  • How does the tutorial demonstrate the extraction of data from the results page?

    -The tutorial demonstrates the extraction of data by using 'Select' commands to identify and select the desired information, such as business names, addresses, and phone numbers, and then renaming the selections for clarity.

  • What is a 'Relative Select' command in ParseHub?

    -A 'Relative Select' command in ParseHub allows the user to associate one piece of data with another. For example, it can be used to associate the name of a business with its address or phone number.

  • How does the tutorial handle pagination of results in the scraping process?

    -The tutorial instructs the user to use a 'Select' command on the 'Next' button, paired with a 'Click' command, to navigate through multiple pages of results, and to create a new template for each subsequent page to continue extracting data in the same manner as the first page.

  • What additional information can be extracted from the 'Details' template created in the tutorial?

    -The 'Details' template created in the tutorial can extract additional information such as the number of years the business has been operational and a description of the business.

  • How can users test their ParseHub project after setting it up?

    -Users can test their ParseHub project by clicking on 'Get Data' and choosing to either step through the project manually or run it automatically using the 'Play' button to ensure the scraping process works as intended.

  • Where can users get help if they encounter issues with their ParseHub project?

    -If users encounter issues with their ParseHub project, they can contact the support team at hello@parsehub.com for assistance.

Outlines
00:00
Introduction to Web Scraping with ParseHub

This paragraph introduces the process of web scraping using ParseHub, a tool that allows users to extract data from directory type websites. It begins by guiding users to open their ParseHub client and start a new project with a specific URL, in this case, the Yellow Pages (yellowpages.com). The paragraph explains the three main sections of the ParseHub tool: commands and settings on the left, the interactive website view in the middle, and the data preview pane on the right. It emphasizes the select mode and the availability of an empty select command as a starting point for the tutorial.

05:01
Setting Up the Scraping Process

The second paragraph delves into the specifics of setting up the scraping process using ParseHub. It instructs users on how to input search criteria, such as the business type and location, and how to simulate the clicking of the search button. The paragraph details the process of creating a new template for the results page, selecting and renaming elements to capture relevant data like business names, locations, and phone numbers. It also touches on the use of relative select commands to associate data and the importance of ensuring that ParseHub correctly identifies similar elements on the page.

10:01
Extracting and Organizing Data

This paragraph focuses on the extraction and organization of data from the results page. It explains how to use ParseHub's select commands to extract additional information such as the number of years a business has been operating and its description. The paragraph also covers the creation of additional templates for further details and the process of automating the scraping of multiple pages of results. The tutorial concludes with instructions on how to test the project and the option to run it step-by-step or in full auto-play mode. It emphasizes the comprehensive nature of the scraping process, capturing various details about businesses from the directory website.
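The multi-page step can be pictured as a loop that keeps following a "Next" link until none is left. The sketch below is only an illustration of that idea, not ParseHub's internals: the `PAGES` dictionary and the `scrape_all` function are hypothetical stand-ins, and the pages are hard-coded rather than fetched from a real website.

```python
# Hypothetical in-memory "website": each page lists some business names
# and names the page its "Next" button leads to (None on the last page).
PAGES = {
    "page1": {"names": ["Acme Web Design", "Pixel Works"], "next": "page2"},
    "page2": {"names": ["Northside Sites"], "next": "page3"},
    "page3": {"names": ["Blue Door Media"], "next": None},
}


def scrape_all(pages, start, max_pages=50):
    """Collect names from every page reachable by following 'next' links."""
    collected = []
    visited = set()
    current = start
    # Stop when there is no next page, when a page repeats (a link cycle),
    # or when the safety limit on visited pages is reached.
    while current is not None and current not in visited and len(visited) < max_pages:
        visited.add(current)
        page = pages[current]
        collected.extend(page["names"])  # same extraction on every page
        current = page["next"]           # the simulated click on "Next"
    return collected


all_names = scrape_all(PAGES, "page1")
```

The `max_pages` limit and the `visited` set are design choices for the sketch: they keep the loop from running forever if a "Next" link ever points back to an earlier page.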

Support and Conclusion

The final paragraph concludes the tutorial by offering support for any questions or issues users might encounter with their ParseHub projects. It provides contact information for assistance and encourages users to reach out for help. The paragraph wraps up the tutorial, leaving users with a clear understanding of how to scrape data from directory type websites using ParseHub and where to seek help if needed.

Keywords
Web Scraping
Web scraping is the process of extracting data from websites. In the context of the video, it refers to the method used to gather information from a directory type website, such as the Yellow Pages. The main theme of the video is to demonstrate how to use ParseHub, a web scraping tool, to efficiently scrape data from a website.
ParseHub
ParseHub is a web scraping tool that allows users to extract data from websites. It is the primary tool discussed in the video, used to create projects, input commands, and extract data in various formats like CSV or JSON. The video provides a step-by-step guide on how to use ParseHub for web scraping.
Yellow Pages
The Yellow Pages is a directory type website that lists businesses and their contact information. In the video, it serves as the example website from which data will be scraped using ParseHub. The Yellow Pages is used to illustrate the process of searching for and extracting business information.
Input Command
An input command is a function used in web scraping to simulate user input, such as typing text into a search box or filling out forms. In the video, input commands are used to specify the search criteria, such as the business type and location, which are then used to scrape the relevant data.
Select Command
A select command is used in web scraping to identify and select specific elements on a webpage, such as input boxes, buttons, or text. In the video, select commands are utilized to target the elements that need to be interacted with or extracted, like the search button or business names.
Template
In the context of the video, a template is a set of instructions or a project structure created within ParseHub to define the data extraction process. Templates are used to organize and automate the scraping process, allowing for efficient data collection from multiple pages or sections of a website.
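One way to think about templates is as separate extraction routines, one per kind of page, whose outputs are combined into a single record per business. The sketch below assumes two such routines, mirroring the 'Results' and 'Details' templates from the video. Every name in it (`results_template`, `details_template`, `run_project`, the sample data and field names) is hypothetical and does not reflect how ParseHub stores or runs templates.

```python
# Hypothetical, hard-coded stand-ins for two kinds of pages.
RESULTS_PAGE = [
    {"name": "Acme Web Design", "details_url": "/biz/acme"},
    {"name": "Pixel Works", "details_url": "/biz/pixel"},
]
DETAILS_PAGES = {
    "/biz/acme": {"years_in_business": 12, "description": "Custom websites."},
}


def results_template(page):
    """Extract one entry per listing on a results page."""
    return [dict(row) for row in page]


def details_template(details_pages, url):
    """Extract the extra fields from one details page (empty if missing)."""
    return dict(details_pages.get(url, {}))


def run_project(results_page, details_pages):
    """Run the results routine, then enrich each entry from its details page."""
    entries = results_template(results_page)
    for entry in entries:
        entry.update(details_template(details_pages, entry["details_url"]))
    return entries


businesses = run_project(RESULTS_PAGE, DETAILS_PAGES)
```

A listing whose details page is missing simply keeps the fields from the results page, so one incomplete listing does not stop the rest of the run.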
CSV
CSV, or Comma-Separated Values, is a file format used to store and organize data in a tabular form. In the video, CSV is one of the formats that ParseHub can export the scraped data into, allowing users to open and manipulate the data using spreadsheet software like Excel.
JSON
JSON, or JavaScript Object Notation, is a lightweight data interchange format that is easy for humans to read and write and for machines to parse and generate. In the video, JSON is another format that ParseHub can use to present the scraped data, structuring it in a way that can be easily processed by computer programs.
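To make the difference between the two formats concrete, here is a minimal sketch that turns a small JSON list of businesses into CSV text using only Python's standard library. The sample records and field names (`name`, `address`, `phone`) are assumptions chosen for illustration; the structure of an actual ParseHub export may differ.

```python
import csv
import io
import json

# Hypothetical sample data in JSON form; the second record has no address.
SAMPLE_JSON = """
[
  {"name": "Acme Web Design", "address": "1 Main St, Springfield", "phone": "555-0100"},
  {"name": "Pixel Works", "phone": "555-0101"}
]
"""


def json_to_csv(json_text, fields):
    """Convert a JSON array of objects into CSV text with the given columns."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells; fields not listed are skipped.
        writer.writerow({field: record.get(field, "") for field in fields})
    return buffer.getvalue()


csv_text = json_to_csv(SAMPLE_JSON, ["name", "address", "phone"])
```

Because the first address contains a comma, the `csv` module wraps that cell in quotes, which is what lets spreadsheet software keep it in a single column.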
Relative Select Command
A relative select command is a feature in ParseHub that allows users to associate data from one element to another based on their position or relationship on the webpage. This helps in extracting related data points, such as associating a business name with its address or phone number.
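The idea behind relative selection can be sketched outside ParseHub: first select each listing, then look for the phone number inside that listing only, so every value stays attached to the right business. The example below is an illustration under stated assumptions, not ParseHub's implementation. The markup, class names, and the `extract_listings` function are hypothetical, and the snippet is well-formed so that Python's standard XML parser can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a results page.
SAMPLE = """
<div>
  <div class="listing">
    <a class="name">Acme Web Design</a>
    <span class="phone">555-0100</span>
  </div>
  <div class="listing">
    <a class="name">Pixel Works</a>
  </div>
</div>
"""


def extract_listings(markup):
    """Return one record per listing, with fields looked up inside that listing."""
    root = ET.fromstring(markup.strip())
    records = []
    # The outer selection: every listing card on the page.
    for card in root.findall(".//div[@class='listing']"):
        # The relative selections: searched only within the current card.
        name = card.find("a[@class='name']")
        phone = card.find("span[@class='phone']")
        records.append({
            "name": name.text if name is not None else None,
            "phone": phone.text if phone is not None else None,
        })
    return records


listings = extract_listings(SAMPLE)
```

Selecting names and phone numbers as two independent flat lists would misalign the data as soon as one listing lacks a phone number; looking each field up relative to its listing avoids that.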
Quick Command
A quick command in ParseHub is a shortcut that enables users to perform actions like clicking on a link or button without having to create a full select command. It simplifies the scraping process by allowing users to quickly interact with webpage elements.
Data Extraction
Data extraction is the process of collecting and retrieving data from various sources. In the video, data extraction refers to the act of gathering business information, such as names, addresses, phone numbers, years in business, and descriptions, from the Yellow Pages website using ParseHub.
Highlights

Introduction to using ParseHub for web scraping

Starting a new project with a directory type website URL

Understanding the three main sections of the ParseHub tool

Using the 'Select' command to input search criteria

Automated input command recognition by ParseHub

Renaming commands for better organization and clarity

Creating a new template for the search results page

Selecting and extracting data in JSON or CSV format

Using 'Begin New Entry' command to structure data

Renaming 'Selection' commands for data clarity

Utilizing 'Relative Select' command to associate data

Automatic identification and highlighting of similar elements

Extraction of additional data such as business descriptions

Creating additional templates for deeper data extraction

Teaching ParseHub to navigate through multiple pages of results

Finalizing the project with multiple templates for comprehensive scraping

Testing the project and running a data extraction

Previewing and reviewing the extracted data
