Ultimate URL Scraper by Cyb3rNuX
Scraping a site
Open the site that you want to scrape.
The first thing to do when creating a sitemap is to specify the start URL. This is the URL from which scraping will begin. You can also specify multiple start URLs if the scraping should start from several places. For example, if you want to scrape multiple search results, you can create a separate start URL for each search result.
Specify multiple URLs with ranges
When a site uses numbering in its page URLs, it is much simpler to create a range start URL than to build Link selectors that navigate the site. To specify a range URL, replace the numeric part of the start URL with a range definition, for example [1-100]. If the site uses zero padding in its URLs, add zero padding to the range definition, for example [001-100]. If you want to skip some URLs, you can also specify an increment, like this: [0-100:10].
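Use a range URL like http://example.com/page/[1-3] for the links http://example.com/page/1, http://example.com/page/2, and http://example.com/page/3.
Use a range URL with zero padding like https://example.com/page/[001-1000] for links such as https://example.com/page/001, https://example.com/page/002, and https://example.com/page/003.
Use a range URL with an increment like http://example.com/page/[0-1000:10] for links such as http://example.com/page/0, http://example.com/page/10, and http://example.com/page/20.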
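The range definitions above can be sketched as a small expansion routine. This is only an illustration of the notation, not the scraper's own implementation: the function name expand_range_url is hypothetical, and it assumes that the zero-padding width is taken from the start value and that only the first range in a URL is expanded.

```python
import re

# Matches a range definition such as [1-100], [001-100], or [0-100:10].
RANGE_RE = re.compile(r"\[(\d+)-(\d+)(?::(\d+))?\]")


def expand_range_url(url):
    """Expand the first range definition in url into a list of concrete URLs."""
    match = RANGE_RE.search(url)
    if match is None:
        # No range definition: the URL is already a single start URL.
        return [url]

    start_text, end_text, step_text = match.groups()
    step = int(step_text) if step_text else 1
    if step < 1:
        raise ValueError("increment must be at least 1")

    # Assumption: a start value with leading zeros (e.g. 001) sets the padding width.
    padded = len(start_text) > 1 and start_text.startswith("0")
    width = len(start_text) if padded else 0

    prefix, suffix = url[:match.start()], url[match.end():]
    return [
        prefix + str(number).zfill(width) + suffix
        for number in range(int(start_text), int(end_text) + 1, step)
    ]


for link in expand_range_url("http://example.com/page/[1-3]"):
    print(link)
```

Calling the function with http://example.com/page/[0-100:10] would yield eleven URLs, from page/0 to page/100 in steps of 10.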
After you have created the sitemap, you can add selectors to it. In the Selectors panel, you can add new selectors, modify them, and navigate the selector tree. Selectors can be added in a tree-type structure. The scraper executes the selectors in the order in which they are organized in the tree. For example, suppose there is a news site and you want to scrape all articles whose links are available on the first page. Fig. 1 shows this example site.
Fig. 1: News site
To scrape this site, you can create a Link selector that extracts all article links on the first page. Then, as a child selector, you can add a Text selector that extracts the article text from the article pages that the Link selector found links to. The image below illustrates how the sitemap should be built for the news site.
Fig. 2: News site sitemap
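The parent and child relationship in the news site sitemap can also be written down as data. The sketch below is a hypothetical representation: the field names (start_urls, parent, css), the selector ids, and the CSS selectors are all assumptions made for illustration and are not the tool's actual sitemap format. It only shows how a Link selector at the root and a Text selector beneath it form a tree that is walked parent first.

```python
# Hypothetical sitemap for the news site; every name and CSS selector is illustrative.
sitemap = {
    "start_urls": ["http://example.com/"],
    "selectors": [
        # Link selector on the first page: collects the article links.
        {"id": "article-link", "type": "Link", "parent": "_root", "css": "a.article"},
        # Text selector run on each article page reached through the link above.
        {"id": "article-text", "type": "Text", "parent": "article-link", "css": "div.content"},
    ],
}


def execution_order(sitemap, parent="_root", depth=0):
    """Return (depth, selector id) pairs, visiting each parent before its children."""
    order = []
    for selector in sitemap["selectors"]:
        if selector["parent"] == parent:
            order.append((depth, selector["id"]))
            order.extend(execution_order(sitemap, selector["id"], depth + 1))
    return order


# Print the tree with one level of indentation per depth.
for depth, selector_id in execution_order(sitemap):
    print("  " * depth + selector_id)
```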
Note that when building selectors, you should use the Element preview and Data preview features to confirm that you have selected the correct elements with the correct data.
More information about building the selector tree is available in the selector documentation. You should at least look at the core selectors, such as the Link and Text selectors used above.
Inspect the selector tree
After you have created selectors for the sitemap, you can inspect the tree structure of the selectors in the Selector graph panel. The image below shows an example selector graph.
Fig. 3: News site selector graph
Scrape the site
After you have created selectors for the sitemap, you can start scraping. Open the Scrape panel and start scraping. A new popup window will open in which the scraper loads pages and extracts data from them. After the scraping is finished, the popup window closes and you are notified with a popup message. You can view the scraped data by opening the Browse panel and export it by opening the Export data as CSV panel.
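Once the data has been exported as CSV, it can be processed with ordinary tools. The following is a minimal sketch using Python's standard csv module. The helper name load_scraped_rows and the column names in the sample are assumptions; the real columns depend on the selector ids in your sitemap.

```python
import csv
import io


def load_scraped_rows(csv_file):
    """Read an exported CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(csv_file))


# Hypothetical export content standing in for a real exported file.
sample = io.StringIO(
    "article-link,article-link-href,article-text\n"
    "First story,http://example.com/story/1,Text of the first story\n"
)

rows = load_scraped_rows(sample)
for row in rows:
    print(row["article-link-href"], "->", row["article-text"])
```

For a real export, you would pass a file opened with open(path, newline="", encoding="utf-8") instead of the in-memory sample.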