I want to group the pages of a shopping website into category pages, subcategory pages, and product pages.
I developed a Chrome extension that opens some pages of a shopping website and asks a few questions on each page to generate category, subcategory, and product XPaths. It also evaluates the generated XPaths on subsequent pages; based on the results, it assigns each page to the matching group and skips the questions on those pages.
But it didn't work as expected. Consider a website that has some pure product pages and some pages that contain both subcategories and products. If the tool first opens a pure product page and generates a product XPath from the user input, it then classifies the subsequent subcategory + product pages as product pages, because the product XPath matches on those pages too. As a result, the subcategory pages are missed.
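To make the failure concrete, here is a minimal sketch of the matching logic, written in Python rather than the extension's own code. The XPaths, the page markup, and the function name `classify_first_match` are all made up for illustration; the real extension generates its XPaths from user answers.

```python
import xml.etree.ElementTree as ET

# Hypothetical XPath of the kind the extension would generate (an assumption).
PRODUCT_XPATH = ".//div[@class='product']"

# Hypothetical, simplified page sources (assumptions, not real site markup).
PURE_PRODUCT_PAGE = """
<html><body>
  <div class="product">Shoe A</div>
  <div class="product">Shoe B</div>
</body></html>
""".strip()

MIXED_PAGE = """
<html><body>
  <ul class="subcategories"><li>Running</li><li>Hiking</li></ul>
  <div class="product">Shoe C</div>
</body></html>
""".strip()


def classify_first_match(page_source, known_xpaths):
    """Return the label of the first known XPath that matches the page."""
    root = ET.fromstring(page_source)
    for label, xpath in known_xpaths:
        if root.findall(xpath):
            return label
    # No known XPath matched: this is where the user would be asked questions.
    return "unknown"


# Only the product XPath has been learned, because a pure product page came first.
known = [("product", PRODUCT_XPATH)]

pure_label = classify_first_match(PURE_PRODUCT_PAGE, known)
mixed_label = classify_first_match(MIXED_PAGE, known)

print(pure_label)   # product
print(mixed_label)  # product, although the page also lists subcategories
```

Because the mixed page is never reported as "unknown", the questions that would have produced a subcategory XPath are skipped for it.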
Is there any way to differentiate web pages based on their layout or template?