Scrape data from a Chrome extension using Node.js

How can I scrape data from an app that runs in the Chrome browser as an extension (a plugin)? For example, the LINE chat app runs in the browser as an extension, so how can I scrape data from it? Please tell me the steps to make this possible.
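
A minimal sketch of one possible starting point, assuming the app is an unpacked extension you have on disk: Puppeteer can launch headful Chrome with the extension loaded, and the extension's pages can then be scraped like any other page. The extension path below is a placeholder.

const puppeteer = require('puppeteer');

(async () => {
    // Placeholder: path to the unpacked extension directory on disk.
    const pathToExtension = '/path/to/unpacked/extension';

    // Extensions only run in headful Chrome, so headless must be false.
    const browser = await puppeteer.launch({
        headless: false,
        args: [
            `--disable-extensions-except=${pathToExtension}`,
            `--load-extension=${pathToExtension}`,
        ],
    });

    // Extension pages show up as targets with a chrome-extension:// URL.
    const targets = await browser.targets();
    const extensionTarget = targets.find(t => t.url().startsWith('chrome-extension://'));
    console.log('Extension target:', extensionTarget && extensionTarget.url());

    await browser.close();
})();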

Return the window object using Puppeteer

I’m trying to return the whole window object from a page, and then traverse the object outside of Puppeteer.

I’m trying to access the data in the Highcharts property, for which I need the window object. The plain JavaScript code would be something like window.Highcharts.charts[0].series[0].data.

I thought the easiest way would be to use Puppeteer to access the site and send me back the window object, which I could then use outside of Puppeteer like any other JS object.

After reading the documentation, I’m finding it difficult to return the object as it would appear if I just typed ‘window’ into the Chrome console. I’m not sure what I’m missing.

The following two methods seem like they should work:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://www.example.com', {waitUntil: 'networkidle2'});

    // METHOD 1
    // Create a Map object
    await page.evaluate(() => window.map = new Map());
    // Get a handle to the Map object prototype
    const mapPrototype = await page.evaluateHandle(() => Map.prototype);
    // Query all map instances into an array
    const mapInstances = await page.queryObjects(mapPrototype);

    console.log(mapInstances);

    await mapInstances.dispose();
    await mapPrototype.dispose();

    // METHOD 2
    const handle = await page.evaluateHandle(() => ({window, document}));
    const properties = await handle.getProperties();
    const windowHandle = properties.get('window');
    const documentHandle = properties.get('document');
    const result = await page.evaluate(win => win, windowHandle);

    console.log(result);

    await handle.dispose();

    await browser.close();
})();

However, it only returns the following in the console, and not the simple object I would like:

[screenshot of console output]

Not sure if I’m going about this the right way, so any help/advice is much appreciated.
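
For what it’s worth, a common workaround (a sketch, not part of the original question): page.evaluate can only return JSON-serializable values, and window contains functions and circular references, so it can never come back whole. Instead, extract just the plain data you need inside the page:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://www.example.com', {waitUntil: 'networkidle2'});

    // Map each Highcharts point to a plain object so it survives
    // the JSON serialization that page.evaluate performs.
    const points = await page.evaluate(() =>
        window.Highcharts.charts[0].series[0].data.map(point => ({
            x: point.x,
            y: point.y,
        }))
    );

    console.log(points);
    await browser.close();
})();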

Chrome extension to calculate salary from LinkedIn profile

I have downloaded the extension “Salary Calculator”. After adding it, when I open a LinkedIn profile page and click on any profile, I should see a popup that tells me the salary of that user. But I am not getting the popup. I have tried to analyze the code too, but couldn’t work out where the actual error is. Also, how do I store the scraped elements in a database?

The code is provided at the link below:

https://github.com/jayfeng1/Linkedin_Salary_Chrome

Any suggestions or solutions are highly appreciated. Thanks in advance!
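
On the database part of the question, a minimal sketch of one common pattern (the endpoint URL and payload fields are hypothetical, not from the linked repo): the content script POSTs the scraped values to your own backend, which writes them to a database. A cross-origin request like this also needs a matching host permission in the manifest.

// Content script sketch -- the endpoint and field names are hypothetical.
function saveScrapedProfile(profile) {
    return fetch('https://your-backend.example.com/api/profiles', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify(profile),
    })
        .then(resp => resp.json())
        .then(data => console.log('Saved:', data))
        .catch(err => console.error('Save failed:', err));
}

// Example call with scraped values (selectors are placeholders).
const nameEl = document.querySelector('h1');
saveScrapedProfile({
    name: nameEl ? nameEl.textContent.trim() : null,
    salary: 100000, // placeholder -- the extension would compute this
});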

How to delay fetch() until website has finished loading dynamic content

I’m using the following JavaScript code to download the source of a webpage as an HTML file. This code currently runs whenever the user clicks my extension’s button:

const url = 'https://smmry.com/https://www.cnn.com/2018/04/01/politics/ronald-kessler-jake-tapper-interview/index.html#&SM_LENGTH=7';

fetch(url)
    .then(resp => resp.text())
    .then(responseText => {
        download('website_source.html', responseText);
    });

// Create a hidden <a download> link, click it, then remove it,
// saving the given text as a file on the user's machine.
function download(filename, text) {
    const element = document.createElement('a');
    element.setAttribute('href', 'data:text/plain;charset=utf-8,' + encodeURIComponent(text));
    element.setAttribute('download', filename);

    element.style.display = 'none';
    document.body.appendChild(element);

    element.click();

    document.body.removeChild(element);
}

Here’s the source of the webpage: https://smmry.com/https://www.cnn.com/2018/04/01/politics/ronald-kessler-jake-tapper-interview/index.html#&SM_LENGTH=7

However, as you can see if you visit the webpage, the page sometimes takes a little time (up to a few seconds) to summarize the article. It’s less noticeable on this article, but usually a pink loading bar moves up and down in the pink box until the summary is created and displayed on the website.

I believe my code is downloading the source of the website before it finishes summarizing the article, thus the HTML file my program downloads does not contain the summary of the article.

How can I make sure the fetch() request only downloads the content of the website once https://smmry.com has finished summarizing the article https://www.cnn.com/2018/04/01/politics/ronald-kessler-jake-tapper-interview/index.html?
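
One thing worth noting: fetch() only retrieves the HTML the server initially sends and never runs the page’s JavaScript, so if the summary is inserted client-side, no amount of delay will make it appear in the response. A sketch of one alternative (Manifest V2 style, matching the question’s era): open the page in a background tab and let a content script poll until the summary text exists, then send the finished HTML back. The '#sum_part_0' selector is a guess; inspect the real page to find the element that holds the summary.

// background.js -- open the page and wait for the content script's report.
const url = 'https://smmry.com/https://www.cnn.com/2018/04/01/politics/ronald-kessler-jake-tapper-interview/index.html#&SM_LENGTH=7';

chrome.browserAction.onClicked.addListener(() => {
    chrome.tabs.create({url: url, active: false});
});

chrome.runtime.onMessage.addListener((msg, sender) => {
    if (msg.type === 'SUMMARY_READY') {
        download('website_source.html', msg.html); // download() as defined above
        chrome.tabs.remove(sender.tab.id);
    }
});

// content.js -- declared in manifest.json to run on smmry.com pages.
const timer = setInterval(() => {
    const summary = document.querySelector('#sum_part_0'); // selector is a guess
    if (summary && summary.textContent.trim()) {
        clearInterval(timer);
        chrome.runtime.sendMessage({
            type: 'SUMMARY_READY',
            html: document.documentElement.outerHTML,
        });
    }
}, 500);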

Get reactInstance from JavaScript in a Chrome Extension

I’m currently facing a problem while developing a Chrome extension. The extension is used on a ReactJS-based website, and I need to scrape some data from the page. Here is an example of the page:

... ...

When I use the Chrome inspector, I can see that my <div class="UserWallet"> has a property __reactInternalInstance. I found a function findReact(element) that is used to get the React instance. This function is used in another Chrome extension called Steemit-More-Info. I have the exact same function and use the same HTML element as the parameter, but my function is not working. When I do $(".UserWallet"), the result doesn’t contain the property __reactInternalInstance. But in the other extension it works, with the same jQuery code and the same findReact function.

Here is the code for findReact.

var findReact = function(dom) {
    // React attaches an expando key like "__reactInternalInstance$..."
    // to DOM nodes it manages; walk the node's keys to find it.
    for (var key in dom) {
        if (key.startsWith("__reactInternalInstance$")) {
            var compInternals = dom[key]._currentElement;
            var compWrapper = compInternals._owner;
            var comp = compWrapper._instance;
            return comp;
        }
    }
    return null;
};

Has anyone ever faced this problem? Is there a special library that I need to include in my extension to be able to scrape the React instance?

Thank you,

cedric_g
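
A likely explanation, for what it’s worth: content scripts run in an isolated world and cannot see expando properties that page scripts (such as React) attach to DOM nodes, which would explain why findReact works in the console but not in a plain content script, and why it may work in Steemit-More-Info if that extension injects its code into the page. A sketch of the usual workaround is to inject a script element so the code runs in the page’s own context:

// Content script sketch: run a function in the page's JavaScript context,
// where React's __reactInternalInstance$ keys are visible.
function runInPageContext(fn) {
    const script = document.createElement('script');
    script.textContent = '(' + fn.toString() + ')();';
    (document.head || document.documentElement).appendChild(script);
    script.remove();
}

runInPageContext(function () {
    // Same lookup as findReact in the question, now in the page context.
    const dom = document.querySelector('.UserWallet');
    for (const key in dom) {
        if (key.startsWith('__reactInternalInstance$')) {
            console.log('React instance key found:', key, dom[key]);
            break;
        }
    }
});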

Selenium Chrome Error: You are using an unsupported command-line flag: --ignore-certifcate-errors

Okay, so I am learning web scraping and am comfortable with Java, hence I chose Jsoup, which is a web scraping library. I planned on scraping a CodeChef contest problem (which is just a coding problem), but I had difficulty scraping all the displayed content, because most of it is generated dynamically. So I used Selenium to render the JavaScript and obtain a plain HTML page, which I then feed to Jsoup.

So I tried printing the rendered HTML page just to verify, but I get the following error when I run the code:

My Code:

    // The path to the chromedriver binary goes here (left blank in the original).
    File f = new File("");
    System.setProperty("webdriver.chrome.driver", f.getAbsolutePath());
    WebDriver driver = new ChromeDriver();
    driver.get("https://www.codechef.com/problems/FRGTNLNG");
    System.out.println(driver.getPageSource());

Error (in chrome):

You are using an unsupported command-line flag: --ignore-certifcate-errors. Stability and security will suffer

I tried the solution from “Chrome Error: You are using an unsupported command-line flag: --ignore-certifcate-errors. Stability and security will suffer” and installed the latest ChromeDriver, but it didn’t resolve my error.

I also tried adding DesiredCapabilities (now deprecated) and ChromeOptions as per “Pass driver ChromeOptions and DesiredCapabilities?”, but the same error persists.

Thanks in advance!

Using XPath to scrape links and descriptions from this Etsy product listings page

I am trying to scrape all the links on https://www.etsy.com/market/happiness_bracelet, and then the product descriptions from within each extracted link.

I am using a Chrome extension called Scraper to input the XPath, obtained by right-clicking the element in Chrome DevTools, but I am not getting the desired result.

Problem: I can’t find the proper XPath for the links. What would be the proper setup to get the XPath of the links on that webpage and extract the product descriptions from within them?

Is there a way to do it just using Chrome DevTools and the proper XPath, or would I need Python/bs4/Selenium for this task?

Thanks for your help.
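
For reference, a quick way to test an XPath directly in the DevTools console, sketched below. The "listing-link" class is an assumption about Etsy’s markup; verify the real class names on the live page, since XPaths generated by "Copy XPath" are usually too brittle to match a whole list of links.

// DevTools console sketch -- the class name is assumed, check the live page.
const xpath = '//a[contains(@class, "listing-link")]';
const snapshot = document.evaluate(
    xpath, document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);

const links = [];
for (let i = 0; i < snapshot.snapshotLength; i++) {
    links.push(snapshot.snapshotItem(i).href);
}
console.log(links);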

Does the context.Context carry information such as request cookies and headers?

I’m using the chromedp browser driver package to log in to a website that I want to retrieve data from. Now I would like to do that on a larger scale. Maybe if I could reuse the headers, cookies, and authorization from the browser (which is opened in a context), I could do that with a simple GET request.
Is that possible? If so, how?

Scraping from a specific part of HTML based on right-click

I am building a Chrome extension where I want to be able to right-click on a certain part of a page and then scrape some info from it. Using chrome.contextMenus, I’d like to scrape only from the element (one of its attributes) where I right-clicked. This is similar to Chrome’s built-in behaviour where right-clicking somewhere on a page and selecting Inspect opens the Elements view on the element you right-clicked. The reason I want to do this is that there will be a number of similar elements with different id attributes, so I want to get only the id of the particular element I’m interested in.

Is this even possible?

I was looking through the chrome.contextMenus documentation, and I’m wondering: if I know the element type (article), could I set the context menu on that and get the id stored in it that way?
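
This is possible with a well-known pattern, sketched below: chrome.contextMenus.onClicked does not tell you which element was clicked, but a content script can listen for the contextmenu DOM event, remember the target, and hand over its id when the background script asks.

// content.js -- remember the most recently right-clicked element.
let lastRightClicked = null;
document.addEventListener('contextmenu', (event) => {
    lastRightClicked = event.target;
}, true);

chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
    if (msg.type === 'GET_CLICKED_ID' && lastRightClicked) {
        // Walk up to the enclosing <article> in case a child node was clicked.
        const article = lastRightClicked.closest('article');
        sendResponse({id: article ? article.id : null});
    }
});

// background.js -- create the menu item and query the content script.
chrome.contextMenus.create({
    id: 'scrape-element',
    title: 'Scrape this element',
    contexts: ['all'],
});

chrome.contextMenus.onClicked.addListener((info, tab) => {
    chrome.tabs.sendMessage(tab.id, {type: 'GET_CLICKED_ID'}, (response) => {
        console.log('Clicked element id:', response && response.id);
    });
});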

Full scrolling in Web Scraper (Chrome extension)

I am new to this scraping thing. To learn the basics, I have started with a Chrome extension. I am using the site http://www.barnehagefakta.no/sok for a trial run. I have added an Element scroll down selector to _root and selected the element

div.large-9

Now when I preview it, it just scrolls twice. It does not keep scrolling until all the records are shown, even with the Multiple option selected. I set the query time to 200000 ms. What am I missing?
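
For comparison, a sketch of a sitemap that has worked on similar infinite-scroll pages. Two details matter with the Element scroll down selector: it should point at the repeating result items (with Multiple checked), not just the outer container, and the delay must be long enough for each new batch to load, since scrolling stops once no new elements appear within the delay. The "div.large-9 article" selector below is an assumption; check the site’s actual markup before importing.

{
    "_id": "barnehagefakta",
    "startUrl": ["http://www.barnehagefakta.no/sok"],
    "selectors": [{
        "id": "result",
        "type": "SelectorElementScroll",
        "parentSelectors": ["_root"],
        "selector": "div.large-9 article",
        "multiple": true,
        "delay": 2000
    }]
}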