PhantomJS - Some Useful Tips

Code Context
There are two main contexts in a PhantomJS program: first, the PhantomJS script itself; second, the webpage open in your headless browser, which is where the DOM lives.

How can I execute JavaScript on the given webpage itself inside PhantomJS?
The solution is page.evaluate (where page is the variable representing the current page "open" in your headless browser). page.evaluate takes, as its argument, a function to be executed in the context of the webpage. It can return a result from the webpage back to your PhantomJS program. For example, suppose you'd like to grab the text of an element on the current page with ID "foo" (assuming the page already loads jQuery):
var foo = page.evaluate(function() {
    return $("#foo").text(); // text is a jQuery method, so it must be called
});
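You could then use foo in your PhantomJS program, successfully extracting the value from the webpage. Note: return values are limited to simple values that can be serialized as JSON (strings, numbers, arrays, plain objects); functions, closures, and DOM nodes will not make it back.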
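To get a feel for what survives the trip back from the page, here is a minimal sketch in plain JavaScript. It uses a JSON round trip as a stand-in for the page/PhantomJS boundary; that is an approximation for illustration, not PhantomJS's actual mechanism, and crossBoundary is a made-up helper name.

```javascript
// Hypothetical helper: approximates crossing the page/PhantomJS boundary
// with a JSON round trip. A stand-in only, not PhantomJS's real internals.
function crossBoundary(value) {
    var json = JSON.stringify(value);
    // JSON.stringify yields undefined for values it cannot represent
    return json === undefined ? undefined : JSON.parse(json);
}

var result = crossBoundary({
    text: "hello",
    count: 3,
    items: ["a", "b"],
    onClick: function () {} // functions do not survive serialization
});

console.log(Object.keys(result)); // the onClick key is gone
```

The practical upshot: have the function you pass to page.evaluate return strings, numbers, arrays, or plain objects built from them, and do any further processing on the PhantomJS side.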

IncludeJs and InjectJs
You can use PhantomJS to inject/include JavaScript files (jQuery and other libraries) in the current webpage using two functions: page.injectJs and page.includeJs.

Difference between injectJs and includeJs
page.injectJs pauses execution until the script is loaded, while page.includeJs loads the script like any other. page.includeJs includes an external script from the specified URL (usually a remote location) on the page and executes the callback upon completion. page.injectJs injects external script code from the specified file into the page (like includeJs, except that the file does not need to be accessible from the hosted page) and returns true if injection is successful; otherwise it returns false. Note: only includeJs accepts a callback; injectJs reports its result through its return value.
page.includeJs("http://ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js", function() {
    /* jQuery is loaded, now manipulate the DOM */
    var foo = page.evaluate(function() {
        return $("#foo").text();
    });
    // do what you gotta do with 'foo'
    // ...
});
The above snippet includes the jQuery library in the current page and, once it has loaded, evaluates the statement. It assumes a page has already been opened.
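If you find yourself loading both local and remote scripts, the difference between the two calls can be wrapped up in one place. The following is a rough sketch, not part of PhantomJS: loadScript is a hypothetical helper, and the rule "anything starting with http:// or https:// is remote" is an assumption made for the example.

```javascript
// Hypothetical helper: picks injectJs for local files and includeJs for URLs.
// `page` is expected to be a PhantomJS WebPage object.
function loadScript(page, src, done) {
    if (/^https?:\/\//.test(src)) {
        // Remote script: asynchronous, the callback fires once it has loaded.
        page.includeJs(src, function () {
            done(true);
        });
    } else {
        // Local file: synchronous, the return value reports success.
        done(page.injectJs(src));
    }
}
```

Inside a page.open callback you might call loadScript(page, "jquery.min.js", function(ok) { ... }) and only touch $ when ok is true. The file name here is just a placeholder.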

Logging to the console from your webpage
Well, if you type console.log("Hello, World!") in your PhantomJS program, that will be printed to your terminal. If, however, your webpage tries to log the same message, it will pass by unnoticed! So if your webpage prints a bunch of traces to the console, you'll never see 'em.
Specifically, the following code prints nothing to your terminal because "Hello, World!" is logged in the context of the browser:
page.evaluate(function() {
    console.log("Hello, World!")
})
So, what if you want to log messages to your terminal from within your webpage? The trick is to use the page.onConsoleMessage event and echo any messages printed in the browser out to your terminal. Try this:
page.onConsoleMessage = function(msg){
    console.log(msg);
};
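Once page messages and your own script's messages share a terminal, it helps to tell them apart. Here is a small sketch that tags relayed messages. The onConsoleMessage callback may also receive a line number and a source identifier after the message, though these are not always populated, so the code treats them as optional. formatPageMessage and attachConsoleRelay are hypothetical names invented for this example.

```javascript
// Hypothetical helper: tags a page message so it stands out in the terminal.
// lineNum and sourceId are optional and may be undefined.
function formatPageMessage(msg, lineNum, sourceId) {
    var where = sourceId ? " (" + sourceId + ":" + lineNum + ")" : "";
    return "[page] " + msg + where;
}

// Hypothetical helper: wires the formatter into a WebPage-like object.
// `log` is any function that accepts a string.
function attachConsoleRelay(page, log) {
    page.onConsoleMessage = function (msg, lineNum, sourceId) {
        log(formatPageMessage(msg, lineNum, sourceId));
    };
}
```

In a PhantomJS script you would call attachConsoleRelay(page, function(line) { console.log(line); }) before page.open, so that nothing logged during page load is missed.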

waitFor.js
PhantomJS beginners constantly ask how they can wait for something to appear on their webpage before acting. For example, maybe they want a banner to appear and then extract some text from it. Say "#foo" is now a div that loads a few seconds after the page has appeared. If you simply use the following code, you'll get unexpected results, as the banner may not be loaded at the time of query:
var page = require('webpage').create();
page.open('http://www.sample.com', function() {
    var foo = page.evaluate(function() {
        return $("#foo").text();
    });
    // ...
    phantom.exit();
});
Instead, you should use waitFor.js, a nice JavaScript snippet provided by the PhantomJS guys. This function is pretty simple, but very, very useful. Essentially, it polls a user-specified test function every 250 ms and executes a second user-specified function once the test returns true. If the condition is not met before the timeout (an optional parameter, 3 seconds by default), it exits PhantomJS with an error code. Expanding on the previous example, our code might look like the following (the definition of waitFor comes first):

function waitFor(testFx, onReady, timeOutMillis) {
    var maxtimeOutMillis = timeOutMillis ? timeOutMillis : 3000, //< Default max timeout is 3s
        start = new Date().getTime(),
        condition = false,
        interval = setInterval(function() {
            if ( (new Date().getTime() - start < maxtimeOutMillis) && !condition ) {
                // If not time-out yet and condition not yet fulfilled
                condition = (typeof(testFx) === "string" ? eval(testFx) : testFx()); //< defensive code
            } else {
                if(!condition) {
                    // If condition still not fulfilled (timeout but condition is 'false')
                    console.log("'waitFor()' timeout");
                    phantom.exit(1);
                } else {
                    // Condition fulfilled (timeout and/or condition is 'true')
                    console.log("'waitFor()' finished in " + (new Date().getTime() - start) + "ms.");
                    typeof(onReady) === "string" ? eval(onReady) : onReady(); //< Do what it's supposed to do once the condition is fulfilled
                    clearInterval(interval); //< Stop this interval
                }
            }
        }, 250); //< repeat check every 250ms
};


var page = require('webpage').create();
page.open('http://www.sample.com', function() {
    waitFor(function() {
        // Check in the page if a specific element is now visible
        return page.evaluate(function() {
            return $("#foo").is(":visible");
        });
    }, function() {
        var foo = page.evaluate(function() {
            return $("#foo").text();
        });
        // ...
        phantom.exit();
    });
});
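One thing to be aware of: waitFor as written calls phantom.exit(1) on timeout, which is not always what you want. Below is a sketch of a variant that reports the timeout to a callback instead. It is not part of the PhantomJS examples; the name waitForOrElse, the onTimeout callback, and the pollMillis parameter are all assumptions made for illustration. The usage at the bottom substitutes a plain counter for a real page.evaluate check so the snippet runs on its own.

```javascript
// Hypothetical variant of waitFor: same polling idea, but a timeout is
// reported to a caller-supplied callback instead of exiting the process.
function waitForOrElse(testFx, onReady, onTimeout, timeOutMillis, pollMillis) {
    var maxWait = timeOutMillis || 3000; // default: give up after 3s
    var every = pollMillis || 250;       // default: poll every 250ms
    var start = Date.now();
    var interval = setInterval(function () {
        if (testFx()) {
            clearInterval(interval); // stop polling before running the callback
            onReady(Date.now() - start);
        } else if (Date.now() - start >= maxWait) {
            clearInterval(interval);
            onTimeout(Date.now() - start);
        }
    }, every);
}

// A counter stands in for something like page.evaluate(...) here.
var polls = 0;
waitForOrElse(
    function () { polls += 1; return polls >= 3; },
    function (ms) { console.log("ready after " + polls + " polls"); },
    function (ms) { console.log("timed out after " + ms + "ms"); },
    1000,
    20
);
```

In a real script, onTimeout is a reasonable place to save a screenshot or log the page state before deciding whether to call phantom.exit.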
Source: http://www.princeton.edu/~crmarsh/phantomjs/
https://github.com/ariya/phantomjs/wiki/API-Reference-WebPage
