Scraping Data from Google Scholar

Note: This post is for educational purposes only.

In the previous post, we learned how to scrape data from Wikipedia.

Tabular data can be scraped from Google Scholar in the same way, but there is one problem with the strategy we plan to use.

In this post, let's scrape the publication data of Professor Dr Vijay Bhargava from Google Scholar.


When we scrape a table using only its class or id tag, only the data that is currently visible is scraped; the hidden part of the table is left out.

  • Table id: gsc_a_tr
  • td class: gsc_a_t (paper name, year published, number of citations)
  • div class: gs_gray (author data)


Fig1: Table data (the rest of the data is hidden and is unlocked by clicking the “Show more” button)


Fig2: Dynamically hidden data after it has been unlocked

Hence, we plan to use the Selenium library to unlock the dynamically hidden data by clicking the “Show more” button the required number of times.


Fig3: The “Show more” button


Fig4: Selenium code to unlock the dynamically hidden data
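
As a rough sketch of what the Selenium step in Fig4 could look like (not necessarily the exact code in the figure): keep clicking the “Show more” button until it is disabled or a click limit is reached. The button id gsc_bpf_more and the idea that the button becomes disabled once all rows are loaded are assumptions about the page, and click_show_more and fetch_full_page are hypothetical helper names.

```python
import time


def click_show_more(driver, button_id="gsc_bpf_more", max_clicks=50, pause=1.0):
    """Click the "Show more" button repeatedly; return how many clicks were made.

    button_id is an assumed id for the button, not taken from the post.
    """
    clicks = 0
    while clicks < max_clicks:
        # "id" is the string value of selenium's By.ID locator strategy.
        button = driver.find_element("id", button_id)
        if not button.is_enabled():
            break  # assumed: a disabled button means no rows are left to load
        button.click()
        clicks += 1
        time.sleep(pause)  # give the page time to load the next batch of rows
    return clicks


def fetch_full_page(url):
    """Open url in Chrome, unlock all rows, and return the page HTML."""
    from selenium import webdriver  # imported here so the helper above has no hard dependency

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        click_show_more(driver)
        return driver.page_source
    finally:
        driver.quit()
```

Calling fetch_full_page with the URL of the scholar profile should return HTML that includes the rows that were hidden at first. The fixed pause is a simple choice; an explicit wait on the button would be more robust.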


Fig5: Beautiful Soup code to grab data from the table (the full data, including the dynamically hidden rows)


Fig6: The div class gs_gray holds the author names for each paper
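
The parsing step in Fig5 and Fig6 might be sketched as follows. This is only an illustration run on a small made-up HTML snippet, not the real page: the td class gsc_a_t and the div class gs_gray come from the list above, while the cell classes gsc_a_c (citations) and gsc_a_y (year), the sample rows, and the function name parse_papers are assumptions.

```python
from bs4 import BeautifulSoup

# Made-up sample that imitates the table layout; the real page may differ.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="gsc_a_t"><a>Paper A</a>
      <div class="gs_gray">A Author, B Author</div>
      <div class="gs_gray">Journal X</div></td>
    <td class="gsc_a_c">120</td>
    <td class="gsc_a_y">2015</td>
  </tr>
  <tr>
    <td class="gsc_a_t"><a>Paper B</a>
      <div class="gs_gray">C Author</div>
      <div class="gs_gray">Conference Y</div></td>
    <td class="gsc_a_c">45</td>
    <td class="gsc_a_y">2016</td>
  </tr>
</table>
"""


def parse_papers(html):
    """Return one dict per paper with title, authors, year and citations as text."""
    soup = BeautifulSoup(html, "html.parser")
    papers = []
    for title_cell in soup.find_all("td", class_="gsc_a_t"):
        row = title_cell.find_parent("tr")
        link = title_cell.find("a")
        gray = title_cell.find_all("div", class_="gs_gray")
        # Assumed class names for the year and citation cells.
        year_cell = row.find("td", class_="gsc_a_y")
        cites_cell = row.find("td", class_="gsc_a_c")
        papers.append({
            "title": link.get_text(strip=True) if link else "",
            # The first gs_gray div is taken to be the author list.
            "authors": gray[0].get_text(strip=True) if gray else "",
            "year": year_cell.get_text(strip=True) if year_cell else "",
            "citations": cites_cell.get_text(strip=True) if cites_cell else "",
        })
    return papers


papers = parse_papers(SAMPLE_HTML)
```

On the real page, the HTML passed to parse_papers would be the page source taken after all rows have been unlocked.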

The data scraped from the web is then appended to a dataframe.


Fig7: Appended dataframe
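
A minimal sketch of building such a dataframe with pandas, assuming the scraped rows are available as a list of dicts. The column names, the sample records, and the helper name to_dataframe are hypothetical; the actual columns in Fig7 may differ.

```python
import pandas as pd

# Hypothetical scraped records; the values are placeholders, not real data.
records = [
    {"title": "Paper A", "authors": "A Author, B Author", "year": "2015", "citations": "120"},
    {"title": "Paper B", "authors": "C Author", "year": "2015", "citations": "45"},
    {"title": "Paper C", "authors": "A Author", "year": "", "citations": ""},
]


def to_dataframe(records):
    """Build a dataframe and convert the text columns for year and citations to numbers."""
    df = pd.DataFrame(records, columns=["title", "authors", "year", "citations"])
    # Empty strings become missing values instead of raising an error.
    df["year"] = pd.to_numeric(df["year"], errors="coerce").astype("Int64")
    # A missing citation count is treated as zero here, which is a design choice.
    df["citations"] = pd.to_numeric(df["citations"], errors="coerce").fillna(0).astype(int)
    return df


df = to_dataframe(records)
```

Converting the year to a number at this stage keeps the later grouping and plotting steps in chronological order.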


Fig8: Grouping the data by year (this gives the number of papers published per year)


Fig9: Data grouped by year, with the number of papers
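
The grouping step can be sketched with pandas groupby. The small dataframe below is made up for illustration, and count_papers_per_year is a hypothetical helper name; the code in Fig8 may use a different call.

```python
import pandas as pd

# Made-up data: one row per paper with its publication year.
df = pd.DataFrame({
    "title": ["Paper A", "Paper B", "Paper C", "Paper D", "Paper E", "Paper F"],
    "year": [2014, 2015, 2015, 2016, 2016, 2016],
})


def count_papers_per_year(df):
    """Return a Series indexed by year holding the number of papers in that year."""
    # size() counts the rows in each group; groupby sorts the years by default.
    return df.groupby("year").size().rename("papers")


papers_per_year = count_papers_per_year(df)
```

Here papers_per_year holds 1 paper for 2014, 2 for 2015, and 3 for 2016.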


Fig10: Bar graph with the number of papers on the y axis and the year of publication on the x axis
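
A bar graph like this one could be drawn with matplotlib roughly as follows. The years and counts are made-up values, plot_papers_per_year is a hypothetical helper name, and the plotting library used for the original figure is not stated in the post.

```python
import matplotlib

# Render off-screen so the sketch also runs without a display;
# remove this line to use an interactive window with plt.show().
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def plot_papers_per_year(years, counts):
    """Draw a bar graph of paper counts per year and return the Axes."""
    fig, ax = plt.subplots()
    ax.bar(years, counts)
    ax.set_xlabel("Year of publication")
    ax.set_ylabel("Number of papers")
    ax.set_xticks(years)  # one tick per year, so no fractional years appear
    return ax


# Made-up values for illustration only.
ax = plot_papers_per_year([2014, 2015, 2016], [1, 2, 3])
```

The figure can then be written to disk with ax.figure.savefig("papers_per_year.png"), where the file name is just an example.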





M.Tech in Clinical Engineering, jointly offered by Indian Institute of Technology Madras, Christian Medical College Vellore, and Sree Chitra Tirunal Institute for Medical Sciences and Technology, Trivandrum.