Note: This post is for educational purposes only.
In the previous post, we learned how to scrape data from Wikipedia.
Table data can be scraped from Google Scholar in the same way, but there is one problem with that strategy.
In this post, let's scrape the publication data of Professor Dr Vijay Bhargava from Google Scholar.
When we scrape a table using only its class or id tag, we get just the rows that are currently visible on the page, not the hidden remainder of the table. The tags of interest are:
- Table id: gsc_a_tr
- td class: gsc_a_t (paper name, year published, number of citations)
- div class: gs_gray (author data)
Fig1: Table data (the rest of the data is hidden and is unlocked by clicking the “Show more” button)
Fig2: Dynamically hidden data after it has been unlocked
Hence we use the Selenium library to unlock the dynamically hidden data by clicking the “Show more” button the required number of times.
Fig3: “Show more” button
Fig4: Selenium code to unlock the dynamically hidden data
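Since the code above survives only as a screenshot, here is a minimal sketch of the clicking step. It is not the exact code from the figure. The button id `gsc_bpf_more`, the `profile_url` parameter, the pause length, and the idea that the button becomes disabled once every row is loaded are all assumptions about the page, so check them in your browser's inspector first.

```python
import time


def click_until_exhausted(button, max_clicks=20, pause=2.0):
    """Click `button` until it is disabled or hidden, or max_clicks is reached.

    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks and button.is_displayed() and button.is_enabled():
        button.click()
        clicks += 1
        time.sleep(pause)  # crude wait for the next batch of rows to load
    return clicks


def load_full_profile(profile_url, button_id="gsc_bpf_more"):
    """Open the profile page, expand the table, and return the final HTML.

    `button_id` is an assumed id for the "Show more" button.
    """
    # Imported here so the helper above can be tried without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver are installed
    try:
        driver.get(profile_url)
        button = driver.find_element(By.ID, button_id)
        click_until_exhausted(button)
        return driver.page_source
    finally:
        driver.quit()
```

Calling `load_full_profile` with the profile URL should return the HTML of the fully expanded page, which can then be handed to Beautiful Soup. The `max_clicks` cap is there so the loop cannot run forever if the button never becomes disabled.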
Fig5: Beautiful Soup code to grab data from the table (the full data, including the dynamically hidden rows)
Fig6: div class gs_gray holds the author names for each paper
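The parsing step might look roughly like the sketch below. It relies only on the `gsc_a_t` cell and the `gs_gray` div mentioned earlier. The order of the remaining cells (citations, then year) and the `SAMPLE_HTML` snippet with its placeholder rows are assumptions made for illustration, not real profile data.

```python
from bs4 import BeautifulSoup

# Made-up rows that imitate the assumed table layout.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="gsc_a_t"><a>Example paper A</a>
      <div class="gs_gray">A Author, B Author</div>
      <div class="gs_gray">Example Journal</div></td>
    <td>120</td>
    <td>2009</td>
  </tr>
  <tr>
    <td class="gsc_a_t"><a>Example paper B</a>
      <div class="gs_gray">C Author</div>
      <div class="gs_gray">Example Conference</div></td>
    <td></td>
    <td>2011</td>
  </tr>
</table>
"""


def parse_papers(html):
    """Return one dict per table row: title, authors, citations, year."""
    soup = BeautifulSoup(html, "html.parser")
    papers = []
    for title_cell in soup.find_all("td", class_="gsc_a_t"):
        row = title_cell.find_parent("tr")
        cells = row.find_all("td") if row else []
        link = title_cell.find("a")
        # The first gs_gray div in the cell is assumed to hold the authors.
        authors = title_cell.find("div", class_="gs_gray")
        papers.append({
            "title": link.get_text(strip=True) if link else "",
            "authors": authors.get_text(strip=True) if authors else "",
            # Assumed cell order: title, citations, year.
            "citations": cells[1].get_text(strip=True) if len(cells) > 1 else "",
            "year": cells[2].get_text(strip=True) if len(cells) > 2 else "",
        })
    return papers
```

On the real page you would pass in the HTML returned by Selenium instead of `SAMPLE_HTML`. All values come back as strings at this point, and a paper with no citations yields an empty string.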
The data scraped from the web is then appended to a dataframe.
Fig7: Appended dataframe
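One possible way to build that dataframe with pandas is sketched below. The column names and the helper `papers_to_frame` are hypothetical, and the input is assumed to be a list of dicts with string values, one per paper.

```python
import pandas as pd


def papers_to_frame(papers):
    """Turn a list of paper dicts into a dataframe with numeric columns."""
    df = pd.DataFrame(papers, columns=["title", "authors", "citations", "year"])
    for col in ("citations", "year"):
        # Scraped values are strings; blanks become NaN instead of raising.
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df
```

Converting `year` and `citations` to numbers at this stage makes the later grouping and plotting sort correctly, rather than sorting the years as text.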
Fig8: Grouping the data by year (this gives the number of papers published per year)
Fig9: Data grouped by year, with the number of papers
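The grouping itself can be a one-liner in pandas. The sketch below assumes the dataframe has a `year` column. The function name and the `papers` label are made up for illustration.

```python
import pandas as pd


def count_papers_per_year(df):
    """Return a Series indexed by year holding the number of papers."""
    # groupby drops rows whose year is missing (NaN) by default.
    return df.groupby("year").size().rename("papers")
```

For example, a dataframe with two rows for 2009 and one row for 2011 would give a count of 2 for 2009 and 1 for 2011.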
Fig10: Bar graph with the number of papers on the y axis and the year of publication on the x axis.
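A bar graph like this could be drawn with matplotlib as sketched below. The input is assumed to be a pandas Series of paper counts indexed by year. The function name, axis labels, and title are my own choices, not taken from the original figure.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_papers_per_year(papers_per_year):
    """Draw one bar per year and return the matplotlib Axes."""
    fig, ax = plt.subplots()
    # Years are converted to strings so each one gets its own evenly spaced bar.
    ax.bar(papers_per_year.index.astype(str), papers_per_year.values)
    ax.set_xlabel("Year of publication")
    ax.set_ylabel("Number of papers")
    ax.set_title("Papers published per year")
    return ax
```

After calling the function, `plt.show()` displays the figure, or `ax.figure.savefig(...)` writes it to a file.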