Extract links and other information related to the about page

extract_about_links(base_url, timeout_thres = 10)

Arguments

base_url

A base URL (the base part of the web address)

timeout_thres

A timeout threshold. The default value is 10 seconds.

Value

If successful, the function returns a dataframe of two columns ("href", "link"). If not successful, the function returns a dataframe of three columns ("href", "link_text", "link"). In this dataframe, href is NA and link_text reports one of the following five cases: "Found without tree search.", "This website is broken.", "The website is flat (no tree structure).", "PHP error", or "The website does not have about page."
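
Because the returned dataframe takes one of two shapes, calling code may want to check which one it received before using the links. The following is a minimal sketch, assuming only the column layout described above. The helper about_status() is hypothetical and not part of the package, and the two dataframes are hand-built stand-ins with made-up values rather than real output of extract_about_links().

```r
# Hypothetical helper, not part of the package: classify a result dataframe
# according to the two shapes described in the Value section.
about_status <- function(res) {
  # Unsuccessful shape: a link_text column is present and href is NA.
  if ("link_text" %in% names(res) && all(is.na(res$href))) {
    return(as.character(res$link_text[1]))
  }
  "ok"
}

# Hand-built stand-ins for the two documented shapes (values are made up).
ok_res <- data.frame(
  href = "/about",
  link = "https://example.org/about",
  stringsAsFactors = FALSE
)
fail_res <- data.frame(
  href = NA,
  link_text = "This website is broken.",
  link = "https://example.org",
  stringsAsFactors = FALSE
)

about_status(ok_res)    # successful shape
about_status(fail_res)  # returns the message stored in link_text
```

In real use, the argument would be the value returned by extract_about_links(base_url, timeout_thres = 10). Checking for the link_text column first avoids relying on the number of columns alone.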