Here's a relatively simple PHP function that checks whether a URL really leads to a valid page (as opposed to generating "404 Not Found" or some other kind of error). It uses the CURL library – if your server doesn't have it installed, see "Alternatives" at the end of this post. The function can be useful for finding broken links and for similar tasks.
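function page_exists($url){
    // Reject strings that parse_url() cannot handle at all.
    $parts = parse_url($url);
    if(!$parts) return false;

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    // Send a HEAD request and return only the response headers.
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    if(isset($parts['scheme']) && strtolower($parts['scheme'])=='https'){
        // Current libcurl accepts only 0 or 2 here; the value 1 is no longer supported.
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
        // Certificate verification is off, so self-signed or expired certificates do not fail the check.
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    }
    $response = curl_exec($ch);
    curl_close($ch);
    // curl_exec() returns false on connection errors and timeouts.
    if($response === false) return false;

    // Read the code from the first status line, e.g. "HTTP/1.1 200 OK" or "HTTP/2 200".
    if(preg_match('/HTTP\/\d+(?:\.\d+)?\s+(\d+)/', $response, $matches)){
        $code = intval($matches[1]);
    } else {
        return false;
    }
    return (($code>=200) && ($code<400));
}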
Notes on implementation
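I've used a somewhat liberal interpretation of "exists" here – this function will return TRUE even when the URL redirects to a different page. Because the status code is read from the first response in the output, a redirect counts as success no matter where it ends up. I think that this is generally a good idea.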
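If you'd rather judge the page that a redirect finally lands on, the header dump already contains what you need: with CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION both set, the output holds one header block per response in the chain. Here is a minimal sketch of pulling out every status code. The helper name status_codes_from_headers() and the sample header string are made up for illustration and are not part of the function above.

```php
<?php
// Hypothetical helper (not part of page_exists()): collect every status
// code found in a header dump that may span several responses.
function status_codes_from_headers(string $headers): array {
    // Match status lines such as "HTTP/1.1 302 Found" or "HTTP/2 200".
    // The "m" flag lets ^ match at the start of each line.
    preg_match_all('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/m', $headers, $matches);
    return array_map('intval', $matches[1]);
}

// Made-up sample: a redirect followed by the response of its target.
$sample = "HTTP/1.1 302 Found\r\nLocation: /missing\r\n\r\n"
        . "HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n";

$codes = status_codes_from_headers($sample);
$first = $codes[0];                 // the code page_exists() looks at
$last  = $codes[count($codes) - 1]; // the code of the page finally reached
```

In this sample the first code is in the success range while the last one is not, so a stricter check would test $last instead of $first.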
Another thing to note is that this function expects a fully qualified and well-formed URL. Checking whether a random string represents a syntactically valid URL is not its purpose, and doing so here would be both inefficient and error-prone.
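That said, the calling code can still do a cheap sanity check before it spends a network round trip. The sketch below is not a full URL validator and is not part of page_exists(); looks_like_http_url() is a hypothetical name, and the rule it applies (absolute URL, http or https scheme, host present) is my assumption about what "fully qualified" should mean for this purpose.

```php
<?php
// Hypothetical pre-check: accept only absolute http/https URLs
// that contain a host component.
function looks_like_http_url(string $url): bool {
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    return in_array(strtolower($parts['scheme']), ['http', 'https'], true);
}

$absolute = looks_like_http_url('http://example.com/page'); // has scheme and host
$relative = looks_like_http_url('/relative/path');          // no scheme, no host
$otherScheme = looks_like_http_url('ftp://example.com/file'); // scheme is not http(s)
```

A caller could skip page_exists() entirely whenever this check fails.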
If you're familiar with CURL you might know about the CURLOPT_FAILONERROR option, which is supposed to make curl_exec() treat a non-existent page as an error. With this option set, it seems that page_exists() could be simplified to just checking whether $response equals FALSE (indicating an error). Well, that doesn't really work, at least not as expected. In my tests CURLOPT_FAILONERROR made curl_exec() fail when the returned HTTP status code was 302 – a form of temporary redirect. Needless to say, the URL in question worked fine in my browser, so I decided to blame CURL and revise the function to explicitly check the status code, treating all codes in the 2XX – 3XX range as success.
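Another way to check the status code explicitly, without parsing headers, is to ask CURL for it through curl_getinfo(). The following is only a sketch of that variant: is_success_code() and page_exists_via_getinfo() are hypothetical names, and the behaviour differs from the function above in one respect. When redirects are followed, CURLINFO_HTTP_CODE reports the last response received, so a redirect that ends in a 404 counts as a failure here.

```php
<?php
// Hypothetical helper: the same 2XX-3XX rule that page_exists() applies.
function is_success_code(int $code): bool {
    return $code >= 200 && $code < 400;
}

// Sketch of a variant that reads the status code from cURL itself.
function page_exists_via_getinfo(string $url): bool {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo output
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    $ok = curl_exec($ch);
    // Last received status code; 0 if no response arrived at all.
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $ok !== false && is_success_code($code);
}
```

Usage would look like page_exists_via_getinfo('http://example.com/'), which needs network access and the CURL extension, so the result depends on the server you point it at.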
Alternatives
If you can't or don't want to use CURL there are other ways to see if a page exists.
- fopen() – try opening the URL as a file and hope the fopen() URL wrapper is enabled. You can find lots of similar examples on Google.
$handle = @fopen($url,'r');
if($handle !== false){
    // Close the stream again; we only wanted to know whether it opens.
    fclose($handle);
    echo 'Page Exists';
} else {
    echo 'Page Not Found';
}
- fsockopen() – use sockets to connect to the target host, build the HTTP request by hand and analyze the server's response. See some page-checking examples in the comments for the fsockopen() function on php.net. IMHO this method is a bit of overkill – it's complex and may lead to strange bugs if you don't know exactly what you're doing.
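To give an idea of what "by hand" means for the fsockopen() route, here is a rough sketch. All three function names are hypothetical, and the code makes several simplifying assumptions: plain HTTP only (no https), a well-formed URL with a host, no redirect handling, and a Host header without a port number.

```php
<?php
// Hypothetical helper: build a minimal HTTP/1.1 HEAD request for a URL.
function build_head_request(string $url): string {
    $parts = parse_url($url);
    $path  = $parts['path'] ?? '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }
    return "HEAD {$path} HTTP/1.1\r\n"
         . "Host: {$parts['host']}\r\n"
         . "Connection: close\r\n\r\n";
}

// Hypothetical helper: extract the status code from a status line,
// or return null if the line does not look like one.
function parse_status_line(string $line): ?int {
    if (preg_match('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})/', $line, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Sketch of the socket-based check; assumes plain HTTP and a valid host.
function page_exists_via_socket(string $url): bool {
    $parts = parse_url($url);
    $fp = @fsockopen($parts['host'], $parts['port'] ?? 80, $errno, $errstr, 15);
    if ($fp === false) {
        return false; // could not connect
    }
    fwrite($fp, build_head_request($url));
    $statusLine = fgets($fp); // first line of the response
    fclose($fp);
    $code = ($statusLine === false) ? null : parse_status_line($statusLine);
    return $code !== null && $code >= 200 && $code < 400;
}
```

Even this stripped-down version has to deal with request formatting, connection errors and response parsing, which is a fair illustration of why the CURL version is easier to live with.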