Help: "received HTTP 302 while downloading"

AWStats logfile analyzer 7.4 Documentation
Unique Visitor:
A unique visitor is a person or computer (host) that has made at least 1 hit
on 1 page of your web site during the current period shown by the report.
If this user makes several visits during this period, it is counted
only once. Visitors are tracked by IP address, so if multiple users are
accessing your site from the same IP (such as a home or office
network), they will be counted as a single unique visitor.
The period shown by AWStats reports is by default the current month.
However, if you use AWStats as a CGI you can click on the "year" link to
get a report for the whole year.
In such a report, the period is a full year, so Unique Visitors is the number
of hosts that have made at least 1 hit
on 1 page of your web site during the year.
Visits:
Number of visits made by all visitors.
Think "session" here: say a unique IP accesses a page, and then
requests three other pages within an hour. All of these "pages" are
included in the same visit, so you should expect
multiple pages per visit and multiple visits per unique visitor
(assuming that some of the unique IPs are
logged with more than an hour between requests).
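The session idea above can be sketched in a few lines of PHP. This is only an illustration, not AWStats' own code: the function name `count_visits`, the `(ip, timestamp)` input shape and the one-hour timeout are assumptions taken from the example in the text.

```php
<?php
// Count unique visitors and visits from (ip, unix timestamp) page hits.
// Assumption: a new visit starts when the same IP has been silent for more
// than $timeout seconds (one hour here, mirroring the example above).
function count_visits(array $hits, int $timeout = 3600): array
{
    // Sort hits chronologically so gaps are measured between neighbours.
    usort($hits, function ($a, $b) { return $a[1] <=> $b[1]; });

    $lastSeen = array(); // ip => timestamp of that IP's previous hit
    $visits = 0;
    foreach ($hits as $hit) {
        list($ip, $time) = $hit;
        if (!isset($lastSeen[$ip]) || $time - $lastSeen[$ip] > $timeout) {
            $visits++; // first hit from this IP, or the gap exceeded the timeout
        }
        $lastSeen[$ip] = $time;
    }
    return array('unique_visitors' => count($lastSeen), 'visits' => $visits);
}

// Made-up sample data: two hosts, one of which comes back after a long pause.
$hits = array(
    array('10.0.0.1', 0),
    array('10.0.0.1', 600),
    array('10.0.0.1', 1200),
    array('10.0.0.2', 900),
    array('10.0.0.1', 9000),
);
$stats = count_visits($hits);
echo $stats['unique_visitors'] . " unique visitors, " . $stats['visits'] . " visits\n";
```

With this sample, the first host produces two visits (its last hit comes more than an hour after the previous one) and the second host produces one.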
Pages:
The number of "pages" viewed by visitors. Pages are usually HTML, PHP
or ASP files, not images or other files requested as a result
of loading a "Page" (like js, css... files). Files listed in the
NotPageList config
parameter (and that match an entry of the OnlyFiles config parameter, if used)
are not counted as "Pages".
Hits:
Any files requested from the server (including files that are "Pages")
except those that match
the SkipFiles config parameter.
Bandwidth:
Total number of bytes for pages, images and files downloaded by web
browsing.
Note 1: This number includes only web traffic (or only mail, or only ftp,
depending on the value of LogType).
Note 2: This number does not include the technical header data used
inside the HTTP or HTTPS protocol or by
protocols at a lower level (TCP, IP...).
Because of these two notes, this number is often lower than the bandwidth
reported by your provider (in most cases your
provider counts bandwidth at a lower level and includes
all IP and UDP traffic).
Entry Page:
First page viewed by a visitor during a visit.
Note: When a visit starts at the end of a month and ends at the beginning of
the next month, you might have an Entry page for the month report and no Exit page.
That is why Entry page counts can differ from Exit page counts.
Exit Page:
Last page viewed by a visitor during a visit.
Note: When a visit starts at the end of a month and ends at the beginning of
the next month, you might have an Entry page for the month report and no Exit page.
That is why Entry page counts can differ from Exit page counts.
Session Duration:
The time a visitor spent on your site for each visit.
Some visit durations are 'unknown' because they can't always be
calculated. The main reasons are:
- The visit was not finished when the 'update' occurred.
- The visit started in the last hour (after 23:00) of the last day of a month
(a technical reason prevents AWStats from
calculating the duration of such sessions).
Grabber:
A browser that is used primarily for copying an entire site locally.
These include,
for example, "teleport", "webcapture", "webcopier"...
Direct access / Bookmark:
This number represents the number of hits, or ratio of hits, for visits
to your site that come
from direct access. This means the first page of your web site was called:
- By typing your URL in the web browser address bar
- By clicking on your URL stored by a visitor in their favorites
- By clicking on your URL found anywhere other than another web
page (a link in a document,
an application, etc...)
- Clicking a URL of your site inside a mail is often counted here.
Add To Favourites:
This value, available in the "miscellaneous chart", reports an estimated
indicator that can be used to get an idea of the number of times a visitor has
added your web
site to their favourite bookmarks.
The technical rule is the following formula:
Number of Add to Favourites = round((x+y) / r)
where
x = Number of hits made by IE browsers for "/anydir/favicon.ico", with
a referer field not defined, and with no 404 error code
y = Number of hits made by IE browsers for "/favicon.ico", with a
referer field not defined, with or without 404 error code
r = Ratio of hits made by IE browsers compared to hits made by all
browsers (r <= 1)
As you can see in the formula, only IE is used to count reliable "adds"; the
"Add to favourites" counts
for other browsers are estimated using the ratio of other browsers' usage
compared to
IE usage. The reason is that only IE hits favicon.ico nearly
ONLY when a user adds the
page to their favourites. The other browsers often hit this file
for other reasons too,
so we can't count one "hit" as one "add" since it might be a hit for
another reason.
AWStats also distinguishes hits with and without an error, to avoid counting
multiple hits
made recursively in upper paths when the favicon.ico file is not found in a
deeper directory.
Note that this number is just an indicator that is in most cases higher
than the true value.
The reason is that even the IE browser sometimes hits favicon
without an "Add to favourites"
action by a user.
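As a worked example of the formula above, here is a small PHP sketch. The function name `estimate_add_to_favourites` and its parameters are hypothetical; r is derived from two hit counts, which is one possible reading of "ratio of hits made by IE browsers compared to hits made by all browsers".

```php
<?php
// Estimate "Add to Favourites" as round((x + y) / r), following the formula above.
// $x: IE hits on "/anydir/favicon.ico" with no referer and no 404
// $y: IE hits on "/favicon.ico" with no referer (with or without 404)
// $ieHits / $allHits gives r, the share of hits made by IE (0 < r <= 1)
function estimate_add_to_favourites(int $x, int $y, int $ieHits, int $allHits): int
{
    if ($ieHits <= 0 || $allHits <= 0) {
        return 0; // no IE traffic: nothing to extrapolate from
    }
    $r = min(1.0, $ieHits / $allHits);
    return (int) round(($x + $y) / $r);
}

// Made-up numbers: 12 qualifying IE hits, IE makes 40% of all hits.
echo estimate_add_to_favourites(3, 9, 4000, 10000), "\n";
```

With these made-up inputs the 12 IE hits are scaled up by 1/0.4, which gives an estimate of 30 across all browsers.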
HTTP Status Codes:
HTTP status codes are returned by web servers to indicate the status of
a request.
Codes 200 and 304 are used to
tell the browser the page can be viewed.
206 codes indicate partial
downloading of content and are reported in the Downloads section. All
other codes generate hits and traffic 'not seen' by the visitor.
For example, a return
code 301 or 302 tells the browser to ask for another page. The browser
will make another hit
and should finally receive the page with a return code 200.
All codes that are 'unseen' traffic are isolated by AWStats in the HTTP
Status report chart,
enabled by a directive
in the config file. You can also change, with another directive, which codes
count as 'not error' hits (200 and 304 by default).
The following table outlines the status codes defined for the HTTP/1.1
draft specification.
They are 3-digit codes where the first digit identifies
the class of the status
code and the remaining 2 digits correspond to the specific condition
within the response class.
They are classified in 5 categories:

1xx class - Informational
Informational status codes are provisional responses from the web server;
they give the client a heads-up on what the server is doing. Informational
codes do not indicate an error condition.
  100 Continue
      The continue status code tells the browser to continue sending a
      request to the server.
  101 Switching Protocols
      The server sends this response when the client asks to switch from
      HTTP/1.0 to HTTP/1.1.

2xx class - Successful
This class of status code indicates that the client's request was received,
understood, and successful.
  200 Successful
  201 Created
  202 Accepted
  203 Non-Authoritative Information
  204 No Content
  205 Reset Content
  206 Partial Content
      The partial content success code is issued when the server fulfills a
      partial GET request. This happens when the client is downloading a
      multi-part document or part of a larger file.

3xx class - Redirection
This code tells the client that the browser should be redirected to another
URL in order to complete the request. This is not an error condition.
  300 Multiple Choices
  301 Moved Permanently
  302 Moved Temporarily
  303 See Other
  304 Not Modified
  305 Use Proxy

4xx class - Client Error
This status code indicates that the client has sent bad data or a malformed
request to the server. Client errors are generally issued by the webserver
when a client tries to gain access to a protected area using a bad username
and password.
  400 Bad Request
  401 Unauthorized
  402 Payment Required
  403 Forbidden
  404 Not Found
  405 Method Not Allowed
  406 Not Acceptable
  407 Proxy Authentication Required
  408 Request Timeout
  409 Conflict
  411 Length Required
  412 Precondition Failed
  413 Request Entity Too Large
  414 Request-URI Too Long
  415 Unsupported Media Type

5xx class - Server Error
This status code indicates that the client's request couldn't be successfully
processed due to some internal error in the web server. These error codes may
indicate something is wrong with the web server.
  500 Internal Server Error
      An internal server error has caused the server to abort your request.
      This is an error condition that may indicate a misconfiguration of the
      web server. However, the most common reason for 500 server errors is
      trying to execute a script that has syntax errors.
  501 Not Implemented
      This code is generated by a webserver when the client requests a
      service that is not implemented on the server. Typically, not
      implemented codes are returned when a client attempts to POST data to
      a non-CGI (i.e., the form action tag refers to a non-executable file).
  502 Bad Gateway
      The server, when acting as a proxy, issues this response when it
      receives a bad response from an upstream or support server.
  503 Service Unavailable
      The web server is too busy processing current requests to listen to a
      new client. This error represents a problem with the webserver
      (normally solved with a reboot).
  504 Gateway Timeout
      Gateway timeouts are normally issued by proxy servers when an upstream
      or support server doesn't respond to a request in a timely fashion.
  505 HTTP Version Not Supported
      The server issues this status code when a client tries to talk using
      an HTTP protocol that the server doesn't support or is configured to
      ignore.
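Because the class of an HTTP status code follows from its first digit, the table above can be condensed into a short lookup. The sketch below is illustrative only: the function names are hypothetical, and the default list of "seen" codes (200 and 304) is simply the default mentioned in the text, not a reading of AWStats internals.

```php
<?php
// Map an HTTP status code to its class using the first digit.
function http_status_class(int $code): string
{
    $classes = array(
        1 => 'Informational',
        2 => 'Successful',
        3 => 'Redirection',
        4 => 'Client Error',
        5 => 'Server Error',
    );
    $firstDigit = intdiv($code, 100);
    return isset($classes[$firstDigit]) ? $classes[$firstDigit] : 'Unknown';
}

// Codes treated as traffic "seen" by the visitor; 200 and 304 are the
// defaults named in the text, and the list is adjustable.
function is_seen_by_visitor(int $code, array $validCodes = array(200, 304)): bool
{
    return in_array($code, $validCodes, true);
}

foreach (array(200, 302, 304, 404, 503) as $code) {
    printf("%d %s %s\n", $code, http_status_class($code),
        is_seen_by_visitor($code) ? 'seen' : 'unseen');
}
```

Note that a 302 is classed as a redirection rather than an error, yet it still counts as 'unseen' traffic because the visitor only sees the page fetched by the follow-up request.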
SMTP Status Codes:
SMTP status codes are returned by mail servers to indicate the status
of sending or receiving a mail.
The status code depends on the mail server and the preprocessor used to analyze
the log file.
All codes that are failure codes are isolated by AWStats in the SMTP
Status report chart,
enabled by a directive
in the AWStats
config file. You can decide, with another directive, which codes are successful
mail transfers that should not appear
in this chart.
Here are the values reported by most mail servers (these should also be the
values when the mail log file
is preprocessed with maillogconvert.pl).
SMTP Errors are classified in 3 categories:

2xx/3xx class - Success
These are successful SMTP protocol answers.
  200 Non standard success response
  211 System status, or system help reply
  214 Help message
  220 Service ready
  221 Service closing transmission channel
  250 Requested mail action taken and completed
      The ISP mail server has successfully executed a command and the DNS is
      reporting a positive delivery.
  251 User not local: will forward to <forward-path>
      The message to a specified email address is not local to the mail
      server, but it will accept and forward the message to a different
      recipient email address.
  252 Recipient cannot be verified
      The recipient cannot be verified but the mail server accepts the
      message and attempts delivery.
  354 Start mail input and end with <CRLF>.<CRLF>
      The mail server is ready to accept the message, or instructs your mail
      client to send the message body after the mail server has received the
      message headers.

4xx class - Temporary Errors
These codes are temporary error messages. They tell the sending client that
an error occurred but that it can try to solve it by trying again, so in
most cases, clients that receive such codes will keep the mail in their queue
and will try again later.
  421 Service not available, closing transmission channel
      This may be a reply to any command if the service knows it must shut
      down.
  450 Requested mail action not taken: mailbox busy or access denied
      The ISP mail server indicates that an email address does not exist or
      the mailbox is busy. It could be that the network connection went down
      while sending, or it could also happen if the remote mail server does
      not want to accept mail from you for some reason (IP address, From
      address, Recipient, etc.).
  451 Requested mail action aborted: error in processing
      The ISP mail server indicates that the mailing has been interrupted,
      usually due to overloading from too many messages or to a transient
      failure, i.e. one in which the message sent is valid but some temporary
      event prevents the successful sending of the message. Sending in the
      future may be successful.
  452 Requested mail action not taken: insufficient system storage
      The ISP mail server indicates probable overloading from too many
      messages; sending in the future may be successful.
  453 Too many messages
      Some mail servers have the option to reduce the number of concurrent
      connections and also the number of messages sent per connection. If you
      have a lot of messages queued up, it could go over the maximum number
      of messages per connection. To see if this is the case you can try
      submitting only a few messages to that domain at a time and then keep
      increasing the number until you find the maximum number accepted by
      the server.

5xx class - Permanent Errors
These are permanent error codes. The mail transfer has definitely failed. No
other attempt will be made.
  500 Syntax error, command unrecognized or command line too long
  501 Syntax error in parameters or arguments
  502 Command not implemented
  503 Server encountered bad sequence of commands
  504 Command parameter not implemented
  521 <domain> does not accept mail or closing transmission channel
      You must be pop-authenticated before you can use this SMTP server and
      you must use your mail address for the Sender/From field.
  530 Access denied
      A sendmailism?
  550 Requested mail action not taken (Relaying not allowed, Unknown
      recipient user, ...)
      Sending an email to recipients outside of your domain is not allowed,
      or your mail server does not know that you have access to use it for
      relaying messages and authentication is required. Or, to prevent the
      sending of SPAM, some mail servers will not relay mail to any e-mail
      address using another company's network and computer resources.
  551 User not local: please try <forward-path>, or Invalid Address: Relay
      request denied
  552 Requested mail action aborted: exceeded storage allocation
      The ISP mail server indicates probable overloading from too many
      messages.
  553 Requested mail action not taken: mailbox name not allowed
      Some mail servers have the option to reduce the number of concurrent
      connections and also the number of messages sent per connection. If you
      have a lot of messages queued up (being sent) for a domain, it could go
      over the maximum number of messages per connection, and/or some change
      to the message and/or destination must be made for successful delivery.
  554 Requested mail action rejected: access denied
  557 Too many duplicate messages
      Resource temporarily unavailable. Indicates (probably) that there is
      some kind of anti-spam system on the mail server.
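The three SMTP classes above translate directly into what a sending client does next. The following PHP sketch shows that decision; `smtp_action` and its return strings are hypothetical names chosen for illustration.

```php
<?php
// Decide what a sending client does with an SMTP reply code,
// following the three classes described above.
function smtp_action(int $code): string
{
    switch (intdiv($code, 100)) {
        case 2:
        case 3:
            return 'success'; // 2xx/3xx: command accepted
        case 4:
            return 'retry';   // 4xx: temporary error, keep the mail queued
        case 5:
            return 'fail';    // 5xx: permanent error, no further attempt
        default:
            return 'unknown';
    }
}

foreach (array(250, 354, 451, 550) as $code) {
    echo $code, ' => ', smtp_action($code), "\n";
}
```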
I'm teaching myself some basic scraping and I've found that sometimes the URLs that I feed into my code return 404, which gums up all the rest of my code.
So I need a test at the top of the code to check if the URL returns 404 or not.
This would seem like a pretty straightforward task, but Google's not giving me any answers.
I worry I'm searching for the wrong stuff.
One blog recommended I use this:
$valid = @fsockopen($url, 80, $errno, $errstr, 30);
and then test to see if $valid is empty or not.
But I think the URL that's giving me problems has a redirect on it, so $valid is coming up empty for all values.
Or perhaps I'm doing something else wrong.
I've also looked into a "head request" but I've yet to find any actual code examples I can play with or try out.
Suggestions?
And what's this about curl?
If you are using PHP's curl bindings, you can check the error code using curl_getinfo:
$handle = curl_init($url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);

/* Get the HTML or whatever is linked in $url. */
$response = curl_exec($handle);

/* Check for 404 (file not found). */
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
if($httpCode == 404) {
    /* Handle 404 here. */
}

curl_close($handle);

/* Handle $response here. */
If you're running PHP 5 you can use:
$url = '';
print_r(get_headers($url, 1));
Alternatively, with PHP 4, a user has contributed the following:
This is a modified version of code from "stuart at sixletterwords dot com", at 14-Sep-. This version tries to emulate the get_headers() function in PHP 4. I think it works fairly well, and it is simple. It is not the best emulation available, but it works.
Features:
- supports (and requires) full URLs.
- supports changing the default port in the URL.
- stops downloading from the socket as soon as end-of-headers is detected.
Limitations:
- only gets the root URL (see the line with "GET / HTTP/1.1").
- doesn't support HTTPS (nor the default HTTPS port).
if(!function_exists('get_headers'))
{
    function get_headers($url,$format=0)
    {
        $url=parse_url($url);
        $end = "\r\n\r\n";
        $fp = fsockopen($url['host'], (empty($url['port'])?80:$url['port']), $errno, $errstr, 30);
        if ($fp)
        {
            $out  = "GET / HTTP/1.1\r\n";
            $out .= "Host: ".$url['host']."\r\n";
            $out .= "Connection: Close\r\n\r\n";
            $var  = '';
            fwrite($fp, $out);
            while (!feof($fp))
            {
                $var.=fgets($fp, 1280);
                if(strpos($var,$end))
                    break;
            }
            fclose($fp);
            // Keep only the header block, one header per array entry.
            $var=preg_replace("/\r\n\r\n.*\$/",'',$var);
            $var=explode("\r\n",$var);
            if($format)
            {
                $v = array();
                foreach($var as $i)
                {
                    if(preg_match('/^([a-zA-Z -]+): +(.*)$/',$i,$parts))
                        $v[$parts[1]]=$parts[2];
                }
                return $v;
            }
            return $var;
        }
        return false;
    }
}
Both would have a result similar to:
[0] => HTTP/1.1 200 OK
[Date] => Sat, 29 May :14 GMT
[Server] => Apache/1.3.27 (Unix)
(Red-Hat/Linux)
[Last-Modified] => Wed, 08 Jan :55 GMT
[ETag] => "3f80f-1b6-3e1cb03b"
[Accept-Ranges] => bytes
[Content-Length] => 438
[Connection] => close
[Content-Type] => text/html
Therefore you could just check that the header response was OK, e.g.:
$headers = get_headers($url, 1);
if ($headers[0] == 'HTTP/1.1 200 OK') {
    // valid
}
if ($headers[0] == 'HTTP/1.1 301 Moved Permanently') {
    // moved or redirect page
}
With strager's code, you can also check the CURLINFO_HTTP_CODE for other codes. Some websites do not report a 404, rather they simply redirect to a custom 404 page and return 302 (redirect) or something similar. I used this to check if an actual file (eg. robots.txt) existed on the server or not. Clearly this kind of file would not cause a redirect if it existed, but if it didn't it would redirect to a 404 page, which as I said before may not have a 404 code.
function is_404($url) {
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);

    /* Get the HTML or whatever is linked in $url. */
    $response = curl_exec($handle);

    /* Check for 404 (file not found). */
    $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
    curl_close($handle);

    /* If the document has loaded successfully without any redirection or error */
    if ($httpCode >= 200 && $httpCode < 300) {
        return false;
    } else {
        return true;
    }
}
As strager suggests, look into using cURL. You may also be interested in setting CURLOPT_NOBODY with
curl_setopt to skip downloading the whole page (you just want the headers).
If you are looking for the easiest solution, one you can try in one go on PHP 5, do:
file_get_contents('');
//and check by echoing
echo $http_response_header[0];
As an additional hint to the great accepted answer:
When using a variation of the proposed solution, I got errors because of php setting 'max_execution_time'. So what I did was the following:
set_time_limit(120);
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_NOBODY, true);
$result = curl_exec($curl);
set_time_limit(ini_get('max_execution_time'));
curl_close($curl);
First I set the time limit to a higher number of seconds, in the end I set it back to the value defined in the php settings.
I found this answer:
if (($twitter_XML_raw = file_get_contents($timeline)) == false) {
    // Retrieve HTTP status code
    list($version, $status_code, $msg) = explode(' ', $http_response_header[0], 3);
    // Check the HTTP Status code
    switch ($status_code) {
        case 200:
            $error_status = "200: Success";
            break;
        case 401:
            $error_status = "401: Login failure. Try logging out and back in. Password are ONLY used when posting.";
            break;
        case 400:
            $error_status = "400: Invalid request. You may have exceeded your rate limit.";
            break;
        case 404:
            $error_status = "404: Not found. This shouldn't happen. Please let me know what happened using the feedback link above.";
            break;
        case 500:
            $error_status = "500: Twitter servers replied with an error. Hopefully they'll be OK soon!";
            break;
        case 502:
            $error_status = "502: Twitter servers may be down or being upgraded. Hopefully they'll be OK soon!";
            break;
        case 503:
            $error_status = "503: Twitter service unavailable. Hopefully they'll be OK soon!";
            break;
        default:
            $error_status = "Undocumented error: " . $status_code;
            break;
    }
}
Essentially, you use file_get_contents() to retrieve the URL, which automatically populates the $http_response_header variable with the response headers, including the status line.
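One caveat when reading only `$http_response_header[0]`: as far as I know, when a redirect such as a 302 is followed, the header list contains one status line per response, so index 0 holds the redirect and not the final answer. The sketch below (the helper name `final_status_code` and the sample headers are made up for illustration) picks the last status line instead; the same function should also work on the flat array returned by `get_headers($url)`.

```php
<?php
// Return the status code of the last "HTTP/..." status line in a header list,
// or null if none is found. With redirects followed, the list holds one
// status line per response, so the last one is the final answer.
function final_status_code(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Sample header list (made up for illustration): a 302 followed by a 200.
$sample = array(
    'HTTP/1.1 302 Found',
    'Location: /new-page',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
);
var_dump(final_status_code($sample));
```

In real use you would pass `$http_response_header` right after the `file_get_contents()` call and compare the result against 404, 302 and so on.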
I tested those 3 methods for performance.
The result, at least in my testing environment:
This test assumes that only the headers (noBody) are needed.
Test it yourself:
$url = "http://de.wikipedia.org/wiki/Pinocchio";

$start_time = microtime(TRUE);
$headers = get_headers($url);
echo $headers[0]."<br>";
$end_time = microtime(TRUE);
echo $end_time - $start_time."<br>";

$start_time = microtime(TRUE);
$response = file_get_contents($url);
echo $http_response_header[0]."<br>";
$end_time = microtime(TRUE);
echo $end_time - $start_time."<br>";

$start_time = microtime(TRUE);
$handle = curl_init($url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($handle, CURLOPT_NOBODY, 1); // and *only* get the header
/* Get the HTML or whatever is linked in $url. */
$response = curl_exec($handle);
/* Check for 404 (file not found). */
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
// if($httpCode == 404) {
//     /* Handle 404 here. */
// }
echo $httpCode."<br>";
curl_close($handle);
$end_time = microtime(TRUE);
echo $end_time - $start_time."<br>";
This is just a slice of code;
I hope it works for you:
$ch = @curl_init();
@curl_setopt($ch, CURLOPT_URL, '');
@curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (W U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/ Firefox/2.0.0.1");
@curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
@curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
@curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$page = @curl_exec($ch);
$errno = @curl_errno($ch);
$errmsg = @curl_error($ch);
$response = $page;
$info = @curl_getinfo($ch);
return $info['http_code'];
You can use this code too, to see the status of any link:
function get_url_status($url, $timeout = 10)
{
    $ch = curl_init();
    // set cURL options
    $opts = array(CURLOPT_RETURNTRANSFER => true, // do not output to browser
                  CURLOPT_URL => $url,            // set URL
                  CURLOPT_NOBODY => true,         // do a HEAD request only
                  CURLOPT_TIMEOUT => $timeout);   // set timeout
    curl_setopt_array($ch, $opts);
    curl_exec($ch); // do it!
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // find HTTP status
    curl_close($ch); // close handle
    echo $status; // or return $status;
    // example checking
    if ($status == '302') { echo 'HEY, redirection'; }
}
get_url_status('m');
To catch all errors (4XX and 5XX), I use this little script:
function URLIsValid($URL){
    $headers = @get_headers($URL);
    if ($headers === false) {
        return false; // the request failed outright
    }
    preg_match("/ [45][0-9]{2} /", (string)$headers[0] , $match);
    return count($match) === 0;
}
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (W U; Windows NT 6.0; en-US; rv:1.9.0.3) Gecko/ Firefox/3.0.4");
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_TIMEOUT,10);
curl_setopt($ch, CURLOPT_ENCODING, "gzip");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$output = curl_exec($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
$handle = curl_init($uri);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($handle, CURLOPT_HTTPHEADER, array("Accept: application/rdf+xml"));
curl_setopt($handle, CURLOPT_NOBODY, true);
curl_exec($handle);
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
if($httpCode == 200||$httpCode == 303) {
    echo "you might get a reply";
}
curl_close($handle);
In your case you can change application/rdf+xml to whatever you use.