There are a few ways to determine whether an object should be private or public. One is the request method. Only URLs requested with the ``GET'' method can be public. Another way is by examining the URL string. URLs which match one of the stoplist entries will always be private objects. Usually this includes ``cgi-bin'' scripts. A third way is by checking the HTTP request and reply headers. For example, if the request includes user authentication information, then the object should never be made public. Additionally, some HTTP replies such as ``401 Unauthorized'' should also never be made public.
For these reasons, Squid starts all objects out as private and changes them to public only after the HTTP reply headers have been read.
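The checks described above can be sketched roughly as follows. This is only an illustration, not Squid's actual code: the function name `may_be_public`, the stoplist as a list of substrings, and the header dictionary are assumptions made for the example, and only the ``401 Unauthorized'' reply is checked.

```python
def may_be_public(method, url, request_headers, reply_status, stoplist):
    """Return True if an object could be changed from private to public.

    Hypothetical helper: the name, the arguments, and the substring-based
    stoplist matching are assumptions for illustration only.
    """
    # Only objects requested with GET can become public.
    if method.upper() != "GET":
        return False
    # URLs matching a stoplist entry (e.g. "cgi-bin") always stay private.
    if any(entry in url for entry in stoplist):
        return False
    # Requests carrying user authentication must never become public.
    if any(name.lower() == "authorization" for name in request_headers):
        return False
    # Some replies, such as 401 Unauthorized, must never become public.
    if reply_status == 401:
        return False
    return True


stoplist = ["cgi-bin"]
print(may_be_public("GET", "http://www.foo.org/index.html", {}, 200, stoplist))
print(may_be_public("GET", "http://www.foo.org/cgi-bin/test", {}, 200, stoplist))
```

Because the reply status is one of the inputs, the decision cannot be made until the reply headers have arrived, which is why every object has to start out private.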
Unfortunately, this causes some problems with the UDP-based Internet Cache Protocol (ICP) used to query neighboring caches. Specifically, an ICP reply packet contains only the object URL, which is not sufficient to locate private objects in the cache metadata. To get the additional information needed to locate private objects, we decided to use the ``reqnum'' field of the ICP packet. This is an acceptable solution, except that as implemented in cached-1.4.pl3 and earlier, all ICP replies have the reqnum field reset to zero!
Squid will make use of private objects until it notices that one of its neighbors is sending ICP replies with the reqnum field set to zero. It will then only use private keys for objects which are not going to be queried for via ICP. These include objects in the stoplist and If-Modified-Since requests.
The exception to the above is the situation where Squid is located behind a firewall, and must use a parent for all external fetches. This situation is declared by the "behind_firewall on" configuration file setting.
See HTTP-codes.txt for a list of HTTP response codes, and how they are cached.
The HTTP codes are now logged to "access.log" in the native format (i.e., with 'emulate_httpd_log off').
The format is now:
timestamp elapsed src-address type/code size method URL
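As an illustration of the layout above, here is a minimal sketch that splits one native-format line into named fields. The function name, the `LogEntry` tuple, and the sample line are made up for the example. All values are kept as strings because the notes do not specify units or numeric formats.

```python
from collections import namedtuple

# Field names follow the format line above; "type/code" is split in two.
LogEntry = namedtuple(
    "LogEntry", "timestamp elapsed src_address type code size method url"
)


def parse_native_log_line(line):
    """Split one native-format access.log line into a LogEntry (hypothetical helper)."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    timestamp, elapsed, src_address, type_code, size, method, url = fields
    # The fourth field combines the log type and the HTTP code, e.g. TYPE/200.
    log_type, _, code = type_code.partition("/")
    return LogEntry(timestamp, elapsed, src_address, log_type, code, size, method, url)


# Illustrative line only; the values are not taken from a real log.
sample = "834567890.123 250 128.138.1.2 TCP_MISS/200 4321 GET http://www.foo.org/"
entry = parse_native_log_line(sample)
print(entry.code, entry.method, entry.url)
```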
The -U option can be used to actually remove the invalid objects from disk.
In addition, the -z option will not cause 'rm -rf [0-9][0-9]' to be executed unless the -U option is also given.
When swap files are not removed during restart, the internal counters for disk space taken will not match the actual disk space used. If you have a large cache or plenty of extra disk space, this should not be a problem. However, if space is an issue, you may want to use the -U option at the cost of a slower restart.
main.c: Section 1
cache_cf.c: Section 3
errorpage.c: Section 4
comm.c: Section 5
disk.c: Section 6
fdstat.c: Section 7
filemap.c: Section 8
ftp.c: Section 9
gopher.c: Section 10
http.c: Section 11
icp.c: Section 12
icp_lib.c: Section 13
ipcache.c: Section 14
neighbors.c: Section 15
objcache.c: Section 16
proto.c: Section 17
stat.c: Section 18
stmem.c: Section 19
store.c: Section 20
tools.c: Section 21
ttl.c: Section 22
url.c: Section 23
wais.c: Section 24
mime.c: Section 25
connect.c: Section 26
send-announce.c: Section 27
acl.c: Section 28

Debugging levels are set in the configuration file with the 'debug_options' line. For example:
debug_options ALL,1 28,9 22,5
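To make the syntax of that line concrete, the following sketch turns a 'debug_options' value into a per-section table of levels. It is not Squid's parser. It assumes that tokens are processed left to right, that 'ALL' sets every section, that sections start at level 0, and that section numbers run up to 28 as in the list above.

```python
def parse_debug_options(value, max_section=28):
    """Map section number -> debug level (hypothetical helper).

    Assumptions: tokens are "section,level" pairs applied left to right,
    and "ALL" applies the level to every section.
    """
    levels = {section: 0 for section in range(max_section + 1)}
    for token in value.split():
        section, _, level = token.partition(",")
        level = int(level)
        if section.upper() == "ALL":
            # Reset every section to the given level.
            for key in levels:
                levels[key] = level
        else:
            levels[int(section)] = level
    return levels


levels = parse_debug_options("ALL,1 28,9 22,5")
print(levels[28], levels[22], levels[11])
```

Under these assumptions, the example line leaves section 28 (acl.c) at level 9, section 22 (ttl.c) at level 5, and every other section at level 1.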
The following types of lists are available:
'src'      client IP address
'dst'      server IP address**
'method'   method of the request (e.g., GET, POST)
'proto'    protocol of the request (e.g., HTTP, WAIS)
'domain'   domain of the URL request (e.g., .foo.org)
'port'     port number of the URL request (e.g., 80, 21)
'time'     time-of-day and day-of-week; format: [SMTWHFA] [hh:mm-hh:mm]
'pattern'  regular expression matching on the URL-path

After the access lists have been defined, you can then combine them to allow or deny access.
For example, your cache might be configured to accept requests from both inside and outside of your organization. In that case you'd probably want to allow internal clients to access anything, but limit outside access to only sites within your organization. It could be done like this:
acl ourclients src 128.138.0.0/255.255.0.0 198.117.213.0/24
acl ourservers domain .whatsamattu.edu
http_access deny !ourclients !ourservers
http_access allow ourclients

If you wanted to limit FTP requests to off-peak hours, you could use:
acl daytime time MTWHF 08:00-17:00
acl FTP proto FTP
http_access deny FTP daytime

Any of the access list types can accept multiple values on the same line, except for 'time'. Multiple values of an 'acl' definition are treated with OR logic. Multiple ACLs of an 'http_access' line are treated with AND logic. That is, all ACLs must match for the 'allow' or 'deny' to take effect. The order of the 'http_access' lines is important. When a line matches, any following lines are not considered at all.
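The matching rules just described (OR within an 'acl', AND within an 'http_access' line, first matching line wins) can be sketched as below. This is an illustration rather than Squid's implementation. The function names and the data layout are assumptions, value matching is reduced to simple equality, and the sketch returns None when no line matches because these notes do not say what happens in that case.

```python
def acl_matches(values, request_value):
    """OR logic: an ACL matches if any one of its values matches."""
    return any(value == request_value for value in values)


def term_matches(term, matched):
    """Evaluate one ACL name from an http_access line; '!' negates it."""
    if term.startswith("!"):
        return not matched[term[1:]]
    return matched[term]


def check_access(rules, matched):
    """Return the action of the first line whose ACLs all match (AND logic).

    rules:   list of (action, [acl names]) in configuration order
    matched: dict of acl name -> bool for the current request
    """
    for action, terms in rules:
        if all(term_matches(term, matched) for term in terms):
            return action  # later lines are not considered
    return None  # no line matched; behavior here is not covered by these notes


# The 'ourclients' / 'ourservers' example from above.
rules = [
    ("deny", ["!ourclients", "!ourservers"]),
    ("allow", ["ourclients"]),
]
print(check_access(rules, {"ourclients": True, "ourservers": False}))
print(check_access(rules, {"ourclients": False, "ourservers": False}))
```

In this sketch an internal client asking for any server is allowed by the second line, while an outside client asking for an outside server is stopped by the first line before the second is ever examined.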
'icp_access' is the same as 'http_access' but it applies to the ICP port. However, it is not yet fully implemented. It is only able to check 'src' and 'method' ACLs.
**Note, the 'dst' ACL type has been added for version 1.0.beta12. In that version it is implemented in a "lazy" manner. If the URL hostname is not already in the IP cache, the ACL checks will not match it, but they will start a DNS lookup so that it will likely be present for future ACL checks. This means some users may occasionally get oddball results. For example, a page may fail the first time, but succeed on the second try, or vice-versa.
As soon as the SIGTERM is received, the incoming HTTP socket is closed so that no further requests will be accepted.
cache_host big.foo.org parent 3128 3130 weight=5

The weight must be a non-zero integer. It is used as a divisor to calculate a weighted round-trip-time (RTT). Higher weights will cause a parent to have a ``better'' RTT.
Weights are only involved when all parent caches return MISS. Squid still fetches an object from the first parent or neighbor to reply with a HIT, regardless of any weight values.
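A small sketch may help show how the weight enters the picture. It is not Squid's selection code: the reply records, the function names, and the rule "pick the parent with the lowest weighted RTT" are assumptions drawn from the description above, and the host names other than big.foo.org and all RTT values are invented.

```python
def weighted_rtt(rtt, weight):
    """Divide the measured RTT by the weight; the weight must be non-zero."""
    if weight == 0:
        raise ValueError("weight must be a non-zero integer")
    return rtt / weight


def choose_source(replies):
    """Pick where to fetch from, given ICP replies in arrival order.

    Each reply is a dict with hypothetical keys:
    name, kind ("parent" or "neighbor"), result ("HIT" or "MISS"), rtt, weight.
    """
    # The first HIT wins, regardless of any weights.
    for reply in replies:
        if reply["result"] == "HIT":
            return reply["name"]
    # Everything was a MISS: compare the parents by weighted RTT.
    parents = [reply for reply in replies if reply["kind"] == "parent"]
    if not parents:
        return None
    best = min(parents, key=lambda r: weighted_rtt(r["rtt"], r["weight"]))
    return best["name"]


replies = [
    {"name": "near.foo.org", "kind": "parent", "result": "MISS", "rtt": 30, "weight": 1},
    {"name": "big.foo.org", "kind": "parent", "result": "MISS", "rtt": 100, "weight": 5},
]
print(choose_source(replies))
```

Here big.foo.org has the slower raw RTT (100 against 30), but its weight of 5 gives it a weighted RTT of 20, so it is preferred once all parents have returned MISS.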
mv log log.old
awk '{print $2,$4,$5,$6,$7}' < log.old > log
There is one important difference between these two methods, however. Squid never makes ICP queries for objects which match the stoplists. Instead, the object will be fetched directly (unless on the other side of a firewall). We recommend that you use the stoplist for cgi-bin scripts and use the ttl_pattern rules to prevent caching of normal objects.