STATISTICS ABOUT THE COLLECTION

* Note: Data are collected with a number of pedestrian UNIX tools, so that      *
* non-conforming files (for example) will interfere with correct counts;        *
* but different investigations generally corroborate the numbers to within 10%. *

As of Tue Jan  12 12:37:57 CST 1999:

Results from hit-counters indicate approximately 38 000 hits from early
1995 through the end of 1998; hits now come in to the welcome pages
about once every ten minutes. (Currently, the hit counters record only
accesses of the initial welcome/index pages; thus each visitor will be
counted at most once per web session -- and not at all if they have a
direct URL to some other page. This has not always been the case, and
some of the hits are the result of my reviewing the files!)

==============================================================================
                SIZE OF COLLECTION 
==============================================================================
(using  du;  ls -1 | wc; grep ^From\:\  * | wc; wc * | grep total )

directory          files   posts   lines   words   bytes

93_back/              79     185     14K     93K    628K
94/                   71     220     11K     72K    488K
95/                  425     878     52K    355K   2412K
96/                  285     506     32K    199K   1395K
97/                  271     409     28K    181K   1267K
98/                  586    1022     65K    415K   2944K
99/                    0       0       0       0      1K
  All 9*/:          1717    3220    202K   1315K   9135K
index/               134      --     24K    125K   1184K
collection/           41      --     10K     74K   1048K
images/               80      --      --      --    251K
welcome.html           1      --   (164)      1K     13K
 TOTAL PUBLIC:      1973    3220    236K   1515K  11631K

The directories 9* hold the "Selected Topics" files; index/ holds the
index and navigation pages, with the images in a separate directory;
collection/ holds this file and other information about the site.
(There are also private housekeeping and search-tool database directories;
the grand total is 13.7 Meg.)
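
As a rough sketch, one row of the table above could be produced in a single
pass along the following lines. The function name dir_stats is made up for
illustration, and it reports exact byte counts from wc rather than the du
block counts used above, so the last column would differ somewhat.

```shell
# Hypothetical helper: print one table row for a directory as
# tab-separated fields: directory, files, posts, lines, words, bytes.
# Assumes every entry in the directory is a plain text file.
dir_stats() {
    dir=$1
    # Arithmetic expansion strips the padding some wc versions emit.
    files=$(( $(ls -1 "$dir" | wc -l) ))
    # A "post" is counted as a line starting with "From: ", as in the table.
    # grep -c exits nonzero on a zero count, hence the "|| true".
    posts=$(cat "$dir"/* 2>/dev/null | grep -c '^From: ' || true)
    # With no file arguments, wc prints three numbers: lines words bytes.
    set -- $(cat "$dir"/* 2>/dev/null | wc)
    printf '%s\t%d\t%d\t%d\t%d\t%d\n' "$dir" "$files" "$posts" "$1" "$2" "$3"
}
```

For example,  for d in 9*/; do dir_stats "$d"; done  would print one line
per year directory.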

Some segments of the "topics" files are neither mail nor posts and lack a
"From:" line; these are missed in the tallies above. (Indeed, some
41  (  grep -c ^From\:\  9*/* | grep  \:0 | wc  ) "topics" files seem to have
no author at all; they are computer programs, .tex files, etc.)  Some topics
are listed in two index pages; there are 1732 ( grep ^\<li master.html | wc )
links to these topics.
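
The count of author-less files above relies on spotting ":0" in the output
of grep -c, which could also match a file name that happens to contain
":0". A minimal alternative sketch tests each file directly; the function
name count_files_without is an assumption made for illustration.

```shell
# Hypothetical helper: count how many of the given files contain
# no line matching the pattern.
# usage: count_files_without PATTERN FILE...
count_files_without() {
    pattern=$1
    shift
    n=0
    for f in "$@"; do
        # grep -q prints nothing and succeeds only if some line matches.
        grep -q -- "$pattern" "$f" || n=$((n + 1))
    done
    echo "$n"
}
```

Called as  count_files_without '^From: ' 9*/*  this should give a figure
comparable to the 41 quoted above.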

Of the files in index/, 104 are MSC-area indices ( ls -1 [0-9]* | wc )
and 15 are part of the tour ( ls -1 tour* | wc ), leaving 15 others.

For comparison, here are the numbers from Jan 1998:
>The content files constitute about
>	    1 100  files,
>	    2 041  items,
>	  135 218  lines,
>	  846 287  words,
>	5 402 612  bytes,
>The index pages contribute
>	      102  files,
>	   14 530  lines,
>	   71 014  words,
>	  638 918  bytes

==============================================================================
                CHARACTERISTICS OF CONTENT FILES
==============================================================================

Of these content files, we can check which ones do or don't contain certain 
kinds of information. 
***My involvement: 1031 files do not mention me (so 686 do, plus the index
	files etc.; one should add a portion of the files with no "From:" lines).
	( grep -c -i rusin 9*/* | grep -c \:0  )
	This includes  788  posts and emails I wrote, and 415 emails I received.
	( grep ^From\:\  9*/* | grep -i rusin | wc )
	( grep ^To\:\  9*/* | grep -i rusin | wc )
***Newsgroups: Only 176 files have no Newsgroups: line (so 1541 do). 
	These contain (excerpts of) 2591 posts.
	( grep -c ^News 9*/* | grep \:0 | wc)
	( grep ^News 9*/* | wc )


Current traffic on USENET: approximately 150(?) messages per day in sci.math
alone over the last 12 months, and more in subsidiary newsgroups. Thus
this collection represents much less than 1% of the recent postings in the math
newsgroups.

The process of seeking permission from authors revealed an
unduplicated count of about 500 authors by Spring 1996. About a dozen
had unintelligible addresses and mail to maybe 50 more bounced as
undeliverable (host or user unknown). Total count of authors is unknown. 
An upper bound is  2816  -- not very different from the 3220 items total!
	( grep ^From\:  9*/* | sort | uniq | wc )
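
When grep is given several files it prefixes each matching line with the
file name, so in the pipeline above an identical "From:" line appearing in
two different files is counted twice. A sketch that drops the prefix, and
so should give a somewhat tighter upper bound, is shown below. The function
name count_unique_senders is an assumption, and the result is still only a
bound, since one author may sign with several addresses or spellings.

```shell
# Hypothetical helper: count distinct "From: " lines across all the
# given files, ignoring which file each line came from.
count_unique_senders() {
    # cat merges the files first, so grep emits no file-name prefix.
    # Arithmetic expansion strips the padding some wc versions emit.
    echo $(( $(cat "$@" | grep '^From: ' | sort -u | wc -l) ))
}
```

Usage would be  count_unique_senders 9*/*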

Ages of dated "items"
   9 from 1990 ( grep ^Date 9*/* | grep -c 90 )
  14 from 1991 
  79 from 1992 
 187 from 1993 
 225 from 1994 
 910 from 1995 
 528 from 1996 
 423 from 1997 
1096 from 1998
   0 from 1999
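
The tallies above look for a two-digit string anywhere on a "Date" line,
so a time zone such as +0900 can be mistaken for a year. The following is
a sketch of a stricter tally, assuming RFC 822-style headers in which the
year directly follows a three-letter month name (for example
"Date: Tue, 12 Jan 1999 12:37:57 GMT" or "Date: 12 Jan 99 ..."). Other
date layouts are skipped; the function name tally_years is made up.

```shell
# Hypothetical helper: print "count year" for each year found in the
# Date: headers of the given files, oldest year first.
tally_years() {
    cat "$@" | awk '
        /^Date:/ {
            for (i = 2; i < NF; i++) {
                if ($i ~ /^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$/) {
                    # The field after the month name is taken as the year.
                    year = $(i + 1)
                    # Two-digit years are assumed to be 19xx.
                    if (year ~ /^[0-9][0-9]$/) year = "19" year
                    if (year ~ /^[0-9][0-9][0-9][0-9]$/) count[year]++
                    break
                }
            }
        }
        END { for (y in count) print count[y], y }
    ' | sort -k2,2n
}
```

Usage would be  tally_years 9*/*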

==============================================================================
                STRENGTHS BY SUBJECT AREA
==============================================================================
We can estimate the _number_ of files to be retrieved for each area.
(Note: A few files are mentioned in more than one index page.)
( grep -c \"\\\.\\\.\/9 [0-9]*l )
In some cases it's easier to lump together subareas like this:
( cat 11*html | sort | uniq |  grep -c \"\\\.\\\.\/9  )
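
The lumped count above removes duplicate lines, which catches a repeated
link only when the whole line is identical in both index pages. A sketch
that compares the link targets themselves is given below. It assumes at
most one topic link per line and targets of the quoted form "../9..."
(inferred from the grep pattern above); the function name
count_topic_links is made up for illustration.

```shell
# Hypothetical helper: count distinct quoted "../9..." link targets
# appearing in the given index pages.
count_topic_links() {
    # sed prints only the link target from each line that has one;
    # sort -u then collapses repeats across pages.
    echo $(( $(cat "$@" \
        | sed -n 's/.*"\(\.\.\/9[^"]*\)".*/\1/p' \
        | sort -u | wc -l) ))
}
```

Usage would be  count_topic_links 11*html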

Here are the counts per subject area: (total should be about 1717)

00   10
01    8
03   29
04   14
05   77
06    3
08    0
11  331    i.e., 20% of the files are in number theory
12   55      + 20% in abstract algebra
13   40      + 20% in geometry and topology
14  114      + 20% in a few areas of fairly good coverage
15   52            (combinatorics, logic, computational math)
16   16      + 20% in many areas of poor coverage
17    0            (analysis, applications, statistics)
18    2
19    2
20   71
22    3
26   39
28   20
30   15
31    1
32    3
33   16
34   24
35    5
37    0
39   13
40   23
41   17
42    5
43    3
44    5
45    0
46   11
47    3
49    3
51  112
52   97
53   13
54   54
55   36
57   73
58    6
60   37
62   35
65   64
68   68
70   10
73    1
74    0
76    4
78    6
80    4
81    1
82    3
83    1
85    0
86    6
90   15
91    0
92    8
93    3
94   18
97