                    STATISTICS ABOUT THE COLLECTION

* Note: Data are collected with a number of pedestrian UNIX tools, so that *
* non-conforming files (for example) will interfere with correct counts;   *
* but different investigations corroborate the numbers within 10%, in      *
* general.                                                                 *

As of Tue Jan 12 12:37:57 CST 1999:

Results from hit-counters indicate approximately 38 000 hits from early 1995
through the end of 1998; now, hits come in to the welcome pages about once
every ten minutes. (Currently, the hit counters only record accessions of
the initial welcome/index pages; thus each visitor will be counted at most
once per web session -- and not at all if they have a direct URL to some
other page. This has not uniformly been the case, and some of the hits are
the result of me reviewing the files!)

==============================================================================
                          SIZE OF COLLECTION
==============================================================================
(using  du;  ls -1 | wc;  grep '^From: ' * | wc;  wc * | grep total )

directory        files   posts   lines   words   bytes
93_back/            79     185     14K     93K    628K
94/                 71     220     11K     72K    488K
95/                425     878     52K    355K   2412K
96/                285     506     32K    199K   1395K
97/                271     409     28K    181K   1267K
98/                586    1022     65K    415K   2944K
99/                  0       0       0       0      1K
All 9*/:          1717    3220    202K   1315K   9135K

index/             134      --     24K    125K   1184K
collection/         41      --     10K     74K   1048K
images/             80      --      --      --    251K
welcome.html         1      --   (164)      1K     13K

TOTAL PUBLIC:     1973    3220    236K   1515K  11631K

The directories 9* hold the "Selected Topics" files; index/ holds the index
and navigation pages, with the images in a separate directory; collection/
holds this file and other information about the site. (There are also
private housekeeping and search-tool database directories; the grand total
is 13.7 Meg.)

There are segments of the "topics" files which are neither mail nor posts,
and lack a "From:" line; these are missed in the tallies above.
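The totals in the SIZE OF COLLECTION table can be cross-checked by adding up the rows. The following is only a sketch in Python, not part of the original tool chain. The figures are transcribed by hand from the table, and two choices are assumptions made for the arithmetic: "--" entries are treated as 0, and the 164 lines of welcome.html are rounded to 0K.

```python
# Rows transcribed from the SIZE OF COLLECTION table.
# Each tuple is (files, posts, lines in K, words in K, bytes in K).
YEAR_DIRS = {
    "93_back/": (79, 185, 14, 93, 628),
    "94/": (71, 220, 11, 72, 488),
    "95/": (425, 878, 52, 355, 2412),
    "96/": (285, 506, 32, 199, 1395),
    "97/": (271, 409, 28, 181, 1267),
    "98/": (586, 1022, 65, 415, 2944),
    "99/": (0, 0, 0, 0, 1),
}

# Non-topic rows. Assumption: "--" in the table is counted as 0,
# and the 164 lines of welcome.html round down to 0K.
OTHER = {
    "index/": (134, 0, 24, 125, 1184),
    "collection/": (41, 0, 10, 74, 1048),
    "images/": (80, 0, 0, 0, 251),
    "welcome.html": (1, 0, 0, 1, 13),
}


def column_sums(rows):
    """Add up each column over all rows of a table given as a dict of tuples."""
    return tuple(sum(column) for column in zip(*rows.values()))


all_9 = column_sums(YEAR_DIRS)
total_public = tuple(a + b for a, b in zip(all_9, column_sums(OTHER)))

print("All 9*/:     ", all_9)         # (1717, 3220, 202, 1315, 9135)
print("TOTAL PUBLIC:", total_public)  # (1973, 3220, 236, 1515, 11631)
```

Both computed rows agree with the "All 9*/:" and "TOTAL PUBLIC:" lines of the table, so the rounded per-directory figures are at least internally consistent.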
(Indeed, some 41 "topics" files seem to have no author at all; they are
computer programs, .tex files, etc.)
      ( grep -c '^From: ' 9*/* | grep \:0 | wc )

Some topics are listed in two index pages; there are 1732 ( grep ^\
The content files constitute about
       1 100 files,
       2 041 items,
     135 218 lines,
     846 287 words,
   5 402 612 bytes.
The index pages contribute
         102 files,
      14 530 lines,
      71 014 words,
     638 918 bytes.

==============================================================================
                    CHARACTERISTICS OF CONTENT FILES
==============================================================================
Of these content files, we can check which ones do or don't contain certain
kinds of information.

***Did I have any effect on files: 1031 no (so 686 yes, plus the index files
   etc. One should add a portion of the files with no "From:" lines)
      ( grep -c -i rusin 9*/* | grep -c \:0 )
   This includes 788 posts and email I wrote, and 415 emails received.
      ( grep '^From: ' 9*/* | grep -i rusin | wc )
      ( grep '^To: ' 9*/* | grep -i rusin | wc )

***Newsgroups: Only 176 files have no Newsgroups: line (so 1541 do).
   These contain (excerpts of) 2591 posts.
      ( grep -c ^News 9*/* | grep \:0 | wc )
      ( grep ^News 9*/* | wc )

Current traffic on USENET: approximately 150(?) messages per day in sci.math
alone over the last 12 months, and more in subsidiary newsgroups. Thus this
collection represents much less than 1% of the recent postings in the math
newsgroups.

The process of seeking permission from authors revealed an unduplicated
count of about 500 authors by Spring 1996. About a dozen had unintelligible
addresses, and mail to maybe 50 more bounced as undeliverable (host or user
unknown).

Total count of authors is unknown. An upper bound is 2816 -- not very
different from the 3220 items total!
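The author count above is only an upper bound, because it counts distinct "From:" lines rather than distinct people. One further effect may inflate it: when grep is given several files it prefixes each match with the file name, so in the grep | sort | uniq pipeline quoted with this figure the same "From:" line found in two files is counted twice. The sketch below, in Python, counts distinct header values regardless of file. The function name `distinct_from_lines` and the latin-1 decoding are illustrative assumptions, not part of the original tool chain, and the result is still an upper bound, since one author may post under several address forms.

```python
import re
from pathlib import Path

# Matches a header line beginning with "From:" and captures the rest.
FROM_RE = re.compile(r"^From:\s*(.*)$")


def distinct_from_lines(paths):
    """Return the set of distinct 'From:' header values found in the files.

    Hypothetical helper: the file name is ignored, so a header value that
    occurs in several files is stored only once.
    """
    seen = set()
    for path in paths:
        # latin-1 never fails to decode; an assumption for old mail archives.
        text = Path(path).read_text(encoding="latin-1")
        for line in text.splitlines():
            match = FROM_RE.match(line)
            if match:
                seen.add(match.group(1).strip())
    return seen
```

Assuming the layout described in this file, with every entry under the 9* directories a plain file, a call such as `len(distinct_from_lines(Path(".").glob("9*/*")))` would give a count comparable to the 2816 figure, with cross-file duplicates removed.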
      ( grep ^From\: 9*/* | sort | uniq | wc )

Ages of dated "items"
      9 from 1990      ( grep ^Date 9*/* | grep -c 90 )
     14 from 1991
     79 from 1992
    187 from 1993
    225 from 1994
    910 from 1995
    528 from 1996
    423 from 1997
   1096 from 1998
      0 from 1999

==============================================================================
                       STRENGTHS BY SUBJECT AREA
==============================================================================
We can estimate the _number_ of files to be retrieved for each area.
(Note: A few files are mentioned in more than one index page.)
      ( grep -c \"\\\.\\\.\/9 [0-9]*l )
In some cases it's easier to lump together subareas like this:
      ( cat 11*html | sort | uniq | grep -c \"\\\.\\\.\/9 )

Here are the counts per subject area: (total should be about 1717)

00   10
01    8
03   29
04   14
05   77
06    3
08    0
11  331    i.e., 20% of the files are in number theory
12   55        + 20% in abstract algebra
13   40        + 20% in geometry and topology
14  114        + 20% in a few areas of fairly good coverage
15   52            (combinatorics, logic, computational math)
16   16        + 20% in many areas of poor coverage
17    0            (analysis, applications, statistics)
18    2
19    2
20   71
22    3
26   39
28   20
30   15
31    1
32    3
33   16
34   24
35    5
37    0
39   13
40   23
41   17
42    5
43    3
44    5
45    0
46   11
47    3
49    3
51  112
52   97
53   13
54   54
55   36
57   73
58    6
60   37
62   35
65   64
68   68
70   10
73    1
74    0
76    4
78    6
80    4
81    1
82    3
83    1
85    0
86    6
90   15
91    0
92    8
93    3
94   18
97