REACH, March 1990
-----------------
Research & Educational Applications of Computers in the Humanities
-----------------------------------
Newsletter of the Humanities Computing Facility of the University of California at Santa Barbara
------------------------------------------------

HCF ANNOUNCES ARRIVAL OF KURZWEIL 5100 SCANNER

The Humanities Computing Facility has just acquired a Kurzweil 5100 scanner, the top model of the high-end Kurzweil line of scanners, marketed by Xerox Imaging Systems, Inc., and suitable for use with IBM or compatible computers. It is currently in operation in South Hall 4421.

Scanners are designed to take text or images from the printed paper page and convert them into electronic computer files which can be used with a variety of programs on your own personal computer. The Kurzweil 5100 is a particularly sophisticated and accurate scanner, capable of reading multi-column documents, documents in both portrait and landscape orientation, typeset material, typewritten material, photocopied material, mixed typefaces, italics, underscore, draft dot matrix print, and shaded backgrounds. It uses proprietary ICR (intelligent character recognition) software, and has a modest form of artificial intelligence which allows it to recognize virtually any font and to achieve the highest level of overall character recognition of any scanner currently manufactured.

Capable of reading any type size between 6 point and 24 point, it comes equipped with a sheet feeder holding up to 50 sheets of paper in any size up to 11 by 14 inches. If several different documents are put into the sheet feeder, separated from one another by a blank page, the scanner will then interpret and process them as separate files. Output text files, in generic ASCII format by default, can also be converted to any of a number of standard word processing formats. Several of the standard IBM image formats are supported.
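
For readers who like to see such things spelled out, the blank-page convention for the sheet feeder can be pictured with a short Python sketch. It is only an illustration: how the Kurzweil 5100 itself detects a blank page is not described here, and the function name and the idea of treating each page as a string of recognized text are assumptions made for the example.

```python
def split_at_blank_pages(pages):
    """Group a stack of pages into documents, using blank pages as separators.

    Each page is assumed (for illustration only) to be a string of
    recognized text; a page with no visible characters counts as blank.
    """
    documents = []
    current = []
    for page in pages:
        if page.strip() == "":
            # A blank page closes the document in progress, if there is one.
            if current:
                documents.append(current)
                current = []
        else:
            current.append(page)
    # The last document has no blank page after it, so close it here.
    if current:
        documents.append(current)
    return documents


# A hypothetical stack of seven sheets holding three documents.
stack = ["Letter p.1", "Letter p.2", "", "Essay p.1", "", "", "Notes p.1"]
batches = split_at_blank_pages(stack)
print(len(batches), "documents found")
```

In this sketch two blank sheets in a row are treated as a single separator, so an accidental extra blank page does not produce an empty file.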

The Kurzweil 5100 uses lexicons in English, Dutch, French, German, Italian, Spanish, Swedish, and Finnish to support the recognition process. You can also define up to 10,000 terms in your own lexicon.

The Kurzweil 5100's particular strength is its verification feature, a process which allows you to train it to recognize unusual typefaces or particularly difficult text, such as broken characters or deteriorated printing. This training capability allows you to improve the range and accuracy of the scanning process far beyond the levels attainable by other scanners. The scanner shows you the elements of the text which it does not recognize, and you tell it how to interpret them as letters. You train the system using the first several pages of the document, and then let it continue automatically on its own through the rest of the document using those instructions. The set of training instructions can be saved from session to session for further use with the same or similar documents.

In scanning a document containing a mixture of text and images, it scans both the text and the graphics in a single pass, putting the text output into one file and each graphic it encounters into a separate image file of its own. If it needs to, the Kurzweil scans each page twice to gather the necessary text and image data. With the scanner working in the background, you can actually start editing the first pages of a scanned document with your word processing program while the scanner is still scanning the later pages.

The Kurzweil 5100 can work in either of two modes. It can be set to scan each page and then pause to interpret that page before going on to the next page. It can also be instructed to scan page after page rapidly in succession as simple graphic images and store them for later interpretation at a less busy time, such as overnight.

At the front of the scanner is an angled section designed to accommodate the scanning of books.
These can be placed on the scanning bed without being completely flattened. Bound originals can thus be scanned without undue harm to the bindings. Scanning can be set to operate from either left-to-right or right-to-left, and the process of a typical book scan consists of going through all the right-hand pages first, followed by a reverse trip through the left-hand pages in descending order. At the conclusion of the scan, the program itself assembles the pages in correct order into a single organized file.

To those of a technical bent, it will be of interest that the Kurzweil 5100 consists of the scanner itself and a board which goes inside the host computer, a board with its own 4MB memory and its own 16-MHz Motorola 68020 microprocessor. It scans at 400 dpi, and produces output which is user-definable from 50 to 600 dpi. The scanner ignores the color red, so a page marked with the traditional red pen can be put through the scanner without presenting any difficulty.

Any interested computing humanist may undertake a project with the Kurzweil 5100 scanner after completing a required short period of familiarization in its use. The device is rapidly demonstrating its popularity, and reservations are already needed to ensure equitable access to its services. For information, reservations, or a demonstration or test, please call Eric Dahlin at Ext. 2208.

------------------------------------------------------------

MELVYL GIVES ACCESS TO DISTANT LIBRARIES

Any user of MELVYL, the University of California electronic library catalogue system, will now be able to gain access to the catalogues of certain other institutions from within the MELVYL system. The external systems currently available without special accounts are those of Boston University, the Colorado Alliance of Research Libraries, the Northern Regional Library Facility, Pennsylvania State University, Rensselaer Polytechnic, the University of Delaware, the University of Maryland, and Virginia Tech.
Systems requiring special accounts for access are those of UCLA and RLIN, the Research Libraries Information Network. For further information, please call Carol Gibbens in the research section of the UCSB Library at Ext. 8051.

If you are a user of electronic mail on UCSBUXA, you will also be able to gain access to a number of additional electronic library catalogues by using the Telnet technique. Telnet is a program running on a local computer which allows you to use computers at remote locations, just as though you had a local terminal connection. It is connected to other computers through the Internet, the large communication network. The HCF is collecting further information on the use of Telnet for this purpose and will include it in future issues of _REACH_.

------------------------------------------------------------

NATIONAL CENTER FOR MACHINE-READABLE TEXT

A new initiative to create a National Center for Machine-Readable Texts in the Humanities, directed by Marianne Gaunt of Rutgers and Robert Hollander of Princeton, has received funding from the National Endowment for the Humanities and is now in its preliminary phase of implementation. It is hoped that the Center will become fully operational in 1991-92.

The proposed Center is designed to perform a variety of functions. It will create an inventory of all existing electronic texts and catalogue those texts. It will identify, preserve, and make available texts created for special projects, texts which might otherwise become unavailable at the conclusion of those projects. And it will act as a general point of referral to other projects and centers. It may also start to collect individual texts not available in other large archives, and may eventually even develop its own capability to produce machine-readable texts.

The current initiative had its origins in 1981, when Rutgers University received starting funds from the Council on Library Resources to begin the development of the Inventory of Machine-Readable Text. Directed by Marianne Gaunt of the Alexander Library at Rutgers, that project received a further grant from the Mellon Foundation in 1982, and continued to develop until it evolved into the current initiative toward a National Center for Machine-Readable Text.

This year, during the preliminary period of planning for the Center, a national conference, with participating international collaborators, will develop recommendations on the implementation of the project. Leading figures in various fields will also be consulted.

For further information on the National Center for Machine-Readable Text, please communicate with either of the following individuals:

    Marianne Gaunt
    Alexander Library
    Rutgers University
    New Brunswick, NJ 08903

    Robert Hollander
    Dept. of Comparative Lit.
    Princeton University
    Princeton, NJ 08544
    E-mail: bobh@phoenix.princeton.edu

A listserver has been established for discussion of further developments. To subscribe, send an e-mail message to Robert Hollander.

------------------------------------------------------------

NEW "CALL" JOURNAL

A new international journal has been created for the exchange of ideas on all aspects of computer assisted language learning (CALL), as well as on the related areas of computer assisted translation, computer assisted composition, and multi-lingual systems. Submissions should be sent in both hardcopy and either disk form or e-mail form, preferably the latter, to the journal editor:

    Keith Cameron
    Queen's Building
    The University
    Exeter, EX4 4QH, UK
    E-mail: cameron@uk.ac.exeter

Subscription information and a sample copy of the journal may be obtained from:

    Intellect Books
    Suite 2, 108/110 London Road
    Oxford OX3 9AW, UK.

------------------------------------------------------------

INTERNATIONAL TEXT ENCODING INITIATIVE UNDERWAY

The Text Encoding Initiative (TEI) is an international project designed to develop standardized guidelines for the preparation and exchange of electronic or machine-readable texts. Supported by various grants from the National Endowment for the Humanities, the Andrew W. Mellon Foundation, and the European Economic Community, the initiative is sponsored by the Association for Computers and the Humanities, the Association for Computational Linguistics, and the Association for Literary and Linguistic Computing.

Standardized formats for the coding of electronic text will make it possible for groups to share large bodies of computerized material effectively. At present, such cooperative use of expensive resources is made difficult by the varied sets of formatting conventions now in existence.

Working committees of the TEI have been formed in four areas: text documentation, text representation, text analysis and interpretation, and metalanguage and syntax. These four committees are now at work analyzing the particular problems and factors in each of their separate areas. An Advisory Board consisting of representatives from various scholarly professional societies has additionally been developed to provide a channel of communication with interested segments of the research community.

The TEI has also formulated tentative agreements with various projects involved in encoding large bodies of text. Under these agreements, the projects and the TEI will cooperate in the testing and revision of the developing encoding guidelines.

For further information on the work of the initiative, please communicate with the head of the steering committee:

    Nancy Ide
    Computer Science Department
    Box 520
    Vassar College
    Poughkeepsie, NY 12601
    E-mail: ide@vassar.bitnet

Interested individuals may also wish to follow the work of the initiative by subscribing to the Bitnet discussion group TEI-L. To sign up for the discussion group, send an electronic mail message "subscribe (your name)" to the Bitnet address listserv@uicvm, or communicate with the Project Editor of the TEI:

    C.M. Sperberg-McQueen
    Computer Center (M/C 135)
    University of Illinois at Chicago
    Box 6998
    Chicago, IL 60680

Additional information on the Text Encoding Initiative, including membership lists of the working committees, is contained in the latest issue of the ACH newsletter. A copy of the publication is available for examination in the HCF.

------------------------------------------------------------

HCF Locations:

    South Hall 4421     Phone: 805/961-2208
    Phelps Hall 5215    Phone: 805/961-8036

------------------------------------------------------------

THREE COMING COMPUTING CONFERENCES

Among the computing conferences scheduled for the next few months are three which are likely to be of particular interest to a number of UCSB computing humanists.

Sponsored by the University of Texas at Austin and Texas Tech University, the Sixth Conference on Computers and Writing, "Writing the Future," will be held in Austin, Texas, on May 17-20. Information is available from either of the following:

    Fred Kemp
    English Department
    Texas Tech University
    Lubbock, TX 79409
    E-mail: ykfok@ttacs.bitnet

    John Slatin
    English Department
    University of Texas at Austin
    Austin, TX 78712
    E-mail: eieb360@uta3081.bitnet

This year's ACH/ALLC Conference, sponsored by the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing, will be held at the University of Siegen, in Siegen, West Germany, on June 5-9.
The Conference alternates annually between Europe and North America. Information on this year's ACH/ALLC Conference can be obtained from:

    Prof. Dr. Helmut Schanze
    Universitaet-Gesamthochschule Siegen
    Postfach 101240
    D-5900 Siegen
    Federal Republic of Germany
    E-mail: gc130@dsihrz51.bitnet

The Third Biennial Conference on Teaching Computers in the Humanities, sponsored by the Association for Computers and the Humanities, will be held on the Bronx campus of Fordham University on June 23-25. For additional information, please communicate with:

    Craig Brush
    Modern Languages Department
    Fordham University
    Bronx, NY 10458
    E-mail: brush@fordmurh.bitnet

Further information on these and other coming conferences involving the uses of computers in the humanities is available in the Humanities Computing Facility location in South Hall 4421.

------------------------------------------------------------

A COMPUTER SLEUTH

No fan of true tales of espionage and detection should miss Clifford Stoll's _The Cuckoo's Egg_, published by Doubleday, a delightful story of computer mystery and maneuver.

Stoll is the astronomer who, during a stint as an embryonic computer wizard several years ago at the Lawrence Berkeley Laboratory, close by the Berkeley campus of the University of California, becomes intrigued by a minor 75-cent accounting discrepancy in the computer center records and sets out to find the source of the variation.

First he discovers an unauthorized user on the computer. Then he notices someone using the account of a person who has been gone from Berkeley for over a year. Stoll starts to monitor the computer operations, camping out in his office at night and setting up printers to record the complete traffic. Rather than locking out the intruder, he decides simply to observe the mysterious comings and goings.

Who is using the account? It seems the stranger is coming in on a telephone line, but a special kind of telephone line.
It belongs to Tymnet, a communications network which allows a user to gain access to a distant computer by using a local telephone line. The uninvited visitor could be coming in from anywhere.

As Stoll waits and watches patiently night after night, the mysterious hacker shows an extraordinary persistence in trying to break into a number of sensitive military and civilian computers one after the other all across the country. What is going on? How can he find out more about the hacker's activities?

The book tells of Stoll's enthusiastic and strenuous efforts over the next several months to identify the hacker and pinpoint the geographical source of the intrusions. Included along the way are descriptions of Stoll's frustrating encounters with the various mysterious governmental agencies which he tries to interest in the problem, generally with little success.

Eventually Stoll's perseverance is rewarded, and the villain is finally run to earth in a secret lair. And in some ways a very unexpected lair it is.

The book concludes with Stoll's move to the Harvard-Smithsonian Center for Astrophysics and his later encounter there with the nasty Internet "worm," the leading evil electronic creature in the famous computer hacker case which has been in the courts and news of late.

Stoll was also the subject of an amusing interview in the February 1990 issue of the Smithsonian magazine. It forms an enchanting complement to a fascinating book.

--Eric Dahlin

------------------------------------------------------------

REACH is published monthly by the Humanities Computing Facility of the University of California, Santa Barbara.

Advisory Committee:

    William Ashby                       French & Italian
    Alva Bennett                        Classics
    Edward Branigan                     Film Studies
    John DuBois                         Linguistics
    Gunther Gottschalk, Chair           Germanic, Oriental & Slavic
    Allan Grapard                       Religious Studies
    Barbara Harthorn                    Interdisciplinary Humanities Center
    Gerald Horne                        Black Studies
    Albert Lindemann                    History
    Ursula Mahlendorf                   Women's Studies
    Michael O'Connell                   English
    Giorgio Perissinotto, Vice Chair    Spanish & Portuguese
    Nathan Salmon                       Philosophy
    Guadalupe San Miguel                Chicano Studies
    Burr Wallen                         Art History

------------------------------------------------------------

HCF Coordinator & Editor of REACH: Eric Dahlin
Phone: 805/961-2208. E-mail: hcf1dahl@ucsbuxa.bitnet

------------------------------------------------------------

REACH is produced on an IBM-AT, using Microsoft Word, Version 5.0, and Xerox Ventura Publisher, Version 2.0, with camera ready copy printed on an HP LaserJet II. Printing is by UCSB Printing & Reprographic Services.

------------------------------------------------------------