Digitization Information, Standards: Data Gathered by Robert Kraft for report at SBL/CARG session 24 November 1998, Orlando FL

Some Issues for Consideration:

Are there special needs in CARG areas of interest that call for special standards? If so, how flexible will the standards need to be? Examples:
- Manuscript studies (lexicography, paleography, textual criticism, etc.)
- Archaeological data (inscriptions, objects, architecture, plans, etc.)
- Photographs and Art (reproductions)
- Images in motion

Levels of desired standards:
- Imaging (creating the digitized materials)
- "Metadata" (descriptive records of processes used, etc.)

Some Convenient Links to Relevant Materials:

Jay Treat's resources page at CCAT (UPenn)
http://ccat.sas.upenn.edu/hot/mss.html

The ABZU page for Papyri at Chicago
http://www-oi.uchicago.edu/OI/DEPT/RA/ABZU/ABZU_REGINDX_EGYPT_PAP.HTML

The Labyrinth page for MSS at Georgetown
http://www.georgetown.edu/labyrinth/subjects/mss.htm

Time & Bits: Managing Digital Continuity
Getty Conservation Institute - Getty Information Institute - The Long Now Foundation
http://www.gii.getty.edu/timeandbits/links.html
[from a library and art history perspective, lots of information]

See also the Special Issue on the Use of Computers in the Study of Ancient Documents in Literary & Linguistic Computing [= L&LC] 12.3 (1997)

Some Relevant Projects (index to more detailed sections that follow):

01. APIS = Advanced Papyrological Information System Proposal (1994)
http://odyssey.lib.duke.edu/papyrus/texts/APISgrant.html
[standards for the major joint papyrological project]
[camera, 600 dpi, TIFF & JPEG storage; 150/72 dpi GIF displays]

02. Cairo Geniza Project at Cambridge ENG
http://www.lib.cam.ac.uk/Taylor-Schechter/GOLD/digital.html
[notes on procedures used for this pilot project on Hebrew MSS]
[camera, 300 dpi/24 bit, JPEG delivery & storage]

03.
CSAD = Centre for the Study of Ancient Documents (Alan Bowman, Oxford)
alan.bowman@christ-church.ox.ac.uk
http://www.csad.ox.ac.uk/CSAD/Images.html#images
[initial focus on inscriptional materials at 150/300 dpi]

04. The Perseus Project
http://www.perseus.tufts.edu/
gcrane@emerald.tufts.edu
[various types of digitization in interaction with ancient texts]

05. Thesaurus Musicarum Latinarum (TML)
http://www.music.indiana.edu/tml/tmlintro.htm
[a major project on Latin Music Theory, uses GIF format]

06. Bodleian Library, Oxford Towards an Image Catalogue . . .
http://www.rsl.ox.ac.uk/imacat.html
[images from slides of MSS; pilot project using JPEG & GIF]

07. Digital Scriptorium, University of California at Berkeley
http://sunsite.berkeley.edu/Imaging/
[proposed standards for digital libraries; outlines issues]

08. The Art Museum Image Consortium (AMICO)
http://www.amico.net/docs/dataspec.final3.shtml
[a major cooperative museum initiative]
[1024 x 768 pixels, 24 bit color; uncompressed TIFF format]

09. Relationship to the Text Encoding Initiative
http://lcweb.loc.gov/catdir/semdigdocs/hockey.html
[some notes on "metadata" treatment by Susan Hockey]

09.1 TEI and XML in Digital Libraries
http://www.hti.umich.edu/misc/ssp/workshops/teidlf/teigrp3.html

10. SURVEY ON PHYSICAL PREPARATION OF MATERIALS TO BE DIGITIZED
http://www.rlg.org/preserv/joint/index.html
[a recent conference to discuss these matters; results forthcoming]

---

01. APIS = Advanced Papyrological Information System Proposal (1994)
http://odyssey.lib.duke.edu/papyrus/texts/APISgrant.html

See also Roger Bagnall, "Imaging of Papyri: A Strategic View," in L&LC 12 (1997) 153f.

(c) Images of the documents. The APIS partners have, as part of their preparations for this application, conducted a study of digital imaging technology, funded by the Commission on Preservation and Access, the report of which is included here as Appendix 8.
The conclusions of this study are that true 600 dpi imaging through digital cameras (not flatbed scanners) is an appropriate minimum standard for images. For views of the entirety of large documents, 300 dpi images will be included, but portions of the original will be imaged at 600 dpi or higher. Papyrologists using these images, with the advantage of modern computer tools for text reconstruction, have consistently found them much more valuable than traditional photographs, whether in black and white or in color, and 600 dpi provides sufficient resolution and color accuracy for almost all purposes. Some papyri or parts of papyri with exceptionally dense information will be captured at higher resolution.

In Appendix 8 we detail the standards for file format and other pertinent aspects of the imaging that we will adopt. These standards, which are in line with industry standards and projects like the Vatican Library Project, will make images fully interchangeable among the institutions involved and allow ready consultation over the Internet.

In the case of relatively small papyri, our study concluded that 600 dpi imaging was feasible with existing digital cameras, and that such cameras would produce imagery that would never need replacing. For larger papyri, however, it is not clear that future advances in imaging technology will not bring considerable advantages. For papyri over 17 x 25 cm, therefore, we will also produce 4x5 color photographic transparencies. Experience in reproducing paintings and other fine arts has shown that these are sufficient to allow retaking of digital imagery as the capabilities of the latter improve.

APIS will also eventually make use of multi-spectral imaging (MSI), a technology developed at the Jet Propulsion Laboratory (Pasadena) and applied to some Dead Sea Scrolls fragments with considerable success.
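The storage implications of these resolution choices are easy to estimate. The sketch below is a hypothetical calculation (not part of the APIS specification): it converts a document's physical size into pixel dimensions at a given dpi and gives the raw, uncompressed 24-bit file size, using the 17 x 25 cm threshold mentioned above as the example.

```python
# Rough size estimate for an uncompressed 24-bit scan.
# Hypothetical helper for illustration only.
CM_PER_INCH = 2.54

def raw_scan_size(width_cm, height_cm, dpi, bytes_per_pixel=3):
    """Pixels across = inches * dpi; 24-bit color = 3 bytes/pixel."""
    w_px = round(width_cm / CM_PER_INCH * dpi)
    h_px = round(height_cm / CM_PER_INCH * dpi)
    return w_px, h_px, w_px * h_px * bytes_per_pixel

# A 17 x 25 cm papyrus at the 600 dpi archival standard:
w, h, size = raw_scan_size(17, 25, 600)
print(w, h, size)  # 4016 x 5906 pixels, ~68 MB uncompressed
```

Numbers of this order explain both the appeal of compressed storage formats and the project's observation that storage costs, not image capture, were the moving constraint.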
MSI uses bandwidths in the extreme infrared to bring out writing by identifying what part of its spectral signature is unique to it and extracting that bandwidth. This technology also uses a digital camera, but with a special lens designed for sampling the spectrum at specified intervals of bandwidth. At present, MSI is still a rapidly developing technology, and the cameras that use it are not interchangeable with those supporting the standard color imaging already described. It is also useful only for that minority of papyri (and, importantly, ostraca, where contrast is often poor) where there are significant difficulties in bringing the writing out from the background. APIS plans to bring MSI into use in Phase 2, after the materials on which it will be used have been identified; at present it is impossible to estimate on what part of the collections it would be applicable.[7]

All of the collections have existing bodies of photographic negatives made for various purposes in the past, and these will be scanned at an early stage. Michigan also has many photographs of texts once there but now returned to the Egyptian Museum in Cairo, and these photographs can also be digitized and included. Ultimately the vast resources of the International Photographic Archive of Papyri (centers in Brussels, Cologne, Oxford, and Urbana) can be drawn on as APIS grows beyond the original six collections. The project will need to resolve intellectual property issues connected with images when a wider range of repositories is included, particularly because it will not be possible to impose any standard policies about matters like publication permission on foreign institutions that adopt our standards.

---

01.1 From John F. Oates' Progress Report 1998
http://odyssey.lib.duke.edu/papyrus/texts/report.html

B. IMAGES
Each text contains a field through which on-line images can be accessed, where they exist. Links to numerous published texts are in place (from P.Bad.
II, IV, VI, P.Carlsberg, P.Koeln VIII, P.Tebt. I, II, III 1, P.Oxy. LIX). Some of these images also provide links to the Greek texts in the DDBDP. This is an obvious locale where a gateway page on an APIS Web site can be of great use in facilitating the use of image and text together.

---

01.2 Imaging the Duke papyri by Peter van Minnen
http://odyssey.lib.duke.edu/papyrus/texts/imaging.html

In the course of the first year of the project it was decided to make digital images of the papyri instead of photographs or slides. The advantages of making digital images directly from the originals seemed obvious at the time. The imaging could be done in-house; the images would preserve the dimension of depth as no photograph would; they would allow manipulation afterwards, which could serve research as well as teaching; and they could be shared with other institutions over the internet. The imaging would be done in the second year of the project by the librarian hired for the purpose, in addition to making catalogue records. It was expected that this would entail extending the project into a third year, but this also seemed indicated by the amount of time it would take the papyrologist to deal with the remaining number of papyri.

For the imaging a flatbed color scanner was bought in September 1993. Color was deemed superior to grayscale, especially since the images were supposed to provide the archival backup to the collection. Using color would of course increase the size of the files. The scanner bought was a UMAX 800, a 24-bit scanner that allowed scanning up to 800 dpi. A digital camera was considered too expensive at the time; it was not even clear whether reliable digital cameras were already available commercially. The financial advantages of a flatbed scanner over a digital camera were more important than the disadvantages: the lack of a light source behind the papyrus to light up holes, and the clumsiness of scanning upside down.
Since the Duke papyri would be scanned after they were framed, however, the last-mentioned disadvantage seemed trivial. Experience has shown that scanning through glass has another disadvantage: the effect of the glass on the registration of color by the scanner is quite noticeable. This also goes for using a digital camera on papyri framed in glass. During the first year of the scanning project the color spectrum of the scanner was not calibrated; the color of the scans was corrected manually and only approximates the colors of the originals. In the second year, a special calibration tool was purchased that goes some way toward correcting for the glass, but it turned out to be unavoidable to correct the scans manually. Taking papyri out of the frame for scanning purposes would not be feasible for those papyri that consist of multiple fragments or are too fragile to handle upside down. Scanning unframed papyri seems impossible if the work is done by an outside vendor.

Originally it was not clear whether potsherds, wooden boards and lead tablets could be scanned at all, but these three-dimensional objects posed no problem; their surfaces are after all rather smooth. It is the fibrous structure of the papyrus itself that turned out most difficult to capture electronically. The UMAX 800 proved better at it than some other flatbed scanners, but it was replaced in October 1995 by a Sharp JX-330, which scans much faster and records colour more accurately than the "old" UMAX 800. The angle of the light source used inside the scanner determines how the fibres will show up on the scan. The scanner reads the image three times and corrects it as it goes along.

The computer used for the scanning is a Macintosh Quadra 800 with a 1 GB hard disk, 32 MB of RAM and a 24-bit color video card. An accelerator board (Storm) was also installed. The program used to generate the scans was Adobe PhotoShop version 2.5, later version 3.0.
This seemed quite adequate for both the production of the scans and the use of the images in research and teaching. For backup, digital tapes are used as well as an external hard drive (DataStor), acquired in February 1995. When the project is finished the tapes with the archival scans will be kept available at Duke through the computing center. The derivatives will be stored on a server and remain universally accessible in combination with the catalogue records and other supporting materials.

Originally the color scans were made at 300 dpi, but after about one year of experimenting this was raised to 600 dpi. This would create much larger files, but the cost of storage media is dropping fast. The Advanced Papyrological Information System, an undertaking of several American universities holding substantial collections of papyri (Columbia University, Duke University, Princeton University, The University of California at Berkeley, The University of Michigan, and Yale University), has adopted 600 dpi as the standard for digital images of papyri. The reason for using 600 dpi is simple: to read original papyri papyrologists currently use microscopes that enlarge 4x, 5x, 7x or 10x; 600 dpi images can be blown up 8x on a 72 lpi screen, the kind most commonly in use today. With a digital camera a consistent 600 dpi rate can be achieved by keeping the camera at a fixed height on its stand.

It was decided early on to adjust the quality of the image through a minimum of manual operations. The color of the scans is off for reasons stated above, and this needs to be corrected in the archival scans; to this end the histograms have been manipulated minimally through Adobe PhotoShop. To increase the contrast of the writing, the most important consideration for specialists (the target audience most likely to use the archival scans), the unsharp mask function has been applied consistently. The data are saved in different formats.
The archival master set consists of 600 dpi scans stored in TIFF with LZW compression. Another set of 600 dpi scans is stored with JPEG compression; these scans will be used in research and teaching at Duke and at other institutions on request. Two derivatives are made of each 600 dpi scan: a 150 dpi and a 72 dpi scan. Both are stored in GIF and made available over the internet; the relatively small file sizes should cause no problem. The 72 dpi scan will show up life-size on a viewer such as Netscape using a 72 lpi screen. The 150 dpi scan will show up twice enlarged, or can be blown up to twice its size, which allows the more difficult scripts to be read with ease. But even the 150 dpi scans will not provide enough detail to deal with problem spots. This will, however, only be a problem for specialists interested in a particular papyrus, and they can always request a copy of the 600 dpi scan. It seemed more serviceable to all parties concerned (specialists and non-specialists) to have images of all Duke papyri available for quick consultation all the time.

A review of a book on the application of digital technology to ancient manuscripts such as the Dead Sea Scrolls appears in the Bryn Mawr Classical Review. A more technical description of digitizing papyri has been prepared by a team of Finnish scholars working on the carbonized papyri from Petra in Jordan.
http://www.cs.hut.fi/papyrus/

"Recording, Processing and Archiving Carbonized Papyri"
Antti Nurminen, 34044T, andy@cs.hut.fi

TABLE OF CONTENTS
1. Abstract
2. Introduction
   1. Papyrus
   2. Digitizing Images
   3. Image Processing
   4. Hypermedia
3. Related Work
4. Recording Carbonized Papyri
   1. Measurements
   2. Experimental Setup
   3. Photography and Photo Lab Work
      1. Illumination
   4. Scanning Photographs
   5. Direct Digitizing
5. Miscellaneous Imaging Methods
   1. Thermographic Imaging
   2. X-ray Imaging
   3. Stereographic Imaging
   4. Live Video Recording
6. Image Processing
   1. Image Analysis
      1. Histograms
      2. Profiles
   2. Image Processing Tools
      1. Image Thresholding
      2. Histogram Modification
      3. Edge Detection
      4. Noise Reduction
      5. Deviation Method
      6. Papyrus Structure Elimination
      7. FFT Filters
      8. Miscellaneous Viewing
   3. Combinations of Digital Filters
7. Digital Image Archiving with Hypermedia
8. Conclusions
9. Acknowledgements
10. References

See also D. Obbink, "Imaging the Carbonized Papyri from Herculaneum," L&LC 12 (1997) 159-62.

Another application of digital technology to the study of Demotic papyri has been described in a congress paper by Janet H. Johnson.
http://www-oi.uchicago.edu/OI/PROJ/DEM/PUB94/CGP/CGP.html

COMPUTERS, GRAPHICS, AND PAPYROLOGY
By Janet H. Johnson, Professor, The Oriental Institute, The University of Chicago
(This article originally appeared in Proceedings of the 20th International Congress of Papyrologists, Museum Tusculanum, Copenhagen 1994, pp. 618-620, and is made available electronically with the permission of the editor.)

01.3 APIS at University of Michigan
http://www.hti.umich.edu/a/apis/ (technical specs)

See also Traianos Gagos, "Advanced Papyrological Information System (APIS): The Michigan Experience," Literary and Linguistic Computing 12.3 (1997) 155-57.

2.
Specific guidelines for the full-size image
-- If there is any writing on the verso, make a full-size image of the entire surface in order to register the position of the writing on the surface (the 600 dpi image will capture only the close image). Make note of all verso files that are captured in the imaging process but have not found their place in the publication. This can be done in the FileMaker Pro database field called 'side section', e.g. r/v (PMich has recto only).

3. Specific guidelines for the 600 dpi images
-- One must ensure sufficient overlap; otherwise (especially with curving and uneven lines) one might miss information. The minimum amount of overlap should be one line on the horizontal axis and two words on the vertical axis. Ideally, cropping should be done along word-ends.
-- Cropping becomes a complicated issue with images 10x12" and larger, which consist of 4+ tiles. One must make sure that the tiles are of equal size, which can be done by keeping the previous image on the screen while cropping. (Tiles of inv. 281 were left uneven.) Make sure that cropping is done along the same horizontal line so that tiles can be matched up easily.
-- When there are big gaps in the papyrus (half of the left margin missing, e.g. 281, or a gap in the middle) one should photograph the empty space as well, because it can be important for making conjectures.
-- The technique for photographing a missing part of a left margin is as follows. Align a ruler along the existing part of the margin and put a yellow post-it mark along the line to help determine the width of the empty space that needs to be included. If a part of the post-it note gets into the image, it can easily be cropped out.
-- In the case of several lines on the verso, photograph only the portion containing the text. The thumbnail version gives the position of the writing on the surface.
-- Do not include the ruler in 600 dpi images.
-- The order in which the 600 dpi images were taken should be recorded in the imaging database. The arrangement follows several patterns, which can be expressed with a grid attached to each individual file and the type of arrangement recorded in the database.

Advanced Papyrological Information System
Michigan Papyri On-line Catalogue
List and Description of Fields

LIST
I. Background & Physical Properties
Inv. No.; Section/Side; Publ./Side; Library Location; Connections; Material; Items; Size; Lines; Mounted; Negative; Negative in Copenhagen; Conservation Status; Notes on Preservation; Palaeographic Description; Publication status

II. Contents
Date; Origin; Provenance; Acquisition; Language; Genre; Author; Type of text/title of work; Content; Subject headings; Persons; Geographica; Translation

III. Information on Publications
1st Publication: Editor, Series, Year, Pg/Nr., Photo SB; Corrections
Republication: Editor, Series, Year, Pg/Nr., Photo SB; Corrections
Further republication: Editor, Series, Year, Pg/Nr., Photo SB; Corrections
Bibliography; Assignments, date; research status; Electronic editor, date; revision history; Notes

IV. System & Image Metadata
Image source; Image arrangement; Extent of the image; Full size front; Full size back; 600 dpi front; 600 dpi back; Availability/System requirements; Scanning Medium; Time to scan; Scanner initials; Creation of image; Institution; Date scanned

V. Information on Cataloging
Cataloger initials; Date Cataloged; Check Transliteration rules for place and person names.

---

02. Cairo Geniza Project at Cambridge ENG
http://www.lib.cam.ac.uk/Taylor-Schechter/GOLD/digital.html

Producing images
The images are obtained directly from the manuscripts. The digital camera is a Kontron ProgRes 3012, which has an Adobe Photoshop plug-in that includes a calibration set-up. Adobe Photoshop is used for image manipulation. Colour matching is by eye. Sharpening is applied. Balance, contrast and brightness are adjusted by eye.
The images are digitized at 300 dpi at 24 bits. Pixel detail is 2400x3200x24 bits or 3200x2400x24 bits. The originals are stored as JPEG at 95% quality, and the delivery images are JPEG encoded at 80% quality. The delivery formats of the images are JPEG or PNG, depending on the browser; the storage format is JPEG. The images are archived in an automated tape archive and on cd-rom.

---

03. CSAD = Centre for the Study of Ancient Documents (Alan Bowman, Oxford)
alan.bowman@christ-church.ox.ac.uk
http://www.csad.ox.ac.uk/CSAD/Images.html#images

See also C.V. Crowther, "Imaging Inscriptions," and Alan Bowman et al., "Imaging Incised Documents," in L&LC 12 (1997) 163-176.

Building an Image Bank of Inscriptions

Aims
The imaging project for inscriptions developed out of an initiative to reorganise and catalogue Oxford University's squeeze collection and make it accessible as a research resource to the widest possible audience. The project draws its inspiration from the work undertaken at Michigan and Duke universities within the framework of APIS to create a unified database of papyrological resources including texts and, above all, images.

Methods
The requirements of an image database of inscriptions differ from those of a papyrological image bank in a basic respect: inscribed documents are in general much larger than written papyri. This makes it both possible (while retaining the advantages of the digital medium) and, indeed, necessary (in order to keep image sizes within manageable limits) to capture images at a lower resolution than the 600 dpi archival standard prescribed by APIS for papyri. During the initial experimental stage of the project images are being captured at resolutions of 150 dpi and 300 dpi.
150 dpi images are more than adequate for most purposes, but, in order to ensure that the images in the database remain useful for as long as possible, it has seemed better to set the higher resolution as the standard to be used in the second phase of the project, when representative corpora of images begin to be built up.

Images are taken directly from squeezes using UMAX Powerlook Pro and Mirage flatbed scanners and PowerMacintosh 8100 and G3 computers. Images of inscriptions larger than the scanning area are stitched together from separate scans made using the same settings. Minor adjustments for contrast are made in Adobe Photoshop 3.0. Photoshop's Unsharp Mask filter has been applied to the derivative sample images available from this page, but not to the original scanned images. Experiments at photographing squeezes have also been carried out using the Centre's Leaf Lumina digital camera, but, although the results were satisfactory, the flatbed scanner remains the preferred method of capturing images of squeezes.

The primary images acquired for the database will be taken from squeezes. Because these are a secondary medium, the basic images contained in the image bank will necessarily be at least two stages removed from the originals that they represent. A squeeze, moreover, can represent only the inscribed face of an inscription. In a field of study in which mistakes can follow from inattention to the physical context and character of a document, these are potentially serious limitations to the scope of the database. Wherever possible, therefore, images of squeezes will be supplemented by photographs of the original monument. For studying the inscribed text itself, however, images of squeezes have considerable advantages, because they can be captured in controlled conditions and at a uniform scale. It is for this reason that they form the primary focus of the Centre's imaging project.
The quality of the image will depend naturally on the quality of the squeeze, which, in turn, will reflect the preservation of the inscribed surface. The sample images presented here are all taken from good squeezes, but in the case of OGIS 78, although the result is very usable, the difficulties of representing severely eroded surfaces are beginning to become obtrusive. It may be that image enhancement techniques will eventually have to be used to provide acceptable representations of particularly difficult texts. This is an issue that is at present under consideration.

---

04. The Perseus Project
http://www.perseus.tufts.edu/
gcrane@emerald.tufts.edu

---

05. Thesaurus Musicarum Latinarum (TML)
http://www.music.indiana.edu/tml/tmlintro.htm

INTRODUCTION TO THE THESAURUS MUSICARUM LATINARUM AND ITS USE (September 1998)

The Thesaurus Musicarum Latinarum (TML) is an evolving database that will eventually contain the entire corpus of Latin music theory written during the Middle Ages and the Renaissance. It complements but does not duplicate the Thesaurus Linguae Graecae (TLG), Thesaurus Linguae Latinae (TLL), Lexicon musicum Latinum medii aevi (LmL), and similar projects such as the Center for Computer Analysis of Texts (CCAT). The TML, a project of a consortium of universities, is managed by a Project Committee, an Editorial Advisory Committee, and a Project Director. The Project Office is centered at Indiana University--Bloomington. Work on the TML has been partially supported by generous grants from The National Endowment for the Humanities, an independent federal agency.

Graphics Files: The graphics files, by their nature, are somewhat more complex. The GIF format has been selected in preference to any other format for several reasons. First, GIF files are quite small, and thus they can be downloaded very quickly. Second, the format can be read on any of the major hardware configurations with simple conversion programs available as free- or shareware.
Third, the graphics files can be displayed directly online by WWW clients.

---

06. Bodleian Library, Oxford Towards an Image Catalogue . . .
http://www.rsl.ox.ac.uk/imacat.html

Below are small, "thumb-nail" versions of four digitized images taken from the Bodleian Library's slide collection of manuscripts. By clicking on the appropriate images, the larger, higher resolution versions can be brought to the client's computer. Three methods of viewing each image are provided in order to examine their relative merits. By using an "external viewer" the user can manipulate the image with the tools provided by the viewer used by the client software. (Although the GIF and JPEG files are approximately the same size, the JPEG files are potentially of greater quality, being more highly compressed and capable of handling 24-bit colour.) If the image is viewed as an in-line image in an HTML file, it can be presented more in the manner of traditional paper publications with accompanying annotation, but also with hypertext links to other documents, etc. However, modern browsers will usually view all forms in-line rather than spawning an external viewer; in this case, users who wish to take advantage of the tools provided with an external viewer should download the image first.

The four images are taken from a collection of approximately 30,000 35mm slides of iconography from manuscripts held in the Bodleian Library, Oxford, United Kingdom. The GIF images were converted from JPEG images that had been processed, using Adobe Photoshop, from Photo-CD images produced from the original 35mm slides; they have been compressed approximately 10-fold from their original file sizes.

---

07. Digital Scriptorium, University of California at Berkeley
http://sunsite.berkeley.edu/Imaging/

Articles & Papers

Digital Imaging for Photographic Collections: Foundations for Technical Standards
An article by Franziska Frey, Research Scientist, Image Permanence Institute, in RLG DigiNews.
Reproduction Quality Issues in a Digital Library System
"Observations on the Reproduction of Various Library and Archival Material Formats for Access and Preservation" from the Library of Congress.

Example Decisions on Digital Image Formats
Practical decisions on image formats for preservation and network access by the UC Berkeley Library.

Technical Recommendations for Digital Imaging Projects
An excellent "best practices" document from Columbia University.

Toward On-line, Worldwide Access To Vatican Library Materials
An article from the IBM Journal of Research and Development.

---

07.1 Proposed Standards
http://sunsite.berkeley.edu/Imaging/Databases/#standards
This page was compiled by Howard Besser and refined by Rachel Onuf, April 1996. Last edited 11/18/97.

Index to this set of links:
- General Overviews of Digital Imaging
- Information about Metadata and Standards
- Ethical & Legal Issues
- Image Capture and Compression
- Image Quality & Conservation
- Technical Protection for Images
- Retrieval issues and search engines for WWW Databases
- Content-based Retrieval
- Costs of Digital Libraries
- Image Databases on the Net
- Cultural Repositories' Collections
- Unique or Special Image Databases
- Three Dimensional Images
- Sources for Digital Images
- Large-Scale Projects and Initiatives
- Smaller-Scale Projects and Initiatives
- Sample implementations with Image Browsers
- Digital Library Initiatives
- Additional Resources
- Multimedia Resources
- Research Papers
- Jobs
- Misc

---

07.2 Image Standards Needed
http://www-personal.si.umich.edu/~howardb/ImageDB/standards.html
Prepared for Napa CIMI Meeting by Howard Besser, Consultant, Getty Art History Information Program
howard@info.berkeley.edu

Major Questions:
1. Which information must be placed in the image header, and which can be placed in an accompanying text record?
2. For each piece of information we must define a controlled vocabulary (or lack of control) and an identified field to place it in.
3. For which of these can we use existing standards?
Adapt existing standards to our needs? Work with other bodies to make sure the standards they adopt will incorporate our needs? Set the standards ourselves?

Information needed to view the image
- type (bit-mapped, vector, video)
- format (TIFF, GIF, JFIF, PICT, PCD, Photoshop, EPS, CGM, TGA, ...)
- compression (JPEG, LZW, Quicktime, ...)
- dimensions and dynamic range
- CLUT
- color metric (CMYK, RGB, ...)

Information about the quality and veracity of the image
- source image digitized
  - the source of that image (recursively)
  - source type
  - source ID
- institution responsible for creation of the digital image
- information about the scanning process
  - light source (full spectrum, infrared, ...)
  - resolution
  - dynamic range
  - type of scanner (for color rebalance)
  - date of scan
  - scanning personnel (in-house information)
  - a journal/audit trail of what is done to each image and when it was done (crop, color, adjust, ...)
- digital signatures, authentication, ...

Description of depiction/surrogate
- VRA terminology on perspective, position, orientation, aspect, ...
- linking between various views of the same original

Description of original object
- AITF categories
- Systematics
- AACR2
- etc.

Rights and Reproduction Information
- copyright on original, digital copyright, surrogate copyright
- name of rightsholder
- use restrictions (viewing, printing, reproducing, ...)
Location Information
- URNs, URLs, URCs
- different versions (browse, hi-res, medium-res) derived from the same scan

The CIMI meeting has organized these in the following way (with the header information being a subset of the technical data fields):

Image File with Header
1. Information needed to view the image
2. Rights and Reproduction Information

Image Technical Data Fields
1. Information needed to view the image
2. Information about the quality and veracity of the image

Content
1. Description of depiction/surrogate, surrogate history
2. Description of original object
3. Rights and Reproduction Information

Location
1. Location Information

---

08. The Art Museum Image Consortium (AMICO)
http://www.amico.net/docs/dataspec.final3.shtml

The Art Museum Image Consortium (AMICO) is a not-for-profit association of institutions with collections of art that have come together to enable educational use of the digital documentation of their collections. Together, AMICO Members are building a joint digital library documenting their collections; this Library will be available to University, Public Library and Kindergarten through grade 12 educational communities.

AMICO MEMBERS: FALL 1998
1. Albright-Knox Art Gallery, Buffalo, NY
2. Art Gallery of Ontario, Toronto, Ontario
3. Art Institute of Chicago, Chicago, IL
4. Asia Society Gallery, New York, NY
5. Center for Creative Photography, Tucson, AZ
6. Cleveland Museum of Art, Cleveland, OH
7. Davis Museum and Cultural Center, Wellesley, MA
8. Fine Arts Museum of San Francisco, San Francisco, CA
9. The Frick Collection (including the Frick Art Reference Library), NY
10. George Eastman House, Rochester, NY
11. J. Paul Getty Museum, Los Angeles, CA
12. The Library of Congress, Washington, DC
13. Los Angeles County Museum of Art, Los Angeles, CA
14. Metropolitan Museum of Art, New York, NY
15. Minneapolis Institute of Arts, Minneapolis, MN
16. Museum of Contemporary Art, San Diego, CA
17.
Montreal Museum of Fine Arts, Montr=E9al, Quebec 18. Mus=E9e d'art contemporain de Montr=E9al, Montr=E9al, Quebec 19. Museum of Fine Arts, Boston, MA 20. National Gallery of Canada, Ottawa, Ontario 21. National Museum of American Art, Washington, DC 22. Philadelphia Museum of Art, Philadelphia, PA 23. San Francisco Museum of Modern Art, San Francisco, CA 24. San Jose Museum of Art, San Jose, CA 25. Walker Art Center, Minneapolis MN 26. Whitney Museum of American Art 1. "MAIN IMAGE" CONTRIBUTED TO AMICO Each work of art contributed to the AMICO Library must be documented by at least one image, showing a full view of the work. 1.1 Resolution Images contributed to AMICO should have a minimum resolution of 1024 x 768 pixels. Members are encouraged to contribute larger files, to allow for user manipulation. [Existing images at resolutions below 1024 x 768 (such as those at 800 x 600 created by the National Gallery of Canada for its CD-ROM project) will be accepted as part of the university testbed library, with the understanding that they may have to be replaced in subsequent years. Those museums who intend to submit at a lower resolution, are requested to post notification to the AMICO Technical Operations Committee Hypernews Discussion. It is not recommended that images created for AMICO be at resolutions less 1024 x 768.] The University Testbed project will study what resolution is required for particular functions (e.g. project, lab study on a monitor) as an explicit part of its research agenda. These findings will be incorporated into the next round of submissions of image data to AMICO. 1.2 Bit Depth: All images will be 24 bit color. 1.3 File Format/Compression: Images should be submitted to the AMICO Library as uncompressed files in the TIFF format. This will enable future compression by a distributor without loss of quality. Certain research uses also require uncompressed images. 
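The submission requirements in sections 1.1-1.3 (minimum 1024 x 768 pixels, 24-bit color, uncompressed TIFF) lend themselves to a mechanical check. The following is a minimal sketch, assuming the image properties have already been read from the file header by other tooling; it is an illustration of the stated specs, not an official AMICO validation tool:

```python
# Check an image's properties against the AMICO contribution specs
# described above: minimum 1024 x 768 pixels, 24-bit color,
# uncompressed TIFF. Returns a list of problems; empty means it passes.
def meets_amico_spec(width, height, bit_depth, file_format, compressed):
    problems = []
    # Accept either orientation (1024 x 768 or 768 x 1024).
    if min(width, height) < 768 or max(width, height) < 1024:
        problems.append("resolution below 1024 x 768 minimum")
    if bit_depth != 24:
        problems.append("bit depth is not 24-bit color")
    if file_format.upper() != "TIFF":
        problems.append("not in TIFF format")
    if compressed:
        problems.append("file is compressed; uncompressed TIFF required")
    return problems

print(meets_amico_spec(1024, 768, 24, "TIFF", False))  # [] (passes)
print(meets_amico_spec(800, 600, 24, "TIFF", False))   # resolution problem
```

A real workflow would also have to read these properties from the TIFF header itself; that step is omitted here.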
AMICO members will also be given the opportunity to produce their own compressed files to the specifications of any particular distributor. These files would be provided in addition to the uncompressed TIFF and on a schedule defined with the distributor. Distributors will be asked to declare their image sampling/compression processes for AMICO review and approval. AMICO members will also be given an opportunity to review the results of distributors' compression routines.

1.4 File Names
All associated media files (text, image, multimedia, etc.) will follow the same naming and linking conventions. Filenames will be entered in the appropriate linking field in the AMICO main catalog record: Related-Image-Identifier/link, Related-Document-Identifier/link, or Related-Multimedia-Identifier/link. As there are many differing schemes in use in AMICO member institutions, file names will not be assumed to have any meaning. The characteristics of images will be recorded in their accompanying metadata records. ...

2. METADATA FOR RELATED IMAGE AND MEDIA FILES
Each image or other media file will be accompanied by a separate structured text metadata record containing the minimum fields, as specified. Sample metadata records are posted on the AMICO Web site, along with sample image and media files. Delimiters and character set specifications are identical for metadata records and for the AMICO main catalog record.

3. RELATED MULTIMEDIA FILES
Associated media files will be contributed to the AMICO Library in their native format, i.e., QuickTime, WAV, RealAudio, etc. Files must be accompanied by a metadata record and referenced by an entry in the Related-Multimedia group field of an AMICO Catalogue Record. Validation routines will test to ensure that all files cited are present and that all files contributed are referenced in a record.

4. RELATED DOCUMENTS
Related textual documents can be contributed in any common format:
- page images can be in TIFF, GIF, or JFIF file format
- text documents can be in ASCII or Rich Text Format (RTF)
- marked-up texts can be in Hypertext Markup Language (HTML) or Standard Generalized Markup Language (SGML)

Each related document file must be accompanied by a metadata record and referenced by an entry in the Related-Documents group field of an AMICO Catalog record. Validation routines will test to ensure that all files cited are present and that all files contributed are referenced in a record.

---

09. Relationship to the Text Encoding Initiative
http://lcweb.loc.gov/catdir/semdigdocs/hockey.html

5. The TEI and Digital Libraries
The TEI's application of SGML satisfies many requirements of the digital library. Its scope already covers the major text types and, because of the modular DTDs, it is easily extended to new text types. It can handle multiple views and interpretations of a text and, through the header, it provides mechanisms for documenting the text. Furthermore, the use of SGML is not restricted to text. It can be used to describe images and other non-textual material and thus provides the link between a digital image of a text and its transcription. The TEI has extended the cross-referencing systems within SGML to enable them to point to complete texts or sections of text stored elsewhere as images or transcriptions.

---

09.1 TEI and XML in Digital Libraries
http://www.hti.umich.edu/misc/ssp/workshops/teidlf/teigrp3.html

TEI and XML in Digital Libraries, June 30-July 1, 1998, Washington DC.
Working Group 3: Structural and Administrative Metadata in Page-Image Conversion Projects
Discussion Summary and Recommendations

Recommendations:
1. Decide on common elements for structural and administrative metadata. Determine whether a common syntax/vocabulary for elements is necessary, and if so, define it. Examine possibilities for maintenance of the list.
Must be completed before making recommendations for transfer mechanisms.
2. Consult other stakeholders (e.g., preservation community, publishing community) when determining the list of common elements. The list should be communicated to the TEI leadership when complete.
3. Flesh out the implications of choosing SGML as a common syntax. (Consider TEI, EAD, RDF, etc.)
4. For those who want to use SGML, decide whether TEI should be the common DTD or whether we should allow a range of DTDs.
5. Look into means for facilitating development of and access to the tools we need for collecting and sharing/exchanging metadata. This may involve exploiting the commercial market, but we cannot rely on commercial interests to serve our needs.
6. Raise the level of consciousness among decision-makers in the library community about the realities of technical requirements, costs, etc.

---

10. SURVEY ON PHYSICAL PREPARATION OF MATERIALS TO BE DIGITIZED
http://www.rlg.org/preserv/joint/index.html

*** All responses welcome! Please submit by 14 September ***

The Research Libraries Group (RLG) and the National Preservation Office (NPO UK/Ireland) are jointly sponsoring a conference 28-30 September on the overall theme "Guidelines for Digital Imaging." In order to gather information about the processes employed in the preparation of materials for digital imaging, the Working Group on Preparation has devised a public survey tool. The survey is web-based and is available through the conference site:
http://www.rlg.org/preserv/joint/preparation_survey.html
or http://www.thames.rlg.org/preserv/joint/preparation_survey.html (from Europe)

The following brief description is taken from the introduction to the survey: "The Working Group on Preparation is trying to assess current practice relating to preservation aspects, prior to, and during, digital image capture. It is hoped, eventually, to produce best practice guidelines which will reflect preservation needs/concerns during digital image processing."
The following topics will be covered:
- Guidelines for the selection of collections;
- Guidelines for the preparation of materials;
- Guidelines for digital image capture;
- Issues and approaches to preservation metadata;
- Progress toward evolving best practices in digital archiving.

For each of the sections addressing "guidelines," working groups have been formed by the speakers in order to gather information about the current state of each topic from an international perspective. It is through this information gathering and sharing process that we may begin to identify areas in which consensus might exist, as well as areas where further research and development may be needed.

For more information on the conference, visit the web site at:
http://www.thames.rlg.org/preserv/joint/index.html (from Europe)
http://www.rlg.org/preserv/joint/index.html (from North America)

---

10.1 Digital Master Quality
http://www.rlg.org/preserv/joint/imaging.html

Objectives: to produce good scans; and to ensure persistence

Questions:
Are good scans and good images necessarily the same thing? What does "preservation quality" mean? Does the concept of "sustainable loss" apply to preservation imaging as it does to traditional photographic methods? More specifically, are comparisons to photographs, photocopies, or microfilm apt? What metric(s) do we use to judge the quality of digital masters? If subjective (e.g., going print-to-print), when is it acceptable to use uncalibrated devices? If objective, which targets and software now exist to make the practice viable? Should we adopt the Quality Index as a method to define and describe image quality? When should targets be used? What are the optimal specifications (for any class of material or digital image) for pictorial reproduction? (Attributes include color, tone, and detail reproduction; dimensions, cropping, skew.) Other than pictorial reproduction, what other attributes of the digital master should be addressed in these guidelines?
(Attributes include file format, compression, color encoding method, tone distribution, file header information.) Under what circumstances is less than best practice acceptable? In hybrid approaches where film is scanned (including Photo CD projects), is it acceptable to issue scanning guidelines without addressing/prescribing the preferred techniques for photography?

Resources

---

10.2 Metadata issues
http://www.rlg.org/preserv/joint/supdocs.html

The RLG working group deemed the following 16 metadata elements crucial to the continued viability of a digital image master:
1. Date
2. Transcriber
3. Producer
4. Capture Device
5. Capture Details
6. Change History
7. Validation Key
8. Encryption
9. Watermark
10. Resolution
11. Compression
12. Source
13. Color
14. Color Management
15. Color Bar/Gray Scale Bar
16. Control Targets

//end//