Site: Keio University
HUMI Project
2-15-45 Mita, Minato-ku,
Tokyo, Japan
http://uk@tempus.keio.ac.jp
Date Visited: 25 March 1998
WTEC Attendees: J. M. Mendel (report author), R. Chellappa, B. Davis-Brown, L. Goldberg, R. Larsen, R. Reddy
Hosts:
This section was extracted from the typewritten remarks of Professor Takamiya and Mr. Iwai, who generously provided them to the panel at our request.
The Humanities Media Interface Project (HUMI) was launched at Keio University in Spring 1996, with the aim, among others, of digitizing major rare books and manuscripts--Western, Japanese and Chinese--in the Keio collection, including the Keio Gutenberg Bible. (Obtaining the Gutenberg Bible was remotely connected with Keio University's founding president Yukichi Fukuzawa, who saw the Gutenberg Bible on his visit to St. Petersburg as early as 1862.) The HUMI Project has been supported by the Education Ministry, the Information-Technology Promotion Agency (IPA), which is attached to the Ministry of International Trade and Industry, and Keio University.
The Keio University Library has a very large collection of rare books, including 8,000 Western rare books. Project participants seem to have a very progressive view of digitization of books, namely that, once digitized, the book can be reassembled any way a person wants. The Keio Gutenberg Bible has played a very important role in the HUMI Project. It was acquired not just for possession of an important article of Western cultural heritage, but because Keio University believes that modern research libraries should possess works significant enough to be digitized for the benefit of today's scholars and for the greater goal of preserving these treasures for posterity without further decay.
Prof. Takamiya summarized the reason for the HUMI Project. He pointed out that, "Digitization means more than just creating a passable facsimile on a computer screen. It is an opportunity for transcending the confines of the traditional format, with its bound pages. Once digitized, every component can be unbound and rebound in an infinite number of ways. The book becomes a new entity in 'cyberspace'--perhaps more vivid than ever possible in the real world, where rare books are often inaccessible. In the worlds of virtual reality we can re-experience it in a personal way. This means that digitized rare books, including the Gutenberg Bible, will never become forgotten relics of past wisdom. They will come alive every time someone has access to them. This, then, is the raison d'etre of the HUMI Project."
According to Prof. Takamiya, "The HUMI Project aims to digitize manuscripts and rare books, process them, research them, and provide online access to multimedia representations. Data and results will be transmitted via high-speed networks and the Internet. The global academic community will thus be able to use this material for education and research. In terms of digitization, there are two roles for the HUMI Project: (a) to establish the foundation of digital technicalities from a viewpoint of research in humanities, and (b) to explore the possibility of producing what should be called digital bibliology by applying digital imaging techniques to history of the book, information management, and pedagogical presentation."
As an inter-faculty initiative organized by Keio University, the HUMI Project is envisioned as a first step in the establishment of a digital research library based on the rare book collection at Keio. Bibliographical analysis of the rare books and manuscripts has been conducted by members of the English Department, led by Prof. Takamiya; non-destructive testing has been performed at two research laboratories in the Faculty of Physics and Technology under the supervision of Profs. Ozawa and Inoue, respectively; and virtual reality applications have been developed under the guidance of Prof. Okude of the Faculty of Environmental Information. Technical aspects of the HUMI project have also been supported by various firms forming a consortium.
The HUMI Project began its activities by taking advantage of the Japanese government's request for participation in the electronic library pilot project. This governmental project has its origin in the fact that Japan was nominated as one of the main promoters of a global electronic library at an international summit conference. In 1995, the government established the Center for Information Infrastructure (CII) at Keio University's Shonan-Fujisawa Campus as a part of the activities conducted by the Ministry of International Trade and Industry's Information Technology Promotion Agency (IPA). The Electronic Library Pilot Project began by digitizing the resources of the National Diet Library, including 10 million pages from Japanese rare books and other materials. Many of these digital resources have already been opened up to the public through the Internet.
The Keio University HUMI Project began its partnership with CII in 1997, and provided digital images of the Keio University collection, which included both oriental and Western rare books.
In March 1997 project members successfully digitized a complete set of images of the Keio Gutenberg Bible (about 650 images). The group used a digital camera jointly developed by NTT (Nippon Telegraph & Telephone) and Olympus Optical Company. Since this camera is an experimental one-shot 3-CCD model, it took only a few seconds to acquire a full color high-resolution image (2,048 x 2,048 pixels). With this camera and a special book cradle developed by the HUMI Project, the team also successfully digitized the Cambridge University Library copy of the Gutenberg Bible (2 volumes, about 1,300 images) within four days in November 1998.
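The image dimensions and counts quoted above allow a rough estimate of raw storage needs. The sketch below is illustrative only: the report does not state the bit depth or file format, so 24-bit RGB (3 bytes per pixel) with no compression is an assumption, and the function names are hypothetical.

```python
# Rough storage estimate for uncompressed page images.
# Assumption (not stated in the report): 24-bit RGB, i.e. 3 bytes per pixel,
# stored without compression or file-format overhead.

WIDTH_PX = 2048          # image width quoted in the report
HEIGHT_PX = 2048         # image height quoted in the report
BYTES_PER_PIXEL = 3      # assumed: 8 bits for each of three color channels


def image_bytes(width=WIDTH_PX, height=HEIGHT_PX, bytes_per_pixel=BYTES_PER_PIXEL):
    """Return the raw size in bytes of one uncompressed image."""
    return width * height * bytes_per_pixel


def collection_bytes(image_count, per_image=None):
    """Return the raw size in bytes of image_count images of equal size."""
    if per_image is None:
        per_image = image_bytes()
    return image_count * per_image


def to_mib(n_bytes):
    """Convert bytes to mebibytes (2**20 bytes)."""
    return n_bytes / 2**20


def to_gib(n_bytes):
    """Convert bytes to gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


keio_images = 650  # "about 650 images" for the Keio copy
print(f"one image: {to_mib(image_bytes()):.1f} MiB")
print(f"{keio_images} images: {to_gib(collection_bytes(keio_images)):.2f} GiB")
```

Under these assumptions one image is 12 MiB and the roughly 650 images of the Keio copy come to about 7.6 GiB before compression; a different bit depth or a compressed format would change the figures proportionally.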
Prof. Matsuda described the online catalog (OLC) of Western manuscripts and rare books in the Keio University Library. He emphasized non-traditional access points (in addition to author and similar information) and that the catalog is non-static and can constantly change. Different experts can add to the index, based on their interpretation of an item, and the index is easily updateable. There are no current plans for collaboration within Japan on the OLC and digitization in the area of rare books and manuscripts. To date the project has digitized 5,000 pages in about two years. High quality and resolution are emphasized. The project managers are considering repeated re-digitization as higher-quality digitization equipment becomes available.
The WTEC team members then toured three photo labs in the old library, in which project staff members are experimenting with different camera techniques ranging from high speed digital cameras to slower, but higher-resolution line scanning cameras. One laboratory contained a very high-speed digital camera, an NTT-Olympus prototype that takes 5 sec/page to get a large image onto a display. Curvature of the page is a problem. Bleed-through from the back of a page (which is actually present because the rare manuscript was originally written and illustrated on both sides) needs to be removed digitally, if the viewer so desires. This camera is used to copy an entire book very quickly. In a second laboratory, a Dicomed digital camera back with a Mamiya RZ67 camera is used to digitize Western illustrated books. The camera has a viewfinder and takes about 2 minutes per page. In the third laboratory there is a Kodak Professional PCD Scanner 4045 which is being used to scan 4 x 5 and 6 x 7 films.
The team members then went to the new library where they were given a demonstration of a virtual tour of the monastery of San Marco in Florence, Italy. This is in Prof. Okude's laboratory. Unfortunately, he was traveling; however, his student, Ms. Tomoko Ushiyama, gave the team a wonderful presentation. The tour was displayed on three flat screens using back projection; each screen has its own projector. No glasses were required. One thousand photographs were taken at the monastery. These were then used with a 3D modeling package to create the tour. Buildings and surroundings were all synthesized, whereas the artwork was all photographed. This required 200 MB of storage. On the virtual tour it is possible to zoom in on the many works of art. The tour is controlled using a joystick. This is a wonderful example of how digital information can be used for education and learning about artwork at a location that most people will not have the opportunity to visit.
A digital Gutenberg Bible was demonstrated. It was pointed out that today almost no one can read or touch this kind of rare book; but, in the virtual reality environment, researchers, even young students, can access the Bible directly. The human interface of turning the pages helps researchers learn intuitively. One can see the Bible as close up as possible, and pages can actually be turned. The Bible can be opened and closed, and we can look at its cover. Signatures of its past owners can be found, so we can get to know who kept this Bible in the past. It's possible to tear a page and see several pages at one time.
WTEC's hosts then described two technical problems for their project:
Finally, the WTEC team had a short question and answer period with the Keio University hosts. On the question of university/industry collaborations, they invited computer companies to join in a consortium, and 20-25 joined. Hitachi has been very helpful; NTT provided the digital camera (they want to be able to share the results of the HUMI research just for publicity purposes); and, Hitachi provided digital imaging systems for removing stains and processing of very high-resolution images. Keio University has excellent connections with companies; their graduates now occupy very high management positions in the 20-25 companies and are very supportive of their work.
On the question of making the virtual reality space available to others, it was stated that the space will be made available to researchers, and it is not going to be used just for demonstrations.
On the question of what lessons were learned and can be shared from their experiences, project team members stated that international collaboration would be very useful on such a project. In addition, the time it took to scan the 600 pages of the Gutenberg Bible (two weeks) scales up, so their experience can be used to help estimate the costs of other projects.
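The scaling remark above can be illustrated with a simple proportional estimate. This is only a sketch: the hosts did not describe how they scale their figures, so linear scaling is an assumption, as is reading "two weeks" as 14 calendar days; the function name is hypothetical.

```python
# Proportional time estimate based on the reference figure in the report:
# 600 pages scanned in two weeks.
# Assumptions (not from the report): effort scales linearly with page count,
# and "two weeks" is taken as 14 calendar days.

REFERENCE_PAGES = 600
REFERENCE_DAYS = 14


def estimate_days(pages, ref_pages=REFERENCE_PAGES, ref_days=REFERENCE_DAYS):
    """Estimate scanning time in days for `pages`, scaled linearly."""
    if pages < 0 or ref_pages <= 0 or ref_days <= 0:
        raise ValueError("page counts and durations must be positive")
    return pages * ref_days / ref_pages


# Example: a hypothetical two-volume work of 1,200 pages.
print(estimate_days(1200))
```

A real cost model would also need to account for setup, handling of fragile volumes, and differences in equipment speed, none of which are captured here.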
Answers to a large collection of questions that were sent ahead of the panel's visit are provided below. The questions were circulated among members of the HUMI Project and were then compiled and transmitted by Kenji Umeto, Secretary, HUMI Project, Keio University (uk@tempus.keio.ac.jp).
[Okude] = Naohito Okude, Professor, Faculty of Environmental Information, Keio University
[Hosono] = Kimio Hosono, Professor, School of Library Science, Keio University
[Shibukawa] = Masatoshi Shibukawa, Professor, Faculty of Environmental Information, Keio University
[Armour] = Andrew Armour, Associate Professor, Faculty of Letters, Keio University
[Iwai] = Shigeaki Iwai, Lecturer, Faculty of Letters, Keio University
1. Please describe your long term vision or scenario for:
[Okude]
Digital information technology offers the most extraordinary opportunities to teach and study the liberal arts in new ways. Digitization of the liberal arts drastically democratizes them. The people who developed computer literacy perceived it as a device of democratization from its inception. This democratization is the most powerful influence of digital technology on modern thinking.
[Hosono]
In the academic environment digitized and printed information should coexist. Roles that printed information like academic journals have played cannot be completely replaced by digitized versions in the near future. Digital information is not necessarily reliable in terms of its quality, stability and durability.
[Shibukawa]
The supposed digital library could be considered as logistics of supplying any necessary information to common people, which would thoroughly differ from what we call 'library' now.
The current library, though useful, is not able to provide all the information concerning people's everyday life (personal, domestic, professional, or social). This, however, is the goal of the digital library: it must enable people to "live" using the digital network, in which all the digitized, organized, and united information can be retrieved. It is not predictable when and how such a system will be realized; its dynamics would be a harbinger of a social change. The "library" has progressed for 5,000 years, and the realization of the digital one will still need some other years though it will come true before the quincentenary of the Gutenberg revolution. This view is based on the statements of Fukuzawa Yukichi ("Knowledge develops courage," 1879), P. Butler ("Books are one social mechanism for preserving the racial memory and the library one social apparatus for transferring this to the consciousness of living individual," 1933), and P. Barker (his scenario from "Polymedia libraries" through "Electronic libraries" to "Digital libraries," 1996).
[Okude]
The role of the nineteenth century library as the custodian of physical printed materials will remain, but the digital libraries will become distributed information managers of the links to other digital libraries. A grand distributed global digital library is the dream and the final goal of the digital libraries' endeavor.
[Hosono]
Digital libraries could be defined in several ways, such as networked information resources, digitization of traditional libraries (i.e., integration of digital collection and the systems for utilizing it), computer systems emulating fundamental library functions, etc. If they are recognized as digitization of traditional ones, they may not become popular in the near future because of copyright issues, the difficulty of establishing inter-organizational management policies, the instability of methods and technologies to capture and represent digital contents, etc.
In addition to the frequently discussed copyright issues, we will have to face several kinds of managerial ones. One example is deciding which materials in a library's collection should be digitized (i.e., priority issues). As long as we limit the objectives of digitization to research using a particular IT, or to feasibility studies of it, the issues may not be so tough. If, however, we seek to digitize works on an operational basis, the situation will change drastically. In this case, the following issues must be settled adequately, which is not at all easy: (1) Who is responsible for making decisions in terms of selection, processing, maintenance and management of materials that are to be digitized? (2) How can we carry out cooperative digitization activities with other institutions in order to avoid duplication and establish a network to share the products among them? (3) Where should a digital collection be preserved and archived as the last resort for academic research and studies?
In addition, a lot of issues are left unsolved in terms of managing digital information provided by publishers, such as electronic journals. Issue (3) above is also applicable here.
2. How do national and international intellectual property laws and commercial regulations or practices affect development, deployment, and utilization of digital information?
[Shibukawa]
Technologies including code and electronic watermark might be effective against the illegal use of intellectual property to a certain degree. We, however, are pessimistic over the development of technologies that can exterminate illegal use, especially when considering the social opinion that knowledge and information are common property of mankind on the one hand, and the existence of a genuine interest in deciphering itself on the other hand. We, for the present, endeavor to establish a proper standard of the license contract with social consent and a system to monitor compliance with the contract with technological assistance.
It is problematic (both to social/cultural development and that of business) to insist on ownership of the creativity, regarding it as property, to be concerned only about its illegal use, and to seek the way to solve the problem only through technologies. It is far more important to foster a common opinion that a proper royalty ought to be paid for valuable information whether it is a property or not.
3. Please explain how your organization sees the relationship between digital library and electronic commerce.
a. What are the economic or business models that apply to digital library in Japan?
[Okude]
Education and learning is a lifelong pursuit. Within a few decades, people in Japan will come to the university in broken times and take more than four years to graduate; more years to study, and more study. This fragmented and discontinuous pattern is more the exception than the norm now, but students in the future will attend in broken times often at more than one institution. People will want to study and learn more in the future. This knowledge consumer market is the digital libraries' business domain.
b. How have those models influenced the directions digital library technologies have taken and will take in the future?
[Okude]
Learning has always been a people-to-people process. The digital library technology will promote a computer-mediated people-to-people learning process. Technology will be required to expand the libraries' traditional areas, such as information retrieval and distance learning, to the new frontier of information work application to assist the distributed constructionism learning process, using the network system.
[Shibukawa]
Keio University, as an academic institute, has no need to relate the digital library to electronic commerce. Yet present higher education, a public enterprise though it is, could vie with broadcasting and newspapers in their fields, if it will be able to provide lifelong education.
In this sense, education in universities ought to take the economic and marketing model of the mass media as an example. Since the electronic trade which deals in intellectual property will more and more become dominant in such enterprises, the digital library, which is to support the future digital university, will probably provide the information service on the basis of the electronic trade itself.
4. Which sectors of the information technology economy (consumer goods, information services, hardware, business computing, educational technology, etc.) will be the main beneficiaries of future Japanese digital library technologies?
[Shibukawa]
When the digital library realizes the prospects offered in the answer to item A.1, it will produce far-reaching benefits to every concerned area. As a university, however, we hope that the digital library will grow beneficial to academic research and education.
[Okude]
Educational and "research" technology.
5. How do you see internetworking, the convergence of communication and computation, and new distribution technologies for digital data as changing the nature of digital libraries?
[Shibukawa]
If the prospects offered in the answer to item A.1 prove to be right, the development of digital information technology as well as the creation of information contents determines the function and structure of the digital library. But we should note that the rate of the development of internetworking, the convergence of communication and computation, and new distribution technologies for digital data are closely connected with the demand of society and people for them.
[Okude]
When people and organizations all have computers and all these computers are interconnected, they will buy, sell and freely exchange information and information services. The digital libraries will become distributed information managers of the links to other digital libraries.
6. What are the main trends in content creation technologies?
a. How would you characterize the various market segments for content creation technologies (publishing, entertainment, consumer electronics, education, business, government)?
[Okude]
The real market for digital technology is not the "information market" but the "information work" market. The technologies for information work let a person or a computer program take in information, transform it, and send it out. Today's content creation technologies do not fulfill these conditions.
b. Do you see the need for specialized content creation and management technologies for the separate sectors?
[Okude]
No. What we need are interactive technologies for information work in general.
c. Which sectors do you think will drive the industry in 5 years? 10 years?
[Okude]
Education and learning will be the huge market when the distributed digital libraries and the information work technologies are available to the content creators.
[Shibukawa]
Here we cannot enumerate all of the segments because of limited space, but it can be said that in Japan digital contents have recently been created in various fields including news, library, museum, and the medical industry. The Database Register (the Ministry of International Trade and Industry, annual) reports the details.
I myself strongly feel it necessary to construct the image information database as an enterprise of a public sector; for we now reach the point where we should reconsider the information of the past with the assistance of graphic images, and there must be a great amount of such graphic images. Nonetheless, graphic images, which have not been regarded as an information medium, are neither collected nor organized, and are therefore unavailable. Books indeed pass on to the next generation some of the past information, but not all. As texts with graphic images could probably convey information in full, preservation of the graphic images of the past would provide a new perspective for the present and the future.
In the future, the public and business sectors will cooperate in, or compete for, content creation, and so will industries and companies within the business sectors; but, it is the marketability, that is, people's needs, that will determine the direction.
1. What are the public policy drivers of digital library in Japan?
a. How are ministries and agencies tasked and funded to implement these policies?
[Hosono]
National Center for Science Information System (NACSIS), Information Technology Promotion Agency (IPA), and National Diet Library (NDL) are taking initiatives to promote digital libraries.
NACSIS is distributing electronic journal articles directly to scholars and researchers, not via university libraries. These journals are limited to the ones published from learned societies. NACSIS's main aim is to provide academic information to end-users as effectively and efficiently as possible via the network.
IPA, which is an extra-governmental body of the Ministry of International Trade and Industry, has focused its emphasis on the technological aspect of digitization and has financially supported R&D projects carried out by computer/network companies.
NDL has tried to digitize its unique collection to make clear problems such as copyright, user-interface, efficiency of operation, etc. The project has been directed at operational systems.
b. How does the government stimulate or partner with industry in the definition, standardization or commercialization of new DL technologies?
[Hosono]
IPA has a strong direct partnership with industry and tries to support technological development related to digital libraries since such technologies have wider influences on other fields. On the other hand, the role of the Ministry of Education is indirect regarding technologies, since it financially supports university libraries as a whole when they intend to construct digital libraries. So far, at least three national university libraries have embarked on digitization projects.
2. What are the current public sector priorities and programs (education, health, social services, the arts and culture, etc.) for digital libraries?
[Hosono]
Since the core of digital libraries is the "contents" themselves that are to be digitized and/or utilized, the main governmental activities or programs should be to create and foster a good environment where large volumes of digital information can be easily created and disseminated. Thus it is vital to establish new copyright laws or revise existing ones and strengthen network infrastructure. Introducing new concepts, atmosphere, customs and institutions to encourage digitization activities is also required.
[Shibukawa] (Answer to B.1 & B.2)
The national policy of Japan for founding the digital library is complicated under the conflicting jurisdictions of the Ministry of International Trade and Industry, the Ministry of Education, Science and Culture (MESC), and the Ministry of Posts and Telecommunications. Only MITI has secured the source of revenue and carried out a plan for promoting the digital library.
Although it is the MESC that controls academic research and education in universities, it only directs universities to develop the "electronic library" function as an improvement of the university library services. It has, however, supported model digital libraries in a few national universities (e.g., Nara Institute of Science and Technology).
As a private university, Keio University participates in the Demonstrative Experiment in the Pilot Electronic Library, which the Association for Promoting the Information Enterprise (a division of the MITI) runs in partnership with the National Diet Library and commercial publishers. Financially supported by MITI, it aims at becoming an incubator of digital information technologies, library technologies in particular; but, its technological level, based on the graphic image database, is no higher than that of the digital library of Nara Institute of Science and Technology (though the system was to be improved in 1998). In any case, the Japanese government has no clear policy on the digital library.
3. What is the expected role of digital library technology for public, school, research, technical libraries and museums in Japan?
[Hosono]
The expected roles are many. The following are examples: (1) saving the space that printed collections have occupied; (2) strengthening a library collection that is short in volume and variety; (3) expanding the service areas that are physically limited to the inside of a library and to the registered users of the library (this implies that not only will the service be provided to remote users, but also to new customers that formerly were not allowed to receive it); (4) increasing the variety of information that can be searched; and, (5) increasing the service menu (e.g., electronic reference service and online full-text document delivery).
[Shibukawa]
According to Barker's scenario (see the answer to A.1), it is the "traditional" library that will develop from the "book library" through the present "polymedia library" to the "electronic library," and this will also be the case with the museum. University libraries and national museums will play the role of leaders. The digital information and database produced by each library or museum will be rapidly organized.
Such digital information, however, is not composed of new intellectual content, but a legacy of the past, so to speak. New contents have been provided by publishers, newspaper publishing companies, and broadcasting stations. "Electronic publishing" will therefore take the leadership in the development from the "electronic library" to the "digital library" and the "digital museum". In addition, educational and academic research institutes, especially universities that are proposing the "digital university" projects, will play an important role in creative activities.
4. In what ways do you see the traditional skills of librarians, archivists, curators, and information specialists as being utilized or changed by the presence of increasing amounts of digital information?
[Hosono]
These skills are utilized fully for the cataloging, indexing and searching of digital information. Since digital information appears in different representation forms, easy and adequate identification of each item is crucial. Discussions about metadata imply the importance of know-how in traditional cataloging practice. Indexing and searching techniques, having been developed in the library and information science field, are also fundamental for managing digital information.
[Shibukawa]
Following Butler's opinion on the raison d'etre of the library (see the answer to A.1), it can be said that the library, as a device to convey information, must change in accordance with the change of the form of information from "book" to "digital material." Librarians, archivists, curators, and information specialists must also adapt themselves to the change. They ought to develop and acquire the professional skills to provide people with necessary information about books (museum pieces or art objects), digital contents, and computers at their command. Of course, this does not negate the present skills, with which librarians have long administered intellectual contents, since the age of the Alexandrian library or earlier.
Hereafter, however, professional education needs a new curriculum that goes beyond the traditional framework of "book and book library." It is most important to acquire the skills to produce digital information and databases and to manage the "cyber collections" which will come to existence on the cyber network. Such skills are also necessary for the librarians in active service.
1. Please discuss your approach to digital collection development end-to-end including:
a. Capture
[Iwai]
We have been trying the following methods for capturing digital images of rare books: (1) Kodak PhotoCD Imaging Workstation and analog films (4 x 5, 6 x 7, 35 mm); (2) Crosfield drum scanner and analog film (4 x 5); (3) scanning camera (Dicomed Field Pro with Sinar 4 x 5 & Mamiya 6 x 7 on Windows NT); (4) one shot one CCD digital camera (Kodak DCS460); (5) three shots one CCD digital camera (Leaf/Scitex with Mamiya 6 x 7); and, (6) one shot three CCD digital camera (NTT/Olympus SHD View-2 beta version).
In making a comparatively low resolution (approx. 2,048 x 2,048 pixels) but high-quality "digital facsimile" of rare books, we use "(6) one shot three CCD camera," whose advantage is a good balance between capturing speed and quality. At present, the master copy should be produced from big size analog film.
b. Catalog
[Hosono]
Descriptive cataloging practice for printed books can be applied widely but should be expanded to include such information about technologies and/or methods used for capturing and representing digital information. Discussions about metadata are indispensable.
c. Index
[Hosono]
Indexing of digital information is extremely difficult if we expect high retrieval performance, since targets to be indexed are too dispersed to specify and standardize. In particular, index terms that enable us to get access to digital information from its subjects or contents are difficult to determine. A possible way, although its performance is limited, may be to compile a special thesaurus consisting of controlled index terms and to use it as a guide.
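The idea of using a thesaurus of controlled index terms as a guide can be sketched as follows. This is a minimal illustration, not the project's actual catalog design: the thesaurus entries, item identifiers, and function names are all hypothetical.

```python
# Minimal sketch of indexing with a controlled vocabulary.
# The thesaurus below is a hypothetical example: it maps free-text variants
# (lowercased) to a single preferred index term.

THESAURUS = {
    "gutenberg bible": "Gutenberg Bible",
    "42-line bible": "Gutenberg Bible",
    "illuminated manuscript": "Illuminated manuscripts",
    "illumination": "Illuminated manuscripts",
}


def normalize(term):
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def to_index_term(term):
    """Return the preferred term for `term`, or None if it is uncontrolled."""
    return THESAURUS.get(normalize(term))


def build_index(records):
    """Build {preferred term: set of item ids} from {item id: [free terms]}.

    Terms that are not in the thesaurus are skipped rather than indexed.
    """
    index = {}
    for item_id, terms in records.items():
        for term in terms:
            preferred = to_index_term(term)
            if preferred is not None:
                index.setdefault(preferred, set()).add(item_id)
    return index


# Hypothetical catalog records with free-text subject terms.
records = {
    "item-1": ["42-line Bible", "illumination"],
    "item-2": ["Gutenberg  bible", "binding"],
}
print(build_index(records))
```

Mapping variants to one preferred term keeps the index consistent, at the cost noted in the text: anything outside the thesaurus (here, "binding") is not retrievable until the vocabulary is extended.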
d. Representation
[Okude]
I use virtual reality (VR) technology for representation. Bit-mapped, graphics-based supercomputers can run high-speed graphics that track human movement. Immersion, interactivity and information intensity are the three main characteristics of this technology. In the next 10 years, we can expect a widespread and growing experience of virtual reality in a variety of everyday educational and learning environments.
While much of the humanities research community still has ears only for information engineering professionals who speak of being digital, a growing number of humanities scholars are beginning to look at the complex tradeoffs and theoretical shortcomings of the vision of computer professionals. Some scholars are starting to fight digital technology with a Luddite passion.
My approach makes a premise of balance. Virtual reality representation of human artifacts mediates the merger of virtual reality with humanities scholarship. A holographic and multidisciplinary reality will be possible using three-dimensional imaging.
e. Search
[Okude]
Being digital in a research library requires designing a post-Gutenbergian research model of humanities. Contrary to the general assumption that hypermedia obliterates the past, digital technology is radically reconfiguring our understanding of history. Digital technology forces the recognition that texts are not higher than images. Computers rid us of the assumption that sensory messages are incompatible with reflection. Once digitized, fleeting images become available to anyone who "reads" them on a graphic computer. Imaging becomes a rich and fascinating mode for communicating ideas. Diverse phenomenological performances, whether drawings, gestures, sounds, or scents, will be rescued from the past by scripturalist professions.
To conduct image searches professionally in the humanities, serious training in visual proficiency is needed. Image search is an activity that focuses on transdisciplinary problems across multiple and linear disciplines in arts, graphics, film, video, or media production as well as their different histories.
[Hosono]
Searching Japanese texts does not pose severe physical problems. Since there are, however, several ways to divide Japanese texts (sentences) into words, there are many alternatives for search terms. This means exact-match methods do not work well. Sophisticated fuzzy or approximate matching mechanisms for Japanese texts should be developed.
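One widely used way to sidestep the word-division problem is to compare overlapping character pairs (bigrams) instead of words. The sketch below illustrates that general technique under stated assumptions; it is not the mechanism the hosts had in mind, and the function names and sample strings are hypothetical.

```python
def bigrams(text):
    """Return the set of overlapping two-character substrings of text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bigram_score(query, document):
    """Fraction of the query's bigrams that also occur in the document."""
    query_grams = bigrams(query)
    if not query_grams:
        # A query shorter than two characters has no bigrams to compare.
        return 0.0
    return len(query_grams & bigrams(document)) / len(query_grams)


document = "慶應義塾大学図書館の貴重書"
for query in ("大学図書館", "図書室", "東京"):
    print(query, bigram_score(query, document))
```

Because no segmentation into words is required, a query still scores well when it cuts across what a dictionary would treat as word boundaries, and a partial match yields an intermediate score instead of a miss.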
There are "keywords" or "shapes" as access keys for image retrieval. In terms of keywords it is helpful to develop a list of keywords that not only describe objects, phenomena or events but also represent human feeling such as "passion," "peace," "violence," etc. On the other hand, searching by shapes needs pattern matching mechanisms that still have a way to go in their development.
f. Other technologies that are necessary for managing digital collections:
[Hosono]
Technologies are needed that convert a particular digital collection into another irrespective of the lapse of time. Since technologies will change drastically as time goes by, digital information produced in past years must be easily converted to the newest version or accessed by the newest system.
[Okude]
Technologies for distributed libraries are desperately needed. Each library offers its collections in electronic form. To users, the collection of worldwide distributed libraries must look like one uniform library.
2. What are your current technologies and methodologies for preservation and archiving of digital information?
[Okude]
Texture mapping and 3D real-time computer graphics.
[Iwai]
For high-resolution rare book images, we use disk array, DAT tape, CD-R, magneto-optical disk, and DVD-RAM.
3. How do you deal with the many and frequently changing representation formats for digital data? What formats do you currently use?
[Armour]
As regards formats, the problems we face are fundamentally no different from those faced by any enterprise today. We thus tend to choose the most commonly accepted formats for binary data: Microsoft Word, RTF, TIFF, JPEG, etc. It could be argued that, in fact, there are fewer problems faced today by "many and frequently changing representation formats" than 10 years ago; there is, for example, less interest in format-conversion utilities than there used to be. Our concerns are perhaps more about the preservation of information, whether it be accented European characters (not available in Shift-JIS) or fine image detail (JPEG lossy compression is only used for Web delivery). Provided that this information is stored in one of today's common formats, we assume that it will still be accessible after, say, 10 years, when conversion to a new format may be called for.
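The Shift-JIS limitation mentioned above is easy to demonstrate. The following is a minimal sketch, assuming Python's built-in `shift_jis` codec as a stand-in for the encoding in question; the helper name `can_encode` and the sample strings are hypothetical.

```python
def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = "café"  # contains U+00E9, an accented Latin letter

print(can_encode(sample, "shift_jis"))    # accented letter vs. Shift-JIS
print(can_encode(sample, "utf-8"))        # same text in a Unicode encoding
print(can_encode("図書館", "shift_jis"))  # ordinary Japanese text
```

A check of this kind, run before text is stored, shows whether a chosen encoding would silently force the loss or substitution of characters.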
4. How can technical materials be made useful to both experts and to the average citizen?
a. What do we need to do to make digital information useful for other communities?
b. How can collections of historical records or of scientific images be arranged in order to promote use by scholars and school children?
[Armour]
An answer to these questions would perhaps require clear definitions of the terms "technical materials" and "made useful." However, the average citizen might be assumed to have an interest, say, in local history and be prepared to sit in front of a computer monitor for, say, 20 minutes in order to satisfy his/her curiosity. The interface should obviously be as intuitive and thus as invisible as possible. Fortunately, the growth of the Internet is rapidly leading to a familiarity with (if not a consensus on) such interfaces. The qualities one would look for in the presentation of such information are clarity, simplicity, visual appeal, etc., and therefore "technical materials" would be kept at a different level, which the user would be free to access by, say, clicking on a button labeled "Tell me more" (a common technique). There could be several layers, into which the user could "drill down." Of course, the scholar would require some kind of shortcut to jump to these more detailed layers.
Alternatively, the user could initially identify him/herself by logging on as "Student" or "Expert." Fortunately, these problems are also faced by businesses (not everyone in an enterprise is equally well informed about all topics), and we can reasonably hope for the appearance of new tools and approaches from both academic and commercial communities. Of especial interest to us is the rapid evolution of the computer-based encyclopedias (such as the Encyclopedia Britannica, with its natural language search facility), some of which must similarly cater to different levels of expertise.
5. Are you doing retrospective digital collections of historical material or are you focusing primarily on creation of new materials? Are you converting bibliographic data as it relates to historical digital materials?
[No Answer Provided.]
1. What are the expected breakthrough technologies in the areas of automated cataloging, indexing, search, and analysis of digital multimedia content?
[Armour]
As regards textual materials, we can perhaps say that future progress in the area of automatic indexing will be incremental, as there are already many powerful tools available, with the ability to conduct "fuzzy" searches, proximity searches, etc. The search tools available on the Internet are continually being refined and made available for use locally or over intranets. What still leaves room for improvement is OCR, especially for difficult fonts or, eventually, hand-written materials such as collections of letters; a breakthrough technology in this area is sorely needed. A compromise for the interim is some form of pattern-matching, though here too the tools are as yet somewhat rudimentary.
Far into the future, we might hope for pattern-matching software of such sophistication (and involving considerable "expert knowledge") that it could be used for indexing and accessing image collections.
[Okude]
Enhancement of capabilities of networking, VR interface design, object oriented database[s].
2. What are the emerging technologies for creating, administering, searching and providing access to virtual or federated collections?
[Armour]
For centuries, the world's libraries have used virtually the same technology for acquiring, storing, and organizing their collections. In contrast, the Web, surely key to any "virtual or federated collection," is evolving so fast that it is not uncommon for a technology to be superseded before it has had time to be adopted. While the new opportunities presented by such developments as Java, Dynamic HTML, FlashPix and Digimarc are widely welcomed, there is no denying the danger of investing significant time and resources in something that may be superseded in a few years or even sooner.
Internet technologies come and go, but among them Java, championed by Sun Microsystems, looks set to play a key role in defining the future of the Internet, even though its viability has come into question.
Clearer perhaps is the future of HTML, the lingua franca of the Web. This has long been recognized as being incapable of furnishing a foundation for the future development of the Web. At the same time, SGML has proved too difficult for most people to implement, leading to a compromise solution known as XML, or Extensible Markup Language. But this is not a compromise in the sense of "falling between two stools." XML has many advantages: it is based on existing international standards; is fully extensible and does not suffer from tag limitations; is internationalized (based on Unicode); offers simpler system administration of Web sites, and so on. These advantages, combined with the possibility that XML services may soon be made available at the operating-system level, make this a very attractive course for future development of any digital library/museum projects.
The Internet itself is suffering growing pains, which may be partially alleviated by the Active Node Transport System (ANTS) currently under development. This is an active network architecture that in effect will perform like a meta-protocol allowing for spontaneously generated protocols and will make the network as flexible as XML.
[Okude]
Java and Corba. Object oriented database[s].
3. What new technologies are being developed especially for the preservation and archiving of multimedia digital information? Is there movement toward common preservation technologies and methodologies that serve business, academia, government and consumer oriented instances of digital library?
[Armour]
Once information is digital, questions of whether or not it is "multimedia" or business/academia/government-oriented are irrelevant from the point of view of conservation. The only real issues are (1) is the physical medium sufficiently stable (cf. acid paper and early film stock), and (2) will it be possible to access ("read") the information in the future? Librarians are constantly aware of the problems posed by a particular medium (such as the 5.25" floppy disk) falling into disuse, so they take steps to transfer digital data to new media (such as DVD). This process is never-ending, as we will never find the perfect storage medium; there will always be room for improvement. The format of the data is less of a problem; in theory, any format can be converted at any time in the future, though there may be some cost involved. Format conversion can be put off (till funds are available, for instance); media conversion cannot be delayed.
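When data is transferred from one medium to another, the copy can be checked against the original by comparing checksums. The sketch below shows one common way to do this, assuming SHA-256 as the checksum; it is not a procedure described by the hosts, and the function names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_path, copy_path):
    """Return True if the copy on the new medium is bit-identical to the source."""
    return sha256_of(source_path) == sha256_of(copy_path)
```

Recording the digest alongside each file also allows later checks of whether a stored copy has degraded, without needing the original medium at hand.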
Rather than preservation, we have to find technologies that will serve our needs as regards security (including intellectual property rights), distribution, and image scalability.
[Okude]
Preservation technology is now reaching maturity. What we have to attend to now are distribution technology and image scalability technology.
4. What emerging technologies and data formats are most likely to enhance interoperability of digital data at all levels:
[Armour]
Database schemata are the most important. We have to incorporate distributed object-oriented technology into global digital library projects. The next most important topic is language: Unicode has yet to be widely accepted, and already its shortcomings are being criticized. We can, however, hope for some "Super Unicode" to emerge at some time in the not too distant future. Printer control languages, compression algorithms, document and page description formats are all ephemeral and of little consequence when looking at the larger picture. They can and must be left to commercial interests and the competition of the open market.
[Okude]
Database schemata are the most important. We have to incorporate distributed object oriented technology into the global digital library project.
5. What is the expected volume of data in digital libraries in the next 2, 5 and ten-year timeframes?
[Okude]
Each library's data volume should remain as small as possible.
a. What technologies are you developing or depending on to manage such volumes of multimedia data?
[Okude]
Object oriented database technology is needed.
b. How critical is the need for such technologies?
[Okude]
Without this distributed object-oriented technology, the digital library has no future for scholars and other people who use libraries for their creative activities.
6. What relationships do you see between information appliances and digital library?
[Okude]
The digital library's data structure should be isolated from any specific hardware and device. Under this condition, information appliances are very useful; however, if these appliances require us to accept a specific data format, they should not be used in a research environment.
7. Do the requirements for digital libraries imply any specific requirements as to capacity, coverage, quality of service, or standardization of national and international communications infrastructure?
[Okude]
They imply an open architecture and deployment of distributed object oriented technology. Libraries around the world should communicate with one another and contribute to a consolidation of diverse human knowledge and experience.
8. What are the key technologies regarding multi-lingual representation, search, and cataloging of digital data?
[Okude]
Keep ISO 10646-1 UCS. Make a clear distinction between (1) the code for everyday communication and (2) the code for special purposes. Use the distributed network system to provide code group (2), so that whenever people have to or want to use a special character set, they can obtain the code from the net and see the representation on the screen. Instead of creating a huge standard character library and carrying it within the computer, distribute the code, creating and using it only when it is needed. This approach is the same as that of distributed digital libraries. Letters as well as knowledge belong to infinite databases, so it is impossible to create a single universal repository.
[Hosono]
Machine translation systems and multi-lingual thesauri are the most promising.
9. What innovative, multi-modal interfaces seem most appropriate for digital libraries?
[Okude]
Computer-human interface should be a central research agenda for digital libraries. Besides keyboards and mice, trackballs and joysticks can move an object on a computer screen, and many other interface devices have been developed, e.g., gloves, helmets, glasses, bodysuits, and so on. These multi-modal interfaces, however, are not only immature but also not intelligent. Future interfaces will be intelligent and will mediate communication between man and distributed computer networks, and will be more responsive to researchers' wants and needs. By lowering the threshold for researchers to engage the data in the digital libraries, new multi-modal intelligent interfaces can span the continuum from passive reception of research data to active creation of research results.
10. Please share with us your views on the role of standards in the evolution of digital library capabilities. How, specifically, is your organization involved in utilizing or defining standards that specifically affect digital libraries?
[Armour]
The digital library should conform to various types of standards: some academic (continuing established practices), some administrative (defining good management), and some technical (ensuring that data can be exchanged and shared). Academic standards can be, and are being, applied successfully in the Information Age, although problems remain, particularly as a result of the transient nature of digital information. Administrative standards can be similarly based on accepted principles, with necessary adjustment. In this area, lessons can be learned from the world of commerce. It has been pointed out that in an information-based economy, selling a "product" (data), often with minimal distribution costs, actually results in the seller gaining more data (information regarding the buyer). The same applies to a digital library and its resources.
Technical standards are perhaps the most difficult to cope with: they are always changing and their adoption usually involves considerable cost. While some academic institutions, such as Keio University, are trying to contribute toward future standards, it is more realistic to think in terms of selecting, testing and perhaps finding new applications for standards that will be set by large corporate interests.
[Hosono]
Standards should be considered from the technological and bibliographical points of view. In terms of the former, because of rapid advancement and resulting obsolescence of information technologies, it is doubtful how long a particular standard can continue to function. Thus, it seems better to develop very powerful and efficient conversion software or techniques in order to transfer digital information among different systems. At any rate, the life expectancy of technical standards seems to be shorter than the bibliographical ones.
Bibliographical standards, in which Dublin Core could be included, should be definitely established to assure easy and effective retrieval and use of particular digital information. They may include information about technologies used for digitization as well as description about contents.
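A record of the kind suggested above, carrying both a description of the contents and a note on how the item was digitized, might look roughly like the following sketch. It assumes the Dublin Core element set serialized as XML with Python's standard library; the wrapper element `record`, the helper name, and all field values are hypothetical placeholders, not an actual Keio catalog entry.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a <record> element with one dc:* child per (name, value) pair."""
    record = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Hypothetical values, for illustration only.
record = dublin_core_record([
    ("title", "Example rare book (hypothetical record)"),
    ("type", "Image"),
    ("format", "image/tiff"),
    ("description",
     "Captured with a one-shot three-CCD digital camera "
     "at approx. 2,048 x 2,048 pixels."),
])
print(ET.tostring(record, encoding="unicode"))
```

Placing the capture details in a free-text description is only one option; a project could equally define dedicated fields for the capture device and resolution.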
[Shibukawa] (Answers to D.1-10)
As Keio University has not yet decided on an electronic or a digital library, we cannot answer these questions in practical terms. Under the present situation, however, we have two problems in digitizing Japanese rare books, one of which lies in translation. While it is desirable to translate all lines in Japanese works into English, we have only bilingual bibliographical and explanatory notes. We need more translators and sufficient funding to produce bilingual full texts. Automatic translation systems, which are not available yet, will be of great use to us. The other problem concerns the difficulty of making reference to Japanese works. It is hard to segment Japanese texts and classify parts of speech automatically. Therefore, compiling an index for reference demands a lot of work.
1. In what ways will digital libraries fundamentally change the ways in which children (K-12, college) are educated? What are the major obstacles to making this happen?
2. Will digital libraries increase the costs of education (K-12, college)? If so, who will pay for this?
[Shibukawa] (Answers to E.1-2)
Any systematization of intellectual information based on digitization will change the way to utilize such information at any level of education. Even if the cost is covered by tax or commercial profits, everyone has to bear it. Therefore, the cost should be shared only by users under mutual agreement. Moreover, it is important to infer how the market for digitized information will grow.