Site: Nippon Telegraph and Telephone (NTT) Corporation
Yokosuka R&D Center
Hikarinooka, Yokosuka-Shi
Kanagawa, 239 Japan
http://www.ntt.co.jp
Date Visited: 25 March 1998
WTEC Attendees: R. Chellappa (report author), B. Davis-Brown, R. Larsen, J. Mendel, H. Morishita, R. Reddy
Hosts:
NTT, celebrating its 50th anniversary, is undergoing a transformation from a telecommunications company to an information communications business, and eventually to an information distribution business. The three major thrusts pursued to realize this transformation are "Electrum Cyber Society (ECS)," "Megamedia," and "Next Generation Infrastructure."
As all the hosts and research demonstrations were drawn from the ECS thrust area, this report addresses this area only. NTT's vision of ECS, eloquently expressed by Mr. Toshiharu Aoki, Senior Executive Vice President and Senior Executive Manager of R&D Headquarters (NTT n.d.(a)), is the electronic exchange of information products and money through secure networks. NTT's activities are focused on becoming a center of excellence in multimedia research through R&D and active participation in several national and international collaborative consortia and standardization efforts. Notable activities include involvement in the Asian Multimedia Forum and the Photonic Network Forum, creation of ECS test-beds, cyber-society open experiments, "An Open Lab," and contributions to national social projects, such as the medical information network.
The budget of the research and development headquarters is approximately 5% of net sales. Over the last seven years, R&D expenditures have been around ¥3 billion. Roughly half of R&D expenditures are allotted to research laboratories.
NTT R&D Headquarters is divided into three Laboratory Groups (NTT n.d.(b)):
The hosts, led by Mr. Shinichiro Yoshida, represent the Multimedia Systems Laboratory Group. This group is divided into seven laboratories:
The laboratories are split across different R&D centers. Researchers and engineers use video conferencing facilities to keep abreast of related activities. The total size of the workforce has been steady at 8,500 over the last seven years. Of these, approximately 3,000 are engaged in research, the rest being in development. Approximately 150 new hires are made every year, replacing those who leave for academia or other subsidiaries, or who retire.
The site visit team was shown several demonstrations representing ongoing work in some of the laboratories in the Multimedia Systems Laboratory Group. These are as follows:
The network library system provides multimedia services based on a broadband ATM network. The network is served by Hi-Fi music, MPEG-1, MPEG-2 and digital library servers. Processing engines for voice recognition, search, Japanese/English translation and text-to-speech are provided. A key component in this network is a super-high definition display with a resolution of 2,048 x 2,048 pixels at 24 bits/pixel, operating at 60 frames/sec for video. The network library is being used for doctors' viewing of medical images, sightseeing tours, teleconferences and on-the-fly machine translation between Japanese and English.
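To put the display figures in perspective, the short calculation below works out the raw, uncompressed data rate implied by the quoted resolution, bit depth and frame rate. It is only arithmetic on the numbers in this report; whether and how the video is compressed before crossing the ATM network is not stated here, so the result should be read as an upper bound rather than an actual link rate.

```python
# Raw (uncompressed) data rate for the display parameters quoted in the report.
WIDTH = 2048           # pixels
HEIGHT = 2048          # pixels
BITS_PER_PIXEL = 24
FRAMES_PER_SEC = 60


def raw_bit_rate(width, height, bits_per_pixel, fps):
    """Return the uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL
rate = raw_bit_rate(WIDTH, HEIGHT, BITS_PER_PIXEL, FRAMES_PER_SEC)

print(f"{bits_per_frame / 8 / 2**20:.0f} MiB per frame")   # 12 MiB per frame
print(f"{rate / 1e9:.2f} Gbit/s uncompressed")             # 6.04 Gbit/s uncompressed
```

At roughly 6 Gbit/s for raw video, some combination of compression and still-image use seems likely for this display in practice, though the report does not say which.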
Text- and content-based retrieval of video is a critical component of a digital library for automatic indexing and retrieval. Two demonstrations in this area were shown. One involves reading the Japanese captions from TV broadcasts so that topic- or concept-based video retrieval can be accomplished. This work is expected to be commercially available by the end of 1998. The key algorithmic steps are detection of frames that contain text, extraction of the text region, character segmentation and recognition. Details of these steps may be found in Kurakake et al. (1997). The other demonstration was of ExSight, a multimedia retrieval system (Yamamuro et al. 1998; Kon'ya and Kushima 1998) using object-based image matching and keyword-based retrieval. Unlike pixel- or impression-based approaches, object-based approaches such as ExSight search a large database by content. The steps involved include automatic object extraction, feature extraction (color, shape, etc.) and high-speed similarity matching. Query fusion (as a union of image objects) and high-speed browsing are provided as Java applets. Potential commercial applications are in electronic commerce, digital museums (e.g., "show all the pictures of a boy with a dog"), and digital photo albums. Although primarily image-content driven, the system can accommodate keyword-based retrieval.
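The feature extraction and similarity matching steps can be illustrated with a small sketch. The report does not describe ExSight's actual features or matching method, so the example below substitutes a standard technique, quantized color histograms compared by histogram intersection, and every name in it (color_histogram, histogram_intersection, rank, the toy database) is hypothetical.

```python
from collections import Counter


def color_histogram(pixels, levels=4):
    """Quantize (r, g, b) pixels into levels**3 bins; return a normalized histogram."""
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means the normalized histograms are identical."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


def rank(query_pixels, database):
    """Return database object names ordered from most to least similar."""
    q = color_histogram(query_pixels)
    scored = [
        (histogram_intersection(q, color_histogram(pixels)), name)
        for name, pixels in database.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy "extracted objects": each is just a list of RGB pixels (assumed data).
database = {
    "red_ball": [(250, 10, 10)] * 8 + [(240, 20, 20)] * 2,
    "blue_sky": [(10, 10, 250)] * 10,
    "mixed": [(250, 10, 10)] * 5 + [(10, 10, 250)] * 5,
}
query = [(255, 0, 0)] * 10

print(rank(query, database))  # ['red_ball', 'mixed', 'blue_sky']
```

A production system would presumably combine several features (shape as well as color) and use an index rather than a linear scan to achieve high-speed matching; this sketch shows only the ranking idea.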
Electronic commerce is viewed as one of the promising opportunities in the ECS thrust area. The major concerns in making it feasible are guaranteeing security, protecting copyrights, and maintaining the timeline of transactions. The highlights of the electronic commerce activities were two demonstrations: one showing how electronic money can be moved securely between interested parties, and one showing how copyrights can be protected in the sale and distribution of digital objects over the network. In the electronic money demonstration, a smart card is used to make purchases from anywhere, as long as one is connected to the network. This demonstration illustrated how secure transactions can be achieved.
When digital objects are marketed over the network, the sellers need to ensure that their copyrights are protected. The InfoProtect project demonstrates the secure distribution of images. The owner of the digital content first creates a partial (semi-disclosed) image and its descrambling key. The descrambling key is registered with the system center and the partial image is transmitted to the potential buyer. The buyer decides whether to purchase by inspecting the partially scrambled image and, if so, buys the descrambling key via a secure key transmission protocol known as InfoKey, developed at NTT. The key is used to descramble the image. The buyer ID is embedded using digital watermarking, providing protection against copyright violation.
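The scramble-then-descramble idea can be sketched with a toy keyed permutation of pixel positions. This is an illustration only: the report does not describe InfoProtect's scrambling algorithm or the InfoKey protocol, the construction below is not cryptographically secure, and the function names and key string are hypothetical. Key registration, secure key delivery and watermarking are omitted.

```python
import random


def _permutation(key, n):
    """Derive a permutation of n positions from the key (toy construction, not secure)."""
    order = list(range(n))
    random.Random(key).shuffle(order)
    return order


def scramble(pixels, key):
    """Reorder pixels according to the key-derived permutation."""
    order = _permutation(key, len(pixels))
    return [pixels[i] for i in order]


def descramble(scrambled, key):
    """Invert scramble(): put each pixel back at its original position."""
    order = _permutation(key, len(scrambled))
    restored = [None] * len(scrambled)
    for position, source_index in enumerate(order):
        restored[source_index] = scrambled[position]
    return restored


image = list(range(16))                    # stand-in for pixel values
key = "hypothetical-descrambling-key"      # assumed; would be held by the system center

preview = scramble(image, key)             # what the prospective buyer receives
restored = descramble(preview, key)        # after purchasing the key
print(restored == image)                   # True
```

In a semi-disclosed scheme, only part of the image (or only some of its detail) would be scrambled so that the buyer can still judge the content before paying.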
The high-presence video teleconference system is centered around two large projection displays (each 110 inches on the diagonal). The resolution is four times that of high-definition TV and enables interaction with life-size images of people. The quality of display performance was demonstrated using 2-D monocular and stereo still images. The monocular images were viewed at a resolution of 6 million pixels/frame and the stereo pairs each had about 3 million pixels/image, giving excellent quality to the stereo images. Although this system as a whole is expensive, key components of the display technology have been commercialized. Using sound localization, an enhanced multimedia presentation is possible, with applications to remote museums and education.
When audio books and video are collected and bound as digital objects, it is critical to provide user-friendly interfaces to access them. In the CyberShelf project, books created from HTML documents are accessible using a book metaphor description language.
Another interesting demonstration was an image mosaicking system that produces a panoramic view from a sequence of translating images. User-friendly interfaces to the mosaicking algorithms have been provided. Details of the mosaicking algorithms are in Akutsu et al. (1995) and Taniguchi et al. (1997).
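For the simplest case of pure horizontal translation, mosaicking reduces to estimating the offset between overlapping frames and pasting them together. The sketch below does this by brute-force search over candidate shifts, minimizing the mean squared difference in the overlap. It is not the method of Akutsu et al. (1995) or Taniguchi et al. (1997), whose algorithms are not described in this report; the function names and the synthetic scene are assumptions made for illustration.

```python
def estimate_shift(left, right, min_overlap=2):
    """Return the column offset of `right` relative to `left` that minimizes
    the mean squared difference over the overlapping columns."""
    width = len(left[0])
    best_shift, best_err = None, float("inf")
    for shift in range(1, width - min_overlap + 1):
        overlap = width - shift
        err = sum(
            (row_l[shift + c] - row_r[c]) ** 2
            for row_l, row_r in zip(left, right)
            for c in range(overlap)
        ) / (overlap * len(left))
        if err < best_err:
            best_shift, best_err = shift, err
    return best_shift


def stitch(left, right, shift):
    """Keep the first `shift` columns of `left`, then append all of `right`."""
    return [row_l[:shift] + row_r for row_l, row_r in zip(left, right)]


# Synthetic grayscale scene (3 rows x 12 columns) and two overlapping frames.
scene = [[(7 * c + 3 * r) % 11 for c in range(12)] for r in range(3)]
frame_a = [row[0:8] for row in scene]    # columns 0..7
frame_b = [row[3:11] for row in scene]   # columns 3..10, i.e. shifted by 3

shift = estimate_shift(frame_a, frame_b)
panorama = stitch(frame_a, frame_b, shift)
print(shift)              # 3
print(len(panorama[0]))   # 11
```

Real video would also need sub-pixel offsets, vertical motion, blending at the seams and robustness to moving objects, none of which is attempted here.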
Akutsu, A., Y. Tonomura and H. Hamada. 1995. Videostyler: multidimensional video computing for eloquent media interface. In Proc. Intl. Conf. on Image Processing. Washington D.C. October.
Kon'ya, S. and K. Kushima. 1998. A rotation invariant shape representation based on wavelet transform. In Proc. Workshop on Image Retrieval. University of Northumbria at Newcastle. Feb: 1-9.
Kurakake, S., H. Kuwano and K. Odaka. 1997. Recognition and visual feature matching of text region in video for conceptual indexing. In Proc. SPIE on Storage and Retrieval for Image and Video Databases V. San Jose, CA. Feb: 368-379.
NTT. n.d.(a). Corporate Technology, Research and Development. (Brochure.)
NTT. n.d.(b). Yokosuka R and D Center Guide. (Brochure.)
Taniguchi, Y., A. Akutsu and Y. Tonomura. 1997. Panorama excerpts: extracting and packing panoramas for video browsing. In Proc. ACM Multimedia 97. Seattle, Washington: 429-436.
Yamamuro, M., K. Kushima, H. Kimoto, H. Akama, S. Konya, J. Nakagawa, K. Mii, N. Taniguchi and K. Curtis. 1998. Exsight-multimedia information retrieval system. In Proc. 20th Annual Pacific Telecommunications Conference. Honolulu, Hawaii. Jan: 734-739.