As the centerpiece of NASA's Mission To Planet Earth (MTPE), the Earth Observing System (EOS) will herald an unprecedented era of multi-disciplinary research to study the processes leading to global change. A constellation of remote-sensing satellites will scan the Earth for several years, sending back hundreds of gigabytes of valuable data every day by the middle of 1999. Heavy processing and data product generation will require archives to store terabytes of data daily; over the lifetime of EOS, the archives will store petabytes (1 petabyte = 1,000 terabytes) of data. The EOS Data Information System (EOSDIS) will provide the computing and network facilities to support the EOS research activities, including processing, distributing, and archiving EOS data; exchanging research results among scientists; and commanding and controlling the spacecraft instrumentation. EOSDIS' infrastructure, the EOSDIS Core System (ECS), will provide scientists the computing architecture needed to accomplish these goals. ECS has been designed to evolve so that it can support a broad range of data partners.
EOSDIS is an integrated system that supports multiple satellites and instruments. EOS includes instruments on satellites to be launched by NASA, the European Space Agency (ESA), and the National Space Development Agency of Japan (NASDA). The data from each EOS instrument will be sent to one of several designated Distributed Active Archive Centers (DAACs) responsible for processing, archiving, and distributing EOS and related data. These data centers will house the ECS computing facilities and operational staff needed to produce EOS Standard Products and to manage, store, and distribute EOSDIS data, as well as the associated metadata and browse data that allow effective use of the data holdings. The DAACs will exchange data via dedicated EOSDIS networks to support processing at one DAAC or Science Computing Facility (SCF) that requires data from another DAAC or SCF. NASA selected the DAACs based on their expertise in specific science disciplines and demonstrated long-term commitments to the corresponding user communities. SCFs, located at EOS investigators' home institutions, develop and maintain algorithms (for Standard and Special Products), calibrate the EOS instruments, validate data and algorithms, generate Special Products, provide data and services to other investigators, and analyze EOS and other data in pursuit of the overall science objectives.
The ECS is being developed along two parallel tracks: an incremental development track and a formal development track that uses the traditional waterfall methodology. The incremental track allows developers to address user-sensitive requirements at an early stage and to integrate and test how commercial-off-the-shelf (COTS) products perform in the system designs prior to formal system releases. Prototype Workshops (PWs) and Evaluation Packages (EPs) are part of the incremental development track. Each PW and EP builds upon and expands the capabilities of previous PWs and EPs based upon the comments received from selected evaluators. These capabilities are eventually migrated to the formal track for integration into the formal release software (i.e., Releases A through D). The first formal release, Release A, is due at the end of 1996.
The ECS uses a client-server infrastructure that gives users access to services and to the interfaces to those services. This arrangement is similar to that of the World Wide Web (WWW), where users access web sites (servers) by way of browsers (clients), such as Netscape or Mosaic, on their computers. Figure 1 shows the ECS services-based architecture concept.

Note that there are three layers: the client services layer, the interoperability layer, and the service provider layer. The client layer is the user interface residing on the user's computer and includes all the tools for data search and retrieval. For example, users can locate and invoke an EOSDIS-wide service, such as cross-DAAC searching, or can choose local DAAC services, such as search of local data connections and DAAC-unique subsetting. A user can also query specific provider services directly, using science-oriented forms or free text. The interoperability layer acts as the "middleware" allowing the client layer to communicate with any or all of the service providers in a seamless and transparent manner. The service provider layer is composed of the DAACs, the SCFs, and any other external data provider, from individual scientists to other data centers. This layer is where the actual data and products reside.
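The division of labor among the three layers can be illustrated with a minimal Python sketch. It is not ECS code: the class names (ServiceProvider, InteroperabilityLayer, Client), the provider names, and the sample holdings are all hypothetical and serve only to show a client reaching one provider or every provider through the middleware.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ServiceProvider:
    """Service provider layer: a DAAC, SCF, or other data holder (hypothetical model)."""
    name: str
    holdings: list[str] = field(default_factory=list)

    def search(self, keyword: str) -> list[str]:
        # Case-insensitive substring match over this provider's holdings.
        return [h for h in self.holdings if keyword.lower() in h.lower()]


class InteroperabilityLayer:
    """Middleware: routes a client request to one provider or to all of them."""

    def __init__(self, providers: list[ServiceProvider]) -> None:
        self._providers = {p.name: p for p in providers}

    def search(self, keyword: str, provider: str | None = None) -> dict[str, list[str]]:
        # provider=None models a system-wide search across every provider;
        # a provider name models a local search at that provider only.
        if provider is None:
            targets = list(self._providers.values())
        else:
            targets = [self._providers[provider]]
        return {p.name: p.search(keyword) for p in targets}


class Client:
    """Client layer: the user-facing entry point; it never calls providers directly."""

    def __init__(self, middleware: InteroperabilityLayer) -> None:
        self._middleware = middleware

    def find(self, keyword: str, provider: str | None = None) -> dict[str, list[str]]:
        return self._middleware.search(keyword, provider)


# Hypothetical providers and holdings, for illustration only.
providers = [
    ServiceProvider("DAAC-A", ["sea surface temperature", "ocean color"]),
    ServiceProvider("DAAC-B", ["land surface temperature", "snow cover"]),
]
client = Client(InteroperabilityLayer(providers))
cross_search = client.find("temperature")                     # every provider
local_search = client.find("temperature", provider="DAAC-B")  # one provider
```

Because the client holds a reference only to the middleware, providers can be added or removed in this sketch without any change to client code, which is the sense in which the text calls the interoperability layer transparent.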
An object-oriented design methodology is being used to implement the ECS architectural design shown in Figure 2.

The subsystems comprising the architecture are described below.
The client subsystem, depicted in Figure 3, is the interface between the user and the ECS services through graphical user interfaces (GUIs), data/server access tools, and application program interface (API) libraries.

Some key features of the client subsystem include an Extensible Workbench that provides tools for data access, search capabilities, document access, etc.; Desktop management for the graphical interfaces; and access to Hypertext Markup Language (HTML)-based electronic documents on the WWW. The primary client subsystem tool is the Earth Science Search Tool (ESST). The ESST is an X Window-based Workbench tool that lets users submit data searches. (Search results are automatically sent to the Product Request Tool, which allows the user to browse or subset the data, perform a coincident search, or order the data.) ESST capabilities continue to grow as new enhancements, such as a Java-based search tool, are tested in the PWs and EPs.
There are two types of client applications: interactive and non-interactive (processing). Interactive applications are those which drive some sort of user interface and require interaction with the user. These applications can support GUI-based tools for retrieving directory and inventory information, performing inventory searches and reviewing search results; tools for data analysis and visualization; hypertext-based electronic journal applications; and collaboration tools that support shared window environments, electronic white boards, textual and audio "chat" sessions, and video teleconferencing.
Non-interactive applications are for algorithmic transformations of data. This class of applications typically requires no user interaction once running, although some configuration may be required prior to execution. These applications can include automated quality control applications; subscription agents; data analysis and transformation applications; and data "mining" applications capable of monitoring new developments in the information repository and notifying users of items of interest.
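As an illustration of the non-interactive class, the following Python sketch shows a subscription agent that is configured once and then notifies subscribers whenever a new item matches their interest. The SubscriptionAgent class, its methods, and the keyword-matching rule are assumptions made for this example; the source does not specify how ECS agents match or deliver notifications.

```python
from collections import defaultdict
from typing import Callable


class SubscriptionAgent:
    """Hypothetical agent: matches new repository items against keyword subscriptions."""

    def __init__(self) -> None:
        # Maps a lower-cased keyword to the callbacks registered for it.
        self._subscriptions = defaultdict(list)

    def subscribe(self, keyword: str, callback: Callable[[str], None]) -> None:
        # Configuration step, done before the agent starts running unattended.
        self._subscriptions[keyword.lower()].append(callback)

    def on_new_item(self, description: str) -> int:
        """Notify every subscriber whose keyword appears in the item description.

        Returns the number of notifications sent.
        """
        text = description.lower()
        notified = 0
        for keyword, callbacks in self._subscriptions.items():
            if keyword in text:
                for callback in callbacks:
                    callback(description)
                    notified += 1
        return notified


inbox = []  # stands in for a user's notification queue
agent = SubscriptionAgent()
agent.subscribe("aerosol", inbox.append)

sent = agent.on_new_item("Daily aerosol optical depth granule")
ignored = agent.on_new_item("Monthly snow cover granule")
```

After these two calls, only the first item has been delivered to the inbox; the second matches no subscription and produces no notification.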
The Interoperability subsystem, as shown in Figure 3 (above) and Figure 4, provides the connections between the client and the providers holding data collections and associated services.

This is accomplished through a collection of functions which ensure that requests are routed to the appropriate services and which provide a framework for the interaction of clients and servers. For example, the Advertising service manages information about the services available on the ECS network; provides an interface for querying this information via hypertext links; and allows results to be stored on the client desktop for future use in the form of clickable icons.
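The role of the Advertising service can be sketched in Python as a small registry that providers publish to and clients query. The Advertisement and AdvertisingService names, the substring query, and the sample entries are hypothetical; the actual service is described in the source only as a replicated database application with a hypertext query interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advertisement:
    """One advertised service (hypothetical record layout)."""
    service: str
    provider: str
    description: str


class AdvertisingService:
    """Keeps track of the services available on the network and answers queries about them."""

    def __init__(self) -> None:
        self._ads = []

    def advertise(self, ad: Advertisement) -> None:
        self._ads.append(ad)

    def query(self, term: str) -> list:
        # Case-insensitive match against the service name or its description.
        term = term.lower()
        return [
            ad for ad in self._ads
            if term in ad.service.lower() or term in ad.description.lower()
        ]


ads = AdvertisingService()
ads.advertise(Advertisement("cross-site search", "system-wide", "Search holdings at every site"))
ads.advertise(Advertisement("subsetting", "site-1", "Extract a spatial subset of a product"))

found = ads.query("subset")

# Stands in for storing query results on the client desktop for later reuse.
saved_on_desktop = {"subsetting tools": found}
```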
Client-server communication is supported through the Open Software Foundation's (OSF) Distributed Computing Environment (DCE). The Advertising service is a database application built on the Sybase relational database management system (DBMS) and employs a replication server to maintain database consistency.
The Data Management subsystem, as illustrated in Figure 4 (above), provides distributed search and access services with a science-oriented view of data collections. There are three levels at which data access requests can be processed: at the intersite level by the Distributed Information Manager (DIM); at the site level by the Local Information Manager (LIM); and at the data set or data-type level by a data server (described in the next section).
The DIM is an example of a service that is capable of executing requests requiring access to multiple sites. The DIM acts as an agent for the client: after the DIM accepts a distributed query or access request, it assumes responsibilities for its execution and for compilation of the results; the client can disconnect from the DIM, reconnect at a later point to determine the status of a query, obtain partial results, or cancel the query.
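The agent behavior described above (submit, disconnect, reconnect, check status, collect partial results, cancel) can be sketched in Python as follows. This is a simplified model under stated assumptions: the class and method names are hypothetical, sites are represented by plain functions, and an explicit step() call stands in for work that a real system would perform in the background.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from typing import Callable

SiteSearch = Callable[[str], list]  # hypothetical per-site search function


@dataclass
class QueryState:
    text: str
    pending: list[str]  # sites not yet searched
    results: dict[str, list] = field(default_factory=dict)
    cancelled: bool = False

    @property
    def status(self) -> str:
        if self.cancelled:
            return "cancelled"
        return "complete" if not self.pending else "running"


class DistributedInformationManager:
    """Owns each query after submission, so the client need not stay connected."""

    def __init__(self, sites: dict[str, SiteSearch]) -> None:
        self._sites = sites
        self._queries: dict[int, QueryState] = {}
        self._ids = itertools.count(1)

    def submit(self, text: str) -> int:
        query_id = next(self._ids)
        self._queries[query_id] = QueryState(text, pending=list(self._sites))
        return query_id  # the client keeps only this handle

    def step(self, query_id: int) -> None:
        """Search the next pending site (stands in for background execution)."""
        state = self._queries[query_id]
        if state.cancelled or not state.pending:
            return
        site = state.pending.pop(0)
        state.results[site] = self._sites[site](state.text)

    def status(self, query_id: int) -> str:
        return self._queries[query_id].status

    def partial_results(self, query_id: int) -> dict[str, list]:
        return dict(self._queries[query_id].results)

    def cancel(self, query_id: int) -> None:
        self._queries[query_id].cancelled = True


sites = {
    "site-1": lambda q: [f"site-1 granule matching {q!r}"],
    "site-2": lambda q: [],
}
dim = DistributedInformationManager(sites)

qid = dim.submit("ozone")
dim.step(qid)                       # one site has answered so far
status_midway = dim.status(qid)
partial = dim.partial_results(qid)
dim.step(qid)                       # the remaining site answers
status_done = dim.status(qid)

qid2 = dim.submit("aerosol")
dim.cancel(qid2)
dim.step(qid2)                      # no work is done for a cancelled query
```

The client in this sketch holds nothing but the integer handle returned by submit(), which is what allows it to disconnect and reconnect later.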
The LIM is an instance of a service that is capable of executing requests which require access to multiple data servers at a single site. LIM services can be requested by a DIM or directly by a client. In either case, the LIM will act as an agent of its client, just as the DIM does. Note that a site may choose to provide several LIMs, perhaps supporting different data access and query languages.
A key element of the Data Management subsystem is the use of an Earth science query language. Initially the language will be based on a set of existing standard query parameters supporting access to a complete collection of Earth Science data. This will give the users an Earth science-based view of the data instead of a computer science-based view of the data, with which they may be unfamiliar. As the capabilities of the data servers and the LIMs improve, the language will be able to support more complex forms of data searching including coincident search and content-based search.
Like the Advertising service described above, the Data Management subsystem is a database application built on the Sybase relational DBMS and employs a replication server to maintain database consistency.
The Data Server subsystem provides management, search, access, long-term storage, and distribution facilities for individual data collections. ECS data collections consist of many types of logical data objects (Earth science data types) and use several types of physical data structures (computer science data types). Collections can be DAAC- and discipline-unique. Access to the collection is provided by interfaces for spatial, temporal, and attribute searches for science products; hypertext and full-text searches of document collections; product orders and retrievals through subscriptions; and special services such as subsetting and subsampling. (See Figure 5 for an overview of the Data Server subsystem.)
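A combined spatial, temporal, and attribute search of the kind listed above can be sketched in Python. The Granule record, the search() signature, the bounding-box convention, and the sample data are assumptions for illustration; the source does not define the Data Server's search interface at this level.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Granule:
    """Hypothetical inventory record for one science product."""
    name: str
    lat: float
    lon: float
    acquired: date
    attributes: dict = field(default_factory=dict)


def search(granules, bbox=None, period=None, **attributes):
    """Return the granules that satisfy every constraint supplied.

    bbox   -- (south, west, north, east) in degrees; this simple test does not
              handle boxes that cross the 180-degree meridian
    period -- (start, end) dates, both inclusive
    attributes -- exact-match attribute constraints, e.g. instrument="sensor-1"
    """
    hits = []
    for g in granules:
        if bbox is not None:
            south, west, north, east = bbox
            if not (south <= g.lat <= north and west <= g.lon <= east):
                continue
        if period is not None and not (period[0] <= g.acquired <= period[1]):
            continue
        if any(g.attributes.get(key) != value for key, value in attributes.items()):
            continue
        hits.append(g)
    return hits


# Hypothetical inventory, for illustration only.
granules = [
    Granule("g1", 10.0, 20.0, date(1999, 7, 1), {"instrument": "sensor-1"}),
    Granule("g2", -30.0, 140.0, date(1999, 7, 2), {"instrument": "sensor-1"}),
    Granule("g3", 12.0, 22.0, date(1999, 8, 15), {"instrument": "sensor-2"}),
]

hits = search(
    granules,
    bbox=(0.0, 0.0, 45.0, 45.0),
    period=(date(1999, 7, 1), date(1999, 7, 31)),
    instrument="sensor-1",
)
by_attribute_only = search(granules, instrument="sensor-1")
```

Each constraint is optional in this sketch, so the same function covers a purely spatial, purely temporal, or purely attribute-based search.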

The Data Server subsystem is implemented through an integration of the Sybase DBMS with Vision International's Spatial Query Server. The file storage management system (FSMS) is the Archival Management and Storage System (AMASS), running on an SGI Challenge computer with an SGI redundant array of inexpensive disks (RAID) for on-line storage. The multi-terabyte near-line/on-line archives are built on E-Systems Modular Automated Storage Systems (EMASS) Automated Media Library (AML) silos and robotics with EMASS 3590 10 GB linear scan drives for Release A, migrating to 50 GB helical scan technology for Release B.
The Data Ingest subsystem provides the tools for importing data (e.g., science products, ancillary and correlative data, and documents) into ECS repositories (Data Servers). (See Figure 6.)

The data can be ingested on a scheduled or an ad hoc basis and the subsystem provides the specifics for interfacing with external systems. The Data Ingest subsystem plans whatever processing (including metadata extraction or generation) and loading services are required to archive the data and make them available with the associated metadata. It then makes requests to the Data Processing subsystem (described below) for scheduling the plan into the processing capacity and to the Data Server to access the services for loading the data into the archives. The Data Ingest subsystem contains data format and translation tools as well as pre-processing capabilities.
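The ingest sequence (plan the required steps, extract metadata, then load the data and metadata into a Data Server) can be sketched in Python. Everything here is a simplifying assumption: the DataServer class, the two-step plan, and in particular the "key=value;key=value" header format are invented for the example, and the hand-off to the Data Processing subsystem for scheduling is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class DataServer:
    """Stand-in for an ECS repository: archived payloads plus searchable metadata."""
    archive: dict = field(default_factory=dict)    # name -> raw bytes
    inventory: dict = field(default_factory=dict)  # name -> metadata

    def load(self, name: str, payload: bytes, metadata: dict) -> None:
        self.archive[name] = payload
        self.inventory[name] = metadata


def extract_metadata(payload: bytes) -> dict:
    # Assumed (hypothetical) file layout: the first line is "key=value;key=value".
    header = payload.split(b"\n", 1)[0].decode("ascii")
    metadata = dict(item.split("=", 1) for item in header.split(";") if item)
    metadata["size_bytes"] = len(payload)
    return metadata


def ingest(name: str, payload: bytes, server: DataServer) -> list:
    """Plan the steps, run them in order, and return the plan that was executed."""
    plan = ["extract_metadata", "load"]
    metadata = {}
    for step in plan:
        if step == "extract_metadata":
            metadata = extract_metadata(payload)
        elif step == "load":
            server.load(name, payload, metadata)
    return plan


server = DataServer()
payload = b"type=ancillary;source=external\n\x00\x01\x02"
executed = ingest("file-001", payload, server)
```

Once ingest() returns, the payload is in the archive and its metadata is in the inventory under the same name, so the data become available together with their associated metadata.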
The Data Ingest subsystem uses custom-developed interface code built on Sybase DBMS and runs on an SGI Challenge computer.
The Planning subsystem, also shown in Figure 6 (above), works with the Data Processing subsystem to pre-plan routine, ad hoc, and on-demand science data processing. It can also perform management functions for handling deviations from the operations plan for individual DAAC sites and for coordinating operations with other DAACs. GUI clients and client/server planning tools are available for users and operations personnel.
The Planning subsystem uses AutoSys and AutoXpert COTS technologies within its design.
The Data Processing subsystem is responsible for scheduling processing requests onto the processing resources. The subsystem provides the functions to host the science algorithm software, perform data processing, and manage processing resources. It also includes facilities and toolkits that offer true software portability across advanced computing platforms. (See Figure 6 above.)
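To make the idea of scheduling requests onto resources concrete, the Python sketch below assigns each request to whichever resource currently has the least work, taking the longest requests first. This is a generic greedy heuristic chosen for illustration, not the scheduling policy ECS uses; the request names, run-time estimates, and resource names are hypothetical.

```python
import heapq


def schedule(requests, resources):
    """Greedy longest-first assignment of requests to the least-loaded resource.

    requests  -- list of (name, estimated_hours) pairs
    resources -- list of resource names
    Returns a dict mapping each resource to the request names assigned to it.
    """
    # Min-heap of (accumulated_hours, resource); the least-loaded resource pops first.
    heap = [(0.0, resource) for resource in resources]
    heapq.heapify(heap)
    assignment = {resource: [] for resource in resources}

    for name, hours in sorted(requests, key=lambda r: r[1], reverse=True):
        load, resource = heapq.heappop(heap)
        assignment[resource].append(name)
        heapq.heappush(heap, (load + hours, resource))
    return assignment


# Hypothetical processing requests and resources.
requests = [("level-1", 4.0), ("level-2", 3.0), ("browse", 1.0), ("reprocess", 2.0)]
resources = ["cpu-a", "cpu-b"]
plan = schedule(requests, resources)
```

With these inputs both resources end up with five hours of estimated work, which shows the balancing effect of placing the longest requests first.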
The hardware at each site ranges from workstations to supercomputers to massively parallel processors. High-speed I/O and data staging hardware, as well as distributed and parallel processing environments, can also be implemented according to the site requirements. At all sites, resource optimization and management software is used to make the fullest use of site facilities.
ECS design and development are at the forefront of emerging information system technology that emphasizes globally distributed, autonomously managed, dynamically updated, hypermedia-enabled systems to make it easier than ever before for users to find the data and information they need. Candidate future technology insertions for ECS development include: RISC processors, parallelization software, optical tape, parallel databases, object oriented DBMS, distributed computing, direct broadcast, Java, and smart agents.
Contact/Curator : scoot@eos.hitc.com
Last Modified : April 15, 1998