Posted by: aank76 | August 19, 2009

Educational Resource Sharing in the Heterogeneous Environments using Data Grid

Aan Kurniawan 1 and Zainal A. Hasibuan 2
Faculty of Computer Science, University of Indonesia
email: aan.kurniawan@ui.ac.id 1, zhasibua@cs.ui.ac.id 2

ABSTRACT

Educational resources usually reside in digital library, e-learning and e-laboratory systems. Many of these systems have been developed using different technologies, platforms, protocols and architectures. They maintain a large number of digital objects stored in many different storage systems and data formats, with differences in schema, access rights, metadata attributes and ontologies. This study proposes a generic architecture for sharing educational resources in heterogeneous environments using a data grid. The architecture is designed around the two common types of data: structured and unstructured. It aims to improve the accessibility, integration and management of those educational resources.

Keywords: resource sharing, data grid, digital library

1. INTRODUCTION

Currently, the growing social demand for high-quality educational resources in higher education cannot be met by the available educators and conventional libraries alone. With advances in information technology, many learning materials and academic journals created by universities have been converted into digital objects. The rapid growth of Internet infrastructure has accelerated the transformation of conventional libraries and learning into digital libraries and e-learning. This transformation greatly affects the way people get information and learn: information can now be accessed, and learning can take place, from anywhere at any time.

Since many digital library, e-learning and e-laboratory systems have been developed using different technologies, platforms, protocols and architectures, they tend to become information islands. To address this problem, some previous works [1][2][3][4] proposed the use of grid technology, which is capable of integrating heterogeneous platforms. However, most of them assumed that the shared resources are only files, that is, unstructured data. Educational resources consist not only of unstructured data but also of structured data. Much information, such as the metadata describing the shared digital objects and XML-formatted documents, is stored in databases. This information also needs to be shared with other systems.

In this study, we propose a generic architecture for sharing educational resources in a heterogeneous environment using a data grid. We also show how this architecture applies to digital libraries on the Indonesian Higher Education Network (INHERENT) [5].

2. INHERENT

INHERENT (Indonesian Higher Education Network) [5] is a network backbone developed by the Indonesian government to facilitate interconnection among the higher education institutions (HEIs) in Indonesia. The project was proposed by the Directorate of Higher Education. Started in July 2006, it currently connects 82 state HEIs, 12 regional offices for the coordination of private HEIs, and 150 private HEIs (see Figure 1).

All state HEIs in Java are connected by the STM-1 national backbone with a bandwidth of 155 Mbps. Cities on the other islands use 8-Mbps leased lines and 2-Mbps VSAT connections.

Figure 1. Indonesian Higher Education Network in 2009 [5]

This network has been used for various educational activities, including video conferencing and distance learning. Every university can build its own digital libraries and learning management systems (LMSs) and then publish its educational resources through the network. Although the resources can be shared with one another via FTP or web servers, the systems (digital libraries and LMSs) cannot provide an integrated view to users. Users still have to access each digital library system separately in order to find the resources they need. This network has the potential to share various educational resources using a data grid, as proposed in [6].

3. DATA GRID

The data grid is one type of grid technology; the others are the computational grid and the access grid. Originally, the emphasis of grid technology lay in the sharing of computational resources [7]. Technological and scientific advances have led to an ongoing data explosion in many fields. Data are stored in many different storage systems and data formats with different schemas, access rights, metadata attributes and ontologies. These data also need to be shared and managed. This need led to a new grid technology, namely the data grid. Several data grids exist. In the following, we review two of them (iRODS and OGSA-DAI) and highlight the features with which they address this need.

  • iRODS

iRODS (Integrated Rule-Oriented Data System) [8] is a second-generation data grid system providing a unified view of, and seamless access to, distributed digital objects across a wide area network. It extends the Storage Resource Broker (SRB), which is considered the first-generation data grid system. Both SRB and iRODS were developed by the San Diego Supercomputer Center (SDSC).

As a first-generation data grid, SRB focuses mainly on providing a unified view over distributed storage based on logical naming concepts, using a client-server architecture. These concepts provide naming and location transparency: users, resources, data objects and virtual directories are abstracted by logical names and mapped onto physical entities. The mapping is done at run time by the virtualization sub-system. The mapping from logical names to physical names is maintained persistently in a database system called the Metadata Catalog. The database also maintains the metadata of the data objects, which uses an attribute-value pair schema, and the states of data and operations. Built upon this logical abstraction, iRODS goes one level higher by abstracting the data management process itself, which is called policy abstraction.
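
The logical naming idea can be illustrated with a small sketch. This is not SRB or iRODS code: the class names, host names and paths below are hypothetical, and the in-memory dictionaries merely stand in for the database that a real Metadata Catalog would use.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    server: str         # host that stores this physical copy (hypothetical)
    physical_path: str  # location on that host's storage system


@dataclass
class MetadataCatalog:
    """Toy catalog: logical name -> replicas and attribute-value metadata."""
    replicas: dict = field(default_factory=dict)
    attributes: dict = field(default_factory=dict)

    def register(self, logical_name, replica, **metadata):
        # One logical name may point at several physical copies.
        self.replicas.setdefault(logical_name, []).append(replica)
        self.attributes.setdefault(logical_name, {}).update(metadata)

    def resolve(self, logical_name):
        """Map a logical name to its physical replicas at run time."""
        return list(self.replicas.get(logical_name, []))


catalog = MetadataCatalog()
catalog.register(
    "/IDL/theses/thesis-001.pdf",
    Replica("server-a.example", "/vault1/0001.pdf"),
    title="Sample thesis", year="2009",
)
catalog.register(
    "/IDL/theses/thesis-001.pdf",
    Replica("server-b.example", "/mnt/store/x/0001.pdf"),
)

# The user only ever sees the logical name; the catalog supplies locations.
locations = [r.server for r in catalog.resolve("/IDL/theses/thesis-001.pdf")]
print(locations)
```

Because clients address objects only by logical name, a physical copy can be moved or replicated by updating the catalog without changing anything on the client side.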

Figure 2. iRODS Architecture [8]

Whilst the policies used for managing the data at the server level in SRB are hard-coded, iRODS uses another approach, Rule-Oriented Programming (ROP), to make the customization of data management functionality much easier. Rules are explicitly declared to control the operations performed when a rule is invoked by a particular task. In iRODS, these operations are called micro-services and are implemented as functions in the C programming language.

Figure 2 displays the iRODS architecture with its main modules. The architecture differentiates between the administrative commands needed to manage the rules and the rules that invoke data management modules. When a user invokes a service, it fires a rule that uses the information from the rule base, status and metadata catalog to invoke micro-services [9]. The micro-services either change the metadata catalog or change the resource (read, write, create, etc.).
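
The relationship between rules and micro-services can be sketched as follows. This is only an illustration of the rule-oriented idea in Python; it is not the iRODS rule language, and the event names, micro-service names and context fields are all hypothetical (real iRODS micro-services are C functions).

```python
# Micro-services: small functions that act on a shared context.
def check_read_permission(ctx):
    if ctx["user"] not in ctx["acl"]:
        raise PermissionError(f"{ctx['user']} may not read {ctx['object']}")


def record_access(ctx):
    # Changes the (toy) metadata catalog state.
    ctx["audit_log"].append((ctx["user"], ctx["object"]))


def replicate(ctx):
    # Changes the (toy) resource state.
    ctx["replicas"].append("backup-resource")


# Rule base: event name -> ordered chain of micro-services.
# Changing a policy means editing this table, not the server code.
RULE_BASE = {
    "on_get": [check_read_permission, record_access],
    "on_put": [record_access, replicate],
}


def fire(event, ctx):
    """Fire the rule for an event by running its micro-services in order."""
    for micro_service in RULE_BASE.get(event, []):
        micro_service(ctx)
    return ctx


ctx = {
    "user": "alice",
    "object": "/IDL/a.pdf",
    "acl": {"alice"},
    "audit_log": [],
    "replicas": [],
}
fire("on_put", ctx)
fire("on_get", ctx)
```

The point of the design is that the policy lives in the declarative rule base, so an administrator can change what happens on a given event without recompiling the server.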

Figure 3 illustrates a scenario in which a client requests a file from an iRODS zone. First, the client connects to one of the iRODS servers (for example, server A) using a client application and sends the criteria for the file needed (e.g. based on metadata, filename, size, etc.). Server A finds matching files using the information available in the metadata catalog, and the query result is sent back to the client. If the client wants to get the file, server A asks the catalog server which iRODS server stores the file (for example, server B). Server A then communicates with server B to request the file. Server B applies the rules related to the request. The rules can include authorization (checking whether the client has the privilege to read the file) and sending the file to the client using the iRODS native protocol. The client is not aware of the location of the file; this location transparency is handled by the grid.

Figure 3. A client asks for a file from an iRODS data grid [8]
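
The scenario in Figure 3 can be traced with a toy simulation. The classes, method names and sample data below are hypothetical and greatly simplified; in particular, the data is simply returned through the call chain, whereas a real grid would transfer it with its native protocol.

```python
class Catalog:
    """Toy metadata catalog shared by all servers in one zone."""

    def __init__(self):
        self.entries = {}  # logical name -> {"server", "meta", "readers"}

    def search(self, **criteria):
        """Return logical names whose metadata match every criterion."""
        return sorted(
            name for name, entry in self.entries.items()
            if all(entry["meta"].get(k) == v for k, v in criteria.items())
        )

    def locate(self, logical_name):
        return self.entries[logical_name]["server"]


class Server:
    def __init__(self, name, catalog, zone):
        self.name, self.catalog, self.zone = name, catalog, zone
        self.files = {}
        zone[name] = self  # register this server in the zone

    def store(self, logical_name, data, readers, **meta):
        self.files[logical_name] = data
        self.catalog.entries[logical_name] = {
            "server": self.name, "meta": meta, "readers": set(readers),
        }

    def get(self, user, logical_name):
        """Entry point for a client: find the holder and forward the request."""
        holder = self.zone[self.catalog.locate(logical_name)]
        return holder.serve(user, logical_name)

    def serve(self, user, logical_name):
        """Holder side: apply an authorization rule, then hand over the data."""
        if user not in self.catalog.entries[logical_name]["readers"]:
            raise PermissionError(f"{user} may not read {logical_name}")
        return self.files[logical_name]


catalog, zone = Catalog(), {}
server_a = Server("A", catalog, zone)
server_b = Server("B", catalog, zone)
server_b.store("/IDL/lecture-01.pdf", b"%PDF...", readers={"alice"},
               subject="grid", year=2009)

hits = server_a.catalog.search(subject="grid")  # the client queries server A
data = server_a.get("alice", hits[0])           # the file is held by server B
```

The client talks only to server A and never learns that server B holds the file, which is the location transparency described above.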

Based on the explanation above, we conclude that iRODS focuses on managing unstructured data objects such as files. Although it can also access structured data resources, it is oriented mainly toward distributed file management. However, it also uses structured data (a relational database) to manage the metadata of the data objects, the states of data and the states of operations. This metadata can potentially be integrated using the OGSA-DAI data grid.

  • OGSA-DAI

OGSA-DAI (Open Grid Services Architecture – Data Access and Integration) [10] is middleware that allows structured data resources, such as relational or XML databases, from multiple, distributed, heterogeneous and autonomously managed data sources to be easily accessed via web services. It focuses on cases where assembling all the data into a single data warehouse is inappropriate [10].

Figure 4. An overview of OGSA-DAI components [11]

OGSA-DAI is designed to enable the sharing of data resources for collaboration. It supports:

  1. Data access service, which provides access to structured data in distributed heterogeneous data resources
  2. Data transformation service, which exposes data in schema X to users as data in schema Y
  3. Data integration service, which exposes multiple databases to users as a single virtual database
  4. Data delivery service, which delivers data to where it is needed by the most appropriate means, such as web service, email, HTTP, FTP and GridFTP

OGSA-DAI has adopted a service-oriented architecture (SOA) for integrating data and grids through the use of web services. The role of OGSA-DAI in a service-based grid, illustrated in Figure 4, involves interactions between the following components [11]:

  1. OGSA-DAI data service: a web service that implements various port types allowing the submission of requests and data transport operations
  2. Client: an entity that submits a request to the OGSA-DAI data service. A request is in the form of a perform document that describes one or more activities to be carried out by the service.
  3. Consumer: a process, other than the client, to which an OGSA-DAI service delivers data.
  4. Producer: a process, other than the client, that sends data to an OGSA-DAI data service.

When a client wants to make a request to an OGSA-DAI data service, it invokes a web service operation on the data service using a perform document. A perform document is an XML document describing the request that the client wants executed, defined by linking together a sequence of activities. An activity is an OGSA-DAI construct corresponding to a specific task that should be performed. The output of one activity can be linked to the input of another to perform a number of tasks in sequence. OGSA-DAI supports a range of activities, which fall into the broad categories of relational activities, XML activities, delivery activities, transformation activities and file activities. Furthermore, the activity is an OGSA-DAI extensibility point, allowing third parties to define new activities and add them to the ones supported by an OGSA-DAI data service.
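
The idea of linking the output of one activity to the input of another can be sketched in Python. This sketch does not use the OGSA-DAI API or its perform document format: the three functions are hypothetical stand-ins for a relational, a transformation and a delivery activity, and an in-memory SQLite database plays the role of the data resource.

```python
import sqlite3


def sql_query(conn, sql):
    """Relational activity (stand-in): yield each result row as a dict."""
    cur = conn.execute(sql)
    columns = [col[0] for col in cur.description]
    for row in cur:
        yield dict(zip(columns, row))


def rename_fields(rows, mapping):
    """Transformation activity (stand-in): expose schema X as schema Y."""
    for row in rows:
        yield {mapping.get(key, key): value for key, value in row.items()}


def deliver_to_list(rows):
    """Delivery activity (stand-in): hand the final data to the caller."""
    return list(rows)


# A local data resource with its own column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (judul TEXT, tahun INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [("Data Grid", 2009), ("E-Learning", 2007)],
)

# The "request": three activities chained output-to-input.
result = deliver_to_list(
    rename_fields(
        sql_query(conn, "SELECT judul, tahun FROM book ORDER BY tahun"),
        {"judul": "title", "tahun": "year"},
    )
)
print(result)
```

Each stage consumes the previous stage's output lazily, so a new activity can be added to the chain without changing the others, which mirrors the extensibility point described above.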

OGSA-DAI focuses on managing heterogeneous structured data resources. Although it can also access unstructured data using file transfer, it is oriented mainly toward the management of structured data (such as relational and XML databases).

Judging from these two existing data grid middleware systems, there are two main orientations in handling data, based on the nature of the data: structured and unstructured. In the following section, we propose a system architecture that accommodates both kinds of data.

4. THE PROPOSED ARCHITECTURE

In this study, a generic architecture for sharing educational resources in a heterogeneous environment using data grid middleware is proposed based on the two types of data: structured and unstructured data.

From the perspective of computer processing, digital objects are merely data. Generally, data can be classified into two categories: unstructured and structured. Unstructured data consists of any data stored in an unstructured format at an atomic level. There is no conceptual definition and no data type definition in unstructured content. Furthermore, unstructured data can be divided into two basic categories: bitmap objects (such as video, image and audio files) and textual objects (such as spreadsheets, presentations, documents and email). Both can be treated as a string of bits. Unstructured data is usually managed by the operating system. Structured data has schema information that describes its structure. The schema can be separated from the data (as in a relational database) or mixed with the data (e.g. in XML format). Structured data is usually managed by a database management system, which facilitates the processes of defining, constructing, manipulating and sharing the data among various users and applications [12]. This difference in the nature of the data leads to different treatment when the various data formats and storage systems are handled in the middleware layer.

Figure 5. The proposed generic architecture for resource sharing

Figure 5 shows the proposed architecture, which uses a hierarchy similar to the one in [13] but applies it to managing heterogeneous data resources. The architecture consists of three layers: the data layer, the data grid middleware layer and the application layer.

At the data layer, the various data resources in the heterogeneous file systems and storage systems can be joined into one large data collection. We distinguish between structured and unstructured data because of their different inherent characteristics. At the data grid middleware layer, the data virtualizations for each data type are separated. The unstructured data are virtualized by file-oriented data grid middleware, such as SRB and iRODS, while the structured data virtualization is handled by database-oriented data grid middleware, such as OGSA-DAI.

Based on the analysis of the file-oriented data grids (such as SRB and iRODS), the unstructured data virtualization provides the following basic services [14]:

  1. Data storage and replication service, which stores any type of digital object content and replicates it to several other resources. The service is independent of the content type because only the clients need to be aware of the internal format and structure of the content.
  2. Composition and relation service, which defines various relations between digital objects and multiple groups of related objects. Those relations may be used to create complex digital objects, to show parent/child relationships between objects or to create collections of digital objects.
  3. Search service, which searches previously defined sets of digital objects. The search can be based on matching the query against the metadata catalog.
  4. Metadata storage service, which stores metadata describing digital objects. One object can be described by many metadata records. The metadata also records information associated with replication. These records can be used by the search service. Usually, database systems are used to store and manage the metadata. Furthermore, several database systems containing metadata can be integrated using the structured data virtualization components of the data grid middleware.

The structured data virtualization provides the four basic services described in the OGSA-DAI section.

At the application layer, data-intensive applications, such as e-learning management systems and digital libraries, can utilize the two data virtualizations in order to publish and share their digital content objects.
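
The three-layer separation can be sketched as follows. Everything in this sketch is a hypothetical stand-in: `FileVirtualization` plays the role that a file-oriented data grid such as iRODS would play, `DatabaseVirtualization` plays the role of a database-oriented data grid such as OGSA-DAI (here backed by an in-memory SQLite table), and `DigitalLibrary` represents an application that uses both.

```python
import sqlite3


class FileVirtualization:
    """Stand-in for the unstructured data virtualization."""

    def __init__(self):
        self._objects = {}

    def put(self, logical_name, content):
        self._objects[logical_name] = content

    def get(self, logical_name):
        return self._objects[logical_name]


class DatabaseVirtualization:
    """Stand-in for the structured data virtualization."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE metadata (logical_name TEXT, title TEXT)"
        )

    def insert(self, logical_name, title):
        self._conn.execute(
            "INSERT INTO metadata VALUES (?, ?)", (logical_name, title)
        )

    def find(self, keyword):
        cur = self._conn.execute(
            "SELECT logical_name FROM metadata "
            "WHERE title LIKE ? ORDER BY logical_name",
            (f"%{keyword}%",),
        )
        return [row[0] for row in cur]


class DigitalLibrary:
    """Application layer: publishes and retrieves via both virtualizations."""

    def __init__(self, files, database):
        self.files, self.database = files, database

    def publish(self, logical_name, title, content):
        self.files.put(logical_name, content)       # unstructured part
        self.database.insert(logical_name, title)   # structured part

    def search_and_fetch(self, keyword):
        names = self.database.find(keyword)
        return {name: self.files.get(name) for name in names}


library = DigitalLibrary(FileVirtualization(), DatabaseVirtualization())
library.publish("/IDL/a.pdf", "Data Grid Basics", b"A")
library.publish("/IDL/b.pdf", "E-Learning Design", b"B")
found = library.search_and_fetch("grid")
```

The application never touches a storage system or database directly, so either virtualization could be replaced by different middleware without changing the application code.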

Figure 6. A typical implementation of the proposed architecture for digital library

Figure 6 shows a typical implementation of the proposed architecture for digital libraries. Every digital library site registered in the Integrated Higher Education Digital Library Portal manages its own data resources, consisting of a collection of digital objects and structured data (relational databases and XML). The digital objects are stored in various storage systems that are managed by iRODS storage servers. Since all iRODS servers are registered in one zone, namely Zone IDL (Indonesian Digital Libraries), the digital objects can be replicated among the servers. Some files of site A can be replicated to the servers of site B, and vice versa. The metadata catalog servers at both sites will contain the same information about all collected educational resources.

Some digital library systems store index files for searching in relational databases. The systems can also manage some kinds of educational resources formatted in XML (e.g. semi-structured documents) using native XML databases. All of this information can be accessed and integrated by the Integrated Higher Education Digital Library Portal using OGSA-DAI. Therefore, a user can perform a distributed search for files stored in all resources of both sites. The integrated system also enables a user to retrieve a resource from the location closest to him or her if the resource has already been replicated to several locations.

Since all sites are connected through INHERENT with high-speed bandwidth, there is no need for the Integrated Higher Education Digital Library Portal to harvest the metadata from all member sites as proposed in [4]. No central metadata repository is required. This ensures that the results of distributed searching are always up to date, because they come from the local query processing of each member site.
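
The difference between distributed searching and metadata harvesting can be sketched in a few lines. The site names, record titles and function names below are hypothetical, and plain Python lists stand in for each site's local database; a real portal would issue these queries through the data grid middleware.

```python
def make_site(name, records):
    """Return a search function that queries one site's local records."""
    def search(keyword):
        return [
            {"site": name, "title": title}
            for title in records
            if keyword.lower() in title.lower()
        ]
    return search


def distributed_search(keyword, sites):
    """Fan the query out to every member site and merge the answers."""
    merged = []
    for search in sites:
        merged.extend(search(keyword))
    return sorted(merged, key=lambda hit: (hit["title"], hit["site"]))


site_a_records = ["Introduction to Data Grid"]
site_b_records = ["Grid Middleware Survey", "Database Systems"]
sites = [
    make_site("site-a", site_a_records),
    make_site("site-b", site_b_records),
]

before = distributed_search("grid", sites)
site_a_records.append("Grid Security Notes")  # a local update at site A
after = distributed_search("grid", sites)     # visible without any harvesting
```

Because every query is answered from each site's own records, the new title appears in the very next search; with a harvested central repository it would appear only after the next harvest.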

5. CONCLUSION

In this study, we propose a generic architecture for sharing educational resources in a heterogeneous environment. The architecture divides the managed data into two categories, namely structured and unstructured data, and uses separate data grid middleware to virtualize each category. In our design, we use iRODS and OGSA-DAI as the data grid middleware. iRODS is mainly a file-oriented data grid and uses its own protocols. OGSA-DAI is mainly database-oriented and can only be accessed through web services. Together, the two data grids cover both categories of data. Hence, this architecture can improve the accessibility, integration and management of educational resources.

6. REFERENCES

[1] Yang, C.T., Hsin-Chuan Ho. Using Data Grid Technologies to Construct a Digital Library Environment. Proceedings of the 3rd International Conference on Information Technology: Research and Education (ITRE 05), pp. 388-392, NTHU, Hsinchu, Taiwan, June 27-30, 2005.
[2] Candela, L., Donatella Castelli, Pasquale Pagano, Manuele Simi. Moving Digital Library Service Systems to the Grid. Springer-Verlag. 2005.
[3] Sebestyen-Pal, G., Doina Banciu, Tunde Balint, Bogdan Moscaiuc, Agnes Sebestyen-Pal. Towards a GRID-based Digital Library Management System. In Distributed and Parallel Systems, pp. 77-90. Springer-Verlag. 2008.
[4] Pan, H. Research on the Interoperability Architecture of the Digital Library Grid. In IFIP International Federation for Information Processing, Volume 251, Integration and Innovation Orient to E-Society Volume 1, Wang, W. (Eds), (Boston: Springer), pp. 147-154. 2007.
[5] Indonesian Higher Education Network (INHERENT). http://www.inherent-dikti.net
[6] Hasibuan, Z.A. The Use of Data Grid Technology to Support Distance Education Environment. The International Symposium on Open, Distance, and E-Learning, Bali, 13-15 November 2007.
[7] Foster, I., Carl Kesselman. The Grid: Blueprint for a New Computing Infrastructure. 2nd Edition. Morgan Kaufmann. 2006.
[8] iRODS (Integrated Rule-Oriented Data System). https://www.irods.org
[9] Weise, A., Mike Wan, Wayne Schroeder, Adil Hasan. Managing Groups of Files in a Rule Oriented Data Management System (iRODS). Proceedings of the 8th International Conference on Computational Science, Section: Workshop on Software Engineering for Large-Scale Computing, Krakow, Poland. 2008.
[10] OGSA-DAI (Open Grid Services Architecture – Data Access and Integration). http://www.ogsadai.org.uk/index.php
[11] Chue Hong, N.P., Antonioletti, M., Karasavvas, K.A. and Atkinson, M. Accessing Data in Grids Using OGSA-DAI, in Knowledge and Data Management in GRIDs, pp. 3-18, D. Talia, A. Bilas, M.D. Dikaiakos (Eds.), 2007, ISBN: 978-0-387-37830-5.
[12] Elmasri, R., Shamkant B. Navathe. Fundamentals of Database Systems. 5th Edition. Addison Wesley. 2006.
[13] Coulouris, G., Jean Dollimore, Tim Kindberg. Distributed Systems: Concepts and Design. 4th Edition. Addison Wesley. 2005.
[14] Kosiedowski, M., Mazurek, C., Stroinski, M., Werla, M. and Wolski, M. Federating Digital Library Services for Advanced Applications in Science and Education, Computational Methods in Science and Technology 13(2), pp. 101-112. December, 2007.

*)  Proceedings of International Conference on Creative and Innovative Technology 2009 (ICCIT-09) (pp. 66-71). STMIK Rahardja, Tangerang: CCIT Journal, Indonesia

