Scalability is a fundamental problem for information systems as the amount of managed data increases. Peer to Peer systems are commonly used to solve scalability problems, since centralized approaches do not scale without a large dedicated infrastructure. However, most current Peer to Peer systems do not take into account that indexed data can be dynamic, with values that change very often. We therefore propose the Multi-set approach, which aims to find the best trade-off between a DHT-based network and total replication. This approach is built on top of a classical DHT Peer to Peer system. It can improve most pure DHT Peer to Peer systems by taking the dynamism of resources into account. The evaluation is done by modeling, simulation and experimentation on PlanetLab. This approach is more efficient than both a DHT Peer to Peer system and total replication, whatever the dynamism of the resources.
Grids have emerged as wide-scale, distributed infrastructures providing enough resources for ever more demanding scientific experiments. EGEE is one of the largest scientific Grids in production operation today, with over 220 sites and more than 30,000 CPUs all over the world. Any further evolution of EGEE needs to be based on knowledge of the deficiencies and bottlenecks of the current infrastructure and software. To provide this knowledge, we analyzed nine months of job submissions on the South-East federation of EGEE. We provide information on how users submit their jobs: throughput, bursts, requirements, VO. We also study the current behavior of the EGEE middleware by evaluating its performance and its retry policy. Finally, we show that even though the middleware provides advanced functionality, most submissions are still embarrassingly parallel jobs.
This article addresses the performance evaluation and behavior of large-scale Peer-to-Peer file sharing systems. In particular, the impact of a realistic workload is considered by evaluating the Freenet system. This evaluation is carried out by simulation. A set of inputs is determined, together with their distribution laws, in order to generate a more realistic workload. One of them is an original characterization of user requests. Another contribution is to show the impact of these more realistic inputs on overall system performance. Notably, new abrupt behaviors in the learning process are described.
The Grid resource manager is one of the fundamental Grid services. It has to manage the Grid state and locate resources for users. As Grids become larger, this service needs to be efficient and scalable. Current centralized approaches are unable to scale without a large dedicated infrastructure. We therefore propose the Multi-set approach, which aims to find the best trade-off between a DHT-based network and total replication. It is built on top of a classical DHT P2P system. It improves most current DHT P2P systems by taking the dynamism of resources into account. The evaluation is done by simulation. This approach is more efficient than both a DHT P2P system and total replication, whatever the dynamism of the resources.
Peer to Peer systems are now widely studied, but evaluating them in a realistic way requires better knowledge of their environments. In this article we focus on computer availability in these systems. We characterize this availability behind ADSL lines and link it to the availability of Peer to Peer system participants. We emphasize the methodology, as it can be generalized to other systems such as grids or ad hoc systems. Finally, we show how users of ADSL lines are related to Peer to Peer users and give some examples of possible practical uses of these results. The results are based on trace datasets obtained over the first five months of 2003 with around 5000 hosts.
In this article we present the design choices and the evaluation of a batch scheduler for large clusters, named OAR. This batch scheduler is based on an original design that emphasizes low software complexity through the use of high-level tools. The global architecture is built upon the scripting language Perl and the relational database engine MySQL. The goal of the OAR project is to prove that it is possible today to build a complex resource management system using such tools without sacrificing efficiency and scalability. Currently, our system offers most of the important features implemented by other batch schedulers, such as priority scheduling (by queues), reservations, backfilling and some global computing support. Despite the use of high-level tools, our experiments show that the performance of our system is close to that of other systems. Furthermore, OAR is currently used to manage 700 nodes (a metropolitan Grid) and has shown good efficiency and robustness.
The memory hierarchy is becoming the bottleneck for multiprocessor systems, as its evolution does not keep pace with processor technology. This study aims to identify the relationship between performance slowdown and memory pressure, using hardware performance counters. Based on this relationship, we propose an adaptive control system that improves the efficiency of load balancing among the computer resources. The DRAC system, our adaptive control system, observes the access requests on the memory bus. It then adapts its user-level scheduling strategy to maximize resource utilization. We describe the DRAC system and its mathematical model. We present experimental results showing that the DRAC system is nearly optimal with respect to our model.
As Peer to Peer systems become widely used, it becomes necessary to be able to evaluate and compare them. In order to simulate, emulate or even model such systems, it is necessary to understand the environment in which they will be used. In this paper we describe one characterization of such an environment. We also describe an implementation of workload generation for Peer to Peer systems.
To evaluate peer to peer systems, it is necessary to understand the influences that act on them. In this article we study some of these influences from a client point of view, in contrast to the usual server point of view. We characterize a number of these influencing factors, such as user activity or the presence of different file types, together with their distributions. In particular, we address the characterization of requests in peer to peer sharing systems and of computing power in peer to peer computing systems. Finally, we detail the methodology followed to obtain the user presence profile.
In this article we present the design choices and the evaluation of a batch scheduler for large clusters, named OAR. This scheduler relies on an original design that reduces software complexity, allows easy extension, and responds well to the scalability problem. The global architecture relies mainly on two high-level components: a scalable generic tool for application administration (launching, deployment) and the use of a database as the only medium for exchanging information between internal modules. In the evaluation we show both the good performance level of the system and its capacity for extension in a Global Computing type of use (exploiting otherwise unused resources).
To evaluate Peer to Peer systems, it is necessary to understand the influences that act on them. In this article we study some of these influences from a client point of view, in contrast to the usual server point of view. We characterize a number of these influencing factors, such as user activity or the presence of different file types, together with their distributions. In particular, we address the characteristics of requests in Peer to Peer sharing systems and computing power in Peer to Peer computing systems. Finally, we detail the methodology followed to obtain the user presence profile.
Grid resource management systems currently use a centralized approach. This method has several drawbacks, such as a single point of failure, an overloaded point or slow updates, and administrative needs. In this article, we evaluate the use of currently widely used Peer to Peer systems to improve this situation, and show the shortcomings of these systems. We propose the architecture of a new system to handle resource management for grids efficiently, together with a new Peer to Peer system that addresses those shortcomings.