Search results
24 results
SSRN
Working paper
Iteration-fusing conjugate gradient
This paper presents the Iteration-Fusing Conjugate Gradient (IFCG) approach, an evolution of the Conjugate Gradient method that consists of i) letting computations from different iterations overlap and ii) splitting linear algebra kernels into subkernels to increase concurrency and relax data dependencies. The paper presents two ways of applying the IFCG approach: the IFCG1 algorithm, which aims at hiding the cost of parallel reductions, and the IFCG2 algorithm, which aims at reducing idle time by starting computations as soon as possible. IFCG1 and IFCG2 are complementary approaches to increasing parallel performance. Extensive numerical experiments compare the numerical stability and performance of IFCG1 and IFCG2 against four state-of-the-art techniques. Considering a set of representative input matrices, the paper demonstrates that IFCG1 and IFCG2 provide parallel performance improvements of up to 42.9% and 41.5%, respectively, and average improvements of 11.8% and 7.1% with respect to the best state-of-the-art techniques, while keeping similar numerical stability properties. This paper also evaluates the IFCG algorithms' sensitivity to system noise and demonstrates that they run 18.0% faster on average than the best state-of-the-art technique under realistic degrees of system noise. ; This work has been supported by the Spanish Government (Severo Ochoa grant SEV2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316), by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), and by the IBM/BSC Deep Learning Center Initiative. ; Peer Reviewed ; Postprint (author's final draft)
BASE
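As context for this record, the following is a minimal textbook Conjugate Gradient loop in Python/NumPy, with comments marking the two dot products (global reductions in a parallel run) whose latency IFCG1 hides and the vector kernels IFCG2 would split into subkernels. It is a sketch of the baseline method only, not the authors' IFCG implementation; the test matrix and tolerance are illustrative assumptions.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Textbook CG for a symmetric positive-definite matrix A."""
    x = np.zeros_like(b)
    r = b - A @ x                  # initial residual
    p = r.copy()                   # search direction
    rs_old = r @ r                 # dot product: global reduction #1
    for _ in range(max_iter):
        Ap = A @ p                 # SpMV kernel (IFCG2 splits such kernels)
        alpha = rs_old / (p @ Ap)  # dot product: global reduction #2
        x += alpha * p             # AXPY kernel
        r -= alpha * Ap            # AXPY kernel
        rs_new = r @ r             # reduction IFCG1 overlaps across iterations
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Illustrative SPD system: 1D Laplacian
n = 100
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1))
b = np.ones(n)
x = conjugate_gradient(A, b)
assert np.linalg.norm(b - A @ x) < 1e-6
```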
Prediction of the impact of network switch utilization on application performance via active measurement
Although one of the key characteristics of High Performance Computing (HPC) infrastructures is their fast interconnection networks, the increasingly large computational capacity of HPC nodes and the consequent growth of data exchanges between them constitute a potential performance bottleneck. To achieve high performance in parallel executions despite network limitations, application developers require tools to measure their codes' network utilization and to correlate the network's communication capacity with the performance of their applications. This paper presents a new methodology to measure and understand network behavior. The approach is based on two different techniques that inject extra network communication. The first technique measures the fraction of the network utilized by a software component (an application or an individual task) in order to determine the existence and severity of network contention. The second injects large amounts of network traffic to study how applications behave on less capable or fully utilized networks. The measurements obtained by these techniques are combined to predict the performance slowdown suffered by a particular software component when it shares the network with others. Predictions are obtained from several training sets built from the raw data of the two measurement techniques. Sensitivity to the training set size is evaluated across 12 different scenarios; our results find the optimal training set size to be around 200 training points. When optimal data sets are used, the proposed methodology provides predictions with an average error of 9.6% across 36 scenarios. ; With the support of the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Expedient 2013BP_B00243). The research leading to these results has received funding from the European Research Council under the European Union's 7th FP (FP/2007-2013) / ERC GA n. 321253. Work partially supported by the Spanish Ministry of Science and Innovation (TIN2012-34557). ; Peer Reviewed ; Postprint (author's final draft)
BASE
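The prediction step described in this record can be pictured as fitting a model to training points that pair a component's measured network utilization with an injected background-traffic level. The sketch below is a generic least-squares illustration of that idea in NumPy; the feature set, the interaction model, and the synthetic data are all assumptions, not the paper's actual methodology.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: each point pairs the measured network
# utilization of a component and an injected background-traffic level
# with the slowdown observed when the two shared the network.
n_train = 200                        # around the paper's optimal set size
utilization = rng.uniform(0.0, 1.0, n_train)
background = rng.uniform(0.0, 1.0, n_train)
slowdown = (1.0 + 0.8 * utilization * background
            + rng.normal(0.0, 0.02, n_train))   # synthetic truth + noise

# Least-squares fit of slowdown ~ 1 + a*u + b*t + c*u*t
X = np.column_stack([np.ones(n_train), utilization, background,
                     utilization * background])
coef, *_ = np.linalg.lstsq(X, slowdown, rcond=None)

def predict_slowdown(u, t):
    """Predict the slowdown of a component with utilization u when it
    shares the network with background traffic level t."""
    return coef @ np.array([1.0, u, t, u * t])

print(predict_slowdown(0.5, 0.9))    # e.g. a heavily loaded network
```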
Resilient gossip-inspired all-reduce algorithms for high-performance computing - Potential, limitations, and open questions
We investigate the usefulness of gossip-based reduction algorithms in a high-performance computing (HPC) context. We compare them to state-of-the-art deterministic parallel reduction algorithms in terms of fault tolerance and resilience against silent data corruption (SDC), as well as in terms of performance and scalability. New gossip-based reduction algorithms are proposed that significantly improve on the state of the art in terms of resilience against SDC. Moreover, a new gossip-inspired reduction algorithm is proposed that promises much more competitive runtime performance in an HPC context than classical gossip-based algorithms, in particular for low accuracy requirements. ; This work has been partially funded by the Spanish Ministry of Science and Innovation [contract TIN2015-65316]; by the Government of Catalonia [contracts 2014-SGR-1051, 2014-SGR-1272]; by the RoMoL ERC Advanced Grant [grant number GA 321253]; and by the Vienna Science and Technology Fund (WWTF) through project ICT15-113. ; Peer Reviewed ; Postprint (author's final draft)
BASE
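For background, a classical gossip-based reduction is push-sum averaging: each node keeps a (value, weight) pair and repeatedly pushes half of both to a random peer, so value/weight converges to the global average at every node. The simulated sketch below illustrates only this standard technique, not the paper's new algorithms; the node and round counts are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 32                                  # simulated nodes
values = rng.uniform(0.0, 10.0, n)      # local inputs to average
weights = np.ones(n)
target = values.mean()

for _ in range(40):                     # synchronous gossip rounds
    peers = rng.integers(0, n, n)       # each node picks a random peer
    new_v, new_w = values / 2.0, weights / 2.0   # keep half locally ...
    for i in range(n):
        new_v[peers[i]] += values[i] / 2.0       # ... push half to the peer
        new_w[peers[i]] += weights[i] / 2.0
    values, weights = new_v, new_w

estimates = values / weights            # value/weight -> global average
print(np.max(np.abs(estimates - target)))    # worst-case node error
```

Because the pairwise exchanges conserve the total value and total weight, the ratio at every node converges to the true mean regardless of the random communication pattern, which is what makes such schemes naturally tolerant of irregular execution.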
Interview with Francisco J. Igualada
In: Cuadernos Internacionales de Tecnología para el Desarrollo Humano, 2009, no. 8
Francisco José Igualada Delgado earned his degree in Geology from the University of Barcelona in 1981. He worked at Dames & Moore Ingenieros in Teruel and Madrid, and in 1983 joined the South African Department of Economic Affairs (Geological Survey of South Africa) in Mafeking, working on geotechnical, geophysics, and mining projects; he subsequently worked for a short time in Spain at the ITGME. In 1986 he completed a postgraduate master's diploma in Engineering Geology and Remote Sensing at the ITC, University of Utrecht, Enschede (the Netherlands). In 1987, at TRABAJOS CATASTRALES S.A. in Pamplona, he was responsible for remote sensing and information systems for agricultural statistics, the environment, cartography, and photogrammetry, participating in European projects such as MARS and Corine. In late 1989, at AURENSA (Madrid), he was head of international multidisciplinary projects in geoinformation, engineering, and the environment using Earth observation satellites and Geographic Information Systems (GIS). In 1993 he completed a doctoral research programme at Cranfield University, Silsoe (United Kingdom), begun years earlier, obtaining an "M.Phil/PhD Information Systems (GIS-geomatics)" through a European Commission (JRC) project. In August 1993 he joined the European Union Satellite Centre (EUSC), based in Madrid and Brussels. During his nearly nine years at that organization he worked with all kinds of optical and radar imagery, focusing on crisis management and international security issues. He was a project manager for GIS and environmental security and a senior IMINT analyst, while also jointly responsible for the exploitation of the Helios satellite in various regional crises. In parallel, in 2000 he obtained an "International Executive" MBA from the ESCP-EAP Business School in Madrid. In 2002-03 he worked in the UN Department of Peacekeeping Operations (DPKO), where he was Head of Geoinformation Systems in Africa (Ethiopia, Eritrea, Sudan…) and New York, helping to launch the new initiative of GIS units for the "blue helmets" (Decision Support System), covering humanitarian operations, demining and border delimitation, natural resources, and so on. From New York, in 2005-07 he was in charge of geographic infrastructures (IM and mapping) in the Democratic Republic of the Congo, the West/North Africa region, and the Middle East. At the end of 2007 he was appointed Director of the Geoinformation Centre of the United Nations Logistics Base (UNLB) in Brindisi (Italy). The Centre is the most operational part of the UN Cartographic Section, providing key information support to the "blue helmets" in Darfur, Chad, and Afghanistan. He also holds several professional certifications, such as "EurGeol", and serves as an evaluator and advisor for the European Commission and a collaborator with the CTBTO. ; Peer Reviewed
BASE
Approximating a Multi-Grid Solver
Multi-grid methods are numerical algorithms used in parallel and distributed processing. The main idea of multi-grid solvers is to speed up the convergence of an iterative method by repeatedly reducing the problem to a coarser grid. Multi-grid methods are widely exploited in many application domains, so it is important to improve their performance and energy efficiency. This paper aims to reach this objective based on the following observation: given that the intermediary steps do not require full accuracy, it is possible to save time and energy by reducing precision during some steps while keeping the final result within the targeted accuracy. To achieve this goal, we first introduce a cycle shape different from the classic V-cycle used in multi-grid solvers. Then, we propose to dynamically change the floating-point precision used at runtime according to the accuracy needed for each intermediary step. Our evaluation on a state-of-the-art multi-grid solver implementation demonstrates that it is possible to trade temporary precision for time to completion without hurting the quality of the final result. In particular, we reach the same accuracy as with full double precision while gaining between 15% and 30% in execution time. ; This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 708566 (DURO). The European Commission is not liable for any use that might be made of the information contained therein. This work has also been supported by the Spanish Government (Severo Ochoa grant SEV2015-0493). ; Peer Reviewed ; Postprint (author's final draft)
BASE
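To make the mixed-precision idea concrete, the following is a small self-contained two-grid cycle in NumPy that runs pre-smoothing in float32 while keeping the residual, coarse correction, and post-smoothing in float64. It is a generic illustration, not the paper's solver or its cycle shape: in this naive form the float32 smoothing caps the attainable accuracy near single precision, which is the limitation an adaptive per-step precision schedule avoids. Grid size, smoother, and precision placement are assumptions.

```python
import numpy as np

def laplacian(n, dtype=np.float64):
    """Unscaled 1D Poisson matrix (Dirichlet) on n interior points."""
    return (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1)).astype(dtype)

def interpolation(n, m):
    """Linear interpolation from m coarse points to n = 2m + 1 fine points."""
    P = np.zeros((n, m))
    for j in range(m):
        P[2 * j, j] += 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] += 0.5
    return P

def jacobi(A, x, b, sweeps, omega=2.0 / 3.0):
    """Damped Jacobi smoothing, run in the dtype of its inputs."""
    d = np.diag(A)
    for _ in range(sweeps):
        x = x + omega * (b - A @ x) / d
    return x

def two_grid(b, cycles=10, sweeps=3):
    n = b.size
    m = (n - 1) // 2
    A = laplacian(n)
    A32 = A.astype(np.float32)
    P = interpolation(n, m)
    R = 0.5 * P.T                        # full-weighting restriction
    Ac = R @ A @ P                       # Galerkin coarse operator
    x = np.zeros(n)
    for _ in range(cycles):
        # pre-smooth in float32: intermediate steps tolerate low precision
        x = jacobi(A32, x.astype(np.float32),
                   b.astype(np.float32), sweeps).astype(np.float64)
        r = b - A @ x                    # residual back in float64
        ec = np.linalg.solve(Ac, R @ r)  # exact coarse-grid solve
        x = x + P @ ec                   # prolongate and correct
        x = jacobi(A, x, b, sweeps)      # post-smooth in float64
    return x

n = 63
b = np.sin(np.pi * np.arange(1, n + 1) / (n + 1))   # smooth right-hand side
x = two_grid(b)
print(np.linalg.norm(b - laplacian(n) @ x))
```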
TaskGenX: A Hardware-Software Proposal for Accelerating Task Parallelism
As chip multi-processors (CMPs) become increasingly complex, software solutions such as parallel programming models are attracting a lot of attention. Task-based parallel programming models offer an appealing approach to utilizing complex CMPs. However, the increasing number of cores on modern CMPs is pushing research towards the use of fine-grained parallelism, and task-based programming models need to handle such workloads while offering performance and scalability. Using specialized hardware to boost the performance of task-based programming models is a common practice in the research community. Our paper makes the observation that, with many task-based programming models, task creation becomes a bottleneck when executing fine-grained parallel applications: as the number of cores increases, the time spent generating the tasks of the application becomes more critical to the overall execution. To overcome this issue, we propose TaskGenX, a solution for minimizing task creation overheads that relies on both the runtime system and dedicated hardware. On the runtime system side, TaskGenX decouples task creation from the other runtime activities; it then transfers this part of the runtime to specialized hardware. We draw the requirements for this hardware in order to boost the execution of highly parallel applications. From our evaluation using 11 parallel workloads on both symmetric and asymmetric multicore systems, we obtain performance improvements of up to 15×, averaging 3.1× over the baseline. ; This work has been supported by the RoMoL ERC Advanced Grant (GA 321253), by the European HiPEAC Network of Excellence, by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), and by the European Union's Horizon 2020 research and innovation programme under grant agreements No. 671697 and No. 779877. M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Ramon y Cajal fellowship number RYC-2016-21104. Finally, the authors would like to thank Thomas Grass for his valuable help with the simulator. ; Peer Reviewed ; Postprint (author's final draft)
BASE
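In software terms, the decoupling this record describes can be emulated by moving task creation onto a dedicated creator thread while workers only consume ready task descriptors from a queue. The Python sketch below shows that producer/consumer split; it is not the TaskGenX runtime or its hardware, and the task bodies, counts, and single-worker setup are made-up assumptions.

```python
import threading
import queue

task_queue = queue.Queue()
N_TASKS = 1000
SENTINEL = None

def create_tasks():
    """Dedicated creator: builds task descriptors off the workers'
    critical path (the role TaskGenX offloads to specialized hardware)."""
    for i in range(N_TASKS):
        task_queue.put((pow, (i, 2)))   # a descriptor: function + arguments
    task_queue.put(SENTINEL)            # signal the single worker to stop

def run_worker():
    """Worker: only dequeues ready descriptors and executes them."""
    results = []
    while True:
        item = task_queue.get()
        if item is SENTINEL:
            break
        fn, args = item
        results.append(fn(*args))
    return results

creator = threading.Thread(target=create_tasks)
creator.start()                  # task creation proceeds concurrently ...
results = run_worker()           # ... with task execution in the main thread
creator.join()
print(len(results))              # 1000 tasks executed
```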
Asynchronous and exact forward recovery for detected errors in iterative solvers
© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. ; Current trends and projections show that faults in computer systems are becoming increasingly common. Such errors may be detected, and possibly corrected transparently, e.g. by Error Correcting Codes (ECC). For a program to be fault-tolerant, it also needs to handle Errors that are Detected and Uncorrected (DUE), such as an ECC encountering too many bit flips in a codeword. While correcting an error has an overhead in itself, it can also affect the progress of a program. The most generic technique, rolling back the program state to a previously taken checkpoint, discards any progress made since then. Alternatively, application-specific techniques exist, such as restarting an iterative program with its latest iteration's values as the initial guess. ; This manuscript is the journal extension of a previously published conference paper [25]. This work has been partially supported by the European Research Council under the European Union's 7th FP, ERC Advanced Grant 321253, and by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P. L. Jaulmes has been partially supported by the Spanish Ministry of Education, Culture and Sports under grant FPU2013/06982. M. Moretó has been partially supported by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship JCI-2012-15047. M. Casas has been partially supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Co-fund programme of the Marie Curie Actions of the European Union's 7th FP (contract 2013 BP B 00243). We would like to thank Nicolas Vidal for his contribution on using huge pages natively. ; Peer Reviewed ; Postprint (author's final draft)
BASE
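The last point of this abstract, restarting from the latest iterate instead of rolling back, can be illustrated generically: when a detected error corrupts solver data, replace the lost values with a guess and keep iterating from the current state. The sketch below injects a fault into a Jacobi iteration and recovers forward; it is a toy illustration on assumed data, not the paper's exact recovery scheme.

```python
import numpy as np

rng = np.random.default_rng(2)

# A strictly diagonally dominant system, so the Jacobi iteration converges
n = 50
A = rng.uniform(-1.0, 1.0, (n, n))
A[np.diag_indices(n)] = float(n)
b = rng.uniform(-1.0, 1.0, n)
d = np.diag(A)

x = np.zeros(n)
it = 0
while np.linalg.norm(b - A @ x) > 1e-10:
    x = (b - (A @ x - d * x)) / d       # one Jacobi sweep
    it += 1
    if it == 5:                         # simulated detected-uncorrected error:
        x[7] = 0.0                      # the lost entry is replaced by a guess
        # forward recovery: keep iterating from the current (partially
        # damaged) iterate instead of rolling back to a checkpoint

print(it, np.linalg.norm(b - A @ x))    # converges despite the injected fault
```

Because the iteration is self-correcting, the injected error only costs extra sweeps rather than all progress since a checkpoint, which is the essence of the forward-recovery argument.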