Replica-exchange (RE) algorithms are used to study physical phenomena in applications ranging from protein folding dynamics to binding affinity calculations. They form a class of algorithms that involve a large number of loosely coupled ensembles and are therefore well suited to distributed resources. We develop a framework for RE that supports different replica pairing mechanisms (synchronous versus asynchronous) and exchange coordination mechanisms (centralized versus decentralized), and that can use a range of production cyberinfrastructures concurrently. We characterize the performance of both synchronous and asynchronous RE algorithms on production distributed infrastructure at an unprecedented scale, measured both by the number of replicas and by the typical number of cores per replica. We find that the asynchronous algorithms outperform the synchronous algorithms, although the details of specific implementations are important determinants of performance.
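The abstract does not spell out the exchange rule itself. The sketch below is a minimal illustration assuming the textbook temperature replica-exchange (parallel tempering) Metropolis criterion, where a swap between temperature slots with inverse temperatures `beta_i`, `beta_j` and energies `E_i`, `E_j` is accepted with probability min(1, exp((beta_i - beta_j)(E_i - E_j))). It shows synchronous pairing, in which every replica finishes its simulation phase before neighboring slots attempt exchanges. The function names (`exchange_accepted`, `synchronous_sweep`) are hypothetical and this is not the framework described here.

```python
import math
import random


def exchange_accepted(beta_i, beta_j, energy_i, energy_j, rng=random):
    """Metropolis test for swapping configurations between two temperature slots.

    Assumes the standard parallel-tempering criterion:
    accept with probability min(1, exp((beta_i - beta_j) * (energy_i - energy_j))).
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    if delta >= 0:
        return True  # favorable swap: always accept
    return rng.random() < math.exp(delta)


def synchronous_sweep(betas, energies, perm, offset, rng=random):
    """One synchronous exchange phase over neighboring temperature slots.

    betas[k]   : inverse temperature of slot k
    energies[c]: energy of configuration c after the (assumed completed) MD phase
    perm[k]    : configuration currently held by slot k
    offset     : 0 or 1, alternated between sweeps so all neighbor pairs get tried
    """
    for k in range(offset, len(betas) - 1, 2):
        a, b = perm[k], perm[k + 1]
        if exchange_accepted(betas[k], betas[k + 1], energies[a], energies[b], rng):
            perm[k], perm[k + 1] = b, a  # swap configurations between slots
    return perm
```

In an asynchronous scheme there would be no global barrier before this phase; instead, replicas that happen to be ready would be paired and tested with the same acceptance rule, which avoids idle time spent waiting on the slowest replica.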