We develop a new implementation of coupled-cluster singles and doubles (CCSD) optimized for the most recent graphics processing unit (GPU) hardware. We find that a single node with 8 NVIDIA V100 GPUs can perform CCSD computations on systems with roughly 100 atoms and 1300 basis functions in less than 1 day. Comparisons against massively parallel implementations of CCSD suggest that more than 64 CPU-based nodes (each with 16 cores) are required to match this performance.