Chronic Helicobacter pylori infection is associated with the development of peptic ulcer, gastric cancer, and gastric lymphoma. Bacterial virulence factors of H. pylori are reported to be important in the development of these gastroduodenal diseases. The CagA protein, encoded by the cagA gene, is the best-studied virulence factor of H. pylori. CagA contains a highly polymorphic Glu-Pro-Ile-Tyr-Ala (EPIYA) repeat region in its C-terminal region, and this repeat region is reported to be involved in the pathogenesis of gastroduodenal diseases. The segments containing EPIYA motifs have been designated segments A, B, C, and D; however, their classification and their relationship to disease remain unclear. This study used 560 unique CagA sequences containing 1,796 EPIYA motifs collected from public resources, comprising 274 Western and 286 East Asian strains; clinical data were available for 433 entries. Fifteen types of EPIYA or EPIYA-like sequences were defined. In addition to the four previously reported major segment types, several minor segment types (e.g., segments B' and B'') and more than 30 sequence types (e.g., ABC, ABD) were defined using our classification method. We confirm that sequences from Western strains contain segment C, whereas sequences from East Asian strains contain segment D. We also confirm that strains with two EPIYA segment C repeats are more frequently associated with gastric cancer than strains with one segment C. Our results shed light on the relationships between CagA type, the country of origin of each sequence type, and the frequency of gastric disease.
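As an illustration of the motif counting described above, the following is a minimal sketch of locating EPIYA motifs in a protein sequence and tallying them across a collection. It is not the classification method used in this study. The function names, the toy sequence, and the use of EPIYT as an example EPIYA-like variant are assumptions made for illustration; the abstract does not list the fifteen motif types, and assigning segments A to D would additionally require the flanking sequences, which this sketch does not attempt.

```python
import re
from collections import Counter


def find_motifs(protein, motifs=("EPIYA",)):
    """Return (0-based position, motif) pairs for each listed motif in the sequence.

    `motifs` is a caller-supplied tuple; only the canonical EPIYA is used by default.
    """
    pattern = re.compile("|".join(re.escape(m) for m in motifs))
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]


def count_motifs(proteins, motifs=("EPIYA",)):
    """Tally motif occurrences over an iterable of protein sequences."""
    counts = Counter()
    for seq in proteins:
        counts.update(motif for _, motif in find_motifs(seq, motifs))
    return counts


# Synthetic toy fragment, not a real CagA sequence.
toy = "MKTEPIYAQVNKKEPIYTGGSEPIYATIDD"

# EPIYT is included here only as an assumed example of an EPIYA-like variant.
hits = find_motifs(toy, motifs=("EPIYA", "EPIYT"))
print(hits)
print(count_motifs([toy], motifs=("EPIYA", "EPIYT")))
```

Passing the motif list as a parameter keeps the sketch independent of any particular set of EPIYA-like variants, so a curated list can be supplied without changing the code.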