We have developed a machine-learning approach to identify 3537 discrete orthologue protein sequence groups distributed across all available archaeal genomes. We show that treating these orthologue groups as binary detection/non-detection data is sufficient to capture the majority of archaeal phylogeny. We subsequently use the sequence data from these groups to infer a phylogeny that is independent of both method and substitution model. By holding this phylogeny constrained and interrogating the intersection of this large dataset with both the Eukarya and the Bacteria using Bayesian and maximum-likelihood approaches, we propose and provide evidence for a methanogenic origin of the Archaea. By the same criteria, we also provide evidence in support of an origin for the Eukarya either within or as sister to the Thaumarchaea.
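
As a purely illustrative sketch, and not the pipeline used in this work, the binary detection/non-detection encoding can be pictured as a genome-by-group matrix in which each entry records whether an orthologue group was detected in a genome. The genome names, group identifiers, and the choice of Jaccard distance below are all assumptions made for illustration; the real dataset spans 3537 groups.

```python
from itertools import combinations

# Hypothetical toy data: genome name -> set of orthologue group IDs detected in it.
detections = {
    "genome_A": {"OG1", "OG2", "OG3"},
    "genome_B": {"OG1", "OG2", "OG4"},
    "genome_C": {"OG5", "OG6"},
}


def binary_matrix(detections):
    """Encode detections as 0/1 rows over a sorted list of all group IDs."""
    groups = sorted(set().union(*detections.values()))
    rows = {
        genome: [1 if og in found else 0 for og in groups]
        for genome, found in detections.items()
    }
    return groups, rows


def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two equal-length 0/1 vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either if either else 0.0


groups, rows = binary_matrix(detections)

# Pairwise distances between genomes, keyed by alphabetically ordered name pairs.
distances = {
    (g1, g2): jaccard_distance(rows[g1], rows[g2])
    for g1, g2 in combinations(sorted(rows), 2)
}

for pair, d in distances.items():
    print(pair, round(d, 3))
```

A distance matrix of this kind could, in principle, be passed to any standard distance-based tree-building method. That step is left out here because the source does not specify how the binary data were analysed.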