Vision transformers (ViTs) are rapidly evolving and are widely used in computer vision. However, high-performance ViTs require many computations, which limit their further development in the vision field. In this article, a novel evolutionary dual-stream transformer (E-DST) model is proposed to alleviate the computational resource demand problem. A hybrid attention mechanism structure is proposed for a DST model. The DST model uses a dual-branch structure to fuse convolutional and transformer features. Combining the features learned by the transformer and convolution effectively saves model computational resources. In addition, an evolutionary optimizer is proposed to optimize the parameters of the model. The excellent search ability of the evolutionary algorithm is utilized to optimize the transformer model parameters. The convergence of the evolutionary optimizer is proved in this article. In addition, the proposed E-DST model is experimentally compared with a variety of classic models and their deformations based on three datasets. And, the evolutionary optimizer proves its generality in convolutional and recurrent neural networks. The experimental results show that the E-DST model can effectively reduce computational resources and that the evolutionary optimizer can solve large-scale optimization problems. In conclusion, our proposed method is feasible and effective.