Functional magnetic resonance imaging (fMRI) is a powerful tool for studying brain function, but its high dimensionality and variability pose analytical challenges. We propose a self-supervised, transformer-based foundation model that uses a masked autoencoder to learn generalizable representations of fMRI time series. Pretrained on the Human Connectome Project (HCP) S1200 dataset, the model is evaluated on cognitive task classification and neuroticism prediction using linear, MLP, and ConvLSTM probes in both zero-shot and fine-tuning settings. Our model outperforms training from scratch, exceeding 90% accuracy on cognitive task classification and improving predictive correlations for neuroticism. Architectural enhancements, including a contrastive loss and spatiotemporal attention, further refine the learned representations. These results highlight the potential of self-supervised transformers for fMRI analysis, enabling scalable, generalizable models for neuroscience and clinical applications.
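
To make the pretraining objective concrete, the following is a minimal PyTorch sketch of masked-autoencoder pretraining on parcellated fMRI time series, where each timepoint (a vector of ROI signals) is treated as one token. All shapes and hyperparameters here (400 ROIs, 176 TRs, a 75% mask ratio, d_model = 256) are illustrative assumptions for the sketch, not the paper's actual configuration.

```python
# Minimal masked-autoencoder sketch for fMRI time series (illustrative only;
# hyperparameters and shapes are assumptions, not the paper's settings).
import torch
import torch.nn as nn


class FMRIMaskedAutoencoder(nn.Module):
    def __init__(self, n_rois=400, seq_len=176, d_model=256,
                 n_heads=8, n_layers=6, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        # Each timepoint (vector of ROI signals) becomes one token.
        self.embed = nn.Linear(n_rois, d_model)
        self.pos = nn.Parameter(torch.zeros(1, seq_len, d_model))
        enc_layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        dec_layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=2 * d_model, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, 2)
        self.head = nn.Linear(d_model, n_rois)  # reconstruct ROI signals

    def forward(self, x):
        B, T, _ = x.shape
        tokens = self.embed(x) + self.pos[:, :T]
        # Randomly keep a subset of timepoints; the rest are masked out.
        n_keep = max(1, int(T * (1 - self.mask_ratio)))
        noise = torch.rand(B, T, device=x.device)
        ids_shuffle = noise.argsort(dim=1)
        ids_keep = ids_shuffle[:, :n_keep]
        ids_restore = ids_shuffle.argsort(dim=1)
        visible = torch.gather(
            tokens, 1,
            ids_keep.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
        latent = self.encoder(visible)  # encode visible tokens only
        # Re-insert mask tokens at dropped positions, restore order, decode.
        masks = self.mask_token.expand(B, T - n_keep, -1)
        full = torch.cat([latent, masks], dim=1)
        full = torch.gather(
            full, 1,
            ids_restore.unsqueeze(-1).expand(-1, -1, full.size(-1)))
        recon = self.head(self.decoder(full + self.pos[:, :T]))
        # Reconstruction loss is computed on masked timepoints only.
        mask = torch.ones(B, T, device=x.device)
        mask.scatter_(1, ids_keep, 0.0)
        loss = (((recon - x) ** 2).mean(-1) * mask).sum() / mask.sum()
        return loss, recon


if __name__ == "__main__":
    model = FMRIMaskedAutoencoder()
    x = torch.randn(4, 176, 400)  # toy batch: 4 runs, 176 TRs, 400 ROIs
    loss, _ = model(x)
    loss.backward()
    print(f"pretraining loss: {loss.item():.4f}")
```

After pretraining under this kind of objective, the encoder's latent states would be the representations fed to the linear, MLP, or ConvLSTM probes for the downstream evaluations described above.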