A central challenge in neuroscience is decoding brain activity to uncover mental content comprising multiple components and their interactions. Despite progress in decoding language-related information from human brain activity, generating comprehensive descriptions of complex mental content associated with structured visual semantics remains challenging. We present a method that generates descriptive text mirroring brain representations via semantic features computed by a deep language model. We constructed linear decoding models to translate brain activity induced by videos into the semantic features of the corresponding captions, then optimized candidate descriptions by iteratively aligning their features with the brain-decoded features through word replacement and interpolation. This process yielded well-structured descriptions that accurately captured the viewed content, even without relying on the canonical language network. The method also generalized to verbalizing recalled content, functioning as an interpretive interface between mental representations and text. These results demonstrate the potential of nonverbal, thought-based brain-to-text communication, which could offer an alternative communication pathway for individuals with language expression difficulties, such as aphasia.
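To make the two-stage pipeline concrete, below is a minimal sketch, not the paper's implementation: it assumes ridge regression as the linear decoder, cosine similarity as the alignment score, a toy hash-based `embed_text` as a stand-in for the deep language model's semantic features, and a greedy word-replacement loop (the interpolation step described above is omitted). All names and the toy data are hypothetical.

```python
"""Sketch of the brain-to-text pipeline: (1) train a linear decoder from
brain activity to caption features, (2) decode features from new activity,
(3) optimize a candidate description to align with the decoded features."""
from zlib import crc32

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
FEATURE_DIM = 64  # stand-in for the language model's feature dimensionality


def embed_text(text: str) -> np.ndarray:
    """Hypothetical feature extractor: deterministic word hashing.
    A real system would use hidden states of a deep language model."""
    vec = np.zeros(FEATURE_DIM)
    for word in text.lower().split():
        vec[crc32(word.encode()) % FEATURE_DIM] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-8)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


# --- 1) Linear decoding model: brain activity -> caption features ---
# Toy training pairs of "fMRI patterns" and caption features.
captions = ["a dog runs on grass", "a person rides a bike",
            "waves crash on the beach", "a child eats an apple"]
Y_train = np.stack([embed_text(c) for c in captions])
X_train = rng.normal(size=(len(captions), 500))  # 500 toy voxels
decoder = Ridge(alpha=1.0).fit(X_train, Y_train)

# --- 2) Decode semantic features from new brain activity ---
x_new = X_train[0] + 0.1 * rng.normal(size=500)  # noisy repeat of trial 0
decoded = decoder.predict(x_new[None])[0]

# --- 3) Optimize a candidate description by word replacement ---
vocab = sorted({w for c in captions for w in c.split()})
description = "a person runs on grass".split()
for _ in range(200):
    proposal = description.copy()
    proposal[rng.integers(len(proposal))] = vocab[rng.integers(len(vocab))]
    # Keep the proposal if its features align better with the decoded ones.
    if cosine(embed_text(" ".join(proposal)), decoded) > \
       cosine(embed_text(" ".join(description)), decoded):
        description = proposal

print("optimized description:", " ".join(description))
```

In this toy setting the loop recovers wording close to the first training caption; the real method would instead propose candidates with a language model and interpolate between them, scoring each against the brain-decoded features in the same way.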