Modeling complex correlations in multiview data remains challenging, especially for high-dimensional and possibly noisy features. To address this issue, we propose a novel unsupervised multiview representation learning (UMRL) algorithm, termed autoencoder in autoencoder networks (AE²-Nets). The framework encodes information from high-dimensional heterogeneous data into a compact and informative representation through a bidirectional encoding strategy. Specifically, AE²-Nets conduct encoding in two directions: the inner-AE-networks extract view-specific intrinsic information (forward encoding), while the outer-AE-networks integrate this view-specific intrinsic information from different views into a latent representation (backward encoding). For the nested architecture, we further provide a probabilistic explanation and an extension based on the hierarchical variational autoencoder. The forward-backward strategy flexibly addresses high-dimensional (noisy) features within each view and encodes complementarity across multiple views in a unified framework. Extensive results on benchmark datasets validate its advantages over state-of-the-art algorithms.
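
The bidirectional encoding described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single-layer networks, the tanh activation, the squared-error objective, the trade-off weight `lam`, the free latent matrix `H`, and all function and variable names are assumptions made for illustration only. Each inner autoencoder compresses one view into a view-specific code (forward encoding), and a per-view outer network maps a shared latent representation back onto that code (backward encoding).

```python
import numpy as np

rng = np.random.default_rng(0)


def init_linear(d_in, d_out):
    """Small random weight matrix for a single linear layer (hypothetical helper)."""
    return rng.normal(scale=0.1, size=(d_in, d_out))


def ae2_loss(views, inner_enc, inner_dec, outer_dec, H, lam=1.0):
    """Sum over views of inner reconstruction error plus latent-to-code error.

    The form of this objective is an assumption, not taken from the abstract.
    """
    total = 0.0
    for X, W_enc, W_dec, W_out in zip(views, inner_enc, inner_dec, outer_dec):
        Z = np.tanh(X @ W_enc)       # forward encoding: view -> view-specific code
        X_hat = Z @ W_dec            # inner-AE reconstruction of the view
        Z_hat = np.tanh(H @ W_out)   # backward encoding: shared latent -> code
        total += 0.5 * np.sum((X - X_hat) ** 2)
        total += 0.5 * lam * np.sum((Z - Z_hat) ** 2)
    return total


# Toy setup: 8 samples, two views with 20 and 30 features (arbitrary sizes).
n_samples, view_dims, code_dim, latent_dim = 8, [20, 30], 5, 4
views = [rng.normal(size=(n_samples, d)) for d in view_dims]
inner_enc = [init_linear(d, code_dim) for d in view_dims]
inner_dec = [init_linear(code_dim, d) for d in view_dims]
outer_dec = [init_linear(latent_dim, code_dim) for _ in view_dims]
H = rng.normal(size=(n_samples, latent_dim))  # shared latent representation

loss = ae2_loss(views, inner_enc, inner_dec, outer_dec, H, lam=1.0)
print(f"joint loss: {loss:.3f}")
```

In this sketch `H` is treated as a free variable that would be optimized jointly with the network weights, so that one latent representation has to account for the codes of all views at once. Training (gradient updates on the weights and on `H`) is omitted to keep the example short.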