Data in many practical problems are acquired according to decisions or actions made by users or experts in pursuit of specific goals. For instance, the policies followed by biologists during intervention in genomics and metagenomics are often reflected in the data available in these domains, and data in cyber-physical systems are often acquired according to actions or decisions made by experts or engineers for purposes such as control or stabilization. Quantifying experts' policies from available data, also known as reward function learning, has been discussed extensively in the literature on inverse reinforcement learning (IRL). However, most existing techniques fall short in practical problems for two main reasons: 1) lack of scalability, arising from their inability or poor performance when dealing with large systems, and 2) lack of reliability, stemming from their inability to properly learn the optimal reward function during the learning process. To address these issues, in this brief we propose a multifidelity Bayesian optimization (MFBO) framework that significantly scales the learning process of a wide range of existing IRL techniques. The proposed framework incorporates multiple approximators and efficiently accounts for their uncertainty and computational costs to balance exploration and exploitation during the learning process. The framework's high performance is demonstrated on genomics and metagenomics problems and on sets of randomly generated simulated problems.
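To make the cost/uncertainty trade-off concrete, the following is a minimal, hypothetical sketch of a cost-aware multifidelity Bayesian optimization loop over candidate reward parameters; it is not the framework proposed in this brief. The two evaluators, their costs, the Gaussian-process surrogate, and the cost-weighted expected-improvement acquisition are illustrative assumptions only.

```python
# Hypothetical sketch (not the authors' implementation): one cost-aware
# multifidelity Bayesian optimization loop over candidate reward parameters.
# The evaluators, costs, and acquisition rule below are illustrative.
import numpy as np
from scipy.stats import norm

def rbf_kernel(A, B, length=0.5):
    # Squared-exponential kernel between rows of A and B.
    d = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d / length**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # Gaussian-process posterior mean and standard deviation at test points Xs.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    Ks = rbf_kernel(X, Xs)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf_kernel(Xs, Xs) - v.T @ v)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # Standard EI for maximization.
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

# Two approximators ("fidelities") of the IRL objective with different costs.
rng = np.random.default_rng(0)
true_obj = lambda w: -np.sum((w - 0.3) ** 2, axis=-1)                    # expensive, accurate
cheap_obj = lambda w: true_obj(w) + 0.05 * rng.standard_normal(len(w))   # cheap, noisy
fidelities = [(cheap_obj, 1.0), (true_obj, 10.0)]                        # (evaluator, cost)

# A few random reward-parameter samples per fidelity to start the surrogates.
data = [(X, f(X)) for X, (f, _) in
        zip([rng.uniform(0, 1, (3, 2)) for _ in fidelities], fidelities)]

candidates = rng.uniform(0, 1, (200, 2))
for step in range(10):
    best_score, choice = -np.inf, None
    for i, ((f, cost), (X, y)) in enumerate(zip(fidelities, data)):
        mu, sigma = gp_posterior(X, y, candidates)
        ei = expected_improvement(mu, sigma, y.max())
        j = int(np.argmax(ei / cost))            # cost-weighted acquisition
        if ei[j] / cost > best_score:
            best_score, choice = ei[j] / cost, (i, j)
    i, j = choice                                # query the winning fidelity/point
    f, _ = fidelities[i]
    x_new = candidates[j:j + 1]
    X, y = data[i]
    data[i] = (np.vstack([X, x_new]), np.append(y, f(x_new)))

print("best high-fidelity reward parameters:", data[1][0][np.argmax(data[1][1])])
```

The design choice illustrated here is that cheap, uncertain approximators are queried often for exploration, while the expensive, accurate evaluator is reserved for promising regions; the MFBO framework described above balances these considerations in a principled way.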