We present a differentiable formalism for learning free energies that can capture arbitrarily complex model dependencies on coarse-grained coordinates, as well as the finite-temperature response to variations of general system parameters. This is achieved by endowing models with explicit dependence on temperature and parameters and by exploiting exact differential thermodynamic relationships between the free energy, ensemble averages, and response properties. Formally, we derive an approach for learning high-dimensional cumulant generating functions from statistical estimates of their derivatives, which are observable cumulants of the underlying random variable. The proposed formalism opens a path to resolving several outstanding challenges in bottom-up molecular coarse graining related to multiple minima and state dependence. This is realized by incorporating additional differential relationships into the loss function, which significantly improves the learning of free energies while exactly preserving the Boltzmann distribution of the corresponding fine-grained all-atom system. As an example, we go beyond the standard force-matching procedure and demonstrate how leveraging the thermodynamic relationship between the free energy and the ensemble-averaged all-atom potential energy improves the learning efficiency and accuracy of the free energy model, yielding significantly better sampling statistics for structural distribution functions. The theoretical framework presented here is demonstrated via implementations in both kernel-based and neural-network regression methods and opens new ways to train accurate machine learning models for studying the thermodynamic and response properties of complex molecular systems.
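To make the structure of such a combined loss concrete, the following is a minimal sketch, not the authors' implementation: the two-layer model, the function and variable names (`free_energy`, `x_batch`, `w_energy`, etc.), and the synthetic data are all illustrative assumptions. The only grounded content is the pair of exact thermodynamic identities the abstract refers to, namely the mean force as the negative coordinate gradient of the free energy, f(x, β) = -∂F/∂x, and the ensemble-averaged potential energy as ⟨U⟩(x, β) = ∂(βF)/∂β, both obtained here by automatic differentiation of a temperature-dependent model.

```python
# Minimal JAX sketch of a force-matching loss augmented with an
# energy-matching term via d(beta*F)/d(beta). Hypothetical model and
# data names; not the paper's code.
import jax
import jax.numpy as jnp

def init_params(key, width=32):
    """Small illustrative MLP taking (x, beta) as input."""
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (2, width)) * 0.1,
        "b1": jnp.zeros(width),
        "W2": jax.random.normal(k2, (width, 1)) * 0.1,
        "b2": jnp.zeros(1),
    }

def free_energy(params, x, beta):
    """Model free energy F(x, beta) for a scalar CG coordinate x."""
    h = jnp.tanh(jnp.array([x, beta]) @ params["W1"] + params["b1"])
    return (h @ params["W2"] + params["b2"])[0]

# Exact differential relations used as learning targets:
#   mean force:            f(x, beta)  = -dF/dx
#   mean potential energy: <U>(x, beta) = d(beta * F)/d(beta)
_dFdx = jax.grad(free_energy, argnums=1)

def mean_force(params, x, beta):
    return -_dFdx(params, x, beta)

def mean_energy(params, x, beta):
    return jax.grad(lambda b: b * free_energy(params, x, b))(beta)

def loss(params, x_batch, beta, f_batch, u_batch, w_energy=0.1):
    """Force matching plus an energy-matching term; w_energy is an
    illustrative weight balancing the two contributions."""
    f_pred = jax.vmap(lambda x: mean_force(params, x, beta))(x_batch)
    u_pred = jax.vmap(lambda x: mean_energy(params, x, beta))(x_batch)
    return (jnp.mean((f_pred - f_batch) ** 2)
            + w_energy * jnp.mean((u_pred - u_batch) ** 2))

# Usage: one gradient step on placeholder data standing in for projected
# all-atom forces and ensemble-averaged potential energies.
key = jax.random.PRNGKey(0)
params = init_params(key)
x_batch = jnp.linspace(-1.0, 1.0, 64)
f_batch = -2.0 * x_batch   # placeholder "all-atom" mean-force data
u_batch = jnp.ones(64)     # placeholder energy data
grads = jax.grad(loss)(params, x_batch, 1.0, f_batch, u_batch)
params = jax.tree_util.tree_map(lambda p, g: p - 1e-2 * g, params, grads)
```

Because both predicted observables are obtained by differentiating the same scalar model, the energy term constrains the temperature dependence of F without perturbing the coordinate gradients that force matching fits, which is what allows the Boltzmann distribution of the fine-grained system to be preserved exactly.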