Complex phenomena of societal interest, such as weather, seismic activity, and urban crime, are often punctuated by rare and extreme events that are difficult to model and predict. Evidence of long-range persistence in such events has underscored the need to learn deep stochastic structures in data for effective forecasting. Recently, neural networks (NNs) have emerged as the de facto standard for deep learning. However, key problems remain with NN inference, including high sample complexity, a general lack of transparency, and a limited ability to directly model stochastic phenomena. In this study, we suggest that deep learning and the NN paradigm are conceptually distinct, and that it is possible to learn ``deep'' associations without invoking the ubiquitous NN strategy of global optimization via back-propagation. We show that deep learning of stochastic phenomena is related to uncovering emergent self-similarities in data, an approach that avoids these NN pitfalls while offering crucial insights into the underlying mechanisms. Using the Fractal Net (FN) architecture introduced here, we actionably forecast various categories of rare weather and seismic events, as well as property and violent crimes in major US cities. Compared with carefully tuned NNs, we boost recall at 90% precision by 161.9% for extreme weather events, by 191.3% for light-to-severe seismic events with magnitudes above the local third quartile, and by 50.8% to 404.9% for urban crime, demonstrating applicability to diverse systems of societal interest. This study opens the door to precise prediction of rare events in spatio-temporal phenomena, adding a new tool to the data science revolution.
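The headline metric, recall at 90% precision, can be made concrete with a small sketch. It assumes that each candidate event receives a real-valued risk score and is flagged when the score clears a threshold; the abstract does not say how FN or the NN baselines produce such scores, so this is an illustration of the metric only, not the authors' evaluation code. The function names `recall_at_precision` and `relative_boost` are hypothetical.

```python
from typing import Sequence


def recall_at_precision(scores: Sequence[float], labels: Sequence[int],
                        min_precision: float = 0.9) -> float:
    """Highest recall over all score thresholds whose precision is at least
    `min_precision`. Returns 0.0 if no threshold qualifies.
    Hypothetical helper: assumes binary labels (1 = rare event occurred)."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive event")
    # Rank events by descending score; each distinct score is one threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    best_recall = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Flag all events that share the current score together (ties).
        score = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == score:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        precision = tp / (tp + fp)
        if precision >= min_precision:
            best_recall = max(best_recall, tp / positives)
    return best_recall


def relative_boost(new: float, baseline: float) -> float:
    """Percentage improvement of `new` over `baseline` (e.g. recall values)."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [1, 1, 1, 0, 1, 0, 0]
    print(recall_at_precision(scores, labels))  # top three are all true events
    print(relative_boost(3.0, 1.0))
```

Under this reading, a 161.9% boost means the FN recall is 2.619 times the NN baseline recall at the same 90% precision level; fixing precision first matters for rare events, where a high false-alarm rate would make a forecast unusable.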