The performance of machine learning algorithms depends to a large extent on the amount and quality of the data available for training. Simulations are most often used as test beds for assessing the performance of trained models in a simulated environment before deployment in the real world. They can also be used for data annotation, i.e., assigning labels to observed data, thus providing background knowledge for domain experts. We want to integrate this knowledge into the machine learning process and, at the same time, use the simulation as an additional data source. We therefore present a framework that combines real-world observations and simulation data at one of two levels: the data level or the model level. At the data level, observations and simulation data are merged into a single enriched data set for learning. At the model level, models learned separately from observed and simulated data are combined using an ensemble technique. The appropriate fusion level is selected automatically, based on the trade-off between model bias and variance. We validate the framework on two case studies of very different types. The first is an Industry 4.0 use case that consists of monitoring a milling process in real time. The second is an application in astroparticle physics for background suppression.
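To make the two fusion levels concrete, the following is a minimal sketch and not the implementation described here. It assumes a toy nearest-centroid classifier on one-dimensional features, score averaging as the ensemble technique, and held-out accuracy as a simple stand-in for the bias-variance-based selection. All function names and data values are hypothetical and chosen for illustration only.

```python
from statistics import mean


def fit_centroids(samples):
    """Fit a toy nearest-centroid model on (feature, label) pairs (1-D features)."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def scores(model, x):
    """Score each label; a higher score means x is closer to that centroid."""
    return {y: -abs(x - c) for y, c in model.items()}


def predict_data_level(observed, simulated, x):
    """Data-level fusion: pool both sources, then train a single model."""
    s = scores(fit_centroids(observed + simulated), x)
    return max(s, key=s.get)


def predict_model_level(observed, simulated, x):
    """Model-level fusion: train one model per source and average their scores."""
    s_obs = scores(fit_centroids(observed), x)
    s_sim = scores(fit_centroids(simulated), x)
    fused = {y: (s_obs[y] + s_sim[y]) / 2 for y in set(s_obs) & set(s_sim)}
    return max(fused, key=fused.get)


def accuracy(predict, observed, simulated, holdout):
    """Fraction of held-out points that the given fusion strategy labels correctly."""
    return mean(
        1.0 if predict(observed, simulated, x) == y else 0.0 for x, y in holdout
    )


def select_fusion_level(observed, simulated, holdout):
    """Pick a fusion level by held-out accuracy.

    Assumption: this replaces the bias-variance criterion with a simpler proxy.
    Ties are resolved in favour of data-level fusion.
    """
    acc_data = accuracy(predict_data_level, observed, simulated, holdout)
    acc_model = accuracy(predict_model_level, observed, simulated, holdout)
    return "data" if acc_data >= acc_model else "model"


# Hypothetical toy data: (feature, label) pairs.
observed = [(0.9, 0), (1.1, 0), (2.9, 1), (3.1, 1)]
simulated = [(1.0, 0), (1.4, 0), (3.0, 1), (3.4, 1)]
holdout = [(1.2, 0), (2.8, 1)]

chosen_level = select_fusion_level(observed, simulated, holdout)
```

In this sketch the simulated samples are slightly shifted relative to the observed ones, which is the kind of mismatch that could make one fusion level preferable to the other. On data this small both levels classify the held-out points correctly, so the tie rule decides the outcome.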