Patterns for amalgamation include the standard joins found in databases and datasets (e.g., Pandas) and extend them to a probabilistic join.


You have found a data set that includes features you care about: they are relevant to your domain. You also have many other datasets that are potentially useful.


How do you enrich one starter, baseline dataset with additional features coming from new data sets, so that you can improve accuracy, precision and recall, and the resulting confusion matrices and ROC AUC?

Solution Strategy

Start with a data set that captures features relevant to your domain. Often you need training data which means you may need a curated/labelled data set to start with.

Take, for example, three datasets and use one of them as the basis. Then add the other datasets to the original to refine and enrich the data with new features/columns.
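When the datasets share an exact key, the enrichment is an ordinary left join onto the base dataset. The following is a minimal sketch using only the Python standard library; the author names, the column names and the `left_join` helper are hypothetical, chosen for illustration. In Pandas the equivalent would typically be `DataFrame.merge` with `how="left"`.

```python
def left_join(base_rows, extra_rows, key):
    """Add the columns of extra_rows to base_rows, matching on an exact key.

    Base rows without a match keep all their values and get None
    for the new columns. Existing base columns are never overwritten.
    """
    lookup = {row[key]: row for row in extra_rows}
    new_columns = sorted({col for row in extra_rows for col in row} - {key})
    joined = []
    for row in base_rows:
        match = lookup.get(row[key], {})
        enriched = dict(row)
        for col in new_columns:
            enriched.setdefault(col, match.get(col))
        joined.append(enriched)
    return joined


# Hypothetical data: one curated base dataset and two enrichment datasets.
base = [
    {"author": "Jane Smith", "credibility": 0.9},
    {"author": "Robert Jones", "credibility": 0.4},
]
followers = [{"author": "Jane Smith", "followers": 1200}]
citations = [
    {"author": "Robert Jones", "citations": 35},
    {"author": "Jane Smith", "citations": 210},
]

enriched = left_join(left_join(base, followers, "author"), citations, "author")
```

The base dataset keeps every one of its rows, so labels curated for training are never lost; the new datasets only add columns.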

Sometimes the new datasets do not have a clear, unequivocal join key. For example, in an NLP application you have a base dataset of author names that you have ranked for credibility, but another data set contains authors that may not be in your base, ranked dataset.

You might do a probabilistic join by computing the cosine similarity between the entries you already have and the new ones you have not yet assessed (for credibility, in this example), and matching the pairs whose similarity is high enough.
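One way to sketch such a probabilistic join is shown here, under stated assumptions: names are represented as bags of character trigrams, each new name is matched to its most similar base name, and a match is accepted only above a threshold. The trigram representation, the threshold of 0.6, the function names and the sample authors are all illustrative assumptions, not part of the pattern itself.

```python
import math
from collections import Counter


def trigram_vector(name):
    """Represent a name as a bag (count vector) of character trigrams."""
    padded = f"  {name.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def probabilistic_join(base, new_names, threshold=0.6):
    """Match each new name to the most similar name in the base dataset.

    base maps a known name to its credibility score.
    Returns tuples (new_name, matched_name or None, credibility or None, score).
    """
    base_vectors = {name: trigram_vector(name) for name in base}
    rows = []
    for new_name in new_names:
        vec = trigram_vector(new_name)
        best_name, best_score = None, 0.0
        for name, base_vec in base_vectors.items():
            score = cosine_similarity(vec, base_vec)
            if score > best_score:
                best_name, best_score = name, score
        if best_score >= threshold:
            rows.append((new_name, best_name, base[best_name], best_score))
        else:
            rows.append((new_name, None, None, best_score))
    return rows


# Hypothetical data: ranked authors in the base set, unranked in the new set.
ranked_authors = {"Jane A. Smith": 0.9, "Robert Jones": 0.4}
unranked_authors = ["Jane Smith", "Bob Jonas", "Maria Garcia"]
joined = probabilistic_join(ranked_authors, unranked_authors)
```

Keeping the similarity score in the output lets you review borderline matches by hand and tune the threshold for your own data.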


Take into account the balance of your data: does one type or category overwhelmingly dominate, or are all the types you are considering represented in roughly equal amounts?

Use IBM AI Fairness 360 or other APIs to measure the balance, or lack thereof, of your datasets, and a resampling technique such as SMOTE to correct an imbalance once you have found one.
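Before reaching for a dedicated toolkit, a first look at balance can be as simple as counting labels. The sketch below uses only the Python standard library; the label names, the 90/10 split and both helper functions are hypothetical.

```python
from collections import Counter


def class_balance(labels):
    """Return the share of each class as a fraction of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the largest class to the smallest (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical labels: 90 credible authors versus 10 not credible.
labels = ["credible"] * 90 + ["not_credible"] * 10
shares = class_balance(labels)
ratio = imbalance_ratio(labels)
```

A ratio far above 1.0 suggests that plain accuracy will be misleading and that resampling or class weighting may be worth considering.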




Deep context

Deep context is a combination of quantitative machine learning and qualitative analytical processes that include a wide spectrum of datasets, network analysis, and sentiment- and empathy-based analysis. Added to these, we factor in the proximity of events to populations in need, the utilisation of local knowledge and the understanding of transnational threats and regional dynamics, which is especially vital in mitigating against the risk

See deep-context.com

Customer experience. Context is a make-or-break proposition for any customer experience program. Insights that are specific to your domain are essential to providing input and driving decision-making and strategy execution throughout an organization:

  • Identify intent themes that provide insight into the exact aspects of the conversation you want to understand
  • Gain insight into your brand perception across industry-specific KPIs
  • Context for each industry. Industries include retail, telecommunications, healthy living, fashion, restaurant, automotive, finance and more
  • Grow your business by understanding the loyalty concerns your target customers have, and how to address those concerns for customer retention

Compare Algorithm Outputs to Reveal Underlying Dataset Structure

Problem: Data has noise. Data sets must be cleansed, normalized and curated. But you can never tell for sure which algorithms to apply unless you compare their results.


  1. Plan to generate different visualizations of your data set.
  2. Plan to create and apply various transforms to your data set.
  3. Run a number of typical algorithms on each view of your dataset.

Purpose:  Clarify which data transformations will reveal the structure of the dataset that underlies your problem.
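The transform and algorithm steps above can be sketched as a small comparison grid: every transform is paired with every algorithm and scored the same way. This is a minimal standard-library sketch, assuming two toy classifiers (1-nearest-neighbour and nearest centroid), three transforms and leave-one-out accuracy as the score; all names and the sample data are hypothetical. For brevity each transform is fitted on the whole dataset before scoring, which a careful evaluation would avoid.

```python
import math
from statistics import mean, pstdev


def identity(rows):
    return [list(r) for r in rows]


def log_transform(rows):
    # log1p compresses large values; assumes non-negative features.
    return [[math.log1p(v) for v in r] for r in rows]


def standardize(rows):
    # Rescale each column to zero mean and unit (population) deviation.
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def nearest_neighbor(train_x, train_y, x):
    # Predict the label of the single closest training point.
    return min(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))[1]


def nearest_centroid(train_x, train_y, x):
    # Predict the label whose class mean is closest to x.
    centroids = {}
    for label in set(train_y):
        members = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [mean(c) for c in zip(*members)]
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def leave_one_out_accuracy(predict, xs, ys):
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)


def compare(xs, ys, transforms, algorithms):
    """Score every (transform, algorithm) pair on the same data."""
    return {
        (t_name, a_name): leave_one_out_accuracy(algo, transform(xs), ys)
        for t_name, transform in transforms.items()
        for a_name, algo in algorithms.items()
    }


# Hypothetical data: feature 0 separates the classes, feature 1 is
# large-scale noise that dominates raw Euclidean distances.
xs = [[1.0, 100.0], [1.2, 900.0], [0.9, 500.0],
      [3.0, 150.0], [3.1, 850.0], [2.9, 550.0]]
ys = ["a", "a", "a", "b", "b", "b"]

results = compare(
    xs, ys,
    transforms={"raw": identity, "log": log_transform, "standardize": standardize},
    algorithms={"1-NN": nearest_neighbor, "centroid": nearest_centroid},
)
for (t_name, a_name), score in sorted(results.items()):
    print(f"{t_name:12s} {a_name:9s} {score:.2f}")
```

On this toy data the nearest-neighbour classifier fails on the raw features and succeeds once they are standardized, which is the kind of contrast the comparison is meant to surface: the transform, not the algorithm, reveals the structure.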