Submitted by jimliu741523 t3_114de9s in MachineLearning
Hi there,
I am a research data scientist, and I'm excited to release a new feature engineering library designed to help streamline the machine learning process. Headjack is an open library that provides ML feature transformations based on self-supervised learning models. Like Hugging Face, it acts as a hub, but it currently focuses on exchanging features for tabular data models.
Tabular data differ from textual data in that each dataset has its own number of columns and its own attributes, so it cannot be encoded consistently the way tokens are embedded in NLP tasks. Headjack therefore differs from NLP pre-trained models, which transform within a single domain: it performs transformations between two different domains. In other words, we can transform features between two domains that do not share the same key values. This can also release the potential of data that is not typically used. For example, you could enhance predictions for the Boston housing price task with a transformation from the Titanic domain, or enhance predictions for a customer churn task with a transformation from the African traffic domain.
The IRIS dataset with California House Price Feature Transformation
The IRIS dataset with Titanic Feature Transformation
The IRIS dataset with KPMG Customer Demography Feature Transformation
ekbravo t1_j8w5lir wrote
Interesting concept, but I'm not sure a corporate dataset would be allowed to be released into the wild. Plus, one has to create an account not only to register on their website but also to use one's account info every time the code runs. Not for business use.