Preparing your data for modelling will probably require some work. To clean, homogenize and interpret your data correctly, there are two kinds of operations that you might need to do:
- Parsing the data file content: assigning field types, setting the language that the data is written in, defining how text is analyzed (stemming, stop words...) or which tokens are associated with missing information.
- Creating new fields based on the ones existing in your data file: normalizing, scaling, computing ratios, aggregating, joining or merging datasets and feature engineering in general.
Parsing-related operations can be done using the Source configuration action available in the Dashboard Source view or through the source API endpoint. They are applied to the Source that is created once you upload your data to BigML. You can learn more about these in the Dashboard Source's documentation and the Sources API documentation.
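As a rough illustration of the parsing side, the sketch below builds the body of a Source update and then sends it with the BigML Python bindings. The helper names (`build_source_update`, `configure_source`), the field ID `"000001"` and the token values are hypothetical, and the attribute names (`fields`, `optype`, `term_analysis`, `source_parser`, `missing_tokens`) are assumed to match the Sources API documentation, so please check them there before relying on this.

```python
def build_source_update(field_types, language=None, missing_tokens=None):
    """Build a JSON-serializable body for a Source update (hypothetical helper).

    field_types maps a field ID (for example "000001") to an optype string.
    """
    # Per-field type assignment.
    body = {
        "fields": {
            field_id: {"optype": optype}
            for field_id, optype in field_types.items()
        }
    }
    # Language used when analyzing text fields (assumed attribute name).
    if language is not None:
        body["term_analysis"] = {"language": language}
    # Tokens to be read as missing values (assumed attribute name).
    if missing_tokens is not None:
        body["source_parser"] = {"missing_tokens": list(missing_tokens)}
    return body


def configure_source(path, body):
    """Upload a file and apply the parsing configuration.

    Needs the bigml package and valid credentials in the environment.
    """
    from bigml.api import BigML  # imported here so the builder works offline

    api = BigML()
    source = api.create_source(path)
    api.ok(source)  # wait until the source is ready
    source = api.update_source(source, body)
    api.ok(source)
    return source
```

A call such as `configure_source("data.csv", build_source_update({"000001": "categorical"}, missing_tokens=["N/A"]))` would then upload the file and apply the configuration; the file name is only a placeholder.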
Feature engineering operations can be done using the transformations menu available in the Dashboard Dataset view or through the dataset API endpoint. They are applied to an existing dataset and generate a new dataset based on it. You can learn more about these in the Dashboard Dataset's documentation and the Datasets API documentation.
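For the feature engineering side, a minimal sketch could look like the following. It assumes the new field is described with a Flatline expression passed in the `new_fields` argument when creating a dataset from an existing one; the helper names (`ratio_field`, `build_dataset_transform`, `derive_dataset`) and the field names `price` and `sqft` are hypothetical, so the exact arguments should be checked against the Datasets API documentation.

```python
def ratio_field(name, numerator, denominator):
    """Describe a new field as the ratio of two existing fields.

    The expression is written in Flatline; field names are illustrative.
    """
    expression = '(/ (field "{}") (field "{}"))'.format(numerator, denominator)
    return {"name": name, "field": expression}


def build_dataset_transform(new_fields):
    """Wrap new field definitions in the body used to create the dataset."""
    return {"new_fields": list(new_fields)}


def derive_dataset(origin_dataset, body):
    """Create a new dataset from an existing one, adding the new fields.

    Needs the bigml package and valid credentials in the environment.
    """
    from bigml.api import BigML  # imported here so the builders work offline

    api = BigML()
    dataset = api.create_dataset(origin_dataset, body)
    api.ok(dataset)  # wait until the new dataset is ready
    return dataset
```

Here `origin_dataset` would be the ID or resource of a dataset that already exists; the original dataset is left unchanged and a new one is generated from it.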