Sources are the first step of any BigML workflow. A source is a collection of instances of the entity that you want to model, stored in tabular format in a file. The main purpose of a BigML source is to make sure that BigML parses and interprets each instance in your data correctly. Sources contain the following parsing and data dictionary information, among other details:
- The fields that are detected in your data file
- The locale used to parse your decimal numbers
- The tokens considered as missing values
- Whether your file has a header row
- The character used as field separator in your file
- The character used as quote in your file
- How your text fields will be parsed
- The separator character in your items fields
BigML infers these settings by analyzing the first lines of your data file. If you need to change any of them, update your source object and your data will be reinterpreted according to the new configuration.
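To make the idea of inferring settings from the first lines more concrete, here is a minimal sketch in Python. It is not BigML's implementation: it uses the standard library's csv.Sniffer as a stand-in heuristic, and the names SAMPLE, infer_parsing_settings and parse are hypothetical helpers invented for this illustration. It guesses the separator, the quote character and whether a header row is present, then shows that overriding one setting changes how the same data is read.

```python
import csv
import io

# Hypothetical sample data; any small delimited file would do.
SAMPLE = "name;age;height\nalice;30;1.65\nbob;41;1.80\ncarol;25;1.72\n"


def infer_parsing_settings(text, sample_lines=10):
    """Guess separator, quote and header presence from the first lines.

    Illustration only: csv.Sniffer is a stand-in heuristic, not the
    logic BigML actually applies.
    """
    head = "\n".join(text.splitlines()[:sample_lines])
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(head, delimiters=";,\t|")
    return {
        "separator": dialect.delimiter,
        "quote": dialect.quotechar,
        "header": sniffer.has_header(head),
    }


def parse(text, settings):
    """Read the data with the given settings; return (field names, rows)."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=settings["separator"],
        quotechar=settings["quote"],
    )
    rows = list(reader)
    if settings["header"]:
        return rows[0], rows[1:]
    # No header row: generate placeholder field names.
    width = len(rows[0]) if rows else 0
    return ["field%d" % (i + 1) for i in range(width)], rows


if __name__ == "__main__":
    inferred = infer_parsing_settings(SAMPLE)
    print(inferred)
    print(parse(SAMPLE, inferred))
    # Overriding a setting reinterprets the same data differently.
    print(parse(SAMPLE, {**inferred, "separator": ","}))
```

In this sketch, changing the separator in the settings dictionary and parsing again plays the role of updating the source: the underlying data stays the same and only its interpretation changes.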
For further learning, please:
- Watch this video for a gentle introduction to sources.
- Read the detailed documentation about sources here if you are using the Dashboard, or here if you prefer the API.
- Visit these related questions to continue building your Machine Learning workflows in BigML: What is a Dataset?, What is a BigML resource?