Pandas is an important Python library for data analytics.
In this post, we will use pandas for a machine learning task and appreciate the ease with which things get done.
Let's say we have data in a text file, with no headers.
Fig1: Sample data in text
After the features are identified, the final format given as input to the classifier looks like this:
Fig2: Format of input file for classifier
For a small input file, manually entering the data and feeding it to the classifier is easy. When the input data is huge, one needs an easier way to do the same task in Python.
Step1: Load file in pandas and concatenate vertically
After stacking, the upper half of the data should be class 1 and the lower half should be class 0.
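A minimal sketch of this step, assuming each class sits in its own whitespace-separated text file. The two in-memory "files" and their values below are made up for illustration; in practice, file paths would be passed to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Stand-ins for two headerless text files (hypothetical contents).
class1_text = io.StringIO("0.1 0.2 0.3\n0.4 0.5 0.6\n")
class0_text = io.StringIO("0.7 0.8 0.9\n1.0 1.1 1.2\n")

# header=None tells pandas that the first line is data, not column names.
class1_df = pd.read_csv(class1_text, sep=r"\s+", header=None)
class0_df = pd.read_csv(class0_text, sep=r"\s+", header=None)

# Stack class 1 rows on top of class 0 rows.
# ignore_index=True renumbers the rows 0..n-1 after stacking.
data = pd.concat([class1_df, class0_df], axis=0, ignore_index=True)
print(data.shape)  # (4, 3)
```

Without `header=None`, pandas would treat the first data row as column names and silently drop it from the data.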
Step2: Create header
In my case I have ROI1 to ROI113 as features.
An empty list called col is initialized, and for every increment of i, "ROI" + str(i) is appended to the list.
After 113 iterations of the loop, the list holds 113 entries.
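The loop described above could look roughly like this (a sketch; only the list name `col` and the ROI naming come from the post):

```python
# Build the header names ROI1 ... ROI113 with a loop.
col = []
for i in range(1, 114):  # the upper bound is exclusive, so i runs from 1 to 113
    col.append("ROI" + str(i))

print(len(col), col[0], col[-1])  # 113 ROI1 ROI113
```

The same list can be written in one line as `col = ["ROI" + str(i) for i in range(1, 114)]`.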
The header list is created, but how do we add it as the data frame's column names? Here's how to do it 🙂
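One way to do this is to assign the list to the `columns` attribute of the data frame. In this sketch, the data frame of zeros is only a placeholder for the real data:

```python
import numpy as np
import pandas as pd

col = ["ROI" + str(i) for i in range(1, 114)]

# Placeholder data: 4 rows x 113 columns of zeros, standing in for the real data.
data = pd.DataFrame(np.zeros((4, 113)))

# The list length must equal the number of columns, otherwise pandas raises an error.
data.columns = col
print(data.columns[:3].tolist())  # ['ROI1', 'ROI2', 'ROI3']
```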
Step3: Add class column.
In my case I have 196 rows of class 1 and 188 rows of class 0.
Similarly, empty lists called normal and patients are created, and each list is filled using a while loop.
The list normal has 197 ones, which are class 1, and the list patients has 189 zeros, which are class 0.
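A sketch of the two loops, assuming 196 rows of class 1 and 188 rows of class 0. The variable names `n_class1` and `n_class0` are mine; adjust the counts so that they match the actual number of rows in each part of the data frame.

```python
n_class1 = 196  # assumed number of class 1 rows
n_class0 = 188  # assumed number of class 0 rows

# Fill the list normal with ones (class 1).
normal = []
i = 0
while i < n_class1:
    normal.append(1)
    i += 1

# Fill the list patients with zeros (class 0).
patients = []
i = 0
while i < n_class0:
    patients.append(0)
    i += 1

print(len(normal), len(patients))  # 196 188
```

Watch the loop condition: starting from 0, `while i <= n_class1` would append one extra item. A shorter way to get the same lists is `[1] * n_class1` and `[0] * n_class0`.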
By now we have added the header to the data. Once the class column is created, the format is ready for the classifier.
To join the two lists into one:
To add the class column to the data frame:
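Both steps can be sketched on a tiny stand-in data frame. The column name `class`, the variable `labels`, and the values here are assumptions for illustration:

```python
import pandas as pd

# Small stand-in for the real data: 3 rows, 2 feature columns.
data = pd.DataFrame({"ROI1": [0.1, 0.4, 0.7], "ROI2": [0.2, 0.5, 0.8]})
normal = [1, 1]   # class 1 labels for the first two rows
patients = [0]    # class 0 label for the last row

# "+" joins two Python lists end to end.
labels = normal + patients

# Assigning a list to a new column name appends it as the last column.
# The list length must match the number of rows.
data["class"] = labels
print(data)
```

If the length of `labels` differs from the number of rows, pandas raises a `ValueError`, which is a handy check that the class counts are right.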
The final data frame looks like this:
Note: To access sections of the data frame, pull the data by column name.
Likewise, a huge data frame can be split by column name.
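A few ways to pull out parts of a data frame by column name, shown on a made-up frame with three features and an assumed `class` column:

```python
import pandas as pd

data = pd.DataFrame({
    "ROI1": [0.1, 0.4, 0.7],
    "ROI2": [0.2, 0.5, 0.8],
    "ROI3": [0.3, 0.6, 0.9],
    "class": [1, 1, 0],
})

# A single column name returns a Series.
labels = data["class"]

# A list of column names returns a smaller DataFrame.
subset = data[["ROI1", "ROI3"]]

# Label-based slicing with .loc includes both end columns.
roi_range = data.loc[:, "ROI1":"ROI2"]

# Split into features and labels by dropping the class column.
features = data.drop(columns="class")

print(list(features.columns))  # ['ROI1', 'ROI2', 'ROI3']
```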
The same work can be done in MATLAB too, but with lengthier code.