Feature selection is a process in which a subset of features is selected from a dataset with the goal of improving classification accuracy and decreasing computational complexity. In biometric systems the features describe characteristics such as face, fingerprint, iris, hand, voice, gait, and signature; these systems are based on the premise that many physical or behavioral attributes of humans can be uniquely associated with an individual. Feature selection is closely related to feature extraction, a process in which feature vectors are created from the original dataset through manipulations of the data space, and which can be considered a superset of the feature selection techniques. Feature selection methods can be classified into three main groups: Filters, Wrappers, and Embedded.
Filter methods use techniques independent of any classifier to select features from the feature space. These techniques are based on a number of different statistical tests, or on the information-theoretic concept of Mutual Information (MI).
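As a toy illustration of an MI-based filter (an invented example, not code from any of the papers cited below), the sketch scores each discrete feature by its empirical mutual information with the class labels, with no classifier involved, and ranks the features by that score:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """I(X; Y) in bits, estimated from empirical joint frequencies."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# Toy dataset: rows are samples, columns are discrete features.
X = [[0, 1, 0],
     [1, 1, 0],
     [0, 0, 1],
     [1, 0, 1]]
y = [0, 1, 0, 1]

# Filter step: score each feature independently of any classifier, then rank.
columns = list(zip(*X))
scores = [mutual_information(col, y) for col in columns]
ranking = sorted(range(len(scores)), key=lambda j: -scores[j])
print(scores, ranking)
```

Here feature 0 matches the labels exactly and scores H(y) = 1 bit, while the other two columns are independent of the labels and score 0, so the ranking puts feature 0 first.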
Wrapper methods wrap the feature selection around a classifier used as a black box, where the classification performance is used to measure the quality of the currently selected feature set.
The principle of wrapper methods is generally based on the optimization of the accuracy rate, estimated by one of the following methods: hold-out, cross-validation, or bootstrap.
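A minimal wrapper sketch, assuming a toy 1-nearest-neighbour classifier and 2-fold cross-validation (both chosen purely for illustration, on invented data): greedy forward selection adds, at each step, the feature whose inclusion yields the best cross-validated accuracy, and stops when no addition improves the estimate.

```python
def knn_predict(train_X, train_y, x):
    """1-nearest-neighbour prediction on the selected feature columns."""
    dists = [(sum((a - b) ** 2 for a, b in zip(row, x)), label)
             for row, label in zip(train_X, train_y)]
    return min(dists)[1]

def cv_accuracy(X, y, feats, k=2):
    """k-fold cross-validated accuracy of 1-NN using only columns in feats."""
    Xf = [[row[j] for j in feats] for row in X]
    n, correct = len(y), 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        tr_X = [Xf[i] for i in range(n) if i not in test_idx]
        tr_y = [y[i] for i in range(n) if i not in test_idx]
        correct += sum(knn_predict(tr_X, tr_y, Xf[i]) == y[i] for i in test_idx)
    return correct / n

# Toy data: the label equals feature 0; features 1 and 2 are distractors.
X = [[0, 5, 1], [0, 3, 0], [1, 4, 1], [1, 2, 0],
     [0, 5, 0], [0, 3, 1], [1, 4, 0], [1, 2, 1]]
y = [0, 0, 1, 1, 0, 0, 1, 1]

# Greedy forward wrapper: keep adding the feature whose inclusion gives the
# best cross-validated accuracy; stop when nothing improves the estimate.
selected, remaining = [], set(range(3))
while remaining:
    best = max(remaining, key=lambda j: cv_accuracy(X, y, selected + [j]))
    if selected and cv_accuracy(X, y, selected + [best]) <= cv_accuracy(X, y, selected):
        break
    selected.append(best)
    remaining.remove(best)
print(selected)
```

On this data feature 0 alone already reaches perfect cross-validated accuracy, so the wrapper stops after selecting it. The same loop works with hold-out or bootstrap estimates by swapping out cv_accuracy.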
Embedded methods are similar to wrappers, but instead of using the classifier as a black box, they integrate the classifier into the feature selection algorithm. The feature selection process is embedded into the construction of the classifier, as the classifier learns the appropriate weights for a given feature, and can remove the feature from consideration. The term embedded methods thus covers a wide range of different feature selection techniques, making it difficult to analyze them as a group beyond their dependence on a particular classification algorithm. Embedded methods differ from other feature selection methods in the way feature selection and learning interact. These three different methods have relative advantages and drawbacks.
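One concrete embedded example is L1-regularized logistic regression: the penalty inside the learner drives the weights of uninformative features toward zero, so selection happens during training rather than around it. The sketch below (illustrative only, fitted by plain subgradient descent on invented data) keeps just the features whose learned weights are clearly non-zero.

```python
from math import exp

def train_l1_logistic(X, y, lam=0.1, lr=0.1, epochs=2000):
    """Logistic regression with an L1 penalty, fitted by subgradient
    descent. The penalty drives weights of uninformative features
    toward zero, so selection happens inside the learner itself."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for row, label in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, row))
            p = 1.0 / (1.0 + exp(-z))            # predicted probability
            for j in range(d):
                grad[j] += (p - label) * row[j]
        for j in range(d):
            sign = (w[j] > 0) - (w[j] < 0)       # subgradient of |w_j|
            w[j] -= lr * (grad[j] / n + lam * sign)
    return w

# Feature 0 determines the label; feature 1 is pure noise.
X = [[-1, 1], [-1, -1], [1, 1], [1, -1]]
y = [0, 0, 1, 1]
w = train_l1_logistic(X, y)
kept = [j for j in range(len(w)) if abs(w[j]) > 0.05]
print(w, kept)
```

The noise feature's weight stays at zero, so only feature 0 survives the threshold; no separate search over feature subsets was ever run.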
mRMR stands for minimum-Redundancy-Maximum-Relevance feature/variable/attribute selection. The goal is to select a feature subset that best characterizes the statistical properties of a target classification variable, subject to the constraint that the selected features are mutually as dissimilar to each other as possible, while individually being as similar to the classification variable as possible. Peng et al. presented several different forms of mRMR, in which "relevance" and "redundancy" are defined using mutual information, correlation, t-test/F-test, distances, etc.
Importantly, for the mutual-information forms, they showed that selecting mRMR features also searches for a feature set whose members jointly have maximal statistical "dependency" on the classification variable, where "dependency" is defined using a form of high-dimensional mutual information.
The mRMR method was first developed as a fast and powerful feature "filter"; a method to combine mRMR with "wrapper" selection was also presented. These methods have produced promising results on a range of datasets in many different areas.
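In its mutual-information difference form, the greedy mRMR step selects the candidate that maximizes relevance to the class minus the mean redundancy with the features already chosen. A rough sketch for discrete features (illustrative only, not the authors' implementation, on invented data):

```python
from collections import Counter
from math import log2

def mi(xs, ys):
    """I(X; Y) in bits from empirical joint frequencies."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr(X, y, k):
    """Greedy mRMR (difference form): maximize relevance I(f; c)
    minus the mean redundancy with already-selected features."""
    cols = list(zip(*X))
    selected, remaining = [], set(range(len(cols)))
    while len(selected) < k:
        def score(j):
            rel = mi(cols[j], y)
            if not selected:
                return rel
            red = sum(mi(cols[j], cols[s]) for s in selected) / len(selected)
            return rel - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Feature 1 is a noisy copy of feature 0 (highly redundant);
# feature 2 is informative and nearly independent of feature 0.
X = [[0, 0, 0], [0, 0, 0], [0, 0, 1], [0, 0, 0],
     [1, 1, 1], [1, 1, 1], [1, 0, 0], [0, 0, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
sel = mrmr(X, y, 2)
print(sel)
```

On this toy data the most relevant feature 0 is picked first; feature 1 is then heavily penalized for its redundancy with feature 0, so the second pick is the complementary feature 2.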
For more information on mRMR, you may want to read one of the following papers:
A.- “Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy,” Hanchuan Peng, Fuhui Long, and Chris Ding, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 27, No. 8, pp.1226-1238, 2005.
B.- “Minimum redundancy feature selection from microarray gene expression data,” Chris Ding, and Hanchuan Peng, Journal of Bioinformatics and Computational Biology, Vol. 3, No. 2, pp.185-205, 2005.
NMIFS stands for Normalized Mutual Information Feature Selection. Estevez et al. proposed an improved version of mRMR based on normalized mutual information: the mutual information between two random variables is bounded above by the minimum of their entropies. Since the entropy of a feature can vary greatly, the measure must be normalized before applying it to a global set of features. The selection criterion becomes

G = I(C; f_i) − (1/|S|) Σ_{f_s ∈ S} NI(f_i; f_s),

where NI is the mutual information normalized by the minimum entropy of both features:

NI(f_i; f_s) = I(f_i; f_s) / min{H(f_i), H(f_s)}.
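The normalized MI used by NMIFS, i.e. mutual information divided by the smaller of the two entropies, can be estimated from empirical frequencies as in this sketch (illustrative code with invented data, not the authors' implementation):

```python
from collections import Counter
from math import log2

def mi(xs, ys):
    """I(X; Y) in bits from empirical joint frequencies."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def entropy(xs):
    """H(X) in bits from empirical frequencies."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def normalized_mi(xs, ys):
    """NI(X; Y) = I(X; Y) / min{H(X), H(Y)}, bounded in [0, 1]."""
    h = min(entropy(xs), entropy(ys))
    return mi(xs, ys) / h if h > 0 else 0.0

a = (0, 0, 1, 1, 0, 0, 1, 1)
b = (0, 0, 1, 1, 1, 0, 1, 1)   # mostly tracks a
nmi_aa = normalized_mi(a, a)   # a perfect copy scores exactly 1
nmi_ab = normalized_mi(a, b)   # a noisy copy scores strictly between 0 and 1
print(nmi_aa, nmi_ab)
```

The normalization makes redundancy scores comparable across features with very different entropies, which is the point of replacing raw MI in the mRMR redundancy term.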
For more information on NMIFS, you may want to read the following paper:
A.- P. A. Estevez, M. Tesmer, C. A. Perez, and J. M. Zurada, “Normalized mutual information feature selection,” IEEE Trans. Neural Netw., vol. 20, no. 2, pp. 189–201, Feb. 2009.
CMIFS stands for Conditional Mutual Information Feature Selection.
For more information on CMIFS, you may want to read the following paper:
A.- Hongrong Cheng, Zhiguang Qin, Weizhong Qian, and Wei Liu, "Conditional mutual information based feature selection," in Proc. Int. Symp. Knowledge Acquisition and Modeling, 2008, pp. 103–107.
CMIM stands for Conditional Mutual Information Maximization.
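At each step CMIM keeps the feature that maximizes the minimum conditional mutual information I(f; Y | f_s) over the already-selected features f_s, so a candidate scores well only if it remains informative given every feature picked so far. A sketch for discrete features (illustrative only, with invented data, not the optimized implementation from the papers below):

```python
from collections import Counter
from math import log2

def mi(xs, ys):
    """I(X; Y) in bits from empirical joint frequencies."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def cond_mi(xs, ys, zs):
    """I(X; Y | Z) in bits from empirical frequencies."""
    n = len(xs)
    pz = Counter(zs)
    pxz, pyz = Counter(zip(xs, zs)), Counter(zip(ys, zs))
    pxyz = Counter(zip(xs, ys, zs))
    return sum(c / n * log2(c * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), c in pxyz.items())

def cmim(X, y, k):
    """Greedy CMIM: after a max-MI first pick, each step keeps the
    candidate whose worst-case conditional MI with the class, given
    any already-selected feature, is largest."""
    cols = list(zip(*X))
    selected = [max(range(len(cols)), key=lambda j: mi(cols[j], y))]
    remaining = set(range(len(cols))) - set(selected)
    while len(selected) < k:
        best = max(remaining,
                   key=lambda j: min(cond_mi(cols[j], y, cols[s])
                                     for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# The class is the XOR of features 0 and 2; feature 1 duplicates feature 0.
X = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]] * 2
y = [0, 1, 1, 0] * 2
sel = cmim(X, y, 2)
print(sel)
```

In this XOR construction no single feature carries any MI with the class (the first pick falls to index 0 by tie-breaking), yet CMIM still keeps the complementary feature 2, whose conditional MI given feature 0 is a full bit, and rejects the exact duplicate feature 1, whose conditional MI is zero.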
For more information on CMIM, you may want to read one of the following papers:
A.- F. Fleuret and I. Guyon, “Fast binary feature selection with conditional mutual information,” J. Mach. Learning Res., vol. 5, pp. 1531–1555, 2004.
B.- G. Wang and F. H. Lochovsky, "Feature selection with conditional mutual information maximin in text categorization," in Proc. 13th ACM Int. Conf. Information and Knowledge Management, 2004, pp. 342–349.