Feature Selection Methods

Feature selection is a process in which a subset of features from a dataset is selected with the goal of improving classification accuracy and decreasing computational complexity. Biometric systems, for example, are based on the premise that many physical or behavioral attributes of humans, such as the face, fingerprint, iris, hand, voice, gait, and signature, can be uniquely associated with an individual. Feature selection is closely related to feature extraction, a process in which feature vectors are created from the original dataset through manipulations of the data space; feature extraction can be considered a superset of the feature selection techniques. Feature selection methods can be classified into three main groups: filters, wrappers, and embedded methods.

Filter methods use techniques that are independent of any classifier to select features from the feature space. These techniques are based on a number of different statistical tests, or on the information-theoretic concept of Mutual Information (MI).
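As an illustrative sketch (not code from any of the papers cited below), an MI-based filter for discrete features can simply score each feature by its mutual information with the class labels and rank the features by that score:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    # I(X; Y) in bits for two equal-length sequences of discrete values.
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def filter_rank(features, labels):
    # Score every feature column by I(feature; labels) and return the
    # feature indices ordered from most to least relevant.
    scores = [mutual_information(col, labels) for col in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)

# The second feature matches the labels exactly, so it ranks first.
print(filter_rank([[0, 1, 0, 1], [0, 0, 1, 1]], [0, 0, 1, 1]))  # prints [1, 0]
```

Because the score ignores the classifier entirely, filters like this are fast, but they cannot see interactions between features.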


Wrapper methods wrap the feature selection around a classifier, where classification performance is used to measure the quality of the currently selected feature set.


The principle of wrapper methods is generally based on optimizing the accuracy rate, estimated by one of the following methods: hold-out, cross-validation, or bootstrap.
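A minimal wrapper sketch, assuming a hand-rolled nearest-centroid classifier as the black box and hold-out accuracy as the quality estimate (all function names and data here are illustrative):

```python
def holdout_accuracy(train_X, train_y, test_X, test_y, feats):
    # Hold-out accuracy of a nearest-centroid classifier restricted
    # to the feature subset `feats`.
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    hits = sum(
        min(centroids, key=lambda c: sum((x[f] - m) ** 2
                                         for f, m in zip(feats, centroids[c]))) == y
        for x, y in zip(test_X, test_y))
    return hits / len(test_y)

def wrapper_forward_select(train_X, train_y, test_X, test_y, k):
    # Greedy forward selection: at each step add the feature whose
    # inclusion gives the best hold-out accuracy.
    selected = []
    for _ in range(k):
        remaining = [f for f in range(len(train_X[0])) if f not in selected]
        selected.append(max(remaining, key=lambda f: holdout_accuracy(
            train_X, train_y, test_X, test_y, selected + [f])))
    return selected
```

Because every candidate subset requires retraining and re-evaluating the classifier, wrappers are more expensive than filters but take feature interactions into account.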


Embedded methods are similar to wrappers, but instead of using the classifier as a black box, they integrate feature selection into the classifier itself. The feature selection process is embedded in the construction of the classifier: as the classifier learns the appropriate weight for a given feature, it can remove that feature from consideration. The term embedded methods thus covers a wide range of different feature selection techniques, making it difficult to analyze them as a group beyond their dependence on a particular classification algorithm. Embedded methods differ from other feature selection methods in the way feature selection and learning interact. These three different methods have relative advantages and drawbacks.
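One way to see the embedded idea in miniature (a toy illustration, not a method from the literature discussed here) is a depth-1 decision tree: training the classifier is itself what chooses the one feature used, so selection is built into learning.

```python
def train_stump(X, y):
    # Depth-1 decision tree for binary labels. Training picks the single
    # (feature, threshold) pair minimizing training error, so feature
    # selection is embedded in classifier construction.
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        for t in sorted(set(x[f] for x in X)):
            for left, right in ((0, 1), (1, 0)):
                preds = [left if x[f] <= t else right for x in X]
                errors = sum(p != yi for p, yi in zip(preds, y))
                candidate = (errors, f, t, left, right)
                if best is None or candidate < best:
                    best = candidate
    return best

# Feature 1 separates the classes perfectly; feature 0 is noise.
X = [[3, 0], [9, 0], [5, 1], [1, 1]]
stump = train_stump(X, [0, 0, 1, 1])
print(stump[1])  # prints 1: training "selected" feature 1
```

Real embedded methods such as L1-regularized linear models work the same way in spirit: the learning procedure drives some feature weights to zero and thereby discards those features.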

1.- mRMR

It means minimum-Redundancy-Maximum-Relevance feature/variable/attribute selection. The goal is to select a feature subset that best characterizes the statistical properties of a target classification variable, subject to the constraint that these features are mutually as dissimilar to each other as possible, but marginally as similar to the classification variable as possible. The authors presented several different forms of mRMR, where “relevance” and “redundancy” are defined using mutual information, correlation, the t-test/F-test, distances, etc.

Importantly, for mutual information, they showed that the method to detect mRMR features also searches for a feature set whose features jointly have the maximal statistical “dependency” on the classification variable. This “dependency” term is defined using a new form of high-dimensional mutual information.


The mRMR method was first developed as a fast and powerful feature “filter.” The authors later also showed a method to combine mRMR with “wrapper” selection methods. These methods have produced promising results on a range of datasets in many different areas.
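As a concrete sketch of the mutual-information form of the greedy mRMR search for discrete features (a toy implementation under simplifying assumptions, not the authors' code):

```python
from collections import Counter
from math import log2

def mi(xs, ys):
    # I(X; Y) in bits for discrete sequences.
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr(features, labels, k):
    # Greedy mRMR: at each step pick the feature with maximal relevance
    # I(f; c) minus mean redundancy with the already-selected features.
    selected = []
    for _ in range(k):
        def score(f):
            relevance = mi(features[f], labels)
            redundancy = (sum(mi(features[f], features[s]) for s in selected)
                          / len(selected)) if selected else 0.0
            return relevance - redundancy
        remaining = [f for f in range(len(features)) if f not in selected]
        selected.append(max(remaining, key=score))
    return selected

# Feature 1 duplicates feature 0, so after picking 0 the redundancy
# penalty makes mRMR prefer the weaker but non-redundant feature 2.
print(mrmr([[0, 0, 0, 1, 1, 0],
            [0, 0, 0, 1, 1, 0],
            [0, 1, 0, 1, 0, 1]], [0, 0, 0, 1, 1, 1], 2))  # prints [0, 2]
```

A pure max-relevance ranking would have picked the duplicate feature 1 second; subtracting the mean redundancy is exactly what avoids that.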

For more information on mRMR, you may want to read one of the following papers.

A.- “Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy,” Hanchuan Peng, Fuhui Long, and Chris Ding, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226–1238, 2005.

B.- “Minimum redundancy feature selection from microarray gene expression data,” Chris Ding and Hanchuan Peng, Journal of Bioinformatics and Computational Biology, vol. 3, no. 2, pp. 185–205, 2005.


2.- NMIFS

It means Normalized Mutual Information Feature Selection. Estevez et al. proposed an improved version of mRMR based on normalized mutual information: the MI between two random variables is bounded above by the minimum of their entropies. As the entropy of a feature can vary greatly, the MI must be normalized before applying it to a global set of features, giving the selection criterion:

G = I(C; f_i) − (1/|S|) Σ_{f_s ∈ S} I_N(f_i; f_s)

where I_N is the MI normalized by the minimum entropy of the two features, as defined in:

I_N(f_i; f_s) = I(f_i; f_s) / min{H(f_i), H(f_s)}

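A toy sketch of this criterion for discrete features (illustrative code, not the authors' implementation):

```python
from collections import Counter
from math import log2

def mi(xs, ys):
    # I(X; Y) in bits for discrete sequences.
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def entropy(xs):
    # H(X) in bits.
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def normalized_mi(xs, ys):
    # I_N(f_i; f_s) = I(f_i; f_s) / min{H(f_i), H(f_s)}, bounded in [0, 1].
    denom = min(entropy(xs), entropy(ys))
    return mi(xs, ys) / denom if denom > 0 else 0.0

def nmifs(features, labels, k):
    # Greedy NMIFS: relevance I(C; f) minus the mean *normalized* MI
    # with the already-selected features.
    selected = []
    for _ in range(k):
        def score(f):
            relevance = mi(features[f], labels)
            redundancy = (sum(normalized_mi(features[f], features[s])
                              for s in selected) / len(selected)) if selected else 0.0
            return relevance - redundancy
        remaining = [f for f in range(len(features)) if f not in selected]
        selected.append(max(remaining, key=score))
    return selected
```

The only difference from the plain mRMR sketch above is the normalization in the redundancy term, which keeps high-entropy features from dominating the penalty.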
For more information on NMIFS, you may want to read the following paper:

A.- P. A. Estevez, M. Tesmer, C. A. Perez, and J. M. Zurada, “Normalized mutual information feature selection,” IEEE Trans. Neural Netw., vol. 20, no. 2, pp. 189–201, Feb. 2009.


3.- CMIFS

It means Conditional Mutual Information Feature Selection.

For more information on CMIFS, you may want to read the following paper:

A.- Hongrong Cheng, Zhiguang Qin, Weizhong Qian, and Wei Liu, “Conditional mutual information based feature selection,” Proc. Int. Symp. Knowledge Acquisition and Modeling, pp. 103–107, 2008.

4.- CMIM

It means Conditional Mutual Information Maximization. At each step, CMIM selects the feature whose minimum conditional mutual information with the class, conditioned on each already-selected feature, is largest.
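A toy sketch of this greedy rule for discrete features (an illustrative implementation, not the authors' code):

```python
from collections import Counter
from math import log2

def mi(xs, ys):
    # I(X; Y) in bits for discrete sequences.
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def cond_mi(xs, ys, zs):
    # I(X; Y | Z) = sum over z of p(z) * I(X; Y | Z = z).
    n = len(xs)
    total = 0.0
    for z in set(zs):
        idx = [i for i in range(n) if zs[i] == z]
        total += (len(idx) / n) * mi([xs[i] for i in idx], [ys[i] for i in idx])
    return total

def cmim(features, labels, k):
    # Greedy CMIM: maximize min over selected s of I(C; f | f_s);
    # the first pick simply maximizes I(C; f).
    selected = []
    for _ in range(k):
        def score(f):
            if not selected:
                return mi(features[f], labels)
            return min(cond_mi(labels, features[f], features[s]) for s in selected)
        remaining = [f for f in range(len(features)) if f not in selected]
        selected.append(max(remaining, key=score))
    return selected

# Labels are the XOR of features 0 and 2: neither is informative alone,
# but conditioned on feature 0, feature 2 determines the class.
print(cmim([[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]], [0, 1, 1, 0], 2))  # prints [0, 2]
```

The XOR example shows what conditioning buys: a marginal-MI filter scores all three features at zero, while CMIM still recovers the complementary pair.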

For more information on CMIM, you may want to read one of the following papers:

A.- F. Fleuret and I. Guyon, “Fast binary feature selection with conditional mutual information,” J. Mach. Learning Res., vol. 5, pp. 1531–1555, 2004.

B.- G. Wang and F. H. Lochovsky, “Feature selection with conditional mutual information maximin in text categorization,” in Proc. 13th ACM Int. Conf. Information and Knowledge Management, 2004, pp. 342–349.
