Discover how Independent Component Analysis (ICA) can solve complex problems like the cocktail party problem and enhance data analysis. Learn about ICA’s mathematical foundations, compare it with PCA, and see real-world applications with EEG data. Dive into practical code examples and preprocessing tips in this comprehensive guide.
Table of Contents:
- Introduction to Independent Component Analysis (ICA)
- Overview of ICA in the Context of Data Science
- What You’ll Learn in This Article
- Understanding the Cocktail Party Problem
- The Cocktail Party Problem Explained
- How ICA Solves the Cocktail Party Problem
- Key Assumptions of Independent Component Analysis
- Statistical Independence of Components
- The Importance of Non-Gaussianity in ICA
- The Mathematical Foundations of ICA
- Transforming Measured Signals to Independent Components
- Mathematical Goals and Optimization in ICA
- Comparing PCA and ICA
- Principal Component Analysis (PCA) vs. Independent Component Analysis (ICA)
- How PCA Compresses Data and ICA Separates It
- Preprocessing Data for ICA
- The Role of Auto-Scaling in Data Preprocessing
- Why Apply PCA Before ICA?
- Real-World Application: Using ICA on EEG Data
- Understanding EEG and Dimensionality Reduction with PCA
- Improving EEG Signal Quality by Removing Artifacts with ICA
- Step-by-Step Example: Blink Artifact Removal with ICA
- Visualizing EEG Data and Blink Artifacts
- Applying PCA and ICA to Detect and Remove Blink Artifacts
- Practical Implementation and Code Examples
- Using MATLAB and Python for PCA and ICA
- Code Examples for ICA in EEG Data Analysis
- Conclusion, Next Steps, and Engagement
- Summary of ICA Key Takeaways and Further Reading
- Leave Feedback, Subscribe, and Stay Updated
Introduction to Independent Component Analysis (ICA)
Overview of ICA in the Context of Data Science
Welcome back! This is the second article in a two-part series on principal component analysis (PCA) and independent component analysis (ICA). ICA is the topic of this article, and it will follow much the same structure as the last one.
What You’ll Learn in This Article
I'll start with a brief introduction to the technique and dive into the math a little bit. Then I'll compare the two approaches, discussing their similarities and differences, and finish with a concrete example of how you can use ICA, with example code provided in the GitHub repository. Let's get right into it.
Understanding the Cocktail Party Problem
The Cocktail Party Problem Explained
The canonical problem for independent component analysis is the cocktail party problem. In its simplest form, picture two people having a conversation at a cocktail party, and for whatever reason you happen to have two microphones set up next to these two speakers. Both microphones will pick up audio from both speakers. In the figure, the purple microphone is a little closer to the blue speaker, so it picks up more of the blue speaker's audio relative to the red speaker, and vice versa: the pink microphone picks up more audio from the red speaker than the blue speaker.

The problem, then, is this: how can we take these two recordings, each of which mixes both sides of the conversation, and separate them into two audio files, each containing audio from only a single speaker? That is exactly what independent component analysis does: it transforms a set of vectors into a maximally independent set.
How ICA Solves the Cocktail Party Problem
In our example, the vectors are the raw audio signals recorded by the two microphones, and ICA transforms them into a maximally independent set. That's what's being represented here: the purple and pink audio signals get translated back to the original sources of the audio, the speech of the blue speaker and the speech of the red speaker, respectively. So again, the purple and pink signals are your measured signals, and the blue and red signals are the independent components, the sources of the information (here, the audio).
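To make this concrete, here is a minimal sketch of the cocktail party setup using synthetic signals and scikit-learn's FastICA. The signal shapes (a sine and a sawtooth) and the mixing matrix are made up for illustration; real audio would simply replace the rows of `S`:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2000)

# Two independent "speakers": a sine wave and a sawtooth (both non-Gaussian)
s1 = np.sin(2 * np.pi * 5 * t)
s2 = 2 * (t * 3 % 1) - 1
S = np.c_[s1, s2]

# Mixing matrix A: each microphone hears a different blend of the speakers
A = np.array([[0.8, 0.3],
              [0.2, 0.7]])
X = S @ A.T  # the measured microphone signals

# Recover maximally independent components from the mixtures
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)
```

The recovered components match the true sources only up to ordering, sign, and scale, which is fine for the cocktail party problem: all we want is for each output to contain a single speaker.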
Key Assumptions of Independent Component Analysis
Statistical Independence of Components
There are a couple of key assumptions in independent component analysis. Assumption number one: your independent components are statistically independent, defined in the typical way in statistics. For two variables x and y, the joint distribution factorizes into the product of the marginals, p(x, y) = p(x) p(y). The second key assumption is that your independent components are non-Gaussian, which might seem a little strange.
The Importance of Non-Gaussianity in ICA
In statistics and science we love to assume everything is Gaussian; it makes things much nicer and allows a lot of rigorous analysis. But this is one of the instances where we actually need the independent components to be non-Gaussian for the technique to work. Intuitively, any rotation of independent Gaussian variables is still independent and Gaussian, so with Gaussian sources the unmixing would not be unique.
The Mathematical Foundations of ICA
Transforming Measured Signals to Independent Components
We have our measured signals x (in the microphone example, the recordings) and the independent components, which are what your speakers are saying in the cocktail party problem. The independent components can be combined linearly to recreate the measured signals, which is what the expression x = As represents. You can think of the independent components as sources, which is why the vector is an s: they are sources of information, or audio, that are combined in some way to generate what is measured at your microphones. So x1 and x2 are your measured signals, and s1 and s2 are your independent components, the sources of those signals.

You can also turn this around and combine your measured signals to express your independent components: s = Wx, a linear combination of the measured signals. The set of values in the matrix W is all we need to do ICA.
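As a quick numerical sanity check of the relationship between x = As and s = Wx, note that the unmixing matrix is just the inverse of the mixing matrix. The 2x2 matrix and source values below are hypothetical; ICA's actual job is to estimate W from data without ever seeing A:

```python
import numpy as np

# Hypothetical 2x2 mixing matrix A: it maps the sources s to the
# measured signals x via x = A s
A = np.array([[0.9, 0.4],
              [0.3, 0.8]])
s = np.array([1.0, -2.0])  # the independent components (sources)
x = A @ s                  # the measured signals

# The unmixing matrix W inverts the mixing, so that s = W x
W = np.linalg.inv(A)
s_recovered = W @ x
```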
Mathematical Goals and Optimization in ICA
Mathematically, the goal is as follows. Given some measured signals, some data x, we want to solve for the matrix W such that the set of independent components, the source vectors s_i, are maximally independent. What does "maximally independent" mean, and how do we quantify it? There are two common approaches: you can define W so that it minimizes the mutual information between all your independent components, or you can define W so that it maximizes the non-Gaussianity of the independent components it produces. I'm not going to go any further than that here; if you're interested in more information, check out the linked blog post.
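A common proxy for non-Gaussianity is excess kurtosis: it is zero for Gaussian data, positive for heavy-tailed (super-Gaussian) distributions, and negative for light-tailed (sub-Gaussian) ones. A minimal sketch with synthetic samples:

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3; approximately 0 for Gaussian data."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4) - 3.0

rng = np.random.default_rng(1)
gaussian = rng.normal(size=100_000)
laplacian = rng.laplace(size=100_000)        # heavy-tailed, super-Gaussian
uniform = rng.uniform(-1, 1, size=100_000)   # light-tailed, sub-Gaussian
```

FastICA-style algorithms optimize smoother contrast functions (approximations of negentropy) rather than raw kurtosis, but the idea of "push the components away from Gaussian" is the same.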
Comparing PCA and ICA
Principal Component Analysis (PCA) vs. Independent Component Analysis (ICA)
There I go into a bit more detail on the math. PCA and ICA are similar techniques in a lot of ways, but ultimately they are distinct: they are different approaches aimed at different tasks, with different goals.
How PCA Compresses Data and ICA Separates It
PCA typically compresses information. If you saw the PCA article, the example was hot dogs and hot dog buns: two quantities that are heavily correlated. Instead of representing that information with two variables, you can represent it with just one. That is where PCA is a good tool, because it will compress those two variables into a single variable.
ICA, on the other hand, separates information. It takes a set of variables, for example the audio picked up by two microphones placed close to two speakers, and separates out the independent components: the sources, or independent drivers, of those measured signals.
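The compression side of this contrast can be sketched with the hot dog example: when two variables are heavily correlated, almost all of the variance lands in the first principal component. The sales numbers below are invented for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two heavily correlated variables, like daily hot dog and bun sales
hot_dogs = rng.normal(100.0, 20.0, size=500)
buns = hot_dogs + rng.normal(0.0, 2.0, size=500)  # buns closely track hot dogs
X = np.c_[hot_dogs, buns]

pca = PCA(n_components=2).fit(X)
# Nearly all of the variance lands in the first principal component,
# so one variable can stand in for the original two
ratios = pca.explained_variance_ratio_
```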
Preprocessing Data for ICA
The Role of Auto-Scaling in Data Preprocessing
So the two techniques are similar, but they have different goals and different final outcomes. One commonality between PCA and ICA is auto-scaling, which is a critical part of the preprocessing. Auto-scaling means that for each variable, you subtract the average of that variable and divide each element by the standard deviation of that variable.
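The auto-scaling step described above can be written in a couple of lines. Here is a minimal sketch with hypothetical data whose two variables sit on very different scales:

```python
import numpy as np

def auto_scale(X):
    """Auto-scaling: per column, subtract the mean and divide by the std."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

rng = np.random.default_rng(0)
# Hypothetical data: two variables with very different means and spreads
X = np.c_[rng.normal(5.0, 10.0, size=200), rng.normal(-3.0, 0.5, size=200)]
Z = auto_scale(X)
# Each column of Z now has mean 0 and standard deviation 1
```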
Why Apply PCA Before ICA?
That is one of the reasons it is typically advantageous to apply PCA to your data set before applying ICA: the preprocessing is already handled for you. PCA will clump the correlated variables together, and then ICA will come in and separate out the independent drivers, if there are any.
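The PCA-then-ICA pipeline might be sketched like this, with a hypothetical data set of 3 non-Gaussian sources mixed into 10 measured variables:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
# Hypothetical data: 3 non-Gaussian sources mixed into 10 measured variables
S = rng.laplace(size=(1000, 3))
X = S @ rng.normal(size=(3, 10))

# Stage 1: PCA centers the data and reduces 10 variables to 3 scores
scores = PCA(n_components=3).fit_transform(X)

# Stage 2: ICA separates the reduced scores into independent drivers
components = FastICA(n_components=3, random_state=0).fit_transform(scores)
```

One caveat: scikit-learn's PCA only centers the data, it does not divide by the standard deviation, so if your variables are on very different scales you should still auto-scale them first.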
As always, I'm going to include a concrete example. This one is relevant to my own research; in fact, it is the specific problem that first led me to independent component analysis.
Real-World Application: Using ICA on EEG Data
Understanding EEG and Dimensionality Reduction with PCA
We deal with EEG data. What is EEG? EEG (electroencephalography) is a technique for measuring brain activity by placing a set of electrodes on the head. EEG is a very powerful technique because it has very good temporal resolution and it is non-invasive; people can move around with the cap on. But that also leads to one of its more fundamental weaknesses.
Improving EEG Signal Quality by Removing Artifacts with ICA
Since the electrical signals it is trying to measure from the brain are so weak, EEG has to be very sensitive to small fluctuations in voltage, which makes it very prone to artifacts: perturbations or oscillations in the signal that do not come from brain activity.
Step-by-Step Example: Blink Artifact Removal with ICA
Visualizing EEG Data and Blink Artifacts
These could be blinks (which is what we are trying to remove in this problem), motion artifacts, people talking, or other kinds of noise that get injected into the data. Here we have a plot of voltage versus time. I should have put labels on these axes, but the y-axis is voltage in millivolts and the x-axis is essentially a time index. This is the FP1 electrode.
Applying PCA and ICA to Detect and Remove Blink Artifacts
FP1 sits on the left side of your forehead, and it is particularly prone to blink artifacts because it is one of the electrodes closest to your eye. You can actually see the blinks occurring as giant spikes in the signal.

We want to get rid of those, because with EEG we are trying to measure brain activity, not blink activity. The first step is applying PCA. Our EEG cap has 64 electrodes, which translates to 64 variables. We can use PCA to clump that down to just 21 variables; I did some trial and error to find the right number of principal components. At the bottom of the figure, you can see the explained variation is 99.5 percent.
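The dimensionality-reduction step might look like the sketch below. The "EEG" matrix here is a synthetic stand-in (64 channels driven by a smaller number of latent sources plus a little sensor noise), so the exact explained-variance number will differ from the article's 99.5 percent:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical stand-in for the EEG matrix (samples x channels):
# 64 channels driven by ~15 latent sources plus a little sensor noise
latent = rng.normal(size=(5000, 15))
mixing = rng.normal(size=(15, 64))
eeg = latent @ mixing + 0.01 * rng.normal(size=(5000, 64))

# Reduce 64 variables to 21 principal components, as in the article
pca = PCA(n_components=21)
scores = pca.fit_transform(eeg)
explained = pca.explained_variance_ratio_.sum()
```

In practice you would pick the number of components by trial and error or by thresholding the cumulative explained variance, as the article does.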
Practical Implementation and Code Examples
Using MATLAB and Python for PCA and ICA
In MATLAB it's really simple. You may notice I don't explicitly auto-scale the data; that's because MATLAB's pca function does this automatically, which is pretty nice. It's all done in one line in MATLAB, and it can be done in a couple of lines with scikit-learn.
Code Examples for ICA in EEG Data Analysis
If you haven't already, check out the previous article on PCA, which shares some example code for doing this. Next, we can apply ICA to the set of principal components we got from PCA, which is what's being done here, and then plot all the independent components. Again, we had 64 electrodes on our EEG cap, which translates to 64 variables, and we used PCA to reduce the dimensionality from 64 variables to just 21.

Finally, we applied ICA to those 21 variables to separate out the independent components. Looking at the result visually, independent components 10, 5, and 12 are reminiscent of the blink artifacts we saw in the initial plot.
Conclusion, Next Steps, and Engagement
Summary of ICA Key Takeaways and Further Reading
Note that these aren't the raw independent components themselves; I squared them so that all the values would be positive and the blink artifacts would be a bit more prominent.
I used a rough heuristic: I picked out the independent components that had four prominent peaks. This isn't a robust way to do it; I was doing something fast that I wanted to be repeatable. The heuristic picked out independent components 10 and 12 as corresponding to the blinks.
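A peak-counting heuristic along these lines might be sketched with SciPy's find_peaks. The blink-like component below is entirely synthetic (four sharp spikes over low background noise), and the prominence threshold is a made-up choice for illustration:

```python
import numpy as np
from scipy.signal import find_peaks

def count_prominent_peaks(component, prominence_frac=0.5):
    """Count peaks in the squared component whose prominence exceeds
    a fraction of the squared signal's maximum."""
    sq = component ** 2  # squaring makes the spikes stand out
    peaks, _ = find_peaks(sq, prominence=prominence_frac * sq.max())
    return len(peaks)

# Hypothetical blink-like component: four sharp spikes over low noise
rng = np.random.default_rng(0)
t = np.arange(2000)
blink_ic = 0.02 * rng.normal(size=t.size)
for center in (300, 800, 1300, 1800):
    blink_ic = blink_ic + np.exp(-0.5 * ((t - center) / 10.0) ** 2)
```

A component would then be flagged as blink-related when the count comes back as exactly four, matching the number of blinks visible in the raw FP1 trace.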
Components 10 and 12 I'll buy; maybe 5 should have been included too, but we'll see how it turns out. We can then simply drop independent components 10 and 12, because they contain blink information, which we are not interested in. We only want brain information, so we drop those two components and work backward: we reconstruct the score matrix (the output of PCA) and then reconstruct our original 64 variables by inverting PCA. Doing that and plotting everything, before the blink removal FP1 had those four prominent peaks corresponding to blinks.
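The drop-and-reconstruct step might be sketched like this. Everything here is a small synthetic stand-in (8 channels, 4 sources, one of which is a sparse train of huge spikes playing the role of blinks), and I use kurtosis rather than the article's peak-counting heuristic to flag the spiky component:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 8 channels driven by 4 sources, one blink-like
S = rng.laplace(size=(2000, 4))
S[:, 0] = 0.0
S[::500, 0] = 50.0           # sparse huge spikes play the role of blinks
A = rng.normal(size=(4, 8))  # hypothetical mixing onto 8 channels
X = S @ A

# Forward pipeline: PCA scores, then independent components
pca = PCA(n_components=4)
scores = pca.fit_transform(X)
ica = FastICA(n_components=4, random_state=0)
ics = ica.fit_transform(scores)

# Flag the spiky component (largest kurtosis stands in for "has blink peaks")
z = (ics - ics.mean(axis=0)) / ics.std(axis=0)
blink_idx = int(np.argmax((z ** 4).mean(axis=0)))

# Drop it, then work backward: ICA -> PCA scores -> original 8 channels
ics_clean = ics.copy()
ics_clean[:, blink_idx] = 0.0
scores_clean = ica.inverse_transform(ics_clean)
X_clean = pca.inverse_transform(scores_clean)
```

After the back-projection, the giant spikes are largely gone from the reconstructed channels while the rest of the signal is preserved, which mirrors the before-and-after FP1 plots in the article.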
Leave Feedback, Subscribe, and Stay Updated
Afterward, they went away; it's kind of like magic. This is a rough way to do it, but it serves as an example of what ICA can be used for. That concludes the two-part series on principal component analysis and independent component analysis.
If you found this article helpful, please like, subscribe, comment, and share. I would very much appreciate it, and I would love to hear your feedback. I look forward to seeing you in the next article. Thanks for reading!