### Stats with Python: Sample Correlation Coefficient is Biased

February 24, 2021 | 8 min read

Is the sample correlation coefficient an unbiased estimator? No! This post visualizes how large its bias is and shows how to fix it.

Under the sea, in the hippocampus's garden...

30 posts tagged with "en"

A Go implementation of the dispatcher-worker pattern with errgroup. It immediately cancels the other jobs when an error occurs in any goroutine.

The correlation coefficient is a familiar statistic, but there are several variations whose differences should be noted. This post recaps the definitions of these common measures.

When you sample from a finite population without replacement, beware the finite population correction. The samples are not independent of each other.

What is unbiased sample variance? Why divide by n-1? With a little programming with Python, it's easier to understand.
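As a quick illustration of the n-1 question, here is a minimal simulation sketch. The distribution, sample size, and number of trials are arbitrary choices made for this example and are not taken from the post itself.

```python
import numpy as np

rng = np.random.default_rng(0)

true_var = 4.0   # variance of N(0, 2^2); chosen only for illustration
n = 5            # small sample size, where the bias is easy to see
trials = 200_000

# Each row is one independent sample of size n.
samples = rng.normal(loc=0.0, scale=2.0, size=(trials, n))

# Average the two estimators over many trials to approximate their expectations.
biased = samples.var(axis=1, ddof=0).mean()    # divides by n
unbiased = samples.var(axis=1, ddof=1).mean()  # divides by n - 1

print(f"divide by n:     {biased:.3f}")
print(f"divide by n - 1: {unbiased:.3f}")
```

Dividing by n should land near (n-1)/n times the true variance (3.2 here), while dividing by n-1 should land near the true variance of 4.0.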

Let's re-implement face swapping in 10 minutes! This post shows a naive solution using a pre-trained CNN and OpenCV.

NeurIPS 2020 virtual conference was full of exciting presentations! Here I list some notable ones with brief introductions.

Let's look back on the machine learning papers published in 2020! This post covers 10 representative papers that I found interesting and worth reading.

Lightweight GAN has opened the way for generating fine images with ~100 training samples and affordable computing resources. This post presents "This Sushi Does Not Exist" and how I built it with GAE.

If you want to use a custom loss function with a modern GBDT model, you'll need the first- and second-order derivatives. This post shows how to implement them, using LightGBM as an example.

Why is ROC-AUC equal to the probability that a positive sample is ranked higher than a negative one? This post provides an answer with a fun example.
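The equivalence can be checked numerically. The sketch below uses synthetic Gaussian scores (an assumption made for this example, not data from the post) and compares the area under a hand-built ROC curve with the fraction of positive-negative pairs in which the positive scores higher.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scores: positives tend to score higher than negatives.
pos = rng.normal(1.0, 1.0, size=300)
neg = rng.normal(0.0, 1.0, size=500)

# Probability that a random positive outranks a random negative
# (ties counted as one half, although continuous scores rarely tie).
diff = pos[:, None] - neg[None, :]
pairwise = (diff > 0).mean() + 0.5 * (diff == 0).mean()

# Build the ROC curve by sweeping the threshold from high to low.
scores = np.concatenate([pos, neg])
labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
sorted_labels = labels[np.argsort(-scores)]
tpr = np.concatenate([[0.0], np.cumsum(sorted_labels) / len(pos)])
fpr = np.concatenate([[0.0], np.cumsum(1 - sorted_labels) / len(neg)])

# Area under the curve by the trapezoidal rule.
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)

print(f"pairwise probability: {pairwise:.6f}")
print(f"area under ROC curve: {auc:.6f}")
```

The two printed numbers should agree up to floating-point error, because each horizontal step of the ROC curve contributes the share of positives that outrank one particular negative.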

The Transformer has been the subject of many application studies and model enhancements. This post aims to provide an overview of these studies.

This post introduces how to sample groups from a dataset, which is helpful when you want to avoid data leakage.

This post compares the behaviors of different feature importance measures in tricky situations.

This post introduces the Pandas `query` method, which allows us to query dataframes in an SQL-like manner.
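For a flavor of what this looks like, here is a small sketch. The dataframe, its column names, and the `min_age` variable are made up for this example.

```python
import pandas as pd

# A toy dataframe; the columns are hypothetical.
df = pd.DataFrame(
    {
        "name": ["a", "b", "c", "d"],
        "age": [25, 32, 47, 19],
        "city": ["Tokyo", "Osaka", "Tokyo", "Kyoto"],
    }
)

min_age = 20

# SQL-like filtering; `@` refers to a Python variable in the enclosing scope.
result = df.query("age > @min_age and city == 'Tokyo'")

# The equivalent boolean-mask expression, for comparison.
same = df[(df["age"] > min_age) & (df["city"] == "Tokyo")]

print(result)
```

Both expressions select the same rows; the `query` form tends to be easier to read when several conditions are combined.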

This is sample Go code that calls a function periodically for a specified amount of time.

This post introduces PFRL, a new reinforcement learning library, and uses it to learn to play the Slime Volleyball game on Colaboratory.

Causal inference is becoming a hot topic in the ML community. This post formulates one of its important concepts, the doubly robust estimator, with simple notation.

This post introduces how to count page views and show popular posts in the sidebar of a Gatsby blog. Google Analytics saves you the trouble of preparing databases and APIs.

This post summarizes how to group data by a variable and draw boxplots for each group using Pandas and Seaborn.

Double descent is one of the mysteries of modern machine learning. I reproduced the main results of the recent paper by Nakkiran et al. and posed some questions that occurred to me.

This post explains how MobileBERT succeeded in reducing both model size and inference time, and introduces its implementation in TensorFlow.js that works on web browsers.

Stuck on an error when enabling Gatsby incremental builds on Netlify? This post might help.

Have you ever confused the Pandas methods `loc`, `at`, and `iloc` with each other? They are no longer confusing once you have this table in mind.

"Representation" is a way AIs understand the world. This post is a short introduction to representation learning in the "deep learning era."

This post introduces how to add neatly arranged SNS share buttons to Gatsby blog posts.

How does Google's PageRank work? Its theory and algorithm are explained, followed by numerical experiments.

Want to generate realistic images with a single GPU? This post demonstrates how to downsize StyleGAN2 with slight performance degradation.

Citation counts shouldn't be the only measure of the impact of academic papers. I applied Google's PageRank to evaluate the importance of academic papers.

This post introduces how to make a top navigation bar with a background image for a Gatsby blog.