Dataset statistics
Number of variables | 3 |
Number of observations | 37 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 1016.0 B |
Average record size in memory | 27.5 B |
Variable types
Categorical | 3 |
message is highly correlated with name and 1 other fields | High correlation |
name is highly correlated with message and 1 other fields | High correlation |
time is highly correlated with message and 1 other fields | High correlation |
time is highly correlated with name and 1 other fields | High correlation |
name is uniformly distributed | Uniform |
message is uniformly distributed | Uniform |
time is uniformly distributed | Uniform |
time has unique values | Unique |
Analysis started | 2022-06-09 14:06:23.736523 |
Analysis finished | 2022-06-09 14:06:25.549025 |
Duration | 1.81 seconds |
Software version | pandas-profiling v3.2.0 |
name
Distinct | 30 |
Distinct (%) | 81.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 424.0 B |
dog | 3 |
everything will be gone | 2 |
Slim Shady | 2 |
Charles | 2 |
Chainyo | 2 |
Other values (25) |
Max length | 23 |
Median length | 12 |
Mean length | 8.378378378 |
Min length | 2 |
Characters and Unicode
Total characters | 310 |
Distinct characters | 34 |
Distinct categories | 3 |
Distinct scripts | 2 |
Distinct blocks | 1 |
Unique | 24 |
Unique (%) | 64.9% |
1st row | Julien |
2nd row | Someone else |
3rd row | A friend |
4th row | A friend |
5th row | A stranger |
Common Values
Value | Count | Frequency (%) |
dog | 3 | 8.1% |
everything will be gone | 2 | 5.4% |
Slim Shady | 2 | 5.4% |
Charles | 2 | 5.4% |
Chainyo | 2 | 5.4% |
A friend | 2 | 5.4% |
Chris Emezue | 1 | 2.7% |
chef boyardee | 1 | 2.7% |
aa | 1 | 2.7% |
meow | 1 | 2.7% |
Other values (20) | 20 |
Words
Value | Count | Frequency (%) |
dog | 3 | 5.2% |
a | 3 | 5.2% |
will | 2 | 3.4% |
be | 2 | 3.4% |
gone | 2 | 3.4% |
slim | 2 | 3.4% |
shady | 2 | 3.4% |
charles | 2 | 3.4% |
chainyo | 2 | 3.4% |
friend | 2 | 3.4% |
Other values (35) | 36 |
Most occurring characters
Value | Count | Frequency (%) |
e | 31 | 10.0% |
i | 23 | 7.4% |
a | 22 | 7.1% |
(space) | 21 | 6.8% |
h | 19 | 6.1% |
l | 18 | 5.8% |
o | 17 | 5.5% |
r | 16 | 5.2% |
n | 15 | 4.8% |
d | 14 | 4.5% |
Other values (24) | 114 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 264 | |
Uppercase Letter | 25 | 8.1% |
Space Separator | 21 | 6.8% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 31 | 11.7% |
i | 23 | 8.7% |
a | 22 | 8.3% |
h | 19 | 7.2% |
l | 18 | 6.8% |
o | 17 | 6.4% |
r | 16 | 6.1% |
n | 15 | 5.7% |
d | 14 | 5.3% |
s | 10 | 3.8% |
Other values (14) | 79 |
Uppercase Letter
Value | Count | Frequency (%) |
S | 7 | |
A | 6 | |
C | 5 | |
L | 2 | 8.0% |
J | 1 | 4.0% |
Y | 1 | 4.0% |
N | 1 | 4.0% |
E | 1 | 4.0% |
K | 1 | 4.0% |
Space Separator
Value | Count | Frequency (%) |
(space) | 21 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 289 | |
Common | 21 | 6.8% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 31 | 10.7% |
i | 23 | 8.0% |
a | 22 | 7.6% |
h | 19 | 6.6% |
l | 18 | 6.2% |
o | 17 | 5.9% |
r | 16 | 5.5% |
n | 15 | 5.2% |
d | 14 | 4.8% |
s | 10 | 3.5% |
Other values (23) | 104 |
Common
Value | Count | Frequency (%) |
(space) | 21 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 310 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
e | 31 | 10.0% |
i | 23 | 7.4% |
a | 22 | 7.1% |
(space) | 21 | 6.8% |
h | 19 | 6.1% |
l | 18 | 5.8% |
o | 17 | 5.5% |
r | 16 | 5.2% |
n | 15 | 4.8% |
d | 14 | 4.5% |
Other values (24) | 114 |
message
Distinct | 36 |
Distinct (%) | 97.3% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 424.0 B |
🔥🔥🔥🔥 | 2 |
How are you? | 1 |
Hello everyone | 1 |
The link to have access to the dataset seems to be down | 1 |
i'm good :) | 1 |
Other values (31) |
Max length | 55 |
Median length | 34 |
Mean length | 14.45945946 |
Min length | 2 |
Characters and Unicode
Total characters | 535 |
Distinct characters | 45 |
Distinct categories | 7 |
Distinct scripts | 2 |
Distinct blocks | 2 |
Unique | 35 |
Unique (%) | 94.6% |
1st row | How are you? |
2nd row | good good |
3rd row | 🔥🔥🔥🔥 |
4th row | 🔥🔥🔥🔥 |
5th row | interesting! |
Common Values
Value | Count | Frequency (%) |
🔥🔥🔥🔥 | 2 | 5.4% |
How are you? | 1 | 2.7% |
Hello everyone | 1 | 2.7% |
The link to have access to the dataset seems to be down | 1 | 2.7% |
i'm good :) | 1 | 2.7% |
I m Lucas | 1 | 2.7% |
I love cats | 1 | 2.7% |
hello | 1 | 2.7% |
I need a text to image like looking glass | 1 | 2.7% |
how are you | 1 | 2.7% |
Other values (26) | 26 |
Words
Value | Count | Frequency (%) |
you | 6 | 5.5% |
hello | 5 | 4.5% |
to | 4 | 3.6% |
i | 4 | 3.6% |
woof | 3 | 2.7% |
good | 3 | 2.7% |
are | 3 | 2.7% |
the | 3 | 2.7% |
love | 3 | 2.7% |
my | 2 | 1.8% |
Other values (65) | 74 |
Most occurring characters
Value | Count | Frequency (%) |
(space) | 74 | 13.8% |
e | 62 | 11.6% |
o | 45 | 8.4% |
l | 33 | 6.2% |
a | 31 | 5.8% |
t | 31 | 5.8% |
s | 27 | 5.0% |
i | 22 | 4.1% |
y | 18 | 3.4% |
r | 15 | 2.8% |
Other values (35) | 177 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 406 | |
Space Separator | 74 | 13.8% |
Uppercase Letter | 21 | 3.9% |
Other Punctuation | 14 | 2.6% |
Decimal Number | 11 | 2.1% |
Other Symbol | 8 | 1.5% |
Close Punctuation | 1 | 0.2% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
e | 62 | |
o | 45 | |
l | 33 | 8.1% |
a | 31 | 7.6% |
t | 31 | 7.6% |
s | 27 | 6.7% |
i | 22 | 5.4% |
y | 18 | 4.4% |
r | 15 | 3.7% |
n | 15 | 3.7% |
Other values (13) | 107 |
Uppercase Letter
Value | Count | Frequency (%) |
H | 7 | |
I | 4 | |
S | 3 | |
T | 2 | 9.5% |
N | 1 | 4.8% |
M | 1 | 4.8% |
G | 1 | 4.8% |
L | 1 | 4.8% |
W | 1 | 4.8% |
Decimal Number
Value | Count | Frequency (%) |
2 | 4 | |
1 | 2 | |
4 | 2 | |
3 | 2 | |
9 | 1 | 9.1% |
Other Punctuation
Value | Count | Frequency (%) |
! | 4 | |
' | 4 | |
. | 3 | |
? | 2 | |
: | 1 | 7.1% |
Space Separator
Value | Count | Frequency (%) |
(space) | 74 |
Other Symbol
Value | Count | Frequency (%) |
🔥 | 8 |
Close Punctuation
Value | Count | Frequency (%) |
) | 1 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 427 | |
Common | 108 | 20.2% |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
e | 62 | |
o | 45 | 10.5% |
l | 33 | 7.7% |
a | 31 | 7.3% |
t | 31 | 7.3% |
s | 27 | 6.3% |
i | 22 | 5.2% |
y | 18 | 4.2% |
r | 15 | 3.5% |
n | 15 | 3.5% |
Other values (22) | 128 |
Common
Value | Count | Frequency (%) |
(space) | 74 | |
🔥 | 8 | 7.4% |
2 | 4 | 3.7% |
! | 4 | 3.7% |
' | 4 | 3.7% |
. | 3 | 2.8% |
1 | 2 | 1.9% |
4 | 2 | 1.9% |
? | 2 | 1.9% |
3 | 2 | 1.9% |
Other values (3) | 3 | 2.8% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 527 | |
None | 8 | 1.5% |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
(space) | 74 | |
e | 62 | 11.8% |
o | 45 | 8.5% |
l | 33 | 6.3% |
a | 31 | 5.9% |
t | 31 | 5.9% |
s | 27 | 5.1% |
i | 22 | 4.2% |
y | 18 | 3.4% |
r | 15 | 2.8% |
Other values (34) | 169 |
None
Value | Count | Frequency (%) |
🔥 | 8 |
time
Distinct | 37 |
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 424.0 B |
2021-10-15 19:33:29.506399 | 1 |
2021-12-15 18:01:20.248871 | 1 |
2021-12-20 07:43:13.477264 | 1 |
2021-12-20 07:44:50.373990 | 1 |
2022-03-10 12:38:44.469142 | 1 |
Other values (32) |
Max length | 26 |
Median length | 26 |
Mean length | 26 |
Min length | 26 |
Characters and Unicode
Total characters | 962 |
Distinct characters | 14 |
Distinct categories | 4 |
Distinct scripts | 1 |
Distinct blocks | 1 |
Unique | 37 |
Unique (%) | 100.0% |
1st row | 2021-10-15 19:33:29.506399 |
2nd row | 2021-10-15 19:36:21.837263 |
3rd row | 2021-10-15 19:38:08.592406 |
4th row | 2021-10-15 19:38:14.693492 |
5th row | 2021-11-05 20:48:09.082644 |
Common Values
Value | Count | Frequency (%) |
2021-10-15 19:33:29.506399 | 1 | 2.7% |
2021-12-15 18:01:20.248871 | 1 | 2.7% |
2021-12-20 07:43:13.477264 | 1 | 2.7% |
2021-12-20 07:44:50.373990 | 1 | 2.7% |
2022-03-10 12:38:44.469142 | 1 | 2.7% |
2022-03-10 13:51:10.874795 | 1 | 2.7% |
2022-03-10 18:24:27.541837 | 1 | 2.7% |
2022-03-18 21:32:19.477479 | 1 | 2.7% |
2022-04-07 12:41:08.938456 | 1 | 2.7% |
2022-04-07 17:12:20.197251 | 1 | 2.7% |
Other values (27) | 27 |
Words
Value | Count | Frequency (%) |
2021-11-05 | 6 | 8.1% |
2021-10-15 | 4 | 5.4% |
2021-11-09 | 4 | 5.4% |
2022-03-10 | 3 | 4.1% |
2021-11-06 | 3 | 4.1% |
2022-04-07 | 3 | 4.1% |
2022-05-10 | 2 | 2.7% |
2021-12-20 | 2 | 2.7% |
2022-05-07 | 2 | 2.7% |
20:48:58.531611 | 1 | 1.4% |
Other values (44) | 44 |
Most occurring characters
Value | Count | Frequency (%) |
2 | 150 | |
0 | 144 | |
1 | 123 | |
- | 74 | |
: | 74 | |
4 | 62 | |
3 | 49 | 5.1% |
8 | 48 | 5.0% |
5 | 44 | 4.6% |
6 | 41 | 4.3% |
Other values (4) | 153 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 740 | |
Other Punctuation | 111 | 11.5% |
Dash Punctuation | 74 | 7.7% |
Space Separator | 37 | 3.8% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
2 | 150 | |
0 | 144 | |
1 | 123 | |
4 | 62 | |
3 | 49 | 6.6% |
8 | 48 | 6.5% |
5 | 44 | 5.9% |
6 | 41 | 5.5% |
9 | 40 | 5.4% |
7 | 39 | 5.3% |
Other Punctuation
Value | Count | Frequency (%) |
: | 74 | |
. | 37 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 74 |
Space Separator
Value | Count | Frequency (%) |
(space) | 37 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 962 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
2 | 150 | |
0 | 144 | |
1 | 123 | |
- | 74 | |
: | 74 | |
4 | 62 | |
3 | 49 | 5.1% |
8 | 48 | 5.0% |
5 | 44 | 4.6% |
6 | 41 | 4.3% |
Other values (4) | 153 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 962 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
2 | 150 | |
0 | 144 | |
1 | 123 | |
- | 74 | |
: | 74 | |
4 | 62 | |
3 | 49 | 5.1% |
8 | 48 | 5.0% |
5 | 44 | 4.6% |
6 | 41 | 4.3% |
Other values (4) | 153 |
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
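As a rough illustration of the bias-corrected Cramér's V described above, the following standard-library Python sketch computes it for two categorical sequences. The function name `cramers_v_corrected` is hypothetical, and returning 0.0 for a degenerate table is a choice made for this example. It is not the profiling tool's own implementation and may differ from it in edge cases.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V (Bergsma, 2013) for two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        return 0.0

    joint = Counter(zip(xs, ys))   # observed counts per (x, y) cell
    row_totals = Counter(xs)       # marginal counts for xs
    col_totals = Counter(ys)       # marginal counts for ys
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over every cell, including empty ones.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    # The correction shrinks phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        # Degenerate table (for example, a column whose values are all unique):
        # the ratio is undefined, so this sketch returns 0.0 by assumption.
        return 0.0
    return sqrt(phi2_corr / denom)
```

Calling `cramers_v_corrected(df["name"], df["message"])` on two columns of a data frame `df` (a hypothetical variable here) would give one entry of the correlation matrix; any iterable of hashable values works.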
Phik (φk)
Phik (φk) is a practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for it.
First rows
(index) | name | message | time |
0 | Julien | How are you? | 2021-10-15 19:33:29.506399 |
1 | Someone else | good good | 2021-10-15 19:36:21.837263 |
2 | A friend | 🔥🔥🔥🔥 | 2021-10-15 19:38:08.592406 |
3 | A friend | 🔥🔥🔥🔥 | 2021-10-15 19:38:14.693492 |
4 | A stranger | interesting! | 2021-11-05 20:48:09.082644 |
5 | Shubham Singh | Hello you are you. | 2021-11-05 20:48:42.430647 |
6 | dog | I love dogs | 2021-11-05 20:48:58.531611 |
7 | Abubakar Abid | Test | 2021-11-05 20:49:10.729872 |
8 | Charles | Hello | 2021-11-05 21:59:58.126933 |
9 | Charles | Hello2 | 2021-11-05 22:00:17.768448 |
Last rows
(index) | name | message | time |
27 | micole | I need a text to image like looking glass | 2022-04-07 12:41:08.938456 |
28 | Alex | Hello everyone | 2022-04-07 17:12:20.197251 |
29 | tomriddle | how are you | 2022-04-07 18:22:01.721690 |
30 | meow | great persistence example. cant figure mine out yet. | 2022-04-16 00:01:26.027707 |
31 | chef boyardee | have you tried my raviolis? | 2022-04-24 03:15:00.496292 |
32 | Chris Emezue | Hello there | 2022-04-26 18:16:45.273903 |
33 | aa | test123 | 2022-05-07 16:42:26.706482 |
34 | bb | test234 | 2022-05-07 16:42:36.803281 |
35 | dog | woof | 2022-05-10 03:58:24.036197 |
36 | dog | woof woof | 2022-05-10 03:58:38.850884 |