Overview

Dataset statistics

Number of variables: 3
Number of observations: 37
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1016.0 B
Average record size in memory: 27.5 B

Variable types

Categorical: 3

Alerts

message is highly correlated with name and 1 other field (High correlation)
name is highly correlated with message and 1 other field (High correlation)
time is highly correlated with message and 1 other field (High correlation)
name is highly correlated with message and 1 other field (High correlation)
message is highly correlated with name and 1 other field (High correlation)
time is highly correlated with name and 1 other field (High correlation)
name is uniformly distributed (Uniform)
message is uniformly distributed (Uniform)
time is uniformly distributed (Uniform)
time has unique values (Unique)

Reproduction

Analysis started: 2022-06-09 14:06:23.736523
Analysis finished: 2022-06-09 14:06:25.549025
Duration: 1.81 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

name
Categorical

HIGH CORRELATION
HIGH CORRELATION
UNIFORM

Distinct: 30
Distinct (%): 81.1%
Missing: 0
Missing (%): 0.0%
Memory size: 424.0 B

Value                     Count
dog                       3
everything will be gone   2
Slim Shady                2
Charles                   2
Chainyo                   2
Other values (25)         26

Length

Max length: 23
Median length: 12
Mean length: 8.378378378
Min length: 2

Characters and Unicode

Total characters: 310
Distinct characters: 34
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
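
As a rough illustration of how such character properties can be tallied, the sketch below counts Unicode general categories with Python's standard unicodedata module. The helper name category_counts and the CATEGORY_NAMES mapping are assumptions made for this example, and this is not necessarily how pandas-profiling computes its figures. The input is the five sample names listed in this report.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that appear in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Pe": "Close Punctuation",
    "So": "Other Symbol",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the raw two-letter code for unmapped categories.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Hypothetical input: the five sample rows of the name variable.
names = ["Julien", "Someone else", "A friend", "A friend", "A stranger"]
counts = category_counts(names)
for name, count in counts.most_common():
    print(f"{name}: {count}")
```

Scripts and blocks are not exposed by unicodedata, so they are left out of this sketch.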

Unique

Unique: 24
Unique (%): 64.9%
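
The gap between Distinct (30) and Unique (24) is consistent with reading Distinct as the number of different values and Unique as the number of values that occur exactly once. The sketch below works under that assumption; the helper name distinct_and_unique is hypothetical, and the input is the five sample names from this report.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique


names = ["Julien", "Someone else", "A friend", "A friend", "A stranger"]
print(distinct_and_unique(names))  # (4, 3): "A friend" is distinct but not unique
```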

Sample

1st row: Julien
2nd row: Someone else
3rd row: A friend
4th row: A friend
5th row: A stranger

Common Values

Value                     Count   Frequency (%)
dog                       3       8.1%
everything will be gone   2       5.4%
Slim Shady                2       5.4%
Charles                   2       5.4%
Chainyo                   2       5.4%
A friend                  2       5.4%
Chris Emezue              1       2.7%
chef boyardee             1       2.7%
aa                        1       2.7%
meow                      1       2.7%
Other values (20)         20      54.1%

Length

Histogram of lengths of the category
Value               Count   Frequency (%)
dog                 3       5.2%
a                   3       5.2%
will                2       3.4%
be                  2       3.4%
gone                2       3.4%
slim                2       3.4%
shady               2       3.4%
charles             2       3.4%
chainyo             2       3.4%
friend              2       3.4%
Other values (35)   36      62.1%

Most occurring characters

Value               Count   Frequency (%)
e                   31      10.0%
i                   23      7.4%
a                   22      7.1%
(space)             21      6.8%
h                   19      6.1%
l                   18      5.8%
o                   17      5.5%
r                   16      5.2%
n                   15      4.8%
d                   14      4.5%
Other values (24)   114     36.8%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   264     85.2%
Uppercase Letter   25      8.1%
Space Separator    21      6.8%

Most frequent character per category

Lowercase Letter
Value               Count   Frequency (%)
e                   31      11.7%
i                   23      8.7%
a                   22      8.3%
h                   19      7.2%
l                   18      6.8%
o                   17      6.4%
r                   16      6.1%
n                   15      5.7%
d                   14      5.3%
s                   10      3.8%
Other values (14)   79      29.9%

Uppercase Letter
Value   Count   Frequency (%)
S       7       28.0%
A       6       24.0%
C       5       20.0%
L       2       8.0%
J       1       4.0%
Y       1       4.0%
N       1       4.0%
E       1       4.0%
K       1       4.0%

Space Separator
Value     Count   Frequency (%)
(space)   21      100.0%

Most occurring scripts

Value    Count   Frequency (%)
Latin    289     93.2%
Common   21      6.8%

Most frequent character per script

Latin
Value               Count   Frequency (%)
e                   31      10.7%
i                   23      8.0%
a                   22      7.6%
h                   19      6.6%
l                   18      6.2%
o                   17      5.9%
r                   16      5.5%
n                   15      5.2%
d                   14      4.8%
s                   10      3.5%
Other values (23)   104     36.0%

Common
Value     Count   Frequency (%)
(space)   21      100.0%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   310     100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
e                   31      10.0%
i                   23      7.4%
a                   22      7.1%
(space)             21      6.8%
h                   19      6.1%
l                   18      5.8%
o                   17      5.5%
r                   16      5.2%
n                   15      4.8%
d                   14      4.5%
Other values (24)   114     36.8%

message
Categorical

HIGH CORRELATION
HIGH CORRELATION
UNIFORM

Distinct: 36
Distinct (%): 97.3%
Missing: 0
Missing (%): 0.0%
Memory size: 424.0 B

Value                                                     Count
🔥🔥🔥🔥                                                      2
How are you?                                              1
Hello everyone                                            1
The link to have access to the dataset seems to be down   1
i'm good :)                                               1
Other values (31)                                         31

Length

Max length: 55
Median length: 34
Mean length: 14.45945946
Min length: 2

Characters and Unicode

Total characters: 535
Distinct characters: 45
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 35
Unique (%): 94.6%

Sample

1st row: How are you?
2nd row: good good
3rd row: 🔥🔥🔥🔥
4th row: 🔥🔥🔥🔥
5th row: interesting!

Common Values

Value                                                     Count   Frequency (%)
🔥🔥🔥🔥                                                      2       5.4%
How are you?                                              1       2.7%
Hello everyone                                            1       2.7%
The link to have access to the dataset seems to be down   1       2.7%
i'm good :)                                               1       2.7%
I m Lucas                                                 1       2.7%
I love cats                                               1       2.7%
hello                                                     1       2.7%
I need a text to image like looking glass                 1       2.7%
how are you                                               1       2.7%
Other values (26)                                         26      70.3%

Length

Histogram of lengths of the category
Value               Count   Frequency (%)
you                 6       5.5%
hello               5       4.5%
to                  4       3.6%
i                   4       3.6%
woof                3       2.7%
good                3       2.7%
are                 3       2.7%
the                 3       2.7%
love                3       2.7%
my                  2       1.8%
Other values (65)   74      67.3%

Most occurring characters

Value               Count   Frequency (%)
(space)             74      13.8%
e                   62      11.6%
o                   45      8.4%
l                   33      6.2%
a                   31      5.8%
t                   31      5.8%
s                   27      5.0%
i                   22      4.1%
y                   18      3.4%
r                   15      2.8%
Other values (35)   177     33.1%

Most occurring categories

Value               Count   Frequency (%)
Lowercase Letter    406     75.9%
Space Separator     74      13.8%
Uppercase Letter    21      3.9%
Other Punctuation   14      2.6%
Decimal Number      11      2.1%
Other Symbol        8       1.5%
Close Punctuation   1       0.2%

Most frequent character per category

Lowercase Letter
Value               Count   Frequency (%)
e                   62      15.3%
o                   45      11.1%
l                   33      8.1%
a                   31      7.6%
t                   31      7.6%
s                   27      6.7%
i                   22      5.4%
y                   18      4.4%
r                   15      3.7%
n                   15      3.7%
Other values (13)   107     26.4%

Uppercase Letter
Value   Count   Frequency (%)
H       7       33.3%
I       4       19.0%
S       3       14.3%
T       2       9.5%
N       1       4.8%
M       1       4.8%
G       1       4.8%
L       1       4.8%
W       1       4.8%

Decimal Number
Value   Count   Frequency (%)
2       4       36.4%
1       2       18.2%
4       2       18.2%
3       2       18.2%
9       1       9.1%

Other Punctuation
Value   Count   Frequency (%)
!       4       28.6%
'       4       28.6%
.       3       21.4%
?       2       14.3%
:       1       7.1%

Space Separator
Value     Count   Frequency (%)
(space)   74      100.0%

Other Symbol
Value   Count   Frequency (%)
🔥      8       100.0%

Close Punctuation
Value   Count   Frequency (%)
)       1       100.0%

Most occurring scripts

Value    Count   Frequency (%)
Latin    427     79.8%
Common   108     20.2%

Most frequent character per script

Latin
Value               Count   Frequency (%)
e                   62      14.5%
o                   45      10.5%
l                   33      7.7%
a                   31      7.3%
t                   31      7.3%
s                   27      6.3%
i                   22      5.2%
y                   18      4.2%
r                   15      3.5%
n                   15      3.5%
Other values (22)   128     30.0%

Common
Value              Count   Frequency (%)
(space)            74      68.5%
🔥                 8       7.4%
2                  4       3.7%
!                  4       3.7%
'                  4       3.7%
.                  3       2.8%
1                  2       1.9%
4                  2       1.9%
?                  2       1.9%
3                  2       1.9%
Other values (3)   3       2.8%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   527     98.5%
None    8       1.5%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
(space)             74      14.0%
e                   62      11.8%
o                   45      8.5%
l                   33      6.3%
a                   31      5.9%
t                   31      5.9%
s                   27      5.1%
i                   22      4.2%
y                   18      3.4%
r                   15      2.8%
Other values (34)   169     32.1%

None
Value   Count   Frequency (%)
🔥      8       100.0%

time
Categorical

HIGH CORRELATION
HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 37
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 424.0 B

Value                        Count
2021-10-15 19:33:29.506399   1
2021-12-15 18:01:20.248871   1
2021-12-20 07:43:13.477264   1
2021-12-20 07:44:50.373990   1
2022-03-10 12:38:44.469142   1
Other values (32)            32

Length

Max length: 26
Median length: 26
Mean length: 26
Min length: 26

Characters and Unicode

Total characters: 962
Distinct characters: 14
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 37
Unique (%): 100.0%

Sample

1st row: 2021-10-15 19:33:29.506399
2nd row: 2021-10-15 19:36:21.837263
3rd row: 2021-10-15 19:38:08.592406
4th row: 2021-10-15 19:38:14.693492
5th row: 2021-11-05 20:48:09.082644

Common Values

Value                        Count   Frequency (%)
2021-10-15 19:33:29.506399   1       2.7%
2021-12-15 18:01:20.248871   1       2.7%
2021-12-20 07:43:13.477264   1       2.7%
2021-12-20 07:44:50.373990   1       2.7%
2022-03-10 12:38:44.469142   1       2.7%
2022-03-10 13:51:10.874795   1       2.7%
2022-03-10 18:24:27.541837   1       2.7%
2022-03-18 21:32:19.477479   1       2.7%
2022-04-07 12:41:08.938456   1       2.7%
2022-04-07 17:12:20.197251   1       2.7%
Other values (27)            27      73.0%

Length

Histogram of lengths of the category
Value               Count   Frequency (%)
2021-11-05          6       8.1%
2021-10-15          4       5.4%
2021-11-09          4       5.4%
2022-03-10          3       4.1%
2021-11-06          3       4.1%
2022-04-07          3       4.1%
2022-05-10          2       2.7%
2021-12-20          2       2.7%
2022-05-07          2       2.7%
20:48:58.531611     1       1.4%
Other values (44)   44      59.5%

Most occurring characters

Value              Count   Frequency (%)
2                  150     15.6%
0                  144     15.0%
1                  123     12.8%
-                  74      7.7%
:                  74      7.7%
4                  62      6.4%
3                  49      5.1%
8                  48      5.0%
5                  44      4.6%
6                  41      4.3%
Other values (4)   153     15.9%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      740     76.9%
Other Punctuation   111     11.5%
Dash Punctuation    74      7.7%
Space Separator     37      3.8%

Most frequent character per category

Decimal Number
Value   Count   Frequency (%)
2       150     20.3%
0       144     19.5%
1       123     16.6%
4       62      8.4%
3       49      6.6%
8       48      6.5%
5       44      5.9%
6       41      5.5%
9       40      5.4%
7       39      5.3%

Other Punctuation
Value   Count   Frequency (%)
:       74      66.7%
.       37      33.3%

Dash Punctuation
Value   Count   Frequency (%)
-       74      100.0%

Space Separator
Value     Count   Frequency (%)
(space)   37      100.0%

Most occurring scripts

Value    Count   Frequency (%)
Common   962     100.0%

Most frequent character per script

Common
Value              Count   Frequency (%)
2                  150     15.6%
0                  144     15.0%
1                  123     12.8%
-                  74      7.7%
:                  74      7.7%
4                  62      6.4%
3                  49      5.1%
8                  48      5.0%
5                  44      4.6%
6                  41      4.3%
Other values (4)   153     15.9%

Most occurring blocks

Value   Count   Frequency (%)
ASCII   962     100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
2                  150     15.6%
0                  144     15.0%
1                  123     12.8%
-                  74      7.7%
:                  74      7.7%
4                  62      6.4%
3                  49      5.1%
8                  48      5.0%
5                  44      4.6%
6                  41      4.3%
Other values (4)   153     15.9%

Correlations


Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators of Cramér's V have been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.
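
For readers who want to see the arithmetic, here is a minimal pure-Python sketch of a bias-corrected Cramér's V in the form usually attributed to Bergsma (2013). The function name cramers_v_corrected is an assumption for this example, no continuity correction is applied to the chi-squared statistic, and the report's own implementation may differ in such details.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        return 0.0

    joint = Counter(zip(xs, ys))  # observed cell counts
    rows = Counter(xs)            # marginal counts of the first variable
    cols = Counter(ys)            # marginal counts of the second variable
    r, k = len(rows), len(cols)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, n_a in rows.items():
        for b, n_b in cols.items():
            expected = n_a * n_b / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    # Bias correction: shrink phi^2 and the table dimensions.
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)

    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return sqrt(phi2_corr / denom)


# Perfectly associated labels give a value of (approximately) 1.
print(cramers_v_corrected(["a", "b"] * 10, ["x", "y"] * 10))
# Independent labels give 0.
print(cramers_v_corrected(["a", "a", "b", "b"] * 5, ["x", "y", "x", "y"] * 5))
```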

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

First rows

     name            message              time
0    Julien          How are you?         2021-10-15 19:33:29.506399
1    Someone else    good good            2021-10-15 19:36:21.837263
2    A friend        🔥🔥🔥🔥                 2021-10-15 19:38:08.592406
3    A friend        🔥🔥🔥🔥                 2021-10-15 19:38:14.693492
4    A stranger      interesting!         2021-11-05 20:48:09.082644
5    Shubham Singh   Hello you are you.   2021-11-05 20:48:42.430647
6    dog             I love dogs          2021-11-05 20:48:58.531611
7    Abubakar Abid   Test                 2021-11-05 20:49:10.729872
8    Charles         Hello                2021-11-05 21:59:58.126933
9    Charles         Hello2               2021-11-05 22:00:17.768448

Last rows

     name            message                                                time
27   micole          I need a text to image like looking glass              2022-04-07 12:41:08.938456
28   Alex            Hello everyone                                         2022-04-07 17:12:20.197251
29   tomriddle       how are you                                            2022-04-07 18:22:01.721690
30   meow            great persistence example. cant figure mine out yet.   2022-04-16 00:01:26.027707
31   chef boyardee   have you tried my raviolis?                            2022-04-24 03:15:00.496292
32   Chris Emezue    Hello there                                            2022-04-26 18:16:45.273903
33   aa              test123                                                2022-05-07 16:42:26.706482
34   bb              test234                                                2022-05-07 16:42:36.803281
35   dog             woof                                                   2022-05-10 03:58:24.036197
36   dog             woof woof                                              2022-05-10 03:58:38.850884