Dataset statistics
| Number of variables | 3 |
|---|---|
| Number of observations | 37 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1016.0 B |
| Average record size in memory | 27.5 B |
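The overview counts can be recomputed from raw records. The sketch below is a minimal standard-library illustration, not pandas-profiling's implementation: the helper `overview` is hypothetical, a cell counts as missing when its value is `None` or its key is absent, and a row counts as a duplicate when its full tuple of values has already been seen. The sample records are the first four rows of this dataset; rows 3 and 4 share `name` and `message` but differ in `time`, so they are not duplicates. Memory size is pandas-specific and is not recomputed; only the average record size is checked as total size divided by observations.

```python
from typing import Any, Dict, List


def overview(rows: List[Dict[str, Any]]) -> Dict[str, float]:
    """Hypothetical helper: recompute the dataset-level counts of the report."""
    n_obs = len(rows)
    columns = sorted({key for row in rows for key in row})
    n_cells = n_obs * len(columns)

    # A cell is treated as missing if its key is absent or its value is None.
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)

    # A row is a duplicate if the same tuple of values was already seen.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "n_var": len(columns),
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


rows = [
    {"name": "Julien", "message": "How are you?", "time": "2021-10-15 19:33:29.506399"},
    {"name": "Someone else", "message": "good good", "time": "2021-10-15 19:36:21.837263"},
    {"name": "A friend", "message": "🔥🔥🔥🔥", "time": "2021-10-15 19:38:08.592406"},
    {"name": "A friend", "message": "🔥🔥🔥🔥", "time": "2021-10-15 19:38:14.693492"},
]
print(overview(rows))

# Average record size = total size in memory / number of observations.
print(round(1016.0 / 37, 1))  # 27.5
```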
Variable types
| Categorical | 3 |
|---|---|
Alerts
| message is highly correlated with name and 1 other field | High correlation |
|---|---|
| name is highly correlated with message and 1 other field | High correlation |
| time is highly correlated with message and 1 other field | High correlation |
| name is highly correlated with message and 1 other field | High correlation |
| message is highly correlated with name and 1 other field | High correlation |
| time is highly correlated with name and 1 other field | High correlation |
| name is uniformly distributed | Uniform |
| message is uniformly distributed | Uniform |
| time is uniformly distributed | Uniform |
| time has unique values | Unique |
Reproduction
| Analysis started | 2022-06-09 14:06:23.736523 |
|---|---|
| Analysis finished | 2022-06-09 14:06:25.549025 |
| Duration | 1.81 seconds |
| Software version | pandas-profiling v3.2.0 |
Variables
name
| Distinct | 30 |
|---|---|
| Distinct (%) | 81.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 424.0 B |
| dog | 3 |
|---|---|
| everything will be gone | 2 |
| Slim Shady | 2 |
| Charles | 2 |
| Chainyo | 2 |
| Other values (25) | 26 |
Length
| Max length | 23 |
|---|---|
| Median length | 12 |
| Mean length | 8.378378378 |
| Min length | 2 |
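The four length statistics summarise the number of characters per value. A minimal sketch with the standard `statistics` module, assuming that length means Python's `len()` (the number of code points, so each 🔥 counts as one character); `length_summary` is a hypothetical helper and the sample values are the first five names. The last line checks that the reported mean length equals total characters divided by observations (310 / 37).

```python
import statistics


def length_summary(values):
    """Hypothetical helper: max, median, mean and min of len() over the values."""
    lengths = [len(value) for value in values]
    return {
        "max_length": max(lengths),
        "median_length": statistics.median(lengths),
        "mean_length": statistics.mean(lengths),
        "min_length": min(lengths),
    }


sample_names = ["Julien", "Someone else", "A friend", "A friend", "A stranger"]
print(length_summary(sample_names))

# The mean length equals total characters divided by the number of observations.
print(310 / 37)
```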
Characters and Unicode
| Total characters | 310 |
|---|---|
| Distinct characters | 34 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
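The general category of each code point is available from Python's standard `unicodedata` module. The sketch below (hypothetical helper `category_counts`, plus a hand-written table mapping the two-letter codes to the long names this report uses) tallies categories over a column. Scripts and blocks are not exposed by `unicodedata`, so they are left out, and the profiler's own Unicode lookup may differ in edge cases. Applied to a single `time` value, it gives 20 Decimal Number, 3 Other Punctuation, 2 Dash Punctuation and 1 Space Separator characters; multiplied by 37 rows, that matches the 740 / 111 / 74 / 37 split reported for the `time` column.

```python
import unicodedata
from collections import Counter

# Long names of the general categories that occur in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Pe": "Close Punctuation",
    "So": "Other Symbol",
}


def category_counts(values):
    """Hypothetical helper: count characters by Unicode general category."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)  # two-letter code such as "Ll"
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["i'm good :)", "🔥🔥🔥🔥"]))
print(category_counts(["2021-10-15 19:33:29.506399"]))
```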
Unique
| Unique | 24 |
|---|---|
| Unique (%) | 64.9% |
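Distinct counts how many different values occur, while Unique counts the values that occur exactly once. A minimal sketch with `collections.Counter` (hypothetical helper `distinct_and_unique`), applied to the first five names; for the full column, 30 distinct and 24 unique values out of 37 give the reported 81.1% and 64.9%. A column whose Unique count equals the number of rows, as `time` does here, corresponds to the "has unique values" alert.

```python
from collections import Counter


def distinct_and_unique(values):
    """Hypothetical helper: Distinct = different values, Unique = values seen once."""
    n = len(values)
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for count in counts.values() if count == 1)
    return {
        "distinct": distinct,
        "distinct_pct": 100.0 * distinct / n,
        "unique": unique,
        "unique_pct": 100.0 * unique / n,
    }


first_names = ["Julien", "Someone else", "A friend", "A friend", "A stranger"]
print(distinct_and_unique(first_names))
```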
Sample
| 1st row | Julien |
|---|---|
| 2nd row | Someone else |
| 3rd row | A friend |
| 4th row | A friend |
| 5th row | A stranger |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| dog | 3 | 8.1% |
| everything will be gone | 2 | 5.4% |
| Slim Shady | 2 | 5.4% |
| Charles | 2 | 5.4% |
| Chainyo | 2 | 5.4% |
| A friend | 2 | 5.4% |
| Chris Emezue | 1 | 2.7% |
| chef boyardee | 1 | 2.7% |
| aa | 1 | 2.7% |
| meow | 1 | 2.7% |
| Other values (20) | 20 | 54.1% |
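The Common Values table lists the most frequent values and folds the rest into an "Other values (k)" row, where k is the number of remaining distinct values and Frequency (%) is the count divided by the 37 observations. A minimal sketch, assuming a top-10 cut-off as in the table; `common_values` is a hypothetical helper, and ties are kept in first-seen order, which may not match the report's ordering. The sample is the `name` column of the last ten rows, with `top=3` to keep the output short.

```python
from collections import Counter


def common_values(values, top=10):
    """Hypothetical helper: top values plus an aggregated 'Other values (k)' row."""
    n = len(values)
    counts = Counter(values)
    rows = [
        (value, count, round(100.0 * count / n, 1))
        for value, count in counts.most_common(top)
    ]
    rest = counts.most_common()[top:]  # everything below the cut-off
    if rest:
        other = sum(count for _, count in rest)
        rows.append((f"Other values ({len(rest)})", other, round(100.0 * other / n, 1)))
    return rows


last_names = ["micole", "Alex", "tomriddle", "meow", "chef boyardee",
              "Chris Emezue", "aa", "bb", "dog", "dog"]
print(common_values(last_names, top=3))
# [('dog', 2, 20.0), ('micole', 1, 10.0), ('Alex', 1, 10.0), ('Other values (6)', 6, 60.0)]
```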
[Figure: Length, histogram of lengths of the category]
Most occurring words
| Value | Count | Frequency (%) |
|---|---|---|
| dog | 3 | 5.2% |
| a | 3 | 5.2% |
| will | 2 | 3.4% |
| be | 2 | 3.4% |
| gone | 2 | 3.4% |
| slim | 2 | 3.4% |
| shady | 2 | 3.4% |
| charles | 2 | 3.4% |
| chainyo | 2 | 3.4% |
| friend | 2 | 3.4% |
| Other values (35) | 36 | 62.1% |
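This table counts words rather than whole values: "A friend" contributes to both `a` and `friend`, and words appear lowercased (`slim`, `charles`). A minimal sketch, assuming values are lowercased, split on whitespace and stripped of surrounding punctuation; the exact tokenisation the profiler uses is an assumption here, and `word_counts` is a hypothetical helper. Frequencies are relative to the total number of words (58 for this column, so `dog` at 3 is 5.2%).

```python
import string
from collections import Counter


def word_counts(values):
    """Hypothetical helper: lowercase, split on whitespace, strip punctuation."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            word = token.strip(string.punctuation)
            if word:  # skip tokens made only of punctuation, such as ":)"
                counts[word] += 1
    return counts


names = ["A friend", "A friend", "A stranger", "Slim Shady", "dog"]
print(word_counts(names).most_common(2))  # [('a', 3), ('friend', 2)]

messages = ["How are you?", "Hello you are you."]
print(word_counts(messages)["you"])  # 3
```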
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 31 | 10.0% |
| i | 23 | 7.4% |
| a | 22 | 7.1% |
| (space) | 21 | 6.8% |
| h | 19 | 6.1% |
| l | 18 | 5.8% |
| o | 17 | 5.5% |
| r | 16 | 5.2% |
| n | 15 | 4.8% |
| d | 14 | 4.5% |
| Other values (24) | 114 | 36.8% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 264 | 85.2% |
| Uppercase Letter | 25 | 8.1% |
| Space Separator | 21 | 6.8% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| e | 31 | 11.7% |
| i | 23 | 8.7% |
| a | 22 | 8.3% |
| h | 19 | 7.2% |
| l | 18 | 6.8% |
| o | 17 | 6.4% |
| r | 16 | 6.1% |
| n | 15 | 5.7% |
| d | 14 | 5.3% |
| s | 10 | 3.8% |
| Other values (14) | 79 | 29.9% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| S | 7 | 28.0% |
| A | 6 | 24.0% |
| C | 5 | 20.0% |
| L | 2 | 8.0% |
| J | 1 | 4.0% |
| Y | 1 | 4.0% |
| N | 1 | 4.0% |
| E | 1 | 4.0% |
| K | 1 | 4.0% |
Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 21 | 100.0% |
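"Most frequent character per category" groups characters by general category and computes each frequency within its group, which is why `e` is 10.0% of all characters but 11.7% of lowercase letters (31 / 264). A minimal sketch with `unicodedata` and `collections`; `per_category` is a hypothetical helper keyed by the two-letter category codes, and the sample is three names from this column.

```python
import unicodedata
from collections import Counter, defaultdict


def per_category(values, top=10):
    """Hypothetical helper: top characters within each Unicode general category."""
    by_category = defaultdict(Counter)
    for value in values:
        for ch in value:
            by_category[unicodedata.category(ch)][ch] += 1

    result = {}
    for category, counts in by_category.items():
        total = sum(counts.values())  # denominator is the category total, not all characters
        result[category] = [
            (ch, count, round(100.0 * count / total, 1))
            for ch, count in counts.most_common(top)
        ]
    return result


table = per_category(["Slim Shady", "Charles", "Chainyo"])
print(table["Lu"])  # [('S', 2, 50.0), ('C', 2, 50.0)]
print(table["Zs"])  # [(' ', 1, 100.0)]
```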
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 289 | 93.2% |
| Common | 21 | 6.8% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| e | 31 | 10.7% |
| i | 23 | 8.0% |
| a | 22 | 7.6% |
| h | 19 | 6.6% |
| l | 18 | 6.2% |
| o | 17 | 5.9% |
| r | 16 | 5.5% |
| n | 15 | 5.2% |
| d | 14 | 4.8% |
| s | 10 | 3.5% |
| Other values (23) | 104 | 36.0% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 21 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 310 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| e | 31 | 10.0% |
| i | 23 | 7.4% |
| a | 22 | 7.1% |
| (space) | 21 | 6.8% |
| h | 19 | 6.1% |
| l | 18 | 5.8% |
| o | 17 | 5.5% |
| r | 16 | 5.2% |
| n | 15 | 4.8% |
| d | 14 | 4.5% |
| Other values (24) | 114 | 36.8% |
message
| Distinct | 36 |
|---|---|
| Distinct (%) | 97.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 424.0 B |
| 🔥🔥🔥🔥 | 2 |
|---|---|
| How are you? | 1 |
| Hello everyone | 1 |
| The link to have access to the dataset seems to be down | 1 |
| i'm good :) | 1 |
| Other values (31) | 31 |
Length
| Max length | 55 |
|---|---|
| Median length | 34 |
| Mean length | 14.45945946 |
| Min length | 2 |
Characters and Unicode
| Total characters | 535 |
|---|---|
| Distinct characters | 45 |
| Distinct categories | 7 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
Unique
| Unique | 35 |
|---|---|
| Unique (%) | 94.6% |
Sample
| 1st row | How are you? |
|---|---|
| 2nd row | good good |
| 3rd row | 🔥🔥🔥🔥 |
| 4th row | 🔥🔥🔥🔥 |
| 5th row | interesting! |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 🔥🔥🔥🔥 | 2 | 5.4% |
| How are you? | 1 | 2.7% |
| Hello everyone | 1 | 2.7% |
| The link to have access to the dataset seems to be down | 1 | 2.7% |
| i'm good :) | 1 | 2.7% |
| I m Lucas | 1 | 2.7% |
| I love cats | 1 | 2.7% |
| hello | 1 | 2.7% |
| I need a text to image like looking glass | 1 | 2.7% |
| how are you | 1 | 2.7% |
| Other values (26) | 26 | 70.3% |
[Figure: Length, histogram of lengths of the category]
Most occurring words
| Value | Count | Frequency (%) |
|---|---|---|
| you | 6 | 5.5% |
| hello | 5 | 4.5% |
| to | 4 | 3.6% |
| i | 4 | 3.6% |
| woof | 3 | 2.7% |
| good | 3 | 2.7% |
| are | 3 | 2.7% |
| the | 3 | 2.7% |
| love | 3 | 2.7% |
| my | 2 | 1.8% |
| Other values (65) | 74 | 67.3% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 74 | 13.8% |
| e | 62 | 11.6% |
| o | 45 | 8.4% |
| l | 33 | 6.2% |
| a | 31 | 5.8% |
| t | 31 | 5.8% |
| s | 27 | 5.0% |
| i | 22 | 4.1% |
| y | 18 | 3.4% |
| r | 15 | 2.8% |
| Other values (35) | 177 | 33.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 406 | 75.9% |
| Space Separator | 74 | 13.8% |
| Uppercase Letter | 21 | 3.9% |
| Other Punctuation | 14 | 2.6% |
| Decimal Number | 11 | 2.1% |
| Other Symbol | 8 | 1.5% |
| Close Punctuation | 1 | 0.2% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| e | 62 | 15.3% |
| o | 45 | 11.1% |
| l | 33 | 8.1% |
| a | 31 | 7.6% |
| t | 31 | 7.6% |
| s | 27 | 6.7% |
| i | 22 | 5.4% |
| y | 18 | 4.4% |
| r | 15 | 3.7% |
| n | 15 | 3.7% |
| Other values (13) | 107 | 26.4% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| H | 7 | 33.3% |
| I | 4 | 19.0% |
| S | 3 | 14.3% |
| T | 2 | 9.5% |
| N | 1 | 4.8% |
| M | 1 | 4.8% |
| G | 1 | 4.8% |
| L | 1 | 4.8% |
| W | 1 | 4.8% |
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 4 | 36.4% |
| 1 | 2 | 18.2% |
| 4 | 2 | 18.2% |
| 3 | 2 | 18.2% |
| 9 | 1 | 9.1% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| ! | 4 | 28.6% |
| ' | 4 | 28.6% |
| . | 3 | 21.4% |
| ? | 2 | 14.3% |
| : | 1 | 7.1% |
Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 74 | 100.0% |
Other Symbol
| Value | Count | Frequency (%) |
|---|---|---|
| 🔥 | 8 | 100.0% |
Close Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| ) | 1 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 427 | 79.8% |
| Common | 108 | 20.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| e | 62 | 14.5% |
| o | 45 | 10.5% |
| l | 33 | 7.7% |
| a | 31 | 7.3% |
| t | 31 | 7.3% |
| s | 27 | 6.3% |
| i | 22 | 5.2% |
| y | 18 | 4.2% |
| r | 15 | 3.5% |
| n | 15 | 3.5% |
| Other values (22) | 128 | 30.0% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 74 | 68.5% |
| 🔥 | 8 | 7.4% |
| 2 | 4 | 3.7% |
| ! | 4 | 3.7% |
| ' | 4 | 3.7% |
| . | 3 | 2.8% |
| 1 | 2 | 1.9% |
| 4 | 2 | 1.9% |
| ? | 2 | 1.9% |
| 3 | 2 | 1.9% |
| Other values (3) | 3 | 2.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 527 | 98.5% |
| None | 8 | 1.5% |
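The block breakdown can be approximated with a code-point range check: the block the report labels ASCII (named Basic Latin in the Unicode standard) spans U+0000 to U+007F. The fire emoji 🔥 is U+1F525, which lies in the Miscellaneous Symbols and Pictographs block (U+1F300 to U+1F5FF); the report lists it under None, which suggests its block lookup did not resolve that code point. A minimal sketch (hypothetical helper `block_counts`) that only distinguishes ASCII from everything else:

```python
def block_counts(values):
    """Hypothetical helper: split characters into the ASCII block and the rest."""
    counts = {"ASCII": 0, "non-ASCII": 0}
    for value in values:
        for ch in value:
            # The Basic Latin ("ASCII") block covers U+0000..U+007F.
            counts["ASCII" if ord(ch) <= 0x7F else "non-ASCII"] += 1
    return counts


print(block_counts(["i'm good :)", "🔥🔥🔥🔥"]))  # {'ASCII': 11, 'non-ASCII': 4}
print(f"U+{ord('🔥'):04X}")  # U+1F525
```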
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 74 | 14.0% |
| e | 62 | 11.8% |
| o | 45 | 8.5% |
| l | 33 | 6.3% |
| a | 31 | 5.9% |
| t | 31 | 5.9% |
| s | 27 | 5.1% |
| i | 22 | 4.2% |
| y | 18 | 3.4% |
| r | 15 | 2.8% |
| Other values (34) | 169 | 32.1% |
None
| Value | Count | Frequency (%) |
|---|---|---|
| 🔥 | 8 | 100.0% |
time
| Distinct | 37 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 424.0 B |
| 2021-10-15 19:33:29.506399 | 1 |
|---|---|
| 2021-12-15 18:01:20.248871 | 1 |
| 2021-12-20 07:43:13.477264 | 1 |
| 2021-12-20 07:44:50.373990 | 1 |
| 2022-03-10 12:38:44.469142 | 1 |
| Other values (32) | 32 |
Length
| Max length | 26 |
|---|---|
| Median length | 26 |
| Mean length | 26 |
| Min length | 26 |
Characters and Unicode
| Total characters | 962 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 4 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 37 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | 2021-10-15 19:33:29.506399 |
|---|---|
| 2nd row | 2021-10-15 19:36:21.837263 |
| 3rd row | 2021-10-15 19:38:08.592406 |
| 4th row | 2021-10-15 19:38:14.693492 |
| 5th row | 2021-11-05 20:48:09.082644 |
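Every `time` value has the same 26-character layout (`YYYY-MM-DD HH:MM:SS.ffffff`), and because the column is profiled as text, the report can only count its characters and words. Parsing it with the standard `datetime` module makes time arithmetic possible. The sketch below, using the five sample rows, computes the gaps in seconds between consecutive messages; the format string is an assumption based on the values shown.

```python
from datetime import datetime

# Layout of every value in the `time` column (assumed from the samples).
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

sample_times = [
    "2021-10-15 19:33:29.506399",
    "2021-10-15 19:36:21.837263",
    "2021-10-15 19:38:08.592406",
    "2021-10-15 19:38:14.693492",
    "2021-11-05 20:48:09.082644",
]

parsed = [datetime.strptime(value, TIME_FORMAT) for value in sample_times]

# Seconds elapsed between consecutive messages.
gaps = [(later - earlier).total_seconds() for earlier, later in zip(parsed, parsed[1:])]
print(gaps)
```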
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 2021-10-15 19:33:29.506399 | 1 | 2.7% |
| 2021-12-15 18:01:20.248871 | 1 | 2.7% |
| 2021-12-20 07:43:13.477264 | 1 | 2.7% |
| 2021-12-20 07:44:50.373990 | 1 | 2.7% |
| 2022-03-10 12:38:44.469142 | 1 | 2.7% |
| 2022-03-10 13:51:10.874795 | 1 | 2.7% |
| 2022-03-10 18:24:27.541837 | 1 | 2.7% |
| 2022-03-18 21:32:19.477479 | 1 | 2.7% |
| 2022-04-07 12:41:08.938456 | 1 | 2.7% |
| 2022-04-07 17:12:20.197251 | 1 | 2.7% |
| Other values (27) | 27 | 73.0% |
[Figure: Length, histogram of lengths of the category]
Most occurring words
| Value | Count | Frequency (%) |
|---|---|---|
| 2021-11-05 | 6 | 8.1% |
| 2021-10-15 | 4 | 5.4% |
| 2021-11-09 | 4 | 5.4% |
| 2022-03-10 | 3 | 4.1% |
| 2021-11-06 | 3 | 4.1% |
| 2022-04-07 | 3 | 4.1% |
| 2022-05-10 | 2 | 2.7% |
| 2021-12-20 | 2 | 2.7% |
| 2022-05-07 | 2 | 2.7% |
| 20:48:58.531611 | 1 | 1.4% |
| Other values (44) | 44 | 59.5% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 150 | 15.6% |
| 0 | 144 | 15.0% |
| 1 | 123 | 12.8% |
| - | 74 | 7.7% |
| : | 74 | 7.7% |
| 4 | 62 | 6.4% |
| 3 | 49 | 5.1% |
| 8 | 48 | 5.0% |
| 5 | 44 | 4.6% |
| 6 | 41 | 4.3% |
| Other values (4) | 153 | 15.9% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 740 | 76.9% |
| Other Punctuation | 111 | 11.5% |
| Dash Punctuation | 74 | 7.7% |
| Space Separator | 37 | 3.8% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 150 | 20.3% |
| 0 | 144 | 19.5% |
| 1 | 123 | 16.6% |
| 4 | 62 | 8.4% |
| 3 | 49 | 6.6% |
| 8 | 48 | 6.5% |
| 5 | 44 | 5.9% |
| 6 | 41 | 5.5% |
| 9 | 40 | 5.4% |
| 7 | 39 | 5.3% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| : | 74 | 66.7% |
| . | 37 | 33.3% |
Dash Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| - | 74 | 100.0% |
Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 37 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 962 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 150 | 15.6% |
| 0 | 144 | 15.0% |
| 1 | 123 | 12.8% |
| - | 74 | 7.7% |
| : | 74 | 7.7% |
| 4 | 62 | 6.4% |
| 3 | 49 | 5.1% |
| 8 | 48 | 5.0% |
| 5 | 44 | 4.6% |
| 6 | 41 | 4.3% |
| Other values (4) | 153 | 15.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 962 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 150 | 15.6% |
| 0 | 144 | 15.0% |
| 1 | 123 | 12.8% |
| - | 74 | 7.7% |
| : | 74 | 7.7% |
| 4 | 62 | 6.4% |
| 3 | 49 | 5.1% |
| 8 | 48 | 5.0% |
| 5 | 44 | 4.6% |
| 6 | 41 | 4.3% |
| Other values (4) | 153 | 15.9% |
Correlations
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, where 0 indicates independence and 1 indicates perfect association. The standard empirical estimator of Cramér's V is known to be biased, even for large samples, so the report uses the bias-corrected measure proposed by Bergsma (2013).
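The bias correction can be written out directly. With χ² the Pearson chi-squared statistic of an r × k contingency table and n observations, the corrected quantities are φ̃² = max(0, χ²/n − (k−1)(r−1)/(n−1)), r̃ = r − (r−1)²/(n−1), k̃ = k − (k−1)²/(n−1), and Ṽ = √(φ̃² / min(k̃−1, r̃−1)). The sketch below implements these formulas in plain Python. It uses the uncorrected Pearson χ² (no Yates continuity correction), which may differ from what the profiler computes for 2 × 2 tables, and `cramers_v_corrected` is a hypothetical name.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    if n < 2:
        return 0.0  # the correction terms divide by n - 1
    row_sums = [sum(row) for row in table]
    col_sums = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic (no continuity correction).
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    # Bias-corrected phi^2 and effective table dimensions.
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


print(cramers_v_corrected([[10, 0], [0, 10]]))  # perfect association, close to 1
print(cramers_v_corrected([[5, 5], [5, 5]]))    # independence: 0.0
```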
Phik (φk)
Phik (φk) is a recent, practical correlation coefficient that works consistently across categorical, ordinal, and interval variables, captures non-linear dependencies, and reduces to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available.
Missing values
A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
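This dataset has no missing cells, so both views are empty here. To illustrate what they summarise, a minimal standard-library sketch (hypothetical helpers `nullity_by_column` and `nullity_matrix`, treating `None` or an absent key as missing) on two hypothetical rows, the second with its message removed:

```python
def nullity_by_column(rows, columns):
    """Count missing entries per column (None or absent key counts as missing)."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}


def nullity_matrix(rows, columns):
    """One string per row: '#' for a present value, '.' for a missing one."""
    return ["".join("." if row.get(col) is None else "#" for col in columns) for row in rows]


columns = ["name", "message", "time"]
# Hypothetical rows: the real dataset has no missing values.
rows = [
    {"name": "dog", "message": "woof", "time": "2022-05-10 03:58:24.036197"},
    {"name": "dog", "message": None, "time": "2022-05-10 03:58:38.850884"},
]
print(nullity_by_column(rows, columns))  # {'name': 0, 'message': 1, 'time': 0}
print("\n".join(nullity_matrix(rows, columns)))
```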
First rows
| | name | message | time |
|---|---|---|---|
| 0 | Julien | How are you? | 2021-10-15 19:33:29.506399 |
| 1 | Someone else | good good | 2021-10-15 19:36:21.837263 |
| 2 | A friend | 🔥🔥🔥🔥 | 2021-10-15 19:38:08.592406 |
| 3 | A friend | 🔥🔥🔥🔥 | 2021-10-15 19:38:14.693492 |
| 4 | A stranger | interesting! | 2021-11-05 20:48:09.082644 |
| 5 | Shubham Singh | Hello you are you. | 2021-11-05 20:48:42.430647 |
| 6 | dog | I love dogs | 2021-11-05 20:48:58.531611 |
| 7 | Abubakar Abid | Test | 2021-11-05 20:49:10.729872 |
| 8 | Charles | Hello | 2021-11-05 21:59:58.126933 |
| 9 | Charles | Hello2 | 2021-11-05 22:00:17.768448 |
Last rows
| | name | message | time |
|---|---|---|---|
| 27 | micole | I need a text to image like looking glass | 2022-04-07 12:41:08.938456 |
| 28 | Alex | Hello everyone | 2022-04-07 17:12:20.197251 |
| 29 | tomriddle | how are you | 2022-04-07 18:22:01.721690 |
| 30 | meow | great persistence example. cant figure mine out yet. | 2022-04-16 00:01:26.027707 |
| 31 | chef boyardee | have you tried my raviolis? | 2022-04-24 03:15:00.496292 |
| 32 | Chris Emezue | Hello there | 2022-04-26 18:16:45.273903 |
| 33 | aa | test123 | 2022-05-07 16:42:26.706482 |
| 34 | bb | test234 | 2022-05-07 16:42:36.803281 |
| 35 | dog | woof | 2022-05-10 03:58:24.036197 |
| 36 | dog | woof woof | 2022-05-10 03:58:38.850884 |