Add documentation about data exploration

Signed-off-by: Akashdeep Dhar <akashdeep.dhar@gmail.com>
Akashdeep Dhar 2023-05-22 13:24:50 +05:30
parent 589b2ec10f
commit 90d38d58a8
2 changed files with 121 additions and 0 deletions


@ -99,5 +99,6 @@ Index
creation_gram
creation_fail
solution_datanote
solution_dataeplt
solution_examples
solution_techtool


@ -0,0 +1,120 @@
.. _solution_dataeplt.rst:

Data Exploration and Significance
=================================

The following is the set of information that the service would examine once it
is deployed. This list covers both the information that would be available for
consumption by the service users and the information that would be available to
the service itself for computation and analysis, but not to the service users.
The list is not exhaustive, and more such information may be used apart from the
items listed below.

1. Activity entry from Datanommer (For computation only)
2. Username of the "subject", i.e. the owner of the contribution (For computation only)
3. Username of the "object", i.e. the member involved in the contribution (For computation only)
4. Datetime data of a specific contribution activity (For computation only)
5. Datetime data of a grouped contribution activity (For consumption only)
6. Service where a specific contribution activity happened (For computation only)
7. Service where a grouped contribution activity happened (For consumption only)
8. Activity trends per username (For computation only)
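
To make the split between computation-only and consumption data concrete, the
following is a minimal Python sketch of a visibility map. The field names, the
``Visibility`` enum and the ``public_view`` helper are hypothetical and are not
part of the service; the sketch only assumes that anything not explicitly marked
for consumption stays internal.

```python
from enum import Enum


class Visibility(Enum):
    """Who may see a given piece of information (hypothetical labels)."""

    COMPUTATION = "computation"  # used internally by the service only
    CONSUMPTION = "consumption"  # may be shown to service users


# Hypothetical field names mirroring the numbered list of information.
FIELD_VISIBILITY = {
    "activity_entry": Visibility.COMPUTATION,
    "subject_username": Visibility.COMPUTATION,
    "object_username": Visibility.COMPUTATION,
    "activity_datetime": Visibility.COMPUTATION,
    "grouped_datetime": Visibility.CONSUMPTION,
    "activity_service": Visibility.COMPUTATION,
    "grouped_service": Visibility.CONSUMPTION,
    "user_activity_trend": Visibility.COMPUTATION,
}


def public_view(record: dict) -> dict:
    """Return only the fields that are marked for user consumption.

    Unknown fields are treated as computation-only, so nothing leaks by default.
    """
    return {
        key: value
        for key, value in record.items()
        if FIELD_VISIBILITY.get(key) is Visibility.CONSUMPTION
    }


if __name__ == "__main__":
    record = {
        "subject_username": "a1b2c3",
        "grouped_service": {"service-a": 12},
        "grouped_datetime": {"2023-05-22": 7},
    }
    # Only the two grouped statistics survive; the hashed username is dropped.
    print(public_view(record))
```

Defaulting unknown fields to computation-only means that a newly added field
stays hidden from users until it is deliberately marked for consumption.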

Activity Entry from Datanommer
------------------------------

This data forms the most basic functional entity of a "contribution record". An
occurrence of an activity means that a contribution was made by the "subject"
member on the "object" member and/or service, with the "predicate" nature of the
contribution, at the "time" of it happening. A computed collection of these
entries can form wider statistics, for example, the trend of contributions by a
certain "subject" member or the trend of contributions on a certain "service".
Such statistics allow us to answer questions like "which services are the most
active (and why) and which are the least active (and why)?" and "what period of
time attracts the most contributions (and why)?". As a single entry is too
granular to be meaningful on its own and only serves its purpose when a computed
group of entries forms statistics, this data is used for computational purposes
only.
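
A contribution record can be modelled as a small immutable value, and the
statistics mentioned above then become plain group-and-count operations. The
sketch below assumes a hypothetical ``ActivityEntry`` shape with ``subject``,
``obj``, ``predicate``, ``service`` and ``time`` fields; it is not Datanommer's
actual message schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ActivityEntry:
    """One contribution record (hypothetical shape, not Datanommer's schema)."""

    subject: str    # member who made the contribution (stored as a hash)
    obj: str        # member and/or service involved; "obj" avoids the built-in name
    predicate: str  # nature of the contribution, e.g. "commit" or "review"
    service: str    # service where the activity happened
    time: datetime  # when the activity happened


def count_by(entries, field):
    """Group entries by one attribute and count them.

    A single entry says little on its own; the counts form the statistic.
    """
    return Counter(getattr(entry, field) for entry in entries)


if __name__ == "__main__":
    entries = [
        ActivityEntry("u1", "repo-a", "commit", "service-a", datetime(2023, 5, 1)),
        ActivityEntry("u1", "repo-b", "review", "service-b", datetime(2023, 5, 2)),
        ActivityEntry("u2", "repo-a", "commit", "service-a", datetime(2023, 5, 3)),
    ]
    print(count_by(entries, "subject"))  # trend of contributions per member
    print(count_by(entries, "service"))  # trend of contributions per service
```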

Username of the "subject"
-------------------------

Also referred to as the owner of the contribution.

This data is a part of the previously-stated "activity entry from Datanommer"
data. To protect the privacy of the members involved, this information is
anonymized as a hash. Since this data serves its purpose only when a computed
group of entries forms statistics, and not when it is singled out, it is used for
computational purposes only.
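
The document only states that usernames are anonymized as a hash. The following
minimal sketch assumes a keyed hash, HMAC-SHA256 from Python's standard library,
rather than a bare digest: usernames are publicly known, so an unkeyed hash could
be reversed by hashing a list of candidate usernames. The ``anonymize_username``
name and the idea of a deployment-wide secret key are assumptions for
illustration.

```python
import hashlib
import hmac


def anonymize_username(username: str, secret_key: bytes) -> str:
    """Map a username to a stable, opaque token.

    A keyed HMAC is used instead of a bare hash so that someone without the
    key cannot rebuild the mapping by hashing a list of known usernames.
    """
    digest = hmac.new(secret_key, username.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical key; a real deployment would keep it outside the dataset.
    key = b"hypothetical-secret-key"
    token = anonymize_username("alice", key)
    # The same input always yields the same token, so entries still group together.
    assert token == anonymize_username("alice", key)
    print(len(token))  # prints 64 (hex characters of a SHA-256 digest)
```

Because the output is deterministic for a given key, entries by the same member
still group together for statistics, while the key itself has to be kept out of
the stored data for the anonymization to hold.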

Username of the "object"
------------------------

Also referred to as the member involved in the contribution.

This data is a part of the previously-stated "activity entry from Datanommer"
data. To protect the privacy of the members involved, this information is
anonymized as a hash. Since this data serves its purpose only when a computed
group of entries forms statistics, and not when it is singled out, it is used for
computational purposes only.
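
If the "subject" and "object" are hashed with the same function and key, a
member keeps the same token whether they made a contribution or were involved in
one, so computations can relate both roles without ever storing the username. A
minimal sketch, assuming a keyed HMAC-SHA256 hash and hypothetical ``subject``
and ``object`` dictionary keys (not Datanommer's actual field names):

```python
import hashlib
import hmac


def _token(username: str, secret_key: bytes) -> str:
    # Keyed hash, so the mapping cannot be rebuilt without the key.
    return hmac.new(secret_key, username.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_entry(entry: dict, secret_key: bytes) -> dict:
    """Return a copy of an activity entry with both usernames replaced by tokens.

    The input dictionary is left untouched; missing or empty roles stay as-is.
    """
    anonymized = dict(entry)
    for role in ("subject", "object"):
        if anonymized.get(role) is not None:
            anonymized[role] = _token(anonymized[role], secret_key)
    return anonymized


if __name__ == "__main__":
    key = b"hypothetical-secret-key"
    first = anonymize_entry({"subject": "alice", "object": "bob", "service": "service-a"}, key)
    second = anonymize_entry({"subject": "bob", "object": None, "service": "service-b"}, key)
    # "bob" gets the same token as the object of one entry and the subject of another.
    print(first["object"] == second["subject"])  # prints True
```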

Datetime data of a specific contribution activity
-------------------------------------------------

This data is a part of the previously-stated "activity entry from Datanommer"
data. Since this data serves its purpose only when a computed group of entries
forms statistics, and not when it is singled out, it is used for computational
purposes only.

Datetime data of a grouped contribution activity
------------------------------------------------

Being a derivative statistic obtained from a computed group of the previously
stated "datetime data of a specific contribution activity", this can be used to
understand the trend of contributions of a certain "nature" over a period of
"time", the trend of contributions on a certain "service" over a period of
"time", and so on. This understanding would help us answer questions like which
timelines attract the most contributions and which attract few. It would also
help gauge the success of activities such as events and workshops by showing
whether they brought in contributions right after they commenced. As a result,
this data is made available by the service for user consumption.
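
As a sketch of how such a grouped statistic could be derived, assuming the
timestamps have already been normalized to a single timezone, the snippet below
buckets activity times by day and compares the days before an event with the
days after it. The helper names ``daily_counts`` and ``before_after`` and the
default seven-day window are hypothetical choices.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(timestamps):
    """Group individual activity timestamps into per-day contribution counts."""
    return Counter(ts.date() for ts in timestamps)


def before_after(counts, event_day: date, window_days: int = 7):
    """Total contributions in the window before and after an event.

    The "before" window covers the window_days days preceding the event;
    the "after" window starts on the event day itself.
    """
    before = sum(
        counts.get(event_day - timedelta(days=offset), 0)
        for offset in range(1, window_days + 1)
    )
    after = sum(
        counts.get(event_day + timedelta(days=offset), 0)
        for offset in range(window_days)
    )
    return before, after


if __name__ == "__main__":
    activity_times = [
        datetime(2023, 5, 20, 10, 0),
        datetime(2023, 5, 22, 9, 30),
        datetime(2023, 5, 22, 14, 0),
        datetime(2023, 5, 24, 18, 45),
    ]
    counts = daily_counts(activity_times)
    # Hypothetical workshop held on 2023-05-22.
    print(before_after(counts, date(2023, 5, 22), window_days=3))  # prints (1, 3)
```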

Service where a specific contribution activity happened
-------------------------------------------------------

This data is a part of the previously-stated "activity entry from Datanommer"
data. As a single entry is too granular to be meaningful on its own and only
serves its purpose when a computed group of entries forms statistics, this data
is used for computational purposes only.

Service where a grouped contribution activity happened
------------------------------------------------------

Being a derivative statistic obtained from a computed group of the previously
stated "service where a specific contribution activity happened", this can be
used to understand the trend of contributions on a certain service and to
compare it against other services to see how they fare in terms of contribution
activity. This understanding would help us answer questions like which services
are the most active in terms of contributions and which are not. It would also
help gauge the usability of those services by revealing what makes them
desirable (inferred from favourable contribution statistics) or undesirable
(inferred from unfavourable contribution statistics), which in turn can help
direct which services should be contributed to. As a result, this data is made
available by the service for user consumption.
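
A minimal sketch of the comparison described above, assuming the service name
of each activity entry is already available as a plain string: counting entries
per service and expressing each count as a share of all activity makes services
of different sizes comparable. The ``service_activity`` helper and the service
names are hypothetical.

```python
from collections import Counter


def service_activity(services, top_n=3):
    """Compare services by their share of all contribution activity.

    `services` holds one service name per activity entry. Returns two lists of
    (service, share) pairs: the `top_n` most active and the `top_n` least
    active services.
    """
    counts = Counter(services)
    total = sum(counts.values())
    if total == 0:
        return [], []
    # most_common() with no argument sorts every service by count, highest first.
    ranked = [(name, count / total) for name, count in counts.most_common()]
    return ranked[:top_n], ranked[::-1][:top_n]


if __name__ == "__main__":
    observed = ["service-a", "service-a", "service-a", "service-b", "service-b", "service-c"]
    most_active, least_active = service_activity(observed, top_n=1)
    print(most_active)  # prints [('service-a', 0.5)]
    print(least_active)
```

Shares rather than raw counts are used so that the comparison still reads
sensibly when the observed period, and therefore the total activity, changes.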

Activity trends per username
----------------------------

Being a derivative statistic obtained from a computed group of the previously
stated "activity entry from Datanommer", this can be used to understand the
trend of contributions by a certain user. This understanding would help us
answer questions like which fields a certain member contributes to and, if they
are transitioning from one field to another, what reasons have led them to do
so. To protect the privacy of the members involved, this information is
anonymized as a hash. Since this data serves its purpose only when a computed
group of entries forms statistics, and not when it is singled out, it is used for
computational purposes only.
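
One way to surface such a transition, sketched under stated assumptions: entries
arrive as hypothetical ``(user_token, service, timestamp)`` tuples whose token is
already a hash, and the service a member contributes to most in a given month is
used as a rough proxy for their "field". That proxy is an assumption; the
document does not define how fields are identified.

```python
from collections import Counter, defaultdict
from datetime import datetime


def dominant_service_by_month(entries):
    """Find, per hashed user, the service they contributed to most each month.

    `entries` is an iterable of (user_token, service, timestamp) tuples where
    `user_token` is already an anonymized hash, never a raw username.
    """
    per_user = defaultdict(lambda: defaultdict(Counter))
    for token, service, timestamp in entries:
        per_user[token][(timestamp.year, timestamp.month)][service] += 1
    return {
        token: {
            month: counts.most_common(1)[0][0]
            for month, counts in sorted(months.items())
        }
        for token, months in per_user.items()
    }


def transitions(monthly):
    """List (month, previous_service, new_service) whenever the dominant service changes."""
    changes = []
    previous = None
    for month, service in monthly.items():
        if previous is not None and service != previous:
            changes.append((month, previous, service))
        previous = service
    return changes


if __name__ == "__main__":
    sample = [
        ("token-1", "docs", datetime(2023, 1, 5)),
        ("token-1", "docs", datetime(2023, 1, 9)),
        ("token-1", "packaging", datetime(2023, 2, 3)),
        ("token-1", "packaging", datetime(2023, 2, 8)),
    ]
    trend = dominant_service_by_month(sample)
    print(transitions(trend["token-1"]))  # prints [((2023, 2), 'docs', 'packaging')]
```

The statistic only points at when a member's focus appears to shift; the
reasons behind the shift still have to be investigated separately.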