mpd_talktodata
Module Group¶
src/pd[^1]
Project Stage ID¶
4[^2]
Purpose¶
This library lets the user explore the data stored in a dataframe using natural-language requests.
Location¶
Here are the locations of the relevant files associated with the module
module information:
/src/pd/mpd_talktodata.json
module activation functions:
/src/pd/mpd_talktodata.py
Requirements¶
Required module import information
import numpy as np
import pandas as pd
from collections import OrderedDict
from mllibs.nlpi import nlpi
from mllibs.nlpm import parse_json
import pkg_resources
import json
Selection¶
Each activation function is assigned a unique label. The sel method reads the label from args['pred_task'] and calls the matching activation function:
def sel(self,args:dict):

    self.select = args['pred_task']
    self.args = args

    # each label maps to exactly one activation function
    if(self.select == 'dfcolumninfo'):
        self.dfcolumninfo(self.args)
    if(self.select == 'dfsize'):
        self.dfsize(self.args)
    if(self.select == 'dfcolumn_distr'):
        self.dfcolumn_distr(self.args)
    if(self.select == 'dfcolumn_na'):
        self.dfcolumn_na(self.args)
    if(self.select == 'dfall_na'):
        self.dfall_na(self.args)
    if(self.select == 'show_stats'):
        self.show_statistics(args)
    if(self.select == 'show_info'):
        self.show_info(args)
    if(self.select == 'show_dtypes'):
        self.show_dtypes(args)
    if(self.select == 'show_feats'):
        self.show_features(args)
    if(self.select == 'show_corr'):
        self.show_correlation(args)
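The chain of `if` checks above maps each `pred_task` label to one method. The same idea can be sketched with a dictionary lookup. The class `TalkToDataSketch` and its two stub methods below are hypothetical stand-ins used only to illustrate the dispatch; they are not the module's source.

```python
class TalkToDataSketch:
    """Hypothetical stand-in for mpd_talktodata; only the dispatch is shown."""

    def __init__(self):
        # label -> bound method; every label appears exactly once
        self.dispatch = {
            'dfsize': self.dfsize,
            'show_dtypes': self.show_dtypes,
        }

    def sel(self, args: dict):
        self.select = args['pred_task']
        self.args = args
        handler = self.dispatch.get(self.select)
        if handler is None:
            # assumption: fail loudly on an unknown label
            raise KeyError(f"no activation function for label {self.select!r}")
        return handler(args)

    # stub activation functions that only report what was called
    def dfsize(self, args: dict):
        return ('dfsize', len(args['data']))

    def show_dtypes(self, args: dict):
        return ('show_dtypes', type(args['data']).__name__)


module = TalkToDataSketch()
result = module.sel({'pred_task': 'dfsize', 'data': [1, 2, 3]})
print(result)
```

Unlike the `if` chain, which silently does nothing for an unrecognised label, this sketch raises a `KeyError`. That is a design choice of the sketch, not documented behaviour of the library.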
Activation Functions¶
The activation functions available in class mpd_talktodata are listed below.
dfcolumninfo¶
data: pd.DataFrame
targ:None
Prints the column names of the dataframe.
code:
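A minimal sketch of such a method, assuming it only needs the `data` entry of `args`. The function name `dfcolumninfo_sketch` and the sample dataframe are hypothetical, and this is not the module's own listing.

```python
import pandas as pd


def dfcolumninfo_sketch(args: dict) -> list:
    """Print and return the column names of args['data'] (hypothetical helper)."""
    columns = list(args['data'].columns)
    print(columns)
    return columns


# made-up sample data
df = pd.DataFrame({'age': [31, 45], 'city': ['Oslo', 'Lima']})
cols = dfcolumninfo_sketch({'data': df})
```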
dfsize¶
data: pd.DataFrame
targ:None
Prints the size of the dataframe.
code:
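One plausible reading of "size" is the number of rows and columns given by `DataFrame.shape`. The sketch below follows that assumption; `dfsize_sketch` is a hypothetical name and the data is made up.

```python
import pandas as pd


def dfsize_sketch(args: dict) -> tuple:
    """Print and return (rows, columns) of args['data'] (hypothetical helper)."""
    rows, cols = args['data'].shape
    print(f'{rows} rows x {cols} columns')
    return rows, cols


df = pd.DataFrame({'age': [31, 45, 27], 'city': ['Oslo', 'Lima', 'Pune']})
size = dfsize_sketch({'data': df})
```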
dfcolumn_distr¶
data: pd.DataFrame
targ:col|column
Prints the count of each unique value in a dataframe column using value_counts.
code:
def dfcolumn_distr(self,args:dict):

    # the column name may arrive under either the 'column' or the 'col' key
    if(args['column'] is not None):
        display(args['data'][args['column']].value_counts())
    elif(args['col'] is not None):
        display(args['data'][args['col']].value_counts())
    else:
        print('[note] please specify the column name')
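For reference, `value_counts` returns a Series indexed by the unique values and sorted by descending count. The short illustration below uses made-up data. The `args` keys mirror the ones used in the listing above, and `print` stands in for `display` so that it also runs outside a notebook.

```python
import pandas as pd

# made-up sample data
df = pd.DataFrame({'city': ['Oslo', 'Lima', 'Oslo', 'Pune', 'Oslo', 'Lima']})
args = {'data': df, 'column': 'city', 'col': None}

# same call the method makes when 'column' is set
counts = args['data'][args['column']].value_counts()
print(counts)
```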
dfcolumn_na¶
data: pd.DataFrame
targ:col|column
Stores the rows that have missing data in the given dataframe column in memory_output.
code:
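A rough sketch of how such a method could select the rows, assuming a boolean mask from `Series.isna`. The list `memory_output` here is a hypothetical stand-in for the library's output store, and `dfcolumn_na_sketch` is not the module's own function.

```python
import numpy as np
import pandas as pd

memory_output = []  # hypothetical stand-in for the library's output store


def dfcolumn_na_sketch(args: dict):
    """Store and return rows whose value in the chosen column is missing."""
    column = args.get('column') or args.get('col')
    if column is None:
        print('[note] please specify the column name')
        return None
    data = args['data']
    na_rows = data[data[column].isna()]  # boolean mask over the column
    memory_output.append(na_rows)
    return na_rows


df = pd.DataFrame({'age': [31, np.nan, 27], 'city': ['Oslo', 'Lima', None]})
missing_age = dfcolumn_na_sketch({'data': df, 'column': 'age'})
print(missing_age)
```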
dfall_na¶
data: pd.DataFrame
targ:None
Prints statistics on the amount of data missing in each column and stores the rows with missing data in memory_output.
code:
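The sketch below shows one way to compute per-column missing-value statistics and collect the affected rows. The table layout (a count and a percentage per column), the list `memory_output`, and the name `dfall_na_sketch` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

memory_output = []  # hypothetical stand-in for the library's output store


def dfall_na_sketch(args: dict):
    """Print per-column missing counts and keep rows with any missing value."""
    data = args['data']
    na_counts = data.isna().sum()                       # missing values per column
    na_share = (na_counts / len(data) * 100).round(2)   # as a percentage of rows
    stats = pd.DataFrame({'missing': na_counts, 'percent': na_share})
    print(stats)
    na_rows = data[data.isna().any(axis=1)]             # rows with at least one NaN
    memory_output.append(na_rows)
    return stats, na_rows


df = pd.DataFrame({'age': [31, np.nan, 27, 50],
                   'city': ['Oslo', 'Lima', None, 'Pune']})
stats, na_rows = dfall_na_sketch({'data': df})
```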
show_info¶
data: pd.DataFrame
targ:None
Method is used to print a concise summary of a pandas DataFrame. It provides information such as the number of rows and columns, the data types of each column, the memory usage, and the number of non-null values in each column. This method is useful for quickly understanding the structure and content of a DataFrame, especially when working with large datasets. Additionally, it can help identify missing or null values that may need to be addressed in data cleaning or preprocessing.
code:
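The summary described above matches what `pandas.DataFrame.info` prints. A minimal sketch, assuming the method simply forwards to it: the helper name `show_info_sketch` is hypothetical, and capturing the text through the `buf` parameter is an addition that makes the output easy to inspect.

```python
import io

import pandas as pd


def show_info_sketch(args: dict) -> str:
    """Print and return the DataFrame.info() summary (hypothetical helper)."""
    buffer = io.StringIO()
    args['data'].info(buf=buffer)  # write the summary into the buffer
    summary = buffer.getvalue()
    print(summary)
    return summary


df = pd.DataFrame({'age': [31, 45, 27], 'city': ['Oslo', None, 'Pune']})
summary = show_info_sketch({'data': df})
```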
show_missing¶
data: pd.DataFrame
targ:None
code:
show_stats¶
data: pd.DataFrame
targ:None
pandas.DataFrame.describe() is a method that provides a summary of the statistical properties of each column in a DataFrame. By default, it calculates the count, mean, standard deviation, minimum, 25th percentile, median (50th percentile), 75th percentile, and maximum for each numeric column.
code:
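A short sketch of the `describe` call, assuming the method forwards the dataframe unchanged. The name `show_statistics_sketch` and the sample data are hypothetical.

```python
import pandas as pd


def show_statistics_sketch(args: dict) -> pd.DataFrame:
    """Print and return DataFrame.describe() for args['data']."""
    stats = args['data'].describe()
    print(stats)
    return stats


# one numeric and one text column; only the numeric one is summarised by default
df = pd.DataFrame({'age': [20, 30, 40, 50],
                   'city': ['Oslo', 'Lima', 'Pune', 'Oslo']})
stats = show_statistics_sketch({'data': df})
```

With a mix of numeric and text columns, the default call reports only the numeric column, so the result here has a single `age` column.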
show_dtypes¶
data: pd.DataFrame
targ:None
Attribute of a pandas DataFrame that returns the data types of each column in the DataFrame. This attribute is useful for understanding the data types of each column and can be used to convert columns to different data types if necessary.
code:
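As an illustration of the attribute and of a follow-up type conversion, the sketch below prints `dtypes` and then casts one column with `astype`. The helper name `show_dtypes_sketch` is hypothetical and the data is made up.

```python
import pandas as pd


def show_dtypes_sketch(args: dict) -> pd.Series:
    """Print and return the dtype of every column in args['data']."""
    dtypes = args['data'].dtypes
    print(dtypes)
    return dtypes


df = pd.DataFrame({'age': [31, 45], 'height': [1.8, 1.7], 'city': ['Oslo', 'Lima']})
dtypes = show_dtypes_sketch({'data': df})

# once the types are known, a column can be converted explicitly
converted = df.astype({'age': 'float64'})
print(converted.dtypes)
```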
show_corr¶
data: pd.DataFrame
targ:None
Method that calculates the pairwise correlation between columns in a DataFrame. Correlation is a statistical measure that indicates the degree to which two variables are related.
code:
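@staticmethod
def show_correlation(args:dict):

    # pairwise correlation rounded to 2 decimals, reindexed over all columns
    corr_mat = pd.DataFrame(np.round(args['data'].corr(),2),
                            index = list(args['data'].columns),
                            columns = list(args['data'].columns))

    # drop rows and columns that are entirely NaN (e.g. non-numeric columns)
    corr_mat = corr_mat.dropna(how='all',axis=0)
    corr_mat = corr_mat.dropna(how='all',axis=1)
    display(corr_mat)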
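A small worked example of the underlying `corr` call on made-up data. It assumes pandas 1.5 or later, where `numeric_only=True` is available. Recent pandas versions raise an error when `corr()` meets a non-numeric column without that flag, so the example passes it explicitly rather than relying on the listing above.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [2, 4, 6, 8],             # rises with x
    'z': [4, 3, 2, 1],             # falls as x rises
    'label': ['a', 'b', 'a', 'b'], # non-numeric, excluded from the result
})

# assumption: pandas >= 1.5 for the numeric_only parameter
corr_mat = df.corr(numeric_only=True).round(2)
print(corr_mat)
```

Here `x` and `y` have a correlation of 1.0 and `x` and `z` a correlation of -1.0, while `label` does not appear in the matrix.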
[^1]: Reference to the sub folder in src