beta_rec.data package¶
beta_rec.data.auxiliary_data module¶
beta_rec.data.base_data module¶
-
class
beta_rec.data.base_data.
BaseData
(split_dataset, intersect=True, binarize=True, bin_thld=0.0, normalize=False)[source]¶ Bases:
object
A plain BaseData object modeling general recommendation data. Re-indexes all users and items from the raw dataset.
Parameters: - split_dataset (train, valid, test) – the split dataset, a tuple consisting of training (DataFrame), validation/list of validation (DataFrame), and testing/list of testing (DataFrame).
- intersect (bool, optional) – remove users and items of the test/valid sets that do not exist in the train set. If the model is able to predict for new users and new items, this can be False. (default: True).
- binarize (bool, optional) – binarize the rating column of the train set to 0 or 1, i.e. implicit feedback. (default: True).
- bin_thld (float, optional) – the threshold of binarization. (default: 0.0).
- normalize (bool, optional) – normalize the rating column of the train set into [0, 1], i.e. explicit feedback. (default: False).
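The preprocessing options above can be pictured with a small, library-independent sketch. It is not the BaseData implementation: the (user, item, rating) tuple layout, the helper names, and the exact rules (a rating strictly above the threshold becomes 1; min-max scaling for normalization) are assumptions made for illustration.

```python
def reindex(train):
    """Map raw user/item IDs to contiguous integer indices (assumed scheme)."""
    user_ids = {u: i for i, u in enumerate(sorted({u for u, _, _ in train}))}
    item_ids = {v: i for i, v in enumerate(sorted({v for _, v, _ in train}))}
    return user_ids, item_ids


def intersect(train, other):
    """Drop rows of a valid/test split whose user or item is absent from train."""
    users = {u for u, _, _ in train}
    items = {v for _, v, _ in train}
    return [(u, v, r) for u, v, r in other if u in users and v in items]


def binarize(rows, bin_thld=0.0):
    """Implicit feedback: 1 if rating > bin_thld, else 0 (assumed rule)."""
    return [(u, v, 1 if r > bin_thld else 0) for u, v, r in rows]


def normalize(rows):
    """Explicit feedback: min-max scale ratings into [0, 1] (assumed rule)."""
    ratings = [r for _, _, r in rows]
    lo, hi = min(ratings), max(ratings)
    span = (hi - lo) or 1.0  # avoid division by zero when all ratings are equal
    return [(u, v, (r - lo) / span) for u, v, r in rows]


# Hypothetical toy data: (user, item, rating)
train = [("u1", "i1", 5.0), ("u1", "i2", 1.0), ("u2", "i1", 3.0)]
test = [("u2", "i2", 4.0), ("u3", "i1", 2.0), ("u1", "i9", 5.0)]

user_ids, item_ids = reindex(train)
kept_test = intersect(train, test)          # rows with unseen user u3 / item i9 are dropped
binary_train = binarize(train, bin_thld=2.0)
scaled_train = normalize(train)
```

In this toy setup, binarize and normalize are alternatives: the first suits implicit-feedback models, the second keeps graded ratings for explicit-feedback models.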
-
get_adj_mat
(config)[source]¶ Get the adjacency matrix; if it was not previously stored, call the function that creates it.
This method is for the NGCF model.
Returns: Different types of adjacency matrix.
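For orientation, graph-based models such as NGCF treat users and items as nodes of a single bipartite graph, so a user-item interaction matrix R yields a square adjacency matrix with R and its transpose in the off-diagonal blocks. The sketch below builds that block matrix with plain lists. The function name and the dense representation are illustrative assumptions, and it does not reproduce the matrix variants that get_adj_mat returns.

```python
def bipartite_adjacency(n_users, n_items, interactions):
    """Build the (n_users + n_items) square adjacency matrix [[0, R], [R^T, 0]]."""
    size = n_users + n_items
    adj = [[0] * size for _ in range(size)]
    for user, item in interactions:
        adj[user][n_users + item] = 1  # user -> item block (R)
        adj[n_users + item][user] = 1  # item -> user block (R transposed)
    return adj


# Hypothetical toy data: 2 users, 3 items, (user, item) index pairs
adj = bipartite_adjacency(2, 3, [(0, 0), (0, 2), (1, 1)])
```

A real dataset would normally store this matrix in a sparse format, since almost all entries are zero.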
-
get_constraint_mat
(config)[source]¶ Get the adjacency matrix; if it was not previously stored, call the function that creates it.
This method is for the NGCF model.
Returns: Different types of adjacency matrix.
-
instance_bce_loader
(batch_size, device, num_negative)[source]¶ Instantiate a training DataLoader that includes ratings.
-
instance_bpr_loader
(batch_size, device)[source]¶ Instantiate a pairwise DataLoader for training.
Sample ONE negative item for each user-item pair, and shuffle them with the positive items. A batch of data in this DataLoader is suitable for a binary cross-entropy loss. TODO: implement item popularity-biased sampling.
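The sampling step described above can be sketched without any framework. This is a minimal illustration, not the loader's code: the function name, the uniform rejection sampling, and the fixed seed are assumptions, and it presumes every user has at least one unseen item.

```python
import random


def sample_pairwise(positives, n_items, seed=0):
    """Return shuffled (user, pos_item, neg_item) triples, one negative per positive."""
    rng = random.Random(seed)
    seen = {}
    for user, item in positives:
        seen.setdefault(user, set()).add(item)

    triples = []
    for user, pos_item in positives:
        # Uniform rejection sampling: redraw until the item is unseen by this user.
        neg_item = rng.randrange(n_items)
        while neg_item in seen[user]:
            neg_item = rng.randrange(n_items)
        triples.append((user, pos_item, neg_item))
    rng.shuffle(triples)
    return triples


# Hypothetical toy data: (user, item) index pairs over 5 items
positives = [(0, 1), (0, 2), (1, 0)]
triples = sample_pairwise(positives, n_items=5)
```

Popularity-biased sampling, mentioned in the TODO, would replace the uniform draw with one weighted by item frequency.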
beta_rec.data.data_loaders module¶
beta_rec.data.deprecated_data module¶
-
class
beta_rec.data.deprecated_data.
GroceryData
(config)[source]¶ Bases:
beta_rec.data.base_data.BaseData
Grocery dataset class for all the models.
-
cmn_train_loader
(batch_size: int, neighborhood: bool, neg_count: int)[source]¶ Load train data for CMN.
Batch data together as (user, item, negative item), pos_neighborhood, length of neighborhood, negative_neighborhood, length of negative neighborhood.
If neighborhood is False, returns only (user, item, negative_item), so this can be reused for non-neighborhood-based methods.
Parameters: - batch_size – size of the batch.
- neighborhood – return the neighborhood information or not.
- neg_count – number of negative samples to draw uniformly per positive example.
Returns: generator.
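The neighborhood part of a batch can be pictured as follows. The sketch assumes that the neighborhood of an item is the set of users who interacted with it, and that neighborhoods are padded to a common length with their true lengths returned alongside. The helper names and the pad value of -1 are hypothetical and do not come from cmn_train_loader.

```python
def item_neighborhoods(positives):
    """Map each item to the sorted list of users who interacted with it."""
    neighbors = {}
    for user, item in positives:
        neighbors.setdefault(item, set()).add(user)
    return {item: sorted(users) for item, users in neighbors.items()}


def pad_neighborhoods(items, neighbors, pad_value=-1):
    """Return equal-length neighborhood rows plus their true lengths."""
    rows = [neighbors.get(item, []) for item in items]
    lengths = [len(row) for row in rows]
    width = max(lengths, default=0)
    padded = [row + [pad_value] * (width - len(row)) for row in rows]
    return padded, lengths


# Hypothetical toy data: (user, item) pairs; item 12 has no interactions
positives = [(0, 10), (1, 10), (2, 10), (1, 11)]
neighbors = item_neighborhoods(positives)
padded, lengths = pad_neighborhoods([10, 11, 12], neighbors)
```

Returning the lengths next to the padded rows lets a model mask out the padding entries.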
-
generate_sparse_train_data
()[source]¶ Generate a sparse matrix for interactions.
Returns: coo_matrix.
-
generate_train_data
()[source]¶ Generate a rating matrix for interactions.
Returns: (sigma_matrix, rating_matrix), where sigma_matrix has every observed rating set to 1 and rating_matrix holds the real ratings.
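The difference between the two returned matrices is easiest to see on a toy example. The sketch below uses dense nested lists and a hypothetical helper name. How the method itself stores the matrices is not stated here.

```python
def build_matrices(n_users, n_items, ratings):
    """Return (sigma_matrix, rating_matrix) for (user, item, rating) triples."""
    sigma = [[0] * n_items for _ in range(n_users)]
    rating = [[0.0] * n_items for _ in range(n_users)]
    for user, item, value in ratings:
        sigma[user][item] = 1       # indicator: an interaction was observed
        rating[user][item] = value  # the real rating
    return sigma, rating


# Hypothetical toy data: 2 users, 3 items
sigma_matrix, rating_matrix = build_matrices(2, 3, [(0, 0, 4.0), (1, 2, 2.5)])
```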
-
get_adj_mat
()[source]¶ Get the adjacency matrix.
If it was not previously stored, call the function that creates it. This method is for the NGCF model.
Returns: Different types of adjacency matrix.
-
make_fea_sim_mat
()[source]¶ Make feature similarity matrix.
Note that the first column is the user/item ID.
Returns: normalized_adj_single
-
sample
(batch_size)[source]¶ Sample users, their positive items and negative items.
Parameters: batch_size – the size of a batch for sampling.
Returns: users (list), pos_items (list), neg_items (list).
-
-
beta_rec.data.deprecated_data.
calc_sim
(A)[source]¶ Fastest way to calculate the cosine similarity.
See reference: https://stackoverflow.com/questions/17627219/
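As a reminder of the quantity involved, cosine similarity between two vectors is their dot product divided by the product of their Euclidean norms. The pure-Python sketch below computes it for every pair of rows. It is written for readability rather than speed, and both the row-wise orientation and the function name are assumptions rather than a description of calc_sim.

```python
import math


def cosine_similarity_rows(matrix):
    """Pairwise cosine similarity between the rows of a dense matrix."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in matrix]
    sims = []
    for i, row_i in enumerate(matrix):
        sims_row = []
        for j, row_j in enumerate(matrix):
            dot = sum(a * b for a, b in zip(row_i, row_j))
            denom = norms[i] * norms[j]
            sims_row.append(dot / denom if denom else 0.0)  # all-zero rows get 0
        sims.append(sims_row)
    return sims


# Hypothetical toy data: rows 0 and 2 are parallel, row 1 is orthogonal to both
sims = cosine_similarity_rows([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
```

Vectorized implementations typically normalize the rows once and then take a single matrix product, which is far faster on large or sparse inputs.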
-
beta_rec.data.deprecated_data.
check_adj_if_equal
(adj)[source]¶ Missing docs.
Parameters: adj – adjacency matrix.
Returns: a Laplacian matrix.
beta_rec.data.deprecated_data_base module¶
-
class
beta_rec.data.deprecated_data_base.
DataLoaderBase
(ratings)[source]¶ Bases:
object
Construct dataset for NCF.
-
evaluate_data
¶ Create evaluation data.
-
get_adj_mat
(config)[source]¶ Get the adjacency matrix; if it was not previously stored, call the function that creates it.
This method is for the NGCF model.
Returns: Different types of adjacency matrix.
-
get_graph_embeddings
(config)[source]¶ Get the graph embedding; if it was not previously stored, call the function that creates it.
This method is for the LCFN model.
Returns: eigsh of the graph matrix.
-
instance_a_train_loader
(num_negatives, batch_size)[source]¶ Instantiate a train loader for one training epoch.
-
pairwise_negative_train_loader
(batch_size, device)[source]¶ Instantiate a pairwise DataLoader for training.
Sample ONE negative item for each user-item pair, and shuffle them with the positive items. A batch of data in this DataLoader is suitable for a binary cross-entropy loss. TODO: implement item popularity-biased sampling.
-
uniform_negative_train_loader
(num_negatives, batch_size, device)[source]¶ Instantiate a DataLoader for training.
Sample ‘num_negatives’ negative items for each user, and shuffle them with the positive items. A batch of data in this DataLoader is suitable for a binary cross-entropy loss. TODO: implement item popularity-biased sampling.
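A pointwise variant of negative sampling, where each row carries a 0/1 label that a binary cross-entropy loss can consume, might look like the sketch below. The function name, the sampling without replacement, and the cap on users with few unseen items are assumptions for illustration, not the loader's behavior.

```python
import random


def sample_pointwise(positives, n_items, num_negatives, seed=0):
    """Return shuffled (user, item, label) rows: 1 for positives, 0 for sampled negatives."""
    rng = random.Random(seed)
    seen = {}
    for user, item in positives:
        seen.setdefault(user, set()).add(item)

    rows = [(user, item, 1) for user, item in positives]
    for user, items in seen.items():
        candidates = [i for i in range(n_items) if i not in items]
        # Sample without replacement, capped by the number of unseen items.
        for neg in rng.sample(candidates, min(num_negatives, len(candidates))):
            rows.append((user, neg, 0))
    rng.shuffle(rows)
    return rows


# Hypothetical toy data: 2 users, 5 items, 2 negatives per user
rows = sample_pointwise([(0, 1), (0, 2), (1, 0)], n_items=5, num_negatives=2)
```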
-
-
class
beta_rec.data.deprecated_data_base.
PairwiseNegativeDataset
(user_tensor, pos_item_tensor, neg_item_tensor)[source]¶ Bases:
torch.utils.data.dataset.Dataset
Wrapper that converts <user, pos_item, neg_item> tensors into a PyTorch Dataset.
beta_rec.data.grocery_data module¶
-
class
beta_rec.data.grocery_data.
GroceryData
(split_dataset, config=None, intersect=True, binarize=True, bin_thld=0.0, normalize=False)[source]¶ Bases:
beta_rec.data.base_data.BaseData
,beta_rec.data.auxiliary_data.Auxiliary
A GroceryData object, which contains one more column (order/basket) than BaseData.
Re-indexes all users and items from the raw dataset.
Parameters: - split_dataset (train, valid, test) – the split dataset, a tuple consisting of training (DataFrame), validation/list of validation (DataFrame), and testing/list of testing (DataFrame).
- intersect (bool, optional) – remove users and items of the test/valid sets that do not exist in the train set. If the model is able to predict for new users and new items, this can be False. (default: True).
- binarize (bool, optional) – binarize the rating column of the train set to 0 or 1, i.e. implicit feedback. (default: True).
- bin_thld (float, optional) – the threshold of binarization. (default: 0.0).
- normalize (bool, optional) – normalize the rating column of the train set into [0, 1], i.e. explicit feedback. (default: False).
Module contents¶
Data Module.
-
class
beta_rec.data.
BaseData
(split_dataset, intersect=True, binarize=True, bin_thld=0.0, normalize=False)[source]¶ Bases:
object
A plain BaseData object modeling general recommendation data. Re-indexes all users and items from the raw dataset.
Parameters: - split_dataset (train, valid, test) – the split dataset, a tuple consisting of training (DataFrame), validation/list of validation (DataFrame), and testing/list of testing (DataFrame).
- intersect (bool, optional) – remove users and items of the test/valid sets that do not exist in the train set. If the model is able to predict for new users and new items, this can be False. (default: True).
- binarize (bool, optional) – binarize the rating column of the train set to 0 or 1, i.e. implicit feedback. (default: True).
- bin_thld (float, optional) – the threshold of binarization. (default: 0.0).
- normalize (bool, optional) – normalize the rating column of the train set into [0, 1], i.e. explicit feedback. (default: False).
-
get_adj_mat
(config)[source]¶ Get the adjacency matrix; if it was not previously stored, call the function that creates it.
This method is for the NGCF model.
Returns: Different types of adjacency matrix.
-
get_constraint_mat
(config)[source]¶ Get the adjacency matrix; if it was not previously stored, call the function that creates it.
This method is for the NGCF model.
Returns: Different types of adjacency matrix.
-
instance_bce_loader
(batch_size, device, num_negative)[source]¶ Instantiate a training DataLoader that includes ratings.
-
instance_bpr_loader
(batch_size, device)[source]¶ Instantiate a pairwise DataLoader for training.
Sample ONE negative item for each user-item pair, and shuffle them with the positive items. A batch of data in this DataLoader is suitable for a binary cross-entropy loss. TODO: implement item popularity-biased sampling.