beta_rec.data package¶

beta_rec.data.auxiliary_data module¶

class beta_rec.data.auxiliary_data.Auxiliary(config, n_users, n_items)[source]¶

Bases: object

An Auxiliary data object, which is able to read various features for users and items.

Parameters:	config (dict) – configs dict.

init_item_fea()[source]¶: Initialize item features.

init_user_fea()[source]¶: Initialize user features for the VBCAR model.

beta_rec.data.base_data module¶

class beta_rec.data.base_data.BaseData(split_dataset, intersect=True, binarize=True, bin_thld=0.0, normalize=False)[source]¶

Bases: object

A plain BaseData object modeling general recommendation data. Re-indexes all the users and items from the raw dataset.
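Re-indexing can be pictured as mapping raw IDs to contiguous integers. The following is a minimal sketch of that idea, not beta_rec's implementation: the helper name `reindex` and the first-appearance ordering are assumptions made for illustration.

```python
# Illustrative sketch only: `reindex` is a hypothetical helper, and assigning
# indices in order of first appearance is an assumption.

def reindex(raw_ids):
    """Map raw IDs to contiguous integers in order of first appearance."""
    mapping = {}
    for raw in raw_ids:
        # setdefault keeps the index of an ID that was already seen
        mapping.setdefault(raw, len(mapping))
    return mapping


user2id = reindex(["u42", "u7", "u42", "u99"])
print(user2id)
```

Contiguous indices of this kind are what allow users and items to address rows of an embedding table or an interaction matrix directly.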

Parameters:

split_dataset (train,valid,test) – the split dataset, a tuple consisting of the training set (DataFrame), the validation set or a list of validation sets (DataFrame), and the testing set or a list of testing sets (DataFrame).
intersect (bool, optional) – remove users and items from the test/valid sets that do not exist in the train set. If the model is able to predict for new users and new items, this can be False. (default: True).
binarize (bool, optional) – binarize the rating column of the train set to 0 or 1, i.e. implicit feedback. (default: True).
bin_thld (int, optional) – the threshold of binarization. (default: 0).
normalize (bool, optional) – normalize the rating column of the train set into [0, 1], i.e. explicit feedback. (default: False).
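The effect of the three preprocessing flags can be illustrated on toy data. This is a pure-Python sketch under stated assumptions, not the library's code: the helper names, the `(user, item, rating)` tuple layout, the strict `>` comparison against the threshold, and min-max scaling are all choices made here for illustration.

```python
# Illustrative sketch only: helper names, tuple layout, the strict ">" test and
# min-max scaling are assumptions, not beta_rec's actual behaviour.

def intersect_with_train(train, other):
    """Keep rows of `other` whose user and item both occur in `train`."""
    users = {u for u, _, _ in train}
    items = {i for _, i, _ in train}
    return [(u, i, r) for u, i, r in other if u in users and i in items]


def binarize_ratings(rows, bin_thld=0.0):
    """Map each rating to 1.0 if it exceeds the threshold, else 0.0."""
    return [(u, i, 1.0 if r > bin_thld else 0.0) for u, i, r in rows]


def normalize_ratings(rows):
    """Min-max scale ratings into [0, 1]."""
    ratings = [r for _, _, r in rows]
    lo, hi = min(ratings), max(ratings)
    span = (hi - lo) or 1.0  # avoid division by zero when all ratings are equal
    return [(u, i, (r - lo) / span) for u, i, r in rows]


train = [("u1", "i1", 5.0), ("u1", "i2", 1.0), ("u2", "i1", 3.0)]
test = [("u2", "i2", 4.0), ("u3", "i1", 2.0), ("u1", "i9", 5.0)]

filtered_test = intersect_with_train(train, test)  # drops unseen u3 and i9
implicit_train = binarize_ratings(train)           # implicit feedback
explicit_train = normalize_ratings(train)          # explicit feedback in [0, 1]
```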

create_adj_mat()[source]¶: Create the adjacency matrix from the user-item interaction matrix.
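For graph-based models, a user-item interaction matrix R is commonly expanded into a square bipartite adjacency matrix with R in the top-right block and its transpose in the bottom-left block. The sketch below shows that construction with nested lists. Whether `create_adj_mat` uses exactly this layout is an assumption, and `bipartite_adjacency` is a hypothetical name.

```python
# Illustrative sketch only: the block layout [[0, R], [R^T, 0]] is the common
# convention for bipartite graphs and is assumed here, not taken from beta_rec.

def bipartite_adjacency(interactions, n_users, n_items):
    """Build the dense block matrix [[0, R], [R^T, 0]] from (user, item) pairs."""
    n = n_users + n_items
    adj = [[0] * n for _ in range(n)]
    for u, i in interactions:
        adj[u][n_users + i] = 1  # user -> item edge (block R)
        adj[n_users + i][u] = 1  # item -> user edge (block R^T)
    return adj


adj = bipartite_adjacency([(0, 0), (0, 1), (1, 1)], n_users=2, n_items=2)
for row in adj:
    print(row)
```

Real datasets are far too sparse for a dense matrix of this size, so a sparse representation would normally be used in practice.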

create_constraint_mat()[source]¶: Create the adjacency matrix from the user-item interaction matrix.

create_sgl_mat(config)[source]¶: Create the adjacency matrix from the user-item interaction matrix.

get_adj_mat(config)[source]¶

Get the adjacency matrix; if it has not been stored previously, call the function that creates it.

This method is for the NGCF model.

Returns:	Different types of adjacency matrix.
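The "create if not previously stored" behaviour is a load-or-create cache. A minimal sketch of that pattern follows, assuming a JSON file as the store and a hypothetical `get_or_create` helper; the storage format and file location that beta_rec actually uses are not described here.

```python
# Illustrative sketch only: the JSON store and the helper name are assumptions.
import json
import os
import tempfile


def get_or_create(path, create_fn):
    """Return the object stored at `path`, creating and storing it on first use."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    obj = create_fn()
    with open(path, "w") as f:
        json.dump(obj, f)
    return obj


calls = []


def build_matrix():
    calls.append(1)  # record each (expensive) construction
    return [[0, 1], [1, 0]]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "adj_mat.json")
    first = get_or_create(cache_file, build_matrix)   # builds and stores
    second = get_or_create(cache_file, build_matrix)  # loads from disk
```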

get_constraint_mat(config)[source]¶

Get the adjacency matrix; if it has not been stored previously, call the function that creates it.

This method is for the NGCF model.

Returns:	Different types of adjacency matrix.

instance_bce_loader(batch_size, device, num_negative)[source]¶: Instantiate a training DataLoader that has ratings.

instance_bpr_loader(batch_size, device)[source]¶

Instantiate a pairwise DataLoader for training.

Sample ONE negative item for each user-item pair, and shuffle them with the positive items. A batch of data in this DataLoader is suitable for a binary cross-entropy loss. TODO: implement item popularity-biased sampling.
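Sampling one negative item per user-item pair can be sketched as follows. This is an illustration under assumptions, not the loader's implementation: negatives are drawn uniformly from the items the user has not interacted with, every user is assumed to have at least one such item, and `sample_one_negative` is a hypothetical name.

```python
# Illustrative sketch only: uniform sampling over unseen items is an assumption.
import random


def sample_one_negative(pairs, n_items, seed=0):
    """Return shuffled (user, pos_item, neg_item) triples, one negative per pair."""
    rng = random.Random(seed)
    seen = {}
    for u, i in pairs:
        seen.setdefault(u, set()).add(i)
    triples = []
    for u, pos in pairs:
        # candidate negatives: items this user never interacted with
        candidates = [j for j in range(n_items) if j not in seen[u]]
        triples.append((u, pos, rng.choice(candidates)))
    rng.shuffle(triples)  # mix users across batches
    return triples


pairs = [(0, 0), (0, 1), (1, 2)]
triples = sample_one_negative(pairs, n_items=4)
print(triples)
```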

instance_mul_neg_loader(batch_size, device, num_negative)[source]¶

Instantiate a pairwise DataLoader for training.

Sample multiple negative items for each user-item pair, and shuffle them with the positive items. A batch of data in this DataLoader is suitable for a binary cross-entropy loss.
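With several negatives per positive, the data can be laid out as labeled rows, which is the shape a binary cross-entropy loss expects. The sketch below is one possible reading rather than the loader's actual output format: the `(user, item, label)` layout, sampling without replacement, and the name `sample_with_negatives` are assumptions.

```python
# Illustrative sketch only: row layout and sampling without replacement are
# assumptions; each user must have at least `num_negative` unseen items.
import random


def sample_with_negatives(pairs, n_items, num_negative, seed=0):
    """Return shuffled (user, item, label) rows: 1.0 for positives, 0.0 for negatives."""
    rng = random.Random(seed)
    seen = {}
    for u, i in pairs:
        seen.setdefault(u, set()).add(i)
    rows = []
    for u, pos in pairs:
        rows.append((u, pos, 1.0))
        candidates = [j for j in range(n_items) if j not in seen[u]]
        for neg in rng.sample(candidates, num_negative):
            rows.append((u, neg, 0.0))
    rng.shuffle(rows)
    return rows


pairs = [(0, 0), (1, 2)]
rows = sample_with_negatives(pairs, n_items=5, num_negative=2)
print(rows)
```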

instance_vae_loader(device)[source]¶: Instantiate a training DataLoader that has ratings.

randint_choice(high, size=None, replace=True, p=None, exclusion=None)[source]¶: Return random integers from 0 (inclusive) to high (exclusive).
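The role of the `exclusion` and `replace` arguments can be shown with a standard-library sketch. It is not the method's implementation: `randint_choice_sketch` is a hypothetical name, the probability argument `p` is left out for brevity, and the `seed` argument is added here only to make the draw reproducible.

```python
# Illustrative sketch only: omits the `p` argument and adds a `seed` argument.
import random


def randint_choice_sketch(high, size=None, replace=True, exclusion=None, seed=None):
    """Draw integers from [0, high), skipping any values listed in `exclusion`."""
    rng = random.Random(seed)
    excluded = set(exclusion or [])
    population = [x for x in range(high) if x not in excluded]
    if size is None:
        return rng.choice(population)           # a single integer
    if replace:
        return rng.choices(population, k=size)  # with replacement
    return rng.sample(population, size)         # without replacement


draw = randint_choice_sketch(10, size=5, replace=False, exclusion=[0, 1, 2], seed=42)
print(draw)
```

Excluding values in this way is what makes such a helper convenient for negative sampling, where a user's known positive items must not be drawn.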

beta_rec.data.data_loaders module¶

class beta_rec.data.data_loaders.PairwiseNegativeDataset(user_tensor, pos_item_tensor, neg_item_tensor)[source]¶

Bases: torch.utils.data.dataset.Dataset

Wrapper that converts <user, pos_item, neg_item> tensors into a PyTorch Dataset.

class beta_rec.data.data_loaders.RatingDataset(user_tensor, item_tensor, target_tensor)[source]¶

Bases: torch.utils.data.dataset.Dataset

Wrapper that converts <user, item, rating> tensors into a PyTorch Dataset.
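A map-style PyTorch Dataset only needs `__len__` and `__getitem__`. The sketch below shows the shape of such a wrapper with plain lists standing in for tensors, so that it runs without PyTorch. The class name `RatingTriples` and the list-based storage are assumptions; the real class subclasses torch.utils.data.Dataset.

```python
# Illustrative sketch only: plain lists stand in for tensors, and the class
# does not subclass torch.utils.data.Dataset as the real wrapper does.

class RatingTriples:
    """Index-aligned <user, item, rating> columns exposed as one dataset."""

    def __init__(self, users, items, ratings):
        if not (len(users) == len(items) == len(ratings)):
            raise ValueError("all columns must have the same length")
        self.users, self.items, self.ratings = users, items, ratings

    def __len__(self):
        return len(self.users)

    def __getitem__(self, index):
        # one training example: the index-th entry of every column
        return self.users[index], self.items[index], self.ratings[index]


dataset = RatingTriples([0, 0, 1], [3, 5, 3], [1.0, 0.0, 1.0])
print(len(dataset), dataset[1])
```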

beta_rec.data.deprecated_data module¶

class beta_rec.data.deprecated_data.GroceryData(config)[source]¶

Bases: beta_rec.data.base_data.BaseData

Grocery dataset class for all the models.

cmn_train_loader(batch_size: int, neighborhood: bool, neg_count: int)[source]¶

Load train data for CMN.

Batch data together as (user, item, negative item), pos_neighborhood, length of neighborhood, negative_neighborhood, length of negative neighborhood.

If neighborhood is False, returns only user, item, negative_item so we can reuse this for non-neighborhood-based methods.

Parameters:

batch_size – size of the batch.
neighborhood – return the neighborhood information or not.
neg_count – number of negative samples to uniformly draw per positive example.
Returns:	generator.
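Returning a generator means batches are produced lazily, one at a time. A minimal sketch of the non-neighborhood case follows, assuming the triples have already been sampled; `iter_batches` is a hypothetical helper and the neighborhood fields are left out.

```python
# Illustrative sketch only: covers the neighborhood=False case with
# pre-sampled (user, item, negative_item) triples.

def iter_batches(triples, batch_size):
    """Yield consecutive slices of at most `batch_size` triples."""
    for start in range(0, len(triples), batch_size):
        yield triples[start:start + batch_size]


sampled = [(0, 1, 4), (0, 2, 5), (1, 1, 3), (2, 0, 4), (2, 3, 1)]
batches = list(iter_batches(sampled, batch_size=2))
print(batches)
```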

create_adj_mat()[source]¶: Create the adjacency matrix from the user-item interactions.

epoch_sample()[source]¶: Missing Doc.

generate_sparse_train_data()[source]¶

Generate a sparse matrix for interactions.

Returns:	coo_matrix.

generate_train_data()[source]¶

Generate a rating matrix for interactions.

Returns:	(sigma_matrix, rating_matrix), where sigma_matrix holds 1 for each observed rating and rating_matrix holds the real rating.
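The difference between the two returned matrices is easiest to see on a tiny example. The sketch below uses dense nested lists and a hypothetical `build_matrices` helper; the actual matrix type returned by the method is not specified here.

```python
# Illustrative sketch only: dense nested lists and the helper name are assumptions.

def build_matrices(rows, n_users, n_items):
    """Return (sigma, rating): an interaction indicator and the observed ratings."""
    sigma = [[0.0] * n_items for _ in range(n_users)]
    rating = [[0.0] * n_items for _ in range(n_users)]
    for u, i, r in rows:
        sigma[u][i] = 1.0  # an interaction was observed
        rating[u][i] = r   # the value of that interaction
    return sigma, rating


sigma_matrix, rating_matrix = build_matrices(
    [(0, 1, 4.0), (1, 0, 2.5)], n_users=2, n_items=2
)
print(sigma_matrix, rating_matrix)
```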

get_adj_mat()[source]¶

Get the adjacency matrix.

If it has not been stored previously, call the function that creates it. This method is for the NGCF model.

Returns:	Different types of adjacency matrix.

init_item_fea()[source]¶: Initialize item features.

init_train_items()[source]¶: Missing Doc.

init_user_fea()[source]¶: Initialize user features for the VBCAR model.

load_user_item_fea()[source]¶: Load user and item features from datasets.

make_fea_sim_mat()[source]¶

Make feature similarity matrix.

Note that the first column is the user/item ID.

Returns:	normalized_adj_single

negative_pool()[source]¶: Missing Doc.

neighbour_process()[source]¶: Missing Doc.

sample(batch_size)[source]¶

Sample users, their positive items and negative items.

Parameters:	batch_size – the size of a batch for sampling.
Returns:	users (list), pos_items (list), neg_items (list)

sample_all_users_pos_items()[source]¶: Missing Doc.

sample_triple(dump=True, load_save=False)[source]¶

Sample triples, or load previously sampled triples from files.

This method is only applicable to basket-based recommenders.

Returns:	None

sample_triple_time(dump=True, load_save=False)[source]¶

Sample triples, or load previously sampled triples from files.

This method is only applicable to basket-based recommenders.

Returns:	None

beta_rec.data.deprecated_data.calc_sim(A)[source]¶

Fastest way to calculate the cosine similarity.

See reference: https://stackoverflow.com/questions/17627219/
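Cosine similarity between two rows is their dot product divided by the product of their Euclidean norms. The sketch below writes that formula out with plain loops so that each step is visible. It is not the vectorised approach the reference describes, it makes no speed claim, and `cosine_similarity_rows` is a hypothetical name.

```python
# Illustrative sketch only: a readable loop version of the cosine formula,
# not the vectorised implementation referenced above.
import math


def cosine_similarity_rows(matrix):
    """Return the pairwise cosine similarity between the rows of `matrix`."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in matrix]
    n = len(matrix)
    sim = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if norms[a] and norms[b]:  # all-zero rows keep similarity 0.0
                dot = sum(x * y for x, y in zip(matrix[a], matrix[b]))
                sim[a][b] = dot / (norms[a] * norms[b])
    return sim


sim = cosine_similarity_rows([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
for row in sim:
    print(row)
```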

beta_rec.data.deprecated_data.check_adj_if_equal(adj)[source]¶

Missing docs.

Parameters:	adj – adjacency matrix.
Returns:	a Laplacian matrix.

beta_rec.data.deprecated_data.get_D_inv(adj)[source]¶

Missing docs.

Parameters:	adj – adjacency matrix.

beta_rec.data.deprecated_data.get_feat_dic(fea_array)[source]¶: Get feature dictionary.

beta_rec.data.deprecated_data.intersect_train_test(train, test)[source]¶

Get the lists of users and items that exist in both train and test.

Parameters:

train (DataFrame) – the train set.
test (DataFrame) – the test set.

Returns:	users (list) – the users list; items (list) – the items list.
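The intersection can be sketched with sets. This illustration uses lists of `(user, item)` tuples instead of DataFrames so that it runs with the standard library alone; the helper name `intersect_users_items` and the sorted output order are assumptions.

```python
# Illustrative sketch only: tuples replace DataFrames, and sorting the result
# is an assumption made to give a deterministic order.

def intersect_users_items(train, test):
    """Return the users and the items that occur in both train and test."""
    users = sorted({u for u, _ in train} & {u for u, _ in test})
    items = sorted({i for _, i in train} & {i for _, i in test})
    return users, items


users, items = intersect_users_items(
    [(1, 10), (2, 11), (3, 12)],
    [(2, 10), (4, 12), (3, 13)],
)
print(users, items)
```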

beta_rec.data.deprecated_data_base module¶

class beta_rec.data.deprecated_data_base.DataLoaderBase(ratings)[source]¶

Bases: object

Construct dataset for NCF.

create_adj_mat()[source]¶: Create the adjacency matrix from the user-item interaction matrix.

create_graph_embeddings(config)[source]¶: Create graph embeddings from the user and item hypergraph.

evaluate_data¶: Create evaluation data.

get_adj_mat(config)[source]¶

Get the adjacency matrix; if it has not been stored previously, call the function that creates it.

This method is for the NGCF model.

Returns:	Different types of adjacency matrix.

get_graph_embeddings(config)[source]¶

Get the graph embedding; if it has not been stored previously, call the function that creates it.

This method is for the LCFN model.

Returns:	eigsh of the graph matrix

instance_a_train_loader(num_negatives, batch_size)[source]¶: Instantiate a train loader for one training epoch.

pairwise_negative_train_loader(batch_size, device)[source]¶

Instantiate a pairwise DataLoader for training.

Sample ONE negative item for each user-item pair, and shuffle them with the positive items. A batch of data in this DataLoader is suitable for a binary cross-entropy loss. TODO: implement item popularity-biased sampling.

uniform_negative_train_loader(num_negatives, batch_size, device)[source]¶

Instantiate a DataLoader for training.

Sample ‘num_negatives’ negative items for each user, and shuffle them with the positive items. A batch of data in this DataLoader is suitable for a binary cross-entropy loss. TODO: implement item popularity-biased sampling.

class beta_rec.data.deprecated_data_base.PairwiseNegativeDataset(user_tensor, pos_item_tensor, neg_item_tensor)[source]¶

Bases: torch.utils.data.dataset.Dataset

Wrapper that converts <user, pos_item, neg_item> tensors into a PyTorch Dataset.

class beta_rec.data.deprecated_data_base.RatingNegativeDataset(user_tensor, item_tensor, rating_tensor)[source]¶

Bases: torch.utils.data.dataset.Dataset

RatingNegativeDataset.

Wrapper that converts <user, item, rating> tensors into a PyTorch Dataset, which contains negative items with a rating of 0.0.

class beta_rec.data.deprecated_data_base.UserItemRatingDataset(user_tensor, item_tensor, target_tensor)[source]¶

Bases: torch.utils.data.dataset.Dataset

Wrapper that converts <user, item, rating> tensors into a PyTorch Dataset.

beta_rec.data.grocery_data module¶

class beta_rec.data.grocery_data.GroceryData(split_dataset, config=None, intersect=True, binarize=True, bin_thld=0.0, normalize=False)[source]¶

Bases: beta_rec.data.base_data.BaseData, beta_rec.data.auxiliary_data.Auxiliary

A Grocery data object, which contains one more column (order/basket) than BaseData.

Re-indexes all the users and items from the raw dataset.

Parameters:

split_dataset (train,valid,test) – the split dataset, a tuple consisting of the training set (DataFrame), the validation set or a list of validation sets (DataFrame), and the testing set or a list of testing sets (DataFrame).
intersect (bool, optional) – remove users and items from the test/valid sets that do not exist in the train set. If the model is able to predict for new users and new items, this can be False. (default: True).
binarize (bool, optional) – binarize the rating column of the train set to 0 or 1, i.e. implicit feedback. (default: True).
bin_thld (int, optional) – the threshold of binarization. (default: 0).
normalize (bool, optional) – normalize the rating column of the train set into [0, 1], i.e. explicit feedback. (default: False).

sample_triple(dump=True, load_save=False)[source]¶

Sample triples, or load previously sampled triples from files.

This method is only applicable to basket-based recommenders.

Returns:	None

sample_triple_time(dump=True, load_save=False)[source]¶

Sample triples, or load previously sampled triples from files.

This method is only applicable to basket-based recommenders.

Returns:	None

Module contents¶

Data Module.

class beta_rec.data.BaseData(split_dataset, intersect=True, binarize=True, bin_thld=0.0, normalize=False)[source]¶

Bases: object

This is the beta_rec.data.base_data.BaseData class documented above, exposed at the package level with the same parameters and methods.