External

API details: Module for defining configuration and downloading data from various sources such as Kaggle, fastai, and GitHub

Extend Configuration


source

aiking_cfg

 aiking_cfg ()

Config object for fastai’s config.ini


source

aiking_path

 aiking_path (folder)

Path to folder in aiking_cfg
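
A minimal usage sketch (it assumes, as the commented examples further down suggest, that aiking_cfg() returns a fastai-style Config whose path method resolves configured folders, and that aiking_path returns a pathlib.Path for the same lookup):

# Sketch only: resolve the configured data directory in two (assumed equivalent) ways
cfg = aiking_cfg()
data_dir = cfg.path('data')      # folder configured for datasets, e.g. ~/.aiking/data
data_dir = aiking_path('data')   # convenience wrapper for the same lookup (assumption)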

Extend URL


source

URLs

 URLs ()

Global constants for dataset and model URLs.

Kaggle Dataset Utilities

This is a helper class for downloading Kaggle datasets into the required folder structure

# !rm -rf '/AIKING_HOME/data/spooky.zip'

source

KAGGLEs

 KAGGLEs ()

Initialize self. See help(type(self)) for accurate signature.


source

download_kaggle2

 download_kaggle2 (url, dest, overwrite=False)

Download url to dest unless it already exists and overwrite is False

# download_kaggle2("kaggle_competitions::spaceship-titanic", cfg.path('data')/"spaceship-titanic")
# !ls -l {cfg.path('data')/"spaceship-titanic"}
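
A hedged sketch of the same flow, uncommented (the kaggle_competitions::<competition> URL scheme is taken from the example above; running it assumes Kaggle API credentials are configured locally):

# Sketch: fetch a competition's files into the configured data directory
cfg = aiking_cfg()
dest = cfg.path('data')/"spaceship-titanic"
download_kaggle2("kaggle_competitions::spaceship-titanic", dest)
download_kaggle2("kaggle_competitions::spaceship-titanic", dest, overwrite=True)  # force a re-download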

source

download_kaggle

 download_kaggle (url, dest, overwrite=False)

Download url to dest unless it already exists and overwrite is False

Utilities for archive extraction


source

unzip_file

 unzip_file (dest, arch_path)

source

untar_data

 untar_data (url, archive=None, data=None, c_key='data',
             force_download=False, extract_func=file_extract, timeout=4)

Download url to fname if dest doesn’t exist, and extract to folder dest

|                | Type     | Default      | Details |
|----------------|----------|--------------|---------|
| url            |          |              |         |
| archive        | NoneType | None         |         |
| data           | NoneType | None         |         |
| c_key          | str      | data         |         |
| force_download | bool     | False        |         |
| extract_func   |          | file_extract |         |
| timeout        | int      | 4            |         |
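
A hedged sketch of the download-and-extract flow (the competition slug is illustrative, reusing the Kaggle URL scheme shown earlier; c_key selects which configured folder receives the extracted data):

# Sketch: download an archive and extract it under the configured 'data' folder
path = untar_data("kaggle_competitions::spooky-author-identification")
path = untar_data("kaggle_competitions::spooky-author-identification", force_download=True)  # ignore any cached copy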

source

list_checked_data

 list_checked_data ()

Image Dataset Utilities (Bing / DDG)

Middleman Downloader

Remote API configured by me for image search


source

search_images_middleman

 search_images_middleman (key, max_n=150)
# result_L.attrgot('image')
# bad_images = verify_images(get_image_files(dest))
# # bad_images = result_L.attrgot('image').map(lambda url: Path(url).name)
# result_L.filter(lambda a: Path(a['image']).name not in bad_images)
# result_L.attrgot('image').map(lambda url: Path(url).name)
# print(bad_images)
# pd.DataFrame(result_L)
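
A hedged sketch of typical usage (it assumes, as the scratch cell above suggests, that the result behaves like a fastcore L of dicts carrying an 'image' URL for each hit):

# Sketch: search via the middleman API and collect the image URLs
results = search_images_middleman('grizzly bear', max_n=50)
urls = results.attrgot('image')   # one URL per result, per the scratch cell above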

Duck Duck Go Downloader


source

search_images_ddg

 search_images_ddg (key, max_n=150)
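
A short sketch (assuming, as with the fastai helper of the same name, that it returns a list-like collection of image URLs):

# Sketch: grab up to 100 image URLs from DuckDuckGo for a search term
urls = search_images_ddg('grizzly bear', max_n=100)
print(len(urls), urls[0])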

Bing Downloader


source

search_images_bing

 search_images_bing (key, term, min_sz=128, max_images=150)
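
A hedged sketch (it assumes key is an Azure Cognitive Services search key and that the result items expose image URLs; both are assumptions, not confirmed by the signature above):

# Sketch: Bing image search with an API key taken from the environment (assumed key type)
import os
results = search_images_bing(os.environ['AZURE_SEARCH_KEY'], 'grizzly bear', max_images=100)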

Dataset constructor


source

add_search_term

 add_search_term (row, srch_term)

source

get_clsdict

 get_clsdict (clstypes, prefixes=None, sep=' ')

source

get_search_terms

 get_search_terms (o, prefixes=None, sep=' ')

source

drop_duplicates_L

 drop_duplicates_L (results, subset=['image'])

source

construct_image_dataset

 construct_image_dataset (clsdots, dest, key=None, loc=None, count=150,
                          engine='middleman', image_fname='image.csv',
                          prefixes=None, sep=' ')
|             | Type     | Default   | Details      |
|-------------|----------|-----------|--------------|
| clsdots     |          |           | dict or list |
| dest        |          |           |              |
| key         | NoneType | None      |              |
| loc         | NoneType | None      |              |
| count       | int      | 150       |              |
| engine      | str      | middleman |              |
| image_fname | str      | image.csv |              |
| prefixes    | NoneType | None      |              |
| sep         | str      | ' '       |              |
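
A hedged end-to-end sketch (the class names, destination, and the 'ddg' engine value are illustrative assumptions; clsdots is passed as a plain dict mapping class name to search term, per the "dict or list" hint above):

# Sketch: build a small labelled image dataset and record it in image.csv
clsdict = {'grizzly': 'grizzly bear', 'black': 'black bear', 'teddy': 'teddy bear'}
dest = aiking_path('data')/"bears"
construct_image_dataset(clsdict, dest, count=50, engine='ddg', image_fname='image.csv')
# get_clsdict(['grizzly', 'black', 'teddy'], prefixes='bear') could presumably build such a mapping (assumption)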

Datasette Downloader


source

data_frm_datasette

 data_frm_datasette (dsname, datasette_base_url,
                     data_dir='~/.aiking/data', table='image',
                     url_col='image', index_col=0, get_fname=<function
                     get_fname>, label_col='label')
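
A hedged sketch (the Datasette base URL and dataset name are placeholders; the keyword values simply restate the defaults from the signature, and the assumption is that the call downloads the referenced images and returns a table describing them):

# Sketch: pull an image/label table from a Datasette instance into the local data dir
df = data_frm_datasette('bears', 'https://my-datasette.example.com',
                        table='image', url_col='image', label_col='label')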

source

get_label

 get_label (url, df)

source

get_fname

 get_fname (url, df)

Utilities to review the datasets folder


source

list_ds

 list_ds (loc=None)

source

get_ds

 get_ds (name, loc=None)
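
For example (assuming list_ds returns the dataset folder names under the default location and get_ds resolves one of them to its path):

# Sketch: enumerate locally available datasets and resolve one by name
print(list_ds())        # e.g. ['bears', 'spaceship-titanic'] (illustrative)
bears_path = get_ds('bears')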

Create or Update Dataset folder


source

push_ds

 push_ds (url, dsname, name=None, subfolder='.')

Download url to the dsname dataset as name. Creates dsname if it does not exist
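
A hedged sketch (the source URL is a placeholder; the assumption is that dsname names the target dataset folder and name overrides the stored filename):

# Sketch: fetch a file into the 'bears' dataset, creating the dataset if needed
push_ds('https://example.com/bears.tar.gz', 'bears', name='bears.tar.gz')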