Loading a .npy dataset file in PerceptiLabs, which requires .csv files

Allegiance · 16 November 2021 11:22

Hello, I’d appreciate any help converting a .npy file to .csv so that I can load my dataset into PerceptiLabs.
When I try to upload my dataset, it tells me “couldn’t get data types because the kernel responded with an error”
and gives me the following:

Traceback (most recent call last):
File "perceptilabs\endpoints\type_inference\base.py", line 19, in perceptilabs.endpoints.type_inference.base.TypeInference.dispatch_request
File "perceptilabs\data\type_inference.py", line 58, in perceptilabs.data.type_inference.TypeInferrer.get_valid_and_default_datatypes_for_csv
File "c:\users\zawad\anaconda3\envs\myenv\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "c:\users\zawad\anaconda3\envs\myenv\lib\site-packages\pandas\io\parsers\readers.py", line 586, in read_csv
return _read(filepath_or_buffer, kwds)
File "c:\users\zawad\anaconda3\envs\myenv\lib\site-packages\pandas\io\parsers\readers.py", line 482, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "c:\users\zawad\anaconda3\envs\myenv\lib\site-packages\pandas\io\parsers\readers.py", line 811, in __init__
self._engine = self._make_engine(self.engine)
File "c:\users\zawad\anaconda3\envs\myenv\lib\site-packages\pandas\io\parsers\readers.py", line 1040, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "c:\users\zawad\anaconda3\envs\myenv\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 69, in __init__
self._reader = parsers.TextReader(self.handles.handle, **kwds)
File "pandas\_libs\parsers.pyx", line 542, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 642, in pandas._libs.parsers.TextReader._get_header
File "pandas\_libs\parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1917, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte

Any help is appreciated.

JulianSMoore · 16 November 2021 13:36

Hi @Allegiance

OK, here’s a rough outline of what you need to do. Note that I don’t know the details of your .npy file, so it’s only an outline: a .npy file persists a single NumPy array, and you’ll have to deal with whatever its dimensions turn out to be.
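Before converting, it can help to open the file with NumPy and see what it actually holds. The sketch below writes a small made-up array to a hypothetical example.npy (a stand-in for your own file) and inspects it. It also reads the first few bytes, because every .npy file starts with the magic string \x93NUMPY; that leading 0x93 byte is exactly what the UnicodeDecodeError above complains about, which suggests the binary .npy file was handed straight to pandas’ CSV reader.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for your dataset: 5 samples with 3 features each.
path = os.path.join(tempfile.mkdtemp(), "example.npy")
np.save(path, np.arange(15, dtype=np.float32).reshape(5, 3))

# A .npy file is binary and begins with the magic string b"\x93NUMPY".
with open(path, "rb") as f:
    magic = f.read(6)
print(magic)  # b'\x93NUMPY'

# Load the array and check its structure before deciding how to tabulate it.
arr = np.load(path)
print(arr.shape, arr.dtype, arr.ndim)  # (5, 3) float32 2
```

If np.load complains about allow_pickle, the array holds Python objects rather than plain numbers, and you’ll need to look at what those objects are before a CSV makes sense.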

import pandas as pd
import numpy as np

# others, of course

Then:

1. Load your .npy file, e.g. with np.load.
2. Create a pandas DataFrame from the array, giving suitable labels to the columns (here’s a random example; plenty more out there!).
3. Use DataFrame.to_csv to write it out in CSV format.
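The three steps above can be sketched roughly like this. It assumes the first axis of your array indexes samples (rows) and flattens any remaining axes into columns; the function name npy_to_csv, the column names feature_0, feature_1, ..., and the file names are hypothetical placeholders, so rename them to match your data.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def npy_to_csv(npy_path, csv_path, column_prefix="feature"):
    """Write a .npy array to CSV, one row per sample.

    Assumes the first axis indexes samples; any remaining axes are
    flattened into columns named <column_prefix>_0, <column_prefix>_1, ...
    """
    arr = np.load(npy_path)
    if arr.ndim == 0:
        raise ValueError("a 0-d array has no rows to write")
    # (n,) -> (n, 1); (n, a, b, ...) -> (n, a*b*...)
    table = arr.reshape(arr.shape[0], -1)
    columns = [f"{column_prefix}_{i}" for i in range(table.shape[1])]
    df = pd.DataFrame(table, columns=columns)
    # index=False leaves out pandas' unlabelled index column.
    df.to_csv(csv_path, index=False)
    return df


# Usage with a hypothetical 4 x 2 x 2 array (e.g. four tiny 2x2 "images").
workdir = tempfile.mkdtemp()
npy_path = os.path.join(workdir, "data.npy")
csv_path = os.path.join(workdir, "data.csv")
np.save(npy_path, np.arange(16).reshape(4, 2, 2))
df = npy_to_csv(npy_path, csv_path)
print(df.shape)  # (4, 4)
```

Flattening keeps every value but discards the array’s original shape, so if your array holds images or sequences, check whether a flat table is really what you want to feed PerceptiLabs.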

(If you get stuck, I’m sure we can sort it out, but that should get you going. If you do need more input, please include your code up to that point, the .npy file (or a small version of it), and some information about how the array is structured; upload it all as a zip if necessary.)
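If the full file is too big to share, a small version can be cut from it along the first axis. A minimal sketch with hypothetical file names and random stand-in data; mmap_mode="r" memory-maps the file so that only the slice you keep has to be read into memory.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for your full dataset: 1000 samples, 8 features.
workdir = tempfile.mkdtemp()
full = np.random.default_rng(0).normal(size=(1000, 8))
np.save(os.path.join(workdir, "full.npy"), full)

# Memory-map the big file and keep only the first 20 samples.
sample = np.load(os.path.join(workdir, "full.npy"), mmap_mode="r")[:20]
np.save(os.path.join(workdir, "small.npy"), sample)
print(sample.shape)  # (20, 8)
```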

UPDATE

When exporting the DataFrame to CSV, you should also either omit the index column (index=False) or name it with index_label='your_label_here', because PL doesn’t like unlabelled columns, if I recall correctly. (I was just doing this myself and saw that the index label was missing in the CSV file!)
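The difference shows up in the header line pandas writes. A small sketch with a made-up two-column DataFrame; row_id is a hypothetical stand-in for whatever label you choose.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0], "y": [0, 1]})

# Default: the index is written as a first column with an empty header.
default_header = df.to_csv().splitlines()[0]
print(default_header)  # ,x,y

# Option 1: leave the index out entirely.
no_index_header = df.to_csv(index=False).splitlines()[0]
print(no_index_header)  # x,y

# Option 2: keep the index but give its column a name.
labelled_header = df.to_csv(index_label="row_id").splitlines()[0]
print(labelled_header)  # row_id,x,y
```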