The data structure for unstructured data
DocArray is a library for nested, unstructured data such as text, images, audio, video, and 3D meshes. It allows deep learning engineers to efficiently process, embed, search, recommend, store, and transfer data with a Pythonic API.
🌌 All data types: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data.
🐍 Pythonic experience: designed to be as easy as a Python list. If you know how to Python, you know how to DocArray. Intuitive idioms and type annotations simplify the code you write.
🧑‍🔬 Data science powerhouse: greatly accelerates data scientists' work on embedding, matching, visualizing, and evaluating via Torch/TensorFlow/ONNX/PaddlePaddle on CPU/GPU.
🚡 Portable: ready-to-wire at any time, with efficient and compact serialization from/to Protobuf, bytes, JSON, CSV, and dataframe.
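
To make the ideas in the list above concrete, here is a minimal, standard-library-only sketch of what a nested document with list-like handling and a JSON round trip can look like. It is an illustration under stated assumptions, not DocArray's actual API: the `Doc` class, its fields (`text`, `uri`, `embedding`, `chunks`), and its `to_json`/`from_json` helpers are all hypothetical names chosen for this example.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a nested document; NOT DocArray's real class."""
    text: Optional[str] = None
    uri: Optional[str] = None                 # e.g. a path to an image or audio file
    embedding: Optional[List[float]] = None   # a vector produced by some model
    chunks: List["Doc"] = field(default_factory=list)  # nested sub-documents

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, including those inside lists.
        return json.dumps(asdict(self))

    @classmethod
    def from_dict(cls, data: dict) -> "Doc":
        # Rebuild nested chunks first, then the document that owns them.
        chunks = [cls.from_dict(c) for c in data.get("chunks", [])]
        return cls(
            text=data.get("text"),
            uri=data.get("uri"),
            embedding=data.get("embedding"),
            chunks=chunks,
        )

    @classmethod
    def from_json(cls, payload: str) -> "Doc":
        return cls.from_dict(json.loads(payload))


# A mixed, nested document: a page holding a text chunk and an image chunk.
page = Doc(
    text="product page",
    chunks=[
        Doc(text="A red running shoe"),
        Doc(uri="shoe.png", embedding=[0.1, 0.2, 0.3]),
    ],
)

# A plain Python list serves as the container, so list idioms apply directly.
docs = [page, Doc(text="another page")]
texts = [d.text for d in docs]

# Round trip through JSON; the restored document compares equal to the original.
restored = Doc.from_json(page.to_json())
```

The sketch only covers nesting and JSON. The other formats the list mentions (Protobuf, bytes, CSV, dataframe) and the embedding and matching features are outside its scope.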