Foot 3D Reconstruction and Measurement using Depth Data

Background: The need for shoes in non-standard sizes is increasing, but the ability to measure feet effectively has not kept pace. The high cost of commercial measuring instruments has motivated the development of a precise yet affordable measurement system. Objective: This research attempts to solve the measurement problem with an automatic instrument based on a depth image sensor that is commercially available at an affordable price. Methods: Data from several Realsense sensors are preprocessed, combined using transformation techniques, and then cleaned of noise. Finally, the resulting 3D model of the foot is used to obtain the foot length and width. Results: The experimental results show that the proposed method produces a measurement error of 0.351 cm in foot length and 0.355 cm in foot width. Conclusion: The results show that static Realsense sensors placed at multiple angles can produce a good 3D foot model automatically. The proposed system configuration reduces complexity while being an affordable solution.


I. INTRODUCTION
There is a wide variety of shoes on the market. From flat shoes to killer heels, people can choose based on their needs or simply their personal preferences. Yet we often do not feel comfortable in our shoes, be they formal, casual, or even sports shoes, even though research [1] shows that, besides shoe design and price, comfort is one of the factors that influence customers' decisions when choosing shoes. One cause of this discomfort is a mismatch between the shape of our feet and the shoe design. Some people simply cannot find shoes that fit properly because the shape of their feet differs from that of the majority [2]. They need customized shoes whose size, shape, and even material can be configured down to the smallest detail according to their needs.
Rising living standards and health awareness have also contributed to the increasing need for custom shoes. Generally, there are two kinds of custom shoes, namely 'style-centered custom shoes' and 'function-centered custom shoes' [3]. A style-centered custom shoe is made to realize a specific design of one's dream shoe, while a function-centered custom shoe, also called a bespoke shoe, is tailored to a specific need. Bespoke shoes are a requirement for people with foot problems such as bunions, hammer toes, or degenerative disorders [4], for people with diabetes [5][6][7], and for people with other foot deformities, whether present from birth or caused by an accident; this type of shoe is often termed an orthopedic shoe.
Producing a custom shoe demands a high level of accuracy. The key lies in measuring the foot so that a precise shoe last can be made, which is then used as the model for making the shoes. Custom shoemakers usually measure feet manually, which makes the process not only long but also unstandardized [8]. It takes around sixty hours to produce a pair of custom shoes [9]. The labor-intensive process required to make one pair of bespoke shoes pushes up their cost, making them less affordable [10]. Hence, in this study we propose an automatic method for measuring the foot in order to make a precise shoe last.

II. RELATED WORKS
Sikyung Kim and his team developed a new approach to making the footwear molds known as lasts. The approach, called conformal mapping, analyzes 3D foot data using the Limb Line Free Form Deformation (LFFD) method and the Schwarz-Christoffel algorithm [11]. In the shoemaking industry, shoe lasts are very important for ensuring the accuracy of the shoe model and size, especially for health shoes made specifically for patients with doctors' prescriptions. However, manufacturing them has so far been costly and time-consuming. Troy A. Chase and his team developed an automated method for creating shoe lasts at a lower cost and with a faster process [12]. The system is connected directly to a milling machine that produces the shoe last. A less accurate graphical display is shown first, before the original, more accurate scanner data is sent to the milling machine for shoe last production. The display's accuracy is reduced because rendering is time-consuming, so picture quality has to be compromised.
In contrast to the previous studies, Wu Lu-Shen and his team developed a last modeling system with 3D measurement that combines several foot images into one. Foot images are taken from six stable shooting positions, and the person being measured must wear special socks marked at certain positions, which help merge the images into a 3D model [13].
The method for reconstructing a 3D foot model is no different from the reconstruction of other 3D models. There are two popular approaches to 3D reconstruction: an active approach using a rangefinder sensor and a passive approach using computer vision techniques (cameras) [14]. In the active approach, either the sensor has to be moved (generally rotated or swept) or several sensors are arranged in an array. Passive computer vision methods include single-image, stereo-image, and structure-from-motion techniques. In the active approach, which the proposed system adopts, the first stage is point cloud generation, in which the system generates a point cloud from each scanned side; the point clouds are then combined into a 3D model, from which the measurements of the target object are obtained.
Min-Jae Lee and his team used a 3D scanner that can rotate 360° to produce a 3D model of the foot. Their architecture consists of a set of rotary-type 3D sensors and a set of linear-type 3D sensors. The rotary-type sensor system consists of two line lasers and a camera. With 2 cameras and 3 line lasers in total, this is a cheaper alternative to other foot scanner systems. Combining the linear and rotating lasers with the triangulation method keeps the error to a maximum of 0.58 mm [15].
Lasers are not only used for direct measurement; they can also illuminate the foot surface so that the depth of the surface can be observed when the object is scanned with a camera [16]. Lasers can be replaced by projectors. Projecting patterns onto the scanned object and combining multiple cameras with a phase-shifting algorithm has been shown to reduce errors to below 1 mm [17]. Applying the same methodology with a CSL (Coded Structured Light) pattern produces errors under 2.8 mm [18].
Foot reconstruction from depth images, using a mechanical device that rotates a Kinect v2 in increments of 0.9° combined with a modified ICP algorithm, can produce measurements with errors below 1 mm [8]. Kinect was discontinued in 2017, but other RGB-D cameras have since emerged, including Intel Realsense. Measuring the sole with a Realsense sensor placed under acrylic glass keeps errors below 3 mm [19]. Besides these efforts, there is a different way of thinking about creating custom shoes: using 3D printing technology, one group succeeded in printing shoe lasts based on a foot model, an approach claimed to be more accurate than more conventional ways [20].

III. METHODS
3D foot model reconstruction is a mandatory step for measuring foot features in the proposed method. The method uses depth data from a Realsense sensor to obtain a depth image of one side of the foot. Covering all sides of the foot requires at least four viewing angles spaced 90° apart, namely 0°, 90°, 180°, and 270°. The four results from the depth sensors are then merged using transformations, and the foot feature measurements are obtained from the merged result.
The algorithm, shown in Fig. 1, consists of two steps: 1) cleaning and 2) transformation. Cleaning is needed because the data obtained from the Realsense sensor contain not only the foot but also surrounding noise. The cleaning process consists of a) a depth filter, b) an outlier filter, and c) a blob filter, which together separate the desired data from the noise. The depth filter removes erroneous readings, the outlier filter removes small scattered groups of points, and the blob filter ensures that the remaining data form a single unit.
The second part of the algorithm is the transformation, in which the data are combined using transformation and smoothing. This differs from merging data with the ICP method, because merging with ICP requires data with only insignificant differences in angle. Consequently, the ICP method requires devices or components with a high degree of precision, which are usually costly and must be recalibrated routinely.
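Before any filtering or merging, each sensor's depth image has to be turned into a point cloud. A minimal sketch of this step, assuming an ideal pinhole camera without lens distortion, depths already converted to metres, and intrinsics fx, fy, cx, cy read from the camera (the Realsense SDK also offers its own deprojection function, rs2_deproject_pixel_to_point):

```python
import numpy as np


def depth_to_points(depth_m: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project a depth image (metres) to an (N, 3) point cloud in the camera frame.

    Assumes an ideal pinhole model (no distortion). Pixels with depth 0,
    i.e. no valid reading, are dropped.
    """
    h, w = depth_m.shape
    v, u = np.indices((h, w))          # row index v and column index u of every pixel
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]    # keep only pixels that have a depth reading
```

One cloud of this kind is produced per sensor (0°, 90°, 180°, 270°), each in its own camera frame.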
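The difference from ICP can be made concrete: with fixed sensors, each cloud is moved into a common frame by a rotation equal to its mounting angle plus a translation found once during calibration, with no iterative matching. The paper does not state the rotation axis or the calibrated translations, so in this sketch the axis is a parameter and the translations are hypothetical inputs; the rotation matrix is built with Rodrigues' formula.

```python
import numpy as np


def rotation_about_axis(axis, angle_deg: float) -> np.ndarray:
    """3x3 rotation matrix for angle_deg about the given axis (Rodrigues' formula)."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    theta = np.deg2rad(angle_deg)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])          # cross-product matrix of k
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def to_common_frame(points: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Apply the fixed rigid transform p' = R p + t to every row of an (N, 3) cloud."""
    return points @ R.T + t


def merge_clouds(clouds, angles_deg, translations, axis) -> np.ndarray:
    """Move each sensor's cloud by its mounting angle and calibrated translation, then stack.

    No ICP is run: the transforms are predetermined, so views 90 degrees apart
    do not need to overlap for registration to work.
    """
    merged = [to_common_frame(c, rotation_about_axis(axis, a), np.asarray(t, dtype=float))
              for c, a, t in zip(clouds, angles_deg, translations)]
    return np.vstack(merged)
```

Because the transforms come from calibration rather than from the data, merging costs one matrix multiplication per sensor and is unaffected by the foot's lack of texture.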

Fig. 1 Flow of proposed method
A. Sensor Configuration
Data collection at the four predetermined angles can be done using one sensor or more. Fig. 2 (a) shows the one-sensor approach, which uses a mechanical drive to rotate the sensor vertically around the foot and take data at certain angular intervals. This one-sensor method is not easy to implement, since it requires a design that can place the sensor precisely at the predetermined angles, and such a design needs to be recalibrated from time to time. The proposed method instead uses a multiple-sensor approach with 4 points of view. With the one-sensor approach and precisely adjusted angles, the readings can be merged into a 3D model with the ICP (Iterative Closest Point) algorithm [14]. Unfortunately, the ICP algorithm cannot be applied in the multiple-sensor approach because of the large angular differences (90°) and because the foot has no texture, whereas ICP works best when the object has obvious textures. If ICP is applied in this case, it produces a wrong result: the algorithm transforms the sole so that, in the resulting model, it appears to stick to the side of the foot, whereas the two should be merged so that they are perpendicular. The proposed method therefore uses a predetermined transformation obtained by calibrating the positions of the sensors. The sensors are fixed at angles 90° apart, and the calibration process determines the translation needed for each camera to produce a precise 3D model. The foot is placed at least 40 cm from every sensor, because each sensor must be able to cover the entire foot area. Preliminary experiments found 40 cm to be the best minimum distance; at shorter distances, many pixels in the depth image have holes.

B. Depth Filter
Fig. 3 shows the raw data obtained from each sensor. The data contain a lot of noise that must be removed before they can be combined. Part of the noise comes from positioning errors, which appear at a depth of 0 or at very large depths; noise also comes from mapping the surrounding environment. The arrow indicates the foot scan to be used, and all data outside the area it points to are considered noise. Depth filtering based on (1) is performed to remove this noise, keeping only points whose depth lies between the thresholds t1 and t2. Since the optimal distance between the foot and the sensor is 40 cm, and taking the length and width of the foot into account, the thresholds are set to t1 = 25 cm and t2 = 70 cm, so that 40 cm falls near the middle of the range. These thresholds eliminate most of the noise, but the foot can still deviate from its average position, as shown in Fig. 4.
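The depth filter amounts to a band-pass on the z coordinate. A minimal sketch with the thresholds from the text expressed in metres; the extracted text does not show equation (1) itself, so treating both bounds as inclusive is an assumption:

```python
import numpy as np


def depth_filter(points: np.ndarray, t1: float = 0.25, t2: float = 0.70) -> np.ndarray:
    """Keep points whose depth z lies in [t1, t2] metres (25 cm and 70 cm in the text).

    Invalid readings at depth 0 and far-away background both fall outside the band.
    """
    z = points[:, 2]
    return points[(z >= t1) & (z <= t2)]
```

With the foot at about 40 cm, the band leaves roughly 15 cm in front of and 30 cm behind the nominal position, enough for the foot's own depth extent.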

C. Statistical Outlier Filter
Before further processing, a second filter is applied to eliminate noise in the form of small, scattered dots or groups of dots. This filter computes, for each point, the standard deviation over its n nearest neighbor points; if the standard deviation exceeds a threshold, the point is considered an outlier. Fig. 5 shows the result of the statistical outlier filter; part (b) clearly shows that the filter eliminates the dots and groups of dots scattered over the point cloud.
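A statistical outlier filter can be sketched as follows. This uses the common variant found in point cloud libraries (for example Open3D's remove_statistical_outlier): each point's mean distance to its n nearest neighbours is compared against the global mean plus a multiple of the global standard deviation. It is shown as an illustration, not as the authors' exact statistic, and the default n and std_ratio are hypothetical.

```python
import numpy as np


def statistical_outlier_filter(points: np.ndarray, n: int = 8,
                               std_ratio: float = 2.0) -> np.ndarray:
    """Remove sparse points using the mean distance to the n nearest neighbours.

    A point is an outlier if its mean neighbour distance exceeds
    global_mean + std_ratio * global_std. Distances are brute force, O(N^2)
    memory, so this only suits small clouds; use a KD-tree for full frames.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))     # (N, N) pairwise distances
    dist.sort(axis=1)                            # column 0 is each point itself (distance 0)
    mean_knn = dist[:, 1:n + 1].mean(axis=1)     # mean distance to the n nearest neighbours
    threshold = mean_knn.mean() + std_ratio * mean_knn.std()
    return points[mean_knn <= threshold]
```

Isolated dots have large neighbour distances relative to the dense foot surface, so they exceed the threshold while points on the surface do not.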

D. Blob Filter
The previous filters eliminate most of the noise. At this stage only a few blobs remain, one of which is the blob to be combined. The next step is to determine which blob is the foot data. The target foot always lies around the center point of the camera; Fig. 6 (a) shows the position of this center point. Viewed from any angle, the center point of the camera always falls on the blob to be joined, so the search starts from this midpoint. From the midpoint, neighboring points are searched using (2), where the Euclidean distance is used as the distance function, c is the point whose neighbors are being searched, and n is the neighbor being evaluated. Neighbors whose distance is less than the threshold td are considered part of the same blob, which is the one to be combined, as shown in Fig. 6 (b). Fig. 7 shows the result of the blob filtering. These data are used for further processing, namely transformation and point cloud merging, in which the data are combined using transformation and smoothing.
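The blob search is a region-growing (flood-fill) procedure over the point cloud. A minimal sketch, assuming the "center point of the camera" is taken as the point closest to the optical axis (smallest distance from the z axis in the camera frame), and with td in metres as a hypothetical parameter:

```python
from collections import deque

import numpy as np


def blob_filter(points: np.ndarray, td: float) -> np.ndarray:
    """Keep only the blob connected to the point closest to the camera's optical axis.

    Region growing: starting from the seed, any point within Euclidean
    distance td of a point already in the blob joins the blob.
    """
    seed = int(np.argmin(np.hypot(points[:, 0], points[:, 1])))  # nearest to image centre
    in_blob = np.zeros(len(points), dtype=bool)
    in_blob[seed] = True
    queue = deque([seed])
    while queue:
        c = queue.popleft()                               # point whose neighbours are searched
        d = np.linalg.norm(points - points[c], axis=1)    # distance from c to every candidate n
        new = np.flatnonzero((d < td) & ~in_blob)
        in_blob[new] = True
        queue.extend(new.tolist())
    return points[in_blob]
```

Each point enters the queue once, so the loop terminates; the brute-force distance computation makes it O(N^2) overall, which a KD-tree radius query would reduce for full-resolution clouds.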

E. Point Cloud Merging
Data from the four angles that have gone through the noise elimination process are then combined to form a complete 3D model of the foot from all sides. The merging is done with a rotation and translation that were calibrated beforehand. Fig. 8 (a) shows the merging of two point clouds from the 0° and 90° data. Fig. 8 shows that there are excess points outside the combined object. These points come from the region scanned by both angles. Because the foot lacks texture, the depth in this overlapping region is inaccurate, and these points cause inaccuracies in the reconstruction. The transformation result is therefore cleaned by removing the excess points. A point is considered an excess point if:
1. it belongs to the 90° data (blue points) and has a depth z greater than the z of the 0° data (yellow) at the same x and y; or
2. it belongs to the 0° data (yellow) and has a y greater than the y of the 90° data.
The same principles are applied to the data from the other angles (180° and 270°). The cleaning result is shown in Fig. 8 (b).
Fig. 9 shows the result of the reconstruction process. After the 3D model has been constructed, the resulting foot model is measured. Several foot features are usually measured; when measuring manually, shoe designers usually use a tape to measure features such as foot length, foot width, and metatarsal girth [8]. The reconstruction has weaknesses, as seen in Fig. 9 (b), where some parts are not reconstructed because they are not covered by the sensors.
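The first excess-point rule can be sketched as a per-cell depth comparison. To decide when two points share "the same x and y", this sketch bins both clouds (already in the common frame) into a square grid whose cell size is a hypothetical parameter, and compares against the nearest reference depth in each cell. The second rule is not implemented, because the text does not specify which y of the 90° data the comparison uses.

```python
import numpy as np


def remove_excess_points(ref: np.ndarray, other: np.ndarray, cell: float = 0.002) -> np.ndarray:
    """Rule 1 sketch: drop points of `other` (e.g. 90-degree data) that have a larger z
    than `ref` (e.g. 0-degree data) in the same (x, y) grid cell.

    `cell` (metres) is a hypothetical bin size standing in for "the same x and y".
    """
    def keys(p):
        return [tuple(k) for k in np.floor(p[:, :2] / cell).astype(np.int64)]

    ref_z = {}
    for k, z in zip(keys(ref), ref[:, 2]):
        ref_z[k] = min(z, ref_z.get(k, np.inf))   # nearest reference surface in each cell
    keep = [not (k in ref_z and z > ref_z[k]) for k, z in zip(keys(other), other[:, 2])]
    return other[np.array(keep, dtype=bool)]
```

Points of the other view that fall in cells the reference view never saw are kept, so the filter only trims the overlap region.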

A. Hardware Specification
The hardware used in the test consists of sensors and a processing unit; the processing unit can be a Raspberry Pi 4 or a PC. The sensor, shown in Fig. 10, is a Realsense D435, whose specifications are presented in Table 1.

B. Sensor Position
This test determines the optimal distance between the sensor and the sole being measured. Several trials were carried out by examining the depth image at different distances. Fig. 11 shows the depth images at distances of 10 to 40 cm. At a distance of 10 cm (a), most of the sole cannot be reconstructed: the black areas are empty, and only a small part is reconstructed (the colored points). This problem is due to the specifications of the sensor, whose stated minimum measurable distance is 10.5 cm. In addition, not all of the foot is visible: only the middle of the foot can be seen, while the heel and toes cannot. Part (b) shows the result at 20 cm, where the sole is fully covered but still cannot be reconstructed, as marked in black. In Fig. 11 (c), the distance is 30 cm; the foot can be reconstructed, but the results sometimes still fluctuate, especially near the edge of the foot (marked in black). In Fig. 11 (d), at 40 cm, the resulting depth image is consistent and of good quality. Hence, the method uses a distance of 40 cm between the sensor and the target foot.
Pambudi & Hidayah, Journal of Information Systems Engineering and Business Intelligence, 2020, 6 (1), 37-45

C. Accuracy Test
Accuracy testing is done by comparing the foot length and width obtained by manual measurement with those obtained by the proposed method for 10 different people. Only the sole of the right foot is measured in this trial. The test results in Table 2 show an average difference of 0.351 cm for foot length and 0.355 cm for foot width.
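The text does not state how length and width are extracted from the merged model. One possible approach, shown here as an assumption rather than the authors' procedure, projects the model onto the ground plane and measures its extent along the principal axes found by PCA. The average difference in Table 2 can then be computed as a mean absolute difference over the subjects, assuming "average difference" means the mean of absolute differences. The plane_axes parameter is hypothetical, since the orientation of the merged frame depends on the calibration.

```python
import numpy as np


def foot_length_width(points: np.ndarray, plane_axes=(0, 1)) -> tuple[float, float]:
    """Length and width as the footprint's extents along its principal axes.

    plane_axes selects the two coordinates spanning the ground plane in the
    merged model's frame (a hypothetical choice that depends on calibration).
    """
    xy = points[:, list(plane_axes)]
    xy = xy - xy.mean(axis=0)
    # eigh returns eigenvalues in ascending order, so the last eigenvector
    # is the direction of greatest spread (the foot's long axis).
    _, vecs = np.linalg.eigh(np.cov(xy.T))
    along_length = xy @ vecs[:, 1]
    along_width = xy @ vecs[:, 0]
    length = along_length.max() - along_length.min()
    width = along_width.max() - along_width.min()
    return float(length), float(width)


def mean_abs_difference(manual, measured) -> float:
    """Average |manual - measured| over all subjects, in the unit of the inputs."""
    manual = np.asarray(manual, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return float(np.mean(np.abs(manual - measured)))
```

Using principal axes rather than fixed coordinate axes keeps the measurement independent of small rotations of the foot on the platform, though it measures the bounding extent, not a tape-style contour.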

V. DISCUSSION
There has been no method for producing shoe lasts that is reasonably accurate while also being simple and affordable, especially for home industries. Existing approaches are either expensive, because of the high-end hardware they use, or too complex for small-scale industries.
The proposed automatic foot measurement system, which uses more economical components and a simpler configuration, has been shown to be capable of reconstructing a 3D foot model from data captured at various angles. The data are cleaned and transformed to form the complete model. The accuracy test shows good results, with a small deviation between the manual measurements and those produced by the proposed algorithm. A key concept that distinguishes this system from other approaches is that it does not use the ICP method. ICP requires data with insignificant differences in angle, which in turn requires high-precision devices that are usually costly and need routine recalibration.
For future work, it is recommended to use the Realsense D415 sensor, which has a standard field of view and a rolling-shutter sensor that produces a higher resolution, resulting in more accurate measurement of small objects. This sensor is also cheaper because it uses a rolling shutter.

VI. CONCLUSIONS
An affordable automatic foot measurement system using Intel Realsense is proposed, with sensor input from four angles. The system shows good results when measuring the length and width of the target foot, with deviations of 0.351 cm and 0.355 cm respectively, while being simpler in configuration and more economical, which makes it suitable for small-scale industries. It lays a good foundation for the further development of economical foot measurement systems.